From paul at haleyai.com Sat Jan 4 20:34:59 2020
From: paul at haleyai.com (paul at haleyai.com)
Date: Sat, 4 Jan 2020 14:34:59 -0500
Subject: [developers] issues building PET w/ Boost, trigrams, etc.
Message-ID: <004201d5c336$0e7e82c0$2b7b8840$@haleyai.com>

Greetings Folks,

When upgrading, I found the new quoting convention in the ERG's TDL (lextypes) and that I needed a more recent version of PET to process the current ERG. In the course of upgrading I found a few issues with the build process and its instructions. I've done many builds of all the above over recent years, so hopefully this will be of assistance... FYI, I typically build using Ubuntu inside Docker, so any of this is completely repeatable and has nothing to do with my system, per se.

The first problem I encountered was lack of support for the version of GCC shipped with Ubuntu 16.04 (or 18.04, which I upgraded to in this process). This problem arises from the (outdated?) version of boost.m4 cached in the PET repository. I overcame it by inserting the current versions around line 1419 of boost.m4, FYI.

The second problem was that additional Boost modules appear to be required (as reported by errors during configure). These included system, filesystem, and iostreams.

After this, compilation proceeded but failed for two reasons. First, the addition of the trigram subdirectory under the cheap directory seems not to be properly reflected in cheap/Makefile.am (it omits the relative path when building from the release directory, for example). The same was true for the ("new"?) repp subdirectory of cheap. Both were resolved by the following edit to line 12 of cheap/Makefile.am, FYI. I'm not sure that's the right approach, but "works for me"!

CPPFLAGS += -I$(top_srcdir)/common -I$(top_srcdir)/fspp -I$(top_srcdir)/cheap/repp -I$(top_srcdir)/cheap/trigram @CHEAPCPPFLAGS@

Regards,
Paul
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From goodman.m.w at gmail.com Thu Jan 16 10:33:54 2020
From: goodman.m.w at gmail.com (goodman.m.w at gmail.com)
Date: Thu, 16 Jan 2020 17:33:54 +0800
Subject: [developers] EDM implementations
Message-ID:

Hello developers,

Recently I wanted to try out Elementary Dependency Match (EDM) but I did not find an easy way to do it. I saw lisp code in the LKB's repository and Bec's Perl code, but I'm not sure how to call the former from the command line and the latter seems outdated (I don't see the "export" command required by its instructions).

The Dridan & Oepen, 2011 algorithm was simple enough so I thought I'd implement it on top of PyDelphin. The result is here: https://github.com/delph-in/delphin.edm. It requires the latest version of PyDelphin (v1.2.0). It works with MRS, EDS, and DMRS, and it reads text files or [incr tsdb()] profiles.

When I nearly had my version working I found that Stephan et al.'s mtool (https://github.com/cfmrp/mtool) also had an implementation of EDM, so I used that to compare with my outputs (as I couldn't get the previous implementations to work). In this process I think I found some differences from Dridan & Oepen, 2011's description, and this email is to confirm those findings. Namely, that mtool's (and now my) implementations do the following:

* CARGs are treated as property triples ("class 3 information"). Previously they were combined with the predicate name. This change means that predicates like 'named' will match even if their CARGs don't, and the CARGs are a separate thing that needs to be matched. 
* The identification of the graph's TOP counts as a triple. One difference between mtool and delphin.edm is that mtool does not count "variable" properties from EDS, but that's just because its EDS parser does not yet handle them while PyDelphin's does. Can anyone familiar with EDM confirm the above? Or can anyone explain how to call the Perl or LKB code so I can compare? -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From bec.dridan at gmail.com Thu Jan 16 11:33:17 2020 From: bec.dridan at gmail.com (Bec Dridan) Date: Thu, 16 Jan 2020 21:33:17 +1100 Subject: [developers] EDM implementations In-Reply-To: References: Message-ID: Wow, that is some old code... From memory, export was a wrapper around `parse --export`, where I could add :ltriples to the tsdb::*redwoods-export-values* set. I don't know the mtool code at all, but re-reading the paper and looking at the perl code, I don't think the original implementation evaluated CARG at all. We only checked that the correct character span had a pred name of`named`. I think you are right that the triple export at the time did not produce a triple for TOP and it hence would not have been counted. That match your memory Stephan? Bec On Thu, Jan 16, 2020 at 8:34 PM goodman.m.w at gmail.com wrote: > Hello developers, > > Recently I wanted to try out Elementary Dependency Match (EDM) but I did > not find an easy way to do it. I saw lisp code in the LKB's repository and > Bec's Perl code, but I'm not sure how to call the former from the command > line and the latter seems outdated (I don't see the "export" command > required by its instructions). > > The Dridan & Oepen, 2011 algorithm was simple enough so I though I'd > implement it on top of PyDelphin. The result is here: > https://github.com/delph-in/delphin.edm. It requires the latest version > of PyDelphin (v1.2.0). It works with MRS, EDS, and DMRS, and it reads text > files or [incr tsdb()] profiles. > > When I nearly had my version working I found that Stephan et al.'s mtool ( > https://github.com/cfmrp The paper > example > /mtool ) also had an implementation of > EDM, so I used that to compare with my outputs (as I couldn't get the > previous implementations to work). In this process I think I found some > differences from Dridan & Oepen, 2011's description, and this email is to > confirm those findings. Namely, that mtool's (and now my) implementation do > the following: > > * CARGs are treated as property triples ("class 3 information"). > Previously they were combined with the predicate name. This change means > that predicates like 'named' will match even if their CARGs don't and the > CARGs are a separate thing that needs to be matched. > > * The identification of the graph's TOP counts as a triple. > > One difference between mtool and delphin.edm is that mtool does not count > "variable" properties from EDS, but that's just because its EDS parser does > not yet handle them while PyDelphin's does. > > Can anyone familiar with EDM confirm the above? Or can anyone explain how > to call the Perl or LKB code so I can compare? > > -- > -Michael Wayne Goodman > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebender at uw.edu Fri Jan 17 01:42:08 2020 From: ebender at uw.edu (Emily M. 
Bender) Date: Thu, 16 Jan 2020 16:42:08 -0800 Subject: [developers] Skipping non-parsed items in fftb Message-ID: Dear all, We are doing some treebanking here at UW with fftb with grammars that have very low coverage over their associated test corpora. The current behavior of fftb with these profiles is to include all items for treebanking, but give a 404 for each one with no parse forest stored. This necessitates clicking the back button and tracking which one is next (since nothing changes color). In that light, two questions: (1) Is there some option we can pass fftb so that it just doesn't present items with no parses? (2) Failing that, is it fairly straightforward with pydelphin, [incr tsdb()] or something else to export a version of the profiles that only includes items which the grammar successfully parsed? Thanks, Emily -- Emily M. Bender (she/her) Howard and Frances Nostrand Endowed Professor Department of Linguistics Faculty Director, CLMS University of Washington Twitter: @emilymbender -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodman.m.w at gmail.com Fri Jan 17 03:33:03 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Fri, 17 Jan 2020 10:33:03 +0800 Subject: [developers] Skipping non-parsed items in fftb In-Reply-To: References: Message-ID: Hi Emily, For (2), here is how you could do it with PyDelphin: delphin process -g grm.dat original-profile/ delphin mkprof --full --where 'readings > 0' --source original-profile/ new-profile/ delphin process -g grm.dat --full-forest new-profile/ Note that original-profile/ is first parsed in regular (non-forest) mode, because in full-forest mode the number of readings is essentially unknown until they are enumerated and thus the 'readings' field is always 0. The second command not only prunes lines in the 'parse' file with readings == 0, but also lines in the 'item' file which correspond to those 'parse' lines. Once you have created new-profile/, you can parse again with --full-forest for use with FFTB (and of course you don't have to use PyDelphin for the parsing steps, if you prefer other means). Also note that this results in a profile with no edges for partial parses. I think this is what you want. There should be a way to prune the full-forest profile directly while keeping partial parses, but while investigating this use case I found a bug, so I don't recommend it yet. Try `delphin mkprof --help` to see descriptions of these and other options. They map fairly directly to the function documented here: https://pydelphin.readthedocs.io/en/latest/api/delphin.commands.html#mkprof On Fri, Jan 17, 2020 at 8:44 AM Emily M. Bender wrote: > Dear all, > > We are doing some treebanking here at UW with fftb with grammars that have > very low coverage over their associated test corpora. The current behavior > of fftb with these profiles is to include all items for treebanking, but > give a 404 for each one with no parse forest stored. This necessitates > clicking the back button and tracking which one is next (since > nothing changes color). In that light, two questions: > > (1) Is there some option we can pass fftb so that it just doesn't present > items with no parses? > (2) Failing that, is it fairly straightforward with pydelphin, [incr > tsdb()] or something else to export a version of the profiles that only > includes items which the grammar successfully parsed? > > Thanks, > Emily > > > -- > Emily M. 
Bender (she/her) > Howard and Frances Nostrand Endowed Professor > Department of Linguistics > Faculty Director, CLMS > University of Washington > Twitter: @emilymbender > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebender at uw.edu Fri Jan 17 03:36:51 2020 From: ebender at uw.edu (Emily M. Bender) Date: Thu, 16 Jan 2020 18:36:51 -0800 Subject: [developers] Skipping non-parsed items in fftb In-Reply-To: References: Message-ID: Thanks, Mike! I will give this a try. On Thu, Jan 16, 2020 at 6:33 PM goodman.m.w at gmail.com wrote: > Hi Emily, > > For (2), here is how you could do it with PyDelphin: > > delphin process -g grm.dat original-profile/ > delphin mkprof --full --where 'readings > 0' --source > original-profile/ new-profile/ > delphin process -g grm.dat --full-forest new-profile/ > > Note that original-profile/ is first parsed in regular (non-forest) mode, > because in full-forest mode the number of readings is essentially unknown > until they are enumerated and thus the 'readings' field is always 0. The > second command not only prunes lines in the 'parse' file with readings == > 0, but also lines in the 'item' file which correspond to those 'parse' > lines. Once you have created new-profile/, you can parse again with > --full-forest for use with FFTB (and of course you don't have to use > PyDelphin for the parsing steps, if you prefer other means). > > Also note that this results in a profile with no edges for partial parses. > I think this is what you want. There should be a way to prune the > full-forest profile directly while keeping partial parses, but while > investigating this use case I found a bug, so I don't recommend it yet. > > Try `delphin mkprof --help` to see descriptions of these and other > options. They map fairly directly to the function documented here: > https://pydelphin.readthedocs.io/en/latest/api/delphin.commands.html > #mkprof > > > On Fri, Jan 17, 2020 at 8:44 AM Emily M. Bender wrote: > >> Dear all, >> >> We are doing some treebanking here at UW with fftb with grammars that >> have very low coverage over their associated test corpora. The current >> behavior of fftb with these profiles is to include all items for >> treebanking, but give a 404 for each one with no parse forest stored. This >> necessitates clicking the back button and tracking which one is next (since >> nothing changes color). In that light, two questions: >> >> (1) Is there some option we can pass fftb so that it just doesn't present >> items with no parses? >> (2) Failing that, is it fairly straightforward with pydelphin, [incr >> tsdb()] or something else to export a version of the profiles that only >> includes items which the grammar successfully parsed? >> >> Thanks, >> Emily >> >> >> -- >> Emily M. Bender (she/her) >> Howard and Frances Nostrand Endowed Professor >> Department of Linguistics >> Faculty Director, CLMS >> University of Washington >> Twitter: @emilymbender >> > > > -- > -Michael Wayne Goodman > -- Emily M. Bender (she/her) Howard and Frances Nostrand Endowed Professor Department of Linguistics Faculty Director, CLMS University of Washington Twitter: @emilymbender -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From goodman.m.w at gmail.com Fri Jan 17 03:45:49 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Fri, 17 Jan 2020 10:45:49 +0800 Subject: [developers] Skipping non-parsed items in fftb In-Reply-To: References: Message-ID: Let me know how it goes. And a clarification: the --full option on `mkprof` doesn't hurt, but it's unnecessary since you're re-parsing the created profile. Also here's the bug report for the other thing, if you're interested in that use case: https://github.com/delph-in/pydelphin/issues/273 On Fri, Jan 17, 2020 at 10:37 AM Emily M. Bender wrote: > Thanks, Mike! I will give this a try. > > On Thu, Jan 16, 2020 at 6:33 PM goodman.m.w at gmail.com < > goodman.m.w at gmail.com> wrote: > >> Hi Emily, >> >> For (2), here is how you could do it with PyDelphin: >> >> delphin process -g grm.dat original-profile/ >> delphin mkprof --full --where 'readings > 0' --source >> original-profile/ new-profile/ >> delphin process -g grm.dat --full-forest new-profile/ >> >> Note that original-profile/ is first parsed in regular (non-forest) mode, >> because in full-forest mode the number of readings is essentially unknown >> until they are enumerated and thus the 'readings' field is always 0. The >> second command not only prunes lines in the 'parse' file with readings == >> 0, but also lines in the 'item' file which correspond to those 'parse' >> lines. Once you have created new-profile/, you can parse again with >> --full-forest for use with FFTB (and of course you don't have to use >> PyDelphin for the parsing steps, if you prefer other means). >> >> Also note that this results in a profile with no edges for partial >> parses. I think this is what you want. There should be a way to prune the >> full-forest profile directly while keeping partial parses, but while >> investigating this use case I found a bug, so I don't recommend it yet. >> >> Try `delphin mkprof --help` to see descriptions of these and other >> options. They map fairly directly to the function documented here: >> https://pydelphin.readthedocs.io/en/latest/api/delphin.commands.html >> #mkprof >> >> >> On Fri, Jan 17, 2020 at 8:44 AM Emily M. Bender wrote: >> >>> Dear all, >>> >>> We are doing some treebanking here at UW with fftb with grammars that >>> have very low coverage over their associated test corpora. The current >>> behavior of fftb with these profiles is to include all items for >>> treebanking, but give a 404 for each one with no parse forest stored. This >>> necessitates clicking the back button and tracking which one is next (since >>> nothing changes color). In that light, two questions: >>> >>> (1) Is there some option we can pass fftb so that it just doesn't >>> present items with no parses? >>> (2) Failing that, is it fairly straightforward with pydelphin, [incr >>> tsdb()] or something else to export a version of the profiles that only >>> includes items which the grammar successfully parsed? >>> >>> Thanks, >>> Emily >>> >>> >>> -- >>> Emily M. Bender (she/her) >>> Howard and Frances Nostrand Endowed Professor >>> Department of Linguistics >>> Faculty Director, CLMS >>> University of Washington >>> Twitter: @emilymbender >>> >> >> >> -- >> -Michael Wayne Goodman >> > -- > Emily M. Bender (she/her) > Howard and Frances Nostrand Endowed Professor > Department of Linguistics > Faculty Director, CLMS > University of Washington > Twitter: @emilymbender > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From goodman.m.w at gmail.com Fri Jan 17 07:39:26 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Fri, 17 Jan 2020 14:39:26 +0800 Subject: [developers] EDM implementations In-Reply-To: References: Message-ID: Thanks, Bec! I manually put in the :ltriples in the parse script and was able to produce some output that edm_eval.pl could read. Regarding the CARG being combined with the predicate name, that was what I guessed by looking at the Lisp code. Thanks for correcting my mistake. One more detail is what to do when the two sides (gold and test) have different numbers of items. Currently my code stops as soon as either a gold or test item is missing, which is what smatch (the similar metric made for AMR) does, but I think that may be wrong because parsing profiles are likely to have missing or extra (overgeneration) items in the middle. So the question is whether we ignore it or count it as a full mismatch. On Thu, Jan 16, 2020 at 6:33 PM Bec Dridan wrote: > Wow, that is some old code... From memory, export was a wrapper around > `parse --export`, where I could add :ltriples to the tsdb::*redwoods-export-values* > set. > > I don't know the mtool code at all, but re-reading the paper and looking > at the perl code, I don't think the original implementation evaluated CARG > at all. We only checked that the correct character span had a pred name > of`named`. > > I think you are right that the triple export at the time did not produce a > triple for TOP and it hence would not have been counted. > > That match your memory Stephan? > > Bec > > > On Thu, Jan 16, 2020 at 8:34 PM goodman.m.w at gmail.com < > goodman.m.w at gmail.com> wrote: > >> Hello developers, >> >> Recently I wanted to try out Elementary Dependency Match (EDM) but I did >> not find an easy way to do it. I saw lisp code in the LKB's repository and >> Bec's Perl code, but I'm not sure how to call the former from the command >> line and the latter seems outdated (I don't see the "export" command >> required by its instructions). >> >> The Dridan & Oepen, 2011 algorithm was simple enough so I though I'd >> implement it on top of PyDelphin. The result is here: >> https://github.com/delph-in/delphin.edm. It requires the latest version >> of PyDelphin (v1.2.0). It works with MRS, EDS, and DMRS, and it reads text >> files or [incr tsdb()] profiles. >> >> When I nearly had my version working I found that Stephan et al.'s mtool ( >> https://github.com/cfmrp The paper >> example >> /mtool ) also had an implementation of >> EDM, so I used that to compare with my outputs (as I couldn't get the >> previous implementations to work). In this process I think I found some >> differences from Dridan & Oepen, 2011's description, and this email is to >> confirm those findings. Namely, that mtool's (and now my) implementation do >> the following: >> >> * CARGs are treated as property triples ("class 3 information"). >> Previously they were combined with the predicate name. This change means >> that predicates like 'named' will match even if their CARGs don't and the >> CARGs are a separate thing that needs to be matched. >> >> * The identification of the graph's TOP counts as a triple. >> >> One difference between mtool and delphin.edm is that mtool does not count >> "variable" properties from EDS, but that's just because its EDS parser does >> not yet handle them while PyDelphin's does. >> >> Can anyone familiar with EDM confirm the above? Or can anyone explain how >> to call the Perl or LKB code so I can compare? 
>> >> -- >> -Michael Wayne Goodman >> > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From bec.dridan at gmail.com Fri Jan 17 11:14:28 2020 From: bec.dridan at gmail.com (Bec Dridan) Date: Fri, 17 Jan 2020 21:14:28 +1100 Subject: [developers] EDM implementations In-Reply-To: References: Message-ID: On Fri, Jan 17, 2020 at 5:39 PM goodman.m.w at gmail.com wrote: > > One more detail is what to do when the two sides (gold and test) have > different numbers of items. Currently my code stops as soon as either a > gold or test item is missing, which is what smatch (the similar metric made > for AMR) does, but I think that may be wrong because parsing profiles are > likely to have missing or extra (overgeneration) items in the middle. So > the question is whether we ignore it or count it as a full mismatch. > If you are asking what is 'correct', I guess that depends on why you are evaluating. The perl implementation wouldn't have noticed missing gold parses, because it used the gold set as the definition of the set. A missing test item, on the other hand, by default counts as a full mismatch, but there is a command line option to ignore any gold parse with no corresponding test parse. The ignore option is useful when the purpose of the evaluation is assessing the system you are working on (and you consider coverage separately). For comparing across systems, I imagine you probably want to count parse failure as a full mismatch. It was useful for me to have both options. Bec > > On Thu, Jan 16, 2020 at 6:33 PM Bec Dridan wrote: > >> Wow, that is some old code... From memory, export was a wrapper around >> `parse --export`, where I could add :ltriples to the tsdb::*redwoods-export-values* >> set. >> >> I don't know the mtool code at all, but re-reading the paper and looking >> at the perl code, I don't think the original implementation evaluated CARG >> at all. We only checked that the correct character span had a pred name >> of`named`. >> >> I think you are right that the triple export at the time did not produce >> a triple for TOP and it hence would not have been counted. >> >> That match your memory Stephan? >> >> Bec >> >> >> On Thu, Jan 16, 2020 at 8:34 PM goodman.m.w at gmail.com < >> goodman.m.w at gmail.com> wrote: >> >>> Hello developers, >>> >>> Recently I wanted to try out Elementary Dependency Match (EDM) but I did >>> not find an easy way to do it. I saw lisp code in the LKB's repository and >>> Bec's Perl code, but I'm not sure how to call the former from the command >>> line and the latter seems outdated (I don't see the "export" command >>> required by its instructions). >>> >>> The Dridan & Oepen, 2011 algorithm was simple enough so I though I'd >>> implement it on top of PyDelphin. The result is here: >>> https://github.com/delph-in/delphin.edm. It requires the latest version >>> of PyDelphin (v1.2.0). It works with MRS, EDS, and DMRS, and it reads text >>> files or [incr tsdb()] profiles. >>> >>> When I nearly had my version working I found that Stephan et al.'s mtool >>> (https://github.com/cfmrp The paper >>> example >>> /mtool ) also had an implementation of >>> EDM, so I used that to compare with my outputs (as I couldn't get the >>> previous implementations to work). In this process I think I found some >>> differences from Dridan & Oepen, 2011's description, and this email is to >>> confirm those findings. 
Namely, that mtool's (and now my) implementation do >>> the following: >>> >>> * CARGs are treated as property triples ("class 3 information"). >>> Previously they were combined with the predicate name. This change means >>> that predicates like 'named' will match even if their CARGs don't and the >>> CARGs are a separate thing that needs to be matched. >>> >>> * The identification of the graph's TOP counts as a triple. >>> >>> One difference between mtool and delphin.edm is that mtool does not >>> count "variable" properties from EDS, but that's just because its EDS >>> parser does not yet handle them while PyDelphin's does. >>> >>> Can anyone familiar with EDM confirm the above? Or can anyone explain >>> how to call the Perl or LKB code so I can compare? >>> >>> -- >>> -Michael Wayne Goodman >>> >> > > -- > -Michael Wayne Goodman > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebender at uw.edu Sat Jan 18 00:41:16 2020 From: ebender at uw.edu (Emily M. Bender) Date: Fri, 17 Jan 2020 15:41:16 -0800 Subject: [developers] Skipping non-parsed items in fftb In-Reply-To: References: Message-ID: Dear Mike, Alas, I'm hitting this error: (run_agg) ebender at patas:~$ delphin process -g ctn1_grammar_fixed/ace/ctn1.dat ctn_orig/ Traceback (most recent call last): File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in sys.exit(main()) File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", line 42, in main args.func(args) File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", line 135, in call_process gzip=args.gzip) File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", line 540, in process source = itsdb.TestSuite(source) File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/itsdb.py", line 644, in __init__ '*schema* argument is required for new test suites') delphin.itsdb.ITSDBError: *schema* argument is required for new test suites I'll poke around and see where the schema requirement is coming from (nothing in the bit on "process" in the documentation page mentions it), but thought I'd post here too in the meantime. Emily On Thu, Jan 16, 2020 at 6:46 PM goodman.m.w at gmail.com wrote: > Let me know how it goes. > > And a clarification: the --full option on `mkprof` doesn't hurt, but it's > unnecessary since you're re-parsing the created profile. > > Also here's the bug report for the other thing, if you're interested in > that use case: https://github.com/delph-in/pydelphin/issues/273 > > On Fri, Jan 17, 2020 at 10:37 AM Emily M. Bender wrote: > >> Thanks, Mike! I will give this a try. >> >> On Thu, Jan 16, 2020 at 6:33 PM goodman.m.w at gmail.com < >> goodman.m.w at gmail.com> wrote: >> >>> Hi Emily, >>> >>> For (2), here is how you could do it with PyDelphin: >>> >>> delphin process -g grm.dat original-profile/ >>> delphin mkprof --full --where 'readings > 0' --source >>> original-profile/ new-profile/ >>> delphin process -g grm.dat --full-forest new-profile/ >>> >>> Note that original-profile/ is first parsed in regular (non-forest) >>> mode, because in full-forest mode the number of readings is essentially >>> unknown until they are enumerated and thus the 'readings' field is always >>> 0. The second command not only prunes lines in the 'parse' file with >>> readings == 0, but also lines in the 'item' file which correspond to those >>> 'parse' lines. 
Once you have created new-profile/, you can parse again with >>> --full-forest for use with FFTB (and of course you don't have to use >>> PyDelphin for the parsing steps, if you prefer other means). >>> >>> Also note that this results in a profile with no edges for partial >>> parses. I think this is what you want. There should be a way to prune the >>> full-forest profile directly while keeping partial parses, but while >>> investigating this use case I found a bug, so I don't recommend it yet. >>> >>> Try `delphin mkprof --help` to see descriptions of these and other >>> options. They map fairly directly to the function documented here: >>> https://pydelphin.readthedocs.io/en/latest/api/delphin.commands.html >>> #mkprof >>> >>> >>> On Fri, Jan 17, 2020 at 8:44 AM Emily M. Bender wrote: >>> >>>> Dear all, >>>> >>>> We are doing some treebanking here at UW with fftb with grammars that >>>> have very low coverage over their associated test corpora. The current >>>> behavior of fftb with these profiles is to include all items for >>>> treebanking, but give a 404 for each one with no parse forest stored. This >>>> necessitates clicking the back button and tracking which one is next (since >>>> nothing changes color). In that light, two questions: >>>> >>>> (1) Is there some option we can pass fftb so that it just doesn't >>>> present items with no parses? >>>> (2) Failing that, is it fairly straightforward with pydelphin, [incr >>>> tsdb()] or something else to export a version of the profiles that only >>>> includes items which the grammar successfully parsed? >>>> >>>> Thanks, >>>> Emily >>>> >>>> >>>> -- >>>> Emily M. Bender (she/her) >>>> Howard and Frances Nostrand Endowed Professor >>>> Department of Linguistics >>>> Faculty Director, CLMS >>>> University of Washington >>>> Twitter: @emilymbender >>>> >>> >>> >>> -- >>> -Michael Wayne Goodman >>> >> -- >> Emily M. Bender (she/her) >> Howard and Frances Nostrand Endowed Professor >> Department of Linguistics >> Faculty Director, CLMS >> University of Washington >> Twitter: @emilymbender >> > > > -- > -Michael Wayne Goodman > -- Emily M. Bender (she/her) Howard and Frances Nostrand Endowed Professor Department of Linguistics Faculty Director, CLMS University of Washington Twitter: @emilymbender -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebender at uw.edu Sat Jan 18 00:51:48 2020 From: ebender at uw.edu (Emily M. Bender) Date: Fri, 17 Jan 2020 15:51:48 -0800 Subject: [developers] Skipping non-parsed items in fftb In-Reply-To: References: Message-ID: Apologies --- that error meant I hadn't given the right path to the testsuite. 
Correcting that, I now see: (run_agg) ebender at patas:/home2/kphowell/run_aggregation/output/emb_treebank$ delphin process -g ctn1_grammar_fixed/ace/ctn1.dat ctn_orig/ Traceback (most recent call last): File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in sys.exit(main()) File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", line 42, in main args.func(args) File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", line 135, in call_process gzip=args.gzip) File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", line 542, in process column, tablename, condition = _interpret_selection(select, source) File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", line 562, in _interpret_selection if len(queryobj['tables']) == 1: KeyError: 'tables' On Fri, Jan 17, 2020 at 3:41 PM Emily M. Bender wrote: > Dear Mike, > > Alas, I'm hitting this error: > > (run_agg) ebender at patas:~$ delphin process -g > ctn1_grammar_fixed/ace/ctn1.dat ctn_orig/ > Traceback (most recent call last): > File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in > sys.exit(main()) > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", > line 42, in main > args.func(args) > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", > line 135, in call_process > gzip=args.gzip) > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", > line 540, in process > source = itsdb.TestSuite(source) > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/itsdb.py", > line 644, in __init__ > '*schema* argument is required for new test suites') > delphin.itsdb.ITSDBError: *schema* argument is required for new test suites > > I'll poke around and see where the schema requirement is coming from > (nothing in the bit on "process" in the documentation page mentions it), > but thought I'd post here too in the meantime. > > Emily > > On Thu, Jan 16, 2020 at 6:46 PM goodman.m.w at gmail.com < > goodman.m.w at gmail.com> wrote: > >> Let me know how it goes. >> >> And a clarification: the --full option on `mkprof` doesn't hurt, but it's >> unnecessary since you're re-parsing the created profile. >> >> Also here's the bug report for the other thing, if you're interested in >> that use case: https://github.com/delph-in/pydelphin/issues/273 >> >> On Fri, Jan 17, 2020 at 10:37 AM Emily M. Bender wrote: >> >>> Thanks, Mike! I will give this a try. >>> >>> On Thu, Jan 16, 2020 at 6:33 PM goodman.m.w at gmail.com < >>> goodman.m.w at gmail.com> wrote: >>> >>>> Hi Emily, >>>> >>>> For (2), here is how you could do it with PyDelphin: >>>> >>>> delphin process -g grm.dat original-profile/ >>>> delphin mkprof --full --where 'readings > 0' --source >>>> original-profile/ new-profile/ >>>> delphin process -g grm.dat --full-forest new-profile/ >>>> >>>> Note that original-profile/ is first parsed in regular (non-forest) >>>> mode, because in full-forest mode the number of readings is essentially >>>> unknown until they are enumerated and thus the 'readings' field is always >>>> 0. The second command not only prunes lines in the 'parse' file with >>>> readings == 0, but also lines in the 'item' file which correspond to those >>>> 'parse' lines. 
Once you have created new-profile/, you can parse again with >>>> --full-forest for use with FFTB (and of course you don't have to use >>>> PyDelphin for the parsing steps, if you prefer other means). >>>> >>>> Also note that this results in a profile with no edges for partial >>>> parses. I think this is what you want. There should be a way to prune the >>>> full-forest profile directly while keeping partial parses, but while >>>> investigating this use case I found a bug, so I don't recommend it yet. >>>> >>>> Try `delphin mkprof --help` to see descriptions of these and other >>>> options. They map fairly directly to the function documented here: >>>> https://pydelphin.readthedocs.io/en/latest/api/delphin.commands.html >>>> #mkprof >>>> >>>> >>>> On Fri, Jan 17, 2020 at 8:44 AM Emily M. Bender wrote: >>>> >>>>> Dear all, >>>>> >>>>> We are doing some treebanking here at UW with fftb with grammars that >>>>> have very low coverage over their associated test corpora. The current >>>>> behavior of fftb with these profiles is to include all items for >>>>> treebanking, but give a 404 for each one with no parse forest stored. This >>>>> necessitates clicking the back button and tracking which one is next (since >>>>> nothing changes color). In that light, two questions: >>>>> >>>>> (1) Is there some option we can pass fftb so that it just doesn't >>>>> present items with no parses? >>>>> (2) Failing that, is it fairly straightforward with pydelphin, [incr >>>>> tsdb()] or something else to export a version of the profiles that only >>>>> includes items which the grammar successfully parsed? >>>>> >>>>> Thanks, >>>>> Emily >>>>> >>>>> >>>>> -- >>>>> Emily M. Bender (she/her) >>>>> Howard and Frances Nostrand Endowed Professor >>>>> Department of Linguistics >>>>> Faculty Director, CLMS >>>>> University of Washington >>>>> Twitter: @emilymbender >>>>> >>>> >>>> >>>> -- >>>> -Michael Wayne Goodman >>>> >>> -- >>> Emily M. Bender (she/her) >>> Howard and Frances Nostrand Endowed Professor >>> Department of Linguistics >>> Faculty Director, CLMS >>> University of Washington >>> Twitter: @emilymbender >>> >> >> >> -- >> -Michael Wayne Goodman >> > > > -- > Emily M. Bender (she/her) > Howard and Frances Nostrand Endowed Professor > Department of Linguistics > Faculty Director, CLMS > University of Washington > Twitter: @emilymbender > -- Emily M. Bender (she/her) Howard and Frances Nostrand Endowed Professor Department of Linguistics Faculty Director, CLMS University of Washington Twitter: @emilymbender -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodman.m.w at gmail.com Sat Jan 18 01:17:13 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Sat, 18 Jan 2020 08:17:13 +0800 Subject: [developers] Skipping non-parsed items in fftb In-Reply-To: References: Message-ID: Hi Emily, Yes those error messages are not very clear. But the second one looks like old code, as 'tables' is no longer a key in the object it's being looked up on. I suggest making sure that your run_agg environment has an updated version of PyDelphin. While the environment is active, try `pip install -U pydelphin` and make sure it has a 1.0 or newer version (`delphin --version`), then try again. On Sat, Jan 18, 2020 at 7:52 AM Emily M. Bender wrote: > Apologies --- that error meant I hadn't given the right path to the > testsuite. 
Correcting that, I now see: > > (run_agg) ebender at patas:/home2/kphowell/run_aggregation/output/emb_treebank$ > delphin process -g ctn1_grammar_fixed/ace/ctn1.dat ctn_orig/ > Traceback (most recent call last): > File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in > sys.exit(main()) > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", > line 42, in main > args.func(args) > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", > line 135, in call_process > gzip=args.gzip) > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", > line 542, in process > column, tablename, condition = _interpret_selection(select, source) > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", > line 562, in _interpret_selection > if len(queryobj['tables']) == 1: > KeyError: 'tables' > > On Fri, Jan 17, 2020 at 3:41 PM Emily M. Bender wrote: > >> Dear Mike, >> >> Alas, I'm hitting this error: >> >> (run_agg) ebender at patas:~$ delphin process -g >> ctn1_grammar_fixed/ace/ctn1.dat ctn_orig/ >> Traceback (most recent call last): >> File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in >> sys.exit(main()) >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >> line 42, in main >> args.func(args) >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >> line 135, in call_process >> gzip=args.gzip) >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", >> line 540, in process >> source = itsdb.TestSuite(source) >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/itsdb.py", >> line 644, in __init__ >> '*schema* argument is required for new test suites') >> delphin.itsdb.ITSDBError: *schema* argument is required for new test >> suites >> >> I'll poke around and see where the schema requirement is coming from >> (nothing in the bit on "process" in the documentation page mentions it), >> but thought I'd post here too in the meantime. >> >> Emily >> >> On Thu, Jan 16, 2020 at 6:46 PM goodman.m.w at gmail.com < >> goodman.m.w at gmail.com> wrote: >> >>> Let me know how it goes. >>> >>> And a clarification: the --full option on `mkprof` doesn't hurt, but >>> it's unnecessary since you're re-parsing the created profile. >>> >>> Also here's the bug report for the other thing, if you're interested in >>> that use case: https://github.com/delph-in/pydelphin/issues/273 >>> >>> On Fri, Jan 17, 2020 at 10:37 AM Emily M. Bender wrote: >>> >>>> Thanks, Mike! I will give this a try. >>>> >>>> On Thu, Jan 16, 2020 at 6:33 PM goodman.m.w at gmail.com < >>>> goodman.m.w at gmail.com> wrote: >>>> >>>>> Hi Emily, >>>>> >>>>> For (2), here is how you could do it with PyDelphin: >>>>> >>>>> delphin process -g grm.dat original-profile/ >>>>> delphin mkprof --full --where 'readings > 0' --source >>>>> original-profile/ new-profile/ >>>>> delphin process -g grm.dat --full-forest new-profile/ >>>>> >>>>> Note that original-profile/ is first parsed in regular (non-forest) >>>>> mode, because in full-forest mode the number of readings is essentially >>>>> unknown until they are enumerated and thus the 'readings' field is always >>>>> 0. The second command not only prunes lines in the 'parse' file with >>>>> readings == 0, but also lines in the 'item' file which correspond to those >>>>> 'parse' lines. 
Once you have created new-profile/, you can parse again with >>>>> --full-forest for use with FFTB (and of course you don't have to use >>>>> PyDelphin for the parsing steps, if you prefer other means). >>>>> >>>>> Also note that this results in a profile with no edges for partial >>>>> parses. I think this is what you want. There should be a way to prune the >>>>> full-forest profile directly while keeping partial parses, but while >>>>> investigating this use case I found a bug, so I don't recommend it yet. >>>>> >>>>> Try `delphin mkprof --help` to see descriptions of these and other >>>>> options. They map fairly directly to the function documented here: >>>>> https://pydelphin.readthedocs.io/en/latest/api/delphin.commands.html >>>>> #mkprof >>>>> >>>>> >>>>> On Fri, Jan 17, 2020 at 8:44 AM Emily M. Bender >>>>> wrote: >>>>> >>>>>> Dear all, >>>>>> >>>>>> We are doing some treebanking here at UW with fftb with grammars that >>>>>> have very low coverage over their associated test corpora. The current >>>>>> behavior of fftb with these profiles is to include all items for >>>>>> treebanking, but give a 404 for each one with no parse forest stored. This >>>>>> necessitates clicking the back button and tracking which one is next (since >>>>>> nothing changes color). In that light, two questions: >>>>>> >>>>>> (1) Is there some option we can pass fftb so that it just doesn't >>>>>> present items with no parses? >>>>>> (2) Failing that, is it fairly straightforward with pydelphin, [incr >>>>>> tsdb()] or something else to export a version of the profiles that only >>>>>> includes items which the grammar successfully parsed? >>>>>> >>>>>> Thanks, >>>>>> Emily >>>>>> >>>>>> >>>>>> -- >>>>>> Emily M. Bender (she/her) >>>>>> Howard and Frances Nostrand Endowed Professor >>>>>> Department of Linguistics >>>>>> Faculty Director, CLMS >>>>>> University of Washington >>>>>> Twitter: @emilymbender >>>>>> >>>>> >>>>> >>>>> -- >>>>> -Michael Wayne Goodman >>>>> >>>> -- >>>> Emily M. Bender (she/her) >>>> Howard and Frances Nostrand Endowed Professor >>>> Department of Linguistics >>>> Faculty Director, CLMS >>>> University of Washington >>>> Twitter: @emilymbender >>>> >>> >>> >>> -- >>> -Michael Wayne Goodman >>> >> >> >> -- >> Emily M. Bender (she/her) >> Howard and Frances Nostrand Endowed Professor >> Department of Linguistics >> Faculty Director, CLMS >> University of Washington >> Twitter: @emilymbender >> > > > -- > Emily M. Bender (she/her) > Howard and Frances Nostrand Endowed Professor > Department of Linguistics > Faculty Director, CLMS > University of Washington > Twitter: @emilymbender > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodman.m.w at gmail.com Mon Jan 20 02:14:53 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Mon, 20 Jan 2020 09:14:53 +0800 Subject: [developers] EDM implementations In-Reply-To: References: Message-ID: Thanks again, Bec. I just want to make sure my implementation gets the same scores for the same inputs under the same assumptions as the original implementation. For this to work, its behavior concerning the points I've sought clarification for should be intentional. In light of your responses, I've separated the CARG triples from other properties and have given it its own weight. Thus I should be able to get the same scores as your code by setting the weights of CARGs (but not properties) and graph-tops to zero. 
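For concreteness, here is a rough sketch of how such weighted scoring could look; the triple classes, function name, and weights below are only illustrative assumptions, not delphin.edm's actual interface:

    from collections import Counter

    def edm_score(gold, test, weights):
        # `gold` and `test` map a triple class ('names', 'arguments',
        # 'properties', 'carg', 'top') to a list of triples; `weights` maps
        # the same class names to numbers, with 0 disabling a class entirely.
        matched = total_gold = total_test = 0.0
        for cls, w in weights.items():
            if not w:
                continue
            g, t = Counter(gold.get(cls, ())), Counter(test.get(cls, ()))
            matched += w * sum((g & t).values())   # multiset intersection
            total_gold += w * sum(g.values())
            total_test += w * sum(t.values())
        p = matched / total_test if total_test else 0.0
        r = matched / total_gold if total_gold else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f

    # Zeroing carg and top should approximate the original Perl behaviour:
    # p, r, f = edm_score(gold_triples, test_triples,
    #                     {'names': 1, 'arguments': 1, 'properties': 1,
    #                      'carg': 0, 'top': 0})
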
Similarly, I'll add an option to ignore missing test items and otherwise treat them as mismatches. On Fri, Jan 17, 2020 at 6:14 PM Bec Dridan wrote: > > > On Fri, Jan 17, 2020 at 5:39 PM goodman.m.w at gmail.com < > goodman.m.w at gmail.com> wrote: > >> >> One more detail is what to do when the two sides (gold and test) have >> different numbers of items. Currently my code stops as soon as either a >> gold or test item is missing, which is what smatch (the similar metric made >> for AMR) does, but I think that may be wrong because parsing profiles are >> likely to have missing or extra (overgeneration) items in the middle. So >> the question is whether we ignore it or count it as a full mismatch. >> > > If you are asking what is 'correct', I guess that depends on why you are > evaluating. The perl implementation wouldn't have noticed missing gold > parses, because it used the gold set as the definition of the set. A > missing test item, on the other hand, by default counts as a full mismatch, > but there is a command line option to ignore any gold parse with no > corresponding test parse. The ignore option is useful when the purpose of > the evaluation is assessing the system you are working on (and you consider > coverage separately). For comparing across systems, I imagine you probably > want to count parse failure as a full mismatch. It was useful for me to > have both options. > > Bec > > >> >> On Thu, Jan 16, 2020 at 6:33 PM Bec Dridan wrote: >> >>> Wow, that is some old code... From memory, export was a wrapper around >>> `parse --export`, where I could add :ltriples to the tsdb::*redwoods-export-values* >>> set. >>> >>> I don't know the mtool code at all, but re-reading the paper and looking >>> at the perl code, I don't think the original implementation evaluated CARG >>> at all. We only checked that the correct character span had a pred name >>> of`named`. >>> >>> I think you are right that the triple export at the time did not produce >>> a triple for TOP and it hence would not have been counted. >>> >>> That match your memory Stephan? >>> >>> Bec >>> >>> >>> On Thu, Jan 16, 2020 at 8:34 PM goodman.m.w at gmail.com < >>> goodman.m.w at gmail.com> wrote: >>> >>>> Hello developers, >>>> >>>> Recently I wanted to try out Elementary Dependency Match (EDM) but I >>>> did not find an easy way to do it. I saw lisp code in the LKB's repository >>>> and Bec's Perl code, but I'm not sure how to call the former from the >>>> command line and the latter seems outdated (I don't see the "export" >>>> command required by its instructions). >>>> >>>> The Dridan & Oepen, 2011 algorithm was simple enough so I though I'd >>>> implement it on top of PyDelphin. The result is here: >>>> https://github.com/delph-in/delphin.edm. It requires the latest >>>> version of PyDelphin (v1.2.0). It works with MRS, EDS, and DMRS, and it >>>> reads text files or [incr tsdb()] profiles. >>>> >>>> When I nearly had my version working I found that Stephan et al.'s >>>> mtool (https://github.com/cfmrp The >>>> paper example >>>> /mtool ) also had an implementation of >>>> EDM, so I used that to compare with my outputs (as I couldn't get the >>>> previous implementations to work). In this process I think I found some >>>> differences from Dridan & Oepen, 2011's description, and this email is to >>>> confirm those findings. Namely, that mtool's (and now my) implementation do >>>> the following: >>>> >>>> * CARGs are treated as property triples ("class 3 information"). 
>>>> Previously they were combined with the predicate name. This change means >>>> that predicates like 'named' will match even if their CARGs don't and the >>>> CARGs are a separate thing that needs to be matched. >>>> >>>> * The identification of the graph's TOP counts as a triple. >>>> >>>> One difference between mtool and delphin.edm is that mtool does not >>>> count "variable" properties from EDS, but that's just because its EDS >>>> parser does not yet handle them while PyDelphin's does. >>>> >>>> Can anyone familiar with EDM confirm the above? Or can anyone explain >>>> how to call the Perl or LKB code so I can compare? >>>> >>>> -- >>>> -Michael Wayne Goodman >>>> >>> >> >> -- >> -Michael Wayne Goodman >> > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebender at uw.edu Wed Jan 22 02:19:18 2020 From: ebender at uw.edu (Emily M. Bender) Date: Tue, 21 Jan 2020 17:19:18 -0800 Subject: [developers] Skipping non-parsed items in fftb In-Reply-To: References: Message-ID: Updating PyDelphin caused the error to change, at least: Traceback (most recent call last): File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in sys.exit(main()) File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", line 40, in main args.func(args) File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/cli/process.py", line 46, in call_process gzip=args.gzip) File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", line 602, in process with processor(grammar, cmdargs=options, **kwargs) as cpu: File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/ace.py", line 110, in __init__ self._open() File "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/ace.py", line 143, in _open raise ACEProcessError('Process closed on startup; see .') delphin.ace.ACEProcessError: Process closed on startup; see . ... I'm a little puzzled as to how I got about "seeing " as I didn't myself do anything to redirect it somewhere else, but it's not printing to the console... Emily On Fri, Jan 17, 2020 at 4:17 PM goodman.m.w at gmail.com wrote: > Hi Emily, > > Yes those error messages are not very clear. But the second one looks like > old code, as 'tables' is no longer a key in the object it's being looked up > on. I suggest making sure that your run_agg environment has an updated > version of PyDelphin. While the environment is active, try `pip install -U > pydelphin` and make sure it has a 1.0 or newer version (`delphin > --version`), then try again. > > On Sat, Jan 18, 2020 at 7:52 AM Emily M. Bender wrote: > >> Apologies --- that error meant I hadn't given the right path to the >> testsuite. 
Correcting that, I now see: >> >> (run_agg) ebender at patas:/home2/kphowell/run_aggregation/output/emb_treebank$ >> delphin process -g ctn1_grammar_fixed/ace/ctn1.dat ctn_orig/ >> Traceback (most recent call last): >> File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in >> sys.exit(main()) >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >> line 42, in main >> args.func(args) >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >> line 135, in call_process >> gzip=args.gzip) >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", >> line 542, in process >> column, tablename, condition = _interpret_selection(select, source) >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", >> line 562, in _interpret_selection >> if len(queryobj['tables']) == 1: >> KeyError: 'tables' >> >> On Fri, Jan 17, 2020 at 3:41 PM Emily M. Bender wrote: >> >>> Dear Mike, >>> >>> Alas, I'm hitting this error: >>> >>> (run_agg) ebender at patas:~$ delphin process -g >>> ctn1_grammar_fixed/ace/ctn1.dat ctn_orig/ >>> Traceback (most recent call last): >>> File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in >>> sys.exit(main()) >>> File >>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >>> line 42, in main >>> args.func(args) >>> File >>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >>> line 135, in call_process >>> gzip=args.gzip) >>> File >>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", >>> line 540, in process >>> source = itsdb.TestSuite(source) >>> File >>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/itsdb.py", >>> line 644, in __init__ >>> '*schema* argument is required for new test suites') >>> delphin.itsdb.ITSDBError: *schema* argument is required for new test >>> suites >>> >>> I'll poke around and see where the schema requirement is coming from >>> (nothing in the bit on "process" in the documentation page mentions it), >>> but thought I'd post here too in the meantime. >>> >>> Emily >>> >>> On Thu, Jan 16, 2020 at 6:46 PM goodman.m.w at gmail.com < >>> goodman.m.w at gmail.com> wrote: >>> >>>> Let me know how it goes. >>>> >>>> And a clarification: the --full option on `mkprof` doesn't hurt, but >>>> it's unnecessary since you're re-parsing the created profile. >>>> >>>> Also here's the bug report for the other thing, if you're interested in >>>> that use case: https://github.com/delph-in/pydelphin/issues/273 >>>> >>>> On Fri, Jan 17, 2020 at 10:37 AM Emily M. Bender >>>> wrote: >>>> >>>>> Thanks, Mike! I will give this a try. >>>>> >>>>> On Thu, Jan 16, 2020 at 6:33 PM goodman.m.w at gmail.com < >>>>> goodman.m.w at gmail.com> wrote: >>>>> >>>>>> Hi Emily, >>>>>> >>>>>> For (2), here is how you could do it with PyDelphin: >>>>>> >>>>>> delphin process -g grm.dat original-profile/ >>>>>> delphin mkprof --full --where 'readings > 0' --source >>>>>> original-profile/ new-profile/ >>>>>> delphin process -g grm.dat --full-forest new-profile/ >>>>>> >>>>>> Note that original-profile/ is first parsed in regular (non-forest) >>>>>> mode, because in full-forest mode the number of readings is essentially >>>>>> unknown until they are enumerated and thus the 'readings' field is always >>>>>> 0. 
The second command not only prunes lines in the 'parse' file with >>>>>> readings == 0, but also lines in the 'item' file which correspond to those >>>>>> 'parse' lines. Once you have created new-profile/, you can parse again with >>>>>> --full-forest for use with FFTB (and of course you don't have to use >>>>>> PyDelphin for the parsing steps, if you prefer other means). >>>>>> >>>>>> Also note that this results in a profile with no edges for partial >>>>>> parses. I think this is what you want. There should be a way to prune the >>>>>> full-forest profile directly while keeping partial parses, but while >>>>>> investigating this use case I found a bug, so I don't recommend it yet. >>>>>> >>>>>> Try `delphin mkprof --help` to see descriptions of these and other >>>>>> options. They map fairly directly to the function documented here: >>>>>> https://pydelphin.readthedocs.io/en/latest/api/delphin.commands.html >>>>>> #mkprof >>>>>> >>>>>> >>>>>> On Fri, Jan 17, 2020 at 8:44 AM Emily M. Bender >>>>>> wrote: >>>>>> >>>>>>> Dear all, >>>>>>> >>>>>>> We are doing some treebanking here at UW with fftb with grammars >>>>>>> that have very low coverage over their associated test corpora. The current >>>>>>> behavior of fftb with these profiles is to include all items for >>>>>>> treebanking, but give a 404 for each one with no parse forest stored. This >>>>>>> necessitates clicking the back button and tracking which one is next (since >>>>>>> nothing changes color). In that light, two questions: >>>>>>> >>>>>>> (1) Is there some option we can pass fftb so that it just doesn't >>>>>>> present items with no parses? >>>>>>> (2) Failing that, is it fairly straightforward with pydelphin, [incr >>>>>>> tsdb()] or something else to export a version of the profiles that only >>>>>>> includes items which the grammar successfully parsed? >>>>>>> >>>>>>> Thanks, >>>>>>> Emily >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Emily M. Bender (she/her) >>>>>>> Howard and Frances Nostrand Endowed Professor >>>>>>> Department of Linguistics >>>>>>> Faculty Director, CLMS >>>>>>> University of Washington >>>>>>> Twitter: @emilymbender >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> -Michael Wayne Goodman >>>>>> >>>>> -- >>>>> Emily M. Bender (she/her) >>>>> Howard and Frances Nostrand Endowed Professor >>>>> Department of Linguistics >>>>> Faculty Director, CLMS >>>>> University of Washington >>>>> Twitter: @emilymbender >>>>> >>>> >>>> >>>> -- >>>> -Michael Wayne Goodman >>>> >>> >>> >>> -- >>> Emily M. Bender (she/her) >>> Howard and Frances Nostrand Endowed Professor >>> Department of Linguistics >>> Faculty Director, CLMS >>> University of Washington >>> Twitter: @emilymbender >>> >> >> >> -- >> Emily M. Bender (she/her) >> Howard and Frances Nostrand Endowed Professor >> Department of Linguistics >> Faculty Director, CLMS >> University of Washington >> Twitter: @emilymbender >> > > > -- > -Michael Wayne Goodman > -- Emily M. Bender (she/her) Howard and Frances Nostrand Endowed Professor Department of Linguistics Faculty Director, CLMS University of Washington Twitter: @emilymbender -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodman.m.w at gmail.com Wed Jan 22 02:36:07 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Wed, 22 Jan 2020 09:36:07 +0800 Subject: [developers] Skipping non-parsed items in fftb In-Reply-To: References: Message-ID: Ok, that's progress. 
This error shows up when ACE exited abnormally (recall that PyDelphin calls ACE in a subprocess in order to process profiles). Since I don't capture ACE's stderr, it should be printed in the terminal just above the stacktrace. The PyDelphin error is directing you to look for that message to fix the problem. Most likely, the path to the grammar image is incorrect or the grammar image was compiled with a different version of ACE. The stacktrace and error message are both printed to stderr so if you see one you should see both (unless ACE exited abnormally without printing anything). By the way, I've recently pushed some commits to suppress the stacktrace when encountering anticipated errors from the `delphin` command, as I don't think the stacktrace is useful except for me (it can be shown again when called in DEBUG mode). In addition I tried to provide more useful messages for common situations. These changes will be part of the next release. On Wed, Jan 22, 2020 at 9:19 AM Emily M. Bender wrote: > Updating PyDelphin caused the error to change, at least: > > Traceback (most recent call last): > File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in > sys.exit(main()) > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", > line 40, in main > args.func(args) > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/cli/process.py", > line 46, in call_process > gzip=args.gzip) > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", > line 602, in process > with processor(grammar, cmdargs=options, **kwargs) as cpu: > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/ace.py", > line 110, in __init__ > self._open() > File > "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/ace.py", > line 143, in _open > raise ACEProcessError('Process closed on startup; see .') > delphin.ace.ACEProcessError: Process closed on startup; see . > > > ... I'm a little puzzled as to how I got about "seeing " as I > didn't myself do anything to redirect it somewhere else, but it's not > printing to the console... > > Emily > > On Fri, Jan 17, 2020 at 4:17 PM goodman.m.w at gmail.com < > goodman.m.w at gmail.com> wrote: > >> Hi Emily, >> >> Yes those error messages are not very clear. But the second one looks >> like old code, as 'tables' is no longer a key in the object it's being >> looked up on. I suggest making sure that your run_agg environment has an >> updated version of PyDelphin. While the environment is active, try `pip >> install -U pydelphin` and make sure it has a 1.0 or newer version (`delphin >> --version`), then try again. >> >> On Sat, Jan 18, 2020 at 7:52 AM Emily M. Bender wrote: >> >>> Apologies --- that error meant I hadn't given the right path to the >>> testsuite. 
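As a minimal sketch of that failure mode, assuming PyDelphin 1.x's delphin.ace interface (ACEParser and ACEProcessError, the names that appear in the traceback below) and an `ace` binary on PATH; the grammar path and the test sentence are placeholders from this thread, not a verified configuration:

    from delphin import ace

    GRAMMAR = "ctn1_grammar_fixed/ace/ctn1.dat"   # placeholder path from this thread

    try:
        # PyDelphin starts ACE as a subprocess; a bad path or an image compiled
        # with a different ACE version makes ACE exit before the handshake.
        with ace.ACEParser(GRAMMAR, cmdargs=["-1"]) as parser:
            response = parser.interact("this is just a placeholder sentence")
            print(len(response.results()), "result(s)")
    except ace.ACEProcessError as exc:
        # ACE's own message goes to stderr, just above the stacktrace, so check
        # the .dat path, its permissions, and the ACE version it was built with.
        print("ACE failed to start:", exc)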
Correcting that, I now see: >>> >>> (run_agg) ebender at patas:/home2/kphowell/run_aggregation/output/emb_treebank$ >>> delphin process -g ctn1_grammar_fixed/ace/ctn1.dat ctn_orig/ >>> Traceback (most recent call last): >>> File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in >>> sys.exit(main()) >>> File >>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >>> line 42, in main >>> args.func(args) >>> File >>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >>> line 135, in call_process >>> gzip=args.gzip) >>> File >>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", >>> line 542, in process >>> column, tablename, condition = _interpret_selection(select, source) >>> File >>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", >>> line 562, in _interpret_selection >>> if len(queryobj['tables']) == 1: >>> KeyError: 'tables' >>> >>> On Fri, Jan 17, 2020 at 3:41 PM Emily M. Bender wrote: >>> >>>> Dear Mike, >>>> >>>> Alas, I'm hitting this error: >>>> >>>> (run_agg) ebender at patas:~$ delphin process -g >>>> ctn1_grammar_fixed/ace/ctn1.dat ctn_orig/ >>>> Traceback (most recent call last): >>>> File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in >>>> sys.exit(main()) >>>> File >>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >>>> line 42, in main >>>> args.func(args) >>>> File >>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >>>> line 135, in call_process >>>> gzip=args.gzip) >>>> File >>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", >>>> line 540, in process >>>> source = itsdb.TestSuite(source) >>>> File >>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/itsdb.py", >>>> line 644, in __init__ >>>> '*schema* argument is required for new test suites') >>>> delphin.itsdb.ITSDBError: *schema* argument is required for new test >>>> suites >>>> >>>> I'll poke around and see where the schema requirement is coming from >>>> (nothing in the bit on "process" in the documentation page mentions it), >>>> but thought I'd post here too in the meantime. >>>> >>>> Emily >>>> >>>> On Thu, Jan 16, 2020 at 6:46 PM goodman.m.w at gmail.com < >>>> goodman.m.w at gmail.com> wrote: >>>> >>>>> Let me know how it goes. >>>>> >>>>> And a clarification: the --full option on `mkprof` doesn't hurt, but >>>>> it's unnecessary since you're re-parsing the created profile. >>>>> >>>>> Also here's the bug report for the other thing, if you're interested >>>>> in that use case: https://github.com/delph-in/pydelphin/issues/273 >>>>> >>>>> On Fri, Jan 17, 2020 at 10:37 AM Emily M. Bender >>>>> wrote: >>>>> >>>>>> Thanks, Mike! I will give this a try. >>>>>> >>>>>> On Thu, Jan 16, 2020 at 6:33 PM goodman.m.w at gmail.com < >>>>>> goodman.m.w at gmail.com> wrote: >>>>>> >>>>>>> Hi Emily, >>>>>>> >>>>>>> For (2), here is how you could do it with PyDelphin: >>>>>>> >>>>>>> delphin process -g grm.dat original-profile/ >>>>>>> delphin mkprof --full --where 'readings > 0' --source >>>>>>> original-profile/ new-profile/ >>>>>>> delphin process -g grm.dat --full-forest new-profile/ >>>>>>> >>>>>>> Note that original-profile/ is first parsed in regular (non-forest) >>>>>>> mode, because in full-forest mode the number of readings is essentially >>>>>>> unknown until they are enumerated and thus the 'readings' field is always >>>>>>> 0. 
The second command not only prunes lines in the 'parse' file with >>>>>>> readings == 0, but also lines in the 'item' file which correspond to those >>>>>>> 'parse' lines. Once you have created new-profile/, you can parse again with >>>>>>> --full-forest for use with FFTB (and of course you don't have to use >>>>>>> PyDelphin for the parsing steps, if you prefer other means). >>>>>>> >>>>>>> Also note that this results in a profile with no edges for partial >>>>>>> parses. I think this is what you want. There should be a way to prune the >>>>>>> full-forest profile directly while keeping partial parses, but while >>>>>>> investigating this use case I found a bug, so I don't recommend it yet. >>>>>>> >>>>>>> Try `delphin mkprof --help` to see descriptions of these and other >>>>>>> options. They map fairly directly to the function documented here: >>>>>>> https://pydelphin.readthedocs.io/en/latest/api/delphin.commands.html >>>>>>> #mkprof >>>>>>> >>>>>>> >>>>>>> On Fri, Jan 17, 2020 at 8:44 AM Emily M. Bender >>>>>>> wrote: >>>>>>> >>>>>>>> Dear all, >>>>>>>> >>>>>>>> We are doing some treebanking here at UW with fftb with grammars >>>>>>>> that have very low coverage over their associated test corpora. The current >>>>>>>> behavior of fftb with these profiles is to include all items for >>>>>>>> treebanking, but give a 404 for each one with no parse forest stored. This >>>>>>>> necessitates clicking the back button and tracking which one is next (since >>>>>>>> nothing changes color). In that light, two questions: >>>>>>>> >>>>>>>> (1) Is there some option we can pass fftb so that it just doesn't >>>>>>>> present items with no parses? >>>>>>>> (2) Failing that, is it fairly straightforward with pydelphin, >>>>>>>> [incr tsdb()] or something else to export a version of the profiles that >>>>>>>> only includes items which the grammar successfully parsed? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Emily >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Emily M. Bender (she/her) >>>>>>>> Howard and Frances Nostrand Endowed Professor >>>>>>>> Department of Linguistics >>>>>>>> Faculty Director, CLMS >>>>>>>> University of Washington >>>>>>>> Twitter: @emilymbender >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> -Michael Wayne Goodman >>>>>>> >>>>>> -- >>>>>> Emily M. Bender (she/her) >>>>>> Howard and Frances Nostrand Endowed Professor >>>>>> Department of Linguistics >>>>>> Faculty Director, CLMS >>>>>> University of Washington >>>>>> Twitter: @emilymbender >>>>>> >>>>> >>>>> >>>>> -- >>>>> -Michael Wayne Goodman >>>>> >>>> >>>> >>>> -- >>>> Emily M. Bender (she/her) >>>> Howard and Frances Nostrand Endowed Professor >>>> Department of Linguistics >>>> Faculty Director, CLMS >>>> University of Washington >>>> Twitter: @emilymbender >>>> >>> >>> >>> -- >>> Emily M. Bender (she/her) >>> Howard and Frances Nostrand Endowed Professor >>> Department of Linguistics >>> Faculty Director, CLMS >>> University of Washington >>> Twitter: @emilymbender >>> >> >> >> -- >> -Michael Wayne Goodman >> > > > -- > Emily M. Bender (she/her) > Howard and Frances Nostrand Endowed Professor > Department of Linguistics > Faculty Director, CLMS > University of Washington > Twitter: @emilymbender > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebender at uw.edu Wed Jan 22 02:39:57 2020 From: ebender at uw.edu (Emily M. 
Bender) Date: Tue, 21 Jan 2020 17:39:57 -0800 Subject: [developers] Skipping non-parsed items in fftb In-Reply-To: References: Message-ID: Nice -- got it working! (In fact, it was a permissions error on the .dat file.) Yes, I think that the trace isn't that helpful, especially given that it functionally hid the actionable bit of information from me. Thanks for your help! Emily On Tue, Jan 21, 2020 at 5:36 PM goodman.m.w at gmail.com wrote: > Ok, that's progress. This error shows up when ACE exited abnormally > (recall that PyDelphin calls ACE in a subprocess in order to process > profiles). Since I don't capture ACE's stderr, it should be printed in the > terminal just above the stacktrace. The PyDelphin error is directing you to > look for that message to fix the problem. Most likely, the path to the > grammar image is incorrect or the grammar image was compiled with a > different version of ACE. The stacktrace and error message are both printed > to stderr so if you see one you should see both (unless ACE exited > abnormally without printing anything). > > By the way, I've recently pushed some commits to suppress the stacktrace > when encountering anticipated errors from the `delphin` command, as I don't > think the stacktrace is useful except for me (it can be shown again when > called in DEBUG mode). In addition I tried to provide more useful messages > for common situations. These changes will be part of the next release. > > On Wed, Jan 22, 2020 at 9:19 AM Emily M. Bender wrote: > >> Updating PyDelphin caused the error to change, at least: >> >> Traceback (most recent call last): >> File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in >> sys.exit(main()) >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >> line 40, in main >> args.func(args) >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/cli/process.py", >> line 46, in call_process >> gzip=args.gzip) >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", >> line 602, in process >> with processor(grammar, cmdargs=options, **kwargs) as cpu: >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/ace.py", >> line 110, in __init__ >> self._open() >> File >> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/ace.py", >> line 143, in _open >> raise ACEProcessError('Process closed on startup; see .') >> delphin.ace.ACEProcessError: Process closed on startup; see . >> >> >> ... I'm a little puzzled as to how I got about "seeing " as I >> didn't myself do anything to redirect it somewhere else, but it's not >> printing to the console... >> >> Emily >> >> On Fri, Jan 17, 2020 at 4:17 PM goodman.m.w at gmail.com < >> goodman.m.w at gmail.com> wrote: >> >>> Hi Emily, >>> >>> Yes those error messages are not very clear. But the second one looks >>> like old code, as 'tables' is no longer a key in the object it's being >>> looked up on. I suggest making sure that your run_agg environment has an >>> updated version of PyDelphin. While the environment is active, try `pip >>> install -U pydelphin` and make sure it has a 1.0 or newer version (`delphin >>> --version`), then try again. >>> >>> On Sat, Jan 18, 2020 at 7:52 AM Emily M. Bender wrote: >>> >>>> Apologies --- that error meant I hadn't given the right path to the >>>> testsuite. 
Correcting that, I now see: >>>> >>>> (run_agg) ebender at patas:/home2/kphowell/run_aggregation/output/emb_treebank$ >>>> delphin process -g ctn1_grammar_fixed/ace/ctn1.dat ctn_orig/ >>>> Traceback (most recent call last): >>>> File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in >>>> sys.exit(main()) >>>> File >>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >>>> line 42, in main >>>> args.func(args) >>>> File >>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >>>> line 135, in call_process >>>> gzip=args.gzip) >>>> File >>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", >>>> line 542, in process >>>> column, tablename, condition = _interpret_selection(select, source) >>>> File >>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", >>>> line 562, in _interpret_selection >>>> if len(queryobj['tables']) == 1: >>>> KeyError: 'tables' >>>> >>>> On Fri, Jan 17, 2020 at 3:41 PM Emily M. Bender wrote: >>>> >>>>> Dear Mike, >>>>> >>>>> Alas, I'm hitting this error: >>>>> >>>>> (run_agg) ebender at patas:~$ delphin process -g >>>>> ctn1_grammar_fixed/ace/ctn1.dat ctn_orig/ >>>>> Traceback (most recent call last): >>>>> File "/home2/kphowell/Envs/run_agg/bin/delphin", line 11, in >>>>> sys.exit(main()) >>>>> File >>>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >>>>> line 42, in main >>>>> args.func(args) >>>>> File >>>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/main.py", >>>>> line 135, in call_process >>>>> gzip=args.gzip) >>>>> File >>>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/commands.py", >>>>> line 540, in process >>>>> source = itsdb.TestSuite(source) >>>>> File >>>>> "/home2/kphowell/Envs/run_agg/lib/python3.6/site-packages/delphin/itsdb.py", >>>>> line 644, in __init__ >>>>> '*schema* argument is required for new test suites') >>>>> delphin.itsdb.ITSDBError: *schema* argument is required for new test >>>>> suites >>>>> >>>>> I'll poke around and see where the schema requirement is coming from >>>>> (nothing in the bit on "process" in the documentation page mentions it), >>>>> but thought I'd post here too in the meantime. >>>>> >>>>> Emily >>>>> >>>>> On Thu, Jan 16, 2020 at 6:46 PM goodman.m.w at gmail.com < >>>>> goodman.m.w at gmail.com> wrote: >>>>> >>>>>> Let me know how it goes. >>>>>> >>>>>> And a clarification: the --full option on `mkprof` doesn't hurt, but >>>>>> it's unnecessary since you're re-parsing the created profile. >>>>>> >>>>>> Also here's the bug report for the other thing, if you're interested >>>>>> in that use case: https://github.com/delph-in/pydelphin/issues/273 >>>>>> >>>>>> On Fri, Jan 17, 2020 at 10:37 AM Emily M. Bender >>>>>> wrote: >>>>>> >>>>>>> Thanks, Mike! I will give this a try. 
>>>>>>> >>>>>>> On Thu, Jan 16, 2020 at 6:33 PM goodman.m.w at gmail.com < >>>>>>> goodman.m.w at gmail.com> wrote: >>>>>>> >>>>>>>> Hi Emily, >>>>>>>> >>>>>>>> For (2), here is how you could do it with PyDelphin: >>>>>>>> >>>>>>>> delphin process -g grm.dat original-profile/ >>>>>>>> delphin mkprof --full --where 'readings > 0' --source >>>>>>>> original-profile/ new-profile/ >>>>>>>> delphin process -g grm.dat --full-forest new-profile/ >>>>>>>> >>>>>>>> Note that original-profile/ is first parsed in regular (non-forest) >>>>>>>> mode, because in full-forest mode the number of readings is essentially >>>>>>>> unknown until they are enumerated and thus the 'readings' field is always >>>>>>>> 0. The second command not only prunes lines in the 'parse' file with >>>>>>>> readings == 0, but also lines in the 'item' file which correspond to those >>>>>>>> 'parse' lines. Once you have created new-profile/, you can parse again with >>>>>>>> --full-forest for use with FFTB (and of course you don't have to use >>>>>>>> PyDelphin for the parsing steps, if you prefer other means). >>>>>>>> >>>>>>>> Also note that this results in a profile with no edges for partial >>>>>>>> parses. I think this is what you want. There should be a way to prune the >>>>>>>> full-forest profile directly while keeping partial parses, but while >>>>>>>> investigating this use case I found a bug, so I don't recommend it yet. >>>>>>>> >>>>>>>> Try `delphin mkprof --help` to see descriptions of these and other >>>>>>>> options. They map fairly directly to the function documented here: >>>>>>>> https://pydelphin.readthedocs.io/en/latest/api/delphin.commands.html >>>>>>>> #mkprof >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Jan 17, 2020 at 8:44 AM Emily M. Bender >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Dear all, >>>>>>>>> >>>>>>>>> We are doing some treebanking here at UW with fftb with grammars >>>>>>>>> that have very low coverage over their associated test corpora. The current >>>>>>>>> behavior of fftb with these profiles is to include all items for >>>>>>>>> treebanking, but give a 404 for each one with no parse forest stored. This >>>>>>>>> necessitates clicking the back button and tracking which one is next (since >>>>>>>>> nothing changes color). In that light, two questions: >>>>>>>>> >>>>>>>>> (1) Is there some option we can pass fftb so that it just doesn't >>>>>>>>> present items with no parses? >>>>>>>>> (2) Failing that, is it fairly straightforward with pydelphin, >>>>>>>>> [incr tsdb()] or something else to export a version of the profiles that >>>>>>>>> only includes items which the grammar successfully parsed? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Emily >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Emily M. Bender (she/her) >>>>>>>>> Howard and Frances Nostrand Endowed Professor >>>>>>>>> Department of Linguistics >>>>>>>>> Faculty Director, CLMS >>>>>>>>> University of Washington >>>>>>>>> Twitter: @emilymbender >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> -Michael Wayne Goodman >>>>>>>> >>>>>>> -- >>>>>>> Emily M. Bender (she/her) >>>>>>> Howard and Frances Nostrand Endowed Professor >>>>>>> Department of Linguistics >>>>>>> Faculty Director, CLMS >>>>>>> University of Washington >>>>>>> Twitter: @emilymbender >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> -Michael Wayne Goodman >>>>>> >>>>> >>>>> >>>>> -- >>>>> Emily M. 
Bender (she/her) >>>>> Howard and Frances Nostrand Endowed Professor >>>>> Department of Linguistics >>>>> Faculty Director, CLMS >>>>> University of Washington >>>>> Twitter: @emilymbender >>>>> >>>> >>>> >>>> -- >>>> Emily M. Bender (she/her) >>>> Howard and Frances Nostrand Endowed Professor >>>> Department of Linguistics >>>> Faculty Director, CLMS >>>> University of Washington >>>> Twitter: @emilymbender >>>> >>> >>> >>> -- >>> -Michael Wayne Goodman >>> >> >> >> -- >> Emily M. Bender (she/her) >> Howard and Frances Nostrand Endowed Professor >> Department of Linguistics >> Faculty Director, CLMS >> University of Washington >> Twitter: @emilymbender >> > > > -- > -Michael Wayne Goodman > -- Emily M. Bender (she/her) Howard and Frances Nostrand Endowed Professor Department of Linguistics Faculty Director, CLMS University of Washington Twitter: @emilymbender -------------- next part -------------- An HTML attachment was scrubbed... URL: From oe at ifi.uio.no Mon Jan 27 18:42:15 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Mon, 27 Jan 2020 18:42:15 +0100 Subject: [developers] EDM implementations In-Reply-To: References: Message-ID: hi mike, belatedly, thanks (once again) for pushing forward standardization! and also my apologies for returning to this thread a little late! regarding EDM, i used to think of the Common-Lisp implementation (which it appears i produced in early 2012, i.e. more recently than the Perl version by bec) as the reference until recently. last year, when comparing its scores to my re-implementation in Python as part of mtool, that comparison also turned up the two questions you raised, viz. the treatment of the TOP property and how to score parameterized predicates. regarding the first, this appears to be one of the better-kept secrets in meaning representation comparison: in my view, it is a semantically highly relevant property (marking the contrast between e.g. 'all fierce dogs bark' vs. 'all barking dogs are fierce'), but neither the original EDM paper nor its derivative in the AMR world (Cai & Knight, 2013) discuss it. yet, both the Lisp implementation of EDM and SMATCH seem to always have scored the TOP node as an additional tuple (counted among the 'argument' tuples for EDM, while considered among the 'attribute' tuples in SMATCH). the Perl implementation of EDM, on the other hand, worked off my 'ltriples' export format for EDS, which appears to not include a separate TOP tuple. i confirmed the nature of those triples by reminding myself of what became of the 'export' script mentioned in the original EDM wiki notes you had found. it was folded into the LOGON 'redwoods' script, so something like the following actually works today to prepare the input for the Perl implementation of EDM: $LOGONROOT/redwoods --erg --export ltriples --target /tmp mrs i attach the output for item #21 from the MRS test suite, for reference. so, i agree with the conclusion bec and you have already reached: the original Perl implementation of EDM did not consider TOP tuples. the Lisp implementation, on the other hand, appears to have had TOP tuples from its very beginning. regarding the second design choice you raise, parameterized relations (involving one or more constant arguments), it appears that both the Lisp and Perl implementations of EDM do the same thing, viz. assume that there can be at most one constant argument in a relation and 'inline' its value (if present) with the predicate itself, e.g. internally using node label shorthands like 'named(Abrams)'. 
in this regard, i suspect bec and you actually may have arrived at the wrong conclusion about historic behavior; thus, personally, i see no reason for pyDelphin to provide a special-cased version of EDM that wholly ignores constant arguments. looking at this particular design choice today, however, it seems too limiting an assumption and meshing together two things that arguably should be considered separate. even though ERG versions for the past 15 or more years have not used predicates with multiple (constant) parameters, there would be nothing wrong with representing, say, the fraction '2/3' as involving two constant arguments, e.g. something like fraction [ CARG1 "2", CARG2 "3" ]. this is, for example, what AMR does for complex proper names. thus, even though our two historic EDM implementations appear to agree on the 'inlining' treatment of constant arguments, i would be prepared to argue that CARG et al. values should rather be treated as separate node properties, i.e. for the above example the 'named' predicate and the 'CARG' == 'Abrams' value should be treated as two distinct tuples. in part for cross-framework compatibility, this is what we ended up doing in mtool, including in its re-implementation of EDM, see: http://mrp.nlpl.eu/index.php?page=5 in summary, it sounds as if your EDM re-implementation, mike, had arrived at the same conclusions: TOP tuples should be scored, and constant arguments considered as separate properties. i would expect your implementation and mtool should then come to the exact same results (on EDSs stripped of MRS variable properties, which the current mtool EDS reader deliberately discards; see below)? seeing as we have identified two ways in which this way of computing EDM differs from the original publication and the two earlier implementations (in Perl and Lisp), i would like to suggest we formally coin this refinement of the metric EDM 2.0. regarding how to deal with missing graphs on either the gold or system side of the comparison: it appears the Lisp implementation of EDM provides a toggle *redwooods-score-all-p*, which selects between two modes of computing EDM over two sets of corresponding items, either on the intersection of items only; or on their union, treating gaps on either side of the comparison as empty graphs (thus, incurring recall or precision penalties). in practice, i believe we used to near-exclusively compute EDM over sets of items for which there was both a gold and a system graph. but that can of course only give comparable results when fixing that very set of items. thus, the setup of scoring 'all' items seems more general, robust to attempts at gaming, and in my view should be considered the default. finally, regarding variable properties in mtool: for the 2019 CoNLL shared task on meaning representation parsing (MRP 2019), we had agreed with other framework developers to keep morpho-semantic decorations out of the comparison. hence, the MRP 2019 graphs did not include tense, aspect, or number information from the full ERSs. but technically, i would consider that a property of the EDS used in MRP 2019, not a design decision in mtool. for the re-run of the MRP task at CoNLL 2020, we are currently preparing to throw these properties back into the mix (also in other frameworks, where annotations are available), which means the EDS reader in mtool in the near future will no longer discard (underlying) variable properties by default. 
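To make the proposed tuple inventory concrete, here is a minimal, framework-neutral sketch of that "EDM 2.0" style of counting, with names, arguments, morphosemantic properties, constants such as CARG, and TOP each scored as separate tuples anchored on character spans and weighted per class; the dictionary-based graph encoding is invented purely for illustration and is not any of the existing implementations:

    from collections import Counter

    def edm_tuples(graph):
        # graph: {"top": node_id or None,
        #         "nodes": {id: {"span": (frm, to), "pred": str,
        #                        "props": {feat: val}, "carg": str or None}},
        #         "edges": [(src_id, role, tgt_id)]}
        tuples = Counter()
        nodes = graph["nodes"]
        for node in nodes.values():
            span = node["span"]
            tuples[("N", span, node["pred"])] += 1                  # names
            for feat, val in node.get("props", {}).items():
                tuples[("P", span, feat, val)] += 1                 # properties
            if node.get("carg") is not None:
                tuples[("C", span, "CARG", node["carg"])] += 1      # constants
        for src, role, tgt in graph["edges"]:
            tuples[("A", nodes[src]["span"], role, nodes[tgt]["span"])] += 1
        if graph.get("top") is not None:
            tuples[("T", nodes[graph["top"]]["span"])] += 1         # graph top
        return tuples

    def edm_score(gold, test, weights=None):
        # weights maps tuple classes ("N", "A", "P", "C", "T") to a multiplier
        weights = weights or {}
        w = lambda t: weights.get(t[0], 1.0)
        g, t = edm_tuples(gold), edm_tuples(test)
        matched = sum(w(k) * min(g[k], t[k]) for k in g if k in t)
        gold_total = sum(w(k) * n for k, n in g.items())
        test_total = sum(w(k) * n for k, n in t.items())
        p = matched / test_total if test_total else 0.0
        r = matched / gold_total if gold_total else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f

Calling edm_score(gold, test, weights={"C": 0.0, "T": 0.0}) would then approximate the behaviour attributed above to the Perl scorer (no constant and no TOP tuples), while the all-ones default corresponds to counting every class.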
best wishes, oe On Mon, Jan 20, 2020 at 2:15 AM goodman.m.w at gmail.com wrote: > > Thanks again, Bec. > > I just want to make sure my implementation gets the same scores for the same inputs under the same assumptions as the original implementation. For this to work, its behavior concerning the points I've sought clarification for should be intentional. In light of your responses, I've separated the CARG triples from other properties and have given it its own weight. Thus I should be able to get the same scores as your code by setting the weights of CARGs (but not properties) and graph-tops to zero. Similarly, I'll add an option to ignore missing test items and otherwise treat them as mismatches. > > On Fri, Jan 17, 2020 at 6:14 PM Bec Dridan wrote: >> >> >> >> On Fri, Jan 17, 2020 at 5:39 PM goodman.m.w at gmail.com wrote: >>> >>> >>> One more detail is what to do when the two sides (gold and test) have different numbers of items. Currently my code stops as soon as either a gold or test item is missing, which is what smatch (the similar metric made for AMR) does, but I think that may be wrong because parsing profiles are likely to have missing or extra (overgeneration) items in the middle. So the question is whether we ignore it or count it as a full mismatch. >> >> >> If you are asking what is 'correct', I guess that depends on why you are evaluating. The perl implementation wouldn't have noticed missing gold parses, because it used the gold set as the definition of the set. A missing test item, on the other hand, by default counts as a full mismatch, but there is a command line option to ignore any gold parse with no corresponding test parse. The ignore option is useful when the purpose of the evaluation is assessing the system you are working on (and you consider coverage separately). For comparing across systems, I imagine you probably want to count parse failure as a full mismatch. It was useful for me to have both options. >> >> Bec >> >>> >>> >>> On Thu, Jan 16, 2020 at 6:33 PM Bec Dridan wrote: >>>> >>>> Wow, that is some old code... From memory, export was a wrapper around `parse --export`, where I could add :ltriples to the tsdb::*redwoods-export-values* set. >>>> >>>> I don't know the mtool code at all, but re-reading the paper and looking at the perl code, I don't think the original implementation evaluated CARG at all. We only checked that the correct character span had a pred name of`named`. >>>> >>>> I think you are right that the triple export at the time did not produce a triple for TOP and it hence would not have been counted. >>>> >>>> That match your memory Stephan? >>>> >>>> Bec >>>> >>>> >>>> On Thu, Jan 16, 2020 at 8:34 PM goodman.m.w at gmail.com wrote: >>>>> >>>>> Hello developers, >>>>> >>>>> Recently I wanted to try out Elementary Dependency Match (EDM) but I did not find an easy way to do it. I saw lisp code in the LKB's repository and Bec's Perl code, but I'm not sure how to call the former from the command line and the latter seems outdated (I don't see the "export" command required by its instructions). >>>>> >>>>> The Dridan & Oepen, 2011 algorithm was simple enough so I though I'd implement it on top of PyDelphin. The result is here: https://github.com/delph-in/delphin.edm. It requires the latest version of PyDelphin (v1.2.0). It works with MRS, EDS, and DMRS, and it reads text files or [incr tsdb()] profiles. 
>>>>> >>>>> When I nearly had my version working I found that Stephan et al.'s mtool (https://github.com/cfmrpThe paper example >>>>> /mtool) also had an implementation of EDM, so I used that to compare with my outputs (as I couldn't get the previous implementations to work). In this process I think I found some differences from Dridan & Oepen, 2011's description, and this email is to confirm those findings. Namely, that mtool's (and now my) implementation do the following: >>>>> >>>>> * CARGs are treated as property triples ("class 3 information"). Previously they were combined with the predicate name. This change means that predicates like 'named' will match even if their CARGs don't and the CARGs are a separate thing that needs to be matched. >>>>> >>>>> * The identification of the graph's TOP counts as a triple. >>>>> >>>>> One difference between mtool and delphin.edm is that mtool does not count "variable" properties from EDS, but that's just because its EDS parser does not yet handle them while PyDelphin's does. >>>>> >>>>> Can anyone familiar with EDM confirm the above? Or can anyone explain how to call the Perl or LKB code so I can compare? >>>>> >>>>> -- >>>>> -Michael Wayne Goodman >>> >>> >>> >>> -- >>> -Michael Wayne Goodman > > > > -- > -Michael Wayne Goodman -------------- next part -------------- A non-text attachment was scrubbed... Name: 21.gz Type: application/gzip Size: 271 bytes Desc: not available URL: From goodman.m.w at gmail.com Tue Jan 28 02:57:27 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Tue, 28 Jan 2020 09:57:27 +0800 Subject: [developers] EDM implementations In-Reply-To: References: Message-ID: Thanks for the reply, Stephan, > [...] it appears that both the > Lisp and Perl implementations of EDM do the same thing, viz. assume > that there can be at most one constant argument in a relation and > 'inline' its value (if present) with the predicate itself, e.g. > internally using node label shorthands like 'named(Abrams)'. in this > regard, i suspect bec and you actually may have arrived at the wrong > conclusion about historic behavior; Thanks for confirming how the Lisp implementation works. I took your 21.gz file and created a version that replaced "Abrams" with "Brown", then used edm_eval.pl to compare; it reports a full match (1.0), so based on this limited test I think Bec was correct about the Perl version. > thus, personally, i see no reason > for pyDelphin to provide a special-cased version of EDM that wholly > ignores constant arguments. Me too, and that's not the case. I separated CARGs into their own category and callers of the script can give the category a weight of zero to ignore them, which allows them to recreate the results of the Perl implementation. Otherwise, the default weight for all categories (arguments (-A), names/predicates (-N), morphosemantic properties (-P), constants (-C), and tops (-T)) is 1.0. > i would expect > your implementation and mtool should then come to the exact same > results (on EDSs stripped of MRS variable properties, [...]) Yes, but there's no need to strip the properties; just give the category a weight of zero. I've confirmed on a few test items that my implementation gets the exact same scores as mtool with -P0. Furthermore, I think the following option configurations for my re-implementation cover all current and historical use cases except for the inlined constants of the Lisp version, which interact with node names in a way that isn't reproducible with weights alone. 
* Perl: `delphin edm -C0 -T0 --ignore-missing=gold` * Perl with -i option: `delphin edm -C0 -T0 --ignore-missing=both` * Lisp where *redwooods-score-all-p* is true: `delphin edm` * Lisp where *redwooods-score-all-p* is false: `delphin edm --ignore-missing=both` * mtool (MRP 2019): `delphin edm -P0` * mtool (MRP 2020? or EDM 2.0): `delphin edm` On Tue, Jan 28, 2020 at 1:42 AM Stephan Oepen wrote: > hi mike, > > belatedly, thanks (once again) for pushing forward standardization! > and also my apologies for returning to this thread a little late! > > regarding EDM, i used to think of the Common-Lisp implementation > (which it appears i produced in early 2012, i.e. more recently than > the Perl version by bec) as the reference until recently. last year, > when comparing its scores to my re-implementation in Python as part of > mtool, that comparison also turned up the two questions you raised, > viz. the treatment of the TOP property and how to score parameterized > predicates. > > regarding the first, this appears to be one of the better-kept secrets > in meaning representation comparison: in my view, it is a semantically > highly relevant property (marking the contrast between e.g. 'all > fierce dogs bark' vs. 'all barking dogs are fierce'), but neither the > original EDM paper nor its derivative in the AMR world (Cai & Knight, > 2013) discuss it. yet, both the Lisp implementation of EDM and SMATCH > seem to always have scored the TOP node as an additional tuple > (counted among the 'argument' tuples for EDM, while considered among > the 'attribute' tuples in SMATCH). the Perl implementation of EDM, on > the other hand, worked off my 'ltriples' export format for EDS, which > appears to not include a separate TOP tuple. > > i confirmed the nature of those triples by reminding myself of what > became of the 'export' script mentioned in the original EDM wiki notes > you had found. it was folded into the LOGON 'redwoods' script, so > something like the following actually works today to prepare the input > for the Perl implementation of EDM: > > $LOGONROOT/redwoods --erg --export ltriples --target /tmp mrs > > i attach the output for item #21 from the MRS test suite, for > reference. so, i agree with the conclusion bec and you have already > reached: the original Perl implementation of EDM did not consider TOP > tuples. the Lisp implementation, on the other hand, appears to have > had TOP tuples from its very beginning. > > regarding the second design choice you raise, parameterized relations > (involving one or more constant arguments), it appears that both the > Lisp and Perl implementations of EDM do the same thing, viz. assume > that there can be at most one constant argument in a relation and > 'inline' its value (if present) with the predicate itself, e.g. > internally using node label shorthands like 'named(Abrams)'. in this > regard, i suspect bec and you actually may have arrived at the wrong > conclusion about historic behavior; thus, personally, i see no reason > for pyDelphin to provide a special-cased version of EDM that wholly > ignores constant arguments. > > looking at this particular design choice today, however, it seems too > limiting an assumption and meshing together two things that arguably > should be considered separate. even though ERG versions for the past > 15 or more years have not used predicates with multiple (constant) > parameters, there would be nothing wrong with representing, say, the > fraction '2/3' as involving two constant arguments, e.g. 
something > like fraction [ CARG1 "2", CARG2 "3" ]. this is, for example, what > AMR does for complex proper names. > > thus, even though our two historic EDM implementations appear to agree > on the 'inlining' treatment of constant arguments, i would be prepared > to argue that CARG et al. values should rather be treated as separate > node properties, i.e. for the above example the 'named' predicate and > the 'CARG' == 'Abrams' value should be treated as two distinct tuples. > in part for cross-framework compatibility, this is what we ended up > doing in mtool, including in its re-implementation of EDM, see: > > http://mrp.nlpl.eu/index.php?page=5 > > in summary, it sounds as if your EDM re-implementation, mike, had > arrived at the same conclusions: TOP tuples should be scored, and > constant arguments considered as separate properties. i would expect > your implementation and mtool should then come to the exact same > results (on EDSs stripped of MRS variable properties, which the > current mtool EDS reader deliberately discards; see below)? seeing as > we have identified two ways in which this way of computing EDM differs > from the original publication and the two earlier implementations (in > Perl and Lisp), i would like to suggest we formally coin this > refinement of the metric EDM 2.0. > > regarding how to deal with missing graphs on either the gold or system > side of the comparison: it appears the Lisp implementation of EDM > provides a toggle *redwooods-score-all-p*, which selects between two > modes of computing EDM over two sets of corresponding items, either on > the intersection of items only; or on their union, treating gaps on > either side of the comparison as empty graphs (thus, incurring recall > or precision penalties). in practice, i believe we used to > near-exclusively compute EDM over sets of items for which there was > both a gold and a system graph. but that can of course only give > comparable results when fixing that very set of items. thus, the > setup of scoring 'all' items seems more general, robust to attempts at > gaming, and in my view should be considered the default. > > finally, regarding variable properties in mtool: for the 2019 CoNLL > shared task on meaning representation parsing (MRP 2019), we had > agreed with other framework developers to keep morpho-semantic > decorations out of the comparison. hence, the MRP 2019 graphs did not > include tense, aspect, or number information from the full ERSs. but > technically, i would consider that a property of the EDS used in MRP > 2019, not a design decision in mtool. for the re-run of the MRP task > at CoNLL 2020, we are currently preparing to throw these properties > back into the mix (also in other frameworks, where annotations are > available), which means the EDS reader in mtool in the near future > will no longer discard (underlying) variable properties by default. > > best wishes, oe > > > > > > On Mon, Jan 20, 2020 at 2:15 AM goodman.m.w at gmail.com > wrote: > > > > Thanks again, Bec. > > > > I just want to make sure my implementation gets the same scores for the > same inputs under the same assumptions as the original implementation. For > this to work, its behavior concerning the points I've sought clarification > for should be intentional. In light of your responses, I've separated the > CARG triples from other properties and have given it its own weight. Thus I > should be able to get the same scores as your code by setting the weights > of CARGs (but not properties) and graph-tops to zero. 
Similarly, I'll add > an option to ignore missing test items and otherwise treat them as > mismatches. > > > > On Fri, Jan 17, 2020 at 6:14 PM Bec Dridan wrote: > >> > >> > >> > >> On Fri, Jan 17, 2020 at 5:39 PM goodman.m.w at gmail.com < > goodman.m.w at gmail.com> wrote: > >>> > >>> > >>> One more detail is what to do when the two sides (gold and test) have > different numbers of items. Currently my code stops as soon as either a > gold or test item is missing, which is what smatch (the similar metric made > for AMR) does, but I think that may be wrong because parsing profiles are > likely to have missing or extra (overgeneration) items in the middle. So > the question is whether we ignore it or count it as a full mismatch. > >> > >> > >> If you are asking what is 'correct', I guess that depends on why you > are evaluating. The perl implementation wouldn't have noticed missing gold > parses, because it used the gold set as the definition of the set. A > missing test item, on the other hand, by default counts as a full mismatch, > but there is a command line option to ignore any gold parse with no > corresponding test parse. The ignore option is useful when the purpose of > the evaluation is assessing the system you are working on (and you consider > coverage separately). For comparing across systems, I imagine you probably > want to count parse failure as a full mismatch. It was useful for me to > have both options. > >> > >> Bec > >> > >>> > >>> > >>> On Thu, Jan 16, 2020 at 6:33 PM Bec Dridan > wrote: > >>>> > >>>> Wow, that is some old code... From memory, export was a wrapper > around `parse --export`, where I could add :ltriples to the > tsdb::*redwoods-export-values* set. > >>>> > >>>> I don't know the mtool code at all, but re-reading the paper and > looking at the perl code, I don't think the original implementation > evaluated CARG at all. We only checked that the correct character span had > a pred name of`named`. > >>>> > >>>> I think you are right that the triple export at the time did not > produce a triple for TOP and it hence would not have been counted. > >>>> > >>>> That match your memory Stephan? > >>>> > >>>> Bec > >>>> > >>>> > >>>> On Thu, Jan 16, 2020 at 8:34 PM goodman.m.w at gmail.com < > goodman.m.w at gmail.com> wrote: > >>>>> > >>>>> Hello developers, > >>>>> > >>>>> Recently I wanted to try out Elementary Dependency Match (EDM) but I > did not find an easy way to do it. I saw lisp code in the LKB's repository > and Bec's Perl code, but I'm not sure how to call the former from the > command line and the latter seems outdated (I don't see the "export" > command required by its instructions). > >>>>> > >>>>> The Dridan & Oepen, 2011 algorithm was simple enough so I though I'd > implement it on top of PyDelphin. The result is here: > https://github.com/delph-in/delphin.edm. It requires the latest version > of PyDelphin (v1.2.0). It works with MRS, EDS, and DMRS, and it reads text > files or [incr tsdb()] profiles. > >>>>> > >>>>> When I nearly had my version working I found that Stephan et al.'s > mtool (https://github.com/cfmrpThe paper example > >>>>> /mtool) also had an implementation of EDM, so I used that to compare > with my outputs (as I couldn't get the previous implementations to work). > In this process I think I found some differences from Dridan & Oepen, > 2011's description, and this email is to confirm those findings. 
Namely, > that mtool's (and now my) implementation do the following: > >>>>> > >>>>> * CARGs are treated as property triples ("class 3 information"). > Previously they were combined with the predicate name. This change means > that predicates like 'named' will match even if their CARGs don't and the > CARGs are a separate thing that needs to be matched. > >>>>> > >>>>> * The identification of the graph's TOP counts as a triple. > >>>>> > >>>>> One difference between mtool and delphin.edm is that mtool does not > count "variable" properties from EDS, but that's just because its EDS > parser does not yet handle them while PyDelphin's does. > >>>>> > >>>>> Can anyone familiar with EDM confirm the above? Or can anyone > explain how to call the Perl or LKB code so I can compare? > >>>>> > >>>>> -- > >>>>> -Michael Wayne Goodman > >>> > >>> > >>> > >>> -- > >>> -Michael Wayne Goodman > > > > > > > > -- > > -Michael Wayne Goodman > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From sweaglesw at sweaglesw.org Wed Feb 5 01:35:49 2020 From: sweaglesw at sweaglesw.org (Woodley Packard) Date: Tue, 4 Feb 2020 16:35:49 -0800 Subject: [developers] character-based discriminants In-Reply-To: References: Message-ID: <5E3A0DE5.4020502@sweaglesw.org> Stephan and Dan, and other interested parties, Happy new year to you all. In the course of taking a closer look at how the proposed character-based discriminant system might work, I've run across a few cases that perhaps would benefit from a bit of discussion. First, my attempt to distill the proposed action plan for an automatic update (downdate?) of the ERG treebanks to the venerable PTB punctuation convention is as follows: 1. Modify ACE and other engines to use input character positions as token vertex identifiers, so that data coming out -- particularly the full forest record in the "edge" relation -- uses these to identify constituent boundaries instead of the existing identifiers (corresponding roughly to whitespace areas). 2. Mechanically revise a copy of the "decisions" relation from the old gold treebank so that the vertex identifiers in it are also character-based, in hopes of matching those used in the new full forest profiles. Destroy any discriminants that are judged unlikely to match correctly. 3. Run an automatic treebank update to achieve a high coverage gold treebank under the new punctuation convention; manually fix any items that didn't quite make it. Stephan pointed out that the +FROM/+TO values on token AVMs are a way to convert existing vertices to character positions. Thinking a bit more closely about this, there is at least one obvious problem: adjacent tokens T1,T2 do not generally have the property that T1.+TO = T2.+FROM, because there is usually whitespace between them. Therefore the revised scheme will have the property that whitespace adjacent to a constituent will in a sense be considered part of the constituent in some cases. I consider that slightly weird, but perhaps not too big a deal. The main thing is we need to pick a convention as to which position in the whitespace is to be considered the label of the vertex. One candidate convention would be that for any given vertex, its character-based label is the smallest +FROM value of any token starting from it, if any, and if no token starts at it, then the largest +TO value of any token ending at it. 
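In code, a minimal sketch of that candidate convention might look as follows; the token tuples are invented for illustration, except that 'four' and 'footed' share +FROM "2" / +TO "13" as in the hyphenation example discussed below:

    def vertex_labels(tokens):
        # tokens: iterable of (start_vertex, end_vertex, char_from, char_to)
        starts, ends = {}, {}
        for start, end, frm, to in tokens:
            starts[start] = min(frm, starts.get(start, frm))
            ends[end] = max(to, ends.get(end, to))
        labels = dict(ends)        # fall-back: largest +TO of any token ending here
        labels.update(starts)      # preferred: smallest +FROM of any token starting here
        return labels

    # 'A four-footed zebra arose.' -- illustrative offsets; the hyphenated pair
    # shares a single +FROM/+TO span after the token-mapping split.
    tokens = [(0, 1, 0, 1),      # a
              (1, 2, 2, 13),     # four
              (2, 3, 2, 13),     # footed
              (3, 4, 14, 19),    # zebra
              (4, 5, 20, 25)]    # arose
    print(vertex_labels(tokens))
    # vertices 1 and 2 both come out at the same character position (2 here),
    # which is exactly the collision described below.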
I would expect that at least in ordinary cases, possibly all cases, all the incident +FROMs would be identical and all the +TOs would be identical also, just with a difference between the +FROMs and +TOs. A somewhat more troubling problem is that multiple token vertices in the ERG can share the same +FROM and +TO. This happens quite productively with hyphenation, e.g.: A four-footed zebra arose. The historical ERG assigns [ +FROM "2" +TO "13" ] to both "four" and "footed" even while the token lattice is split in the middle, i.e. there are two tokens and there is a vertex "in between" them, but there is no sensible character offset available to assign to it. In the existing vertex labeling scheme, the vertex labels are generated based on a topological sort of the lattice, so we get: a(0,1) four(1,2) footed(2,3) zebra(3,4) arose(4,5) Using the convention proposed above, this would translate into: a(0,3) four(3,3) footed(3,14) zebra(14,20) arose(20,26) As you can see, there is a problem: two distinct vertices got smushed into character position 3. The situation is detectable automatically, of course, and ACE actually already has a built-in hack to adjust token +FROM and +TO in this case (making it possible to use the mouse to select parts of a hyphenated group like that in FFTB), but relying on that hack means hoping that ACE made the same decisions as the new punctuation rules in this case and any others that I haven't thought of. I am tempted to look at an alternative way of achieving the primary goal (i.e. synchronizing the ERG treebanks to the revised punctuation scheme). It would I believe be possible, maybe even straightforward, to make a tool that takes as input two token lattices (the old one and the new one for the same sentence) and computes an alignment between them that minimizes some notion of edit distance. With that in hand, the vertex identifiers of the old discriminants could be rewritten without resorting to character positions or having to solve the above snafu. It also would require no changes to the parsing engines or the treebanking tool, and would likely be at least partially reusable for future tokenization changes. Any suggestions? Woodley On 11/24/2019 03:43 PM, Stephan Oepen wrote: > many thanks for the quick follow-up, woodley! > > in general, character-based discriminants feel attractive because the idea > promises increased robustness to variation over time in tokenization. and > i am not sure yet i understand the difference in expressivity that you > suggest? an input to parsing is segmented into a sequence of vertices (or > breaking points); whether to number these continuously (0, 1, 2, ?) or > discontinuously according to e.g. corresponding character positions or time > stamps (into a speech signal)?i would think i can encode the same broad > range of lattices either way? > > closer to home, i was in fact thinking that the conversion from an existing > set of discriminants to a character-based regime could in fact be more > mechanic than the retooling you sketch. each current vertex should be > uniquely identified with a left and right character position, viz. the > +FROM and +TO values, respectively, on the underlying token feature > structures (i am assuming that all tokens in one cell share the same > values). for the vast majority of discriminants, would it not just work to > replace their start and end vertices with these characters positions? > > i am prepared to lose some discriminants, e.g. 
any choices on the > punctuation lexical rules that are being removed, but possibly also some > lexical choices that in the old universe end up anchored to a sub-string > including one or more punctuation marks. in the 500-best treebanks, it > used to be the case that pervasive redundancy of discriminants meant one > could afford to lose a non-trivial number of discriminants during an update > and still arrive at a unique solution. but maybe that works differently in > the full-forest universe? > > finally, i had not yet considered the 'twigs' (as they are an FFTB-specific > innovation). yes, it would seem unfortunate to just lose all twigs that > included one or more of the old punctuation rules! so your candidate > strategy of cutting twigs into two parts (of which one might often come out > empty) at occurrences of these rules strikes me as a promising (still quite > mechanic) way of working around this problem. formally, breaking up twigs > risks losing some information, but in this case i doubt this would be the > case in actuality. > > thanks for tossing around this idea! oe > > > On Sat, 23 Nov 2019 at 20:30 Woodley Packard > wrote: > >> Hi Stephan, >> >> My initial reaction to the notion of character-based discriminants is (1) >> it will not solve your immediate problem without a certain amount of custom >> tooling to convert old discriminants to new ones in a way that is sensitive >> to how the current punctuation rules work, i.e. a given chart vertex will >> have to be able to map to several different character positions depending >> on how much punctuation has been cliticized so far. The twig-shaped >> discriminants used by FFTB will in some cases have to be bifurcated into >> two or more discriminants, as well. Also, (2) this approach loses the >> (theoretical if perhaps not recently used) ability to treebank a nonlinear >> lattice shaped input, e.g. from an ASR system. I could imagine treebanking >> lattices from other sources as well -- perhaps an image caption generator. >> >> Given the custom tooling required for updating the discriminants, I'm not >> sure switching to character-based anchoring would be less painful than >> having that tool compute the new chart vertex anchoring instead -- though I >> could be wrong. What other arguments can be made in favor of >> character-based discriminants? >> >> In terms of support from FFTB, I think there are relatively few places in >> the code that assume the discriminants' from/to are interpretable beyond >> matching the from/to values of the `edge' relation. I think I would >> implement this by (optionally, I suppose, since presumably other grammars >> won't want to do this at least for now) replacing the from/to on edges read >> from the profile with character positions and more or less pretend that >> there is a chart vertex for every character position. Barring unforeseen >> complications, that wouldn't be too hard. >> >> Woodley >> >>> On Nov 23, 2019, at 5:58 AM, Stephan Oepen wrote: >>> >>> hi again, woodley, >>> >>> dan and i are currently exploring a 'makeover' of ERG input >>> processing, with the overall goal of increased compatibility with >>> mainstream assumptions about tokenization. >>> >>> among other things, we would like to move to the revised (i.e. >>> non-venerable) PTB (and OntoNotes and UD) tokenization conventions and >>> avoid subsequent re-arranging of segmentation in token mapping.
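Returning to the lattice-alignment alternative floated earlier in this message, a minimal sketch of aligning an old and a new tokenization by edit distance; it simplifies the two tokenizations to linear token sequences rather than general lattices and uses plain string identity as the match criterion, both of which are simplifying assumptions:

    def align_tokens(old, new):
        # old, new: lists of token surface strings; returns a mapping from old
        # vertex indices (0..len(old)) to new vertex indices (0..len(new))
        n, m = len(old), len(new)
        dp = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            dp[i][0] = i
        for j in range(m + 1):
            dp[0][j] = j
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                sub = 0 if old[i - 1] == new[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,        # delete an old token
                               dp[i][j - 1] + 1,        # insert a new token
                               dp[i - 1][j - 1] + sub)  # match / substitute
        # trace back one optimal alignment and record vertex correspondences
        mapping, i, j = {n: m}, n, m
        while i > 0 or j > 0:
            if (i > 0 and j > 0 and
                    dp[i][j] == dp[i - 1][j - 1] + (0 if old[i - 1] == new[j - 1] else 1)):
                i, j = i - 1, j - 1
            elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
                i -= 1
            else:
                j -= 1
            mapping[i] = j
        return mapping

    # e.g. an old tokenization with a separate sentence-final period vs. a
    # retokenized variant that keeps the period attached
    print(align_tokens(["dogs", "bark", "."], ["dogs", "bark."]))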
this >>> means we would have to move away from the pseudo-affixation treatment >>> of punctuation marks to a 'pseudo-clitization' approach, meaning that >>> punctuation marks are lexical entries in their own right and attach >>> via binary constructions (rather than as lexical rules). the 'clitic' >>> metaphor, here, is intended to suggest that these lexical entries can >>> only attach at the bottom of the derivation, i.e. to non-clitic >>> lexical items immediately to their left (e.g. in the case of a comma) >>> or to their right (in the case of, say, an opening quote or >>> parenthesis). >>> >>> dan is currently visiting oslo, and we would like to use the >>> opportunity to estimate the cost of moving to such a revised universe. >>> treebank maintenance is a major concern here, as such a radical change >>> in the yields of virtually all derivations would render discriminants >>> invalid when updating to the new forests. i believe a cute idea has >>> emerged that, we optimistically believe, might eliminate much of that >>> concern: character-based discriminant positions, instead of our >>> venerable way of counting chart vertices. >>> >>> for the ERG at least, we believe that leaf nodes in all derivations >>> are reliably annotated with character start and end positions (+FROM >>> and +TO, as well as the +ID lists on token feature structures). these >>> sub-string indices will hardly be affected by the above change to >>> tokenization (except for cases where our current approach to splitting >>> at hyphens and slashes first in token mapping leads to overlapping >>> ranges). hence if discriminants were anchored over character ranges >>> instead of chart cells ... i expect the vast majority of them might >>> just carry over? >>> >>> we would be grateful if you (and others too, of course) could give the >>> above idea some critical thought and look for possible obstacles that >>> dan and i may just be overlooking? technically, i imagine one would >>> have to extend FFTB to (optionally) extract discriminant start and end >>> positions from the sub-string 'coverage' of each constituent, possibly >>> once convert existing treebanks to character-based indexing, and then >>> update into the new universe using character-based matching. does >>> such an approach seem feasible to you in principle? >>> >>> cheers, oe >> From arademaker at gmail.com Thu Feb 20 22:16:27 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Thu, 20 Feb 2020 18:16:27 -0300 Subject: [developers] Acetools for MacOS Message-ID: <8F57BDB5-44AA-4755-AC3D-5A124F6567C5@gmail.com> Hi Woodley, Any change to have the ace tools for MacOS? http://sweaglesw.org/linguistics/acetools/ In particular, the ART 0.1.9 MacOS binary does not run on Catalina: http://sweaglesw.org/linguistics/libtsdb/art.html Best, Alexandre From sweaglesw at sweaglesw.org Sat Feb 22 20:49:34 2020 From: sweaglesw at sweaglesw.org (Woodley Packard) Date: Sat, 22 Feb 2020 11:49:34 -0800 Subject: [developers] Acetools for MacOS In-Reply-To: <8F57BDB5-44AA-4755-AC3D-5A124F6567C5@gmail.com> References: <8F57BDB5-44AA-4755-AC3D-5A124F6567C5@gmail.com> Message-ID: <8F9A60DF-A7B3-4E6C-8822-E8FD555A4456@sweaglesw.org> Hi Alexandre, I am using OSX Catalina 10.15.2. I just downloaded the art 0.1.9 binary and ran it successfully. 
On my first attempt I got the following error: $ ./art -a "~/cdev/ace/ace -g ~/cdev/ace/erg-1214.dat -1" mrs zcat: can't stat: mrs/item.gz (mrs/item.gz.Z): No such file or directory I'm not sure if this is a difference with previous versions of OSX, but what's happening here is that art is trying to decompress my zipped profile, and it expected zcat to support .gz extensions, but the zcat program it found didn't. An easy workaround was to manually decompress the profile before processing it, e.g.: $ gunzip mrs/*.gz After that, everything went through without any issues. I was a little bit surprised to see this, because I had seen it in the past and thought I had made the MacOS binaries use "gzcat" instead of "zcat", but apparently not, at least for this particular release. Was that the problem you ran into, or was it something more sinister? Regards, Woodley > On Feb 20, 2020, at 1:16 PM, Alexandre Rademaker wrote: > > > Hi Woodley, > > Any change to have the ace tools for MacOS? > > http://sweaglesw.org/linguistics/acetools/ > > > In particular, the ART 0.1.9 MacOS binary does not run on Catalina: > > http://sweaglesw.org/linguistics/libtsdb/art.html > > > Best, > Alexandre > From ebender at uw.edu Mon Feb 24 18:45:33 2020 From: ebender at uw.edu (Emily M. Bender) Date: Mon, 24 Feb 2020 09:45:33 -0800 Subject: [developers] Edge can be built interactively, but isn't in the chart Message-ID: Dear all, [Cross-posted to developers and the delphinqa.] After 16 years of teaching grammar engineering, I thought I'd found all of the ways in which one can be in the situation of seemingly being able to build an edge through interactive unification which isn't in the chart. I've documented all of the ones I know about here: http://moin.delph-in.net/GeFaqUnifySurprise Alas, I've found evidence of a new one. Or rather: I'm in that situation (together with a student) but none of the cases noted there apply. More specifically, with the grammar for Meithei [mni] that can be found here: http://faculty.washington.edu/ebender/mni-debug.tgz If we try to analyze this sentence: yo?-si? t?m-? ??-? monkey-PL sleep-NHYP eat-NHYP Monkeys sleep and eat. The LKB and ace both return no parses found. If instead of using the two verbs (one intransitive and one transitive but with a dropped object), we repeat either one of the verbs, we get the expected parses (with both the LKB and ace). yo?-si? ??-? ??-? yo?-si? t?m-? t?m-? Returning to the non-parsing sentence, and looking at the LKB's parse chart, what's missing is the VP-T built out of applying the VP1-TOP-COORD rule to the VP-B over ??-? and the VP over t?m-?. Puzzlingly, I can build this edge interactively just fine. I've run out of guesses as to why it's not showing up in the char and so I thought I'd put this puzzle out in case other DELPH-INites might be entertained by it. Curiously, Emily p.s. Discourse directed me to an earlier discussion about this, where @johnca suggested checking that *chart-packing-p* is set to NIL. It is. -- Emily M. Bender (she/her) Howard and Frances Nostrand Endowed Professor Department of Linguistics Faculty Director, CLMS University of Washington Twitter: @emilymbender -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From oe at ifi.uio.no Mon Feb 24 19:32:48 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Mon, 24 Feb 2020 19:32:48 +0100 Subject: [developers] Edge can be built interactively, but isn't in the chart In-Reply-To: References: Message-ID: from memory, i believe the chart display shows edges with a non-empty orthographemic todo list, i.e. a remaining need to pass through a lexical rule with an associated orthographemic effect. this property of edges is not visible in the interface, and interactive unification may not be paying attention to it. upon completion of lexical parsing, only edges with an empty todo list can go on and feed into syntax rules, so this filter that is applied by the parser might explain seeming misalignment between the interactive mode and what actually happens during parsing. just a wild guess :-), oe On Mon, 24 Feb 2020 at 18:49 Emily M. Bender wrote: > Dear all, > > [Cross-posted to developers and the delphinqa.] > > After 16 years of teaching grammar engineering, I thought I'd found all of > the ways in which one can be in the situation of seemingly being able to > build an edge through interactive unification which isn't in the chart. > I've documented all of the ones I know about here: > > http://moin.delph-in.net/GeFaqUnifySurprise > > Alas, I've found evidence of a new one. Or rather: I'm in that situation > (together with a student) but none of the cases noted there apply. More > specifically, with the grammar for Meithei [mni] that can be found here: > > http://faculty.washington.edu/ebender/mni-debug.tgz > > If we try to analyze this sentence: > > yo?-si? t?m-? ??-? > monkey-PL sleep-NHYP eat-NHYP > Monkeys sleep and eat. > > The LKB and ace both return no parses found. If instead of using the two > verbs (one intransitive and one transitive but with a dropped object), we > repeat either one of the verbs, we get the expected parses (with both the > LKB and ace). > > yo?-si? ??-? ??-? > yo?-si? t?m-? t?m-? > > Returning to the non-parsing sentence, and looking at the LKB's parse > chart, what's missing is the VP-T built out of applying the VP1-TOP-COORD > rule to the VP-B over ??-? and the VP over t?m-?. Puzzlingly, I can build > this edge interactively just fine. I've run out of guesses as to why it's > not showing up in the char and so I thought I'd put this puzzle out in case > other DELPH-INites might be entertained by it. > > Curiously, > Emily > > p.s. Discourse directed me to an earlier discussion about this, where > @johnca suggested checking that *chart-packing-p* is set to NIL. It is. > > > -- > Emily M. Bender (she/her) > Howard and Frances Nostrand Endowed Professor > Department of Linguistics > Faculty Director, CLMS > University of Washington > Twitter: @emilymbender > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebender at uw.edu Mon Feb 24 19:37:16 2020 From: ebender at uw.edu (Emily M. Bender) Date: Mon, 24 Feb 2020 10:37:16 -0800 Subject: [developers] Edge can be built interactively, but isn't in the chart In-Reply-To: References: Message-ID: Yes -- that is one of the known cases. However, it's not what's going on here: The daughters of the missing edge can be used to create the analogous edge in the sentences with the same verb twice. (And in one case, the daughter is already the product of a syntax rule.) Thank you for the guess though! I'm hoping some such guess will put me on the right path... 
Emily On Mon, Feb 24, 2020 at 10:34 AM Stephan Oepen wrote: > from memory, i believe the chart display shows edges with a non-empty > orthographemic todo list, i.e. a remaining need to pass through a lexical > rule with an associated orthographemic effect. this property of edges is > not visible in the interface, and interactive unification may not be paying > attention to it. upon completion of lexical parsing, only edges with an > empty todo list can go on and feed into syntax rules, so this filter that > is applied by the parser might explain seeming misalignment between the > interactive mode and what actually happens during parsing. > > just a wild guess :-), oe > > > On Mon, 24 Feb 2020 at 18:49 Emily M. Bender wrote: > >> Dear all, >> >> [Cross-posted to developers and the delphinqa.] >> >> After 16 years of teaching grammar engineering, I thought I'd found all >> of the ways in which one can be in the situation of seemingly being able to >> build an edge through interactive unification which isn't in the chart. >> I've documented all of the ones I know about here: >> >> http://moin.delph-in.net/GeFaqUnifySurprise >> >> Alas, I've found evidence of a new one. Or rather: I'm in that situation >> (together with a student) but none of the cases noted there apply. More >> specifically, with the grammar for Meithei [mni] that can be found here: >> >> http://faculty.washington.edu/ebender/mni-debug.tgz >> >> If we try to analyze this sentence: >> >> yo?-si? t?m-? ??-? >> monkey-PL sleep-NHYP eat-NHYP >> Monkeys sleep and eat. >> >> The LKB and ace both return no parses found. If instead of using the two >> verbs (one intransitive and one transitive but with a dropped object), we >> repeat either one of the verbs, we get the expected parses (with both the >> LKB and ace). >> >> yo?-si? ??-? ??-? >> yo?-si? t?m-? t?m-? >> >> Returning to the non-parsing sentence, and looking at the LKB's parse >> chart, what's missing is the VP-T built out of applying the VP1-TOP-COORD >> rule to the VP-B over ??-? and the VP over t?m-?. Puzzlingly, I can build >> this edge interactively just fine. I've run out of guesses as to why it's >> not showing up in the char and so I thought I'd put this puzzle out in case >> other DELPH-INites might be entertained by it. >> >> Curiously, >> Emily >> >> p.s. Discourse directed me to an earlier discussion about this, where >> @johnca suggested checking that *chart-packing-p* is set to NIL. It is. >> >> >> -- >> Emily M. Bender (she/her) >> Howard and Frances Nostrand Endowed Professor >> Department of Linguistics >> Faculty Director, CLMS >> University of Washington >> Twitter: @emilymbender >> > -- Emily M. Bender (she/her) Howard and Frances Nostrand Endowed Professor Department of Linguistics Faculty Director, CLMS University of Washington Twitter: @emilymbender -------------- next part -------------- An HTML attachment was scrubbed... URL: From bond at ieee.org Wed Feb 26 14:02:28 2020 From: bond at ieee.org (Francis Bond) Date: Wed, 26 Feb 2020 21:02:28 +0800 Subject: [developers] Searching treebanks Message-ID: G'day, does anyone know of any way to search Redwoods (or DELPHIN treebanks in general) for trees of a certain type (using something like the Fangorn interface). For example, I want to find how often in the treebank 'start' is intransitive vs NP V VP-ving vs NP V VP-to vs NP V VP NP (I start; I start lecturing; I start to lecture; I start a lecture). In fangorn this was "//VP/VB/start[->S/VP/VBG" for NP V VP-ving, ... 
I would be ecstatic if there were an online search I can point my students at, but would be interested in anything. -- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From bond at ieee.org Wed Feb 26 14:28:32 2020 From: bond at ieee.org (Francis Bond) Date: Wed, 26 Feb 2020 21:28:32 +0800 Subject: [developers] Searching treebanks In-Reply-To: <5fca14bec6bb4ab9bdec8793a31f092b@ntnu.no> References: <5fca14bec6bb4ab9bdec8793a31f092b@ntnu.no> Message-ID: Thanks for the tip. If only we all sensibly annotated our corpora with typecraft. On Wed, Feb 26, 2020 at 9:21 PM Lars Hellan wrote: > Hi Francis, > > For Norwegian you can do such things through > https://typecraft.org/tc2wiki/Norwegian_Valency_Corpus, a corpus of about > 20,000 sentences. > > > (Not right on your mark, but perhaps not too far from the sphere of > "anything" ...) > > > Best > > Lars > ------------------------------ > *From:* developers-bounces at emmtee.net on > behalf of Francis Bond > *Sent:* Wednesday, February 26, 2020 2:02:28 PM > *To:* Stephan Oepen; developers at delph-in.net; Rebecca Dridan; Timothy > Baldwin > *Subject:* [developers] Searching treebanks > > G'day, > > does anyone know of any way to search Redwoods (or DELPHIN treebanks in > general) for trees of a certain type (using something like the Fangorn > interface). For example, I want to find how often in the treebank 'start' > is intransitive vs NP V VP-ving vs NP V VP-to vs NP V VP NP (I start; I > start lecturing; I start to lecture; I start a lecture). > > In fangorn this was "//VP/VB/start[->S/VP/VBG" for NP V VP-ving, ... > > I would be ecstatic if there were an online search I can point my students > at, but would be interested in anything. > > > > -- > Francis Bond > Division of Linguistics and Multilingual Studies > Nanyang Technological University > -- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From lars.hellan at ntnu.no Wed Feb 26 14:21:43 2020 From: lars.hellan at ntnu.no (Lars Hellan) Date: Wed, 26 Feb 2020 13:21:43 +0000 Subject: [developers] Searching treebanks In-Reply-To: References: Message-ID: <5fca14bec6bb4ab9bdec8793a31f092b@ntnu.no> Hi Francis, For Norwegian you can do such things through https://typecraft.org/tc2wiki/Norwegian_Valency_Corpus, a corpus of about 20,000 sentences. (Not right on your mark, but perhaps not too far from the sphere of "anything" ...) Best Lars ________________________________ From: developers-bounces at emmtee.net on behalf of Francis Bond Sent: Wednesday, February 26, 2020 2:02:28 PM To: Stephan Oepen; developers at delph-in.net; Rebecca Dridan; Timothy Baldwin Subject: [developers] Searching treebanks G'day, does anyone know of any way to search Redwoods (or DELPHIN treebanks in general) for trees of a certain type (using something like the Fangorn interface). For example, I want to find how often in the treebank 'start' is intransitive vs NP V VP-ving vs NP V VP-to vs NP V VP NP (I start; I start lecturing; I start to lecture; I start a lecture). In fangorn this was "//VP/VB/start[->S/VP/VBG" for NP V VP-ving, ... I would be ecstatic if there were an online search I can point my students at, but would be interested in anything. 
-- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebender at uw.edu Wed Feb 26 15:04:36 2020 From: ebender at uw.edu (Emily M. Bender) Date: Wed, 26 Feb 2020 06:04:36 -0800 Subject: [developers] Searching treebanks In-Reply-To: References: <5fca14bec6bb4ab9bdec8793a31f092b@ntnu.no> Message-ID: For search over semantic representations (MRS, DM, EDS) there's WeSearch: http://wesearch.delph-in.net/ ... which indexes DeepBank and WikiWoods. Emily On Wed, Feb 26, 2020 at 5:29 AM Francis Bond wrote: > Thanks for the tip. If only we all sensibly annotated our corpora with > typecraft. > > On Wed, Feb 26, 2020 at 9:21 PM Lars Hellan wrote: > >> Hi Francis, >> >> For Norwegian you can do such things through >> https://typecraft.org/tc2wiki/Norwegian_Valency_Corpus, a corpus of >> about 20,000 sentences. >> >> >> (Not right on your mark, but perhaps not too far from the sphere of >> "anything" ...) >> >> >> Best >> >> Lars >> ------------------------------ >> *From:* developers-bounces at emmtee.net on >> behalf of Francis Bond >> *Sent:* Wednesday, February 26, 2020 2:02:28 PM >> *To:* Stephan Oepen; developers at delph-in.net; Rebecca Dridan; Timothy >> Baldwin >> *Subject:* [developers] Searching treebanks >> >> G'day, >> >> does anyone know of any way to search Redwoods (or DELPHIN treebanks in >> general) for trees of a certain type (using something like the Fangorn >> interface). For example, I want to find how often in the treebank 'start' >> is intransitive vs NP V VP-ving vs NP V VP-to vs NP V VP NP (I start; I >> start lecturing; I start to lecture; I start a lecture). >> >> In fangorn this was "//VP/VB/start[->S/VP/VBG" for NP V VP-ving, ... >> >> I would be ecstatic if there were an online search I can point my >> students at, but would be interested in anything. >> >> >> >> -- >> Francis Bond >> Division of Linguistics and Multilingual Studies >> Nanyang Technological University >> > > > -- > Francis Bond > Division of Linguistics and Multilingual Studies > Nanyang Technological University > -- Emily M. Bender (she/her) Howard and Frances Nostrand Endowed Professor Department of Linguistics Faculty Director, CLMS University of Washington Twitter: @emilymbender -------------- next part -------------- An HTML attachment was scrubbed... URL: From tuananh.ke at gmail.com Thu Feb 27 08:46:08 2020 From: tuananh.ke at gmail.com (=?UTF-8?B?VHXhuqVuIEFuaCBMw6o=?=) Date: Thu, 27 Feb 2020 15:46:08 +0800 Subject: [developers] Options to extract syntax trees from FFTB Message-ID: Hi everyone, We are trying to use FFTB to tree bank a small corpus and we would like to extract the chosen syntax trees from the corpus. The expected output would be something like It works --> ("S" ("NP" ("NP" ("it"))) ("VP" ("V" ("V" ("works"))))) Is there a way to extract this from the FFTB profile? Currently I'm selecting the trees manually by parsing the sentences using ACE with the options --report-label and then split the output string with " ; " but I'm not sure if this is the best approach. 
[erg-trunk]$ ace -g erg-0.9.26.dat --report-label It works SENT: It works [ LTOP: h0 INDEX: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ] RELS: < [ pron<0:2> LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg GEND: n PT: std ] ] [ pronoun_q<0:2> LBL: h5 ARG0: x3 RSTR: h6 BODY: h7 ] [ _work_v_1<3:8> LBL: h1 ARG0: e2 ARG1: x3 ARG2: i8 ] > HCONS: < h0 qeq h1 h6 qeq h4 > ICONS: < > ] ; ("S" ("NP" ("NP" ("it"))) ("VP" ("V" ("V" ("works"))))) NOTE: 1 readings, added 391 / 68 edges to chart (27 fully instantiated, 35 actives used, 18 passives used) RAM: 1880k Thank you -- Tuan Anh -------------- next part -------------- An HTML attachment was scrubbed... URL: From oe at ifi.uio.no Thu Feb 27 19:37:01 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Thu, 27 Feb 2020 19:37:01 +0100 Subject: [developers] Options to extract syntax trees from FFTB In-Reply-To: References: Message-ID: hi tu?n anh, from what i recall about how FFTB writes tsdb(1) profiles, this should be easy: once treebanking is complete, the ?result? relation should contain one entry per item for each active derivation, typically one after full disambiguation. the ?derivation? field will always be there, but i am not quite sure whether FFTB writes the ?tree? (labeled phrase structure) and ?mrs? fields? you should be able to observe that in your profiles. if not, the LOGON ?redwoods? script can recreate labeled trees for each derivation, using a command roughly like the following: $LOGONROOT/redwoods ?terg ?export tree ?target /tmp best wishes, oe On Thu, 27 Feb 2020 at 08:48 Tu?n Anh L? wrote: > Hi everyone, > > We are trying to use FFTB to tree bank a small corpus and we would like to > extract the chosen syntax trees from the corpus. The expected output would > be something like > > It works --> ("S" ("NP" ("NP" ("it"))) ("VP" ("V" ("V" ("works"))))) > > Is there a way to extract this from the FFTB profile? > > Currently I'm selecting the trees manually by parsing the sentences using > ACE with the options --report-label and then split the output string with " > ; " but I'm not sure if this is the best approach. > > [erg-trunk]$ ace -g erg-0.9.26.dat --report-label > It works > SENT: It works > [ LTOP: h0 INDEX: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - > PERF: - ] RELS: < [ pron<0:2> LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg GEND: n > PT: std ] ] [ pronoun_q<0:2> LBL: h5 ARG0: x3 RSTR: h6 BODY: h7 ] [ > _work_v_1<3:8> LBL: h1 ARG0: e2 ARG1: x3 ARG2: i8 ] > HCONS: < h0 qeq h1 h6 > qeq h4 > ICONS: < > ] ; ("S" ("NP" ("NP" ("it"))) ("VP" ("V" ("V" > ("works"))))) > NOTE: 1 readings, added 391 / 68 edges to chart (27 fully instantiated, 35 > actives used, 18 passives used) RAM: 1880k > > Thank you > -- > Tuan Anh > -------------- next part -------------- An HTML attachment was scrubbed... 
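To make the suggestion above concrete: one possible way to pull the stored analyses back out of a treebanked profile is the PyDelphin library, which can read the `result' relation directly. This is only a sketch under the assumption that PyDelphin (1.x) is installed and that the treebanking tool actually wrote the `derivation' and `tree' fields; the profile path below is a placeholder, not a path from this thread.

    # read derivations and labeled trees from a [incr tsdb()] profile
    from delphin import itsdb

    ts = itsdb.TestSuite('path/to/profile')      # placeholder path

    # map parse-id back to the original input via the 'parse' and 'item' relations
    inputs = {row['i-id']: row['i-input'] for row in ts['item']}
    parse_to_item = {row['parse-id']: row['i-id'] for row in ts['parse']}

    for row in ts['result']:
        print(inputs[parse_to_item[row['parse-id']]])
        print('  derivation:', row['derivation'])
        print('  tree:      ', row['tree'])      # may be empty if the tool did not record it

Alternatively, as noted above, the LOGON `redwoods' script can regenerate labeled trees from the stored derivations without any scripting.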
URL: From danf at stanford.edu Mon Mar 23 17:55:10 2020 From: danf at stanford.edu (Dan Flickinger) Date: Mon, 23 Mar 2020 16:55:10 +0000 Subject: [developers] character-based discriminants In-Reply-To: <5E3A0DE5.4020502@sweaglesw.org> References: , <5E3A0DE5.4020502@sweaglesw.org> Message-ID: Hi Woodley and Stephan, [and with apologies to everyone else for the cryptic flavor of this note, which has to do with a conversion of the ERG to treat punctuation marks as separate tokens, for better interoperability with the rest of the universe] I was able to use the converted `decision' files that you constructed during my visit in February, Woodley, with some non-zero additional manual disambiguation, and this morning I completed updating of the full set of 2018 gold trees into the makeover universe, including wsj00-04. I would now be grateful if you could also provide converted decision files for the wsj05-12 profiles that had also been updated with the 2018 grammar after it was released. Since the 2018mo grammar doesn't really have a natural home in SVN, I have put a full copy of it here, and included in its tsdb/gold directory both the recent updated profiles, and the 2018 ones for wsj05-wsj12 that I hope you'll convert: http://lingo.stanford.edu/danf/2018mo.tgz My intention is to now update these gold profiles from that time-warped 2018mo grammar to the SVN `mo' grammar (which we branched from `trunk' during my visit to Oslo in November). If all goes well, we should then be in position to anoint `mo' as the official new `trunk' version, and use this as the basis for the next stable ERG release, ideally this summer. I would also be interested to know if these now-manually-updated profiles allow you to train a better disambiguation model than the one you trained in February just on the automatically updated items. Thanks for the help so far! Dan ________________________________ From: developers-bounces at emmtee.net on behalf of Woodley Packard Sent: Tuesday, February 4, 2020 4:35 PM To: Stephan Oepen Cc: developers at delph-in.net Subject: Re: [developers] character-based discriminants Stephan and Dan, and other interested parties, Happy new year to you all. In the course of taking a closer look at how the proposed character-based discriminant system might work, I've run across a few cases that perhaps would benefit from a bit of discussion. First, my attempt to distill the proposed action plan for an automatic update (downdate?) of the ERG treebanks to the venerable PTB punctuation convention is as follows: 1. Modify ACE and other engines to use input character positions as token vertex identifiers, so that data coming out -- particularly the full forest record in the "edge" relation -- uses these to identify constituent boundaries instead of the existing identifiers (corresponding roughly to whitespace areas). 2. Mechanically revise a copy of the "decisions" relation from the old gold treebank so that the vertex identifiers in it are also character-based, in hopes of matching those used in the new full forest profiles. Destroy any discriminants that are judged unlikely to match correctly. 3. Run an automatic treebank update to achieve a high coverage gold treebank under the new punctuation convention; manually fix any items that didn't quite make it. Stephan pointed out that the +FROM/+TO values on token AVMs are a way to convert existing vertices to character positions. 
Thinking a bit more closely about this, there is at least one obvious problem: adjacent tokens T1,T2 do not generally have the property that T1.+TO = T2.+FROM, because there is usually whitespace between them. Therefore the revised scheme will have the property that whitespace adjacent to a constituent will in a sense be considered part of the constituent in some cases. I consider that slightly weird, but perhaps not too big a deal. The main thing is we need to pick a convention as to which position in the whitespace is to be considered the label of the vertex. One candidate convention would be that for any given vertex, its character-based label is the smallest +FROM value of any token starting from it, if any, and if no token starts at it, then the largest +TO value of any token ending at it. I would expect that at least in ordinary cases, possibly all cases, all the incident +FROMs would be identical and all the +TOs would be identical also, just with a difference between the +FROMs and +TOs. A somewhat more troubling problem is that multiple token vertices in the ERG can share the same +FROM and +TO. This happens quite productively with hyphenation, e.g.: A four-footed zebra arose. The historical ERG assigns [ +FROM "2" +TO "13" ] to both "four" and "footed" even while the token lattice is split in the middle, i.e. there are two tokens and there is a vertex "in between" them, but there is no sensible character offset available to assign to it. In the existing vertex labeling scheme, the vertex labels are generated based on a topological sort of the lattice, so we get: a(0,1) four(1,2) footed(2,3) zebra(3,4) arose(4,5) Using the convention proposed above, this would translate into: a(0,3) four(3,3) footed(3,14) zebra(14,20) arose(20,26) As you can see, there is a problem: two distinct vertices got smushed into character position 3. The situation is detectable automatically, of course, and ACE actually already has a built-in hack to adjust token +FROM and +TO in this case (making it possible to use the mouse to select parts of a hyphenated group like that in FFTB), but relying on that hack means hoping that ACE made the same decisions as the new punctuation rules in this case and any others that I haven't thought of. I am tempted to look at an alternative way of achieving the primary goal (i.e. synchronizing the ERG treebanks to the revised punctuation scheme). It would I believe be possible, maybe even straightforward, to make a tool that takes as input two token lattices (the old one and the new one for the same sentence) and computes an alignment between them that minimizes some notion of edit distance. With that in hand, the vertex identifiers of the old discriminants could be rewritten without resorting to character positions or having to solve the above snafu. It also would require no changes to the parsing engines or the treebanking tool, and would likely be at least partially reusable for future tokenization changes. Any suggestions? Woodley On 11/24/2019 03:43 PM, Stephan Oepen wrote: > many thanks for the quick follow-up, woodley! > > in general, character-based discriminants feel attractive because the idea > promises increased robustness to variation over time in tokenization. and > i am not sure yet i understand the difference in expressivity that you > suggest? an input to parsing is segmented into a sequence of vertices (or > breaking points); whether to number these continuously (0, 1, 2, ?) or > discontinuously according to e.g. 
corresponding character positions or time > stamps (into a speech signal)?i would think i can encode the same broad > range of lattices either way? > > closer to home, i was in fact thinking that the conversion from an existing > set of discriminants to a character-based regime could in fact be more > mechanic than the retooling you sketch. each current vertex should be > uniquely identified with a left and right character position, viz. the > +FROM and +TO values, respectively, on the underlying token feature > structures (i am assuming that all tokens in one cell share the same > values). for the vast majority of discriminants, would it not just work to > replace their start and end vertices with these characters positions? > > i am prepared to lose some discriminants, e.g. any choices on the > punctuation lexical rules that are being removed, but possibly also some > lexical choices that in the old universe end up anchored to a sub-string > including one or more punctuation marks. in the 500-best treebanks, it > used to be the case that pervasive redundancy of discriminants meant one > could afford to lose a non-trivial number of discriminants during an update > and still arrive at a unique solution. but maybe that works differently in > the full-forest universe? > > finally, i had not yet considered the ?twigs? (as they are an FFTB-specific > innovation). yes, it would seem unfortunate to just lose all twigs that > included one or more of the old punctuation rules! so your candidate > strategy of cutting twigs into two parts (of which one might often come out > empty) at occurrences of these rules strikes me as a promising (still quite > mechanic) way of working around this problem. formally, breaking up twigs > risks losing some information, but in this case i doubt this would be the > case in actuality. > > thanks for tossing around this idea! oe > > > On Sat, 23 Nov 2019 at 20:30 Woodley Packard > wrote: > >> Hi Stephan, >> >> My initial reaction to the notion of character-based discriminants is (1) >> it will not solve your immediate problem without a certain amount of custom >> tooling to convert old discriminants to new ones in a way that is sensitive >> to how the current punctuation rules work, i.e. a given chart vertex will >> have to be able to map to several different character positions depending >> on how much punctuation has been cliticized so far. The twig-shaped >> discriminants used by FFTB will in some cases have to be bifurcated into >> two or more discriminants, as well. Also, (2) this approach loses the >> (theoretical if perhaps not recently used) ability to treebank a nonlinear >> lattice shaped input, e.g. from an ASR system. I could imagine treebanking >> lattices from other sources as well ? perhaps an image caption generator. >> >> Given the custom tooling required for updating the discriminants, I?m not >> sure switching to character-based anchoring would be less painful than >> having that tool compute the new chart vertex anchoring instead ? though I >> could be wrong. What other arguments can be made in favor of >> character-based discriminants? >> >> In terms of support from FFTB, I think there are relatively few places in >> the code that assume the discriminants? from/to are interpretable beyond >> matching the from/to values of the `edge? relation. 
I think I would >> implement this by (optionally, I suppose, since presumably other grammars >> won?t want to do this at least for now) replacing the from/to on edges read >> from the profile with character positions and more or less pretend that >> there is a chart vertex for every character position. Barring unforeseen >> complications, that wouldn?t be too hard. >> >> Woodley >> >>> On Nov 23, 2019, at 5:58 AM, Stephan Oepen wrote: >>> >>> hi again, woodley, >>> >>> dan and i are currently exploring a 'makeover' of ERG input >>> processing, with the overall goal of increased compatibility with >>> mainstream assumptions about tokenization. >>> >>> among other things, we would like to move to the revised (i.e. >>> non-venerable) PTB (and OntoNotes and UD) tokenization conventions and >>> avoid subsequent re-arranging of segmentation in token mapping. this >>> means we would have to move away from the pseudo-affixation treatment >>> of punctuation marks to a 'pseudo-clitization' approach, meaning that >>> punctuation marks are lexical entries in their own right and attach >>> via binary constructions (rather than as lexical rules). the 'clitic' >>> metaphor, here, is intended to suggest that these lexical entries can >>> only attach at the bottom of the derivation, i.e. to non-clitic >>> lexical items immediately to their left (e.g. in the case of a comma) >>> or to their right (in the case of, say, an opening quote or >>> parenthesis). >>> >>> dan is currently visiting oslo, and we would like to use the >>> opportunity to estimate the cost of moving to such a revised universe. >>> treebank maintenance is a major concern here, as such a radical change >>> in the yields of virtually all derivations would render discriminants >>> invalid when updating to the new forests. i believe a cute idea has >>> emerged that, we optimistically believe, might eliminate much of that >>> concern: character-based discriminant positions, instead of our >>> venerable way of counting chart vertices. >>> >>> for the ERG at least, we believe that leaf nodes in all derivations >>> are reliably annotated with character start and end positions (+FROM >>> and +TO, as well as the +ID lists on token feature structures). these >>> sub-string indices will hardly be affected by the above change to >>> tokenization (except for cases where our current approach to splitting >>> at hyphens and slashes first in token mapping leads to overlapping >>> ranges). hence if discriminants were anchored over character ranges >>> instead of chart cells ... i expect the vast majority of them might >>> just carry over? >>> >>> we would be grateful if you (and others too, of course) could give the >>> above idea some critical thought and look for possible obstacles that >>> dan and i may just be overlooking? technically, i imagine one would >>> have to extend FFTB to (optionally) extract discriminant start and end >>> positions from the sub-string 'coverage' of each constituent, possibly >>> once convert existing treebanks to character-based indexing, and then >>> update into the new universe using character-based matching. does >>> such an approach seem feasible to you in principle? >>> >>> cheers, oe >> -------------- next part -------------- An HTML attachment was scrubbed... 
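As a concrete illustration of the vertex-labelling convention sketched above (label a vertex with the smallest +FROM of any token starting there, otherwise with the largest +TO of any token ending there), here is a small self-contained sketch. The token tuples are invented to mirror the "A four-footed zebra arose." example: only the shared [ +FROM "2" +TO "13" ] values come from the message, the other offsets are approximate, so this illustrates the problem rather than describing what ACE or FFTB actually do.

    # (surface, start vertex, end vertex, +FROM, +TO) -- invented example data
    tokens = [
        ("a",      0, 1,  0,  1),
        ("four",   1, 2,  2, 13),   # hyphenated pair shares +FROM/+TO
        ("footed", 2, 3,  2, 13),
        ("zebra",  3, 4, 14, 19),
        ("arose",  4, 5, 20, 26),
    ]

    def vertex_to_char(v):
        # prefer the smallest +FROM of tokens starting at v
        froms = [frm for _, start, end, frm, to in tokens if start == v]
        if froms:
            return min(froms)
        # otherwise fall back to the largest +TO of tokens ending at v
        tos = [to for _, start, end, frm, to in tokens if end == v]
        return max(tos) if tos else None

    for v in range(6):
        print(v, '->', vertex_to_char(v))
    # vertices 1 and 2 both map to character 2: the "smushed" case that
    # motivates either ACE's +FROM/+TO adjustment hack or lattice alignment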
URL: From arademaker at gmail.com Mon Mar 30 21:46:27 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Mon, 30 Mar 2020 16:46:27 -0300 Subject: [developers] Compiling FFTB on Ubuntu 19.10 Message-ID: Hi Woodley, I had to reinstall my machine and now I am trying to recompile all the tools. I gave up for compiling them for MacOS. That would be great for me, but in the MacOS I haven?t passed from the first step below. So I am compiling everything in a docker running Ubuntu 19.10. My goal is to have FFTB running again, I can't use the acetools binaries you provided because it seems that http://sweaglesw.org/svn/treebank/trunk/web.c still don?t have the changed you made during a conversation in Cambridge: Line 1290: addr.sin_addr.s_addr = 0; // inet_addr("127.0.0.1?); Without that change, running FFTB inside the docker is not easy. We need a proxy server for redirecting the ports (as documented in http://moin.delph-in.net/FftbTop#FFTB_on_remote_machine), but with that change, we don?t need to proxy and can use the docker native way to redirect internal to external ports. I have tried to follow the steps that worked for me last time: 1. install liba 2. install repp-0.2.2 3. install libace 4. Install libtsdb I have success for 1-3, but in step 4 I got an error. The error was caused by /usr/bin/ld: cannot find -ltsdb This is a little bit strange because it seems that during the compilation of libtsdb it is looking for this same library? am I right? The complete trace is below. I didn?t see this error before. Can you help me? I am blocked by this error... $ make cc -fPIC -shared -g -O2 -c -o tsdb.o tsdb.c tsdb.c: In function ?tsdb_free_profile?: tsdb.c:47:5: warning: implicit declaration of function ?hash_free_nokeys? [-Wimplicit-function-declaration] 47 | hash_free_nokeys(r->fields[j].hash); | ^~~~~~~~~~~~~~~~ tsdb.c: In function ?tsdb_write_relation?: tsdb.c:258:2: warning: implicit declaration of function ?unlink? [-Wimplicit-function-declaration] 258 | unlink(fname_bk); | ^~~~~~ cc -fPIC -shared -g -O2 -c -o relations.o relations.c gcc -fPIC -shared -g -O2 -fvisibility=hidden -c hash.c -o hash.o gcc -fPIC -shared -g -O2 tsdb.o relations.o hash.o -shared -o libtsdb.so rm -f libtsdb.a ar cru libtsdb.a tsdb.o relations.o ar: `u' modifier ignored since `D' is the default (see `U') gcc -g -O2 -L. test.c -ltsdb -o test -Wl,-rpath -Wl,`pwd` -lace -ldl -la test.c:74:1: warning: return type defaults to ?int? [-Wimplicit-int] 74 | print_tree_with_edge_id(struct tree *t, int indent, int *edgemap) | ^~~~~~~~~~~~~~~~~~~~~~~ test.c:122:1: warning: return type defaults to ?int? [-Wimplicit-int] 122 | record_eq_edges(int x_eid, int y_eid) | ^~~~~~~~~~~~~~~ test.c: In function ?report_missing_edges?: test.c:165:3: warning: implicit declaration of function ?print_tree?; did you mean ?print_mrs?? [-Wimplicit-function-declaration] 165 | print_tree(t, 2); | ^~~~~~~~~~ | print_mrs test.c: At top level: test.c:174:1: warning: return type defaults to ?int? [-Wimplicit-int] 174 | fidget(struct tree *t) | ^~~~~~ test.c:190:1: warning: return type defaults to ?int? [-Wimplicit-int] 190 | compare_tree_lists(char *iid, struct result *rx, int nx, struct result *ry, int ny, int detail, char *errx, char *erry) | ^~~~~~~~~~~~~~~~~~ test.c: In function ?compare_tree_lists?: test.c:277:2: warning: implicit declaration of function ?hash_free?; did you mean ?hash_find?? 
[-Wimplicit-function-declaration] 277 | hash_free(hx); | ^~~~~~~~~ | hash_find test.c: In function ?tree_to_mrs?: test.c:381:2: warning: implicit declaration of function ?clear_mrs?; did you mean ?read_mrs?? [-Wimplicit-function-declaration] 381 | clear_mrs(); | ^~~~~~~~~ | read_mrs test.c: At top level: test.c:421:1: warning: return type defaults to ?int? [-Wimplicit-int] 421 | compare_surface_lists(char *iid, char *i_input, struct result *rx, int nx, struct result *ry, int ny, struct tree *gold_tree, struct mrs *gold_mrs, int detail) | ^~~~~~~~~~~~~~~~~~~~~ test.c:507:1: warning: return type defaults to ?int? [-Wimplicit-int] 507 | usage(char *prog) | ^~~~~ test.c:585:1: warning: return type defaults to ?int? [-Wimplicit-int] 585 | main(int argc, char *argv[]) | ^~~~ test.c: In function ?main?: test.c:599:2: warning: implicit declaration of function ?ace_load_grammar? [-Wimplicit-function-declaration] 599 | ace_load_grammar("/home/sweaglesw/cdev/ace-regression/comparison.dat"); | ^~~~~~~~~~~~~~~~ gcc -g -O2 art.c -lace -ltsdb -lrepp -la -o art -lutil art.c:65:1: warning: return type defaults to ?int? [-Wimplicit-int] 65 | usage(char *myname, int status) | ^~~~~ art.c:87:1: warning: return type defaults to ?int? [-Wimplicit-int] 87 | main(int argc, char *argv[]) | ^~~~ art.c: In function ?main?: art.c:156:13: warning: implicit declaration of function ?forkpty?; did you mean ?fork?? [-Wimplicit-function-declaration] 156 | pid_t p = forkpty(&arbiter_fd, NULL, NULL, NULL); | ^~~~~~~ | fork art.c:301:16: warning: implicit declaration of function ?read_result?; did you mean ?record_result?? [-Wimplicit-function-declaration] 301 | int status = read_result(parse_id, run_id, i_id, i_input); | ^~~~~~~~~~~ | record_result art.c: At top level: art.c:560:1: warning: return type defaults to ?int? [-Wimplicit-int] 560 | write_tuple(FILE *f, char **tuple, struct relation *r) | ^~~~~~~~~~~ /usr/bin/ld: cannot find -ltsdb collect2: error: ld returned 1 exit status make: *** [Makefile:51: art] Error 1 Best, Alexandre From sweaglesw at sweaglesw.org Mon Mar 30 23:11:35 2020 From: sweaglesw at sweaglesw.org (Woodley Packard) Date: Mon, 30 Mar 2020 14:11:35 -0700 Subject: [developers] Compiling FFTB on Ubuntu 19.10 In-Reply-To: References: Message-ID: <0B0E6EF3-9F92-4FBF-8D41-A4B910C9E6A3@sweaglesw.org> Hi Alex, It looks like compiling the library succeeded but the test app failed, most likely just because the library is not yet installed. Please install the libraries (make install, or however you prefer) and that should allow the test app to build. -Woodley > On Mar 30, 2020, at 12:47 PM, Alexandre Rademaker wrote: > > ? > Hi Woodley, > > I had to reinstall my machine and now I am trying to recompile all the tools. I gave up for compiling them for MacOS. That would be great for me, but in the MacOS I haven?t passed from the first step below. So I am compiling everything in a docker running Ubuntu 19.10. > > My goal is to have FFTB running again, I can't use the acetools binaries you provided because it seems that http://sweaglesw.org/svn/treebank/trunk/web.c still don?t have the changed you made during a conversation in Cambridge: > > Line 1290: > > addr.sin_addr.s_addr = 0; // inet_addr("127.0.0.1?); > > Without that change, running FFTB inside the docker is not easy. We need a proxy server for redirecting the ports (as documented in http://moin.delph-in.net/FftbTop#FFTB_on_remote_machine), but with that change, we don?t need to proxy and can use the docker native way to redirect internal to external ports. 
> > I have tried to follow the steps that worked for me last time: > > 1. install liba > 2. install repp-0.2.2 > 3. install libace > 4. Install libtsdb > > I have success for 1-3, but in step 4 I got an error. The error was caused by > > /usr/bin/ld: cannot find -ltsdb > > This is a little bit strange because it seems that during the compilation of libtsdb it is looking for this same library? am I right? > > The complete trace is below. I didn?t see this error before. Can you help me? I am blocked by this error... > > > $ make > cc -fPIC -shared -g -O2 -c -o tsdb.o tsdb.c > tsdb.c: In function ?tsdb_free_profile?: > tsdb.c:47:5: warning: implicit declaration of function ?hash_free_nokeys? [-Wimplicit-function-declaration] > 47 | hash_free_nokeys(r->fields[j].hash); > | ^~~~~~~~~~~~~~~~ > tsdb.c: In function ?tsdb_write_relation?: > tsdb.c:258:2: warning: implicit declaration of function ?unlink? [-Wimplicit-function-declaration] > 258 | unlink(fname_bk); > | ^~~~~~ > cc -fPIC -shared -g -O2 -c -o relations.o relations.c > gcc -fPIC -shared -g -O2 -fvisibility=hidden -c hash.c -o hash.o > gcc -fPIC -shared -g -O2 tsdb.o relations.o hash.o -shared -o libtsdb.so > rm -f libtsdb.a > ar cru libtsdb.a tsdb.o relations.o > ar: `u' modifier ignored since `D' is the default (see `U') > gcc -g -O2 -L. test.c -ltsdb -o test -Wl,-rpath -Wl,`pwd` -lace -ldl -la > test.c:74:1: warning: return type defaults to ?int? [-Wimplicit-int] > 74 | print_tree_with_edge_id(struct tree *t, int indent, int *edgemap) > | ^~~~~~~~~~~~~~~~~~~~~~~ > test.c:122:1: warning: return type defaults to ?int? [-Wimplicit-int] > 122 | record_eq_edges(int x_eid, int y_eid) > | ^~~~~~~~~~~~~~~ > test.c: In function ?report_missing_edges?: > test.c:165:3: warning: implicit declaration of function ?print_tree?; did you mean ?print_mrs?? [-Wimplicit-function-declaration] > 165 | print_tree(t, 2); > | ^~~~~~~~~~ > | print_mrs > test.c: At top level: > test.c:174:1: warning: return type defaults to ?int? [-Wimplicit-int] > 174 | fidget(struct tree *t) > | ^~~~~~ > test.c:190:1: warning: return type defaults to ?int? [-Wimplicit-int] > 190 | compare_tree_lists(char *iid, struct result *rx, int nx, struct result *ry, int ny, int detail, char *errx, char *erry) > | ^~~~~~~~~~~~~~~~~~ > test.c: In function ?compare_tree_lists?: > test.c:277:2: warning: implicit declaration of function ?hash_free?; did you mean ?hash_find?? [-Wimplicit-function-declaration] > 277 | hash_free(hx); > | ^~~~~~~~~ > | hash_find > test.c: In function ?tree_to_mrs?: > test.c:381:2: warning: implicit declaration of function ?clear_mrs?; did you mean ?read_mrs?? [-Wimplicit-function-declaration] > 381 | clear_mrs(); > | ^~~~~~~~~ > | read_mrs > test.c: At top level: > test.c:421:1: warning: return type defaults to ?int? [-Wimplicit-int] > 421 | compare_surface_lists(char *iid, char *i_input, struct result *rx, int nx, struct result *ry, int ny, struct tree *gold_tree, struct mrs *gold_mrs, int detail) > | ^~~~~~~~~~~~~~~~~~~~~ > test.c:507:1: warning: return type defaults to ?int? [-Wimplicit-int] > 507 | usage(char *prog) > | ^~~~~ > test.c:585:1: warning: return type defaults to ?int? [-Wimplicit-int] > 585 | main(int argc, char *argv[]) > | ^~~~ > test.c: In function ?main?: > test.c:599:2: warning: implicit declaration of function ?ace_load_grammar? 
[-Wimplicit-function-declaration] > 599 | ace_load_grammar("/home/sweaglesw/cdev/ace-regression/comparison.dat"); > | ^~~~~~~~~~~~~~~~ > gcc -g -O2 art.c -lace -ltsdb -lrepp -la -o art -lutil > art.c:65:1: warning: return type defaults to ?int? [-Wimplicit-int] > 65 | usage(char *myname, int status) > | ^~~~~ > art.c:87:1: warning: return type defaults to ?int? [-Wimplicit-int] > 87 | main(int argc, char *argv[]) > | ^~~~ > art.c: In function ?main?: > art.c:156:13: warning: implicit declaration of function ?forkpty?; did you mean ?fork?? [-Wimplicit-function-declaration] > 156 | pid_t p = forkpty(&arbiter_fd, NULL, NULL, NULL); > | ^~~~~~~ > | fork > art.c:301:16: warning: implicit declaration of function ?read_result?; did you mean ?record_result?? [-Wimplicit-function-declaration] > 301 | int status = read_result(parse_id, run_id, i_id, i_input); > | ^~~~~~~~~~~ > | record_result > art.c: At top level: > art.c:560:1: warning: return type defaults to ?int? [-Wimplicit-int] > 560 | write_tuple(FILE *f, char **tuple, struct relation *r) > | ^~~~~~~~~~~ > /usr/bin/ld: cannot find -ltsdb > collect2: error: ld returned 1 exit status > make: *** [Makefile:51: art] Error 1 > > > Best, > Alexandre > > > From bond at ieee.org Wed Apr 8 14:21:59 2020 From: bond at ieee.org (Francis Bond) Date: Wed, 8 Apr 2020 20:21:59 +0800 Subject: [developers] ace 3.1 Message-ID: G'day, on Ubuntu 18.04, fftb 0.09.30 works fine, but 0.09.31 is having library issues: $ ~/bin/acetools-x86-0.9.31/fftb -g qsg.dat --browser --webdir ~/bin/acetools-x86-0.9.31/assets/ trees/ts.04 /home/bond/bin/acetools-x86-0.9.31/fftb: relocation error: /home/bond/bin/acetools-x86-0.9.31/fftb: symbol __get_cpu_features version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference Can anyone suggest a workaround? -- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From bond at ieee.org Thu Apr 9 06:27:32 2020 From: bond at ieee.org (Francis Bond) Date: Thu, 9 Apr 2020 12:27:32 +0800 Subject: [developers] Ungrammatical Input and the FFTB Message-ID: G'day, if we are treebanking a profile with ungrammatical sentences (i-wf = 0), what is the best practice? Currently you cannot annotate them at all. I don't remember what we did in the fine system. I feel it might be good to be automatically accept an utterance with i-wf=0 and no parse, and reject it if it has a parse, .... But I am not really sure. -- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From danf at stanford.edu Thu Apr 9 21:56:00 2020 From: danf at stanford.edu (Dan Flickinger) Date: Thu, 9 Apr 2020 19:56:00 +0000 Subject: [developers] Ungrammatical Input and the FFTB In-Reply-To: References: Message-ID: Hi Francis, I might not quite follow you. If a sentence doesn't get any parses, then there is nothing to do in treebanking, except move on to the next sentence, since the unparsed one will not offer you any discriminants to choose. If it does get one or more parses, but is ungrammatical, I usually click "Reject". But I am now starting in with parsing a set of sentences to train a better robust model for errorful student input, using the grammar with mal-rules, so here I don't reject all ungrammatical sentences, but only those where I still can't find an intended robust parse. 
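A rough sketch of the bookkeeping behind the heuristic Francis floats above (accept an i-wf=0 item automatically when it has no analyses, and treat an i-wf=0 item that does parse as a candidate for rejection), written against the `item' and `parse' relations with PyDelphin; the library choice and the profile path are assumptions, and this is not something FFTB currently does.

    from delphin import itsdb

    ts = itsdb.TestSuite('path/to/profile')          # placeholder path

    # i-wf = 0 marks an item as ungrammatical in the 'item' relation
    wellformed = {row['i-id']: int(row['i-wf']) for row in ts['item']}

    for row in ts['parse']:
        if wellformed.get(row['i-id']) != 0:
            continue                                 # only look at ungrammatical items
        readings = int(row['readings'])
        if readings == 0:
            print(row['i-id'], 'no parse: nothing to annotate')
        else:
            print(row['i-id'], f'{readings} parse(s): candidate for rejection')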
Dan ________________________________ From: Francis Bond Sent: Wednesday, April 8, 2020 9:27 PM To: Dan Flickinger ; Woodley Packard ; developers at delph-in.net Subject: Ungrammatical Input and the FFTB G'day, if we are treebanking a profile with ungrammatical sentences (i-wf = 0), what is the best practice? Currently you cannot annotate them at all. I don't remember what we did in the fine system. I feel it might be good to be automatically accept an utterance with i-wf=0 and no parse, and reject it if it has a parse, .... But I am not really sure. -- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.passos.morgado at gmail.com Wed May 6 06:09:49 2020 From: luis.passos.morgado at gmail.com (Luis Morgado da Costa) Date: Wed, 6 May 2020 12:09:49 +0800 Subject: [developers] ACE crashing with ZHONG In-Reply-To: References: Message-ID: Dear Woodley (or anyone else who can help), Ace is crashing unexpectedly with at least two sentences in a large regression test for ZHONG: ? ? ? ?? ? ? ? ? ? (can-not-can lend me 1 CL pen?) ? ? ? ? ? ? (want-not-want borrow book?) Our suspicion is that the problem arises from the interaction of this V-not-V question form in Mandarin, and the fact that in these examples the verbs are auxiliaries. The same error does not happen, for example, for the sentence: ? ? ? ? ? ? (you eat-not-eat mean?) We repeatedly get the same error for these (and similar) sentences: *ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed.* *Aborted (core dumped)* ========================================================= $ ./ace -g zhong.dat ? ? ? ?? ? ? ? ? ? ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed. Aborted (core dumped) ./ace -g zhong.dat ? ? ? ? ? ? ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed. Aborted (core dumped) ========================================================== This happens both in ACE 0.9.30 and 0.9.31; However, it does not happen with LKB FOS (we get parses for both sentences above, see below). [image: Screenshot from 2020-05-06 11-28-01.png] [image: Screenshot from 2020-05-06 11-30-49.png] Is there anything we might be missing? We would much appreciate if you could help us solve this. For testing, you might want to download the current (uncommitted) version of ZHONG: https://drive.google.com/open?id=1p7lPA06sD2v6Xq0qG0TslGF0x6n5uIqV Cheers, Luis -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot from 2020-05-06 11-28-01.png Type: image/png Size: 234949 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot from 2020-05-06 11-30-49.png Type: image/png Size: 210808 bytes Desc: not available URL: From sweaglesw at sweaglesw.org Wed May 6 08:25:13 2020 From: sweaglesw at sweaglesw.org (Woodley Packard) Date: Tue, 5 May 2020 23:25:13 -0700 Subject: [developers] ACE crashing with ZHONG In-Reply-To: References: Message-ID: <57A32B7E-FD19-4E46-BD3E-AC8FEC2D3C87@sweaglesw.org> Hi Luis, I wasn't quite sure which ace/config.tdl to use in the Zhong tree, as there are several, but I guessed that maybe cmn/zhs/ace/config.tdl was a good place to start, and was able to reproduce the errors you found. 
A little bit of hunting showed that this is a result of STEM containing unconstrained strings, and it looks like the culprit is the v_aux_ell-lr rule. That rule fails to constrain the mother's STEM.FIRST value, and as a result, subsequent orthographemic rules (in this case, the abua-olr rule) can't tell what string they should be operating on. Possibly v_aux_ell-lr should pass up the daughter's STEM value? I hope that helps, Woodley > On May 5, 2020, at 9:09 PM, Luis Morgado da Costa wrote: > > > Dear Woodley (or anyone else who can help), > > Ace is crashing unexpectedly with at least two sentences in a large regression test for ZHONG: > ? ? ? ?? ? ? ? ? ? (can-not-can lend me 1 CL pen?) > ? ? ? ? ? ? (want-not-want borrow book?) > > Our suspicion is that the problem arises from the interaction of this V-not-V question form in Mandarin, and the fact that in these examples the verbs are auxiliaries. > The same error does not happen, for example, for the sentence: > > ? ? ? ? ? ? (you eat-not-eat mean?) > > We repeatedly get the same error for these (and similar) sentences: > ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed. > Aborted (core dumped) > > ========================================================= > $ ./ace -g zhong.dat > ? ? ? ?? ? ? ? ? ? > ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed. > Aborted (core dumped) > > ./ace -g zhong.dat > ? ? ? ? ? ? > ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed. > Aborted (core dumped) > ========================================================== > > This happens both in ACE 0.9.30 and 0.9.31; However, it does not happen with LKB FOS (we get parses for both sentences above, see below). > > > > > > > > Is there anything we might be missing? We would much appreciate if you could help us solve this. > > For testing, you might want to download the current (uncommitted) version of ZHONG: https://drive.google.com/open?id=1p7lPA06sD2v6Xq0qG0TslGF0x6n5uIqV > > > > Cheers, > Luis > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.passos.morgado at gmail.com Wed May 6 09:11:01 2020 From: luis.passos.morgado at gmail.com (Luis Morgado da Costa) Date: Wed, 6 May 2020 15:11:01 +0800 Subject: [developers] ACE crashing with ZHONG In-Reply-To: <57A32B7E-FD19-4E46-BD3E-AC8FEC2D3C87@sweaglesw.org> References: <57A32B7E-FD19-4E46-BD3E-AC8FEC2D3C87@sweaglesw.org> Message-ID: Thanks Woodley, That helped a lot. Everything is working as expected now. I had forgotten to also inherit from: constant-lex-rule := lex-rule & [ STEM #stem, DTR [ STEM #stem ]]. Cheers, Luis On Wed, May 6, 2020 at 2:25 PM Woodley Packard wrote: > Hi Luis, > > I wasn't quite sure which ace/config.tdl to use in the Zhong tree, as > there are several, but I guessed that maybe cmn/zhs/ace/config.tdl was a > good place to start, and was able to reproduce the errors you found. A > little bit of hunting showed that this is a result of STEM containing > unconstrained strings, and it looks like the culprit is the v_aux_ell-lr > rule. That rule fails to constrain the mother's STEM.FIRST value, and as a > result, subsequent orthographemic rules (in this case, the abua-olr rule) > can't tell what string they should be operating on. Possibly v_aux_ell-lr > should pass up the daughter's STEM value? 
> > I hope that helps, > Woodley > > On May 5, 2020, at 9:09 PM, Luis Morgado da Costa < > luis.passos.morgado at gmail.com> wrote: > > > Dear Woodley (or anyone else who can help), > > Ace is crashing unexpectedly with at least two sentences in a large > regression test for ZHONG: > ? ? ? ?? ? ? ? ? ? (can-not-can lend me 1 CL pen?) > ? ? ? ? ? ? (want-not-want borrow book?) > > Our suspicion is that the problem arises from the interaction of this > V-not-V question form in Mandarin, and the fact that in these examples the > verbs are auxiliaries. > The same error does not happen, for example, for the sentence: > > ? ? ? ? ? ? (you eat-not-eat mean?) > > We repeatedly get the same error for these (and similar) sentences: > *ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed.* > *Aborted (core dumped)* > > ========================================================= > $ ./ace -g zhong.dat > ? ? ? ?? ? ? ? ? ? > ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed. > Aborted (core dumped) > > ./ace -g zhong.dat > ? ? ? ? ? ? > ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed. > Aborted (core dumped) > ========================================================== > > This happens both in ACE 0.9.30 and 0.9.31; However, it does not happen > with LKB FOS (we get parses for both sentences above, see below). > > > > > > > > Is there anything we might be missing? We would much appreciate if you > could help us solve this. > > For testing, you might want to download the current (uncommitted) version > of ZHONG: > https://drive.google.com/open?id=1p7lPA06sD2v6Xq0qG0TslGF0x6n5uIqV > > > > Cheers, > Luis > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Wed May 6 14:28:32 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 6 May 2020 09:28:32 -0300 Subject: [developers] ACE crashing with ZHONG In-Reply-To: References: Message-ID: <2AE4ACB3-2048-40EA-9D84-1C72FD841446@gmail.com> But why the grammar worked in LKB FOS? Just curious to understand the possible difference between the tools. Alexandre Sent from my iPhone > On 6 May 2020, at 04:12, Luis Morgado da Costa wrote: > > ? > Thanks Woodley, > > That helped a lot. Everything is working as expected now. I had forgotten to also inherit from: > > constant-lex-rule := lex-rule & > [ STEM #stem, > DTR [ STEM #stem ]]. > > Cheers, > Luis > > > >> On Wed, May 6, 2020 at 2:25 PM Woodley Packard wrote: >> Hi Luis, >> >> I wasn't quite sure which ace/config.tdl to use in the Zhong tree, as there are several, but I guessed that maybe cmn/zhs/ace/config.tdl was a good place to start, and was able to reproduce the errors you found. A little bit of hunting showed that this is a result of STEM containing unconstrained strings, and it looks like the culprit is the v_aux_ell-lr rule. That rule fails to constrain the mother's STEM.FIRST value, and as a result, subsequent orthographemic rules (in this case, the abua-olr rule) can't tell what string they should be operating on. Possibly v_aux_ell-lr should pass up the daughter's STEM value? >> >> I hope that helps, >> Woodley >> >>> On May 5, 2020, at 9:09 PM, Luis Morgado da Costa wrote: >>> >>> >>> Dear Woodley (or anyone else who can help), >>> >>> Ace is crashing unexpectedly with at least two sentences in a large regression test for ZHONG: >>> ? ? ? ?? ? ? ? ? ? (can-not-can lend me 1 CL pen?) >>> ? ? ? ? ? ? (want-not-want borrow book?) 
>>> >>> Our suspicion is that the problem arises from the interaction of this V-not-V question form in Mandarin, and the fact that in these examples the verbs are auxiliaries. >>> The same error does not happen, for example, for the sentence: >>> >>> ? ? ? ? ? ? (you eat-not-eat mean?) >>> >>> We repeatedly get the same error for these (and similar) sentences: >>> ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed. >>> Aborted (core dumped) >>> >>> ========================================================= >>> $ ./ace -g zhong.dat >>> ? ? ? ?? ? ? ? ? ? >>> ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed. >>> Aborted (core dumped) >>> >>> ./ace -g zhong.dat >>> ? ? ? ? ? ? >>> ace: type.c:625: type_to_string: Assertion `ty->name[0]=='"'' failed. >>> Aborted (core dumped) >>> ========================================================== >>> >>> This happens both in ACE 0.9.30 and 0.9.31; However, it does not happen >>> with LKB FOS (we get parses for both sentences above, see below). >>> >>> >>> >>> >>> >>> >>> >>> Is there anything we might be missing? We would much appreciate if you >>> could help us solve this. >>> >>> For testing, you might want to download the current (uncommitted) version >>> of ZHONG: https://drive.google.com/open?id=1p7lPA06sD2v6Xq0qG0TslGF0x6n5uIqV >>> >>> >>> Cheers, >>> Luis >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodman.m.w at gmail.com Tue May 12 09:55:08 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Tue, 12 May 2020 15:55:08 +0800 Subject: [developers] compiling fftb In-Reply-To: <91487488-C19F-4903-BA57-8766A901F50E@sweaglesw.org> References: <1E880046-6207-462E-8255-6B4780C26BC1@gmail.com> <0FC0A140-51CB-42E7-98F4-C7B3864BFCA0@sweaglesw.org> <3A085354-A45E-46D5-B57B-A8B8703B7276@gmail.com> <91487488-C19F-4903-BA57-8766A901F50E@sweaglesw.org> Message-ID: Hi all, I'm getting similar errors to Alexandre. I successfully compiled and installed liba, repp-0.2.2, and then ace, but I'm getting the error that it cannot find <tsdb.h> when I try "make all" for libtsdb. I noticed that tsdb.h is provided by libtsdb, and `#include <tsdb.h>` seems to look in my system libraries. Changing all these to `#include "tsdb.h"` (thinking it might use the file in the current directory) did not work, so I reverted those changes and ran the following: make libtsdb.a # required for 'make install' make libtsdb.so # required for 'make install' make install # copies the above 2 things plus tsdb.h to /usr/local/lib/ Then I tried running "make all" again and now I see this: [...] test.c: In function ‘main’: test.c:599:2: warning: implicit declaration of function ‘ace_load_grammar’ [-Wimplicit-function-declaration] 599 | ace_load_grammar("/home/sweaglesw/cdev/ace-regression/comparison.dat"); | ^~~~~~~~~~~~~~~~ /usr/bin/ld: /usr/lib/x86_64-linux-gnu/libm-2.31.a(e_exp.o): in function `__ieee754_exp_ifunc': (.text+0x246): undefined reference to `_dl_x86_cpu_features' /usr/bin/ld: /usr/lib/x86_64-linux-gnu/libm-2.31.a(e_log.o): in function `__ieee754_log_ifunc': (.text+0x2c6): undefined reference to `_dl_x86_cpu_features' collect2: error: ld returned 1 exit status make: *** [Makefile:45: test.static] Error 1 It seems there's some incompatibility in glibc versions. This SO question seems relevant: https://stackoverflow.com/q/56415996/1441112 ; maybe it's a static vs. dynamic linking issue?
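It is indeed a static vs. dynamic linking issue: the resolution that emerges later in this thread is to stop linking libm statically. Schematically, the change to the link flags looks roughly like this (the variable name TOOL_STATIC_LDFLAGS comes from Woodley's later reply; the particular libraries listed here are only illustrative):

  # before: -lm sits inside the -Wl,-Bstatic block, so the static libm-2.31.a gets linked in
  TOOL_STATIC_LDFLAGS = -Wl,-Bstatic -ltsdb -lace -lm -Wl,-Bdynamic -lpthread
  # after: -lm moved to after -Wl,-Bdynamic, so the shared libm is used instead
  TOOL_STATIC_LDFLAGS = -Wl,-Bstatic -ltsdb -lace -Wl,-Bdynamic -lm -lpthread

The undefined _dl_x86_cpu_features references above come from the static libm, which is why moving that single library to the dynamic side of the link line makes them disappear.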
Other than test.static, I was able to make other targets, such as art and mkprof, but I see errors when I try to run them: $ ./art -h ./art: error while loading shared libraries: libace.so: cannot open shared object file: No such file or directory But I have libace.so at /usr/local/lib/libace.so, so I'm not sure what went wrong here. My end goal is to compile FFTB, and if I carry on with the current setup I see the same errors as when compiling test.static when I do "make fftb" for the FFTB source code. Does anybody know how to get around these issues? Some context: * For compiling ACE I copied itsdb_libraries.tgz as described here: http://moin.delph-in.net/AceInstall#Missing_itsdb.h * I'm running Pop!_OS 20.04 (similar to Ubuntu), with glibc version 2.31 On Fri, Jul 19, 2019 at 10:38 PM Woodley Packard wrote: > It looks like you are trying to compile the "liba" dependency. MacOS does > shared libraries quite differently from Linux. it will probably be easiest > to do it as a static library; try "make liba.a"? > > -Woodley > > > > On Jul 19, 2019, at 6:02 AM, Alexandre Rademaker > wrote: > > > > > > Hi Woodley, > > > > Once I follow the proper order for compile the dependencies (liba, > libace, libtsdb, fftb), I got everything to work at Linux. But no success o > Mac OS yet!! :-( > > > > Any direction? > > > > I found that gcc-9 is the gcc installed from brew > > > > $ gcc-9 --version > > gcc-9 (Homebrew GCC 9.1.0) 9.1.0 > > Copyright (C) 2019 Free Software Foundation, Inc. > > This is free software; see the source for copying conditions. There is > NO > > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR > PURPOSE. > > > > > > These where my changes in the Makefile but I could not compile. > > > > $ svn diff Makefile > > Index: Makefile > > =================================================================== > > --- Makefile (revision 40) > > +++ Makefile (working copy) > > @@ -1,6 +1,6 @@ > > -HDRS=net.h timer.h http.h web.h sql.h server.h aisle-rpc.h asta-rpc.h > background.h daemon.h aside-rpc.h escape.h > > -OBJS=net.o timer.o http.o web.o sql.o server.o aisle-rpc.o asta-rpc.o > background.o daemon.o aside-rpc.o escape.o > > -CC=gcc > > +HDRS=net.h timer.h http.h web.h server.h aisle-rpc.h asta-rpc.h > background.h daemon.h aside-rpc.h escape.h > > +OBJS=net.o timer.o http.o web.o server.o aisle-rpc.o asta-rpc.o > background.o daemon.o aside-rpc.o escape.o > > +CC=gcc-9 > > CFLAGS=-g -O -shared -fPIC -pthread > > #CFLAGS=-g -pg -O -shared -fPIC -pthread > > > > @@ -16,13 +16,13 @@ > > cp liba.h /usr/local/include/ > > > > tests: ${OBJS} liba.h > > - gcc -g -isystem . test.c ${OBJS} -lpq -lpthread -o test > > + ${CC} -g -isystem . 
test.c ${OBJS} -lpthread -o test > > > > shared-tests: > > - gcc -g test.c -la -o test > > + ${CC} -g test.c -la -o test > > > > liba.so: ${OBJS} liba.h Makefile > > - ld -shared ${OBJS} -o liba.so -lpq -lpthread > > + ld ${OBJS} -o liba.so -lpthread > > > > > > The error is: > > > > $ make > > ld net.o timer.o http.o web.o server.o aisle-rpc.o asta-rpc.o > background.o daemon.o aside-rpc.o escape.o -o liba.so -lpthread > > ld: warning: No version-min specified on command line > > Undefined symbols for architecture x86_64: > > "_main", referenced from: > > implicit entry/start for main executable > > ld: symbol(s) not found for inferred architecture x86_64 > > make: *** [liba.so] Error 1 > > > > > > > > Best, > > > > -- > > Alexandre Rademaker > > http://arademaker.github.io > > > > > > > > > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From petterha at gmail.com Wed May 13 15:08:52 2020 From: petterha at gmail.com (Petter Haugereid) Date: Wed, 13 May 2020 15:08:52 +0200 Subject: [developers] Treebanking and training with FFT Message-ID: Hi everybody, I have been trying over some days to make treebanking work with FFT. Following instructions on the DELPH-IN site, I have given the commands below, and I end up with a browser window with the items of the profile I attempt to treebank. However, when I click on one of the items, I get an error message "404 Not Found". Do any of you know what I am doing wrong? Here are the commands (with full paths): mkprof -s ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ /tmp/mrs-test art -f -a '~/tools/ace-0.9.30/ace --disable-generalization -g ~/tools/ace-0.9.30/norwegian-small.dat -O' /tmp/mrs-test ~/acetools-x86-0.9.30/fftb -g ~/tools/ace-0.9.30/norwegian-small.dat --browser --webdir ~/acetools /tmp/mrs-test/ I am quite keen to get a statistical model for my grammar, so I have tried to train a model from a small treebank which I have disambiguated with the logon tool. When I try to train with LOGON, only get a lot of garbage collection messages, and I eventually have to kill the process. When I try to train with FFT with the following commands, I get the messages below: mkprof -s ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ /tmp/mrs-test/ art -a '~/tools/ace-0.9.30/ace -g ~/tools/ace-0.9.30/norwegian-small.dat -O' -f /tmp/mrs-test/ FFGRANDPARENT=0 ~/acetools-x86-0.9.30/ffmaster 1 mrs-test-gp0.mem & FFGRANDPARENT=0 ~/acetools-x86-0.9.30/ffworker ~/tools/ace-0.9.30/norwegian-small.dat /tmp/mrs-test/ ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ localhost # loading /tmp/mrs-test/... # loading /home/petter/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/... # loading gold # ... iid 1 -- gold tree 1 / 1 not in parse forest # ... iid 2 -- gold tree 1 / 1 not in parse forest # ... iid 3 -- gold tree 1 / 1 not in parse forest ... # ... iid 68 -- gold tree 1 / 1 not in parse forest # ... iid 69 -- gold tree 1 / 1 not in parse forest # loaded 0 ambiguous feature forests with gold trees # [1]+ Exit 255 FFGRANDPARENT=0 ~/acetools-x86-0.9.30/ffmaster 1 mrs-test-gp0.mem # Floating point exception (core dumped) I tried the same commands with the ERG MRS treebank in LOGON, and I was able to train a model with it. I suspect the reason I don't succeed, is that I have treebanked with LOGON, while Dan has used FFT. 
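On the 404 problem, the replies below establish that the --webdir argument has to point at a directory that actually contains the fftb web assets (control.js, index.html and render.js), which ~/acetools apparently did not. A sketch of the corrected invocation, using the asset directory that ends up being used later in this thread (adjust the paths to your own installation):

  # web assets ship with acetools-x86-0.9.31 (assets/) and with the LOGON tree (lingo/answer/fftb/)
  ~/acetools-x86-0.9.30/fftb -g ~/tools/ace-0.9.30/norwegian-small.dat \
      --browser --webdir ~/logon/lingo/answer/fftb /tmp/mrs-test/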
Here are links to 1) the MRS treebank https://www.dropbox.com/s/7mj53j644vwhbes/mrs.2020.05.12.tgz?dl=0 2) The Norwegian MRS items I have treebanked https://www.dropbox.com/s/qfhuqwnxlz0e1pb/mrs.txt?dl=0 3) The Norsyg grammar (loading 'lkb/small-script' with the LKB, 'ace/config-small.tdl' with ACE is sufficient) https://www.dropbox.com/s/rmoy6q40dvz1dxh/norsyg.20-05-13.tgz?dl=0 4) A compiled version of the grammar, compiled with ace-0.9.30 https://www.dropbox.com/s/cb0dq9omuhojlmv/norwegian-small.dat?dl=0 If someone can point me to what I am doing wrong, I would be very greatful! Best, Petter -------------- next part -------------- An HTML attachment was scrubbed... URL: From bond at ieee.org Wed May 13 15:20:12 2020 From: bond at ieee.org (Francis Bond) Date: Wed, 13 May 2020 21:20:12 +0800 Subject: [developers] Treebanking and training with FFT In-Reply-To: References: Message-ID: Hi, We successfully treebanked recently, using (and updating) the wiki page. Is the webdir correct? It should have the files control.js, index.html and render.js in it. We found it in ace-tools-x86.0.9.31/assets (but not in 0.9.30). However 0.9.31 did not work for some reason, so we used the grammar and fftb from 0.9.30 and the webdir from 0.9.31. They are also included somewhere in the logon tree. I hope this helps. On Wed, May 13, 2020 at 9:09 PM Petter Haugereid wrote: > Hi everybody, > > I have been trying over some days to make treebanking work with FFT. > Following instructions on the DELPH-IN site, I have given the commands > below, and I end up with a browser window with the items of the profile I > attempt to treebank. However, when I click on one of the items, I get an > error message "404 Not Found". Do any of you know what I am doing wrong? > > Here are the commands (with full paths): > mkprof -s ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ /tmp/mrs-test > art -f -a '~/tools/ace-0.9.30/ace --disable-generalization -g > ~/tools/ace-0.9.30/norwegian-small.dat -O' /tmp/mrs-test > ~/acetools-x86-0.9.30/fftb -g ~/tools/ace-0.9.30/norwegian-small.dat > --browser --webdir ~/acetools /tmp/mrs-test/ > > I am quite keen to get a statistical model for my grammar, so I have tried > to train a model from a small treebank which I have disambiguated with the > logon tool. When I try to train with LOGON, only get a lot of garbage > collection messages, and I eventually have to kill the process. When I try > to train with FFT with the following commands, I get the messages below: > > mkprof -s ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ /tmp/mrs-test/ > art -a '~/tools/ace-0.9.30/ace -g ~/tools/ace-0.9.30/norwegian-small.dat > -O' -f /tmp/mrs-test/ > FFGRANDPARENT=0 ~/acetools-x86-0.9.30/ffmaster 1 mrs-test-gp0.mem & > FFGRANDPARENT=0 ~/acetools-x86-0.9.30/ffworker > ~/tools/ace-0.9.30/norwegian-small.dat /tmp/mrs-test/ > ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ localhost > > # loading /tmp/mrs-test/... > # loading /home/petter/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/... > # loading gold > # ... iid 1 -- gold tree 1 / 1 not in parse forest > # ... iid 2 -- gold tree 1 / 1 not in parse forest > # ... iid 3 -- gold tree 1 / 1 not in parse forest > ... > # ... iid 68 -- gold tree 1 / 1 not in parse forest > # ... 
iid 69 -- gold tree 1 / 1 not in parse forest > # loaded 0 ambiguous feature forests with gold trees > # [1]+ Exit 255 FFGRANDPARENT=0 > ~/acetools-x86-0.9.30/ffmaster 1 mrs-test-gp0.mem > # Floating point exception (core dumped) > > I tried the same commands with the ERG MRS treebank in LOGON, and I was > able to train a model with it. I suspect the reason I don't succeed, is > that I have treebanked with LOGON, while Dan has used FFT. > > Here are links to > 1) the MRS treebank > https://www.dropbox.com/s/7mj53j644vwhbes/mrs.2020.05.12.tgz?dl=0 > 2) The Norwegian MRS items I have treebanked > https://www.dropbox.com/s/qfhuqwnxlz0e1pb/mrs.txt?dl=0 > 3) The Norsyg grammar (loading 'lkb/small-script' with the LKB, > 'ace/config-small.tdl' with ACE is sufficient) > https://www.dropbox.com/s/rmoy6q40dvz1dxh/norsyg.20-05-13.tgz?dl=0 > 4) A compiled version of the grammar, compiled with ace-0.9.30 > https://www.dropbox.com/s/cb0dq9omuhojlmv/norwegian-small.dat?dl=0 > > If someone can point me to what I am doing wrong, I would be very greatful! > > Best, > > Petter > -- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From petterha at gmail.com Wed May 13 19:45:11 2020 From: petterha at gmail.com (Petter Haugereid) Date: Wed, 13 May 2020 19:45:11 +0200 Subject: [developers] Treebanking and training with FFT In-Reply-To: References: Message-ID: Yes, it helped! I changed the webdir to ~/logon/lingo/answer/fftb/ (where I found the files you mentioned), and then I could treebank with fftb. I was also able to train a model. Thank you very much! Petter On Wed, May 13, 2020 at 3:20 PM Francis Bond wrote: > Hi, > > We successfully treebanked recently, using (and updating) the wiki page. > Is the webdir correct? It should have the files control.js, index.html > and render.js in it. We found it in ace-tools-x86.0.9.31/assets (but > not in 0.9.30). However 0.9.31 did not work for some reason, so we used > the grammar and fftb from 0.9.30 and the webdir from 0.9.31. They are > also included somewhere in the logon tree. > > I hope this helps. > > > > > > On Wed, May 13, 2020 at 9:09 PM Petter Haugereid > wrote: > >> Hi everybody, >> >> I have been trying over some days to make treebanking work with FFT. >> Following instructions on the DELPH-IN site, I have given the commands >> below, and I end up with a browser window with the items of the profile I >> attempt to treebank. However, when I click on one of the items, I get an >> error message "404 Not Found". Do any of you know what I am doing wrong? >> >> Here are the commands (with full paths): >> mkprof -s ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ /tmp/mrs-test >> art -f -a '~/tools/ace-0.9.30/ace --disable-generalization -g >> ~/tools/ace-0.9.30/norwegian-small.dat -O' /tmp/mrs-test >> ~/acetools-x86-0.9.30/fftb -g ~/tools/ace-0.9.30/norwegian-small.dat >> --browser --webdir ~/acetools /tmp/mrs-test/ >> >> I am quite keen to get a statistical model for my grammar, so I have >> tried to train a model from a small treebank which I have disambiguated >> with the logon tool. When I try to train with LOGON, only get a lot of >> garbage collection messages, and I eventually have to kill the process. 
>> When I try to train with FFT with the following commands, I get the >> messages below: >> >> mkprof -s ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ /tmp/mrs-test/ >> art -a '~/tools/ace-0.9.30/ace -g ~/tools/ace-0.9.30/norwegian-small.dat >> -O' -f /tmp/mrs-test/ >> FFGRANDPARENT=0 ~/acetools-x86-0.9.30/ffmaster 1 mrs-test-gp0.mem & >> FFGRANDPARENT=0 ~/acetools-x86-0.9.30/ffworker >> ~/tools/ace-0.9.30/norwegian-small.dat /tmp/mrs-test/ >> ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ localhost >> >> # loading /tmp/mrs-test/... >> # loading /home/petter/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/... >> # loading gold >> # ... iid 1 -- gold tree 1 / 1 not in parse forest >> # ... iid 2 -- gold tree 1 / 1 not in parse forest >> # ... iid 3 -- gold tree 1 / 1 not in parse forest >> ... >> # ... iid 68 -- gold tree 1 / 1 not in parse forest >> # ... iid 69 -- gold tree 1 / 1 not in parse forest >> # loaded 0 ambiguous feature forests with gold trees >> # [1]+ Exit 255 FFGRANDPARENT=0 >> ~/acetools-x86-0.9.30/ffmaster 1 mrs-test-gp0.mem >> # Floating point exception (core dumped) >> >> I tried the same commands with the ERG MRS treebank in LOGON, and I was >> able to train a model with it. I suspect the reason I don't succeed, is >> that I have treebanked with LOGON, while Dan has used FFT. >> >> Here are links to >> 1) the MRS treebank >> https://www.dropbox.com/s/7mj53j644vwhbes/mrs.2020.05.12.tgz?dl=0 >> 2) The Norwegian MRS items I have treebanked >> https://www.dropbox.com/s/qfhuqwnxlz0e1pb/mrs.txt?dl=0 >> 3) The Norsyg grammar (loading 'lkb/small-script' with the LKB, >> 'ace/config-small.tdl' with ACE is sufficient) >> https://www.dropbox.com/s/rmoy6q40dvz1dxh/norsyg.20-05-13.tgz?dl=0 >> 4) A compiled version of the grammar, compiled with ace-0.9.30 >> https://www.dropbox.com/s/cb0dq9omuhojlmv/norwegian-small.dat?dl=0 >> >> If someone can point me to what I am doing wrong, I would be very >> greatful! >> >> Best, >> >> Petter >> > > > -- > Francis Bond > Division of Linguistics and Multilingual Studies > Nanyang Technological University > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bond at ieee.org Thu May 14 03:38:12 2020 From: bond at ieee.org (Francis Bond) Date: Thu, 14 May 2020 09:38:12 +0800 Subject: [developers] Treebanking and training with FFT In-Reply-To: References: Message-ID: Great. I added a bit more to the documentation, just in case. On Thu, May 14, 2020 at 1:45 AM Petter Haugereid wrote: > Yes, it helped! > I changed the webdir to ~/logon/lingo/answer/fftb/ (where I found the > files you mentioned), and then I could treebank with fftb. I was also able > to train a model. > Thank you very much! > > Petter > > On Wed, May 13, 2020 at 3:20 PM Francis Bond wrote: > >> Hi, >> >> We successfully treebanked recently, using (and updating) the wiki page. >> Is the webdir correct? It should have the files control.js, index.html >> and render.js in it. We found it in ace-tools-x86.0.9.31/assets (but >> not in 0.9.30). However 0.9.31 did not work for some reason, so we used >> the grammar and fftb from 0.9.30 and the webdir from 0.9.31. They are >> also included somewhere in the logon tree. >> >> I hope this helps. >> >> >> >> >> >> On Wed, May 13, 2020 at 9:09 PM Petter Haugereid >> wrote: >> >>> Hi everybody, >>> >>> I have been trying over some days to make treebanking work with FFT. 
>>> Following instructions on the DELPH-IN site, I have given the commands >>> below, and I end up with a browser window with the items of the profile I >>> attempt to treebank. However, when I click on one of the items, I get an >>> error message "404 Not Found". Do any of you know what I am doing wrong? >>> >>> Here are the commands (with full paths): >>> mkprof -s ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ /tmp/mrs-test >>> art -f -a '~/tools/ace-0.9.30/ace --disable-generalization -g >>> ~/tools/ace-0.9.30/norwegian-small.dat -O' /tmp/mrs-test >>> ~/acetools-x86-0.9.30/fftb -g ~/tools/ace-0.9.30/norwegian-small.dat >>> --browser --webdir ~/acetools /tmp/mrs-test/ >>> >>> I am quite keen to get a statistical model for my grammar, so I have >>> tried to train a model from a small treebank which I have disambiguated >>> with the logon tool. When I try to train with LOGON, only get a lot of >>> garbage collection messages, and I eventually have to kill the process. >>> When I try to train with FFT with the following commands, I get the >>> messages below: >>> >>> mkprof -s ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ /tmp/mrs-test/ >>> art -a '~/tools/ace-0.9.30/ace -g ~/tools/ace-0.9.30/norwegian-small.dat >>> -O' -f /tmp/mrs-test/ >>> FFGRANDPARENT=0 ~/acetools-x86-0.9.30/ffmaster 1 mrs-test-gp0.mem & >>> FFGRANDPARENT=0 ~/acetools-x86-0.9.30/ffworker >>> ~/tools/ace-0.9.30/norwegian-small.dat /tmp/mrs-test/ >>> ~/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/ localhost >>> >>> # loading /tmp/mrs-test/... >>> # loading /home/petter/logon/lingo/lkb/src/tsdb/home/mrs.2020.05.12/... >>> # loading gold >>> # ... iid 1 -- gold tree 1 / 1 not in parse forest >>> # ... iid 2 -- gold tree 1 / 1 not in parse forest >>> # ... iid 3 -- gold tree 1 / 1 not in parse forest >>> ... >>> # ... iid 68 -- gold tree 1 / 1 not in parse forest >>> # ... iid 69 -- gold tree 1 / 1 not in parse forest >>> # loaded 0 ambiguous feature forests with gold trees >>> # [1]+ Exit 255 FFGRANDPARENT=0 >>> ~/acetools-x86-0.9.30/ffmaster 1 mrs-test-gp0.mem >>> # Floating point exception (core dumped) >>> >>> I tried the same commands with the ERG MRS treebank in LOGON, and I was >>> able to train a model with it. I suspect the reason I don't succeed, is >>> that I have treebanked with LOGON, while Dan has used FFT. >>> >>> Here are links to >>> 1) the MRS treebank >>> https://www.dropbox.com/s/7mj53j644vwhbes/mrs.2020.05.12.tgz?dl=0 >>> 2) The Norwegian MRS items I have treebanked >>> https://www.dropbox.com/s/qfhuqwnxlz0e1pb/mrs.txt?dl=0 >>> 3) The Norsyg grammar (loading 'lkb/small-script' with the LKB, >>> 'ace/config-small.tdl' with ACE is sufficient) >>> https://www.dropbox.com/s/rmoy6q40dvz1dxh/norsyg.20-05-13.tgz?dl=0 >>> 4) A compiled version of the grammar, compiled with ace-0.9.30 >>> https://www.dropbox.com/s/cb0dq9omuhojlmv/norwegian-small.dat?dl=0 >>> >>> If someone can point me to what I am doing wrong, I would be very >>> greatful! >>> >>> Best, >>> >>> Petter >>> >> >> >> -- >> Francis Bond >> Division of Linguistics and Multilingual Studies >> Nanyang Technological University >> > -- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From goodman.m.w at gmail.com Fri May 15 08:50:22 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Fri, 15 May 2020 14:50:22 +0800 Subject: [developers] compiling fftb In-Reply-To: References: <1E880046-6207-462E-8255-6B4780C26BC1@gmail.com> <0FC0A140-51CB-42E7-98F4-C7B3864BFCA0@sweaglesw.org> <3A085354-A45E-46D5-B57B-A8B8703B7276@gmail.com> <91487488-C19F-4903-BA57-8766A901F50E@sweaglesw.org> Message-ID: Hi Woodley, (I re-added the developers list on CC so they can see the fix) Moving -lm to after -Wl,-Bdynamic did the trick. Strangely, once I did that and `make all && make install` for libtsdb, the other errors went away. Not sure if they were related or something else changed on my system in the meantime. And as you said, this also fixed the compiling of FFTB for me. Cheers, On Thu, May 14, 2020 at 1:14 AM Woodley Packard wrote: > Hi Mike, > > The errors while compiling test.static and friends do appear to be related > to the stack overflow thread you found. Fortunately those binaries are not > required; you should be able to use the dynamic ones just fine. If you > want the ones that have the support libraries compiled in statically, I > recommend moving -lm from inside of the static link block in > TOOL_STATIC_LDFLAGS to after the -Wl,-Bdynamic. I've done that at my end > now; thanks for the report. The same should work for the LIBS setting for > FFTB's Makefile. > > The error you're seeing when running art most likely is a result of your > system's shared library search path not including /usr/local/lib/. Your > options would be to put libace.so somewhere your system expects to find it > or add that path. To do that latter, you can edit /etc/ld.so.conf or > /etc/ld.so.conf.d/, or use LD_LIBRARY_PATH. > > Let me know if that helps resolve the issues at your end. > > Thanks, > Woodley > > On May 12, 2020, at 12:55 AM, goodman.m.w at gmail.com wrote: > > Hi all, > > I'm getting similar errors to Alexandre. I successfully compiled and > installed liba, repp-0.2.2, and then ace, but I'm getting the error that it > cannot find I try "make all" for libtsdb. I noticed that tsdb.h is > provided by libtsdb, and `#include ` seems to look in my system > libraries. Changing all these to `#include "tsdb.h"` (thinking it might use > the file in the current directory) did not work, so I reverted those > changes and ran the following: > > make libtsdb.a # required for 'make install' > make libtsdb.so # required for 'make install' > make install # copies the above 2 things plus tsdb.h to > /usr/local/lib/ > > Then I tried running "make all" again and now I see this: > > [...] > test.c: In function ?main?: > test.c:599:2: warning: implicit declaration of function > ?ace_load_grammar? [-Wimplicit-function-declaration] > 599 | > ace_load_grammar("/home/sweaglesw/cdev/ace-regression/comparison.dat"); > | ^~~~~~~~~~~~~~~~ > /usr/bin/ld: /usr/lib/x86_64-linux-gnu/libm-2.31.a(e_exp.o): in > function `__ieee754_exp_ifunc': > (.text+0x246): undefined reference to `_dl_x86_cpu_features' > /usr/bin/ld: /usr/lib/x86_64-linux-gnu/libm-2.31.a(e_log.o): in > function `__ieee754_log_ifunc': > (.text+0x2c6): undefined reference to `_dl_x86_cpu_features' > collect2: error: ld returned 1 exit status > make: *** [Makefile:45: test.static] Error 1 > > It seems there's some incompatibility in glibc versions. This SO question > seems relevant: https://stackoverflow.com/q/56415996/1441112 ; maybe it's > a static vs. dynamic linking issue? 
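For the libace.so loading error quoted below, the two options Woodley describes above look roughly like this in practice (the file name under /etc/ld.so.conf.d/ is arbitrary):

  # option 1: register /usr/local/lib with the dynamic linker permanently
  echo /usr/local/lib | sudo tee /etc/ld.so.conf.d/local.conf
  sudo ldconfig
  # option 2: extend the search path for the current shell session only
  export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH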
Other than test.static, I was able to > make other targets, such as art and mkprof, but I see errors when I try to > run them: > > $ ./art -h > ./art: error while loading shared libraries: libace.so: cannot open > shared object file: No such file or directory > > But I have libace.so at /usr/local/lib/libace.so, so I'm not sure what > went wrong here. My end goal is to compile FFTB, and if I carry on with the > current setup I see the same errors as when compiling test.static when I do > "make fftb" for the FFTB source code. Does anybody know how to get around > these issues? > > Some context: > * For compiling ACE I copied itsdb_libraries.tgz as described here: > http://moin.delph-in.net/AceInstall#Missing_itsdb.h > * I'm running Pop!_OS 20.04 (similar to Ubuntu), with glibc version 2.31 > > > On Fri, Jul 19, 2019 at 10:38 PM Woodley Packard > wrote: > >> It looks like you are trying to compile the "liba" dependency. MacOS >> does shared libraries quite differently from Linux. it will probably be >> easiest to do it as a static library; try "make liba.a"? >> >> -Woodley >> >> >> > On Jul 19, 2019, at 6:02 AM, Alexandre Rademaker >> wrote: >> > >> > >> > Hi Woodley, >> > >> > Once I follow the proper order for compile the dependencies (liba, >> libace, libtsdb, fftb), I got everything to work at Linux. But no success o >> Mac OS yet!! :-( >> > >> > Any direction? >> > >> > I found that gcc-9 is the gcc installed from brew >> > >> > $ gcc-9 --version >> > gcc-9 (Homebrew GCC 9.1.0) 9.1.0 >> > Copyright (C) 2019 Free Software Foundation, Inc. >> > This is free software; see the source for copying conditions. There is >> NO >> > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR >> PURPOSE. >> > >> > >> > These where my changes in the Makefile but I could not compile. >> > >> > $ svn diff Makefile >> > Index: Makefile >> > =================================================================== >> > --- Makefile (revision 40) >> > +++ Makefile (working copy) >> > @@ -1,6 +1,6 @@ >> > -HDRS=net.h timer.h http.h web.h sql.h server.h aisle-rpc.h asta-rpc.h >> background.h daemon.h aside-rpc.h escape.h >> > -OBJS=net.o timer.o http.o web.o sql.o server.o aisle-rpc.o asta-rpc.o >> background.o daemon.o aside-rpc.o escape.o >> > -CC=gcc >> > +HDRS=net.h timer.h http.h web.h server.h aisle-rpc.h asta-rpc.h >> background.h daemon.h aside-rpc.h escape.h >> > +OBJS=net.o timer.o http.o web.o server.o aisle-rpc.o asta-rpc.o >> background.o daemon.o aside-rpc.o escape.o >> > +CC=gcc-9 >> > CFLAGS=-g -O -shared -fPIC -pthread >> > #CFLAGS=-g -pg -O -shared -fPIC -pthread >> > >> > @@ -16,13 +16,13 @@ >> > cp liba.h /usr/local/include/ >> > >> > tests: ${OBJS} liba.h >> > - gcc -g -isystem . test.c ${OBJS} -lpq -lpthread -o test >> > + ${CC} -g -isystem . 
test.c ${OBJS} -lpthread -o test >> > >> > shared-tests: >> > - gcc -g test.c -la -o test >> > + ${CC} -g test.c -la -o test >> > >> > liba.so: ${OBJS} liba.h Makefile >> > - ld -shared ${OBJS} -o liba.so -lpq -lpthread >> > + ld ${OBJS} -o liba.so -lpthread >> > >> > >> > The error is: >> > >> > $ make >> > ld net.o timer.o http.o web.o server.o aisle-rpc.o asta-rpc.o >> background.o daemon.o aside-rpc.o escape.o -o liba.so -lpthread >> > ld: warning: No version-min specified on command line >> > Undefined symbols for architecture x86_64: >> > "_main", referenced from: >> > implicit entry/start for main executable >> > ld: symbol(s) not found for inferred architecture x86_64 >> > make: *** [liba.so] Error 1 >> > >> > >> > >> > Best, >> > >> > -- >> > Alexandre Rademaker >> > http://arademaker.github.io >> > >> > >> > >> >> >> > > -- > -Michael Wayne Goodman > > > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From J.A.Carroll at sussex.ac.uk Thu May 21 23:38:57 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Thu, 21 May 2020 21:38:57 +0000 Subject: [developers] LKB-FOS now includes [incr tsdb()] Message-ID: Hi all, I've just released a new version of LKB-FOS. The main change is that the Linux version includes all of the non-LOGON parts of [incr tsdb()]. The podium runs, and I believe that all of its menu commands are working correctly. I've created a foreign function interface in SBCL for the BDB C program, so training maxent models also works. Anything that's at all CPU-intensive runs a lot quicker than in the LOGON run-time binary. For macOS, I haven't made a serious attempt at recompiling the core [incr tsdb()] C programs (tsdb, swish++), so there's not much of it that works - the main useful exception being reading and applying maxent models (e.g. as described at the end of http://moin.delph-in.net/LkbGeneration). No LOGON-specific functionality is available (i.e. source code enabled by the :logon feature), which means that PVM, WWW demo, SVMs and language models, external MT system interfaces etc are missing. If anyone particularly wants one of these features in LKB-FOS, it should be possible now there's a solid foundation to start from. BTW, below is a relevant posting to the developers list by Stephan in 2006. The previous posting in that thread was over-optimistic: a number of issues (which I won't bore this list with) made the port to SBCL harder than one might have expected. Anyway, I'm pleased to have made progress on this issue 14 years on! All the best, John PS The new LKB-FOS contains many other improvements - please see the README. Download link at http://moin.delph-in.net/LkbFos > http://lists.delph-in.net/archives/developers/2006/000632.html > > [developers] SBCL port > Stephan Oepen oe at csli.Stanford.EDU > Mon Oct 30 11:23:05 CET 2006 > > howdy, > > > But I expect a port would not be too difficult to achieve for either > > of these systems. Stephan, what do you think? > > [incr tsdb()] makes fairly central use of foreign functions, which are > non-standard. also, the [incr tsdb()] GUI depends on threads, which in > SBCL are just barely available (in a way different from the traditional > MP package), and only for Linux on x86 and AMD64 currently. i have no > current plans to port [incr tsdb()] to other Lisps, and personally i am > not too keen on getting other developers involved in that right now. 
i > would want to review patches to [incr tsdb()] code so as to make sure i > can maintain its overall design. these days i am afraid i have no time > for such activity. > > the LOGON MT architecture is an extension to [incr tsdb()], i.e it has > inherited the same constraints on cross-platform portability. however, > we are about to release a complete run-time edition of LOGON, such that > people will be able to get full functionality without their own license > for Allegro CL. > > more high-level, SBCL does look like a Lisp going the right direction. > but before it makes sense for us to make the coordinated effort towards > supporting the breadth of DELPH-IN software on a new Lisp, we should be > sure of our minimum requirements. the following come to my mind: > > (1) stable, efficient, actively maintained ANSI CL implementation > (2) UniCode strings, including full external format support > (3) cross-platform availability > (4) multi-processing, preferably with Lisp control of scheduler > (5) foreign function interface > (6) high-level OS interface: run-shell-command(), sockets, et al. > > SBCL appears to have all of the above but (4). i know CMU-CL used to > include the traditional MP package, but i have no idea about the other > desiderata there. > > best - oe > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > +++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125 > +++ CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515 > +++ --- oe at csli.stanford.edu; oe at ifi.uio.no; stephan at oepen.net --- > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ From ned at nedned.net Fri May 22 03:57:00 2020 From: ned at nedned.net (Ned Letcher) Date: Fri, 22 May 2020 11:57:00 +1000 Subject: [developers] Searching treebanks In-Reply-To: References: <5fca14bec6bb4ab9bdec8793a31f092b@ntnu.no> Message-ID: Heya Francis, I surveyed syntactic querying tools for treebank search in my thesis. During development of Typediff , I needed to embed an interactive querying interface for DELPHIN treebanks, and came to the conclusion that Fangorn was the best tool for the job. Sadly there is not a live version of Typediff live currently. Fangorn itself wasn't too hard to get running I found, and as part of Typediff I created a tool for converting DELPHIN treebanks into the format that Fangorn expects, which you might be able to use. I have been hoping to get a version of Typediff up and running somewhere but it's not something I've been able to prioritise. If I do, I will be sure to let you know :) Cheers, Ned On Thu, 27 Feb 2020 at 01:09, Emily M. Bender wrote: > For search over semantic representations (MRS, DM, EDS) there's WeSearch: > > http://wesearch.delph-in.net/ > > ... which indexes DeepBank and WikiWoods. > > Emily > > On Wed, Feb 26, 2020 at 5:29 AM Francis Bond wrote: > >> Thanks for the tip. If only we all sensibly annotated our corpora with >> typecraft. >> >> On Wed, Feb 26, 2020 at 9:21 PM Lars Hellan wrote: >> >>> Hi Francis, >>> >>> For Norwegian you can do such things through >>> https://typecraft.org/tc2wiki/Norwegian_Valency_Corpus, a corpus of >>> about 20,000 sentences. >>> >>> >>> (Not right on your mark, but perhaps not too far from the sphere of >>> "anything" ...) 
>>> >>> >>> Best >>> >>> Lars >>> ------------------------------ >>> *From:* developers-bounces at emmtee.net >>> on behalf of Francis Bond >>> *Sent:* Wednesday, February 26, 2020 2:02:28 PM >>> *To:* Stephan Oepen; developers at delph-in.net; Rebecca Dridan; Timothy >>> Baldwin >>> *Subject:* [developers] Searching treebanks >>> >>> G'day, >>> >>> does anyone know of any way to search Redwoods (or DELPHIN treebanks in >>> general) for trees of a certain type (using something like the Fangorn >>> interface). For example, I want to find how often in the treebank 'start' >>> is intransitive vs NP V VP-ving vs NP V VP-to vs NP V VP NP (I start; I >>> start lecturing; I start to lecture; I start a lecture). >>> >>> In fangorn this was "//VP/VB/start[->S/VP/VBG" for NP V VP-ving, ... >>> >>> I would be ecstatic if there were an online search I can point my >>> students at, but would be interested in anything. >>> >>> >>> >>> -- >>> Francis Bond >>> Division of Linguistics and Multilingual Studies >>> Nanyang Technological University >>> >> >> >> -- >> Francis Bond >> Division of Linguistics and Multilingual Studies >> Nanyang Technological University >> > > > -- > Emily M. Bender (she/her) > Howard and Frances Nostrand Endowed Professor > Department of Linguistics > Faculty Director, CLMS > University of Washington > Twitter: @emilymbender > -- nedned.net -------------- next part -------------- An HTML attachment was scrubbed... URL: From bond at ieee.org Mon May 25 14:37:34 2020 From: bond at ieee.org (Francis Bond) Date: Mon, 25 May 2020 20:37:34 +0800 Subject: [developers] Searching treebanks In-Reply-To: References: <5fca14bec6bb4ab9bdec8793a31f092b@ntnu.no> Message-ID: Thank you! I will try to set up Fangorn then. On Fri, May 22, 2020 at 9:57 AM Ned Letcher wrote: > Heya Francis, > > I surveyed syntactic querying tools for treebank search in my thesis. > During development of Typediff , I > needed to embed an interactive querying interface for DELPHIN treebanks, > and came to the conclusion that Fangorn was the best tool for the job. > Sadly there is not a live version of Typediff live currently. > > Fangorn itself wasn't too hard to get > running I found, and as part of Typediff I created a tool > > for converting DELPHIN treebanks into the format that Fangorn expects, > which you might be able to use. > > I have been hoping to get a version of Typediff up and running somewhere > but it's not something I've been able to prioritise. If I do, I will be > sure to let you know :) > > Cheers, > Ned > > On Thu, 27 Feb 2020 at 01:09, Emily M. Bender wrote: > >> For search over semantic representations (MRS, DM, EDS) there's WeSearch: >> >> http://wesearch.delph-in.net/ >> >> ... which indexes DeepBank and WikiWoods. >> >> Emily >> >> On Wed, Feb 26, 2020 at 5:29 AM Francis Bond wrote: >> >>> Thanks for the tip. If only we all sensibly annotated our corpora >>> with typecraft. >>> >>> On Wed, Feb 26, 2020 at 9:21 PM Lars Hellan wrote: >>> >>>> Hi Francis, >>>> >>>> For Norwegian you can do such things through >>>> https://typecraft.org/tc2wiki/Norwegian_Valency_Corpus, a corpus of >>>> about 20,000 sentences. >>>> >>>> >>>> (Not right on your mark, but perhaps not too far from the sphere of >>>> "anything" ...) 
>>>> >>>> >>>> Best >>>> >>>> Lars >>>> ------------------------------ >>>> *From:* developers-bounces at emmtee.net >>>> on behalf of Francis Bond >>>> *Sent:* Wednesday, February 26, 2020 2:02:28 PM >>>> *To:* Stephan Oepen; developers at delph-in.net; Rebecca Dridan; Timothy >>>> Baldwin >>>> *Subject:* [developers] Searching treebanks >>>> >>>> G'day, >>>> >>>> does anyone know of any way to search Redwoods (or DELPHIN treebanks in >>>> general) for trees of a certain type (using something like the Fangorn >>>> interface). For example, I want to find how often in the treebank 'start' >>>> is intransitive vs NP V VP-ving vs NP V VP-to vs NP V VP NP (I start; I >>>> start lecturing; I start to lecture; I start a lecture). >>>> >>>> In fangorn this was "//VP/VB/start[->S/VP/VBG" for NP V VP-ving, ... >>>> >>>> I would be ecstatic if there were an online search I can point my >>>> students at, but would be interested in anything. >>>> >>>> >>>> >>>> -- >>>> Francis Bond >>>> Division of Linguistics and Multilingual Studies >>>> Nanyang Technological University >>>> >>> >>> >>> -- >>> Francis Bond >>> Division of Linguistics and Multilingual Studies >>> Nanyang Technological University >>> >> >> >> -- >> Emily M. Bender (she/her) >> Howard and Frances Nostrand Endowed Professor >> Department of Linguistics >> Faculty Director, CLMS >> University of Washington >> Twitter: @emilymbender >> > > > -- > nedned.net > -- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From bond at ieee.org Mon Jun 1 06:35:57 2020 From: bond at ieee.org (Francis Bond) Date: Mon, 1 Jun 2020 12:35:57 +0800 Subject: [developers] LKB-FOS now includes [incr tsdb()] In-Reply-To: References: Message-ID: Hi, to get it working on ubuntu 18.04.4, I had to make some libraries visible: bdb.so, libdb-4.2.so, libtermcap.so.2 I did this with: export LD_LIBRARY_PATH=/home/bond/delphin/lkb_fos.2020/src/tsdb/linux.x86.64:/home/bond/delphin/lkb_fos.2020/lib/linux.x86.64 I also had to link libtermcap.so.2 -> /lib/x86_64-linux-gnu/libncurses.so.5.9 Maybe we could have a symbolic link from src/tsdb/linux.x86.64/bdb.so to lib/linux.x86.64/bdb.so so that we only have to point to a single directory? Should I make an installation section in LkbFos and add the notes there? On Fri, May 22, 2020 at 5:39 AM John Carroll wrote: > Hi all, > > I've just released a new version of LKB-FOS. The main change is that the > Linux version includes all of the non-LOGON parts of [incr tsdb()]. The > podium runs, and I believe that all of its menu commands are working > correctly. I've created a foreign function interface in SBCL for the BDB C > program, so training maxent models also works. Anything that's at all > CPU-intensive runs a lot quicker than in the LOGON run-time binary. > > For macOS, I haven't made a serious attempt at recompiling the core [incr > tsdb()] C programs (tsdb, swish++), so there's not much of it that works - > the main useful exception being reading and applying maxent models (e.g. as > described at the end of http://moin.delph-in.net/LkbGeneration). > > No LOGON-specific functionality is available (i.e. source code enabled by > the :logon feature), which means that PVM, WWW demo, SVMs and language > models, external MT system interfaces etc are missing. If anyone > particularly wants one of these features in LKB-FOS, it should be possible > now there's a solid foundation to start from. 
> > BTW, below is a relevant posting to the developers list by Stephan in > 2006. The previous posting in that thread was over-optimistic: a number of > issues (which I won't bore this list with) made the port to SBCL harder > than one might have expected. Anyway, I'm pleased to have made progress on > this issue 14 years on! > > All the best, > > John > > PS The new LKB-FOS contains many other improvements - please see the > README. Download link at http://moin.delph-in.net/LkbFos > > > > http://lists.delph-in.net/archives/developers/2006/000632.html > > > > [developers] SBCL port > > Stephan Oepen oe at csli.Stanford.EDU > > Mon Oct 30 11:23:05 CET 2006 > > > > howdy, > > > > > But I expect a port would not be too difficult to achieve for either > > > of these systems. Stephan, what do you think? > > > > [incr tsdb()] makes fairly central use of foreign functions, which are > > non-standard. also, the [incr tsdb()] GUI depends on threads, which in > > SBCL are just barely available (in a way different from the traditional > > MP package), and only for Linux on x86 and AMD64 currently. i have no > > current plans to port [incr tsdb()] to other Lisps, and personally i am > > not too keen on getting other developers involved in that right now. i > > would want to review patches to [incr tsdb()] code so as to make sure i > > can maintain its overall design. these days i am afraid i have no time > > for such activity. > > > > the LOGON MT architecture is an extension to [incr tsdb()], i.e it has > > inherited the same constraints on cross-platform portability. however, > > we are about to release a complete run-time edition of LOGON, such that > > people will be able to get full functionality without their own license > > for Allegro CL. > > > > more high-level, SBCL does look like a Lisp going the right direction. > > but before it makes sense for us to make the coordinated effort towards > > supporting the breadth of DELPH-IN software on a new Lisp, we should be > > sure of our minimum requirements. the following come to my mind: > > > > (1) stable, efficient, actively maintained ANSI CL implementation > > (2) UniCode strings, including full external format support > > (3) cross-platform availability > > (4) multi-processing, preferably with Lisp control of scheduler > > (5) foreign function interface > > (6) high-level OS interface: run-shell-command(), sockets, et al. > > > > SBCL appears to have all of the above but (4). i know CMU-CL used to > > include the traditional MP package, but i have no idea about the other > > desiderata there. > > > > best - oe > > > > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > +++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) > 2284 0125 > > +++ CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 > 0515 > > +++ --- oe at csli.stanford.edu; oe at ifi.uio.no; stephan at > oepen.net --- > > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > -- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From J.A.Carroll at sussex.ac.uk Mon Jun 1 12:04:01 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Mon, 1 Jun 2020 10:04:01 +0000 Subject: [developers] LKB-FOS now includes [incr tsdb()] In-Reply-To: References: Message-ID: Hi Francis, Thanks for the report and suggestions about LD_LIBARY_PATH. 
I put a hint about what to do in the README, but it's buried in dense text so not easy to find ("LD_LIBARY_PATH must include /lib/linux.x86.64"). It's a good ides to have a couple of sentences about this in LkbFos. Here's what LKB-FOS does on startup: * it finds the absolute path to the lkb_fos directory, and from that, it constructs a path to where it thinks bdb.so should be (e.g. on my system this is /home/ubuntu/Documents/delphin/lkb_fos/src/tsdb/linux.x86.64/bdb.so) * it attempts to load bdb.so from this path as a shared library * if there's an error in this load, it's either because the user has moved bdb.so, or libdb-4.2.so (which bdb.so depends on) can't be found * libdb-4.2.so will be pulled in if LD_LIBARY_PATH points to lkb_fos/lib/linux.x86.64 Later in a session, if you start up the [incr tsdb()] podium it tries to load libtermcap.so.2; on my system this is in a standard Linux shared library directory so is picked up fine. But evidently this isn't the case for everyone. So to avoid installation fuss, I was just going to provide it in lkb_fos/lib/linux.x86.64 - the file is very small. LOGON also takes this approach. I'll do this in the next LKB-FOS release. If DELPHINHOME is set as recommended, then I think the following should be sufficient: export LD_LIBRARY_PATH=$DELPHINHOME/lkb_fos/lib/linux.x86.64:$LD_LIBRARY_PATH And for the moment, some users will also need to execute: ln -s /libncurses.so.5.7 .../lkb_fos/lib/linux.x86.64/libtermcap.so.2 Does this look reasonable? If so I'll update LkbFos. John On 1 Jun 2020, at 05:35, Francis Bond > wrote: Hi, to get it working on ubuntu 18.04.4, I had to make some libraries visible: bdb.so, libdb-4.2.so, libtermcap.so.2 I did this with: export LD_LIBRARY_PATH=/home/bond/delphin/lkb_fos.2020/src/tsdb/linux.x86.64:/home/bond/delphin/lkb_fos.2020/lib/linux.x86.64 I also had to link libtermcap.so.2 -> /lib/x86_64-linux-gnu/libncurses.so.5.9 Maybe we could have a symbolic link from src/tsdb/linux.x86.64/bdb.so to lib/linux.x86.64/bdb.so so that we only have to point to a single directory? Should I make an installation section in LkbFos and add the notes there? On Fri, May 22, 2020 at 5:39 AM John Carroll > wrote: Hi all, I've just released a new version of LKB-FOS. The main change is that the Linux version includes all of the non-LOGON parts of [incr tsdb()]. The podium runs, and I believe that all of its menu commands are working correctly. I've created a foreign function interface in SBCL for the BDB C program, so training maxent models also works. Anything that's at all CPU-intensive runs a lot quicker than in the LOGON run-time binary. For macOS, I haven't made a serious attempt at recompiling the core [incr tsdb()] C programs (tsdb, swish++), so there's not much of it that works - the main useful exception being reading and applying maxent models (e.g. as described at the end of http://moin.delph-in.net/LkbGeneration). No LOGON-specific functionality is available (i.e. source code enabled by the :logon feature), which means that PVM, WWW demo, SVMs and language models, external MT system interfaces etc are missing. If anyone particularly wants one of these features in LKB-FOS, it should be possible now there's a solid foundation to start from. BTW, below is a relevant posting to the developers list by Stephan in 2006. The previous posting in that thread was over-optimistic: a number of issues (which I won't bore this list with) made the port to SBCL harder than one might have expected. 
Anyway, I'm pleased to have made progress on this issue 14 years on! All the best, John PS The new LKB-FOS contains many other improvements - please see the README. Download link at http://moin.delph-in.net/LkbFos > http://lists.delph-in.net/archives/developers/2006/000632.html > > [developers] SBCL port > Stephan Oepen oe at csli.Stanford.EDU > Mon Oct 30 11:23:05 CET 2006 > > howdy, > > > But I expect a port would not be too difficult to achieve for either > > of these systems. Stephan, what do you think? > > [incr tsdb()] makes fairly central use of foreign functions, which are > non-standard. also, the [incr tsdb()] GUI depends on threads, which in > SBCL are just barely available (in a way different from the traditional > MP package), and only for Linux on x86 and AMD64 currently. i have no > current plans to port [incr tsdb()] to other Lisps, and personally i am > not too keen on getting other developers involved in that right now. i > would want to review patches to [incr tsdb()] code so as to make sure i > can maintain its overall design. these days i am afraid i have no time > for such activity. > > the LOGON MT architecture is an extension to [incr tsdb()], i.e it has > inherited the same constraints on cross-platform portability. however, > we are about to release a complete run-time edition of LOGON, such that > people will be able to get full functionality without their own license > for Allegro CL. > > more high-level, SBCL does look like a Lisp going the right direction. > but before it makes sense for us to make the coordinated effort towards > supporting the breadth of DELPH-IN software on a new Lisp, we should be > sure of our minimum requirements. the following come to my mind: > > (1) stable, efficient, actively maintained ANSI CL implementation > (2) UniCode strings, including full external format support > (3) cross-platform availability > (4) multi-processing, preferably with Lisp control of scheduler > (5) foreign function interface > (6) high-level OS interface: run-shell-command(), sockets, et al. > > SBCL appears to have all of the above but (4). i know CMU-CL used to > include the traditional MP package, but i have no idea about the other > desiderata there. > > best - oe > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > +++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125 > +++ CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515 > +++ --- oe at csli.stanford.edu; oe at ifi.uio.no; stephan at oepen.net --- > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ -- Francis Bond > Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From bond at ieee.org Mon Jun 1 13:28:32 2020 From: bond at ieee.org (Francis Bond) Date: Mon, 1 Jun 2020 19:28:32 +0800 Subject: [developers] LKB-FOS now includes [incr tsdb()] In-Reply-To: References: Message-ID: That looks very reasonable! Thanks. On Mon, Jun 1, 2020 at 6:04 PM John Carroll wrote: > Hi Francis, > > Thanks for the report and suggestions about LD_LIBARY_PATH. I put a hint > about what to do in the README, but it's buried in dense text so not easy > to find ("LD_LIBARY_PATH must include /lib/linux.x86.64"). > It's a good ides to have a couple of sentences about this in LkbFos. 
> > Here's what LKB-FOS does on startup: > > * it finds the absolute path to the lkb_fos directory, and from that, it > constructs a path to where it thinks bdb.so should be (e.g. on my system > this is /home/ubuntu/Documents/delphin/lkb_fos/src/tsdb/linux.x86.64/bdb.so) > > * it attempts to load bdb.so from this path as a shared library > > * if there's an error in this load, it's either because the user has moved > bdb.so, or libdb-4.2.so (which bdb.so depends on) can't be found > > * libdb-4.2.so will be pulled in if LD_LIBARY_PATH points to > lkb_fos/lib/linux.x86.64 > > > Later in a session, if you start up the [incr tsdb()] podium it tries to > load libtermcap.so.2; on my system this is in a standard Linux shared > library directory so is picked up fine. But evidently this isn't the case > for everyone. So to avoid installation fuss, I was just going to provide it > in lkb_fos/lib/linux.x86.64 - the file is very small. LOGON also takes this > approach. I'll do this in the next LKB-FOS release. > > If DELPHINHOME is set as recommended, then I think the following should be > sufficient: > > export > LD_LIBRARY_PATH=$DELPHINHOME/lkb_fos/lib/linux.x86.64:$LD_LIBRARY_PATH > > And for the moment, some users will also need to execute: > > ln -s directory>/libncurses.so.5.7 .../lkb_fos/lib/linux.x86.64/libtermcap.so.2 > > > Does this look reasonable? If so I'll update LkbFos. > > John > > > On 1 Jun 2020, at 05:35, Francis Bond wrote: > > Hi, > > to get it working on ubuntu 18.04.4, I had to make some libraries visible: > bdb.so, libdb-4.2.so, libtermcap.so.2 > > I did this with: > export > LD_LIBRARY_PATH=/home/bond/delphin/lkb_fos.2020/src/tsdb/linux.x86.64:/home/bond/delphin/lkb_fos.2020/lib/linux.x86.64 > > I also had to link libtermcap.so.2 -> > /lib/x86_64-linux-gnu/libncurses.so.5.9 > > Maybe we could have a symbolic link from src/tsdb/linux.x86.64/bdb.so > to lib/linux.x86.64/bdb.so > so that we only have to point to a single directory? > > Should I make an installation section in LkbFos and add the notes there? > > > On Fri, May 22, 2020 at 5:39 AM John Carroll > wrote: > >> Hi all, >> >> I've just released a new version of LKB-FOS. The main change is that the >> Linux version includes all of the non-LOGON parts of [incr tsdb()]. The >> podium runs, and I believe that all of its menu commands are working >> correctly. I've created a foreign function interface in SBCL for the BDB C >> program, so training maxent models also works. Anything that's at all >> CPU-intensive runs a lot quicker than in the LOGON run-time binary. >> >> For macOS, I haven't made a serious attempt at recompiling the core [incr >> tsdb()] C programs (tsdb, swish++), so there's not much of it that works - >> the main useful exception being reading and applying maxent models (e.g. as >> described at the end of http://moin.delph-in.net/LkbGeneration). >> >> No LOGON-specific functionality is available (i.e. source code enabled by >> the :logon feature), which means that PVM, WWW demo, SVMs and language >> models, external MT system interfaces etc are missing. If anyone >> particularly wants one of these features in LKB-FOS, it should be possible >> now there's a solid foundation to start from. >> >> BTW, below is a relevant posting to the developers list by Stephan in >> 2006. The previous posting in that thread was over-optimistic: a number of >> issues (which I won't bore this list with) made the port to SBCL harder >> than one might have expected. 
Anyway, I'm pleased to have made progress on >> this issue 14 years on! >> >> All the best, >> >> John >> >> PS The new LKB-FOS contains many other improvements - please see the >> README. Download link at http://moin.delph-in.net/LkbFos >> >> >> > http://lists.delph-in.net/archives/developers/2006/000632.html >> > >> > [developers] SBCL port >> > Stephan Oepen oe at csli.Stanford.EDU >> > Mon Oct 30 11:23:05 CET 2006 >> > >> > howdy, >> > >> > > But I expect a port would not be too difficult to achieve for either >> > > of these systems. Stephan, what do you think? >> > >> > [incr tsdb()] makes fairly central use of foreign functions, which are >> > non-standard. also, the [incr tsdb()] GUI depends on threads, which in >> > SBCL are just barely available (in a way different from the traditional >> > MP package), and only for Linux on x86 and AMD64 currently. i have no >> > current plans to port [incr tsdb()] to other Lisps, and personally i am >> > not too keen on getting other developers involved in that right now. i >> > would want to review patches to [incr tsdb()] code so as to make sure i >> > can maintain its overall design. these days i am afraid i have no time >> > for such activity. >> > >> > the LOGON MT architecture is an extension to [incr tsdb()], i.e it has >> > inherited the same constraints on cross-platform portability. however, >> > we are about to release a complete run-time edition of LOGON, such that >> > people will be able to get full functionality without their own license >> > for Allegro CL. >> > >> > more high-level, SBCL does look like a Lisp going the right direction. >> > but before it makes sense for us to make the coordinated effort towards >> > supporting the breadth of DELPH-IN software on a new Lisp, we should be >> > sure of our minimum requirements. the following come to my mind: >> > >> > (1) stable, efficient, actively maintained ANSI CL implementation >> > (2) UniCode strings, including full external format support >> > (3) cross-platform availability >> > (4) multi-processing, preferably with Lisp control of scheduler >> > (5) foreign function interface >> > (6) high-level OS interface: run-shell-command(), sockets, et al. >> > >> > SBCL appears to have all of the above but (4). i know CMU-CL used to >> > include the traditional MP package, but i have no idea about the other >> > desiderata there. >> > >> > best - oe >> > >> > >> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> > +++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) >> 2284 0125 >> > +++ CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 >> 0515 >> > +++ --- oe at csli.stanford.edu; oe at ifi.uio.no; stephan at >> oepen.net --- >> > >> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> >> >> > > -- > Francis Bond > Division of Linguistics and Multilingual Studies > Nanyang Technological University > > > -- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From J.A.Carroll at sussex.ac.uk Thu Jun 4 19:09:48 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Thu, 4 Jun 2020 17:09:48 +0000 Subject: [developers] Questions about chart mapping Message-ID: <4E0FC400-B1E5-4FB5-8DB4-5DDCED53A0C4@sussex.ac.uk> Hi developers, I've started to look at chart mapping and how it might be implemented. 
I've been reading the following: 'Tutorial - Chart Mapping in PET' at DELPH-IN Summit 2009 http://www.delph-in.net/2009/cm.pdf LREC 2008 paper http://www.lrec-conf.org/proceedings/lrec2008/pdf/349_paper.pdf I've also been checking my understanding of the formalism by looking at the token mapping rules in the ERG 2018 directory tmr/. I have a few questions below which I've tried to contextualise with respect to the tutorial slides. I hope an expert can answer them. > Copying Information > > * reentrancies can be used to copy information from INPUT to OUTPUT Presumably reentrancies can also be used to copy information from CONTEXT to OUTPUT? > Chart Mapping Procedure > > * a rule match is completed if all CONTEXT and INPUT arguments are bound What happens if there are several ways of matching chart edges to CONTEXT in a rule? Is the rule applied repeatedly, once for each alternative match? Or is only one of the alternative matches considered? This could matter if feature values or regular expression captures are copied from the context to the output. > * each rule is applied until its fixpoint is reached If I've understood the formalism correctly, I can imagine a rule that doesn't ever reach a fixpoint for some inputs (e.g. a rule in which the input and output unify, with the output building structure). Is the intended interpretation the following: a rule is never applied more than once to the same combination of input and context edges? And it's up to the grammarian to avoid writing infinitely looping rules? If this is the correct interpretation, then I'm puzzled by a few rules in the ERG: bridge_tmr in tmr/bridge.tdl, and the four rules default_(ld|lb|rd|rb)_tmr in tmr/gml.tdl. Their inputs seem to unify with their outputs, so surely each would apply in an infinite loop (i.e. an input edge would match and be replaced with a new output edge, and since this new edge had not previously been used as an input the rule would pick this up and apply again, etc etc)? Aside from the fixpoint issue, I'm not sure I understand the purpose of the rules default_(ld|lb|rd|rb)_tmr. At first glance they seem to merely replace their input. Is their purpose to remove all features that are not specified on the input side? I'm also puzzled by the following comment on bridge_tmr: > ;; ... here, we take advantage of redundancy detection built into > ;; token mapping, i.e. even though the rule is written as if it could apply any > ;; number of times per cell, there shall not be duplicates in the token chart. What enforces the restriction that "there shall not be duplicates in the token chart"? I can't see any mention of redundancy detection or of this restriction in the paper or tutorial slides. Is the restriction somehow enforced by the fixpoint condition? Thanks in advance for clarification on these points. John From oe at ifi.uio.no Thu Jun 4 19:36:07 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Thu, 4 Jun 2020 19:36:07 +0200 Subject: [developers] Questions about chart mapping In-Reply-To: <4E0FC400-B1E5-4FB5-8DB4-5DDCED53A0C4@sussex.ac.uk> References: <4E0FC400-B1E5-4FB5-8DB4-5DDCED53A0C4@sussex.ac.uk> Message-ID: hi john, peter and i originally designed the formalism, and undoubtedly there are finer points not in the paper or slides. PET can output detailed tracing information (look for the ?erg? shell alias in $LOGONROOT/dot.bashrc, which i suspect may be helpful. from memory, ?redundancy? 
detection means that new chart items are discarded if an equivalent item exists, and processing that role stopped for that position. i used to try and write rules that could not feed indefinitely on their own OUTPUT, but in at least some cases i allowed myself to take advantage of the redundancy check. regarding non-determinism in matching a rule LHS, from memory i would expect that all possibilities are explored. yes, copying into the rule RHS is certainly not limited to INPUT matches. if you were game, maybe we should video-conference at some point to go through other subtleties (that you are bound to uncover :-)? i would be thrilled if the LKB were to acquire an implementation of chart mapping (which i believe would also have several prospective use cases in generation)! best wishes, oe tor. 4. jun. 2020 kl. 19:11 skrev John Carroll : > Hi developers, > > I've started to look at chart mapping and how it might be implemented. > I've been reading the following: > > 'Tutorial - Chart Mapping in PET' at DELPH-IN Summit 2009 > http://www.delph-in.net/2009/cm.pdf > LREC 2008 paper > http://www.lrec-conf.org/proceedings/lrec2008/pdf/349_paper.pdf > > I've also been checking my understanding of the formalism by looking at > the token mapping rules in the ERG 2018 directory tmr/. I have a few > questions below which I've tried to contextualise with respect to the > tutorial slides. I hope an expert can answer them. > > > Copying Information > > > > * reentrancies can be used to copy information from INPUT to OUTPUT > > Presumably reentrancies can also be used to copy information from CONTEXT > to OUTPUT? > > > Chart Mapping Procedure > > > > * a rule match is completed if all CONTEXT and INPUT arguments are bound > > What happens if there are several ways of matching chart edges to CONTEXT > in a rule? Is the rule applied repeatedly, once for each alternative match? > Or is only one of the alternative matches considered? This could matter if > feature values or regular expression captures are copied from the context > to the output. > > > * each rule is applied until its fixpoint is reached > > If I've understood the formalism correctly, I can imagine a rule that > doesn't ever reach a fixpoint for some inputs (e.g. a rule in which the > input and output unify, with the output building structure). Is the > intended interpretation the following: a rule is never applied more than > once to the same combination of input and context edges? And it's up to the > grammarian to avoid writing infinitely looping rules? > > If this is the correct interpretation, then I'm puzzled by a few rules in > the ERG: bridge_tmr in tmr/bridge.tdl, and the four rules > default_(ld|lb|rd|rb)_tmr in tmr/gml.tdl. Their inputs seem to unify with > their outputs, so surely each would apply in an infinite loop (i.e. an > input edge would match and be replaced with a new output edge, and since > this new edge had not previously been used as an input the rule would pick > this up and apply again, etc etc)? > > Aside from the fixpoint issue, I'm not sure I understand the purpose of > the rules default_(ld|lb|rd|rb)_tmr. At first glance they seem to merely > replace their input. Is their purpose to remove all features that are not > specified on the input side? > > I'm also puzzled by the following comment on bridge_tmr: > > > ;; ... here, we take advantage of redundancy detection built into > > ;; token mapping, i.e. 
even though the rule is written as if it could > apply any > > ;; number of times per cell, there shall not be duplicates in the token > chart. > > What enforces the restriction that "there shall not be duplicates in the > token chart"? I can't see any mention of redundancy detection or of this > restriction in the paper or tutorial slides. Is the restriction somehow > enforced by the fixpoint condition? > > Thanks in advance for clarification on these points. > > John > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: IMG_0003.jpg Type: image/jpg Size: 146096 bytes Desc: not available URL: From arademaker at gmail.com Tue Jun 16 14:52:48 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Tue, 16 Jun 2020 09:52:48 -0300 Subject: [developers] Coref over ERSs? In-Reply-To: References: Message-ID: <9D5D3E2E-7A3E-4DF8-BA89-067E6DA8E729@gmail.com> Hi Woodley and Nikhil, I have just found this thread in my inbox. Woodley, can you share the code you have? Nikhil, did you make any progress in this area? I am looking for single sentence solution first. Best, Alexandre > On 13 Mar 2019, at 14:46, Nikhil Krishnaswamy wrote: > > Hi Woodley, > > Thanks for getting in touch. Insofar as I envision using MRS as a resource, it would be plain text in single sentences or well-formed sentence fragments. The pipeline we're developing is still malleable though, so it would be fairly simple to change formats or insert a preprocessing step depending on the tools or resources already available that we might want to make use of. > > Thanks, > Nikhil > > Nikhil Krishnaswamy, Ph.D. > Postdoctoral Researcher, Department of Computer Science > > > On Wed, Mar 13, 2019 at 1:43 PM Woodley Packard wrote: > Hi Nikhil, > > In the past I worked on coreference resolution in MRS, although never quite to the point of a publication or software release. Are you interested primarily in coreference within a single sentence or across multiple sentences? Also, what format are you considering consuming MRS in (simple text based, DMRS, EDs, DM, ...)? It?s possible some of my (oldish) tools could be of use to you. > > Regards, > Woodley From arademaker at gmail.com Fri Jul 3 16:19:41 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Fri, 3 Jul 2020 11:19:41 -0300 Subject: [developers] www script in the logon distribution Message-ID: <5B1D74E8-863C-4547-9C80-0D9B0E41EF88@gmail.com> Hi Stephan, For some reason, the www script in the logon distribution does not start the webserver. Using the `--debug` option, I don't have any additional information in the log file (actually, the script didn't mention the debug anywhere). I am following all instructions from http://moin.delph-in.net/LogonOnline. In particular, pvmd3 is running without any error in the startup. I don't see any *.pvm file in the /tmp. The script bin/logon starts LKB and the [incr TSDB()] normally. I have used `?cat` to save a lisp file and load it manually in the ACL REPL, no error too. Any idea? The log file is below. Michael and Francis, I did a complete review of the Dockerfile yesterday. Does it make sense to move https://github.com/own-pt/docker-logon to the https://github.com/delph-in organization? Maybe I can also rename it since the docker now has more than just the minimal environment to run the LOGON tools. 
I believe that having more repositories under the same delph-in organization makes things clear and gives more visibility. Nice to have Matrix and the brew package already there. I hope that people will start to recognize the benefits of git/GitHub compared to SVN (documentation, issue, easy branching, cross-references of code/issues/PR etc). Best, Alexandre ////// user at 4091e35482b2:~/logon$ ./www --binary --debug --erg --port 9080 International Allegro CL Enterprise Edition 10.0 [64-bit Linux (x86-64)] (Feb 20, 2019 18:22) Copyright (C) 1985-2015, Franz Inc., Oakland, CA, USA. All Rights Reserved. This standard runtime copy of Allegro CL was built by: [TC13152] Universitetet i Oslo ; Loading /home/user/logon/dot.tsdbrc ; Loading /home/user/.tsdbrc [changing package from "COMMON-LISP-USER" to "TSDB"] TSNLP(1): NIL TSNLP(2): NIL TSNLP(3): T TSNLP(4): 5 TSNLP(5): "

(This on-line demonstrator is hosted at the University of Oslo)
" TSNLP(6): ; Loading /home/user/logon/lingo/erg/lkb/script set-coding-system(): activated UTF8. ; Loading /home/user/logon/lingo/erg/Version.lsp ; Loading /home/user/logon/lingo/erg/lkb/globals.lsp ; Loading /home/user/logon/lingo/erg/lkb/user-fns.lsp ; Loading /home/user/logon/lingo/erg/lkb/checkpaths.lsp ; Loading /home/user/logon/lingo/erg/lkb/patches.lsp Reading in type file fundamentals Reading in type file tmt Reading in type file lextypes [14:13:08] gc-after-hook(): {L#626 N=5.2M O=0 E=100%} [S=2.3G R=102M]. Reading in type file syntax [14:13:10] gc-after-hook(): {L#627 N=7.1M O=0 E=99%} [S=2.3G R=232M]. Reading in type file ctype Reading in type file lexrules Reading in type file auxverbs [14:13:12] gc-after-hook(): {L#628 N=9.2M O=0 E=98%} [S=2.3G R=352M]. Reading in type file mtr Reading in type file dt Checking type hierarchy Checking for unique greatest lower bounds Expanding constraints [14:13:18] gc-after-hook(): {L#629 N=55M O=5.2K E=99%} [S=2.3G R=352M]. Making constraints well formed [14:13:19] gc-after-hook(): {L#630 N=72M O=4.8M E=82%} [S=2.3G R=356M]. [14:13:19] gc-after-hook(): {L#631 N=80M O=1.9M E=68%} [S=2.3G R=358M]. [14:13:20] gc-after-hook(): {L#632 N=87M O=2.2M E=79%} [S=2.3G R=392M]. [14:13:21] gc-after-hook(): {L#633 N=62M O=34M E=43%} [S=2.3G R=442M]. [14:13:22] gc-after-hook(): {L#634 N=69M O=23M E=80%} [S=2.3G R=466M]. [14:13:22] gc-after-hook(): 133M tenured; forcing global gc(). [14:13:23] gc-after-hook(): {GR#8 N=54M O=0 E=100%} [S=2.3G R=484M]. [14:13:24] gc-after-hook(): {L#635 N=88M O=0 E=0%} [S=2.3G R=484M]. [14:13:25] gc-after-hook(): {L#636 N=97M O=10M E=69%} [S=2.3G R=491M]. [14:13:26] gc-after-hook(): {L#637 N=99M O=14M E=63%} [S=2.4G R=532M]. [14:13:27] gc-after-hook(): {L#638 N=93M O=29M E=53%} [S=2.4G R=581M]. 80175904 bytes have been tenured, next gc will be global. See the documentation for variable EXCL:*GLOBAL-GC-BEHAVIOR* for more information. Expanding defaults Type file checked successfully Computing display ordering Reading in cached leaf types Cached leaf types read Reading in cached lexicon (main) Cached lexicon read Reading in rules file constructions Reading in lexical rules file inflr Reading in lexical rules file inflr-pnct Reading in root file roots Reading in lexical rules file lexrinst Reading in parse node file parse-nodes ; Loading /home/user/logon/lingo/erg/lkb/mrsglobals.lsp ; Loading /home/user/logon/lingo/erg/lkb/eds.lsp ; Loading /home/user/logon/lingo/erg/www/setup.lsp ; cpu time (non-gc) 13.952552 sec user, 0.026410 sec system ; cpu time (gc) 9.165182 sec user, 0.505708 sec system ; cpu time (total) 23.117734 sec user, 0.532118 sec system ; real time 22.104421 sec (107.0%) ; space allocation: ; 25,979,360 cons cells, 681,401,040 other bytes, 0 static bytes ; Page Faults: major: 0 (gc: 66190), minor: 163781 (gc: 66190) ; Loading /home/user/logon/lingo/erg/rpp/setup.lsp read-repp(): reading file `xml.rpp'. read-repp(): reading file `latex.rpp'. read-repp(): reading file `ascii.rpp'. read-repp(): reading file `html.rpp'. read-repp(): reading file `wiki.rpp'. read-repp(): reading file `lgt.rpp'. read-repp(): reading file `gml.rpp'. read-repp(): reading file `robustness.rpp'. read-repp(): reading file `quotes.rpp'. read-repp(): reading file `ptb.rpp'. read-repp(): reading file `lkb.rpp'. read-repp(): reading file `micro.rpp'. read-repp(): reading file `tokenizer.rpp'. read-heads() reading file `rules.hds'. read-model(): reading file `jhpstg.g.mem'. [14:13:30] gc-after-hook(): {G#638 N=78M O=0 E=87%} [S=2.4G R=617M]. 
read-semi(): reading file `erg.smi'. read-semi(): reading file `hierarchy.smi'. read-semi(): reading file `abstract.smi'. read-semi(): reading file `surface.smi'. [14:13:32] gc-after-hook(): {L#639 N=108M O=0 E=0%} [S=2.4G R=617M]. read-vpm(): reading file `semi.vpm'. read-vpm(): reading file `abstract.vpm'. ; Loading /home/user/logon/lingo/erg/lkb/mt.lsp read-transfer-rules(): reading file `paraphraser.mtr'. read-transfer-rules(): reading file `idioms.mtr'. read-transfer-rules(): reading file `trigger.mtr'. [14:13:34] gc-after-hook(): {L#640 N=108M O=11M E=83%} [S=2.4G R=617M]. read-transfer-rules(): reading file `generation.mtr'. Building rule filter [14:13:36] gc-after-hook(): {L#641 N=105M O=9.5M E=90%} [S=2.4G R=617M]. [14:13:42] gc-after-hook(): {L#642 N=93M O=14M E=95%} [S=2.4G R=617M]. [14:13:47] gc-after-hook(): {L#643 N=24M O=72M E=92%} [S=2.4G R=666M]. [14:13:47] gc-after-hook(): 161M tenured; forcing global gc(). [14:13:48] gc-after-hook(): {GR#10 N=12M O=0 E=100%} [S=2.4G R=678M]. 75861824 bytes have been tenured, next gc will be global. See the documentation for variable EXCL:*GLOBAL-GC-BEHAVIOR* for more information. Building lr connections table Constructing lr table for non-morphological rules Grammar input complete NIL TSNLP(7): [14:14:27] gc-after-hook(): {G#643 N=35M O=0 E=81%} [S=2.4G R=678M]. [14:14:30] gc-after-hook(): {L#644 N=41M O=0 E=0%} [S=2.4G R=678M]. [14:14:32] gc-after-hook(): {L#645 N=41M O=5.7M E=94%} [S=2.4G R=682M]. [14:14:35] gc-after-hook(): {L#646 N=43M O=2.8M E=90%} [S=2.4G R=685M]. [14:14:38] gc-after-hook(): {L#647 N=42M O=4.0M E=94%} [S=2.4G R=689M]. [14:14:41] gc-after-hook(): {L#648 N=25M O=21M E=93%} [S=2.4G R=711M]. [14:14:44] gc-after-hook(): {L#649 N=26M O=4.2M E=77%} [S=2.4G R=715M]. [14:14:47] gc-after-hook(): {L#650 N=27M O=4.0M E=92%} [S=2.4G R=719M]. [14:14:50] gc-after-hook(): {L#651 N=26M O=4.2M E=92%} [S=2.4G R=723M]. [14:14:53] gc-after-hook(): {L#652 N=27M O=4.3M E=93%} [S=2.4G R=728M]. 53092272 bytes have been tenured, next gc will be global. See the documentation for variable EXCL:*GLOBAL-GC-BEHAVIOR* for more information. [14:14:58] gc-after-hook(): {G#652 N=25M O=0 E=95%} [S=2.4G R=733M]. [14:15:01] gc-after-hook(): {L#653 N=29M O=0 E=0%} [S=2.4G R=733M]. [14:15:04] gc-after-hook(): {L#654 N=30M O=4.1M E=95%} [S=2.4G R=734M]. [14:15:08] gc-after-hook(): {L#655 N=31M O=3.4M E=90%} [S=2.4G R=737M]. [14:15:11] gc-after-hook(): {L#656 N=30M O=4.9M E=91%} [S=2.4G R=742M]. [14:15:14] gc-after-hook(): {L#657 N=25M O=8.4M E=92%} [S=2.4G R=750M]. [14:15:18] gc-after-hook(): {L#658 N=24M O=5.0M E=87%} [S=2.4G R=756M]. [14:15:21] gc-after-hook(): {L#659 N=24M O=4.4M E=93%} [S=2.4G R=760M]. [14:15:25] gc-after-hook(): {L#660 N=24M O=3.8M E=93%} [S=2.4G R=764M]. [14:15:28] gc-after-hook(): {L#661 N=23M O=4.0M E=89%} [S=2.4G R=768M]. [14:15:31] gc-after-hook(): {L#662 N=24M O=4.1M E=92%} [S=2.4G R=772M]. [14:15:34] gc-after-hook(): {L#663 N=25M O=3.8M E=92%} [S=2.4G R=776M]. [14:15:37] gc-after-hook(): {L#664 N=25M O=3.6M E=92%} [S=2.4G R=779M]. [14:15:40] gc-after-hook(): {L#665 N=26M O=3.7M E=93%} [S=2.4G R=783M]. 55870688 bytes have been tenured, next gc will be global. See the documentation for variable EXCL:*GLOBAL-GC-BEHAVIOR* for more information. #[SEM-I {38454 ges}: 0 roles; 22406 predicates; 0 properties] TSNLP(8): "/brat/" TSNLP(9): [t40009] BEGIN [t4000a] BEGIN [t40009] reading `/home/user/logon/lingo/erg/pet/english.set'... including `/home/user/logon/lingo/erg/pet/common.set'... 
including `/home/user/logon/lingo/erg/pet/global.set'... including `/home/user/logon/lingo/erg/pet/repp.set'... including `/home/user/logon/lingo/erg/pet/mrs.set'... loading `/home/user/logon/lingo/erg/english.grm' [t4000a] reading `/home/user/logon/lingo/erg/pet/english.set'... including `/home/user/logon/lingo/erg/pet/common.set'... including `/home/user/logon/lingo/erg/pet/global.set'... including `/home/user/logon/lingo/erg/pet/repp.set'... including `/home/user/logon/lingo/erg/pet/mrs.set'... loading `/home/user/logon/lingo/erg/english.grm' [t4000a] (ERG (1214)) reading ME model `/home/user/logon/lingo/erg/redwoods.mem'... [3643349 features] [t40009] (ERG (1214)) reading ME model `/home/user/logon/lingo/erg/redwoods.mem'... [3643349 features] [t4000a] read-vpm(): reading file `semi.vpm'. [t40009] read-vpm(): reading file `semi.vpm'. [t4000a] 95873 types in 15 s [t4000a] [t40009] 95873 types in 15 s [t40009] [14:16:18] wait-for-clients(): `4091e35482b2' registered as tid <40009> [00:17]. [14:16:18] wait-for-clients(): `4091e35482b2' registered as tid <4000a> [00:17]. NIL TSNLP(10): [t4000b] BEGIN [t4000c] BEGIN [t4000d] BEGIN [t4000e] BEGIN [t4000d] reading `/home/user/logon/lingo/erg/pet/english.set'... including `/home/user/logon/lingo/erg/pet/common.set'... including `/home/user/logon/lingo/erg/pet/global.set'... including `/home/user/logon/lingo/erg/pet/repp.set'... including `/home/user/logon/lingo/erg/pet/mrs.set'... loading `/home/user/logon/lingo/erg/english.grm' [t4000e] reading `/home/user/logon/lingo/erg/pet/english.set'... including `/home/user/logon/lingo/erg/pet/common.set'... including `/home/user/logon/lingo/erg/pet/global.set'... including `/home/user/logon/lingo/erg/pet/repp.set'... including `/home/user/logon/lingo/erg/pet/mrs.set'... loading `/home/user/logon/lingo/erg/english.grm' [t4000c] reading `/home/user/logon/lingo/erg/pet/english.set'... including `/home/user/logon/lingo/erg/pet/common.set'... including `/home/user/logon/lingo/erg/pet/global.set'... including `/home/user/logon/lingo/erg/pet/repp.set'... including `/home/user/logon/lingo/erg/pet/mrs.set'... loading `/home/user/logon/lingo/erg/english.grm' [t4000b] reading `/home/user/logon/lingo/erg/pet/english.set'... including `/home/user/logon/lingo/erg/pet/common.set'... including `/home/user/logon/lingo/erg/pet/global.set'... including `/home/user/logon/lingo/erg/pet/repp.set'... including `/home/user/logon/lingo/erg/pet/mrs.set'... loading `/home/user/logon/lingo/erg/english.grm' From goodman.m.w at gmail.com Fri Jul 3 16:47:19 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Fri, 3 Jul 2020 22:47:19 +0800 Subject: [developers] www script in the logon distribution In-Reply-To: <5B1D74E8-863C-4547-9C80-0D9B0E41EF88@gmail.com> References: <5B1D74E8-863C-4547-9C80-0D9B0E41EF88@gmail.com> Message-ID: Hi Alexandre, I certainly don't mind if you want to put it under the delph-in organization. I just looked it over briefly and I have two questions and a suggestion: * It is described as being for macOS, but very little actually looks macOS-specific. Would it be appropriate to describe it in more general terms in case someone wants to run it from some other platform? * It is called docker-logon, but I don't see that it gets any of the LOGON distribution. Maybe it should be renamed? * It looks like you've included web.c from FFTB. The FFTB project is under the MIT license, so you'll need to include its LICENSE file as well. 
On Fri, Jul 3, 2020 at 10:21 PM Alexandre Rademaker wrote: > > Hi Stephan, > > For some reason, the www script in the logon distribution does not start > the webserver. Using the `--debug` option, I don't have any additional > information in the log file (actually, the script didn't mention the debug > anywhere). I am following all instructions from > http://moin.delph-in.net/LogonOnline. In particular, pvmd3 is running > without any error in the startup. I don't see any *.pvm file in the /tmp. > The script bin/logon starts LKB and the [incr TSDB()] normally. I have used > `?cat` to save a lisp file and load it manually in the ACL REPL, no error > too. Any idea? The log file is below. > > Michael and Francis, > > I did a complete review of the Dockerfile yesterday. Does it make sense to > move https://github.com/own-pt/docker-logon to the > https://github.com/delph-in organization? Maybe I can also rename it > since the docker now has more than just the minimal environment to run the > LOGON tools. I believe that having more repositories under the same > delph-in organization makes things clear and gives more visibility. Nice > to have Matrix and the brew package already there. I hope that people will > start to recognize the benefits of git/GitHub compared to SVN > (documentation, issue, easy branching, cross-references of code/issues/PR > etc). > > > Best, > Alexandre > > > ////// > > user at 4091e35482b2:~/logon$ ./www --binary --debug --erg --port 9080 > > International Allegro CL Enterprise Edition > 10.0 [64-bit Linux (x86-64)] (Feb 20, 2019 18:22) > Copyright (C) 1985-2015, Franz Inc., Oakland, CA, USA. All Rights > Reserved. > > This standard runtime copy of Allegro CL was built by: > [TC13152] Universitetet i Oslo > > ; Loading /home/user/logon/dot.tsdbrc > ; Loading /home/user/.tsdbrc > > [changing package from "COMMON-LISP-USER" to "TSDB"] > TSNLP(1): NIL > TSNLP(2): NIL > TSNLP(3): T > TSNLP(4): 5 > TSNLP(5): "

> [... quoted grammar-loading log trimmed; it duplicates the startup log in the original message above ...]
> See the documentation for variable EXCL:*GLOBAL-GC-BEHAVIOR* for more > information. > #[SEM-I {38454 ges}: 0 roles; 22406 predicates; 0 properties] > TSNLP(8): "/brat/" > TSNLP(9): > [t40009] BEGIN > [t4000a] BEGIN > [t40009] reading `/home/user/logon/lingo/erg/pet/english.set'... including > `/home/user/logon/lingo/erg/pet/common.set'... including > `/home/user/logon/lingo/erg/pet/global.set'... including > `/home/user/logon/lingo/erg/pet/repp.set'... including > `/home/user/logon/lingo/erg/pet/mrs.set'... loading > `/home/user/logon/lingo/erg/english.grm' > [t4000a] reading `/home/user/logon/lingo/erg/pet/english.set'... including > `/home/user/logon/lingo/erg/pet/common.set'... including > `/home/user/logon/lingo/erg/pet/global.set'... including > `/home/user/logon/lingo/erg/pet/repp.set'... including > `/home/user/logon/lingo/erg/pet/mrs.set'... loading > `/home/user/logon/lingo/erg/english.grm' > [t4000a] (ERG (1214)) reading ME model > `/home/user/logon/lingo/erg/redwoods.mem'... [3643349 features] > [t40009] (ERG (1214)) reading ME model > `/home/user/logon/lingo/erg/redwoods.mem'... [3643349 features] > [t4000a] read-vpm(): reading file `semi.vpm'. > [t40009] read-vpm(): reading file `semi.vpm'. > [t4000a] 95873 types in 15 s > [t4000a] > [t40009] 95873 types in 15 s > [t40009] > [14:16:18] wait-for-clients(): `4091e35482b2' registered as tid <40009> > [00:17]. > [14:16:18] wait-for-clients(): `4091e35482b2' registered as tid <4000a> > [00:17]. > > NIL > TSNLP(10): > [t4000b] BEGIN > [t4000c] BEGIN > [t4000d] BEGIN > [t4000e] BEGIN > [t4000d] reading `/home/user/logon/lingo/erg/pet/english.set'... including > `/home/user/logon/lingo/erg/pet/common.set'... including > `/home/user/logon/lingo/erg/pet/global.set'... including > `/home/user/logon/lingo/erg/pet/repp.set'... including > `/home/user/logon/lingo/erg/pet/mrs.set'... loading > `/home/user/logon/lingo/erg/english.grm' > [t4000e] reading `/home/user/logon/lingo/erg/pet/english.set'... including > `/home/user/logon/lingo/erg/pet/common.set'... including > `/home/user/logon/lingo/erg/pet/global.set'... including > `/home/user/logon/lingo/erg/pet/repp.set'... including > `/home/user/logon/lingo/erg/pet/mrs.set'... loading > `/home/user/logon/lingo/erg/english.grm' > [t4000c] reading `/home/user/logon/lingo/erg/pet/english.set'... including > `/home/user/logon/lingo/erg/pet/common.set'... including > `/home/user/logon/lingo/erg/pet/global.set'... including > `/home/user/logon/lingo/erg/pet/repp.set'... including > `/home/user/logon/lingo/erg/pet/mrs.set'... loading > `/home/user/logon/lingo/erg/english.grm' > [t4000b] reading `/home/user/logon/lingo/erg/pet/english.set'... including > `/home/user/logon/lingo/erg/pet/common.set'... including > `/home/user/logon/lingo/erg/pet/global.set'... including > `/home/user/logon/lingo/erg/pet/repp.set'... including > `/home/user/logon/lingo/erg/pet/mrs.set'... loading > `/home/user/logon/lingo/erg/english.grm' > > > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Sun Jul 5 00:35:48 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Sat, 4 Jul 2020 19:35:48 -0300 Subject: [developers] repp tool segmentation fault Message-ID: <202F53D2-9F87-4E19-BA4D-E257BB0E400D@gmail.com> Hi Woodley, > I was able to confirm that with the escaped backslashes, I get the segmentation fault and without, I do not. I suspect this is a bug in repp that we should file with Woodley. 
https://github.com/delph-in/homebrew-delphin/issues/1

Not sure how easy it is to fix this.

Alexandre

Sent from my iPhone
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From arademaker at gmail.com  Sun Jul 5 22:09:27 2020
From: arademaker at gmail.com (Alexandre Rademaker)
Date: Sun, 5 Jul 2020 17:09:27 -0300
Subject: [developers] www script in the logon distribution
In-Reply-To: 
References: <5B1D74E8-863C-4547-9C80-0D9B0E41EF88@gmail.com>
Message-ID: 

Hi Michael, thank you very much for your comments. It is really good to have feedback. My answers below.
;-) Best, Alexandre From oe at ifi.uio.no Sun Jul 5 23:40:21 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Sun, 5 Jul 2020 23:40:21 +0200 Subject: [developers] The WeSearch interface In-Reply-To: References: <20190519193216.33012.97827@sh.hpc.uio.no> Message-ID: hi alexandre, it appears roman (who worked on WSI improvements at UW for a while) created the script that you are missing. i am not sure i actually have a copy myself (and cannot easily check while traveling this week). but we used to create the WSI indices from the standard export files created by the LOGON ?redwoods? script. that should work with any valid [incr tsdb()] treebank, no matter how it was created. somewhere in the ERG, there should be a file Notes, or Readme, or the like with export instructions. so, how did you create your treebank(s), how do you call the ?redwoods? script, and (most importantly) what exactly happens? best wishes, oe On Sun, 5 Jul 2020 at 22:52 Alexandre Rademaker wrote: > > Hi Stephan, > > The `export.sh` script mentioned in http://moin.delph-in.net/ErgWeSearch > is not available in the WeSearcch repository ( > http://svn.delph-in.net/wsi/trunk/). Can you share this script? > > As an alternative, I tried to use the $LOGON/redwoods script to export a > profile created with Pydelphin+ACE but I was not able to understand how to > operate with a profile not created by the $LOGON/parse. Any help would be > more than welcome! ;-) > > Best, > Alexandre > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Mon Jul 6 00:17:48 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Sun, 5 Jul 2020 19:17:48 -0300 Subject: [developers] The WeSearch interface In-Reply-To: References: <20190519193216.33012.97827@sh.hpc.uio.no> Message-ID: Hi Stephan, Thank you for your answer. I am actually trying to reproduce the results from https://www.aclweb.org/anthology/W15-2205/ and the code that transforms EDS/MRS to RDF seems to live in the WeSearch Java code, right? Anyway, having the WeSearch interface running will be also VERY helpful. In the future, I surely would like to explore more and understand the lisp code that redwoods script is calling, the main part seems to be in the TSDB package and the function `browse-trees` but there are many auxiliar scripts loaded before it and many variables and other functions from the LKB and TSDB packages. It is still not clear to me how to decouple the lisp code from the LOGON scripts and all PVM related stuff. I have created the profile with: % delphin mkprof --input sample.txt --relations ~/hpsg/logon/lingo/lkb/src/tsdb/skeletons/english/Relations --skeleton treebank Then with pydelphin I analysed it with ACE: //// from delphin import ace from delphin import tsdb from delphin import itsdb ts = itsdb.TestSuite('treebank') with ace.ACEParser('erg.dat') as cpu: ts.process(cpu) //// For exporting, I tried many different alternatives of parameters. Unfortunately, I didn?t find much documentation about the redwoods script parameters. I would like to obtain the eds, mrs and dm (for that, I remember an old emails from you pointing to a python script that I will need to revisit) formats. Many combinations of parameters result in case (1) below. The last try gives me the result (2). 1) $ ./redwoods --binary --erg --default --composite --target /tmp --export mrs,eds --active all /home/user/tmp/treebank redwoods: invalid `erg' profile `/home/user/tmp/treebank'; exit. 
2) $ ./redwoods --binary --target /tmp --export mrs,eds /home/user/tmp/treebank exporting `/home/user/tmp/treebank' [1 -- 1001] International Allegro CL Enterprise Edition 10.0 [64-bit Linux (x86-64)] (Feb 20, 2019 18:22) Copyright (C) 1985-2015, Franz Inc., Oakland, CA, USA. All Rights Reserved. This standard runtime copy of Allegro CL was built by: [TC13152] Universitetet i Oslo ; Loading /home/user/logon/dot.tsdbrc ; Loading /home/user/.tsdbrc [changing package from "COMMON-LISP-USER" to "TSDB"] TSNLP(1): NIL TSNLP(2): Error: "" does not exist, cannot load [condition type: FILE-DOES-NOT-EXIST-ERROR] Restart actions (select using :continue): 0: retry the load of 1: skip loading 2: Return to Top Level (an "abort" restart). 3: Abort entirely from this (lisp) process. [changing package from "TSDB" to "LKB"] [1] LKB(3): :pop [changing package from "LKB" to "TSDB"] TSNLP(3): EOF Really exit lisp [n]? Best, Alexandre > On 5 Jul 2020, at 18:40, Stephan Oepen wrote: > > hi alexandre, > > it appears roman (who worked on WSI improvements at UW for a while) created the script that you are missing. i am not sure i actually have a copy myself (and cannot easily check while traveling this week). > > but we used to create the WSI indices from the standard export files created by the LOGON ?redwoods? script. that should work with any valid [incr tsdb()] treebank, no matter how it was created. somewhere in the ERG, there should be a file Notes, or Readme, or the like with export instructions. > > so, how did you create your treebank(s), how do you call the ?redwoods? script, and (most importantly) what exactly happens? > > best wishes, oe From goodman.m.w at gmail.com Mon Jul 6 05:02:10 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Mon, 6 Jul 2020 11:02:10 +0800 Subject: [developers] www script in the logon distribution In-Reply-To: References: <5B1D74E8-863C-4547-9C80-0D9B0E41EF88@gmail.com> Message-ID: On Mon, Jul 6, 2020 at 4:09 AM Alexandre Rademaker wrote: > > On 3 Jul 2020, at 11:47, goodman.m.w at gmail.com wrote: > > * It is described as being for macOS, but very little actually looks > macOS-specific. Would it be appropriate to describe it in more general > terms in case someone wants to run it from some other platform? > > Indeed, we can now take this as a solution for many more situations than > those envisioned by http://moin.delph-in.net/LkbMacintosh. I have added a > better introduction to the README of the repo. Comments are welcome. I have > also added links in http://moin.delph-in.net/ToolsTop. > > > * It is called docker-logon, but I don't see that it gets any of the > LOGON distribution. Maybe it should be renamed? > > I renamed it to https://github.com/own-pt/docker-delphin. > Re macOS and LOGON, that looks better, although some of the prose further down the README still makes references to these two things. It might be good if you could group these into sections and/or clarify the additional steps (e.g., "The LOGON distribution is not included but this container is compatible with its requirements. You can install LOGON by doing ..."). > > * It looks like you've included web.c from FFTB. The FFTB project is > under the MIT license, so you'll need to include its LICENSE file as well. > > > > This is important. Thank you for reminding me about license. I have added > a MIT license and in the readme I also add a notice about the license of > the tools. > I think what you added is sufficient. 
> Regarding the web.c copy, I am not very happy with the current solution. I > can see the following alternatives: > > 1. Having a copy of fftb svn repo in a git repository under the DELPH-IN > organization. We could than use it to replicate Woodley changes in the SVN > official repo, track issues, and we could also have branches with changes > like the one I proposed in the web.c. > I have also at times wanted a bug tracker for Woodley's tools, and the license doesn't prevent us from creating such a mirror, but I don't recall ever asking Woodley his opinion about this as he seems content with the current setup. He's responsive when I email patches to his code, and there's a bug tracker of sorts in the "wishlist" wikis (e.g., http://moin.delph-in.net/FftbWishlist). Since your version of web.c only makes a minimal one-line change, perhaps we can just provide Woodley a patch for adding a --bind option so you can specify the address or 0? 2. Use a patch file instead of a copy of the whole web.c, a little bit more > complicate and I am not sure how safe it would be. > > 3. Have a script to change the file during the docker image building, > somehow similar to the previous option. > These sound brittle; at least, you'd have to make sure they keep in sync with the current version. If we can't get a patch for a custom bind address added into FFTB, then for this solution it would be best to pin the SVN version of FFTB in the docker file so the patch will apply cleanly. -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Wed Jul 8 20:48:29 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 8 Jul 2020 15:48:29 -0300 Subject: [developers] Fwd: Exporting profile References: <9D60DA02-9084-4C57-BA72-2EDAEDE2F4EC@gmail.com> Message-ID: I forgot to copy the list. Best, Alexandre > From: Alexandre Rademaker > Subject: Exporting profile > Date: 8 July 2020 15:47:22 GMT-3 > To: Stephan Oepen > > > Hi Stephan, > > I was able to export the profile with: > > $ ./redwoods --binary --terg --home /home/user/tmp/ --target /tmp --export mrs,eds --active all treebank > > (The name of my profile is `treebank` and it is located in /home/user/tmp. I discovered the parameter `home` and the possibility to specify the last version of ERG with `terg`). > > That is nice, the parsing of profile files is not so trivial task and doesn?t make sense to not use the code already available. I wonder if the output format is document. For each item in the profile, I got a .gz file like that: > > [1] (1 of 3) {1} [ the text of the sentence ] > ^L > [1:0] (active) > > [the mrs text representation] > > [the eds text representation] > > ^L > [1:1] (inactive) > > [the mrs...] > > [the eds...] > > ^L > [1:2] (inactive) > > [the mrs...] > > [the eds...] > > > I would also like to understand what is the minimal Lisp code to export a profile using the functions from the tsdb and lkb packages. Given that, I would not depend on the scripts. I would be able to start a lisp REPL and do it interactively. I was expecting to be able to learn it with the `source` parameter, but I didn?t get any result. > > Why do I need the grammar to export the profile? Sorry, maybe the answer to this question is a long one, an article or wiki page! ;-) I remember that I have already read somewhere that some formats need the grammar or the SEM-I interface, right? > > > Best, > Alexandre -------------- next part -------------- An HTML attachment was scrubbed... 
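For reference, reading such an exported item file back from Python takes little code. The following is only a minimal sketch, assuming the layout described in the message above (one gzipped file per item, result blocks separated by form-feed characters, blank lines between the exported representations); the file path and the helper name are invented for illustration, and real exports may carry additional layers (tokens, derivations, trees) depending on the --export values used.

import gzip

def read_export(path):
    """Split one exported item file into its header and result blocks."""
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        blocks = f.read().split('\f')          # ^L separates result blocks
    header = blocks[0].strip()                 # e.g. "[1] (1 of 3) {1} [ ... ]"
    results = []
    for block in blocks[1:]:
        chunks = [c.strip() for c in block.split('\n\n') if c.strip()]
        label = chunks[0]                      # e.g. "[1:0] (active)"
        results.append((label, chunks[1:]))    # remaining chunks: mrs, eds, ...
    return header, results

header, results = read_export('/tmp/treebank/1.gz')   # hypothetical path
print(header)
for label, representations in results:
    if '(active)' in label:
        print(label, '->', len(representations), 'representations')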
From oe at ifi.uio.no  Thu Jul 9 00:11:41 2020
From: oe at ifi.uio.no (Stephan Oepen)
Date: Thu, 9 Jul 2020 00:11:41 +0200
Subject: [developers] Fwd: Exporting profile
In-Reply-To: 
References: <9D60DA02-9084-4C57-BA72-2EDAEDE2F4EC@gmail.com>
Message-ID: 

i am glad you are making progress, alexandre!

the WSI indexer should be able to parse those export files, though
ordinarily we only export and index the active result(s), assuming you
have manually disambiguated?

the grammar is needed to export because derived formats (e.g. labeled
trees, MRS, EDS, DM) are computed dynamically, i.e. the export
`interprets' each recorded derivation using the full grammar, including
its MRS, EDS, et al. output configuration.

using the `--cat' option should give you the sequence of LKB and
[incr tsdb()] function calls.

i am afraid there is no formal documentation of the export format, but
your schematic summary almost seems self-explanatory!

best wishes, oe

ons. 8. jul. 2020 kl. 20:51 skrev Alexandre Rademaker :
> [... full quote of the "Exporting profile" message above trimmed ...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From arademaker at gmail.com  Fri Jul 10 22:21:43 2020
From: arademaker at gmail.com (Alexandre Rademaker)
Date: Fri, 10 Jul 2020 17:21:43 -0300
Subject: [developers] semantic representations in RDF
Message-ID: 

Hi,

Sorry for this long email. I have written to Stephan many times this week, so I don't want to keep disturbing (only! ;-)) him. So, I am sharing my findings about the WSI interface; maybe someone who has worked with this code can share some information with me.

I am trying to reproduce the textual entailment technique described in

[a0] https://www.aclweb.org/anthology/W15-2205.pdf

The basic idea is to convert EDS to RDF and, after applying some transformations, use SPARQL to test the entailment between two structures.
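For concreteness, a small sketch of such a conversion on top of PyDelphin is shown here; it is not the WSI code. It reads one EDS in the native serialization (the example quoted at the end of this message) and prints naive N-Triples-style output. The base and vocabulary IRIs and the per-sentence graph identifier are invented for illustration; the actual WSI ontology is presumably different.

from delphin.codecs import eds as edsnative

EDS_TEXT = '''
{e2:
 _1:udef_q<0:3>[BV x3]
 e9:card<0:3>("2"){e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 x3]
 x3:_dog_n_1<4:8>{x PERS 3, NUM pl, IND +, PT pt}[]
 e2:_fight_v_1<13:21>{e SF prop, TENSE pres, MOOD indicative, PROG +, PERF -}[ARG1 x3]
}'''

BASE = 'http://example.org/wsi/'           # hypothetical instance namespace
VOCAB = 'http://example.org/wsi/vocab#'    # hypothetical vocabulary namespace

def eds_to_triples(graph_id, text):
    """Turn one EDS into (subject, predicate, object) strings, minting
    per-sentence node IRIs so that node names like x3 cannot collide
    across sentences."""
    graph = edsnative.decode(text)
    def node(n):
        return f'<{BASE}{graph_id}/{n}>'
    triples = [(f'<{BASE}{graph_id}>', f'<{VOCAB}top>', node(graph.top))]
    for n in graph.nodes:
        triples.append((node(n.id), f'<{VOCAB}predicate>', f'"{n.predicate}"'))
        if n.carg is not None:
            triples.append((node(n.id), f'<{VOCAB}carg>', f'"{n.carg}"'))
        for role, target in n.edges.items():
            triples.append((node(n.id), f'<{VOCAB}{role}>', node(target)))
        for prop, value in n.properties.items():
            triples.append((node(n.id), f'<{VOCAB}{prop}>', f'"{value}"'))
    return triples

for s, p, o in eds_to_triples('10', EDS_TEXT):
    print(s, p, o, '.')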
Given that, the first step is to have the RDF data from EDS representations. For that, the authors used the WSI code. That is why I am trying to understand the current status and the history behind the WSI interface/code. The file http://svn.delph-in.net/wsi/trunk/src/CHANGES.txt is not very informative! Maybe there are reasons for the problems I listed below. Maybe someone is still working on the code. Maybe the problems are well-know limitations and ideas that were never really implemented. I just want to know if it makes sense to invest time on trying to solve the problems I found. BTW, there is no license file, may I fork this SVN repository in a GitHub repository? The relevant pages/articles are: [w1] http://moin.delph-in.net/WeSearch/Rdf [w2] http://moin.delph-in.net/ErgWeSearch [w3] http://moin.delph-in.net/WeSearch/Interface [w4] http://moin.delph-in.net/WeSearch/QueryLanguage [a1] http://www.lrec-conf.org/proceedings/lrec2014/pdf/1166_Paper.pdf [a2] https://www.aclweb.org/anthology/C14-2020.pdf Problems: 1) The wiki pages [w3,w2] are not in sync with the README.txt in the code repository http://svn.delph-in.net/wsi/trunk/. For example, the directory `generic-gui` is now called `common-gui`. 2) The .nq file produced by the indexing is not valid. IRI likes `<9>` are not allowed in https://www.w3.org/TR/n-quads/. I was able to produce a temporary solution but it creates other problems. 3) The [a1,a2,w1] say nothing about how the URLs/IRIs are created. But as we can see for the output below, nodes like `x3` would have a single IRI shared for all sentences in the corpora. I understand the EDS node identifier are not variables, and that tiples are grouped in a graph, but, still, conceptually, in the dataset, there is no single x3, but many different ones in different sentences, right? I didn?t find the complete ontology that defines the EDS, MRS and DM representations. On [a1] the authors wrote only: > The full MRS ontology (not discussed in detail here) distinguishes different types of nodes, corresponding to full predications vs. individual logical variables vs. hierarchically organized sub-properties of variables... 4) There is no rdfs:type (), the `type` predicate is defined in the http://www.w3.org/1999/02/22-rdf-syntax-ns# (prefix `rdf`). 5) If I fix the cases [4] and [2] in the RDF transformation code, the interface breaks. I am still investigating if the problem is in the SPARQL generation or in the page construction from the results. 6) The query language (WQL) documented in http://alt.qcri.org/semeval2015/task18/index.php?id=search and [w4] is not working in the current version of the interface: Accept => x: _* [ARG* x] Reject => x: _fight* [ARG* x] Reject => /v[ARG* x] Reject => +dog Comments are welcome! ;-) Best, Alexandre EDS: {e2: _1:udef_q<0:3>[BV x3] e9:card<0:3>("2"){e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 x3] x3:_dog_n_1<4:8>{x PERS 3, NUM pl, IND +, PT pt}[] e2:_fight_v_1<13:21>{e SF prop, TENSE pres, MOOD indicative, PROG +, PERF -}[ARG1 x3] } RDF predicates triples only for the EDS above: % cat 1.nq | grep "<1>" | grep "predicate" <_1> "udef_q"^^ <1> . "card"^^ <1> . "_dog_n_1"^^ <1> . "_fight_v_1"^^ <1> . Complete RDF from the EDS above: <_1> "udef_q"^^ <1> . <_1> <1> . <_1> <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . "card"^^ <1> . "2"^^ <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . "_dog_n_1"^^ <1> . <1> . <1> . 
<1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . <1> . "_fight_v_1"^^ <1> . <1> . <1> . "true"^^ <1> . From arademaker at gmail.com Mon Jul 13 23:54:08 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Mon, 13 Jul 2020 18:54:08 -0300 Subject: [developers] MacLUI alpha test In-Reply-To: References: <938B5910-D32F-4BD3-99D7-D41B8C71C0D6@gmail.com> <73D395AD-B766-4CA4-9953-2F2C6CF67616@uw.edu> <6EA616B0-05D6-4D65-A5E9-4879F3BD8320@sussex.ac.uk> <6094A46D-805D-4207-9438-BF1282CC2EB3@sweaglesw.org> <43159F5B-7B0A-4E35-A9B3-AA693D50CF01@sussex.ac.uk> Message-ID: Hi Woodley, I just noticed that http://moin.delph-in.net/LkbLui#Obtaining_and_Running_LUI didn?t have a link to the directory below where the README.txt file has the installation instructions: http://sweaglesw.org/linguistics/maclui/ Maybe the other links to http://sweaglesw.org/linguistics/yzlui-for-osx.tar.gz and http://sweaglesw.org/linguistics/yzlui.x86-64 in the same paragraph are now obsolete!? I am not sure, so I didn?t remove them. Best, Alexandre From sweaglesw at sweaglesw.org Tue Jul 14 00:36:49 2020 From: sweaglesw at sweaglesw.org (Woodley Packard) Date: Mon, 13 Jul 2020 15:36:49 -0700 Subject: [developers] MacLUI alpha test In-Reply-To: References: Message-ID: <6273D36E-ED92-4CC7-9AD0-5368558EB545@sweaglesw.org> Hi Alexandre, The maclui preview that you added a link to is/was not yet released software, and there are caveats about using it. But I?m glad you called my attention to it, since that is another thing I surely should mention on Wednesday. Woodley > On Jul 13, 2020, at 2:54 PM, Alexandre Rademaker wrote: > > ?Hi Woodley, > > I just noticed that http://moin.delph-in.net/LkbLui#Obtaining_and_Running_LUI didn?t have a link to the directory below where the README.txt file has the installation instructions: > > http://sweaglesw.org/linguistics/maclui/ > > Maybe the other links to http://sweaglesw.org/linguistics/yzlui-for-osx.tar.gz and http://sweaglesw.org/linguistics/yzlui.x86-64 in the same paragraph are now obsolete!? I am not sure, so I didn?t remove them. > > Best, > Alexandre From oe at ifi.uio.no Tue Jul 14 02:25:18 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Tue, 14 Jul 2020 02:25:18 +0200 Subject: [developers] semantic representations in RDF In-Reply-To: References: Message-ID: hi alexandre, > 6) The query language (WQL) documented in http://alt.qcri.org/semeval2015/task18/index.php?id=search and [w4] is not working in the current version of the interface: > > Accept => x: _* [ARG* x] > Reject => x: _fight* [ARG* x] > Reject => /v[ARG* x] > Reject => +dog what do you actually consider the 'current interface' in this context? the WQL documentation you reference is from the SDP shared task, so you would have to try those queries against one of the bi-lexical formats (e.g. DM :-): the '/' (PoS) and '+' (lemma) operators are only defined for SDP graphs, i suspect. also, 'v' is not a valid PoS value (but 'v*' seems to work): http://wesearch.delph-in.net/sdp/search.jsp see you tomorrow! oe From oe at ifi.uio.no Tue Jul 14 21:41:47 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Tue, 14 Jul 2020 21:41:47 +0200 Subject: [developers] relating various layers of information in [incr tsdb()] profiles Message-ID: hi jan, yesterday (during the summit plenary), you inquired about a tighter linking of MRS predications to the underlying syntactic analysis than the default character ranges. 
that is actually a fine example of existing functionality, only nobody but me likely knows about it, because there is no available documentation (it is buried in 'lingo/lkb/src/mrs/lnk.lisp' and the LOGON 'redwoods' script). if one were to temporarily venture back to the LOGON environment, i just tried the following: $LOGONROOT/redwoods --erg \ --export/id/blind input,derivation,mrs,eds \ --condition "i-id == 21" --target /tmp mrs here, the '/blind' modifier means to ignore any MRS (or labeled tree) that may be recorded in the profile, which will trigger [incr tsdb()] re-creating the complete feature structure (and then MRS) from the recorded derivation tree; the '/id' modifier calls for MRS linking to use identifiers into the derivation tree (rather than character ranges), e.g. the export file contains: [...] (ROOT_STRICT (141 SB-HD_MC_C -0.207561 0 2 (138 HDN_BNP-PN_C 0.0930572 0 1 (137 N_SG_ILR 0.135806 0 1 (31 abrams at n_-_pn_le 0 0 1 [...] [ TOP: h1 INDEX: e3 [ e SF: PROP TENSE: PAST MOOD: INDICATIVE PROG: - PERF: - ] RELS: < [ proper_q<@138> LBL: h4 ARG0: x6 [ x PERS: 3 NUM: SG IND: + ] RSTR: h5 BODY: h7 ] [ named<@31> LBL: h8 ARG0: x6 CARG: "Abrams" ] [...] in the above <@138> and <@31> refer to the corresponding node identifiers in the derivation tree, i.e. the unary rule that adds the quantifier and the lexical entry for Abrams, respectively. from what i recall, these links are injected into (the AVM description of) each MRS predication during the bottom-up reconstruction of the derivation tree, i.e. as tokens, lexical entries, and constructions are being put back together deterministically by [incr tsdb()]. looking further into the export file, there are both the initial (REPP output) and internal (after chart mapping) tokenizations (in YY token serialization): < (1, 0, 1, <0:6>, 1, "Abrams", 0, "null") (2, 1, 2, <7:13>, 1, "barked", 0, "null") (3, 2, 3, <13:14>, 1, ".", 0, "null") > < (26, 0, 1, <0:6>, 1, "abrams", 0, "null") (28, 0, 1, <0:6>, 1, "abrams", 0, "null") (25, 1, 2, <7:14>, 1, "barked.", 0, "null") (27, 1, 2, <7:14>, 1, "barked.", 0, "null") > toward the bottom of the derivation tree, each lexical entry is related to a list of (internal) token identifiers and corresponding token feature structures, e.g. [...] (38 bark_v1 at v_-_le 0 1 2 ("barked." 25 [...] so far, so good (and quite straightforward). at this point, the relation between internal and initial tokens becomes a little more complex, as one initial token can be split into multiple internal tokens (as would be the case e.g. in 'New York-based', with initial tokens 'New' and 'York-based' vs. internal tokens 'New", 'York-', and 'based'); likewise, multiple initial tokens are frequently glued together (e.g. initial #2 and #3 to form internal #25 or #27). hence, one has to resort to character ranges (plus knowledge that the initial tokens are a simple sequence), to sort out these correspondences. i ended up going through this example because this kind of exact accounting through all analysis layers has at times been important to me, and i do believe there should be complete information in ERG profiles to piece things back together. but, as demonstrated in the above, this process requires looking at both layers of tokenization, the derivation tree, and identifier-linked MRSs in tandem. this kind of holistic interpretation, i suspect, remains out of scope for pyDelphin for now, in part because it requires the ability to reconstruct derivations, using the grammar. 
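purely as an illustration of that character-range bookkeeping (this is not part of any DELPH-IN tool, and the function names are invented; the token tuples are transcribed from the export above), a python sketch of recovering which initial tokens each internal token was built from could look like this:

from typing import Dict, List, Tuple

Token = Tuple[int, int, int, str]   # (identifier, from, to, form)

# initial (REPP) and internal (post chart mapping) tokens from the example
initial: List[Token] = [(1, 0, 6, "Abrams"), (2, 7, 13, "barked"), (3, 13, 14, ".")]
internal: List[Token] = [(26, 0, 6, "abrams"), (28, 0, 6, "abrams"),
                         (25, 7, 14, "barked."), (27, 7, 14, "barked.")]

def overlaps(a: Token, b: Token) -> bool:
    """True if the two character ranges share at least one position."""
    return a[1] < b[2] and b[1] < a[2]

def align(internal: List[Token], initial: List[Token]) -> Dict[int, List[int]]:
    """Map each internal token identifier to the overlapping initial tokens."""
    return {tok[0]: [ini[0] for ini in initial if overlaps(tok, ini)]
            for tok in internal}

print(align(internal, initial))   # {26: [1], 28: [1], 25: [2, 3], 27: [2, 3]}

so internal token #25 ('barked.') comes out as spanning initial tokens #2 and #3, which is exactly the kind of correspondence one needs when walking from identifier-linked MRSs back to the original input.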
i attach the complete export file, in case you wanted to look at this example more closely. best wishes, oe ps: from the available 'documentation' on alternate ways anchoring MRS predications in corresponding input elements: ;;; ;;; an attempt at generalizing over various ways of linking to the underlying ;;; input to the parser, be it by character or vertex ranges (as used at times ;;; in HoG et al.) or token identifiers (originally at YY and now in LOGON). ;;; currently, there are four distinct value formats: ;;; ;;; <0:4> character range (i.e. a sub-string of an assumed flat input); ;;; <0#2> chart vertex range (traditional in PET to some degree); ;;; <0 1 3> token identifiers, i.e. links to basic input units; ;;; <@42> edge identifier (used internally in generation) ;;; ;;; of these, the first is maybe most widely supported across DELPH-IN tools, ;;; while the second (in my view) should be deprecated. the third resembles ;;; what was used in VerbMobil, YY, and now LOGON; given that the input to a ;;; `deep' parser can always be viewed as a token lattice, this is probably the ;;; most general mode, and we should aim to establish it over time: first, the ;;; underlying input may not have been string-shaped (but come from the lattice ;;; of a speech recognizer), and second even with one underlying string there ;;; could be token-level ambiguity, so identifying the actual token used in an ;;; analysis preserves more information. properties like the sub-string range, ;;; prosodic information (VerbMobil), or pointers to KB nodes (YY) can all be ;;; associated with the individual tokens sent into the parser. finally, the ;;; fourth mode is used in generation, where surface linking actually is a two- ;;; stage process (see comments in `generate.lsp'). (4-dec-06; oe) ;;; -------------- next part -------------- A non-text attachment was scrubbed... Name: 21.gz Type: application/gzip Size: 1022 bytes Desc: not available URL: From oe at ifi.uio.no Fri Jul 17 13:57:16 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Fri, 17 Jul 2020 13:57:16 +0200 Subject: [developers] consolidating LKB and [incr tsdb()] versions Message-ID: hi john, many thanks, once more, for your continued work on the LKB, and likewise for porting the core of [incr tsdb()] to additional lisp environments! i feel we might want to try and reduce variation across different branches of the code. today, i know of the following three: (0) lkb/trunk (1) logon/lingo/lkb (2) lkb/fos i had originally created the LOGON branch of the code to (a) include some bulky pieces (e.g. language models for realization ranking and associated third-party software) that ann did not want in the LKB trunk and (b) experiment with new or revised functionality (primarily working with the ERG) without affecting the larger LKB community. i have periodically merged back LOGON revisions into the trunk, so if we were to look over #+:logon throughout the code now it should be a good indicator of either (a) or (b). as you activate some of that code in the FOS branch now, in principle we should go back and review (b)-type revisions for general, long-term use. but i suspect that, at least among the subscribers of the list, the LOGON version of the LKB has been used at least as much as the isolated trunk ... hence i am not too worried. since we branched FOS off the trunk a few years back, development in LOGON has continued, whereas i believe no recent changes have been committed to the LKB trunk. 
therefore, i would be tempted to try and merge across these two active LKB branches, and the possibly declare the current trunk a frozen ?dead end?? would you have some time to jointly work on unification of bug fixes and revisions this coming week? inasmuch as the LOGON environment is still used, i would like to incorporate your FOS improvements. and, likewise, i would want my changes (in [incr tsdb()] and maybe MRS or EDS manipulation) exposed to the FOS users. in terms of internal DELPH-IN responsibilities, the current LKB trunk used to be packaged via what we call the LinGO builds (http://lingo.delph-in.net) and the Ubuntu+LKB live CD ( https://wiki.ling.washington.edu/bin/view.cgi/Main/KnoppixLKB). both are maintained by UW, so i am not quite sure about the breadth of their user base? but i am wondering whether these UW builds could move to using the FOS code in the foreseeable future (seeing as i am proposing to officially call an end ? best wishes, oe -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodman.m.w at gmail.com Sat Jul 18 07:58:28 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Sat, 18 Jul 2020 13:58:28 +0800 Subject: [developers] Infrastructure notes Message-ID: Hello, Thanks again to everyone for the interesting discussion about modernizing the DELPH-IN infrastructure at the summit. I was relatively quiet during the discussion, but I have some thoughts below regarding the mailing list. I agree it would be sad if the mailing list went away. Fortunately, there is a new version of Mailman that runs on Python 3 (see https://list.org/). Python's own mailing lists run it (here's an example of python-ideas: https://mail.python.org/archives/list/python-ideas at python.org/). It has some nice features, but I'm not really fond of the archives view compared to the dense thread view of Mailman2 (e.g., http://lists.delph-in.net/archives/developers/2020/). Maybe it can be configured to look more like this? I also wouldn't mind using the Discourse site as our mailing list manager, but the current UW installation is frequently down or not sending out emails. I believe this is an issue particular to the installation and not to Discourse itself. Beyond that, Discourse should not supplant the mailing list unless (a) it can be used entirely via plaintext email, and (b) we can import the existing list archives or find another solution for archival. I'll send a second email about moving from SVN to Git. -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodman.m.w at gmail.com Sat Jul 18 08:11:50 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Sat, 18 Jul 2020 14:11:50 +0800 Subject: [developers] Infrastructure notes: svn to git Message-ID: Hello, this is my second email about infrastructure changes. Since I recently performed the import of the Matrix repository from SVN to Git and had some success (and some failures), I have some suggestions. I used git-svn for this, but some points are general. Also, most points are valid regardless of the host (whether it's GitHub, GitLab, Bitbucket, your own server, etc.). - If your repository encompasses multiple projects, consider taking this opportunity to split them into separate Git repositories. Unlike SVN, a Git repository is easy to move around (on your local disk or to a new host), so there's less reason to one repo for multiple projects. 
- Use the --authors-file option to map SVN usernames to those of the destination host; e.g., if going to GitHub, map to $ username at users.noreply.github.com so their personal email is not exposed. - If your code relies on the presence of empty directories, use --preserve-empty-dirs, as Git doesn't keep empty directories (the option places a dummy file in each empty dir). - If you're *moving* to Git and not mirroring, look into changing --prefix to avoid ambiguous branch names (the default sets up the SVN repo like a remote repository at origin/, which is also used when cloning from other remotes, I think). - Use --stdlayout if your SVN repo has the normal branches/, tags/, and trunk/ split (otherwise use -b, -t, and -T to set these separately). It will recreate the repo with Git's more efficient branching model than creating subdirectories as in SVN. - After the import (especially when moving and not mirroring), create a tag that points to the last commit from SVN. This is mainly in case you later wish to see the state of the repository before the move. - DON'T delete the Git repo you create in this way after it's pushed somewhere else. It contains metadata, which doesn't get pushed to the remote, for reconnecting to the SVN repo. If the SVN repo has new commits after the import, you'll need that metadata to apply them on top of the Git repo (maybe it's possible without the metadata, but I couldn't figure it out). I also have some suggestions for migrating a Trac instance to GitHub Issues, if anyone is dealing with that, but I won't send a separate email unless people are interested. -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From olzama at uw.edu Sat Jul 18 09:59:02 2020 From: olzama at uw.edu (Olga Zamaraeva) Date: Sat, 18 Jul 2020 00:59:02 -0700 Subject: [developers] Infrastructure notes In-Reply-To: References: Message-ID: Hi Michael, > Discourse should not supplant the mailing list unless [...] (b) we can import the existing list archives or find another solution for archival. When I started advocating for a Discourse website---primarily because it is easier to organize figures and code and to mark specific posts as being the solutions to specific questions, which greatly improves discoverability; so, very much not plain text---I was certainly operating under the assumption that such an import is possible ( https://meta.discourse.org/t/importing-mailing-lists-mbox-listserv-google-groups-emails/79773 ). On Fri, Jul 17, 2020 at 10:59 PM goodman.m.w at gmail.com < goodman.m.w at gmail.com> wrote: > Hello, > > Thanks again to everyone for the interesting discussion about modernizing > the DELPH-IN infrastructure at the summit. I was relatively quiet during > the discussion, but I have some thoughts below regarding the mailing list. > > I agree it would be sad if the mailing list went away. Fortunately, there > is a new version of Mailman that runs on Python 3 (see https://list.org/). > Python's own mailing lists run it (here's an example of python-ideas: > https://mail.python.org/archives/list/python-ideas at python.org/). It has > some nice features, but I'm not really fond of the archives view compared > to the dense thread view of Mailman2 (e.g., > http://lists.delph-in.net/archives/developers/2020/). Maybe it can be > configured to look more like this? > > I also wouldn't mind using the Discourse site as our mailing list manager, > but the current UW installation is frequently down or not sending out > emails. 
I believe this is an issue particular to the installation and not > to Discourse itself. Beyond that, Discourse should not supplant the mailing > list unless (a) it can be used entirely via plaintext email, and (b) we can > import the existing list archives or find another solution for archival. > > I'll send a second email about moving from SVN to Git. > > -- > -Michael Wayne Goodman > -- Olga Zamaraeva -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodman.m.w at gmail.com Sat Jul 18 11:13:50 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Sat, 18 Jul 2020 17:13:50 +0800 Subject: [developers] Infrastructure notes In-Reply-To: References: Message-ID: On Sat, Jul 18, 2020 at 3:59 PM Olga Zamaraeva wrote: > > Hi Michael, > > > Discourse should not supplant the mailing list unless [...] (b) we can import the existing list archives or find another solution for archival. > > When I started advocating for a Discourse website---primarily because it is easier to organize figures and code and to mark specific posts as being the solutions to specific questions, which greatly improves discoverability; so, very much not plain text---I was certainly operating under the assumption that such an import is possible ( https://meta.discourse.org/t/importing-mailing-lists-mbox-listserv-google-groups-emails/79773). Thanks, Olga. To clarify on that point, I appreciate and use the features of the web interface, but if the Discourse instance is to replace the mailing list, it should *also* be fully usable by email (by "fully" I mean that people can follow and participate in the conversation as if it were email; not including web-only features like thread tagging or marking posts as solutions). The formatting in the posts, being Markdown, should be legible in plaintext email (although HTML mail is pretty standard these days; maybe I'm not being "modern" enough and should relax that point :). > > On Fri, Jul 17, 2020 at 10:59 PM goodman.m.w at gmail.com < goodman.m.w at gmail.com> wrote: >> >> Hello, >> >> Thanks again to everyone for the interesting discussion about modernizing the DELPH-IN infrastructure at the summit. I was relatively quiet during the discussion, but I have some thoughts below regarding the mailing list. >> >> I agree it would be sad if the mailing list went away. Fortunately, there is a new version of Mailman that runs on Python 3 (see https://list.org/). Python's own mailing lists run it (here's an example of python-ideas: https://mail.python.org/archives/list/python-ideas at python.org/). It has some nice features, but I'm not really fond of the archives view compared to the dense thread view of Mailman2 (e.g., http://lists.delph-in.net/archives/developers/2020/). Maybe it can be configured to look more like this? >> >> I also wouldn't mind using the Discourse site as our mailing list manager, but the current UW installation is frequently down or not sending out emails. I believe this is an issue particular to the installation and not to Discourse itself. Beyond that, Discourse should not supplant the mailing list unless (a) it can be used entirely via plaintext email, and (b) we can import the existing list archives or find another solution for archival. >> >> I'll send a second email about moving from SVN to Git. >> >> -- >> -Michael Wayne Goodman > > > > -- > Olga Zamaraeva -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From arademaker at gmail.com Tue Jul 21 03:46:46 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Mon, 20 Jul 2020 22:46:46 -0300 Subject: [developers] semantic representations in RDF In-Reply-To: References: Message-ID: Hi Stephan, By current interface I mean the one I was able to run in my local machine taking the current version of the code in: http://svn.delph-in.net/wsi/trunk Documentation of the query language WQL in http://alt.qcri.org/semeval2015/task18/index.php?id=search is not clear about the operators vs format they support. I was expecting that regex would work in the predicates of EDS or MRS. So a query `x: _fight*[ARG* y]` could match a sentence with a predicate `_fight_v_1`. Emily, You mentioned that you have an instance of the wsearch interface running too. Are you using the same code of the repository above? Do you know about any update/branch of this code? I am planning to work on: 1. New code (not java based) for transform the semantic representations to RDF 2. New code (not java based) to transform WQL to SPARQL. Best, Alexandre > On 13 Jul 2020, at 21:25, Stephan Oepen wrote: > > hi alexandre, > >> 6) The query language (WQL) documented in http://alt.qcri.org/semeval2015/task18/index.php?id=search and [w4] is not working in the current version of the interface: >> >> Accept => x: _* [ARG* x] >> Reject => x: _fight* [ARG* x] >> Reject => /v[ARG* x] >> Reject => +dog > > what do you actually consider the 'current interface' in this context? > the WQL documentation you reference is from the SDP shared task, so > you would have to try those queries against one of the bi-lexical > formats (e.g. DM :-): the '/' (PoS) and '+' (lemma) operators are only > defined for SDP graphs, i suspect. also, 'v' is not a valid PoS value > (but 'v*' seems to work): > > http://wesearch.delph-in.net/sdp/search.jsp > > see you tomorrow! oe From oe at ifi.uio.no Tue Jul 21 15:48:13 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Tue, 21 Jul 2020 15:48:13 +0200 Subject: [developers] More ERG/Redwoods issues In-Reply-To: References: Message-ID: hi again, mike, > Regarding my second point about unexpected characters in SimpleMRS strings, I tried making PyDelphin more robust to these situations even though I think they should be deemed invalid, but there are some that are simply irredeemable: > > _+-]\?[/NN_u_unknown_rel"<12:18> (wlb03) > > The ] initially threw me off, but even worse is the " after _rel (I included the <12:18> here just for context; note that there is no " at the start of this predicate so this is not a string predicate). I'm not sure how it got there. Maybe an ACE/LKB serialization error? > In addition, I found a problem with a CARG in ws213: > > [ named<37:41> LBL: h16 CARG: "NP\S"" ARG0: x12 ] > > Note that there are two quotation marks at the end of the CARG value. The item it comes from is 1000008400480, which does not have " following NP\S. (The i-input is: This complex category is notated as (NP\\S) instead of V.) i am copying woodley, because the MRSs you are reading most likely come from FFTB (i am also adding the 'developers' list, as surely most folks care about these corner cases). token mapping will allow the grammar to put virtually any character into its predicates, and by and large i would say rightly so (even if not all of the predicate and CARG examples in the above may ultimately be desirable :-). 
thus, MRS serialization may need to be sensitive to different escaping conventions we have (or may yet have to establish), as i have tried to summarize in our related M$ GitHub issue: https://github.com/delph-in/pydelphin/issues/302 > _output_string(?hello/JJ_u_unknown (ws202) > _employee_name/NN_u_unknown (ws203) > > There are _ characters inside the lemma portion of the predicates, which is not allowed. I don't recall if we came up with a scheme for encoding literal underscores in lemmas. yes, i agree token mapping should not construct these predicates! the immediate solution that comes to my mind would be to backslash-escape underscores in the lemma (and sense) fields, which i believe would then bring along escaping of literal backslashes, i.e. in your first example: _output\_string(?hello/JJ_u_unknown. but before guarding against these invalid predicates in token mapping, it would be good to push a little further in terms of cross-platform agreement on these fine points of (simple) MRS serialization. best wishes, oe From goodman.m.w at gmail.com Tue Jul 21 17:50:07 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Tue, 21 Jul 2020 23:50:07 +0800 Subject: [developers] More ERG/Redwoods issues In-Reply-To: References: Message-ID: Thanks, Stephan, On Tue, Jul 21, 2020 at 9:48 PM Stephan Oepen wrote: > [...] > token mapping will allow the grammar to put virtually any character > into its predicates, and by and large i would say rightly so (even if > not all of the predicate and CARG examples in the above may ultimately > be desirable :-). thus, MRS serialization may need to be sensitive to > different escaping conventions we have (or may yet have to establish), > as i have tried to summarize in our related M$ GitHub issue: > > https://github.com/delph-in/pydelphin/issues/302 > > I'm merging some of the more general discussion from the linked GitHub issue to this thread. Regarding the PredicateRfc wiki, I don't think we should read it too literally, as it was not written with the level of rigor as we put into, e.g., the [TdlRfc](http://moin.delph-in.net/TdlRfc) page, and I'd call it more descriptive than prescriptive. But we certainly could improve it to be such a reference document. Regarding the shape of predicates, we need to separate our design considerations for the predicate symbols themselves from any constraints of a particular serialization format, as they may be used, unquoted, in other formats beyond SimpleMRS (e.g. EDS 'native' format, PENMAN, Indexed MRS, etc.) which may have different sets of valid and invalid characters. In an earlier thread we established that predicates of some different forms are equivalent if they differ only along these dimensions: * upper/lower case distinctions (_predicate_n_1 == _PREDICATE_n_1) * surrounding quotes (_predicate_n_1 == "_predicate_n_1") * presence of _rel suffix (_predicate_n_1 == _predicate_n_1_rel) (Aside: I'm not fond of the last one because of the ambiguity with _rel as a sense field (place_n == place_n_rel?); I'd argue for *requiring* that any _rel suffix (that isn't a sense) be removed for grammar-external ("exported") MRSs) I think we can go further and say that quoted predicates are not even part of the spec for predicates; rather, they are an encoding scheme used by several serialization formats for predicates that cannot legally be encoded otherwise. At least, this could be true for exported MRSs. I recognize the historical purpose of quoted predicates for those that don't have a type defined in the grammar. 
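To make those equivalences concrete, here is a small sketch (this is not the PyDelphin implementation; the helper name is made up, and it deliberately ignores the place_n_rel ambiguity from the aside) that reduces a predicate symbol to a canonical comparison form:

def normalize_predicate(pred: str) -> str:
    """Reduce a predicate symbol to a canonical form for comparison."""
    pred = pred.strip()
    if len(pred) >= 2 and pred.startswith('"') and pred.endswith('"'):
        pred = pred[1:-1]            # surrounding quotes are not significant
    pred = pred.lower()              # case is not significant
    if pred.endswith("_rel"):
        pred = pred[:-len("_rel")]   # a trailing _rel suffix is not significant
    return pred

assert normalize_predicate('_PREDICATE_n_1') == '_predicate_n_1'
assert normalize_predicate('"_predicate_n_1"') == '_predicate_n_1'
assert normalize_predicate('_predicate_n_1_rel') == '_predicate_n_1'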
Other serialization formats may use other schemes. In JSON, for instance, predicates are always quoted and they follow JSON escaping conventions. The XML formats allow for "real predicates" that separate the lemma, pos, and sense fields, but they are still bound by XML's encoding conventions. > _output_string(?hello/JJ_u_unknown (ws202) > > _employee_name/NN_u_unknown (ws203) > > > > There are _ characters inside the lemma portion of the predicates, which > is not allowed. I don't recall if we came up with a scheme for encoding > literal underscores in lemmas. > > yes, i agree token mapping should not construct these predicates! the > immediate solution that comes to my mind would be to backslash-escape > underscores in the lemma (and sense) fields, which i believe would > then bring along escaping of literal backslashes, i.e. in your first > example: _output\_string(?hello/JJ_u_unknown. > I have a slight dispreference for backslash-escaping literal underscores, because it complicates parsing. We could no longer simply split on _ characters to get the components, and must parse the predicates character-by-character to determine if the \ that precedes _ is itself escaped, etc. TSDB's strategy might work, using \s or similar. We'd still need to parse it to get the original form, but we can just split on _ to get the individual components. > but before guarding against these invalid predicates in token mapping, > it would be good to push a little further in terms of cross-platform > agreement on these fine points of (simple) MRS serialization. > > best wishes, oe > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Tue Jul 21 18:09:37 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Tue, 21 Jul 2020 13:09:37 -0300 Subject: [developers] First step for the clone of FFTB SVN in GitHub Message-ID: https://github.com/arademaker/treebank This was created with the commands below, following the step-by-step guide at http://www.sailmaker.co.uk/blog/2013/05/05/migrating-from-svn-to-git-preserving-branches-and-tags-3, which is consistent with the git svn documentation.

mkdir treebank
cd treebank
git svn init http://sweaglesw.org/svn/treebank --stdlayout --prefix=svn/
for tag in `git branch -r | grep "tags/" | sed 's/ tags\///'`; do git branch $tag refs/remotes/$tag; done
git svn fetch

The SVN repository (http://sweaglesw.org/svn/treebank/) didn't contain branches, so I have created branches for the two tags I found: `foo` and `packard-2015`. Note that these tags do not look very interesting and maybe they could be removed in the SVN repository. That would make the process even simpler, tracking only the trunk branch. With the repository ready, I created the GitHub repository and pushed to it:

git push -u origin master svn/tags/foo svn/tags/packard-2015

We don't have automation yet; if Woodley updates his SVN, all I need to do is:

git svn fetch
git push --all -u origin

The first command retrieves the news from SVN to my local machine; the second pushes them to GitHub. Eventually, a new tag or branch may be created by Woodley, in which case I may also need to create git branches for them before pushing to GitHub. Since FFTB is not updated very often, I feel the manual solution can work for now, but I wonder whether there is a way to be informed of SVN changes. Does anyone know of a way to subscribe to SVN changes? Hi Michael and Stephan, any comments about the solution above?
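As a stop-gap for the notification question above, one option is simply to poll the repository. The sketch below is purely illustrative (only the repository URL is taken from this message; the function names and the polling interval are invented): it asks `svn info` for the youngest revision and reports whenever the number changes, at which point a `git svn rebase` and push would be due.

import re
import subprocess
import time

REPO = "http://sweaglesw.org/svn/treebank"

def latest_revision(url: str) -> int:
    """Return the youngest revision of a remote SVN repository."""
    out = subprocess.run(["svn", "info", url], capture_output=True,
                         text=True, check=True).stdout
    return int(re.search(r"^Revision:\s*(\d+)", out, re.MULTILINE).group(1))

def watch(url: str, interval: int = 3600) -> None:
    """Print a note whenever the repository gains new revisions."""
    seen = latest_revision(url)
    while True:
        time.sleep(interval)
        current = latest_revision(url)
        if current > seen:
            print(f"{url} moved from r{seen} to r{current}")
            seen = current

if __name__ == "__main__":
    watch(REPO)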
Michael is the only user in the https://github.com/delph-in organization? Can you add me so I can move this repository to the delph-in org? Best, Alexandre From oe at ifi.uio.no Tue Jul 21 20:57:34 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Tue, 21 Jul 2020 20:57:34 +0200 Subject: [developers] semantic representations in RDF In-Reply-To: References: Message-ID: hi again, alexandre: > By current interface I mean the one I was able to run in my local machine taking the current version of the code in: > > http://svn.delph-in.net/wsi/trunk i see, i had not realized you had gotten so far as to run your own WSI instance ... congratulations on that milestone! > Documentation of the query language WQL in http://alt.qcri.org/semeval2015/task18/index.php?id=search is not clear about the operators vs format they support. yes, in fact there is no complete documentation of the WQL syntax and of which operators are restricted to which formats. the above page (the closest we come to WQL documentation, i believe) is from the SDP shared tasks, hence only applies to the bi-lexical frameworks (DM, PAS, PSD, and CCD). > I was expecting that regex would work in the predicates of EDS or MRS. So a query `x: _fight*[ARG* y]` could match a sentence with a predicate `_fight_v_1`. yes, that type of wildcarding should indeed be applicable to pretty much any query elements and graph formats. your example query works on the DeepBank index for ESD: http://wesearch.delph-in.net/deepbank/ it does not match any results when searching the DeepBank MRSs, however. that is because WQL variables in an MRS index are (interpreted as if) typed using the standard MRS conventions, i.e. there is no predication whose label is of type 'x' and where there is some argument of type 'y'. if works if you modify the query to comply with MRS types: 'h:_fight_*[ARG* x]'. > You mentioned that you have an instance of the wsearch interface running too. Are you using the same code of the repository above? Do you know about any update/branch of this code? i believe UW is not currently running their own WSI instance, because they worry that index performance inside a virtual machine might not scale favorably. the improvements made by the UW MSc student are in the WSI trunk, so you (unlike me) are using the latest and greatest :-). $ svn log http://svn.delph-in.net/wsi/trunk |head ------------------------------------------------------------------------ r27878 | rpearah at uw.edu | 2019-05-25 20:30:59 +0200 (Sat, 25 May 2019) | 1 line chore: ? Add missing dependencies to pom.xml ------------------------------------------------------------------------ r27877 | rpearah at uw.edu | 2019-05-25 20:30:54 +0200 (Sat, 25 May 2019) | 1 line style: ? Some minor style changes to MRS representation ------------------------------------------------------------------------ r27804 | rpearah at uw.edu | 2019-05-15 23:08:31 +0200 (Wed, 15 May 2019) | 1 line > 1. New code (not java based) for transform the semantic representations to RDF > 2. New code (not java based) to transform WQL to SPARQL. yes, the first of these is also something i have been meaning to do natively in lisp, i.e. export directly to RDF, rather than export to those [incr tsdb()] ASCII files and then parse these in java, to convert to RDF. i believe we should have turtle 'ontologies' (or schemas, if you will) for the various RDF representations, i.e. at least MRS, EDS, and DM. 
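just to make the non-java route concrete, a minimal sketch with rdflib could look like the following (nothing here is the actual WSI schema: the namespace IRI, the graph name, and the node data are all invented, loosely following the EDS example from earlier in this thread):

from rdflib import Dataset, Literal, Namespace, URIRef

EDS = Namespace("http://example.org/schema/eds#")          # hypothetical schema IRI

ds = Dataset()
item = URIRef("http://example.org/profile/item/1")         # one named graph per item
graph = ds.graph(item)

node = URIRef("http://example.org/profile/item/1#x3")      # node IRI scoped to the item
graph.add((node, EDS.predicate, Literal("_dog_n_1")))
graph.add((node, EDS.cfrom, Literal(4)))
graph.add((node, EDS.cto, Literal(8)))

print(ds.serialize(format="nquads"))

note that scoping the node IRI to the item, as in the sketch, would also address the earlier observation that a bare identifier like 'x3' is conceptually a different node in every sentence of the corpus.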
i am tempted to migrate the WSI code from SVN to M$ GitHub, and then we could maybe collect these schemas there, and you could look into generating the RDF serializations without java? as for the second, the WQL parser is fairly tightly integrated with the web application and RDF back-end ... here i am not as sure that isolating just the parser will be worthwhile? i take it you are about as eager a java person as i am :-)? best wishes, oe From arademaker at gmail.com Tue Jul 21 22:31:55 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Tue, 21 Jul 2020 17:31:55 -0300 Subject: [developers] semantic representations in RDF In-Reply-To: References: Message-ID: Hi Stephan, Thank you for your attention on that thread. I am afraid that we should have more differences between the code running in http://wesearch.delph-in.net/deepbank/search.jsp and the code in the SVN repository http://svn.delph-in.net/wsi/trunk/ that I compile and it is running in my local machine following the steps in http://moin.delph-in.net/WeSearch/Interface. I am attaching the two SPARQL produced by the same search string `x: _fi*[ARG* y]`. In both cases, the query was submitted to the EDS representations. -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: sparql-wesearch.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: sparql-local.txt URL: -------------- next part -------------- Note how in the local instance, the pattern `_fi*` is transformed into an enumeration of the predicates found in the dataset: { ?100 eds:predicate "_fight_n_1"^^xsd:string } UNION { ?100 eds:predicate "_fight_v_1"^^xsd:string } But in the SPARQL on the delph-in.net server, the pattern is transformed into a regex filter regex(?100TEXT, "^_fi.*$?) The same happens when I submitted the query `h:_fi*[ARG* x]` to the MRS representations. For the SVN to Git, if you agree, I can repeat the process that I executed to FFTB (reported in another email today) to create a git clone from the WSI SVN. Maybe if Michael add me in the Delphin-in organization I can already create the repository there. Yes, I agree that we can have ontologies/vocabularies defined to each representation and I could work on that. We could take as starting point the discussion at http://moin.delph-in.net/WeSearch/Rdf, right? There are some notes in the end of the page http://moin.delph-in.net/ErgWeSearch too. But first, I want to understand what updates we have from the 2015 SDP shared-task data formats and the current work you are doing in the http://mrp.nlpl.eu/2020/index.php and https://github.com/cfmrp/mtool. EDS is one particular format that can be described in MRP format, right? We also have the SDP tabular format, does it make sense to support all these formats? If you prefer, we can schedule a call for sync on the goals and possible approaches. For code, yes, I don?t like Java. It would be nice to take the opportunity to better understand the Lisp code embedded in the LKB, TSDB and some other packages in the LOGON repository. The only changed that I made in the code so far is shown below. I am also using apache-jena-3.15.0, the last version of Jena. 
% svn diff Index: src/common-gui/src/main/webapp/WEB-INF/web.xml =================================================================== --- src/common-gui/src/main/webapp/WEB-INF/web.xml (revision 28808) +++ src/common-gui/src/main/webapp/WEB-INF/web.xml (working copy) @@ -18,7 +18,7 @@ no.uio.ifi.wsi.gui.SearchInterface DATA_PATH - /ltg/ls/aserve/indices/sdp/ + /Users/ar/hpsg/text-entailment/data/ 1 Index: src/rdf-generator/src/main/java/no/uio/ifi/wsi/generator/CreateIndex.java =================================================================== --- src/rdf-generator/src/main/java/no/uio/ifi/wsi/generator/CreateIndex.java (revision 28808) +++ src/rdf-generator/src/main/java/no/uio/ifi/wsi/generator/CreateIndex.java (working copy) @@ -27,8 +27,8 @@ CountIndexGenerator generator = new CountIndexGenerator(cmlReader.getCountDirectory()); generator.index(cmlReader.getRdfDirectory()); generator.writeCache(); - runProcess(new String[] { "apache-jena-2.11.0/bin/tdbloader2", "--loc", cmlReader.getTdbDirectory() + "/1", - cmlReader.getRdfDirectory() + "/*" }); + runProcess(new String[] { "apache-jena/bin/tdbloader2", "--loc", cmlReader.getTdbDirectory() + "1", + cmlReader.getRdfDirectory() + "1.nq" }); } public static void runProcess(String[] command) throws Exception { Best, Alexandre > On 21 Jul 2020, at 15:57, Stephan Oepen wrote: > > hi again, alexandre: > >> By current interface I mean the one I was able to run in my local machine taking the current version of the code in: >> >> http://svn.delph-in.net/wsi/trunk > > i see, i had not realized you had gotten so far as to run your own WSI > instance ... congratulations on that milestone! > >> Documentation of the query language WQL in http://alt.qcri.org/semeval2015/task18/index.php?id=search is not clear about the operators vs format they support. > > yes, in fact there is no complete documentation of the WQL syntax and > of which operators are restricted to which formats. the above page > (the closest we come to WQL documentation, i believe) is from the SDP > shared tasks, hence only applies to the bi-lexical frameworks (DM, > PAS, PSD, and CCD). > >> I was expecting that regex would work in the predicates of EDS or MRS. So a query `x: _fight*[ARG* y]` could match a sentence with a predicate `_fight_v_1`. > > yes, that type of wildcarding should indeed be applicable to pretty > much any query elements and graph formats. your example query works > on the DeepBank index for ESD: > > http://wesearch.delph-in.net/deepbank/ > > it does not match any results when searching the DeepBank MRSs, > however. that is because WQL variables in an MRS index are > (interpreted as if) typed using the standard MRS conventions, i.e. > there is no predication whose label is of type 'x' and where there is > some argument of type 'y'. if works if you modify the query to comply > with MRS types: 'h:_fight_*[ARG* x]'. > >> You mentioned that you have an instance of the wsearch interface running too. Are you using the same code of the repository above? Do you know about any update/branch of this code? > > i believe UW is not currently running their own WSI instance, because > they worry that index performance inside a virtual machine might not > scale favorably. the improvements made by the UW MSc student are in > the WSI trunk, so you (unlike me) are using the latest and greatest > :-). 
> > $ svn log http://svn.delph-in.net/wsi/trunk |head > ------------------------------------------------------------------------ > r27878 | rpearah at uw.edu | 2019-05-25 20:30:59 +0200 (Sat, 25 May 2019) | 1 line > > chore: ? Add missing dependencies to pom.xml > ------------------------------------------------------------------------ > r27877 | rpearah at uw.edu | 2019-05-25 20:30:54 +0200 (Sat, 25 May 2019) | 1 line > > style: ? Some minor style changes to MRS representation > ------------------------------------------------------------------------ > r27804 | rpearah at uw.edu | 2019-05-15 23:08:31 +0200 (Wed, 15 May 2019) | 1 line > >> 1. New code (not java based) for transform the semantic representations to RDF >> 2. New code (not java based) to transform WQL to SPARQL. > > yes, the first of these is also something i have been meaning to do > natively in lisp, i.e. export directly to RDF, rather than export to > those [incr tsdb()] ASCII files and then parse these in java, to > convert to RDF. i believe we should have turtle 'ontologies' (or > schemas, if you will) for the various RDF representations, i.e. at > least MRS, EDS, and DM. i am tempted to migrate the WSI code from SVN > to M$ GitHub, and then we could maybe collect these schemas there, and > you could look into generating the RDF serializations without java? > > as for the second, the WQL parser is fairly tightly integrated with > the web application and RDF back-end ... here i am not as sure that > isolating just the parser will be worthwhile? i take it you are about > as eager a java person as i am :-)? > > best wishes, oe From oe at ifi.uio.no Tue Jul 21 22:48:45 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Tue, 21 Jul 2020 22:48:45 +0200 Subject: [developers] semantic representations in RDF In-Reply-To: References: Message-ID: > For the SVN to Git, if you agree, I can repeat the process that I executed to FFTB (reported in another email today) to create a git clone from the WSI SVN. Maybe if Michael add me in the Delphin-in organization I can already create the repository there. actually, please let me manage this migration. rather than putting a git front-end on top of the current SVN repository, i am tempted to use WSI as a full migration pilot, i.e. dump the complete repository history from SVN, import all of it into a fresh git repository, and then host that on M$ GitHub. that way, i can hope to reduce the importance of the DELPH-IN SubVersioN server over time ... cheers, oe ps: regarding differences between your local WSI instance and the one at 'wesearch.delph-in.net': that is quite possible: i am currently running an older version of the code (because, as i confessed during the summit, i have yet to work out how to re-build the application with the latest patches from UW and load that into my tomcat; it appears that you are getting the lucene first-line line, to expand queries and avoid regular expression matching on the triple store, whereas i seem to not be getting that in the cases you observe). 
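for illustration only (this is not the WSI code; the function, variable, and prefix names are invented), the two query shapes at play here can be sketched in a few lines of python: when the string index can expand a wildcard such as '_fight*' into a small set of known predicates, emit a UNION of exact matches, and otherwise fall back to a regular expression filter:

from typing import List, Optional

def predicate_pattern(var: str, pattern: str,
                      expansion: Optional[List[str]] = None) -> str:
    """Return a SPARQL fragment matching ?var against a predicate pattern."""
    if expansion:   # small expansion set obtained from the string index
        return " UNION ".join(
            f'{{ ?{var} eds:predicate "{p}"^^xsd:string }}' for p in expansion)
    # no (or too large an) expansion: match with a regular expression instead
    regex = "^" + pattern.replace("*", ".*") + "$"
    return (f'?{var} eds:predicate ?{var}TEXT . '
            f'FILTER regex(?{var}TEXT, "{regex}")')

print(predicate_pattern("100", "_fight*", ["_fight_n_1", "_fight_v_1"]))
print(predicate_pattern("100", "_fi*"))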
From oe at ifi.uio.no Tue Jul 21 23:44:53 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Tue, 21 Jul 2020 23:44:53 +0200 Subject: [developers] semantic representations in RDF In-Reply-To: References: Message-ID: > Note how in the local instance, the pattern `_fi*` is transformed into an enumeration of the predicates found in the dataset: > > { ?100 eds:predicate "_fight_n_1"^^xsd:string } UNION { ?100 eds:predicate "_fight_v_1"^^xsd:string } > > But in the SPARQL on the delph-in.net server, the pattern is transformed into a regex filter > > regex(?100TEXT, "^_fi.*$?) actually, this kind of expansion (a query optimization, using a first-line lucene index of known strings) appears to be sensitive to the size of the expansion set. i can confirm that (on the reference WSI instance) '_fi*' is matched using a (slow) regular expression (filter), whereas '_fight*' gets expanded; see the attachment. presumably you just have a smaller index in your local instance? the original WSI developer was an experienced enterprise coder, so i am not surprised (but impressed) he implemented it this way: presumably there is a tipping point in efficiency by querying with a disjunction of specific strings vs. filtering candidate matches using a regular expression ... cheers, oe -------------- next part -------------- A non-text attachment was scrubbed... Name: Screenshot from 2020-07-21 23-41-45.png Type: image/png Size: 318658 bytes Desc: not available URL: From arademaker at gmail.com Wed Jul 22 04:50:52 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Tue, 21 Jul 2020 23:50:52 -0300 Subject: [developers] semantic representations in RDF In-Reply-To: References: Message-ID: <31C42B7C-0457-47F2-A97A-1F12ABDC4E30@gmail.com> Sure, even better having the full migration to git/GitHub! Great, I will be waiting you. Regarding the lucene first-line search, thank you for clarifying what is going on. It is really hard to understand the Java code, the logic is spread in many different Java files under a deep nesting of folders! ?Object-oriented programming is an exceptionally bad idea which could only have originated in California.? ? Edsger W. Dijkstra ;-) But isn?t it a kind of optimisation that we should expect from the triple store. I will make some tests with Allegro Graph. Best, Alexandre > On 21 Jul 2020, at 17:48, Stephan Oepen wrote: > >> For the SVN to Git, if you agree, I can repeat the process that I executed to FFTB (reported in another email today) to create a git clone from the WSI SVN. Maybe if Michael add me in the Delphin-in organization I can already create the repository there. > > actually, please let me manage this migration. rather than putting a > git front-end on top of the current SVN repository, i am tempted to > use WSI as a full migration pilot, i.e. dump the complete repository > history from SVN, import all of it into a fresh git repository, and > then host that on M$ GitHub. that way, i can hope to reduce the > importance of the DELPH-IN SubVersioN server over time ... 
> > cheers, oe > > ps: regarding differences between your local WSI instance and the one > at 'wesearch.delph-in.net': that is quite possible: i am currently > running an older version of the code (because, as i confessed during > the summit, i have yet to work out how to re-build the application > with the latest patches from UW and load that into my tomcat; it > appears that you are getting the lucene first-line line, to expand > queries and avoid regular expression matching on the triple store, > whereas i seem to not be getting that in the cases you observe). From goodman.m.w at gmail.com Wed Jul 22 15:51:42 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Wed, 22 Jul 2020 21:51:42 +0800 Subject: [developers] First step for the clone of FFTB SVN in GitHub In-Reply-To: References: Message-ID: Hi Alexandre, On Wed, Jul 22, 2020 at 12:10 AM Alexandre Rademaker wrote: > mkdir treebank > cd treebank > git svn init http://sweaglesw.org/svn/treebank --stdlayout --prefix=svn/ > for tag in `git branch -r | grep "tags/" | sed 's/ tags\///'`; do git > branch $tag refs/remotes/$tag; done > git svn fetch > This looks more or less right, but you didn't link the SVN author(s) to GitHub accounts using --authors-file. Also I think it makes more sense to create Git tags for the SVN tags instead of branches. I also tried importing this repo using GitHub's importer ( https://docs.github.com/en/github/importing-your-projects-to-github/about-github-importer) and it worked great. It's very simple and results in a Git-like repository, but unfortunately it does not keep the SVN tracking information needed for proper mirroring. It would work better for a one-time import. So I suggest recreating your repo as follows. First create the authors file: cat > authors.txt < (no author) = (no author) <(no author)> EOF Then clone (does init and fetch in one command): git svn clone http://sweaglesw.org/svn/treebank --stdlayout --prefix=svn/ --authors-file=authors.txt Then convert remote tag-branches to Git tags: cd treebank git for-each-ref --format="%(refname:lstrip=-1) %(objectname)" refs/remotes/svn/tags | while read ref; do git tag $ref; done And replace the 'master' branch with 'trunk': git checkout -b trunk git branch -d master > The SVN repository (http://sweaglesw.org/svn/treebank/) didn?t contain > branches, so I have created branches for the two tags I found: `foo` and > `packard-2015`. Note that these tags do not look very interesting and maybe > they could be removed in the SVN repository. That would make the process > even simpler, tracking only the trunk branch. > You're probably right about 'foo', but the 'packard-2015' tag points to the revision used for Woodley's thesis and IWCS 2015 paper so I don't think we should discard that one. But maybe it doesn't matter for a mirror since it still exists in the SVN repo? We don?t have automation yet, if Woodley update its SVN, all I need to do > is: > > git svn fetch > git push --all -u origin > When you do `git svn fetch`, it retrieves the commits from the remote SVN repository but doesn't incorporate them into your local tree. Use `git svn rebase` instead. > > Hi Michael and Stephan, any comments about the solution above? Michael is > the only user in the https://github.com/delph-in organization? Can you > add me so I can move this repository to the delph-in org? > Francis invited you to be a member of the organization. 
I just updated that invitation so you'd be added to the "Sweagles" team, which has admin rights to the delph-in/FFTB repository I created. Once you accept the invitation, you can do the following to push your repo: git remote add origin https://github.com/delph-in/FFTB.git git push -u origin --all git push -u origin --tags (I called the remote repo FFTB instead of treebank because it's more distinctive and recognizable, I think.) -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Wed Jul 22 16:26:05 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 22 Jul 2020 11:26:05 -0300 Subject: [developers] First step for the clone of FFTB SVN in GitHub In-Reply-To: References: Message-ID: +1. I agree with Stephan. I have chosen treebank mainly to keep the same name of the original Woodley repository. But I agree with `fftb` (lowercase). Michael, can you give me admin access to https://github.com/delph-in/FFTB? So I can have more flexibility. I will answer the other email from Michael next. Best, Alexandre > On 22 Jul 2020, at 11:17, Stephan Oepen wrote: > >> (I called the remote repo FFTB instead of treebank because it's more distinctive and recognizable, I think.) > > with my over-developed sense of aesthetics, i am wondering about the > use of capitalization. there are 'erg', 'jacy', 'pydelphin', etc. > from before (and the 'JaEn' exception, but francis is of course > special). should the DELPH-IN organization possibly standardize on > all-lowercase repository names? > > oe From goodman.m.w at gmail.com Wed Jul 22 16:57:07 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Wed, 22 Jul 2020 22:57:07 +0800 Subject: [developers] First step for the clone of FFTB SVN in GitHub In-Reply-To: References: Message-ID: Done. fftb it is. On Wed, Jul 22, 2020 at 10:27 PM Alexandre Rademaker wrote: > > +1. I agree with Stephan. I have chosen treebank mainly to keep the same > name of the original Woodley repository. But I agree with `fftb` > (lowercase). Michael, can you give me admin access to > https://github.com/delph-in/FFTB? So I can have more flexibility. > You have admin access because you're in the "Sweagles" team. It may have dropped momentarily when I renamed the repo, but I've re-added it. Try again? > > I will answer the other email from Michael next. > > Best, > Alexandre > > > > On 22 Jul 2020, at 11:17, Stephan Oepen wrote: > > > >> (I called the remote repo FFTB instead of treebank because it's more > distinctive and recognizable, I think.) > > > > with my over-developed sense of aesthetics, i am wondering about the > > use of capitalization. there are 'erg', 'jacy', 'pydelphin', etc. > > from before (and the 'JaEn' exception, but francis is of course > > special). should the DELPH-IN organization possibly standardize on > > all-lowercase repository names? > As long as we don't have to write emails in all-lowercase, then I don't mind :) -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Sat Jul 25 03:06:23 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Fri, 24 Jul 2020 22:06:23 -0300 Subject: [developers] semantic representations in RDF In-Reply-To: References: Message-ID: Hi, I have created a repository for sharing examples and develop the RDF schemas for all DELPH-IN representations. Comments are welcome. This is just initial (and incomplete) ideas. 
I started with one example of EDS, the RDF original representation (produced by the WSI code) and my suggestions to fix and simplify the schema. I could have used the wiki for this discussion, but I thought it would be interesting to try the GitHub features: issues, PR, online editing of files etc. Latter, we can move the documentation to the wiki. https://github.com/arademaker/delph-in-rdf Stephan mentioned that he wants to move the http://moin.delph-in.net/WeSearch/Interface to GitHub. My repository above is NOT about the WSI and can be taken as a temporary place for the discussions of the RDF schemas; I will be waiting for Stephan to start conversations about the implementation of the transformations and possible improvements in the current WSI code. Best, Alexandre > On 21 Jul 2020, at 15:57, Stephan Oepen wrote: > > i am tempted to migrate the WSI code from SVN > to M$ GitHub, and then we could maybe collect these schemas there, and > you could look into generating the RDF serializations without java? From arademaker at gmail.com Tue Jul 28 16:37:20 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Tue, 28 Jul 2020 11:37:20 -0300 Subject: [developers] [sdp-organizers] From EDS/RMS to DM In-Reply-To: <16C87E38-C822-47D2-956F-784A581D5639@gmail.com> References: <29C60769-FC12-4663-8BCF-A1DC52155B8F@gmail.com> <16C87E38-C822-47D2-956F-784A581D5639@gmail.com> Message-ID: Hi Stephan, While processing a sample of the wordnet glosses, the redwoods script produced two invalid .gz files. One example is for the sentence: "a historical region in central and northern Yugoslavia; Serbs settled the region in the 6th and 7th centuries" See the derivation node: (527 #) In the result file of the profile, the derivation node looks fine, the 334.gz is attached. (527 a_det_rbst 0.000000 0 1 ("a" 347 "token [ +FORM \\"a\\" +FROM \\"0\\" +TO \\"1\\" +ID *diff-list* [ LIST *cons* [ FIRST \\"0\\" REST *list* ] LAST *list* ] +TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ +TAG \\"DT\\" +PRB \\"1.0\\" ] ] +CLASS alphabetic [ +CASE non_capitalized+lower +INITIAL + ] +TRAIT token_trait [ +UW - +IT italics +LB bracket_null [ LIST *list* LAST *list* ] +RB bracket_null [ LIST *list* LAST *list* ] +LD bracket_null [ LIST *list* LAST *list* ] +RD bracket_null [ LIST *list* LAST *list* ] +HD token_head [ +TI \\"<0:1>\\" +LL ctype [ -CTYPE- string ] +TG string ] ] +PRED predsort +CARG \\"a\\" +TICK + +ONSET c-or-v-onset ]?)) I am trying to understand the lisp code of redwoods.lisp, but without being able to load it in my slime environment, navigating in the source code and debugging is a nightmare. I know that export-tree is doing more than just copy the derivation tree from the profile, but I didn?t understand what it is doing with the derivations, it is hard to have the `big picture`. BTW, you really like the `loop` macro! ;-) These errors cause the dtm script to fail, although I should not expect it to work with the current trunk version of ERG, dm.cfg was not changed since 2012. 
% svn info etc/dm.cfg Path: etc/dm.cfg Name: dm.cfg Working Copy Root Path: /Users/ar/hpsg/terg URL: http://svn.delph-in.net/erg/trunk/etc/dm.cfg Relative URL: ^/erg/trunk/etc/dm.cfg Repository Root: http://svn.delph-in.net Repository UUID: 3df82f5b-d43a-0410-af33-fce91db48ec5 Revision: 28882 Node Kind: file Schedule: normal Last Changed Author: oe Last Changed Rev: 12172 Last Changed Date: 2012-12-01 18:54:20 -0200 (Sat, 01 Dec 2012) Text Last Updated: 2019-02-07 20:21:10 -0200 (Thu, 07 Feb 2019) Checksum: b8097dfbd5cc9b9d654233314006f8c8b0fcecaa Since my goal is to have at least one bi-lexical format in the WSI interface, I am still trying to understand what the dtm (converter) does. The converter.pdf explains how to use the code, input/output, but it doesn't disclose its logic, the high-level description of the system. Eventually, we can reimplement the dtm using pydelphin (see https://github.com/delph-in/pydelphin/issues/122). The error that I have reported in my previous message when I call redwoods with the dm in `--export input,derivation,mrs,eds,dm` is probably related to what I am showing here since the `dm-construct` function end ups calling the python dtm.py code. Finally, the handling of `:dm` keyword was not copied to the lkb-fos/src/tsdb/lisp/ source code. But I am sure you and John are both aware of that. As always, comments and possible references are welcome! ;-) Best, Alexandre PS: I know that all these errors are expected since, as you said, `I am venturing into unexplored territory` by mixing the ?classic? DELPHIN toolchain with the 'modern tools from the pacific northwest?. Yes, I am processing the profiles with ACE/pydelphin and ?exporting? data (derivation, input, MRS and EDS) from them with redwoods lisp code. But I assume we aim at have interoperability between the tools, right? That is my motivation to keep reporting the errors. Please, correct me if I am wrong. -------------- next part -------------- A non-text attachment was scrubbed... Name: 334.gz Type: application/x-gzip Size: 3753 bytes Desc: not available URL: -------------- next part -------------- From oe at ifi.uio.no Tue Jul 28 18:08:29 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Tue, 28 Jul 2020 18:08:29 +0200 Subject: [developers] [sdp-organizers] From EDS/RMS to DM In-Reply-To: References: <29C60769-FC12-4663-8BCF-A1DC52155B8F@gmail.com> <16C87E38-C822-47D2-956F-784A581D5639@gmail.com> Message-ID: the export code will want to rebuild the derivation, i.e. the version of the grammar loaded needs to be fully compatible with the treebank (or parsed profile). i wonder whether ?a_det_rbst? is available at the time of exporting? it sounds like a mal-configuration of the grammar, maybe? which you would have to match on the LKB side then, e.g. push the right feature or load the right ?script? file? greetings from the road (metaphorically), oe On Tue, 28 Jul 2020 at 16:38 Alexandre Rademaker wrote: > > Hi Stephan, > > While processing a sample of the wordnet glosses, the redwoods script > produced two invalid .gz files. One example is for the sentence: "a > historical region in central and northern Yugoslavia; Serbs settled the > region in the 6th and 7th centuries" > > See the derivation node: > > (527 #) > > In the result file of the profile, the derivation node looks fine, the > 334.gz is attached. 
> > (527 a_det_rbst 0.000000 0 1 ("a" 347 "token [ +FORM \\"a\\" +FROM \\"0\\" > +TO \\"1\\" +ID *diff-list* [ LIST *cons* [ FIRST \\"0\\" REST *list* ] > LAST *list* ] +TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ > +TAG \\"DT\\" +PRB \\"1.0\\" ] ] +CLASS alphabetic [ +CASE > non_capitalized+lower +INITIAL + ] +TRAIT token_trait [ +UW - +IT italics > +LB bracket_null [ LIST *list* LAST *list* ] +RB bracket_null [ LIST *list* > LAST *list* ] +LD bracket_null [ LIST *list* LAST *list* ] +RD bracket_null > [ LIST *list* LAST *list* ] +HD token_head [ +TI \\"<0:1>\\" +LL ctype [ > -CTYPE- string ] +TG string ] ] +PRED predsort +CARG \\"a\\" +TICK + +ONSET > c-or-v-onset ]?)) > > I am trying to understand the lisp code of redwoods.lisp, but without > being able to load it in my slime environment, navigating in the source > code and debugging is a nightmare. I know that export-tree is doing more > than just copy the derivation tree from the profile, but I didn?t > understand what it is doing with the derivations, it is hard to have the > `big picture`. BTW, you really like the `loop` macro! ;-) > > These errors cause the dtm script to fail, although I should not expect it > to work with the current trunk version of ERG, dm.cfg was not changed since > 2012. > > % svn info etc/dm.cfg > Path: etc/dm.cfg > Name: dm.cfg > Working Copy Root Path: /Users/ar/hpsg/terg > URL: http://svn.delph-in.net/erg/trunk/etc/dm.cfg > Relative URL: ^/erg/trunk/etc/dm.cfg > Repository Root: http://svn.delph-in.net > Repository UUID: 3df82f5b-d43a-0410-af33-fce91db48ec5 > Revision: 28882 > Node Kind: file > Schedule: normal > Last Changed Author: oe > Last Changed Rev: 12172 > Last Changed Date: 2012-12-01 18:54:20 -0200 (Sat, 01 Dec 2012) > Text Last Updated: 2019-02-07 20:21:10 -0200 (Thu, 07 Feb 2019) > Checksum: b8097dfbd5cc9b9d654233314006f8c8b0fcecaa > > Since my goal is to have at least one bi-lexical format in the WSI > interface, I am still trying to understand what the dtm (converter) does. > The converter.pdf explains how to use the code, input/output, but it > doesn't disclose its logic, the high-level description of the system. > Eventually, we can reimplement the dtm using pydelphin (see > https://github.com/delph-in/pydelphin/issues/122). The error that I have > reported in my previous message when I call redwoods with the dm in > `--export input,derivation,mrs,eds,dm` is probably related to what I am > showing here since the `dm-construct` function end ups calling the python > dtm.py code. Finally, the handling of `:dm` keyword was not copied to the > lkb-fos/src/tsdb/lisp/ source code. But I am sure you and John are both > aware of that. > > As always, comments and possible references are welcome! ;-) > > Best, > Alexandre > > PS: I know that all these errors are expected since, as you said, `I am > venturing into unexplored territory` by mixing the ?classic? DELPHIN > toolchain with the 'modern tools from the pacific northwest?. Yes, I am > processing the profiles with ACE/pydelphin and ?exporting? data > (derivation, input, MRS and EDS) from them with redwoods lisp code. But I > assume we aim at have interoperability between the tools, right? That is my > motivation to keep reporting the errors. Please, correct me if I am wrong. > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From arademaker at gmail.com Tue Jul 28 21:06:13 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Tue, 28 Jul 2020 16:06:13 -0300 Subject: [developers] [sdp-organizers] From EDS/RMS to DM In-Reply-To: References: <29C60769-FC12-4663-8BCF-A1DC52155B8F@gmail.com> <16C87E38-C822-47D2-956F-784A581D5639@gmail.com> Message-ID: <2E976150-CF4E-4424-A7F1-F2AE093D88CD@gmail.com> > On 28 Jul 2020, at 13:08, Stephan Oepen wrote: > > the export code will want to rebuild the derivation, i.e. the version of the grammar loaded needs to be fully compatible with the treebank (or parsed profile). Do you mean that `redwoods` reads the derivation just to check whether the grammar passed to it as a parameter is compatible with the grammar used to process the profile? So can I bypass this check and simply copy the derivation tree to the .gz file? What does it mean for a grammar to be fully compatible with a profile? Does it mean that the grammar is the same one used to process the profile? > i wonder whether 'a_det_rbst' is available at the time of exporting? it sounds like a mal-configuration of the grammar, maybe? > which you would have to match on the LKB side then, e.g. push the right feature or load the right 'script' file? Yes, you are right. I found this entry in lexicon-rbst.tdl:

a_det_rbst := d_-_sg-a_le_mal &
 [ ORTH < "a" >,
   SYNSEM [ LKEYS.KEYREL.PRED _a_q_rel,
            PHON.ONSET voc ] ].

This file is included in the english.tdl file, and ACE loads ace/config.tdl, which declares english.tdl as the grammar-top. But the LKB loads lkb/script, and it doesn't seem to mention english.tdl, so you are probably right. Unfortunately, I don't know how to make the LKB load the same grammar files that ACE is loading. I suspect this situation is what Michael would like to avoid when he proposed the http://moin.delph-in.net/VirtualSharedConfigs discussion. So far, I was assuming that pointing logon and ACE to the terg trunk would be enough; now I am realising that I wasn't paying attention to the configurations. I hope Dan is reading this thread!! ;-) Maybe an easier solution would be to use the last stable release of the ERG, where lkb/script and ace/config.tdl should be compatible. But my LOGON/lingo/erg/Version.lsp has `(defparameter *grammar-version* "ERG (1214)")`. The LOGON/lingo/terg/Version.lisp has `(defparameter *grammar-version* "ERG (trunk)")`. How do I make LOGON use ERG 2018 instead of 1214? > greetings from the road (metaphorically), oe Thank you. Best, Alexandre From oe at ifi.uio.no Thu Jul 30 13:15:47 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Thu, 30 Jul 2020 13:15:47 +0200 Subject: [developers] [sdp-organizers] From EDS/RMS to DM In-Reply-To: <2E976150-CF4E-4424-A7F1-F2AE093D88CD@gmail.com> References: <29C60769-FC12-4663-8BCF-A1DC52155B8F@gmail.com> <16C87E38-C822-47D2-956F-784A581D5639@gmail.com> <2E976150-CF4E-4424-A7F1-F2AE093D88CD@gmail.com> Message-ID: hi again, alexandre, in general, i used to recommend that most users work with actual ERG releases rather than with whatever state you find in the trunk on a given day (which, after all, is an internal work in progress). from your observations, it sounds as if dan (possibly around his joint work with colleagues at NTU) is experimenting with a mal-configuration of the ERG, and just now at least the default parameterization of the grammar in ACE differs from the defaults in the LKB and PET; that would likely not be the case in a release. from what you describe, i doubt you want the mal-extensions in your parses?
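One quick way to see how many of the recorded results actually use robust (mal) entries is sketched below with PyDelphin; the profile path and the assumption that mal entries share the `_rbst` suffix are illustrative only, not part of the original discussion:

from delphin import itsdb

ts = itsdb.TestSuite('wordnet-profile')          # hypothetical profile path
robust = [row['parse-id'] for row in ts['result']
          if '_rbst' in (row['derivation'] or '')]
print(f'{len(robust)} recorded results use a *_rbst (mal) lexical entry')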
for a grammar to be compatible with a treebank means that it can re-build all derivation trees recorded in the profile. the 'same' grammar will always be compatible, but sometimes it can be desirable to actually improve (or revise) the grammar in ways that do not inhibit re-unification of derivation trees but change the contents of the feature structures and, thus, derived representations like the MRS, EDS, DM, etc. this is one aspect in which we refer to the Redwoods treebanking approach as 'dynamic': the gold-standard HPSG derivation can be output in various derived views. exporting from a treebank is an interpretative process, i.e. there is no way to make it succeed (in how i designed things in [incr tsdb()] at least) without re-building all recorded derivations. arguably, MRSs should not be recorded in the treebanked profiles (they are there in recent ERG releases for convenience). the LOGON 'redwoods' scripts can be forced to always re-compute them, using the '/blind' modifier on its '--export' option. the LOGON environment provides the 'terg' (for trunk or test or trial) target so that users can put a grammar version of their choice there; please see the 'LogonExtras' page on the wiki for details; i expect it should work to 'switch' to the 2018 release of the ERG roughly as follows:

  cd $LOGONROOT/terg
  svn switch $LOGONSVN/erg/tags/2018

once you are in a universe with a grammar (when loaded into the LKB) that matches your treebanked derivations, i would hope that exporting to DM will also become functional? as you note, there is a non-trivial amount of grammar-specific configuration in the DM converter (categorizing different predicates into the various classes distinguished by ivanova et al., 2012), which could lead to sub-optimal results here and there. however, from what i know about the ERG evolution between 1214 and 2018, i believe the MRSs have been comparatively stable, so DM exports from a 2018 treebank should still be decent, i would hope! best wishes, oe On Tue, Jul 28, 2020 at 9:07 PM Alexandre Rademaker wrote: > > > > On 28 Jul 2020, at 13:08, Stephan Oepen wrote: > > > > the export code will want to rebuild the derivation, i.e. the version of the grammar loaded needs to be fully compatible with the treebank (or parsed profile). > > Do you mean that `redwoods` reads the derivation just to check if the grammar passed as parameter to it was compatible with the grammar used to process the profile? So can I bypass this check and simply copy the derivation tree to the .gz file? > > What does it means a grammar be full compatible with a profile? Does it means that the grammar is the same used to process the profile? > > > i wonder whether ?a_det_rbst? is available at the time of exporting? it sounds like a mal-configuration of the grammar, maybe? > > which you would have to match on the LKB side then, e.g. push the right feature or load the right ?script? file? > > Yes, you are right. I found this entry in the lexicon-rbst.tdl: > > a_det_rbst := d_-_sg-a_le_mal & > [ ORTH < "a" >, > SYNSEM [ LKEYS.KEYREL.PRED _a_q_rel, > PHON.ONSET voc ] ]. > > > This file is included in the english.tdl file and ACE loads to the ace/config.tdl that declares english.tdl as the grammar-top. But LKB loads the lkb/script and it doesn?t mentioned the english.tdl? So you are probably right. Unfortunately, I don?t know how to make LKB load the same grammar files that ACE is loading.
> > I suspect this situation is what Michael would like to avoid when he proposed the http://moin.delph-in.net/VirtualSharedConfigs discussion. So far, I was considering that making logon and ACE pointing to the terg trunk would be enough, now I am realising that I wasn?t paying attention to the configurations. > > I hope Dan is reading this thread!! ;-) > > Maybe a easier solution would be to use the last stable release of ERG where lkb/script and ace/config.tdl should be compatible. But my LOGON/lingo/erg/Version.lsp has `(defparameter *grammar-version* "ERG (1214)?)`. The LOGON/lingo/terg/Version.lisp has `(defparameter *grammar-version* "ERG (trunk)?)`. How to make LOGON use ERG 2018 instead of 1214? > > > greetings from the road (metaphorically), oe > > Thank you. > > Best, > Alexandre > From oe at ifi.uio.no Fri Jul 31 18:52:45 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Fri, 31 Jul 2020 18:52:45 +0200 Subject: [developers] consolidating LKB and [incr tsdb()] versions In-Reply-To: References: Message-ID: hi again, john (and all): ann and emily, please read on. > in terms of internal DELPH-IN responsibilities, the current LKB trunk used to be packaged via what we call the LinGO builds (http://lingo.delph-in.net) and the Ubuntu+LKB live CD (https://wiki.ling.washington.edu/bin/view.cgi/Main/KnoppixLKB). both are maintained by UW, so i am not quite sure about the breadth of their user base? but i am wondering whether these UW builds could move to using the FOS code in the foreseeable future (seeing as i am proposing to officially call an end ? i sent the above (incomplete) message somewhat hastily during the summit, so that folks in the LKB (FOS) tutorial could look at it then and there. in the meantime, john and i have started consolidating the FOS and the LOGON branches of the LKB and [incr tsdb()] code bases. we hope to have a unified version available in a week or two, which would mean that LKB and [incr tsdb()] functionality will again be more closely synchronized across the two branches (some functionality will only remain available in the LOGON environment, though, e.g. due to dependencies on external, linux-only binaries). we would like to propose that this new, actively developed version of the LKB and [incr tsdb()] become the 'trunk' in the DELPH-IN SubVersioN repository sometime in late august. the current 'trunk' (where there has not been active development for the past few years) would then be preserved as a tag (or possibly a branch, if there was an expectation of future revisions). at that point, the LinGO builds and Ubuntu+LKB creation (both maintained at UW) would likely need some attention, to either work off the new tag or adapt the build environment to the new version. once the consolidation is complete, i would be tempted to spring-clean the LKB and [incr tsdb()] code bases, seeking to remove 'dead' code, for example sub-modules that have been superseded by newer developments or have long fallen out of use and are not actively maintained. the following candidates for removal come to my mind (but there are likely more):

  src/glue/sppp.lsp (precursor to REPP; oe)
  src/main/ltemplates.lsp (unused since mid-1990s; oe)
  src/mrs/spell.lisp (ERG-specific and long unused; ann)
  src/mrs/mrsmunge.lisp (superseded by transfer rules; ann)
  src/mt/fragments.lisp (oe)
  src/mt/smt.lisp (oe)
  src/preprocess/ (SPPP reimplementation, SAF, SMAF; ben)

i imagine john and i should take a joint pass and compile a more definite list of candidates for code cleaning for review.
in the list above, i have tried to indicate who i believe was the original code owner. ann, hoping you have read this far: would you be okay with some purging of long unused code? everyone else, if you suspect you might be using any of the above sub-modules, please get in touch with john and me! best wishes, oe From aac10 at cl.cam.ac.uk Fri Jul 31 19:25:23 2020 From: aac10 at cl.cam.ac.uk (Ann Copestake) Date: Fri, 31 Jul 2020 18:25:23 +0100 Subject: [developers] consolidating LKB and [incr tsdb()] versions In-Reply-To: References: Message-ID: <2bb168af-0be2-f170-5337-c83881716389@cl.cam.ac.uk> Go for it! The mrsmunge code was still used in a number of individual projects even after the creation of the much more powerful transfer rule mechanism, but (as far as I am concerned) has been entirely superseded by the python code that Alex (and others) wrote to manipulate DMRS. Best wishes, Ann On 31/07/2020 17:52, Stephan Oepen wrote: > hi again, john (and all): > > ann and emily, please read on. > >> in terms of internal DELPH-IN responsibilities, the current LKB trunk used to be packaged via what we call the LinGO builds (http://lingo.delph-in.net) and the Ubuntu+LKB live CD (https://wiki.ling.washington.edu/bin/view.cgi/Main/KnoppixLKB). both are maintained by UW, so i am not quite sure about the breadth of their user base? but i am wondering whether these UW builds could move to using the FOS code in the foreseeable future (seeing as i am proposing to officially call an end ? > i sent the above (incomplete) message somewhat hastily during the > summit, so that folks in the LKB (FOS) tutorial could look at it then > and there. > > in the meantime, john and i have started consolidating the FOS and the > LOGON branches of the LKB and [incr tsdb()] code bases. we hope to > have a unified version available in a week or two, which would mean > that LKB and [incr tsdb()] functionality will again be more closely > synchronized across the two branches (some functionality will only > remain available in the LOGON environment, though, e.g. due to > dependencies on external, linux-only binaries). > > we would like to propose that this new, actively developed version of > the LKB and [incr tsdb()] become the 'trunk' in the DELPH-IN > SubVersioN repository sometime in late august. the current 'trunk' > (where there has not been active development for the past few years) > would then be preserved as a tag (or possibly a branch, if there was > an expectation of future revisions). at that point, the LinGO builds > and Ubuntu+LKB creation (both maintained at UW) would likely need some > attention, to either work off the new tag or adapt the build > environment to the new version. > > once the consolidation is complete, i would be tempted to spring-clean > the LKB and [incr tsdb()] code bases, seeking to remove 'dead' code, > for example sub-modules that have been superseded by newer > developments or have long fallen out of use and are not actively > maintained. the following candidates for removal come to my mind (but > there are likely more): > > src/glue/sppp.lsp (precursor to REPP; oe) > src/main/ltemplates.lsp (unused since mid-1990s; oe) > src/mrs/spell.lisp (ERG-specific and long unused; ann) > src/mrs/mrsmunge.lisp (superseded by transfer rules; ann) > src/mt/fragments.lisp (oe) > src/mt/smt.lisp (oe) > src/preprocess/ (SPPP reimplementation, SAF, SMAF; ben) > > i imagine john and i should take a joint pass and compile a more > definite list of candidates for code cleaning for review. 
in the list > above, i have tried to indicate who i believe was the original code > owner. ann, hoping you have read this far: would you be okay with > some purging of long unused code? everyone else, if you suspect you > might be using any of the above sub-modules, please get in touch with > john and me! > > best wishes, oe > > From oe at ifi.uio.no Sun Aug 2 14:44:21 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Sun, 2 Aug 2020 14:44:21 +0200 Subject: [developers] extension to the REPP sub-formalism Message-ID: dear bec, mike, and woodley: during the summit you may have noticed dan mentioning a 'war zone' around NE-related token mapping rules in the current ERG trunk. with our move to modern, OntoNotes-style tokenization, the initial REPP segmentation now breaks at dashes (including hyphens) and slashes. but these will, of course, occur frequently in named entities like email and web addresses, where they should preferably not be segmented. the current unhappy state of affairs is that initial tokenization over-segments, with dan then heroically seeking to re-unite at least the most common patterns of 'multi-token' named entities in token mapping, where any number of token boundaries may have been introduced at hyphens and slashes. to rationalize this state of affairs (and, thus, work toward a peace treaty in token mapping), i believe we will need to extend the REPP language with a new facility: masking sub-strings according to NE-like patterns prior to core REPP processing, and exempting masked regions from all subsequent rewriting (i.e. making sure they remain intact). i have added an example of this new facility (introducing the '+' operator) to the ERG trunk; please see: http://svn.delph-in.net/erg/trunk/rpp/ne.rpp at present, these rules are only loaded into the LKB (where i am in the process of adding masking to the REPP implementation), hence they should not cause trouble in the other engines (i hope). i would like to invite you (as the developers of REPP processors in PET, pyDelphin, and ACE, respectively) to look over this proposal and share any comments you might have. assuming we can agree on the need for extending the REPP language along the above lines, i am hoping you might have a chance to add support for the masking operator in your REPP implementations? from my ongoing work in the LKB, masking support appears relatively straightforward once an engine implements the step-wise accounting for character position sketched by Dridan & Oepen (2012; ACL). the masking patterns merely set a boolean flag for the matched character positions, and subsequent rewriting must block rule applications that destructively change one or more masked character positions. output of capture groups (copying from the left-hand side verbatim), on the other hand, must be allowed over masked regions. because the LKB implementation predates the 2012 paper, however, i will first have to implement the precise accounting mechanism to validate the above expectation regarding how to realize masking. what do you make of the above proposal? oe From sweaglesw at sweaglesw.org Mon Aug 3 09:26:21 2020 From: sweaglesw at sweaglesw.org (Woodley Packard) Date: Mon, 3 Aug 2020 00:26:21 -0700 Subject: [developers] extension to the REPP sub-formalism In-Reply-To: References: Message-ID: <8F548C32-1BCF-474F-BD82-67B39B322E8E@sweaglesw.org> Hi Stephan, It looks from the file you referenced like the proposed new operation is '=' rather than '+'? 
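As a side note for readers following the thread: with that correction, a masking rule would presumably look like a '!' rewrite rule without the replacement part, i.e. the '=' operator followed by a regular expression whose matches are protected from later rewriting. The rule below is only an illustrative toy, much cruder than the actual patterns in rpp/ne.rpp; it masks simple email-like strings so that subsequent '!' rules cannot segment them:

=[^ ]+@[^ ]+\.[A-Za-z]+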
This seems like a plausible and modest addition to me, and should not be hard to implement. I guess you will be limited to using this facility in cases where the designation as named entity is sufficiently unambiguous based on the RE alone. It is tempting to contemplate ways in which REPP could offer ambiguous tokenization output here, but so far my imagination is too limited to come up with the scenario where it would be useful. Woodley > On Aug 2, 2020, at 5:44 AM, Stephan Oepen wrote: > > dear bec, mike, and woodley: > > during the summit you may have noticed dan mentioning a 'war zone' > around NE-related token mapping rules in the current ERG trunk. with > our move to modern, OntoNotes-style tokenization, the initial REPP > segmentation now breaks at dashes (including hyphens) and slashes. > but these will, of course, occur frequently in named entities like > email and web addresses, where they should preferably not be > segmented. the current unhappy state of affairs is that initial > tokenization over-segments, with dan then heroically seeking to > re-unite at least the most common patterns of 'multi-token' named > entities in token mapping, where any number of token boundaries may > have been introduced at hyphens and slashes. > > to rationalize this state of affairs (and, thus, work toward a peace > treaty in token mapping), i believe we will need to extend the REPP > language with a new facility: masking sub-strings according to NE-like > patterns prior to core REPP processing, and exempting masked regions > from all subsequent rewriting (i.e. making sure they remain intact). > i have added an example of this new facility (introducing the '+' > operator) to the ERG trunk; please see: > > http://svn.delph-in.net/erg/trunk/rpp/ne.rpp > > at present, these rules are only loaded into the LKB (where i am in > the process of adding masking to the REPP implementation), hence they > should not cause trouble in the other engines (i hope). i would like > to invite you (as the developers of REPP processors in PET, pyDelphin, > and ACE, respectively) to look over this proposal and share any > comments you might have. assuming we can agree on the need for > extending the REPP language along the above lines, i am hoping you > might have a chance to add support for the masking operator in your > REPP implementations? > > from my ongoing work in the LKB, masking support appears relatively > straightforward once an engine implements the step-wise accounting for > character position sketched by Dridan & Oepen (2012; ACL). the > masking patterns merely set a boolean flag for the matched character > positions, and subsequent rewriting must block rule applications that > destructively change one or more masked character positions. output > of capture groups (copying from the left-hand side verbatim), on the > other hand, must be allowed over masked regions. because the LKB > implementation predates the 2012 paper, however, i will first have to > implement the precise accounting mechanism to validate the above > expectation regarding how to realize masking. > > what do you make of the above proposal? oe From goodman.m.w at gmail.com Mon Aug 3 09:35:06 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Mon, 3 Aug 2020 15:35:06 +0800 Subject: [developers] extension to the REPP sub-formalism In-Reply-To: References: Message-ID: Hi Stephan, This sounds like a good solution. I have some questions/comments below. On Sun, Aug 2, 2020 at 8:44 PM Stephan Oepen wrote: > [...] 
> to rationalize this state of affairs (and, thus, work toward a peace > treaty in token mapping), i believe we will need to extend the REPP > language with a new facility: masking sub-strings according to NE-like > patterns prior to core REPP processing, and exempting masked regions > from all subsequent rewriting (i.e. making sure they remain intact). > Ok, so if I understood correctly, masking is not sequential like rewrite rules, and happens before the rewrite rules regardless of where the mask pattern appears in the file (just as the tokenization pattern is applied after the rewrite rules), and the order of application of the mask patterns doesn't matter. I first wish to discuss mask pattern discovery, and this cross-cuts with some other unclear areas of the REPP specification. To recap, REPP has sequential operators ('!' rewrite rule, '<' file include, and '>' group call) which apply in order during processing, and non-sequential operators ('#' iterative group definition, ':' tokenizer pattern, '@' meta-info declaration) which do not apply except in certain circumstances (iterative groups when they are called, tokenization after all rewrite rules have applied). Non-sequential operators also have these two properties: 1. They may only be defined once in a REPP (once per identifier for iterative groups) 2. They are local to a REPP instance (an iterative group or tokenizer pattern in an external module is not available to other modules) (These are partially guesses; I've raised an issue for PyDelphin to resolve related questions so they don't distract from the current topic: https://github.com/delph-in/pydelphin/issues/308) The masking rules are non-sequential, but (1) clearly doesn't apply, and (2) doesn't seem to apply in your proposal since ne.rpp is a submodule. At first my reaction was to vote for starting simple and using masks defined in the top-level module only (like the tokenizer), but I can see the value in having them spread across submodules: a submodule may define rewrite rules that require additional masks that are only needed when the module is active. So if we allow submodules to define these global masks, I guess we need to collect any mask pattern found by crawling active submodules. The non-sequential but global nature raises an issue: what if a submodule containing a mask is active (e.g., set in *repp-calls* in the LKB) but is not actually called with a group-call (i.e., if `>ne` did not appear in tokenizer.rpp)? > i have added an example of this new facility (introducing the '+' > operator) to the ERG trunk; please see: > > http://svn.delph-in.net/erg/trunk/rpp/ne.rpp > As an aside, that email regex is needlessly complicated. Since, in a unicode-aware regex engine, the word-character class \w is equivalent to the L and N unicode properties with the underscore ([\p{L}\p{N}_]), and since the TLD part of the domain must have only ascii characters, it can be simplified as follows: ? Either way it's not RFC5322 compatible but I imagine in running text you want to match addresses that may be displayed with unicode codepoints. > [...] the masking patterns merely set a boolean flag for the matched > character > positions, and subsequent rewriting must block rule applications that > destructively change one or more masked character positions. output > of capture groups (copying from the left-hand side verbatim), on the > other hand, must be allowed over masked regions. 
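The boolean-flag bookkeeping described in that passage can be illustrated with a few lines of Python; this is only a sketch of the idea, not any engine's actual implementation, and the helper names are invented for the example (it also deliberately ignores the capture-group exemption and the adjacency question taken up immediately below):

import re

def mask_positions(text, mask_patterns):
    # one boolean per character: True where some mask pattern matched
    masked = [False] * len(text)
    for pattern in mask_patterns:          # e.g. the patterns of '=' rules
        for m in re.finditer(pattern, text):
            for i in range(m.start(), m.end()):
                masked[i] = True
    return masked

def rewrite_allowed(masked, start, end):
    # a rewrite that destructively touches any masked position is blocked
    return not any(masked[start:end])

text = "mail oe@yy.com today"
masked = mask_positions(text, [r"[^ ]+@[^ ]+"])
print(rewrite_allowed(masked, 5, 14))   # False: the address is protected
print(rewrite_allowed(masked, 0, 4))    # True: 'mail' may still be rewritten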
That makes sense, but we may need a different mechanism than just boolean flags because of the possibility of immediately adjacent masked regions looking like one solid region when we should allow material to be inserted between them. Instead, an IOB scheme (like in chunking) or similar would be better. There's also the question of overlapping masks (viz., when a mask pattern matches a sequence that is already part of another mask). The IOB vector would not accommodate these as separate, overlapping masks, so we could (1) ignore overlapping matches, (2) union them (and update the IOB values accordingly), or (3) use a different data structure such as a list of mask start-positions and run-lengths. Currently I like option (2). Finally, do we want to block rewrite rules where a capture group starts or ends within a mask? I can imagine multiple capture groups that collectively copy the entire masked region without alteration. I think this situation wouldn't be too bad if we just check that the before and after masked substrings have the same contents *and* the characterization is constant (the same offset for the whole mask). This means the following would pass because reinserting a single non-captured character doesn't change the characterization: !(?) \1@\2 But the following would change the characterization at the end and would thus be blocked: !(?) \1.com\2 Also, generally speaking, I can see this functionality having potential to reduce the need for special casing of things beyond named entities. Currently the ERG has 12 lexical entries for "email" ("e-mail", "e - mail", "e mail", nouns and verbs) and some of the orthographic variation seems to account for tokenization effects. Is there any reason it should not be used in these cases? -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From sweaglesw at sweaglesw.org Mon Aug 3 18:51:26 2020 From: sweaglesw at sweaglesw.org (Woodley Packard) Date: Mon, 3 Aug 2020 09:51:26 -0700 Subject: [developers] extension to the REPP sub-formalism In-Reply-To: References: Message-ID: <63074D83-5697-438A-81C9-749ACBF88246@sweaglesw.org> Mike, what makes you think the masking operator should float to the top of the execution order instead of applying in the position it is written? I expect it may be useful to apply some degree of normalization before masking applies, and if the desire is for masking first the author can always put it first. Woodley > On Aug 3, 2020, at 12:35 AM, "goodman.m.w at gmail.com" wrote: > > ? > Hi Stephan, > > This sounds like a good solution. I have some questions/comments below. > >> On Sun, Aug 2, 2020 at 8:44 PM Stephan Oepen wrote: >> [...] >> to rationalize this state of affairs (and, thus, work toward a peace >> treaty in token mapping), i believe we will need to extend the REPP >> language with a new facility: masking sub-strings according to NE-like >> patterns prior to core REPP processing, and exempting masked regions >> from all subsequent rewriting (i.e. making sure they remain intact). > > Ok, so if I understood correctly, masking is not sequential like rewrite rules, and happens before the rewrite rules regardless of where the mask pattern appears in the file (just as the tokenization pattern is applied after the rewrite rules), and the order of application of the mask patterns doesn't matter. > > I first wish to discuss mask pattern discovery, and this cross-cuts with some other unclear areas of the REPP specification. 
To recap, REPP has sequential operators ('!' rewrite rule, '<' file include, and '>' group call) which apply in order during processing, and non-sequential operators ('#' iterative group definition, ':' tokenizer pattern, '@' meta-info declaration) which do not apply except in certain circumstances (iterative groups when they are called, tokenization after all rewrite rules have applied). Non-sequential operators also have these two properties: > > 1. They may only be defined once in a REPP (once per identifier for iterative groups) > 2. They are local to a REPP instance (an iterative group or tokenizer pattern in an external module is not available to other modules) > > (These are partially guesses; I've raised an issue for PyDelphin to resolve related questions so they don't distract from the current topic: https://github.com/delph-in/pydelphin/issues/308) > > The masking rules are non-sequential, but (1) clearly doesn't apply, and (2) doesn't seem to apply in your proposal since ne.rpp is a submodule. At first my reaction was to vote for starting simple and using masks defined in the top-level module only (like the tokenizer), but I can see the value in having them spread across submodules: a submodule may define rewrite rules that require additional masks that are only needed when the module is active. > > So if we allow submodules to define these global masks, I guess we need to collect any mask pattern found by crawling active submodules. The non-sequential but global nature raises an issue: what if a submodule containing a mask is active (e.g., set in *repp-calls* in the LKB) but is not actually called with a group-call (i.e., if `>ne` did not appear in tokenizer.rpp)? > >> i have added an example of this new facility (introducing the '+' >> operator) to the ERG trunk; please see: >> >> http://svn.delph-in.net/erg/trunk/rpp/ne.rpp > > As an aside, that email regex is needlessly complicated. Since, in a unicode-aware regex engine, the word-character class \w is equivalent to the L and N unicode properties with the underscore ([\p{L}\p{N}_]), and since the TLD part of the domain must have only ascii characters, it can be simplified as follows: > > ? > > Either way it's not RFC5322 compatible but I imagine in running text you want to match addresses that may be displayed with unicode codepoints. > >> [...] the masking patterns merely set a boolean flag for the matched character >> positions, and subsequent rewriting must block rule applications that >> destructively change one or more masked character positions. output >> of capture groups (copying from the left-hand side verbatim), on the >> other hand, must be allowed over masked regions. > > That makes sense, but we may need a different mechanism than just boolean flags because of the possibility of immediately adjacent masked regions looking like one solid region when we should allow material to be inserted between them. Instead, an IOB scheme (like in chunking) or similar would be better. > > There's also the question of overlapping masks (viz., when a mask pattern matches a sequence that is already part of another mask). The IOB vector would not accommodate these as separate, overlapping masks, so we could (1) ignore overlapping matches, (2) union them (and update the IOB values accordingly), or (3) use a different data structure such as a list of mask start-positions and run-lengths. Currently I like option (2). > > Finally, do we want to block rewrite rules where a capture group starts or ends within a mask? 
I can imagine multiple capture groups that collectively copy the entire masked region without alteration. I think this situation wouldn't be too bad if we just check that the before and after masked substrings have the same contents *and* the characterization is constant (the same offset for the whole mask). This means the following would pass because reinserting a single non-captured character doesn't change the characterization: > > !(?) \1@\2 > > But the following would change the characterization at the end and would thus be blocked: > > !(?) \1.com\2 > > Also, generally speaking, I can see this functionality having potential to reduce the need for special casing of things beyond named entities. Currently the ERG has 12 lexical entries for "email" ("e-mail", "e - mail", "e mail", nouns and verbs) and some of the orthographic variation seems to account for tokenization effects. Is there any reason it should not be used in these cases? > > -- > -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From oe at ifi.uio.no Mon Aug 3 18:52:04 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Mon, 3 Aug 2020 18:52:04 +0200 Subject: [developers] extension to the REPP sub-formalism In-Reply-To: References: Message-ID: hi again, mike, and many thanks for the quick response! > Ok, so if I understood correctly, masking is not sequential like rewrite rules, and happens before the rewrite rules regardless of where the mask pattern appears in the file (just as the tokenization pattern is applied after the rewrite rules), and the order of application of the mask patterns doesn't matter. that is in fact not what i had intended. i would like the masking rules to follow the standard sequential flow of control in REPP, i.e. they get invoked when the processor gets to that point in the rule sequence. for full generality, i imagine one might want to allow some string-level normalization prior to mask invocation. the effects of a successful mask matching will be valid from that point in the processing sequence onwards. on this view, i believe your clarification questions (1) and (2) do not apply, right? > That makes sense, but we may need a different mechanism than just boolean flags because of the possibility of immediately adjacent masked regions looking like one solid region when we should allow material to be inserted between them. Instead, an IOB scheme (like in chunking) or similar would be better. indeed, that is a good point (that i had not yet considered). yes, destructive rewriting inbetween two adjacent masking regions must be allowed. > There's also the question of overlapping masks (viz., when a mask pattern matches a sequence that is already part of another mask). The IOB vector would not accommodate these as separate, overlapping masks, so we could (1) ignore overlapping matches, (2) union them (and update the IOB values accordingly), or (3) use a different data structure such as a list of mask start-positions and run-lengths. Currently I like option (2). yes, your option (2) sounds like the most straightforward solution, both in terms of specifying the expected behavior and implementing it. the alternative would be not to allow overlapping mask matching, but to me too it seems conceptually simplest (for REPP users and implementers alike) to not restrict mask matching and union overlapping matches. > Finally, do we want to block rewrite rules where a capture group starts or ends within a mask? 
I can imagine multiple capture groups that collectively copy the entire masked region without alteration. I think this situation wouldn't be too bad if we just check that the before and after masked substrings have the same contents *and* the characterization is constant (the same offset for the whole mask). i am not quite sure what exactly you have in mind here regarding constant characterization (masked sub-strings can be shifted to the left or the right, but their length and content must not change)? my original assumption was to just disallow rewriting without capture groups inside (or overlapping with) a masked region. this feels like a simple and clear constraint to me. on this view, two adjacent capture groups that cover (at least) the complete masked region would be fine, but even single-character identity rewriting (as in your '@' example) should be blocked. i fail to see a compelling need for that kind of rewriting in the first place, and i would like to not complicate masking support too much. i imagine it might be relatively straightforward to evaluate rewriting conditions while synthesizing the output (i.e. while processing the right-hand side of a rule), interleaved with the character-level accounting. i have started to extend ReppTop on the wiki with a section on masking, though some of the fine points of this thread have yet to be (decided and) written down. thanks, once more, for pushing towards more specificity! > Also, generally speaking, I can see this functionality having potential to reduce the need for special casing of things beyond named entities. Currently the ERG has 12 lexical entries for "email" ("e-mail", "e - mail", "e mail", nouns and verbs) and some of the orthographic variation seems to account for tokenization effects. Is there any reason it should not be used in these cases? well, yes, i too wonder at times whether accommodation of typographic variation could be reduced in the ERG lexicon :-). this is a tricky game, i fear. in part because what is in the lexicon (in some cases) seeks to cover both common conventions and common deviations, in part because there have been some usage scenarios for the ERG without going through the REPP layer (i.e. when parsing pre-tokenized or otherwise externally tokenized inputs). for the above example, i imagine (at least if assuming REPP tokenization) one could hope to make do without the three-token |e - mail| lexical entry (by masking |e-mail|), whereas the other variants likely are required. but such masking could be said to duplicate specific lexical information in the REPP rules, so maybe one would rather want to not require the |e-mail| entry? best wishes, oe From oe at ifi.uio.no Mon Aug 3 18:58:47 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Mon, 3 Aug 2020 18:58:47 +0200 Subject: [developers] extension to the REPP sub-formalism In-Reply-To: <8F548C32-1BCF-474F-BD82-67B39B322E8E@sweaglesw.org> References: <8F548C32-1BCF-474F-BD82-67B39B322E8E@sweaglesw.org> Message-ID: hi woodley, > It looks from the file you referenced like the proposed new operation is '=' rather than '+'? yes, sorry, my typo in the email! > I guess you will be limited to using this facility in cases where the designation as named entity is sufficiently unambiguous based on the RE alone. It is tempting to contemplate ways in which REPP could offer ambiguous tokenization output here, but so far my imagination is too limited to come up with the scenario where it would be useful. 
indeed, the intended use for masking would be for (near-)certain patterns; in principle, one could further split and ambiguate in token mapping then. in the REPP predecessor, there was some contemplation of string-level rewriting over a token lattice, but with the introduction of token mapping we more than happily purged that complexity from the initial tokenizer. i have grown fond of the current division of labor, with a simple, sequence-to-sequence initial step (which should be limited to straightforward string-level processing), the ability to call out to external processors (like a PoS tagger) with that simple sequence, and deferring lattice processing to the second stage of preprocessing, where we can manipulate structured token objects ... glad to hear you expect REPP masking should not be hard to implement; i have yet to find out whether i share that optimistic expectation on the LKB side :-). oe From goodman.m.w at gmail.com Tue Aug 4 04:11:25 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Tue, 4 Aug 2020 10:11:25 +0800 Subject: [developers] extension to the REPP sub-formalism In-Reply-To: References: Message-ID: On Tue, Aug 4, 2020 at 12:52 AM Stephan Oepen wrote: > hi again, mike, and many thanks for the quick response! > > > Ok, so if I understood correctly, masking is not sequential like rewrite > rules, and happens before the rewrite rules regardless of where the mask > pattern appears in the file (just as the tokenization pattern is applied > after the rewrite rules), and the order of application of the mask patterns > doesn't matter. > > that is in fact not what i had intended. i would like the masking > rules to follow the standard sequential flow of control in REPP, i.e. > they get invoked when the processor gets to that point in the rule > sequence. for full generality, i imagine one might want to allow some > string-level normalization prior to mask invocation. the effects of a > successful mask matching will be valid from that point in the > processing sequence onwards. > Sorry, I misinterpreted what you meant by "masking sub-strings [...] prior to core REPP processing" in the original email. on this view, i believe your clarification questions (1) and (2) do > not apply, right? > Correct, although my related questions (in the GitHub issue) still stand. We can deal with those later. [...] > > > Finally, do we want to block rewrite rules where a capture group starts > or ends within a mask? I can imagine multiple capture groups that > collectively copy the entire masked region without alteration. I think this > situation wouldn't be too bad if we just check that the before and after > masked substrings have the same contents *and* the characterization is > constant (the same offset for the whole mask). > > i am not quite sure what exactly you have in mind here regarding > constant characterization (masked sub-strings can be shifted to the > left or the right, but their length and content must not change)? By "the same offset for the whole mask" I am referring to the start and end positions that are tracked for each character. The offset itself may change (indicating the masked region shifting left or right), but all start and end offsets within a masked region must be the same offset, otherwise it indicates that the length has changed or that content has been replaced. > my > original assumption was to just disallow rewriting without capture > groups inside (or overlapping with) a masked region. this feels like > a simple and clear constraint to me. 
on this view, two adjacent > capture groups that cover (at least) the complete masked region would > be fine, but even single-character identity rewriting (as in your '@' > example) should be blocked. i fail to see a compelling need for that > kind of rewriting in the first place, and i would like to not > complicate masking support too much. i imagine it might be relatively > straightforward to evaluate rewriting conditions while synthesizing > the output (i.e. while processing the right-hand side of a rule), > interleaved with the character-level accounting. > I agree that these cases are extremely unlikely. I think that being too permissive with these seemingly trivial decisions can lead to unexpected bugs later. For instance, if we allow multiple capture groups to piece together the original masked string and we oversee the rewriting to ensure it hasn't changed, these might cause problems, depending on implementation:

; mask "abc"
=abc

; full mask is captured and rewritten contiguously, but string and offsets change
!(a)(b)(c) \2\1\3

; full mask is captured, only part is written
!(a(b)(c)) \2\3

; full mask is captured and rewritten contiguously, but 'b' is duplicated
!(a(b))(c) \1\2\3

I feel that the analysis of the regex on the left and the template on the right to ensure that the full masked substring is recreated contiguously, completely, and in order is an overly-complicated solution. Perhaps when I write this code I'll see something that makes it easy to compute. But barring that, I proposed using post-rule-application checks on having uniform start/end offsets in each mask and that the contents of those substrings is identical to the corresponding pre-rule-application substrings. These checks alone would not block the '@' example only as a side effect, because replacing a single non-captured character does not break the uniformity of the offsets (and in this case the string didn't change, either). When 2 or more non-captured characters are replaced, the offsets become non-uniform, even if the replaced characters are identical to the input. The '@' example could probably be blocked with a third check that no non-captured material is inserted in a mask; at least, this sounds much simpler than tracking the captured groups. The alternative where a rewrite rule is blocked if capture groups begin or end within a mask sounds like a special case that would be confusing for a grammar developer not familiar with the full REPP specification. > [...] > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From bec.dridan at gmail.com Wed Aug 5 13:24:39 2020 From: bec.dridan at gmail.com (Bec Dridan) Date: Wed, 5 Aug 2020 21:24:39 +1000 Subject: [developers] extension to the REPP sub-formalism In-Reply-To: References: Message-ID: It's a _loong_ time since I looked at that code (or used svn...). I've been refreshing my memory of the code, and I think I can see how that works. As a mechanism, it sounds reasonable, but it's going to be a long time before I'd have time to sit down and try to make the change. More than happy for anyone else to take up the challenge :) Bec On Sun, Aug 2, 2020 at 10:44 PM Stephan Oepen wrote: > dear bec, mike, and woodley: > > during the summit you may have noticed dan mentioning a 'war zone' > around NE-related token mapping rules in the current ERG trunk. with > our move to modern, OntoNotes-style tokenization, the initial REPP > segmentation now breaks at dashes (including hyphens) and slashes.
> but these will, of course, occur frequently in named entities like > email and web addresses, where they should preferably not be > segmented. the current unhappy state of affairs is that initial > tokenization over-segments, with dan then heroically seeking to > re-unite at least the most common patterns of 'multi-token' named > entities in token mapping, where any number of token boundaries may > have been introduced at hyphens and slashes. > > to rationalize this state of affairs (and, thus, work toward a peace > treaty in token mapping), i believe we will need to extend the REPP > language with a new facility: masking sub-strings according to NE-like > patterns prior to core REPP processing, and exempting masked regions > from all subsequent rewriting (i.e. making sure they remain intact). > i have added an example of this new facility (introducing the '+' > operator) to the ERG trunk; please see: > > http://svn.delph-in.net/erg/trunk/rpp/ne.rpp > > at present, these rules are only loaded into the LKB (where i am in > the process of adding masking to the REPP implementation), hence they > should not cause trouble in the other engines (i hope). i would like > to invite you (as the developers of REPP processors in PET, pyDelphin, > and ACE, respectively) to look over this proposal and share any > comments you might have. assuming we can agree on the need for > extending the REPP language along the above lines, i am hoping you > might have a chance to add support for the masking operator in your > REPP implementations? > > from my ongoing work in the LKB, masking support appears relatively > straightforward once an engine implements the step-wise accounting for > character position sketched by Dridan & Oepen (2012; ACL). the > masking patterns merely set a boolean flag for the matched character > positions, and subsequent rewriting must block rule applications that > destructively change one or more masked character positions. output > of capture groups (copying from the left-hand side verbatim), on the > other hand, must be allowed over masked regions. because the LKB > implementation predates the 2012 paper, however, i will first have to > implement the precise accounting mechanism to validate the above > expectation regarding how to realize masking. > > what do you make of the above proposal? oe > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oe at ifi.uio.no Thu Aug 6 10:04:57 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Thu, 6 Aug 2020 10:04:57 +0200 Subject: [developers] www script in the logon distribution In-Reply-To: <5B1D74E8-863C-4547-9C80-0D9B0E41EF88@gmail.com> References: <5B1D74E8-863C-4547-9C80-0D9B0E41EF88@gmail.com> Message-ID: hi again, alexandre: > For some reason, the www script in the logon distribution does not start the webserver. Using the `--debug` option, I don't have any additional information in the log file (actually, the script didn't mention the debug anywhere). I am following all instructions from http://moin.delph-in.net/LogonOnline. In particular, pvmd3 is running without any error in the startup. I don't see any *.pvm file in the /tmp. The script bin/logon starts LKB and the [incr TSDB()] normally. I have used `?cat` to save a lisp file and load it manually in the ACL REPL, no error too. Any idea? i am slowly catching up to DELPH-IN email, with apologies for the long turn-around! is the above still a current problem? is this within your container, or does it also occur on a 'regular' linux box? 
to debug further, note that the 'www' script sets things up so that you can interact with the running lisp image once initialization is complete, i.e. just type into the lisp prompt, e.g. to inspect the state of AllegroServe. when you observe that the web server is not started, does that mean it does not even bind to its port? when running with the standard '--erg' option, i would expect the following to work (and return the dynamically generated top-level page): wget http://localhost:8100/logon best wishes, oe From sweaglesw at sweaglesw.org Sun Aug 9 08:53:49 2020 From: sweaglesw at sweaglesw.org (Woodley Packard) Date: Sat, 8 Aug 2020 23:53:49 -0700 Subject: [developers] extension to the REPP sub-formalism In-Reply-To: References: <8F548C32-1BCF-474F-BD82-67B39B322E8E@sweaglesw.org> Message-ID: Hi again, > On Aug 3, 2020, at 9:58 AM, Stephan Oepen wrote: > > glad to hear you expect REPP masking should not be hard to implement; > i have yet to find out whether i share that optimistic expectation on > the LKB side :-). I got to the point of being able to play around a bit with rules, anyway. I can mask email addresses, but as far as I can tell, no subsequent rules are ever even trying to do anything inside of them. Is this actually a good test case? I get a single identical token for the email address in the below example, before and after implementing the masking idea: $ ace -g erg.dat -E I sent an e-mail. EXECUTING MASK pattern... MASKING I<0:1> sent<2:6> <7:29> an<30:32> e<33:34> -<34:35> mail<35:39> .<39:40> > On Aug 3, 2020, at 12:35 AM, goodman.m.w at gmail.com wrote: > > As an aside, that email regex is needlessly complicated. Since, in a unicode-aware regex engine, the word-character class \w is equivalent to the L and N unicode properties with the underscore ([\p{L}\p{N}_]), and since the TLD part of the domain must have only ascii characters, it can be simplified as follows: > > ? Besides looking prettier, Mike's regex has the advantage of working in Boost's POSIX regex interface, whereas Stephan's does not. I am not particularly eager to change to a different regex API. Boost regex has multiple ways to call it, and for whatever reason, the POSIX way does not support the \p{} syntax. I ended up using the BIO-encoded representation of what's masked that Mike proposed, so I can mask two adjacent spans and then still insert material between them, but block changing material inside of the masked regions. In my implementation, material copied by capture group is OK but material rewritten literally on the RHS of a replace fails currently, because that material ends up being marked as unmasked, whereas the check requires identical content, characterization, and mask tags for everything in a masked area. As you both noted, shifting the entire mask left or right is fine. Regards, -Woodley From oe at ifi.uio.no Sun Aug 9 23:47:48 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Sun, 9 Aug 2020 23:47:48 +0200 Subject: [developers] extension to the REPP sub-formalism In-Reply-To: References: <8F548C32-1BCF-474F-BD82-67B39B322E8E@sweaglesw.org> Message-ID: hi again, woodley: > I got to the point of being able to play around a bit with rules, anyway. I can mask email addresses, but as far as I can tell, no subsequent rules are ever even trying to do anything inside of them. Is this actually a good test case? 
I get a single identical token for the email address in the below example, before and after implementing the masking idea:

$ ace -g erg.dat -E
I sent an e-mail.
EXECUTING MASK pattern...
MASKING
I<0:1> sent<2:6> <7:29> an<30:32> e<33:34> -<34:35> mail<35:39> .<39:40>

> On Aug 3, 2020, at 12:35 AM, goodman.m.w at gmail.com wrote: > > As an aside, that email regex is needlessly complicated. Since, in a unicode-aware regex engine, the word-character class \w is equivalent to the L and N unicode properties with the underscore ([\p{L}\p{N}_]), and since the TLD part of the domain must have only ascii characters, it can be simplified as follows: > > ? Besides looking prettier, Mike's regex has the advantage of working in Boost's POSIX regex interface, whereas Stephan's does not. I am not particularly eager to change to a different regex API. Boost regex has multiple ways to call it, and for whatever reason, the POSIX way does not support the \p{} syntax. I ended up using the BIO-encoded representation of what's masked that Mike proposed, so I can mask two adjacent spans and then still insert material between them, but block changing material inside of the masked regions. In my implementation, material copied by capture group is OK but material rewritten literally on the RHS of a replace fails currently, because that material ends up being marked as unmasked, whereas the check requires identical content, characterization, and mask tags for everything in a masked area. As you both noted, shifting the entire mask left or right is fine. Regards, -Woodley From oe at ifi.uio.no Sun Aug 9 23:47:48 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Sun, 9 Aug 2020 23:47:48 +0200 Subject: [developers] extension to the REPP sub-formalism In-Reply-To: References: <8F548C32-1BCF-474F-BD82-67B39B322E8E@sweaglesw.org> Message-ID: hi again, woodley: > I got to the point of being able to play around a bit with rules, anyway. I can mask email addresses, but as far as I can tell, no subsequent rules are ever even trying to do anything inside of them. Is this actually a good test case? I get a single identical token for the email address in the below example, before and after implementing the masking idea: i am happy to hear you were able to confirm your optimistic expectation that masking would not be too difficult to implement :-). i shall add a few more masking rules to the ERG trunk this coming week, but i would think the following could be a useful test case to explore the interaction of masking and rewriting (i would expect eleven tokens): stephan, oe at yy.com, oe at ellingsen-oepen.net, or ??????@?????-??????.??, called. > Besides looking prettier, Mike's regex has the advantage of working in Boost's POSIX regex interface, whereas Stephan's does not. I am not particularly eager to change to a different regex API. Boost regex has multiple ways to call it, and for whatever reason, the POSIX way does not support the \p{} syntax. i would suggest we leave aesthetic judgments to the maintainers of the REPP rules, but in this case i put in unicode properties for a reason: i am eager to take into use the \p{} syntax because (unlike classic character ranges or shorthands like \w) it is unambiguously defined across engines, independent of locales. more importantly, i expect unicode properties will afford a cleaner and more general solution to normalization of punctuation, e.g. different types of whitespace and various conventions for opening and closing quote marks; unicode properties may also help in dealing with interspersed foreign content. it appears Boost regex offers full unicode support when combined with ICU, which i would guess ACE is using from before? so, i am hoping that full unicode support in regular expressions (in REPP and chart mapping) might become available with relatively minor adjustments of how you call into the Boost regex engine? https://www.boost.org/doc/libs/1_73_0/libs/regex/doc/html/boost_regex/unicode.html > I ended up using the BIO-encoded representation of what's masked that Mike proposed, so I can mask two adjacent spans and then still insert material between them, but block changing material inside of the masked regions. In my implementation, material copied by capture group is OK but material rewritten literally on the RHS of a replace fails currently, because that material ends up being marked as unmasked, whereas the check requires identical content, characterization, and mask tags for everything in a masked area. that all sounds compatible with my intuitions about how i would like the masking to behave. in general, i am hoping to discourage literal rewriting, as it has the potential to weaken characterization accounting. many thanks for working on this! oe From olzama at uw.edu Tue Aug 11 22:38:35 2020 From: olzama at uw.edu (Olga Zamaraeva) Date: Tue, 11 Aug 2020 13:38:35 -0700 Subject: [developers] ERG coverage references Message-ID: Dear developers, I am looking for some very general reference on the ERG coverage (to include in a document which has a short section on HPSG grammars). The most recent one I was able to find so far is Table 1 in Flickinger et al. 2012. Are there any more recent ones? Anything that is associated with the 2018 release perhaps? Dan's summit updates do not include this info, as far as I can tell. Thank you, -- Olga Zamaraeva -------------- next part -------------- An HTML attachment was scrubbed...
URL: From arademaker at gmail.com Thu Aug 13 04:44:21 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 12 Aug 2020 23:44:21 -0300 Subject: [developers] ErgWeSearch - Deep Linguistic Processing with HPSG (DELPH-IN) Message-ID: <4C8C0B71-0AA4-4F78-B167-09894CD950F4@gmail.com> Hi Stephan, Any reason for keeping this page below restricted: http://moin.delph-in.net/ErgWeSearch Currently it has #acl RomanPearah,ParticipantsGroup:read,write,admin I have two students working with me and they can?t access this page. Can we remove this acl directive? Alexandre Sent from my iPhone From arademaker at gmail.com Fri Aug 14 22:37:13 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Fri, 14 Aug 2020 17:37:13 -0300 Subject: [developers] Broken link Message-ID: <38A648AB-989A-4C64-91A8-70542AB17163@gmail.com> Hi Stephan, Im http://moin.delph-in.net/EdsTop EDS since its 2002 inception (Oepen, et al., 2002) has found a broader range of DELPH-IN-internal applications? The first link to http://bultreebank.org/proceedings/paper10.pdf is broken, it should be http://bultreebank.org/wp-content/uploads/2017/05/paper10.pdf ? Best, Alexandre From arademaker at gmail.com Wed Aug 19 23:37:38 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 19 Aug 2020 18:37:38 -0300 Subject: [developers] First step for the clone of FFTB SVN in GitHub In-Reply-To: References: Message-ID: <89F0DA8E-B78C-47F7-AC1C-C2F9732BAF7D@gmail.com> Hi, Michael, thank you for all suggestions you made. I followed almost everything! ;-) https://github.com/delph-in/fftb The fftb tool now has a mirror repository in the GitHub (or M$ GitHub as Stephan likes to write!) Since I do need a place to put the README and the authors.txt, I ended up having a master branch that will be updated mainly with changes from the SVN. The trunk will be the pristine branch, mirroring SVN trunk. Next, we have to define possible workflows. For instance, I had an issue (https://github.com/delph-in/fftb/issues/1) with remote connections. Woodley told me how to solve it with a tiny change in the file web.c, I kept the modification in a branch called `issue-1` for now. This branch would probably not be merged into master directly, only if Woodley agrees to make the change in the SVN principal repository? I would them update the git repo with the changes in the SVN. If Woodley doesn?t accept the change in the code (*), we can still have this branch in the git repository for particular uses. For instance, I can obtain the fftb code with this change to make the docker image: https://github.com/own-pt/docker-delphin/blob/master/image/Dockerfile#L65-L67 So I don?t have to copy the web.c into the docker repository, which is excellent! Suggestions are welcome! (*) if I understood it right, the modification opens the door for any remote connection, and it can be understood as a security risk. Best, Alexandre > On 22 Jul 2020, at 11:57, goodman.m.w at gmail.com wrote: > > Done. fftb it is. > From arademaker at gmail.com Fri Aug 21 16:54:20 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Fri, 21 Aug 2020 11:54:20 -0300 Subject: [developers] Comparing a profile with a grammar output Message-ID: <8EC3CDEC-C0B3-45C8-AC44-22DCA1BA9292@gmail.com> Hi, After having a profile disambiguated with FFTB, my first question is how to compare it with the grammar output for the same set of sentences? This comparison should give me an evaluation of the current ranking model with my data. It should tell me if it is worth to train a new model. 
In particular, if I compare the semantic structure, it would also allow me to ignore variations on syntactic analysis that doesn't impact semantic representation. I remember that Michael mentioned in the last Summit that PyDelphin has some support for comparing semantic representation, am I right? I didn't find it in the documentation. I also tried to use the mtools from Stephan (https://github.com/cfmrp/mtool) but I am probably not using it right, since even with two different sentences I am getting the same output below: % echo "It is rainning today." | ace -g ../wn/terg-mac.dat -T -n 1 | delphin convert --from ace --to eds > 1.eds % echo "It is rainning today." | ace -g ../wn/terg-mac.dat -T -n 1 | delphin convert --from ace --to eds > 2.eds % ./main.py --read eds --score mrp --framework eds --gold 1.eds 2.eds {"n": 0, "null": 0, "exact": 0, "tops": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "labels": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "properties": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "anchors": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "edges": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "attributes": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "all": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "time": 6.985664367675781e-05, "cpu": 0.00020100000000000673} What am I missing? Should I use any other method to compare the profile with the grammar output? Comments and suggestions are welcome! :-) Best, Alexandre Rademaker http://arademaker.github.com/ http://researcher.ibm.com/person/br-alexrad From oe at ifi.uio.no Fri Aug 21 17:35:26 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Fri, 21 Aug 2020 17:35:26 +0200 Subject: [developers] Comparing a profile with a grammar output In-Reply-To: <8EC3CDEC-C0B3-45C8-AC44-22DCA1BA9292@gmail.com> References: <8EC3CDEC-C0B3-45C8-AC44-22DCA1BA9292@gmail.com> Message-ID: hi alexandre: > After having a profile disambiguated with FFTB, my first question is how to compare it with the grammar output for the same set of sentences? This comparison should give me an evaluation of the current ranking model with my data. It should tell me if it is worth to train a new model. In particular, if I compare the semantic structure, it would also allow me to ignore variations on syntactic analysis that doesn't impact semantic representation. one used to do this kind of evaluation in [incr tsdb()], initiated through the 'Trees | Score' command. i am not sure this will work out of the box for comparing an FFTB-based treebank to an ACE-generated parsing profile ... although it should, in principle! i expect you would want to select 'Result Equivalence' and 'Score All Items' in 'Trees | Switches', and just give it a shot :-). 
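as a quick cross-check outside of [incr tsdb()], the two profiles can also be compared with a few lines of scripting. the following is an illustrative sketch only: it assumes pyDelphin's itsdb module, it assumes the reading of interest is stored as result 0 in both profiles (which may not be how an FFTB treebank records its decisions), and it compares MRS strings literally, so readings that differ only in variable naming will count as mismatches.

from delphin import itsdb

def top_mrs(path):
    # map each parse-id to the MRS string of its first-ranked result
    ts = itsdb.TestSuite(path)
    return {row['parse-id']: row['mrs']
            for row in ts['result']
            if int(row['result-id']) == 0}

gold = top_mrs('golden')   # e.g. the FFTB-disambiguated profile
test = top_mrs('parsed')   # e.g. the freshly parsed profile
shared = set(gold) & set(test)
exact = sum(1 for i in shared if gold[i] == test[i])
print('{}/{} identical top readings'.format(exact, len(shared)))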
if you get the above to work (which should give exact match accuracies), the [incr tsdb()] scorer can also compute a range of additional metrics, including EDM and ParsEval, but these would have to be activated programmatically: (setf *redwoods-score-counts* '(:ta :parseval :edma :edmp)) it should also be possible to batch-score a selection of profiles, with a little bit of coding in the high-level [incr tsdb()] scripting language; for your inspiration, i have put on-line an archive of working files from the (as of yet unpublished) manuscript on robust parsing and unification with, among others, yi zhang: http://nlpl.eu/oe/edm.tgz > I also tried to use the mtools from Stephan (https://github.com/cfmrp/mtool) but I am probably not using it right, since even with two different sentences I am getting the same output below: > > % echo "It is rainning today." | ace -g ../wn/terg-mac.dat -T -n 1 | delphin convert --from ace --to eds > 1.eds > % echo "It is rainning today." | ace -g ../wn/terg-mac.dat -T -n 1 | delphin convert --from ace --to eds > 2.eds > % ./main.py --read eds --score mrp --framework eds --gold 1.eds 2.eds > {"n": 0, you end up scoring zero items, which either suggests your EDS input files are not considered valid by mtool, or the '--framework eds' selection fails. the latter should not be necessary (it may only work with MRP input files; its purpose is to select a sub-set of graphs, explicitly marked for a specific framework, from a multi-framework input file). equally likely, your EDS input files may be missing the identifier prefix; please see 'data/score/eds/' in mtool for the expected syntax. cheers, oe From arademaker at gmail.com Sat Aug 22 01:30:30 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Fri, 21 Aug 2020 20:30:30 -0300 Subject: [developers] Comparing a profile with a grammar output In-Reply-To: References: <8EC3CDEC-C0B3-45C8-AC44-22DCA1BA9292@gmail.com> Message-ID: > On 21 Aug 2020, at 12:35, Stephan Oepen wrote: > > one used to do this kind of evaluation in [incr tsdb()], initiated > through the 'Trees | Score' command. i am not sure this will work out > of the box for comparing an FFTB-based treebank to an ACE-generated > parsing profile ... although it should, in principle! i expect you > would want to select 'Result Equivalence' and 'Score All Items' in > 'Trees | Switches', and just give it a shot :-). > > if you get the above to work (which should give exact match > accuracies), the [incr tsdb()] scorer can also compute a range of > additional metrics, including EDM and ParsEval, but these would have > to be activated programmatically: > > (setf *redwoods-score-counts* '(:ta :parseval :edma :edmp)) > Hi Stephan, Thank you for the directions. Trying that first approach, it gave me 100% accuracy?? But the window with the results opened instantaneously, I don?t believe it really did the analysis. How to make sure the tool is doing what we excepted it to do? How can I know if the two profiles are proper selected? Best, Alexandre -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: PastedGraphic-2.png Type: image/png Size: 512765 bytes Desc: not available URL: From arademaker at gmail.com Sat Aug 22 02:38:15 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Fri, 21 Aug 2020 21:38:15 -0300 Subject: [developers] Comparing a profile with a grammar output In-Reply-To: References: <8EC3CDEC-C0B3-45C8-AC44-22DCA1BA9292@gmail.com> Message-ID: <03B12AF5-0AB3-4E72-9D8C-83429E179C1A@gmail.com> I just noticed that I need to load the grammar in the LKB. But LKB could not load the trunk version of ERG. Using the ERG last stable version, I got the same behaviour. > On 21 Aug 2020, at 20:30, Alexandre Rademaker wrote: > >> On 21 Aug 2020, at 12:35, Stephan Oepen wrote: >> >> one used to do this kind of evaluation in [incr tsdb()], initiated >> through the 'Trees | Score' command. i am not sure this will work out >> of the box for comparing an FFTB-based treebank to an ACE-generated >> parsing profile ... although it should, in principle! i expect you >> would want to select 'Result Equivalence' and 'Score All Items' in >> 'Trees | Switches', and just give it a shot :-). >> >> if you get the above to work (which should give exact match >> accuracies), the [incr tsdb()] scorer can also compute a range of >> additional metrics, including EDM and ParsEval, but these would have >> to be activated programmatically: >> >> (setf *redwoods-score-counts* '(:ta :parseval :edma :edmp)) > > Hi Stephan, > > Thank you for the directions. Trying that first approach, it gave me 100% accuracy?? But the window with the results opened instantaneously, I don?t believe it really did the analysis. How to make sure the tool is doing what we excepted it to do? How can I know if the two profiles are proper selected? > > > > > Best, > Alexandre > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: PastedGraphic-3.png Type: image/png Size: 553009 bytes Desc: not available URL: From arademaker at gmail.com Sat Aug 22 03:22:31 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Fri, 21 Aug 2020 22:22:31 -0300 Subject: [developers] Comparing a profile with a grammar output In-Reply-To: References: <8EC3CDEC-C0B3-45C8-AC44-22DCA1BA9292@gmail.com> Message-ID: <672451AB-C8D6-40CF-B5EB-6FE29CFF4CF2@gmail.com> Hi Stephan, I tried the mtool again. Same problem. % echo "It is rainning today." | ace -g ../wn/terg-mac.dat -T -n 1 | delphin convert --from ace --to eds > 2.eds % echo "It is rainning today." | ace -g ../wn/terg-mac.dat -T -n 1 | delphin convert --from ace --to eds > 2.eds I added the #XXXX right before the EDS serialization. The only different between these files in the https://github.com/cfmrp/mtool/blob/master/data/score/eds/wsj.pet.eds is that these files are not formatted with one predicate per line, instead, the EDS is serialised in a single line without line breaks. 
% ./main.py --read eds --score smatch --gold ../sick/1.eds ../sick/2.eds {"n": 0, "g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0, "time": 1.4781951904296875e-05, "cpu": 4.4000000000044004e-05} % ./main.py --read eds --score mrp --gold ../sick/1.eds ../sick/2.eds {"n": 0, "null": 0, "exact": 0, "tops": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "labels": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "properties": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "anchors": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "edges": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "attributes": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "all": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}, "time": 3.0994415283203125e-05, "cpu": 9.099999999995223e-05} But? I found the repo that Michael presented in the Summit https://github.com/delph-in/delphin.edm: % delphin edm 1.eds 2.eds Precision: 1.0 Recall: 1.0 F-score: 1.0 It works! But I need to remove the prefix (#NNNNN) before the EDS serializations. Even better, it works directly with profiles although the verbose option didn?t show anything interesting (I would like to see results per item): % delphin edm -v golden parsed Precision: 0.9637710992177851 Recall: 0.9683557394002068 F-score: 0.9660579799855565 Thank you Michael! I am not very confident on these numbers, I was expecting more differences, but? Anyway, it would be nice to double-check with mtool if I can. Best, Alexandre > On 21 Aug 2020, at 12:35, Stephan Oepen wrote: > >> I also tried to use the mtools from Stephan (https://github.com/cfmrp/mtool) but I am probably not using it right, since even with two different sentences I am getting the same output below: >> >> % echo "It is rainning today." | ace -g ../wn/terg-mac.dat -T -n 1 | delphin convert --from ace --to eds > 1.eds >> % echo "It is rainning today." | ace -g ../wn/terg-mac.dat -T -n 1 | delphin convert --from ace --to eds > 2.eds >> % ./main.py --read eds --score mrp --framework eds --gold 1.eds 2.eds >> {"n": 0, > > you end up scoring zero items, which either suggests your EDS input > files are not considered valid by mtool, or the '--framework eds' > selection fails. the latter should not be necessary (it may only work > with MRP input files; its purpose is to select a sub-set of graphs, > explicitly marked for a specific framework, from a multi-framework > input file). equally likely, your EDS input files may be missing the > identifier prefix; please see 'data/score/eds/' in mtool for the > expected syntax. From oe at ifi.uio.no Sat Aug 22 09:02:50 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Sat, 22 Aug 2020 09:02:50 +0200 Subject: [developers] Comparing a profile with a grammar output In-Reply-To: <672451AB-C8D6-40CF-B5EB-6FE29CFF4CF2@gmail.com> References: <8EC3CDEC-C0B3-45C8-AC44-22DCA1BA9292@gmail.com> <672451AB-C8D6-40CF-B5EB-6FE29CFF4CF2@gmail.com> Message-ID: hi again, alexandre and mike: > I added the #XXXX right before the EDS serialization. The only different between these files in the https://github.com/cfmrp/mtool/blob/master/data/score/eds/wsj.pet.eds is that these files are not formatted with one predicate per line, instead, the EDS is serialised in a single line without line breaks. i am tempted to declare those line breaks a necessary part of the native EDS syntax (though i see that the current EdsTop wiki page does not explicitly state that). 
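for concreteness, such an export (one node per line, optionally preceded by a '#' line carrying the item identifier, as proposed just below) might be scripted along the following lines. this is an illustrative sketch only: it assumes pyDelphin's simplemrs and eds codecs, delphin.eds.from_mrs(), and an indent option on the EDS encoder; the command-line counterpart is the `--indent` flag mike mentions below.

from delphin import eds
from delphin.codecs import simplemrs
from delphin.codecs import eds as edsnative

def export(identifier, mrs_string):
    # one EDS per block: a '#<identifier>' line, then one node per line
    m = simplemrs.decode(mrs_string)
    e = eds.from_mrs(m)
    return '#{}\n{}\n'.format(identifier, edsnative.encode(e, indent=True))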
mike, could you change EDS serialization in pyDelphin to reflect the multi-line format exemplified on that page? also, when you have an item identifier available i would suggest you prefix the EDS with an additional line (assuming the identifier is 4711): #4711 this latter addition should be considered optional, though, and i shall check that the mtool EDS reader does not require it (i suspect currently it does; mtool has hardly been used in conjunction with native EDS serialization, so this is a welcome push toward better cross-format and -platform interoperability). regarding your lack of success when invoking the scorer in [incr tsdb()], alexandre: could you make available to me a copy of the two profiles involved? best wishes, oe From goodman.m.w at gmail.com Sat Aug 22 11:34:07 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Sat, 22 Aug 2020 17:34:07 +0800 Subject: [developers] Comparing a profile with a grammar output In-Reply-To: References: <8EC3CDEC-C0B3-45C8-AC44-22DCA1BA9292@gmail.com> <672451AB-C8D6-40CF-B5EB-6FE29CFF4CF2@gmail.com> Message-ID: Hi Alexandre and Stephan, Alexandre, if you only care about exact matches of semantics, you might look into how the Matrix does regression testing (see rtest.py at https://github.com/delph-in/matrix/). And I'd need to refresh my understanding of the specifics, but I think delphin.edm gives the same final scores as mtool but not Bec's tool. It *can* replicate the results of both by adjusting weights and things via command options. It doesn't give as granular a breakdown as mtool, but you can get per-item results when you increase verbosity twice (-vv). Finally, if you want EDS with line breaks, try adding `--indent` or `--indent 1` to the `delphin convert` command. Stephan, I don't see why line breaks are necessary for EDS native format. There is no syntactic necessity that I can see. I find it very useful to have the option for single-line EDS (or any format), e.g., for line-pairing the exported representations of two profiles. I'll consider how I can make the identifier prefix (e.g., #4711) map to the internal 'identifier' field (see https://pydelphin.readthedocs.io/en/latest/api/delphin.eds.html#delphin.eds.EDS ). On Sat, Aug 22, 2020 at 3:03 PM Stephan Oepen wrote: > hi again, alexandre and mike: > > > I added the #XXXX right before the EDS serialization. The only different > between these files in the > https://github.com/cfmrp/mtool/blob/master/data/score/eds/wsj.pet.eds is > that these files are not formatted with one predicate per line, instead, > the EDS is serialised in a single line without line breaks. > > i am tempted to declare those line breaks a necessary part of the > native EDS syntax (though i see that the current EdsTop wiki page does > not explicitly state that). mike, could you change EDS serialization > in pyDelphin to reflect the multi-line format exemplified on that > page? also, when you have an item identifier available i would > suggest you prefix the EDS with an additional line (assuming the > identifier is 4711): > > #4711 > > this latter addition should be considered optional, though, and i > shall check that the mtool EDS reader does not require it (i suspect > currently it does; mtool has hardly been used in conjunction with > native EDS serialization, so this is a welcome push toward better > cross-format and -platform interoperability). 
> > regarding your lack of success when invoking the scorer in [incr > tsdb()], alexandre: could you make available to me a copy of the two > profiles involved? > > best wishes, oe > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Sat Aug 22 17:57:53 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Sat, 22 Aug 2020 12:57:53 -0300 Subject: [developers] Comparing a profile with a grammar output In-Reply-To: References: <8EC3CDEC-C0B3-45C8-AC44-22DCA1BA9292@gmail.com> <672451AB-C8D6-40CF-B5EB-6FE29CFF4CF2@gmail.com> Message-ID: <50D90672-043A-4568-9419-3C6E1D83AFA7@gmail.com> Hi Stephan, Following your directions, I asked pydelphin to export with line breaks (?indent), and I successfully execute the mtool with all except the ?mrp? metric, see below. For the profiles, you can find them at https://github.com/arademaker/sick-fftb. Thank you so much for your help. You raised an interesting question about the `item identifier`. Is it part of the EDS? We may need a specification of a file format containing a sequence of EDS serialization (or native EDS syntax, as you also wrote). I think the same happens with ACE stdout protocols (https://pydelphin.readthedocs.io/en/latest/api/delphin.ace.html#ace-stdout-protocols), for instance, a "SENT: ..." precedes all MRSs, but this is not part of the MRS. These issues are all related to the work in the RDF Schemas? Best, Alexandre echo "It is rainning today." | ace -g ../wn/terg-mac.dat -T -n 1 | delphin convert --indent --from ace --to eds > 1.eds echo "It is rainning today." | ace -g ../wn/terg-mac.dat -T -n 1 | delphin convert --indent --from ace --to eds > 2.eds % ./main.py --read eds --score ucca --gold ../sick/1.eds ../sick/2.eds {"n": 1, "labeled": {"primary": {"g": 6, "s": 6, "c": 6, "p": 1.0, "r": 1.0, "f": 1.0}, "remote": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}}, "unlabeled": {"primary": {"g": 5, "s": 5, "c": 5, "p": 1.0, "r": 1.0, "f": 1.0}, "remote": {"g": 0, "s": 0, "c": 0, "p": 0.0, "r": 0.0, "f": 0.0}}, "time": 0.0001461505889892578, "cpu": 0.0004350000000000742} % ./main.py --read eds --score smatch --gold ../sick/1.eds ../sick/2.eds {"n": 1, "g": 42, "s": 42, "c": 42, "p": 1.0, "r": 1.0, "f": 1.0, "time": 0.0055389404296875, "cpu": 0.016556000000000015} % ./main.py --read eds --score edm --gold ../sick/1.eds ../sick/2.eds {"n": 1, "names": {"g": 7, "s": 7, "c": 7, "p": 1.0, "r": 1.0, "f": 1.0}, "arguments": {"g": 6, "s": 6, "c": 6, "p": 1.0, "r": 1.0, "f": 1.0}, "tops": {"g": 1, "s": 1, "c": 1, "p": 1.0, "r": 1.0, "f": 1.0}, "properties": {"g": 21, "s": 21, "c": 21, "p": 1.0, "r": 1.0, "f": 1.0}, "all": {"g": 35, "s": 35, "c": 35, "p": 1.0, "r": 1.0, "f": 1.0}, "time": 8.106231689453125e-05, "cpu": 0.00024100000000004673} % ./main.py --read eds --score sdp --gold ../sick/1.eds ../sick/2.eds {"n": 1, "labeled": {"g": 7, "s": 7, "c": 7, "p": 1.0, "r": 1.0, "f": 1.0, "m": 1.0}, "unlabeled": {"g": 6, "s": 6, "c": 6, "p": 1.0, "r": 1.0, "f": 1.0, "m": 1.0}, "time": 7.104873657226562e-05, "cpu": 0.00021099999999996122} % ./main.py --read eds --score mrp --gold ../sick/1.eds ../sick/2.eds Traceback (most recent call last): File "./main.py", line 472, in main(); File "./main.py", line 385, in main result = score.mces.evaluate(gold, graphs, File "/Users/ar/hpsg/mtool/score/mces.py", line 493, in evaluate for id, g, s, tops, labels, properties, anchors, \ File "/Users/ar/hpsg/mtool/score/mces.py", line 490, in results = (schedule(g, s, 
rrhc_limit, mces_limit, trace, errors) File "/Users/ar/hpsg/mtool/score/mces.py", line 441, in schedule raise e; File "/Users/ar/hpsg/mtool/score/mces.py", line 389, in schedule = g.score(s, mapping); File "/Users/ar/hpsg/mtool/graph.py", line 856, in score = tuples(self, identities1); File "/Users/ar/hpsg/mtool/graph.py", line 771, in tuples anchors.add((identity, anchor)); TypeError: unhashable type: ?list' > On 22 Aug 2020, at 04:02, Stephan Oepen wrote: > > hi again, alexandre and mike: > >> I added the #XXXX right before the EDS serialization. The only different between these files in the https://github.com/cfmrp/mtool/blob/master/data/score/eds/wsj.pet.eds is that these files are not formatted with one predicate per line, instead, the EDS is serialised in a single line without line breaks. > > i am tempted to declare those line breaks a necessary part of the > native EDS syntax (though i see that the current EdsTop wiki page does > not explicitly state that). mike, could you change EDS serialization > in pyDelphin to reflect the multi-line format exemplified on that > page? also, when you have an item identifier available i would > suggest you prefix the EDS with an additional line (assuming the > identifier is 4711): > > #4711 > > this latter addition should be considered optional, though, and i > shall check that the mtool EDS reader does not require it (i suspect > currently it does; mtool has hardly been used in conjunction with > native EDS serialization, so this is a welcome push toward better > cross-format and -platform interoperability). > > regarding your lack of success when invoking the scorer in [incr > tsdb()], alexandre: could you make available to me a copy of the two > profiles involved? > > best wishes, oe From oe at ifi.uio.no Sat Aug 22 18:38:25 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Sat, 22 Aug 2020 18:38:25 +0200 Subject: [developers] Comparing a profile with a grammar output In-Reply-To: <50D90672-043A-4568-9419-3C6E1D83AFA7@gmail.com> References: <8EC3CDEC-C0B3-45C8-AC44-22DCA1BA9292@gmail.com> <672451AB-C8D6-40CF-B5EB-6FE29CFF4CF2@gmail.com> <50D90672-043A-4568-9419-3C6E1D83AFA7@gmail.com> Message-ID: > % ./main.py --read eds --score mrp --gold ../sick/1.eds ../sick/2.eds > Traceback (most recent call last): > File "./main.py", line 472, in > main(); > File "./main.py", line 385, in main > result = score.mces.evaluate(gold, graphs, > File "/Users/ar/hpsg/mtool/score/mces.py", line 493, in evaluate > for id, g, s, tops, labels, properties, anchors, \ > File "/Users/ar/hpsg/mtool/score/mces.py", line 490, in > results = (schedule(g, s, rrhc_limit, mces_limit, trace, errors) > File "/Users/ar/hpsg/mtool/score/mces.py", line 441, in schedule > raise e; > File "/Users/ar/hpsg/mtool/score/mces.py", line 389, in schedule > = g.score(s, mapping); > File "/Users/ar/hpsg/mtool/graph.py", line 856, in score > = tuples(self, identities1); > File "/Users/ar/hpsg/mtool/graph.py", line 771, in tuples > anchors.add((identity, anchor)); > TypeError: unhashable type: ?list' could you report that in the mtool issue tracker (on M$ GitHub), ideally attaching the two input files? i shall have a look :-). 
oe From arademaker at gmail.com Sat Aug 22 20:31:21 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Sat, 22 Aug 2020 15:31:21 -0300 Subject: [developers] Comparing a profile with a grammar output In-Reply-To: References: <8EC3CDEC-C0B3-45C8-AC44-22DCA1BA9292@gmail.com> <672451AB-C8D6-40CF-B5EB-6FE29CFF4CF2@gmail.com> <50D90672-043A-4568-9419-3C6E1D83AFA7@gmail.com> Message-ID: <1472E9EA-A52C-4F38-822D-4E162F2DF422@gmail.com> Done, https://github.com/cfmrp/mtool/issues/78, but I saw that you just fixed the error! Thank you. > On 22 Aug 2020, at 13:38, Stephan Oepen wrote: > >> % ./main.py --read eds --score mrp --gold ../sick/1.eds ../sick/2.eds >> Traceback (most recent call last): >> File "./main.py", line 472, in >> main(); >> File "./main.py", line 385, in main >> result = score.mces.evaluate(gold, graphs, >> File "/Users/ar/hpsg/mtool/score/mces.py", line 493, in evaluate >> for id, g, s, tops, labels, properties, anchors, \ >> File "/Users/ar/hpsg/mtool/score/mces.py", line 490, in >> results = (schedule(g, s, rrhc_limit, mces_limit, trace, errors) >> File "/Users/ar/hpsg/mtool/score/mces.py", line 441, in schedule >> raise e; >> File "/Users/ar/hpsg/mtool/score/mces.py", line 389, in schedule >> = g.score(s, mapping); >> File "/Users/ar/hpsg/mtool/graph.py", line 856, in score >> = tuples(self, identities1); >> File "/Users/ar/hpsg/mtool/graph.py", line 771, in tuples >> anchors.add((identity, anchor)); >> TypeError: unhashable type: ?list' > > could you report that in the mtool issue tracker (on M$ GitHub), > ideally attaching the two input files? i shall have a look :-). > > oe From olzama at uw.edu Thu Sep 3 19:02:34 2020 From: olzama at uw.edu (Olga Zamaraeva) Date: Thu, 3 Sep 2020 10:02:34 -0700 Subject: [developers] A one-off Matrix Dev meeting next Wednesday Message-ID: Dear all, Now that some of us have been actively working on the Matrix for some time, we thought it would make sense for us to have a meeting every now and then. So we will have one on *Wednesday Sep 9 6:30 PM Seattle time.* It is just a one-time thing, focused mostly on stuff Mike, T.J., and myself have been doing lately (e.g. development practices discussion), which is why we did not ask others for time preferences and are not trying to cover as many zones as possible etc. But we still wanted to let everyone know about it in case someone wants to join and can make the time! It will be over Zoom, the invitation below: Topic: Matrix Dev Time: Sep 9, 2020 06:30 PM Pacific Time (US and Canada) Join Zoom Meeting https://washington.zoom.us/j/92424621772?pwd=OWNxUHZOdXdiNmMxbVpabVlsM2hJUT09 Meeting ID: 924 2462 1772 Passcode: 900358 One tap mobile +12063379723,,92424621772# US (Seattle) +12532158782,,92424621772# US (Tacoma) Dial by your location +1 206 337 9723 US (Seattle) +1 253 215 8782 US (Tacoma) +1 213 338 8477 US (Los Angeles) +1 346 248 7799 US (Houston) +1 602 753 0140 US (Phoenix) +1 669 219 2599 US (San Jose) +1 669 900 6833 US (San Jose) +1 720 928 9299 US (Denver) +1 971 247 1195 US (Portland) +1 786 635 1003 US (Miami) +1 267 831 0333 US (Philadelphia) +1 301 715 8592 US (Germantown) +1 312 626 6799 US (Chicago) +1 470 250 9358 US (Atlanta) +1 470 381 2552 US (Atlanta) +1 646 518 9805 US (New York) +1 646 876 9923 US (New York) +1 651 372 8299 US (St. 
Paul) Meeting ID: 924 2462 1772 Find your local number: https://washington.zoom.us/u/abq23yFNjV Join by SIP 92424621772 at zoomcrc.com Join by H.323 162.255.37.11 (US West) 162.255.36.11 (US East) 221.122.88.195 (China) 115.114.131.7 (India Mumbai) 115.114.115.7 (India Hyderabad) 213.19.144.110 (Amsterdam Netherlands) 213.244.140.110 (Germany) 103.122.166.55 (Australia) 209.9.211.110 (Hong Kong SAR) 64.211.144.160 (Brazil) 69.174.57.160 (Canada) 207.226.132.110 (Japan) Meeting ID: 924 2462 1772 Passcode: 900358 -- Olga Zamaraeva -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Wed Sep 9 06:06:38 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 9 Sep 2020 01:06:38 -0300 Subject: [developers] Abstract Wikipedia Message-ID: <04F8A34E-EBBC-4779-999D-8E13E965AB32@gmail.com> https://meta.m.wikimedia.org/wiki/Abstract_Wikipedia The goal of Abstract Wikipedia is to let more people share in more knowledge in more languages. Abstract Wikipedia is an extension of Wikidata. In Abstract Wikipedia, people can create and maintain Wikipedia articles in a language-independent way. A Wikipedia in a language can translate this language-independent article into its language. Code does the translation. The Grammatical Framework community provided some response and suggestion on how GF could be used for language generation https://meta.m.wikimedia.org/wiki/Talk:Abstract_Wikipedia#Response_from_the_Grammatical_Framework_community I wonder if the statement about HPSG is fair: > check out other grammar formalisms, like HPSG, you'll see similar coverage to GF, but no unified API for different languages. Alexandre Sent from my iPhone -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebender at uw.edu Wed Sep 9 15:47:01 2020 From: ebender at uw.edu (Emily M. Bender) Date: Wed, 9 Sep 2020 06:47:01 -0700 Subject: [developers] Abstract Wikipedia In-Reply-To: <04F8A34E-EBBC-4779-999D-8E13E965AB32@gmail.com> References: <04F8A34E-EBBC-4779-999D-8E13E965AB32@gmail.com> Message-ID: Dear Alexandre, The ERG has been under continuous development since 1993, with definitely more than 27 person-years in it at this point. I guess the question is whether the GF resources are comparable in scale... Emily On Tue, Sep 8, 2020 at 9:07 PM Alexandre Rademaker wrote: > > https://meta.m.wikimedia.org/wiki/Abstract_Wikipedia > > The goal of *Abstract Wikipedia* is to let more people share in more > knowledge in more languages. Abstract Wikipedia is an extension of > Wikidata. In Abstract Wikipedia, people can create and maintain Wikipedia > articles in a language-independent way. A Wikipedia in a language can > translate this language-independent article into its language. Code does > the translation. > > The Grammatical Framework community provided some response and suggestion > on how GF could be used for language generation > > > https://meta.m.wikimedia.org/wiki/Talk:Abstract_Wikipedia#Response_from_the_Grammatical_Framework_community > > I wonder if the statement about HPSG is fair: > > check out other grammar formalisms, like HPSG > , you'll see similar coverage > to GF, but no unified API for different languages. > > > Alexandre > Sent from my iPhone > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From arademaker at gmail.com Wed Sep 9 16:54:18 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 9 Sep 2020 11:54:18 -0300 Subject: [developers] Abstract Wikipedia In-Reply-To: References: <04F8A34E-EBBC-4779-999D-8E13E965AB32@gmail.com> Message-ID: Yes, definitely. Regarding the unified API for multiple languages, I mentioned MATRIX as having a similar goal: in the end, it is all about speed up the development of grammars and provides similar analysis of similar linguistic constructions (reuse and interoperability), right? Aarne Ranta replied to me today in the GF mailing list. Since it is a public list, I am copying here (also at https://groups.google.com/g/gf-dev/c/A6lNwZ813b0/m/c0r2Lm0eAgAJ) for hearing from this community. He admitted the coverage of GF is years behind, anyway. If so, maybe it is one more opportunity to demystify the complexity of real uses of HPSG grammars. The previous projects should be enough evidence, but not everybody knows about http://moin.delph-in.net/OldProjects. My personal experience is that things are not always ready to use out-of-the-box, but I find my way through the HPSG/DELPHI-IN universe during the last 4-5 years! ;-) > Hello Alexandre, > > Thanks for pointing this out. This reminds me that I should write a little summary of what would be involved in reproducing the RGL, and what our starting point was back in 2001. So here it is: > > The main inspirations of GF-RGL were > > - XFST, Xerox Finite State morphologies for several languages > - CLE, Core Language Engine, an SRI-Cambridge-Telia etc project for building syntax modules for some languages to be used in applications > > The main lesson learned was > > - make it open source and involve a community. CLE in practically disappeared because nobody had the rights to continue with it, and XFST was increasingly replaced by open-source variants > > This brings us to > > - the LinGO matrix, using HPSG, an open-source project > > also an inspiration for the RGL, just a bit later, but still alive and active. The difference we wanted to make was > > - think about non-linguist programmers as the majority of users > > This led us to > > - design GF and its module system in a way similar to programming languages, rather than grammar formalisms > - separate the linguist's view (the internals of the RGL) from the application programmer's view (the RGL API) > > The closest GF counterpart of the LinGO matrix is thus the internal abstract syntax of the RGL. But when looking at the LinGO/DELPH-IN documentation back in 2003 and still today, I cannot see anything corresponding to the API. It is more of a linguists' project than of programmers'. And I think it would be quite a job to develop it into an API direction similar to GF. Not only is the starting point less friendly to that (with GF's formal distinction between abstract and concrete syntax), but even in the GF world, it took several years to bring the module system and the compiler into a state that smoothly supports the division of labour between linguists and application programmers in the way we do. > > This said, HPSG has reached longer in their linguistic coverage in many languages, in particular in the English RGL: GF has nothing like that, and again it would take years of work to build it. > > Of course, the nicest thing would be to share resources in a formalism independent way. This looks quite feasible in the case of morphological lexica, and is an ongoing practice already. But when it comes to syntax, I am less sure. 
Syntax code in GF and HPSG and other higher-level (above context-free) formalisms is essentially like code in different programming languages. There the practice is that each language has to build their standard libraries from scratch (think about for instance collections and generics in Java, C++, Haskell,...) An alternative is to enable foreign function interfaces (like from Python to C), but I cannot see very concretely right now how this would look for instance between GF and HPSG - and how much there would really be to gain. But of course we have mutual communication, for instance by co-organizing GEAF workshops (Grammar Engineering Across Frameworks), and see each other as allies rather than enemies. > > ParGram (in LFG) could also be mentioned, but it used to be a proprietary system that was more difficult to learn from. > > Regards > > Aarne. Anyway, the abstract wikipedia project is about language generation and it seems very interesting. Best, Alexandre > On 9 Sep 2020, at 10:47, Emily M. Bender wrote: > > Dear Alexandre, > > The ERG has been under continuous development since 1993, with definitely more than 27 person-years in it at this point. I guess the question is whether the GF resources are comparable in scale... > > Emily > From ebender at uw.edu Wed Sep 9 18:08:23 2020 From: ebender at uw.edu (Emily M. Bender) Date: Wed, 9 Sep 2020 09:08:23 -0700 Subject: [developers] Abstract Wikipedia In-Reply-To: References: <04F8A34E-EBBC-4779-999D-8E13E965AB32@gmail.com> Message-ID: I think the closest thing to an analog to GF with DELPH-IN materials would be an 'API' for authoring MRS representations that is user friendly for non-linguists + some set of transfer rules that take those MRSes into ones that work in each language-specific grammar. The Grammar Matrix itself is not conceived of as a tool for making grammar engineering easier for non-linguists (though people frequently seem to want that). Emily On Wed, Sep 9, 2020 at 7:54 AM Alexandre Rademaker wrote: > > Yes, definitely. Regarding the unified API for multiple languages, I > mentioned MATRIX as having a similar goal: in the end, it is all about > speed up the development of grammars and provides similar analysis of > similar linguistic constructions (reuse and interoperability), right? Aarne > Ranta replied to me today in the GF mailing list. Since it is a public > list, I am copying here (also at > https://groups.google.com/g/gf-dev/c/A6lNwZ813b0/m/c0r2Lm0eAgAJ) for > hearing from this community. > > He admitted the coverage of GF is years behind, anyway. If so, maybe it is > one more opportunity to demystify the complexity of real uses of HPSG > grammars. The previous projects should be enough evidence, but not > everybody knows about http://moin.delph-in.net/OldProjects. My personal > experience is that things are not always ready to use out-of-the-box, but I > find my way through the HPSG/DELPHI-IN universe during the last 4-5 years! > ;-) > > > > Hello Alexandre, > > > > Thanks for pointing this out. This reminds me that I should write a > little summary of what would be involved in reproducing the RGL, and what > our starting point was back in 2001. So here it is: > > > > The main inspirations of GF-RGL were > > > > - XFST, Xerox Finite State morphologies for several languages > > - CLE, Core Language Engine, an SRI-Cambridge-Telia etc project for > building syntax modules for some languages to be used in applications > > > > The main lesson learned was > > > > - make it open source and involve a community. 
CLE in practically > disappeared because nobody had the rights to continue with it, and XFST was > increasingly replaced by open-source variants > > > > This brings us to > > > > - the LinGO matrix, using HPSG, an open-source project > > > > also an inspiration for the RGL, just a bit later, but still alive and > active. The difference we wanted to make was > > > > - think about non-linguist programmers as the majority of users > > > > This led us to > > > > - design GF and its module system in a way similar to programming > languages, rather than grammar formalisms > > - separate the linguist's view (the internals of the RGL) from the > application programmer's view (the RGL API) > > > > The closest GF counterpart of the LinGO matrix is thus the internal > abstract syntax of the RGL. But when looking at the LinGO/DELPH-IN > documentation back in 2003 and still today, I cannot see anything > corresponding to the API. It is more of a linguists' project than of > programmers'. And I think it would be quite a job to develop it into an API > direction similar to GF. Not only is the starting point less friendly to > that (with GF's formal distinction between abstract and concrete syntax), > but even in the GF world, it took several years to bring the module system > and the compiler into a state that smoothly supports the division of labour > between linguists and application programmers in the way we do. > > > > This said, HPSG has reached longer in their linguistic coverage in many > languages, in particular in the English RGL: GF has nothing like that, and > again it would take years of work to build it. > > > > Of course, the nicest thing would be to share resources in a formalism > independent way. This looks quite feasible in the case of morphological > lexica, and is an ongoing practice already. But when it comes to syntax, I > am less sure. Syntax code in GF and HPSG and other higher-level (above > context-free) formalisms is essentially like code in different programming > languages. There the practice is that each language has to build their > standard libraries from scratch (think about for instance collections and > generics in Java, C++, Haskell,...) An alternative is to enable foreign > function interfaces (like from Python to C), but I cannot see very > concretely right now how this would look for instance between GF and HPSG - > and how much there would really be to gain. But of course we have mutual > communication, for instance by co-organizing GEAF workshops (Grammar > Engineering Across Frameworks), and see each other as allies rather than > enemies. > > > > ParGram (in LFG) could also be mentioned, but it used to be a > proprietary system that was more difficult to learn from. > > > > Regards > > > > Aarne. > > > Anyway, the abstract wikipedia project is about language generation and it > seems very interesting. > > Best, > Alexandre > > > > > On 9 Sep 2020, at 10:47, Emily M. Bender wrote: > > > > Dear Alexandre, > > > > The ERG has been under continuous development since 1993, with > definitely more than 27 person-years in it at this point. I guess the > question is whether the GF resources are comparable in scale... > > > > Emily > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From arademaker at gmail.com Wed Sep 9 18:33:50 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 9 Sep 2020 13:33:50 -0300 Subject: [developers] Abstract Wikipedia In-Reply-To: References: <04F8A34E-EBBC-4779-999D-8E13E965AB32@gmail.com> Message-ID: Thank you Emily, yes, and they don't have a solution for making grammar engineering easier for non-linguists either. It is all about the separation of concerns between linguists and non-linguists. They claim that a non-linguist can use their RGL (resource grammar library) for building what they call the `application grammar`. The RGL would be maintained by linguists. In that sense, as you said, the DELPH-IN equivalent would be a set of transfer rules. The most obvious end-to-end example of this approach that came to my mind is the openproof project: http://svn.delph-in.net/erg/tags/2018/openproof/README Right? Best, Alexandre > On 9 Sep 2020, at 13:08, Emily M. Bender wrote: > > I think the closest thing to an analog to GF with DELPH-IN materials would be an 'API' for authoring > MRS representations that is user friendly for non-linguists + some set of transfer rules that take > those MRSes into ones that work in each language-specific grammar. > > The Grammar Matrix itself is not conceived of as a tool for making grammar engineering easier for > non-linguists (though people frequently seem to want that). > > Emily > From arademaker at gmail.com Wed Sep 9 20:42:57 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 9 Sep 2020 15:42:57 -0300 Subject: [developers] Valid MRS? Bug in ERG? Message-ID: Hi, Are the following two MRSs considered valid? Note that TOP is h0, h0 is qeq h1, but h1 is not the label of any predicate. In both cases, pydelphin could not make the transformation to EDS. I just want to confirm whether they are invalid; if so, maybe pydelphin can't really make sense of them. One additional, possibly silly question: if they are invalid, can that be considered a bug in the ERG?
[ TOP: h0 INDEX: e2 [ e SF: prop-or-ques ] RELS: < [ unknown<0:27> LBL: h4 ARG: u5 ARG0: e2 ] [ _quick_a_1<0:7> LBL: h4 ARG0: e6 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: e2 ] [ _and_c<8:11> LBL: h4 ARG0: e7 [ e SF: prop ] ARG1: u8 ARG2: e9 [ e SF: prop ] ] [ _without_p<12:19> LBL: h4 ARG0: e9 ARG1: e2 ARG2: x10 [ x PERS: 3 NUM: sg IND: + ] ] [ udef_q<12:19> LBL: h11 ARG0: x10 RSTR: h12 BODY: h13 ] [ _warning_n_of<20:27> LBL: h14 ARG0: x10 ARG1: i15 ] > HCONS: < h0 qeq h1 h12 qeq h14 > ] [ TOP: h0 INDEX: e2 [ e SF: prop-or-ques TENSE: untensed MOOD: indicative ] RELS: < [ unknown<0:69> LBL: h4 ARG: u5 ARG0: e2 ] [ _in_p_loc<0:2> LBL: h4 ARG0: e2 ARG1: u6 ARG2: x7 [ x PERS: 3 NUM: sg IND: + PT: pt ] ] [ _the_q<3:6> LBL: h8 ARG0: x7 RSTR: h9 BODY: h10 ] [ compound<7:17> LBL: h11 ARG0: e12 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x7 ARG2: x13 [ x IND: + PT: pt ] ] [ udef_q<7:12> LBL: h14 ARG0: x13 RSTR: h15 BODY: h16 ] [ _front_n_1<7:12> LBL: h17 ARG0: x13 ] [ _part_n_of<13:17> LBL: h11 ARG0: x7 ARG1: i18 ] [ _of_p<18:20> LBL: h11 ARG0: e19 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x7 ARG2: x20 [ x PERS: 3 NUM: sg IND: + PT: pt ] ] [ _the_q<21:24> LBL: h21 ARG0: x20 RSTR: h22 BODY: h23 ] [ _neck_n_1<25:29> LBL: h24 ARG0: x20 ] [ _below_p<30:35> LBL: h11 ARG0: e25 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x7 ARG2: x26 [ x PERS: 3 NUM: sg IND: + PT: pt ] ] [ _the_q<36:39> LBL: h27 ARG0: x26 RSTR: h28 BODY: h29 ] [ _chin_n_1<40:44> LBL: h30 ARG0: x26 ] [ _and_c<45:48> LBL: h11 ARG0: e31 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: e25 ARG2: e32 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ] [ _above_p<49:54> LBL: h11 ARG0: e32 ARG1: x7 ARG2: x33 [ x PERS: 3 NUM: sg PT: pt ] ] [ _the_q<55:58> LBL: h34 ARG0: x33 RSTR: h35 BODY: h36 ] [ _collarbone/nn_u_unknown<59:69> LBL: h37 ARG0: x33 ] > HCONS: < h0 qeq h1 h9 qeq h11 h15 qeq h17 h22 qeq h24 h28 qeq h30 h35 qeq h37 > ] Best, Alexandre From goodman.m.w at gmail.com Thu Sep 10 02:30:39 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Thu, 10 Sep 2020 08:30:39 +0800 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: Message-ID: Hi Alexandre, These are disconnected graphs. Having the right side of a qeq select a handle that is not the label of any EP is an invalid configuration. This most likely is a symptom of some bug in the ERG. Regarding conversion to EDS with PyDelphin, I've created https://github.com/delph-in/pydelphin/issues/316 to track the issue. I think the LKB's EDS code will more aggressively search for a top for the EDS graph during conversion, perhaps looking to the INDEX. If anyone (Stephan?) cares to explain the procedure for selecting tops in less-than-perfect MRSs, I'd be happy to try and implement it in PyDelphin. Otherwise I'll just try to make the error message more informative. On Thu, Sep 10, 2020 at 2:44 AM Alexandre Rademaker wrote: > > Hi, > > Are the following two MRSs considered valid? Note that TOP is h0, h0 is > qeq h1 but h1 is not the label of any predicate. In both cases, pydelphin > could not make the transformation to EDS. I just want to confirm if they > are invalid, if so, maybe pydelphin can?t really make sense of them. > > One additional possible silly question. If they are invalid, can it be > consider a bug in ERG? 
> > > [ TOP: h0 > INDEX: e2 [ e SF: prop-or-ques ] > RELS: < [ unknown<0:27> LBL: h4 ARG: u5 ARG0: e2 ] > [ _quick_a_1<0:7> LBL: h4 ARG0: e6 [ e SF: prop TENSE: untensed > MOOD: indicative PROG: - PERF: - ] ARG1: e2 ] > [ _and_c<8:11> LBL: h4 ARG0: e7 [ e SF: prop ] ARG1: u8 ARG2: e9 > [ e SF: prop ] ] > [ _without_p<12:19> LBL: h4 ARG0: e9 ARG1: e2 ARG2: x10 [ x > PERS: 3 NUM: sg IND: + ] ] > [ udef_q<12:19> LBL: h11 ARG0: x10 RSTR: h12 BODY: h13 ] > [ _warning_n_of<20:27> LBL: h14 ARG0: x10 ARG1: i15 ] > > HCONS: < h0 qeq h1 h12 qeq h14 > ] > > > [ TOP: h0 > INDEX: e2 [ e SF: prop-or-ques TENSE: untensed MOOD: indicative ] > RELS: < [ unknown<0:69> LBL: h4 ARG: u5 ARG0: e2 ] > [ _in_p_loc<0:2> LBL: h4 ARG0: e2 ARG1: u6 ARG2: x7 [ x PERS: 3 > NUM: sg IND: + PT: pt ] ] > [ _the_q<3:6> LBL: h8 ARG0: x7 RSTR: h9 BODY: h10 ] > [ compound<7:17> LBL: h11 ARG0: e12 [ e SF: prop TENSE: untensed > MOOD: indicative PROG: - PERF: - ] ARG1: x7 ARG2: x13 [ x IND: + PT: pt ] ] > [ udef_q<7:12> LBL: h14 ARG0: x13 RSTR: h15 BODY: h16 ] > [ _front_n_1<7:12> LBL: h17 ARG0: x13 ] > [ _part_n_of<13:17> LBL: h11 ARG0: x7 ARG1: i18 ] > [ _of_p<18:20> LBL: h11 ARG0: e19 [ e SF: prop TENSE: untensed > MOOD: indicative PROG: - PERF: - ] ARG1: x7 ARG2: x20 [ x PERS: 3 NUM: sg > IND: + PT: pt ] ] > [ _the_q<21:24> LBL: h21 ARG0: x20 RSTR: h22 BODY: h23 ] > [ _neck_n_1<25:29> LBL: h24 ARG0: x20 ] > [ _below_p<30:35> LBL: h11 ARG0: e25 [ e SF: prop TENSE: > untensed MOOD: indicative PROG: - PERF: - ] ARG1: x7 ARG2: x26 [ x PERS: 3 > NUM: sg IND: + PT: pt ] ] > [ _the_q<36:39> LBL: h27 ARG0: x26 RSTR: h28 BODY: h29 ] > [ _chin_n_1<40:44> LBL: h30 ARG0: x26 ] > [ _and_c<45:48> LBL: h11 ARG0: e31 [ e SF: prop TENSE: untensed > MOOD: indicative PROG: - PERF: - ] ARG1: e25 ARG2: e32 [ e SF: prop TENSE: > untensed MOOD: indicative PROG: - PERF: - ] ] > [ _above_p<49:54> LBL: h11 ARG0: e32 ARG1: x7 ARG2: x33 [ x > PERS: 3 NUM: sg PT: pt ] ] > [ _the_q<55:58> LBL: h34 ARG0: x33 RSTR: h35 BODY: h36 ] > [ _collarbone/nn_u_unknown<59:69> LBL: h37 ARG0: x33 ] > > HCONS: < h0 qeq h1 h9 qeq h11 h15 qeq h17 h22 qeq h24 h28 qeq h30 h35 > qeq h37 > ] > > > Best, > Alexandre > > > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From oe at ifi.uio.no Thu Sep 10 08:45:15 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Thu, 10 Sep 2020 08:45:15 +0200 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: Message-ID: g'day: > I think the LKB's EDS code will more aggressively search for a top for the EDS graph during conversion, perhaps looking to the INDEX. If anyone (Stephan?) cares to explain the procedure for selecting tops in less-than-perfect MRSs, I'd be happy to try and implement it in PyDelphin. yes, robustness to unusual or illformed (as in this case) MRSs has long been a key goal in the EDS conversion (in the LKB); MRS infelicities (in ERG parses) were probably more common in 2002 than today, but still i think that conversion should preferably never fail, i.e. possibly rather drop information from an illformed MRS than not yield an EDS at all. 
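in python terms, that kind of defensive top selection might look roughly like the following. this is an illustrative sketch against pyDelphin-style MRS objects (attributes top, index, rels, and hcons, with EPs exposing label and iv), not the LKB's or pyDelphin's actual code, and it glosses over the choice of a representative among EPs that share a label:

def pick_top(m):
    # normal case: TOP is (qeq to) the label of some EP
    qeq = {hc.hi: hc.lo for hc in m.hcons if hc.relation == 'qeq'}
    label_to_iv = {}
    for ep in m.rels:
        label_to_iv.setdefault(ep.label, ep.iv)
    top_label = qeq.get(m.top, m.top)
    if top_label in label_to_iv:
        return label_to_iv[top_label]
    # fall-back: the EP whose intrinsic variable is INDEX (in alexandre's
    # two MRSs, h1 labels no EP, so this picks e2, the ARG0 of 'unknown')
    if any(ep.iv == m.index for ep in m.rels):
        return m.index
    # otherwise leave the top unset rather than failing outright
    return None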
regarding the top node, i do indeed fall back to the INDEX, if need be: (let* ((ltop (ed-find-representative eds (psoa-top-h psoa))) (index (ed-find-representative eds (psoa-index psoa)))) (setf (eds-top eds) (or (and (ed-p ltop) (ed-id ltop)) (and (ed-p index) (ed-id index)) (and (var-p (psoa-index psoa)) (var-string (psoa-index psoa)))))) the third clause in the or() appears intended to deal with an MRS whose INDEX is not the intrinsic variable of any EP. in that case, the EDS will end up with a top that is not the identifier of any of its nodes, so effectively no top. thinking about such corner cases just now, i am tempted to drop that third fall-back clause and leave the top empty (which would be formally equivalent, seeing as the top property is interpreted as an annotation on one of the actual graph nodes). it appears native serialization allows for empty top nodes already, in which case there will be nothing following the opening brace on the first line: (format stream "{~@[~(~a~):~]~ ~:[~3*~; (~@[cyclic~*~]~@[ ~*~]~@[fragmented~*~])~]~@[~%~]" (eds-top object) (and *eds-show-status-p* (or cyclicp fragmentedp) ) cyclicp (and cyclicp fragmentedp) fragmentedp (eds-relations object)) while i am sure we have never hit empty tops while working with MRSs produced by the ERG, the above suggests that (a) identification of the top node is optional in EDS and (b) native serialization was intended as a line-oriented format. mike, may i suggest you add the fall-back, looking for the INDEX, and otherwise allow EDSs whose top is empty. regarding the exact definition of the native EDS serialization, i shall return to that question in the original thread we had on the topic (one might disallow whitespace between the opening brace and the optional top, to try and evade conclusion (b) above). cheers, oe From goodman.m.w at gmail.com Thu Sep 10 09:17:28 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Thu, 10 Sep 2020 15:17:28 +0800 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: Message-ID: Thanks for the clarification, Stephan. I've noted the suggestion for backing off on TOP to INDEX and for allowing no top. This makes sense. I'm completely unable to make sense of the lisp format call, so I'm not sure what you mean regarding conclusion (b), but I'll wait for your post to the other thread. On Thu, Sep 10, 2020 at 2:45 PM Stephan Oepen wrote: > g'day: > > > I think the LKB's EDS code will more aggressively search for a top for > the EDS graph during conversion, perhaps looking to the INDEX. If anyone > (Stephan?) cares to explain the procedure for selecting tops in > less-than-perfect MRSs, I'd be happy to try and implement it in PyDelphin. > > yes, robustness to unusual or illformed (as in this case) MRSs has > long been a key goal in the EDS conversion (in the LKB); MRS > infelicities (in ERG parses) were probably more common in 2002 than > today, but still i think that conversion should preferably never fail, > i.e. possibly rather drop information from an illformed MRS than not > yield an EDS at all. 
> > regarding the top node, i do indeed fall back to the INDEX, if need be: > > (let* ((ltop (ed-find-representative eds (psoa-top-h psoa))) > (index (ed-find-representative eds (psoa-index psoa)))) > (setf (eds-top eds) > (or (and (ed-p ltop) (ed-id ltop)) > (and (ed-p index) (ed-id index)) > (and (var-p (psoa-index psoa)) > (var-string (psoa-index psoa)))))) > > the third clause in the or() appears intended to deal with an MRS > whose INDEX is not the intrinsic variable of any EP. in that case, > the EDS will end up with a top that is not the identifier of any of > its nodes, so effectively no top. > > thinking about such corner cases just now, i am tempted to drop that > third fall-back clause and leave the top empty (which would be > formally equivalent, seeing as the top property is interpreted as an > annotation on one of the actual graph nodes). it appears native > serialization allows for empty top nodes already, in which case there > will be nothing following the opening brace on the first line: > > (format > stream > "{~@[~(~a~):~]~ > ~:[~3*~; (~@[cyclic~*~]~@[ ~*~]~@[fragmented~*~])~]~@[~%~]" > (eds-top object) > (and *eds-show-status-p* (or cyclicp fragmentedp) ) > cyclicp (and cyclicp fragmentedp) fragmentedp > (eds-relations object)) > > while i am sure we have never hit empty tops while working with MRSs > produced by the ERG, the above suggests that (a) identification of the > top node is optional in EDS and (b) native serialization was intended > as a line-oriented format. > > mike, may i suggest you add the fall-back, looking for the INDEX, and > otherwise allow EDSs whose top is empty. regarding the exact > definition of the native EDS serialization, i shall return to that > question in the original thread we had on the topic (one might > disallow whitespace between the opening brace and the optional top, to > try and evade conclusion (b) above). > > cheers, oe > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From bond at ieee.org Thu Sep 10 13:36:02 2020 From: bond at ieee.org (Francis Bond) Date: Thu, 10 Sep 2020 19:36:02 +0800 Subject: [developers] Abstract Wikipedia In-Reply-To: <04F8A34E-EBBC-4779-999D-8E13E965AB32@gmail.com> References: <04F8A34E-EBBC-4779-999D-8E13E965AB32@gmail.com> Message-ID: I think it is true. GF did a lot of vocabulary acquisition based on OMW 1.0 (some of my students helped) so they have vocab linked to synsets, as well as their own internal semantic hierarchy. Their actual grammars are a lot more basic than the ERG, and of course the coverage varies from language to language. On Wed, Sep 9, 2020 at 12:07 PM Alexandre Rademaker wrote: > > https://meta.m.wikimedia.org/wiki/Abstract_Wikipedia > > The goal of *Abstract Wikipedia* is to let more people share in more > knowledge in more languages. Abstract Wikipedia is an extension of > Wikidata. In Abstract Wikipedia, people can create and maintain Wikipedia > articles in a language-independent way. A Wikipedia in a language can > translate this language-independent article into its language. Code does > the translation. 
> > The Grammatical Framework community provided some response and suggestion > on how GF could be used for language generation > > > https://meta.m.wikimedia.org/wiki/Talk:Abstract_Wikipedia#Response_from_the_Grammatical_Framework_community > > I wonder if the statement about HPSG is fair: > > check out other grammar formalisms, like HPSG > , you'll see similar coverage > to GF, but no unified API for different languages. > > > Alexandre > Sent from my iPhone > -- Francis Bond Division of Linguistics and Multilingual Studies Nanyang Technological University -------------- next part -------------- An HTML attachment was scrubbed... URL: From kivs at bultreebank.org Thu Sep 10 14:11:33 2020 From: kivs at bultreebank.org (=?utf-8?Q?Kiril=20Simov?=) Date: Thu, 10 Sep 2020 15:11:33 +0300 Subject: [developers] =?utf-8?q?Abstract_Wikipedia?= In-Reply-To: References: <04F8A34E-EBBC-4779-999D-8E13E965AB32@gmail.com> Message-ID: <20200910121133.8664.qmail@s481.sureserver.com> They also are using word embeddings together with their grammar for selection of the appropriate lexical forms. With best regards, Kiril > -------Original Message------- > From: Francis Bond > To: Alexandre Rademaker > Cc: developers > Subject: Re: [developers] Abstract Wikipedia > Sent: 10 Sep '20 14:36 > > I think it is true. GF did a lot of vocabulary acquisition based on > OMW 1.0 (some of my students helped) so they have vocab linked to > synsets, as well as their own internal semantic hierarchy. > > Their actual grammars are a lot more basic than the ERG, and of course > the coverage varies from language to language. > > On Wed, Sep 9, 2020 at 12:07 PM Alexandre Rademaker > wrote: > > > https://meta.m.wikimedia.org/wiki/Abstract_Wikipedia > > > > The goal of ABSTRACT WIKIPEDIA is to let more people share in more > > knowledge in more languages. Abstract Wikipedia is an extension of > > Wikidata. In Abstract Wikipedia, people can create and maintain > > Wikipedia articles in a language-independent way. A Wikipedia in a > > language can translate this language-independent article into its > > language. Code does the translation. > > > > The Grammatical Framework community provided some response and > > suggestion on how GF could be used for language generation > > > > > https://meta.m.wikimedia.org/wiki/Talk:Abstract_Wikipedia#Response_from_the_Grammatical_Framework_community > > > > > > I wonder if the statement about HPSG is fair: > > > >> check out other grammar formalisms, like HPSG, you'll see similar > >> coverage to GF, but no unified API for different languages. > > > > Alexandre > > > > Sent from my iPhone > > -- > > Francis Bond > Division of Linguistics and Multilingual Studies > Nanyang Technological University > From arademaker at gmail.com Thu Sep 10 14:22:17 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Thu, 10 Sep 2020 09:22:17 -0300 Subject: [developers] Abstract Wikipedia In-Reply-To: <20200910121133.8664.qmail@s481.sureserver.com> References: <20200910121133.8664.qmail@s481.sureserver.com> Message-ID: Indeed, I followed some of the development of https://github.com/GrammaticalFramework/gf-wordnet too. Alexandre Sent from my iPhone > On 10 Sep 2020, at 09:11, Kiril Simov wrote: > > They also are using word embeddings together with > their grammar for selection of the appropriate > lexical forms. > > With best regards, > > Kiril -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From olzama at uw.edu Sat Sep 12 21:08:58 2020 From: olzama at uw.edu (Olga Zamaraeva) Date: Sat, 12 Sep 2020 12:08:58 -0700 Subject: [developers] lui: no unification result of failure Message-ID: Dear Developers, Have you seen this behavior? (10 seconds video; basically, in some cases, there is no unification result or failure, just no visible reaction on the unification attempt) https://youtu.be/Ifqn1iAodSg Am I using the software wrong somehow (how? can you tell?), or is this a bug? I tried this with LKB FOS with the latest maclui but also with logon on ubuntu. Thanks! -- Olga Zamaraeva -------------- next part -------------- An HTML attachment was scrubbed... URL: From J.A.Carroll at sussex.ac.uk Sat Sep 12 23:05:05 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Sat, 12 Sep 2020 21:05:05 +0000 Subject: [developers] lui: no unification result of failure In-Reply-To: References: Message-ID: <31CB6DA5-3A0B-4FA8-A0CC-1FB8FE4CE1DA@sussex.ac.uk> Hi Olga, I think lui would normally open a new feature structure window showing the unification result. But that doesn't happen in the video. Unfortunately I can only help if the problem is on the lkb side. Could you try doing the same drag again, either in logon/ubuntu or lkb-fos started from a terminal session - but before the drag execute the following at the lisp prompt: (trace lkb::lsp-retrieve-object lkb::debug-yadu!) During/after the drag do you now get any output in the terminal window? There should be 2 calls to the former function and 1 call to the latter, all returning normally. John On 12 Sep 2020, at 20:08, Olga Zamaraeva > wrote: Dear Developers, Have you seen this behavior? (10 seconds video; basically, in some cases, there is no unification result or failure, just no visible reaction on the unification attempt) https://youtu.be/Ifqn1iAodSg Am I using the software wrong somehow (how? can you tell?), or is this a bug? I tried this with LKB FOS with the latest maclui but also with logon on ubuntu. Thanks! -- Olga Zamaraeva -------------- next part -------------- An HTML attachment was scrubbed... URL: From olzama at uw.edu Mon Sep 14 19:20:40 2020 From: olzama at uw.edu (Olga Zamaraeva) Date: Mon, 14 Sep 2020 10:20:40 -0700 Subject: [developers] lui: no unification result of failure In-Reply-To: <31CB6DA5-3A0B-4FA8-A0CC-1FB8FE4CE1DA@sussex.ac.uk> References: <31CB6DA5-3A0B-4FA8-A0CC-1FB8FE4CE1DA@sussex.ac.uk> Message-ID: Hi John, Here's what I see if I do what you suggested: [image: Screen Shot 2020-09-14 at 10.19.27 AM.png] On Sat, Sep 12, 2020 at 2:05 PM John Carroll wrote: > Hi Olga, > > I think lui would normally open a new feature structure window showing the > unification result. But that doesn't happen in the video. Unfortunately I > can only help if the problem is on the lkb side. > > Could you try doing the same drag again, either in logon/ubuntu or lkb-fos > started from a terminal session - but before the drag execute the following > at the lisp prompt: > > (trace lkb::lsp-retrieve-object lkb::debug-yadu!) > > During/after the drag do you now get any output in the terminal window? > There should be 2 calls to the former function and 1 call to the latter, > all returning normally. > > John > > > On 12 Sep 2020, at 20:08, Olga Zamaraeva wrote: > > Dear Developers, > > Have you seen this behavior? 
(10 seconds video; basically, in some cases, > there is no unification result or failure, just no visible reaction on the > unification attempt) > > https://youtu.be/Ifqn1iAodSg > > Am I using the software wrong somehow (how? can you tell?), or is this a > bug? > > I tried this with LKB FOS with the latest maclui but also with logon on > ubuntu. > > > Thanks! > -- > Olga Zamaraeva > > > -- Olga Zamaraeva -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2020-09-14 at 10.19.27 AM.png Type: image/png Size: 553438 bytes Desc: not available URL: From arademaker at gmail.com Wed Sep 16 05:35:35 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 16 Sep 2020 00:35:35 -0300 Subject: [developers] LkbFos: how to copy from the text area Message-ID: Hi John, How can I copy the text (the scoped MRSs) from the scoped MRS window? Best, Alexandre -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: PastedGraphic-1.png Type: image/png Size: 504865 bytes Desc: not available URL: From J.A.Carroll at sussex.ac.uk Wed Sep 16 11:26:52 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Wed, 16 Sep 2020 09:26:52 +0000 Subject: [developers] LkbFos: how to copy from the text area In-Reply-To: References: Message-ID: Hi Alexandre, In LkbFos you can copy from text-like windows such as Scoped MRS, Lkb Top etc. How to do this is different between macOS and Linux since they have different conceptions of copy/paste. macOS: 1. Shift-drag to highlight the text you want to copy; do this by holding down the shift key while dragging the mouse with the left button held down. Alternatively you can shift-left-click one end of the text and then shift-right-click the other end. 2. Type command-C (or select Copy from the XQuartz Edit menu). The text is now in the system clipboard and can be pasted in the normal way. Linux: 1. Shift-drag as above. 2. To paste the highlighted text, click the middle mouse button. I'll add these instructions to http://moin.delph-in.net/LkbFos John > On 16 Sep 2020, at 04:35, Alexandre Rademaker wrote: > > > Hi John, > > How can I copy the text (the scoped MRSs) from the scoped MRS window? > > > > > Best, > Alexandre > From J.A.Carroll at sussex.ac.uk Wed Sep 16 12:57:40 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Wed, 16 Sep 2020 10:57:40 +0000 Subject: [developers] Fwd: LkbFos: how to copy from the text area References: Message-ID: Resending due to possible problem at email gateway Begin forwarded message: From: John Carroll > Subject: Re: [developers] LkbFos: how to copy from the text area Date: 16 September 2020 at 10:26:51 BST To: Alexandre Rademaker > Cc: developers > Hi Alexandre, In LkbFos you can copy from text-like windows such as Scoped MRS, Lkb Top etc. How to do this is different between macOS and Linux since they have different conceptions of copy/paste. macOS: 1. Shift-drag to highlight the text you want to copy; do this by holding down the shift key while dragging the mouse with the left button held down. Alternatively you can shift-left-click one end of the text and then shift-right-click the other end. 2. Type command-C (or select Copy from the XQuartz Edit menu). The text is now in the system clipboard and can be pasted in the normal way. Linux: 1. Shift-drag as above. 2. 
To paste the highlighted text, click the middle mouse button. I'll add these instructions to http://moin.delph-in.net/LkbFos John On 16 Sep 2020, at 04:35, Alexandre Rademaker > wrote: Hi John, How can I copy the text (the scoped MRSs) from the scoped MRS window? Best, Alexandre -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Wed Sep 16 17:18:21 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 16 Sep 2020 12:18:21 -0300 Subject: [developers] LkbFos: how to copy from the text area In-Reply-To: References: Message-ID: Thank you John. The shift-drag works nicely! I don?t know how to make the right-click on MacOS. Best, Alexandre > On 16 Sep 2020, at 06:26, John Carroll wrote: > > Hi Alexandre, > > In LkbFos you can copy from text-like windows such as Scoped MRS, Lkb Top etc. How to do this is different between macOS and Linux since they have different conceptions of copy/paste. > > macOS: > 1. Shift-drag to highlight the text you want to copy; do this by holding down the shift key while dragging the mouse with the left button held down. Alternatively you can shift-left-click one end of the text and then shift-right-click the other end. > 2. Type command-C (or select Copy from the XQuartz Edit menu). The text is now in the system clipboard and can be pasted in the normal way. > > Linux: > 1. Shift-drag as above. > 2. To paste the highlighted text, click the middle mouse button. > > I'll add these instructions to http://moin.delph-in.net/LkbFos > > John From J.A.Carroll at sussex.ac.uk Wed Sep 16 17:34:30 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Wed, 16 Sep 2020 15:34:30 +0000 Subject: [developers] LkbFos: how to copy from the text area In-Reply-To: References: Message-ID: I use an Apple Magic Mouse and this supports right-click in macOS. In mouse preferences, right-click by default is "Secondary click". I've not managed to get Magic Mouse to generate a middle-click in Linux running in VirtualBox. Middle-click is needed for [incr tsdb()]. In the end I gave in and bought a cheap 3-button mouse just to get that gesture. John On 16 Sep 2020, at 16:18, Alexandre Rademaker > wrote: Thank you John. The shift-drag works nicely! I don?t know how to make the right-click on MacOS. Best, Alexandre > On 16 Sep 2020, at 06:26, John Carroll > wrote: > > Hi Alexandre, > > In LkbFos you can copy from text-like windows such as Scoped MRS, Lkb Top etc. How to do this is different between macOS and Linux since they have different conceptions of copy/paste. > > macOS: > 1. Shift-drag to highlight the text you want to copy; do this by holding down the shift key while dragging the mouse with the left button held down. Alternatively you can shift-left-click one end of the text and then shift-right-click the other end. > 2. Type command-C (or select Copy from the XQuartz Edit menu). The text is now in the system clipboard and can be pasted in the normal way. > > Linux: > 1. Shift-drag as above. > 2. To paste the highlighted text, click the middle mouse button. > > I'll add these instructions to http://moin.delph-in.net/LkbFos > > John -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From arademaker at gmail.com Thu Sep 24 20:55:44 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Thu, 24 Sep 2020 15:55:44 -0300 Subject: [developers] Discriminant-Based MRS Banking Message-ID: <5109459B-EFF8-41A6-B8FC-C8FA73D31D44@gmail.com> Hi Stephan, I have already used the `compare` button from http://erg.delph-in.net, but I didn't know that this web interface can edit profiles. The paper http://www.lrec-conf.org/proceedings/lrec2006/pdf/364_pdf.pdf suggested this is the case. That is, the web interface can save decisions. Well, I had only suspected it, since there is a disabled `save` button at the public address. Is that the case? If so, where can I find documentation about it? I know about the page http://moin.delph-in.net/LogonOnline, but it only talks about how to call the www script in the LOGON directory. Best, Alexandre From arademaker at gmail.com Fri Sep 25 01:07:21 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Thu, 24 Sep 2020 20:07:21 -0300 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: Message-ID: <1B872B18-BE49-4B14-A893-2AFB0A98C856@gmail.com> Hi Michael and Stephan, A good place to learn about Lisp's format function is http://www.gigamonkeys.com/book/a-few-format-recipes.html Basically, the control-string: "{~@[~(~a~):~]~ ~:[~3*~; (~@[cyclic~*~]~@[ ~*~]~@[fragmented~*~])~]~@[~%~]" The ~( directive means lower case, so ~(~a~) will output the value of the expression `(eds-top object)` in lower case. And ~@[ introduces a conditional: if the value of `(eds-top object)` is nil, the clause is skipped, so it neither prints anything nor consumes the other arguments in its place. The remainder of the control-string is quite complicated to read, but one can follow the complete documentation in the CLHS. For instance, `~3*` skips the next three format arguments; see http://www.lispworks.com/documentation/lw50/CLHS/Body/22_cga.htm ;-) It looks like the serialisation/encode of EDS in pydelphin is also robust to an empty top: https://github.com/delph-in/pydelphin/blob/develop/delphin/codecs/eds.py#L257 But the decode/parse is not, see the tests below. Actually, encode should not emit a colon in the first line and, of course, there is this discussion about the line-oriented format that would require a broad review of the encode/decode of EDS. I have submitted a PR to Michael fixing the translation from MRS to EDS, but I didn't touch the decode/encode functions. I found the Lisp code in the lkb/src/mrs/dependencies.lisp file, so it is part of the LKB source code. I am curious: what does `psoa` stand for?
>>> edsnative.encode(edsnative.decode(a)) '{e2: e2:unknown<0:83>{e SF prop-or-ques}[ARG x4] _1:_a_q<0:1>[BV x4] x4:_river_n_of<2:7>{x PERS 3, NUM sg, IND +, PT pt}[] e10:_in_p_loc<8:10>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 x4, ARG2 x11] _2:proper_q<11:30>[BV x11] e16:_northeastern_a_1<11:23>{e SF prop, TENSE untensed, MOOD indicative, PROG bool, PERF -}[ARG1 x11] x11:named<24:30>("Brazil"){x PERS 3, NUM sg, IND +}[] e18:_flow_v_1<36:41>{e SF prop, TENSE pres, MOOD indicative, PROG -, PERF -}[ARG1 x4] e19:_general_a_1<42:51>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 e18] e20:loc_nonsp<52:61>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 e18, ARG2 x21] x21:place_n<52:61>{x PERS 3, NUM sg}[] _3:def_implicit_q<52:61>[BV x21] e26:_northward_a_1<52:61>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 x21] e27:_to_p_state<62:64>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 e18, ARG2 x28] _4:_the_q<65:68>[BV x28] e33:compound<69:83>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 x28, ARG2 x34] _5:proper_q<69:77>[BV x34] x34:named<69:77>("Atlantic"){x PERS 3, NUM sg, IND +, PT pt}[] x28:named<78:83>("Ocean"){x PERS 3, NUM sg, IND +, PT pt}[]}? >>> x = edsnative.decode(a) >>> x.top 'e2' >>> x.top = None >>> edsnative.encode(x) '{: e2:unknown<0:83>{e SF prop-or-ques}[ARG x4] _1:_a_q<0:1>[BV x4] x4:_river_n_of<2:7>{x PERS 3, NUM sg, IND +, PT pt}[] e10:_in_p_loc<8:10>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 x4, ARG2 x11] _2:proper_q<11:30>[BV x11] e16:_northeastern_a_1<11:23>{e SF prop, TENSE untensed, MOOD indicative, PROG bool, PERF -}[ARG1 x11] x11:named<24:30>("Brazil"){x PERS 3, NUM sg, IND +}[] e18:_flow_v_1<36:41>{e SF prop, TENSE pres, MOOD indicative, PROG -, PERF -}[ARG1 x4] e19:_general_a_1<42:51>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 e18] e20:loc_nonsp<52:61>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 e18, ARG2 x21] x21:place_n<52:61>{x PERS 3, NUM sg}[] _3:def_implicit_q<52:61>[BV x21] e26:_northward_a_1<52:61>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 x21] e27:_to_p_state<62:64>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 e18, ARG2 x28] _4:_the_q<65:68>[BV x28] e33:compound<69:83>{e SF prop, TENSE untensed, MOOD indicative, PROG -, PERF -}[ARG1 x28, ARG2 x34] _5:proper_q<69:77>[BV x34] x34:named<69:77>("Atlantic"){x PERS 3, NUM sg, IND +, PT pt}[] x28:named<78:83>("Ocean"){x PERS 3, NUM sg, IND +, PT pt}[]}? >>> edsnative.decode(x) Traceback (most recent call last): File "", line 1, in File "/Users/ar/venv/lib/python3.8/site-packages/delphin/codecs/eds.py", line 110, in decode lexer = _EDSLexer.lex(s.splitlines()) AttributeError: 'EDS' object has no attribute 'splitlines' Best, Alexandre From goodman.m.w at gmail.com Fri Sep 25 03:34:44 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Fri, 25 Sep 2020 09:34:44 +0800 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: <1B872B18-BE49-4B14-A893-2AFB0A98C856@gmail.com> References: <1B872B18-BE49-4B14-A893-2AFB0A98C856@gmail.com> Message-ID: On Fri, Sep 25, 2020 at 7:07 AM Alexandre Rademaker wrote: > > Hi Michael and Stephan, > > A good place to learn about the Lisp format is > http://www.gigamonkeys.com/book/a-few-format-recipes.html > > [...] Thanks Alexandre for the links and the explanation. 
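Setting aside the type error at the end of the transcript above (decode was given an EDS object rather than a string), the remaining issue is the first line of the native serialization: with an empty top the writer emits '{:' (or, per the Lisp earlier in the thread, just '{'), while a strict reader may insist on an identifier before the colon. A tolerant reader could treat the material before the first colon as optional; a minimal sketch using plain regular expressions (illustration only, not PyDelphin's actual lexer):

import re

# '{', optionally followed by a top identifier and ':', or by a bare ':';
# a top is only recognized when its ':' is followed by whitespace, so a
# leading node such as 'e2:unknown' is not mistaken for a top identifier.
_HEADER = re.compile(r'^\{\s*(?:([^\s:{}]+):(?=\s)|:)?\s*')

def read_top(text):
    """Return (top, rest) for the start of a native EDS string; top may be None."""
    m = _HEADER.match(text)
    if m is None:
        raise ValueError('not a native EDS serialization')
    return m.group(1), text[m.end():]

print(read_top('{e2: e2:unknown<0:83>[ARG x4]}'))  # ('e2', 'e2:unknown<0:83>[ARG x4]}')
print(read_top('{: e2:unknown<0:83>[ARG x4]}'))    # (None, 'e2:unknown<0:83>[ARG x4]}')
print(read_top('{e2:unknown<0:83>[ARG x4]}'))      # (None, 'e2:unknown<0:83>[ARG x4]}')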
I tried reading some elisp docs and a guide on format in order to understand the expression when Stephan posted it, but after about 20 minutes I decided that was too much effort just to understand an email, so I gave up. It looks like the serialisation/encode of EDS in pydelphin is also robust > to empty top: > > > https://github.com/delph-in/pydelphin/blob/develop/delphin/codecs/eds.py#L257 > Hmm, I guess I anticipated that because I allow an empty top in the data structure. Thanks for digging that up! > But the decode/parse is not, see tests below. Actually, encode should not > emit a colon in the first line and, of course, there is this discussion > about the line-oriented format that would require a broad review of the > encode/decode of EDS. > I think the colon was deliberate to avoid potential ambiguity with the identifier of the first node. Stephan instead wants to make newlines obligatory. I'm happy to make newlines + indentation the default for EDS native serialization, but I'm not prepared to get rid of the ability to write single-line EDS. > > I have submitted a PR to Michael solving the translation from MRS to EDS, > but I didn?t touch in the decode/encode functions. > Thanks, I'll take a look. I found the Lisp code in the lkb/src/mrs/dependencies.lisp file, so it is > part of the LKB source code. I am curious, what `psoa` stands for? > > "probable-state-of-affairs". But I'm not sure where that terminology comes from. > > [...] > >>> edsnative.decode(x) > Traceback (most recent call last): > File "", line 1, in > File "/Users/ar/venv/lib/python3.8/site-packages/delphin/codecs/eds.py", > line 110, in decode > lexer = _EDSLexer.lex(s.splitlines()) > AttributeError: 'EDS' object has no attribute 'splitlines' > Here you have attempted to decode x, which is the EDS data structure. Instead you'd want to do `edsnative.decode(edsnative.encode(x))`, but that also fails because it expects the top variable before the colon. It appears my robustness attempt was incomplete. -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From ebender at uw.edu Fri Sep 25 04:40:23 2020 From: ebender at uw.edu (Emily M. Bender) Date: Thu, 24 Sep 2020 19:40:23 -0700 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: <1B872B18-BE49-4B14-A893-2AFB0A98C856@gmail.com> Message-ID: I don't have much to contribute to serialization etc, but psoa is `parameterized state of affairs', and I think it comes from the situation semantics literature. Emily On Thu, Sep 24, 2020 at 6:36 PM goodman.m.w at gmail.com wrote: > On Fri, Sep 25, 2020 at 7:07 AM Alexandre Rademaker > wrote: > >> >> Hi Michael and Stephan, >> >> A good place to learn about the Lisp format is >> http://www.gigamonkeys.com/book/a-few-format-recipes.html >> >> [...] > > > Thanks Alexandre for the links and the explanation. > > I tried reading some elisp docs and a guide on format in order to > understand the expression when Stephan posted it, but after about 20 > minutes I decided that was too much effort just to understand an email, so > I gave up. > > It looks like the serialisation/encode of EDS in pydelphin is also robust >> to empty top: >> >> >> https://github.com/delph-in/pydelphin/blob/develop/delphin/codecs/eds.py#L257 >> > > Hmm, I guess I anticipated that because I allow an empty top in the data > structure. Thanks for digging that up! > > > >> But the decode/parse is not, see tests below. 
Actually, encode should not >> emit a colon in the first line and, of course, there is this discussion >> about the line-oriented format that would require a broad review of the >> encode/decode of EDS. >> > > I think the colon was deliberate to avoid potential ambiguity with the > identifier of the first node. Stephan instead wants to make newlines > obligatory. I'm happy to make newlines + indentation the default for EDS > native serialization, but I'm not prepared to get rid of the ability to > write single-line EDS. > > >> >> I have submitted a PR to Michael solving the translation from MRS to EDS, >> but I didn?t touch in the decode/encode functions. >> > > Thanks, I'll take a look. > > I found the Lisp code in the lkb/src/mrs/dependencies.lisp file, so it is >> part of the LKB source code. I am curious, what `psoa` stands for? >> >> "probable-state-of-affairs". But I'm not sure where that terminology > comes from. > > > >> >> [...] >> >>> edsnative.decode(x) >> Traceback (most recent call last): >> File "", line 1, in >> File >> "/Users/ar/venv/lib/python3.8/site-packages/delphin/codecs/eds.py", line >> 110, in decode >> lexer = _EDSLexer.lex(s.splitlines()) >> AttributeError: 'EDS' object has no attribute 'splitlines' >> > > Here you have attempted to decode x, which is the EDS data structure. > Instead you'd want to do `edsnative.decode(edsnative.encode(x))`, but that > also fails because it expects the top variable before the colon. It appears > my robustness attempt was incomplete. > > -- > -Michael Wayne Goodman > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Fri Sep 25 05:05:57 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Fri, 25 Sep 2020 00:05:57 -0300 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: <1B872B18-BE49-4B14-A893-2AFB0A98C856@gmail.com> Message-ID: Thank you Emily, I found one mention in https://plato.stanford.edu/entries/situations-semantics/. But in the context of the Lisp code, I am curious why the authors used this in the suffix of the function name? BTW, I reported two issues on pydelphin and ERG repositories: https://github.com/delph-in/erg/issues/25 The trunk version of ERG gave me this MRS.. [ TOP: h0 INDEX: e2 [ e SF: prop TENSE: tensed MOOD: indicative PROG: - PERF: - ] RELS: < [ _communicate_v_to<0:11> LBL: h1 ARG0: e4 [ e SF: prop TENSE: tensed MOOD: indicative PROG: - PERF: - ] ARG1: i3 ARG2: h5 ARG3: i6 ] [ _or_c<12:14> LBL: h1 ARG0: e2 ARG1: e4 ARG2: e7 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ] ] [ _express_v_to<15:22> LBL: h1 ARG0: e7 ARG1: i3 ARG2: h8 ARG3: i9 ] [ unknown<23:33> LBL: h10 ARG: u12 ARG0: e11 [ e SF: prop TENSE: untensed MOOD: indicative ] ] [ _by_p_means<23:25> LBL: h10 ARG0: e11 ARG1: u13 ARG2: x14 ] [ udef_q<26:33> LBL: h15 ARG0: x14 RSTR: h16 BODY: h17 ] [ nominalization<26:33> LBL: h18 ARG0: x14 ARG1: h19 ] [ _write_v_to<26:33> LBL: h19 ARG0: e20 [ e SF: prop TENSE: untensed MOOD: indicative PROG: + PERF: - ] ARG1: i21 ARG2: i22 ] > HCONS: < h0 qeq h1 h5 qeq h23 h8 qeq h23 h16 qeq h18 > ] Handle h23 does not appear in the predicates. The h5 and h8 only in the arguments. Is it valid? Pydelphin transformation to DMRS works with a warning "broken handle constraint?. Can it be transformed to EDS? Is this an evidence that MRS to EDS is much less robust than MRS to DMRS? 
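One way to make the problem concrete: in a well-formed MRS, the right-hand side of every qeq should be the label of some EP, and the left-hand side should be TOP or a handle-valued argument. A small check over toy data (plain Python dictionaries standing in for the relevant parts of the MRS above, abridged; not any particular library's API) flags h23 immediately:

# abridged rendering of the MRS quoted above
eps = [
    {'pred': '_communicate_v_to', 'label': 'h1',  'args': {'ARG2': 'h5'}},
    {'pred': '_express_v_to',     'label': 'h1',  'args': {'ARG2': 'h8'}},
    {'pred': 'unknown',           'label': 'h10', 'args': {}},
    {'pred': 'udef_q',            'label': 'h15', 'args': {'RSTR': 'h16', 'BODY': 'h17'}},
    {'pred': 'nominalization',    'label': 'h18', 'args': {'ARG1': 'h19'}},
    {'pred': '_write_v_to',       'label': 'h19', 'args': {}},
]
hcons = [('h0', 'h1'), ('h5', 'h23'), ('h8', 'h23'), ('h16', 'h18')]

labels = {ep['label'] for ep in eps}
# h0 is TOP; the rest are handle-valued argument positions
holes = {'h0'} | {v for ep in eps for v in ep['args'].values() if v.startswith('h')}

for hi, lo in hcons:
    if lo not in labels:
        print(hi, 'qeq', lo + ':', lo, 'is not the label of any EP')
    if hi not in holes:
        print(hi, 'qeq', lo + ':', hi, 'is neither TOP nor an argument handle')

# prints: h5 qeq h23: h23 is not the label of any EP
#         h8 qeq h23: h23 is not the label of any EP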
BTW, since I am having so many trouble with MRS to EDS, and my goal is to compare a golden version of a profile with its 1-best parsed version to evaluate the parse selection model, I wonder if I could be doing that with DMRS instead of EDS? any idea? Any alternative to https://github.com/delph-in/delphin.edm using DMRS? Best, Alexandre > On 24 Sep 2020, at 23:40, Emily M. Bender wrote: > > I don't have much to contribute to serialization etc, but psoa is `parameterized state of affairs', and I think it comes from the situation semantics literature. > > Emily > From arademaker at gmail.com Fri Sep 25 05:20:58 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Fri, 25 Sep 2020 00:20:58 -0300 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: <1B872B18-BE49-4B14-A893-2AFB0A98C856@gmail.com> Message-ID: Ops! I came to hasty conclusions! LKB FOS parsed the sentence with ERG trunk. Two analysis, two MRS and two EDS with no error!! In the MRSs from LKB, ARG2 of `_communicate_v_to` and `_express_v_to` are both qeq to the label of the `unknown` predicate. I don?t know what I can conclude now? ACE error? > On 25 Sep 2020, at 00:05, Alexandre Rademaker wrote: > > > Thank you Emily, I found one mention in https://plato.stanford.edu/entries/situations-semantics/. But in the context of the Lisp code, I am curious why the authors used this in the suffix of the function name? > > BTW, I reported two issues on pydelphin and ERG repositories: > > https://github.com/delph-in/erg/issues/25 > > The trunk version of ERG gave me this MRS.. > > [ TOP: h0 > INDEX: e2 [ e SF: prop TENSE: tensed MOOD: indicative PROG: - PERF: - ] > RELS: < [ _communicate_v_to<0:11> LBL: h1 ARG0: e4 [ e SF: prop TENSE: tensed MOOD: indicative PROG: - PERF: - ] ARG1: i3 ARG2: h5 ARG3: i6 ] > [ _or_c<12:14> LBL: h1 ARG0: e2 ARG1: e4 ARG2: e7 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ] ] > [ _express_v_to<15:22> LBL: h1 ARG0: e7 ARG1: i3 ARG2: h8 ARG3: i9 ] > [ unknown<23:33> LBL: h10 ARG: u12 ARG0: e11 [ e SF: prop TENSE: untensed MOOD: indicative ] ] > [ _by_p_means<23:25> LBL: h10 ARG0: e11 ARG1: u13 ARG2: x14 ] > [ udef_q<26:33> LBL: h15 ARG0: x14 RSTR: h16 BODY: h17 ] > [ nominalization<26:33> LBL: h18 ARG0: x14 ARG1: h19 ] > [ _write_v_to<26:33> LBL: h19 ARG0: e20 [ e SF: prop TENSE: untensed MOOD: indicative PROG: + PERF: - ] ARG1: i21 ARG2: i22 ] > > HCONS: < h0 qeq h1 h5 qeq h23 h8 qeq h23 h16 qeq h18 > ] > > Handle h23 does not appear in the predicates. The h5 and h8 only in the arguments. Is it valid? > > Pydelphin transformation to DMRS works with a warning "broken handle constraint?. Can it be transformed to EDS? Is this an evidence that MRS to EDS is much less robust than MRS to DMRS? > > BTW, since I am having so many trouble with MRS to EDS, and my goal is to compare a golden version of a profile with its 1-best parsed version to evaluate the parse selection model, I wonder if I could be doing that with DMRS instead of EDS? any idea? Any alternative to https://github.com/delph-in/delphin.edm using DMRS? > > Best, > Alexandre From goodman.m.w at gmail.com Fri Sep 25 05:29:27 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Fri, 25 Sep 2020 11:29:27 +0800 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: <1B872B18-BE49-4B14-A893-2AFB0A98C856@gmail.com> Message-ID: Thanks, Emily, for the correction! On Fri, Sep 25, 2020 at 11:22 AM Alexandre Rademaker wrote: > > Ops! I came to hasty conclusions! 
LKB FOS parsed the sentence with ERG > trunk. Two analysis, two MRS and two EDS with no error!! In the MRSs from > LKB, ARG2 of `_communicate_v_to` and `_express_v_to` are both qeq to the > label of the `unknown` predicate. > > I don?t know what I can conclude now? ACE error? > Regarding the MRS you reported: broken HCONS are a somewhat common issue, and they are the symptom of a bug. The MRS is not valid. Regarding conversion to EDS, the LKB goes to greater lengths to give an EDS even for ill-formed MRSs, while PyDelphin tries to avoid grammar-specific solutions. The result is that PyDelphin will, I think, apply predicate-modification more broadly than the LKB, but it is a bit more brittle. While there may be other reasons to use DMRS instead, in this case the different behavior in PyDelphin is just because it issues a warning instead of an error, then prints out the partial DMRS, dropping the disconnected nodes. > > On 25 Sep 2020, at 00:05, Alexandre Rademaker > wrote: > > > > > > Thank you Emily, I found one mention in > https://plato.stanford.edu/entries/situations-semantics/. But in the > context of the Lisp code, I am curious why the authors used this in the > suffix of the function name? > > > > BTW, I reported two issues on pydelphin and ERG repositories: > > > > https://github.com/delph-in/erg/issues/25 > > > > The trunk version of ERG gave me this MRS.. > > > > [ TOP: h0 > > INDEX: e2 [ e SF: prop TENSE: tensed MOOD: indicative PROG: - PERF: - ] > > RELS: < [ _communicate_v_to<0:11> LBL: h1 ARG0: e4 [ e SF: prop TENSE: > tensed MOOD: indicative PROG: - PERF: - ] ARG1: i3 ARG2: h5 ARG3: i6 ] > > [ _or_c<12:14> LBL: h1 ARG0: e2 ARG1: e4 ARG2: e7 [ e SF: prop > TENSE: pres MOOD: indicative PROG: - PERF: - ] ] > > [ _express_v_to<15:22> LBL: h1 ARG0: e7 ARG1: i3 ARG2: h8 ARG3: > i9 ] > > [ unknown<23:33> LBL: h10 ARG: u12 ARG0: e11 [ e SF: prop > TENSE: untensed MOOD: indicative ] ] > > [ _by_p_means<23:25> LBL: h10 ARG0: e11 ARG1: u13 ARG2: x14 ] > > [ udef_q<26:33> LBL: h15 ARG0: x14 RSTR: h16 BODY: h17 ] > > [ nominalization<26:33> LBL: h18 ARG0: x14 ARG1: h19 ] > > [ _write_v_to<26:33> LBL: h19 ARG0: e20 [ e SF: prop TENSE: > untensed MOOD: indicative PROG: + PERF: - ] ARG1: i21 ARG2: i22 ] > > > HCONS: < h0 qeq h1 h5 qeq h23 h8 qeq h23 h16 qeq h18 > ] > > > > Handle h23 does not appear in the predicates. The h5 and h8 only in the > arguments. Is it valid? > > > > Pydelphin transformation to DMRS works with a warning "broken handle > constraint?. Can it be transformed to EDS? Is this an evidence that MRS to > EDS is much less robust than MRS to DMRS? > > > > BTW, since I am having so many trouble with MRS to EDS, and my goal is > to compare a golden version of a profile with its 1-best parsed version to > evaluate the parse selection model, I wonder if I could be doing that with > DMRS instead of EDS? any idea? Any alternative to > https://github.com/delph-in/delphin.edm using DMRS? > > > > Best, > > Alexandre > > > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodman.m.w at gmail.com Fri Sep 25 05:32:36 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Fri, 25 Sep 2020 11:32:36 +0800 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: <1B872B18-BE49-4B14-A893-2AFB0A98C856@gmail.com> Message-ID: And regarding https://github.com/delph-in/delphin.edm, note that this implementation also works for MRS, by converting to EDS along the way, and for DMRS, without conversion. 
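Whichever representation is compared (EDS or DMRS), the score itself comes down to precision, recall, and F1 over the triples extracted from the gold and the test analyses. A toy sketch of that final step (hypothetical triple shapes, not delphin.edm's internal format):

from collections import Counter

def prf(gold_triples, test_triples):
    """Precision/recall/F1 over (possibly repeated) triples."""
    gold, test = Counter(gold_triples), Counter(test_triples)
    matched = sum((gold & test).values())
    p = matched / sum(test.values()) if test else 0.0
    r = matched / sum(gold.values()) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# toy triples: (source span, relation, target); two of the three agree
gold = [((0, 3), 'predicate', '_dog_n_1'),
        ((4, 9), 'predicate', '_bark_v_1'),
        ((4, 9), 'ARG1', (0, 3))]
test = [((0, 3), 'predicate', '_dog_n_1'),
        ((4, 9), 'predicate', '_sleep_v_1'),
        ((4, 9), 'ARG1', (0, 3))]
print(prf(gold, test))  # roughly (0.67, 0.67, 0.67)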
On Fri, Sep 25, 2020 at 11:29 AM goodman.m.w at gmail.com < goodman.m.w at gmail.com> wrote: > Thanks, Emily, for the correction! > > On Fri, Sep 25, 2020 at 11:22 AM Alexandre Rademaker > wrote: > >> >> Ops! I came to hasty conclusions! LKB FOS parsed the sentence with ERG >> trunk. Two analysis, two MRS and two EDS with no error!! In the MRSs from >> LKB, ARG2 of `_communicate_v_to` and `_express_v_to` are both qeq to the >> label of the `unknown` predicate. >> >> I don?t know what I can conclude now? ACE error? >> > > Regarding the MRS you reported: broken HCONS are a somewhat common issue, > and they are the symptom of a bug. The MRS is not valid. > > Regarding conversion to EDS, the LKB goes to greater lengths to give an > EDS even for ill-formed MRSs, while PyDelphin tries to avoid > grammar-specific solutions. The result is that PyDelphin will, I think, > apply predicate-modification more broadly than the LKB, but it is a bit > more brittle. > > While there may be other reasons to use DMRS instead, in this case the > different behavior in PyDelphin is just because it issues a warning instead > of an error, then prints out the partial DMRS, dropping the disconnected > nodes. > > > >> > On 25 Sep 2020, at 00:05, Alexandre Rademaker >> wrote: >> > >> > >> > Thank you Emily, I found one mention in >> https://plato.stanford.edu/entries/situations-semantics/. But in the >> context of the Lisp code, I am curious why the authors used this in the >> suffix of the function name? >> > >> > BTW, I reported two issues on pydelphin and ERG repositories: >> > >> > https://github.com/delph-in/erg/issues/25 >> > >> > The trunk version of ERG gave me this MRS.. >> > >> > [ TOP: h0 >> > INDEX: e2 [ e SF: prop TENSE: tensed MOOD: indicative PROG: - PERF: - ] >> > RELS: < [ _communicate_v_to<0:11> LBL: h1 ARG0: e4 [ e SF: prop TENSE: >> tensed MOOD: indicative PROG: - PERF: - ] ARG1: i3 ARG2: h5 ARG3: i6 ] >> > [ _or_c<12:14> LBL: h1 ARG0: e2 ARG1: e4 ARG2: e7 [ e SF: prop >> TENSE: pres MOOD: indicative PROG: - PERF: - ] ] >> > [ _express_v_to<15:22> LBL: h1 ARG0: e7 ARG1: i3 ARG2: h8 >> ARG3: i9 ] >> > [ unknown<23:33> LBL: h10 ARG: u12 ARG0: e11 [ e SF: prop >> TENSE: untensed MOOD: indicative ] ] >> > [ _by_p_means<23:25> LBL: h10 ARG0: e11 ARG1: u13 ARG2: x14 ] >> > [ udef_q<26:33> LBL: h15 ARG0: x14 RSTR: h16 BODY: h17 ] >> > [ nominalization<26:33> LBL: h18 ARG0: x14 ARG1: h19 ] >> > [ _write_v_to<26:33> LBL: h19 ARG0: e20 [ e SF: prop TENSE: >> untensed MOOD: indicative PROG: + PERF: - ] ARG1: i21 ARG2: i22 ] > >> > HCONS: < h0 qeq h1 h5 qeq h23 h8 qeq h23 h16 qeq h18 > ] >> > >> > Handle h23 does not appear in the predicates. The h5 and h8 only in the >> arguments. Is it valid? >> > >> > Pydelphin transformation to DMRS works with a warning "broken handle >> constraint?. Can it be transformed to EDS? Is this an evidence that MRS to >> EDS is much less robust than MRS to DMRS? >> > >> > BTW, since I am having so many trouble with MRS to EDS, and my goal is >> to compare a golden version of a profile with its 1-best parsed version to >> evaluate the parse selection model, I wonder if I could be doing that with >> DMRS instead of EDS? any idea? Any alternative to >> https://github.com/delph-in/delphin.edm using DMRS? >> > >> > Best, >> > Alexandre >> >> >> > > -- > -Michael Wayne Goodman > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From arademaker at gmail.com Fri Sep 25 06:02:16 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Fri, 25 Sep 2020 01:02:16 -0300 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: Message-ID: <7916EB41-2AD1-4020-A6B8-64742999D1BE@gmail.com> Hi Michael, Thank you for your comments. The main fact now is that LKB does not produce disconnected MRSs for the same sentence. Maybe ACE error or a difference on the initialization scripts of ERG for each tool? I understood the possible difference in the approach for converting MRS to EDS between LKB and PyDelphin, but the MRSs are different to begin with. Good point about the incomplete DMRS output. One more hasty conclusion from my side. I didn?t inspect carefully the DMRS produced. How can I tell edm to use DMRS instead of EDS? Maybe I missed something... Maybe edm must be more robust to ignore pairs with errors? Alexandre Sent from my iPhone > On 25 Sep 2020, at 00:32, goodman.m.w at gmail.com wrote: > And regarding https://github.com/delph-in/delphin.edm, note that this implementation also works for MRS, by converting to EDS along the way, and for DMRS, without conversion. -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodman.m.w at gmail.com Fri Sep 25 06:40:02 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Fri, 25 Sep 2020 12:40:02 +0800 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: <7916EB41-2AD1-4020-A6B8-64742999D1BE@gmail.com> References: <7916EB41-2AD1-4020-A6B8-64742999D1BE@gmail.com> Message-ID: On Fri, Sep 25, 2020 at 12:02 PM Alexandre Rademaker wrote: > Hi Michael, > > Thank you for your comments. > > The main fact now is that LKB does not produce disconnected MRSs for the > same sentence. Maybe ACE error or a difference on the initialization > scripts of ERG for each tool? I understood the possible difference in the > approach for converting MRS to EDS between LKB and PyDelphin, but the MRSs > are different to begin with. > Oh I see. I missed that part of your message. Different parse-ranking models, perhaps? > Good point about the incomplete DMRS output. One more hasty conclusion > from my side. I didn?t inspect carefully the DMRS produced. > > How can I tell edm to use DMRS instead of EDS? Maybe I missed > something... > The -f / --format option specifies the codec to use. E.g., --format=dmrx Maybe edm must be more robust to ignore pairs with errors? > That's a good idea. I created https://github.com/delph-in/delphin.edm/issues/3 > Alexandre > Sent from my iPhone > > On 25 Sep 2020, at 00:32, goodman.m.w at gmail.com wrote: > > And regarding https://github.com/delph-in/delphin.edm, note that this > implementation also works for MRS, by converting to EDS along the way, and > for DMRS, without conversion. > > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Sun Sep 27 01:54:53 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Sat, 26 Sep 2020 20:54:53 -0300 Subject: [developers] CSLI vs MRS profiles Message-ID: Hi, A couple of months ago, I was editing the page http://moin.delph-in.net/MatrixMrsTestSuite and Stephan corrected my mistake and explained to me the origin of both datasets: > the MRS test suite is something that ann and dan cooked up over the course of five or so weeks while dan was visiting cambridge, 2001 or 2002, i would say. 
except for some reuse of Abrams and Browne, I doubt there is any overlap in actual sentences with what was originally called the HP test suite. the latter was created to explore variation in syntactic structures and lives on in the DELPH-IN universe under the name CSLI test suite (since around 1994). the MRS test suite, on the other hand, exemplifies basic semantic constructions. so, in my view it is misleading to say it was derived from the HP data, but dan was of course centrally involved in both efforts. I am playing with these profiles again, trying to have both translated to Brazilian Portuguese (not the European Portuguese translation we have in the wiki). But I have a question Why 7 sentences are different if we compare http://moin.delph-in.net/MatrixMrsTestSuite to http://svn.delph-in.net/erg/trunk/tsdb/gold/mrs/? Diff below. Does anyone remember what happen? Should we update the wiki? The page http://moin.delph-in.net/MatrixMrsTestSuite is now marked as Immutable, but I believe the sentence > Currently, there are test suites for the following languages (included in the [incr tsdb()] software package) is misleading. We don't have a [incr tsdb()] package with profiles. The page http://www.delph-in.net/itsdb/ has a link to http://lingo.stanford.edu/ftp/latest/ but it was not working. I found three profiles from the root of LOGON tree: % find . | grep mrs/item.gz ./lingo/erg/tsdb/gold/mrs/item.gz ./lingo/terg/tsdb/gold/mrs/item.gz ./dfki/gg/tsdb/gold/mrs/item.gz So the profiles are not included in [incr tsdb()], they are maintained with grammars, right? We don?t have profiles with all the languages listed in the wiki, only two. We have also some discussions in https://delphinqa.ling.washington.edu/t/matrix-mrs-test-suite/484. In the forum, I suggested other changes in the wiki: > The wiki is confusing since on the main page we have translations for Japanese and a relevant discussion about the structure of the set only at the bottom of the page. Moreover, in http://moin.delph-in.net/MatrixMrsTestSuiteEn links are all broken. Does anyone know what is this old server that the links like http://cypriot.stanford.edu/~bond/mrs-en060524/11.html point to? How can I help to make them work again? Any comment? Best, % diff matrix-en.sent mrs-en.sent 20c20 < Cats bark. --- > Cats go. 22c22 < Some bark. --- > Some went. 53c53 < Chased dogs bark. --- > Chased dogs go. 55c55 < That the cat chases Browne is old. --- > That the cat chases Browne is obvious. 62,64c62,64 < Browne's barks. < Twenty three dogs bark. < Two hundred twenty dogs bark. --- > Browne's goes. > Twenty three dogs go. > Two hundred twenty dogs go. 79c79 < Abrams promised Browne to bark. --- > Abrams promised Browne to go. 97c97 < The cats found a way to bark. --- > The cats found a way to go. -- Alexandre Rademaker http://arademaker.github.io From oe at ifi.uio.no Sun Sep 27 09:16:17 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Sun, 27 Sep 2020 09:16:17 +0200 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: <1B872B18-BE49-4B14-A893-2AFB0A98C856@gmail.com> Message-ID: hi mike and alexandre, > Regarding conversion to EDS, the LKB goes to greater lengths to give an EDS even for ill-formed MRSs, while PyDelphin tries to avoid grammar-specific solutions. The result is that PyDelphin will, I think, apply predicate-modification more broadly than the LKB, but it is a bit more brittle. 
conceptually, i think all MRSs should be converted to EDSs, and no information that can be expressed in the EDS graph should be lost. more practically: each EP (and each ICONS) in the MRS should introduce a node into the EDS, and each semantic role whose value is associated with a node should yield an edge (additional edges should be introduced for instances of predicate modification). additional or illformed information in the MRS (e.g. invalid handle constraints or roles whose value does not correspond to the label or intrinsic variable of an EP) should be ignored. on this view, conversion to EDS should always succeed. indeed, robustness to (linguistically) illformed MRSs has been a key goal in the original EDS converter that is part of the LKB. i would welcome bug reports, in case you encounter conversion errors. mike. why would you think any of the above might call for grammar-specific solutions? i would like to encourage pyDelphin to embrace the same robustness goals as the LKB-based converter. EDSs were originally invented for practical utility, to make it easier for downstream applications to work with ERG analyses; for that goal, any MRS that comes out of the parser should also yield an EDS, no matter its structure or contents. best wishes, oe From goodman.m.w at gmail.com Sun Sep 27 17:37:32 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Sun, 27 Sep 2020 23:37:32 +0800 Subject: [developers] Valid MRS? Bug in ERG? In-Reply-To: References: <1B872B18-BE49-4B14-A893-2AFB0A98C856@gmail.com> Message-ID: On Sun, Sep 27, 2020 at 3:16 PM Stephan Oepen wrote: > [...] > conceptually, i think all MRSs should be converted to EDSs, and no > information that can be expressed in the EDS graph should be lost. > more practically: each EP (and each ICONS) in the MRS should introduce > a node into the EDS, and each semantic role whose value is associated > with a node should yield an edge (additional edges should be > introduced for instances of predicate modification). additional or > illformed information in the MRS (e.g. invalid handle constraints or > roles whose value does not correspond to the label or intrinsic > variable of an EP) should be ignored. > I agree with this, except the part about ICONS introducing nodes surprised me.. I thought that EDS, like DMRS, is yet to provide a treatment for ICONS. Care to explain? > > [...] > > mike. why would you think any of the above might call for > grammar-specific solutions? i would like to encourage pyDelphin to > embrace the same robustness goals as the LKB-based converter. EDSs > were originally invented for practical utility, to make it easier for > downstream applications to work with ERG analyses; for that goal, any > MRS that comes out of the parser should also yield an EDS, no matter > its structure or contents. First, some clarifications/corrections: The EDS conversion error in PyDelphin is a bug, not expected behavior. Also, the dropped nodes for the disconnected DMRS was not completely described (sorry, Alexandre). The conversion actually creates nodes even for the disconnected EPs, but I was viewing the output of the dmrs-penman codec which is not capable of representing disconnected graphs (the same goes for eds-penman). They are present in other serialization formats. Regarding the grammar-specific solutions, as I understand the LKB's EDS code still maintains lists of ERG (1214?) predicates, roles, etc. for various uses, such as predicate modification. 
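Read as code, the recipe in the quoted paragraph is roughly the following (toy structures and field names for illustration only; real converters additionally handle predicate modification, ICONS, and selection of the top node):

def mrs_to_eds_edges(eps, hcons):
    # eps:   [{'id': intrinsic variable, 'label': handle, 'args': {role: value}}, ...]
    # hcons: {hole: label} for the qeq constraints
    ivs = {ep['id'] for ep in eps}
    by_label = {}
    for ep in eps:
        by_label.setdefault(ep['label'], ep['id'])   # crude: first EP stands for a shared label
    edges = []
    for ep in eps:                                   # every EP has become a node (its 'id')
        for role, value in ep['args'].items():
            value = hcons.get(value, value)          # follow a qeq, if there is one
            if value in ivs:
                target = value
            elif value in by_label:
                target = by_label[value]
            else:
                continue                             # illformed or dangling: ignore, never fail
            edges.append((ep['id'], role, target))
    return edges

The point of the last branch is the robustness goal described above: roles that point nowhere are dropped rather than aborting the conversion.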
In the case of predicate modification, it and PyDelphin suture the unlinked nodes with the ARG1 role, if it was otherwise unused. I was under the impression that the LKB's converter did a bit more surgery in order to normalize some anticipated ill-formed structures. But if I was mistaken and the only other "value-added" part of conversion is the aforementioned top-selection with the other MRS-maladies ignored, then I think we share the same goal and view of robustness. -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Wed Oct 7 19:50:28 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Wed, 7 Oct 2020 14:50:28 -0300 Subject: [developers] Fwd: [DELPH-IN Discourse] [ERG] Top level ERG page is down: http://www.delph-in.net/erg/ References: Message-ID: <125E6F19-E3FF-4EE0-876C-2D796FD68DC9@gmail.com> Hi Stephan and Dan, Just calling your attention to the message below about the http://www.delph-in.net/erg/ . Is this link different from the http://erg.delph-in.net/logon? Github link is easy to change, what should be the ERG official homepage? Best, Alexandre > From: Eric Zinda via DELPH-IN Discourse > Subject: [DELPH-IN Discourse] [ERG] Top level ERG page is down: http://www.delph-in.net/erg/ > Date: 7 October 2020 14:12:59 GMT-3 > To: arademaker at gmail.com > Reply-To: DELPH-IN Discourse > > EricZinda > October 7 > I think the top level ERG page has been down for some time (weeks? Months?): http://www.delph-in.net/erg/ > Linked from: > > lots of top level links if you google ERG > the top level grammar page: http://www.delph-in.net/wiki/index.php/Grammars > the github site: https://github.com/delph-in/erg > others I?m sure > Just thought someone might want to know? > > Visit Topic or reply to this email to respond. > > You are receiving this because you enabled mailing list mode. > > To unsubscribe from these emails, click here . > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oe at ifi.uio.no Thu Oct 8 22:20:02 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Thu, 8 Oct 2020 22:20:02 +0200 Subject: [developers] Fwd: [DELPH-IN Discourse] [ERG] Top level ERG page is down: http://www.delph-in.net/erg/ In-Reply-To: <125E6F19-E3FF-4EE0-876C-2D796FD68DC9@gmail.com> References: <125E6F19-E3FF-4EE0-876C-2D796FD68DC9@gmail.com> Message-ID: > Just calling your attention to the message below about the http://www.delph-in.net/erg/. Is this link different from the http://erg.delph-in.net/logon? Github link is easy to change, what should be the ERG official homepage? thanks for the note, alexandre! yes, 'http://www.delph-in.net/erg/' is the ERG home page, which redirects to 'http://lingo.stanford.edu', which has crashed some months ago and has proven difficult to replace given pandemic-related constraints. dan will have to decide what to do about that page. 'http://erg.delph-in.net/logon' is just the on-line demonstration, which is running in oslo. that service has been subjected to occasional flooding attacks (with tens of thousands of queries on same days), which has caused some challenges in robustness and availability. but i feel committed to keeping that service alive, in principle :-). 
best, oe From arademaker at gmail.com Tue Oct 20 05:42:43 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Tue, 20 Oct 2020 00:42:43 -0300 Subject: [developers] www script in the logon distribution In-Reply-To: References: <5B1D74E8-863C-4547-9C80-0D9B0E41EF88@gmail.com> Message-ID: Hi Stephan, Using only the --erg I got: user at acb050b97030:~/logon$ ./www --erg /home/user/logon/bin/logon: line 94: /home/user/logon/franz/linux.x86.64/alisp: No such file or directory ^C It looks like it is trying to compile the code and looking for the Allegro Lisp interpreter. So I started with $ ./www --binary --debug --erg --port 9080 Everything runs inside my docker image. The docker redirects the 9080 to the host port. But in the host the request to localhost:9080/logon does not work. Even inside the docker image, in another shell, I get $ wget http://localhost:9080/logon --2020-10-20 03:41:17-- http://localhost:9080/logon Resolving localhost (localhost)... 127.0.0.1, ::1 Connecting to localhost (localhost)|127.0.0.1|:9080... failed: Connection refused. Connecting to localhost (localhost)|::1|:9080... failed: Cannot assign requested address. Retrying. After many messages, I see ... [t40005] reading `/home/user/logon/lingo/erg/pet/english.set'... including `/home/user/logon/lingo/erg/pet/common.set'... including `/home/user/logon/lingo/erg/pet/global.set'... including `/home/user/logon/lingo/erg/pet/repp.set'... including `/home/user/logon/lingo/erg/pet/mrs.set'... loading `/home/user/logon/lingo/erg/english.grm' [t40002] (ERG (1214)) reading ME model `/home/user/logon/lingo/erg/redwoods.mem'... [3643349 features] [t40005] (ERG (1214)) reading ME model `/home/user/logon/lingo/erg/redwoods.mem'... [3643349 features] [t40004] (ERG (1214)) reading ME model `/home/user/logon/lingo/erg/redwoods.mem'... [3643349 features] [t40003] (ERG (1214)) reading ME model `/home/user/logon/lingo/erg/redwoods.mem'... [3643349 features] But I don't have the REPL; it looks like this reading of the ME model didn't finish. Using $ ./www --binary --debug --terg --port 9080 I got set-coding-system(): activated UTF8. ; Loading /home/user/logon/lingo/terg/Version.lsp ; Loading /home/user/logon/lingo/terg/lkb/globals.lsp ; Loading /home/user/logon/lingo/terg/lkb/user-fns.lsp ; Loading /home/user/logon/lingo/terg/lkb/checkpaths.lsp ; Loading /home/user/logon/lingo/terg/lkb/patches.lsp Reading in type file fundamentals Reading in type file tmt Reading in type file lextypes Syntax error: . expected and not found in N_-_C_LE at position 226346 Inserting . Error: "" should not be a string Restart actions (select using :continue): 0: retry the load of /home/user/logon/lingo/terg/lkb/script 1: skip loading /home/user/logon/lingo/terg/lkb/script 2: Return to Top Level (an "abort" restart). 3: Abort entirely from this (lisp) process. [changing package from "TSDB" to "LKB"] [1] LKB(7): Any idea? Best, Alexandre > On 6 Aug 2020, at 05:04, Stephan Oepen wrote: > > hi again, alexandre: > >> For some reason, the www script in the logon distribution does not start the webserver. Using the `--debug` option, I don't have any additional information in the log file (actually, the script didn't mention the debug anywhere). I am following all instructions from http://moin.delph-in.net/LogonOnline. In particular, pvmd3 is running without any error in the startup. I don't see any *.pvm file in the /tmp. The script bin/logon starts LKB and the [incr TSDB()] normally.
I have used `?cat` to save a lisp file and load it manually in the ACL REPL, no error too. Any idea? > > i am slowly catching up to DELPH-IN email, with apologies for the long > turn-around! > > is the above still a current problem? is this within your container, > or does it also occur on a 'regular' linux box? > > to debug further, note that the 'www' script sets things up so that > you can interact with the running lisp image once initialization is > complete, i.e. just type into the lisp prompt, e.g. to inspect the > state of AllegroServe. > > when you observe that the web server is not started, does that mean it > does not even bind to its port? when running with the standard > '--erg' option, i would expect the following to work (and return the > dynamically generated top-level page): > > wget http://localhost:8100/logon > > best wishes, oe From gete2 at cam.ac.uk Mon Nov 9 16:12:19 2020 From: gete2 at cam.ac.uk (Guy Emerson) Date: Mon, 9 Nov 2020 15:12:19 +0000 Subject: [developers] Bug in interactive unification Message-ID: Dear all, I found a bug in interactive unification, which I posted about here: https://delphinqa.ling.washington.edu/t/bug-in-interactive-unification/592 The bug is the following: if there is no possible type for a feature path, but that path does not exist in either of the two input feature structures, then interactive unification does not enforce all constraints (i.e. it produces an incorrect result, rather than reporting unification failure). I wasn?t sure where to report this bug. This is admittedly a rare situation (which is probably why it hasn?t been an issue until now). But it happens when recursive computation types lead to a unification failure. I?ve written a small example to illustrate the problem (see attached file). Note that there is no parsing involved here, just compilation of this file and interactive unification. In more positive news, I can report that when there is no failure, the LKB and interactive unification are both robust to extremely recursive type constraints. I implemented the untyped lambda calculus as a type system, and I tested it using the Ackermann function as a lambda expression on Church numerals (the Ackermann function is non-primitive-recursive, so I thought this would be a good test case). With 10,570 re-entrancies (no that is not a typo), it correctly evaluated A(2,1)=5. Best, Guy -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: unification-bug.tdl Type: application/octet-stream Size: 3569 bytes Desc: not available URL: From J.A.Carroll at sussex.ac.uk Tue Nov 10 09:29:51 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Tue, 10 Nov 2020 08:29:51 +0000 Subject: [developers] Bug in interactive unification In-Reply-To: References: Message-ID: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> Dear Guy, Thanks for this example showing the problem. I?ve reproduced it: unification failure at SUCC.RESULT with LKB native graphics, but successful unification with LUI. What gets executed is very different between the two cases. The LKB is content to find the first failure path, whereas for LUI the LKB runs a completely different ?robust? unifier which records all failure paths. I?ve found a bug in the latter which I think accounts for the problem. In debug-unify2 in src/glue/dag.lsp, (nconc %failures% failures) does not get assigned back to %failures% as it should. 
This means that currently a failure in applying a constraint is only recorded if it's not the first unification failure. Hmm... I attach a patch for the LKB (any version) which fixes the problem you observed with LUI interactive unification. I hope it fixes the bug completely, but I haven't tested on other examples. Since it's Lisp code, you can load it by typing the following at the command line in a running LKB: (load "path-to/debug-unify2-patch.lsp") John On 9 Nov 2020, at 15:12, Guy Emerson > wrote: Dear all, I found a bug in interactive unification, which I posted about here: https://delphinqa.ling.washington.edu/t/bug-in-interactive-unification/592 The bug is the following: if there is no possible type for a feature path, but that path does not exist in either of the two input feature structures, then interactive unification does not enforce all constraints (i.e. it produces an incorrect result, rather than reporting unification failure). I wasn?t sure where to report this bug. This is admittedly a rare situation (which is probably why it hasn?t been an issue until now). But it happens when recursive computation types lead to a unification failure. I?ve written a small example to illustrate the problem (see attached file). Note that there is no parsing involved here, just compilation of this file and interactive unification. In more positive news, I can report that when there is no failure, the LKB and interactive unification are both robust to extremely recursive type constraints. I implemented the untyped lambda calculus as a type system, and I tested it using the Ackermann function as a lambda expression on Church numerals (the Ackermann function is non-primitive-recursive, so I thought this would be a good test case). With 10,570 re-entrancies (no that is not a typo), it correctly evaluated A(2,1)=5. Best, Guy -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: debug-unify2-patch.lsp Type: application/octet-stream Size: 3605 bytes Desc: debug-unify2-patch.lsp URL: From gete2 at cam.ac.uk Tue Nov 10 12:50:11 2020 From: gete2 at cam.ac.uk (Guy Emerson) Date: Tue, 10 Nov 2020 11:50:11 +0000 Subject: [developers] Bug in interactive unification In-Reply-To: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> References: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> Message-ID: After loading that file, instead of displaying the incorrect result, LUI now displays nothing. The log file (/tmp/yzlui.debug.ubuntu) says: process_complete_command(): ` avm 1 #D[natnum-with-copy-wrapper NATNUM: #D[natnum-with-copy RESULT: NATNUM]] "natnum-with-copy-wrapper - expanded" ' process_complete_command(): `avm 2 #D[defective-one-wrapper NATNUM: #D[defective-pos SUCC: ZERO]] "defective-one-wrapper - expanded" ' process_complete_command(): `avm 3 #D[natnum-with-copy-wrapper NATNUM: #D[defective-pos-with-copy RESULT: #D[pos SUCC: <0>= DEFECTIVE-NATNUM] SUCC: #D[zero-with-copy RESULT: <0>]]] "Unification Failures (2)" [#U[constraint 1 [[NATNUM SUCC] ZERO-WITH-COPY ZERO ZERO-WITH-COPY -1] #U[type 0 [NATNUM SUCC RESULT] DEFECTIVE-NATNUM ZERO 1]] ' Item in list is not homogeneous (list type 12, item type 3) Path of failure was not a list of symbols (type 12) Item in list is not homogeneous (list type 13, item type 3) YZLUI: Received unknown lkb-protocol top-level command: AVM Am Di., 10. Nov. 
2020 um 08:29 Uhr schrieb John Carroll < J.A.Carroll at sussex.ac.uk>: > Dear Guy, > > Thanks for this example showing the problem. I?ve reproduced it: > unification failure at SUCC.RESULT with LKB native graphics, but successful > unification with LUI. > > What gets executed is very different between the two cases. The LKB is > content to find the first failure path, whereas for LUI the LKB runs a > completely different ?robust? unifier which records all failure paths. I?ve > found a bug in the latter which I think accounts for the problem. > In debug-unify2 in src/glue/dag.lsp, (nconc %failures% failures) does not > get assigned back to %failures% as it should. This means that currently a > failure in applying a constraint is only recorded if it's not the first > unification failure. Hmm... > > I attach a patch for the LKB (any version) which fixes the problem you > observed with LUI interactive unification. I hope it fixes the bug > completely, but I haven't tested on other examples. Since it's Lisp code, > you can load it by typing the following at the command line in a running > LKB: (load "path-to/debug-unify2-patch.lsp") > > John > > On 9 Nov 2020, at 15:12, Guy Emerson wrote: > > Dear all, > > I found a bug in interactive unification, which I posted about here: > https://delphinqa.ling.washington.edu/t/bug-in-interactive-unification/592 > > The bug is the following: if there is no possible type for a feature path, > but that path does not exist in either of the two input feature structures, > then interactive unification does not enforce all constraints (i.e. it > produces an incorrect result, rather than reporting unification failure). > > I wasn?t sure where to report this bug. > > This is admittedly a rare situation (which is probably why it hasn?t been > an issue until now). But it happens when recursive computation types lead > to a unification failure. I?ve written a small example to illustrate the > problem (see attached file). Note that there is no parsing involved here, > just compilation of this file and interactive unification. > > In more positive news, I can report that when there is no failure, the LKB > and interactive unification are both robust to extremely recursive type > constraints. I implemented the untyped lambda calculus as a type system, > and I tested it using the Ackermann function as a lambda expression on > Church numerals (the Ackermann function is non-primitive-recursive, so I > thought this would be a good test case). With 10,570 re-entrancies (no that > is not a typo), it correctly evaluated A(2,1)=5. > > Best, > Guy > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From J.A.Carroll at sussex.ac.uk Tue Nov 10 13:07:27 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Tue, 10 Nov 2020 12:07:27 +0000 Subject: [developers] Bug in interactive unification In-Reply-To: References: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> Message-ID: <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> I noticed that LUI didn't display anything in response to the unification failure, but didn't know why since I don't get a file in /tmp/. I can't see anything else obviously wrong with the LKB code concerned, but I don't know whether it's sending the right thing to LUI since I haven't found any documentation on the LKB-LUI interface. Woodley, can you shed any light on this? 
John On 10 Nov 2020, at 11:50, Guy Emerson > wrote: After loading that file, instead of displaying the incorrect result, LUI now displays nothing. The log file (/tmp/yzlui.debug.ubuntu) says: process_complete_command(): ` avm 1 #D[natnum-with-copy-wrapper NATNUM: #D[natnum-with-copy RESULT: NATNUM]] "natnum-with-copy-wrapper - expanded" ' process_complete_command(): `avm 2 #D[defective-one-wrapper NATNUM: #D[defective-pos SUCC: ZERO]] "defective-one-wrapper - expanded" ' process_complete_command(): `avm 3 #D[natnum-with-copy-wrapper NATNUM: #D[defective-pos-with-copy RESULT: #D[pos SUCC: <0>= DEFECTIVE-NATNUM] SUCC: #D[zero-with-copy RESULT: <0>]]] "Unification Failures (2)" [#U[constraint 1 [[NATNUM SUCC] ZERO-WITH-COPY ZERO ZERO-WITH-COPY -1] #U[type 0 [NATNUM SUCC RESULT] DEFECTIVE-NATNUM ZERO 1]] ' Item in list is not homogeneous (list type 12, item type 3) Path of failure was not a list of symbols (type 12) Item in list is not homogeneous (list type 13, item type 3) YZLUI: Received unknown lkb-protocol top-level command: AVM Am Di., 10. Nov. 2020 um 08:29 Uhr schrieb John Carroll >: Dear Guy, Thanks for this example showing the problem. I?ve reproduced it: unification failure at SUCC.RESULT with LKB native graphics, but successful unification with LUI. What gets executed is very different between the two cases. The LKB is content to find the first failure path, whereas for LUI the LKB runs a completely different ?robust? unifier which records all failure paths. I?ve found a bug in the latter which I think accounts for the problem. In debug-unify2 in src/glue/dag.lsp, (nconc %failures% failures) does not get assigned back to %failures% as it should. This means that currently a failure in applying a constraint is only recorded if it's not the first unification failure. Hmm... I attach a patch for the LKB (any version) which fixes the problem you observed with LUI interactive unification. I hope it fixes the bug completely, but I haven't tested on other examples. Since it's Lisp code, you can load it by typing the following at the command line in a running LKB: (load "path-to/debug-unify2-patch.lsp") John On 9 Nov 2020, at 15:12, Guy Emerson > wrote: Dear all, I found a bug in interactive unification, which I posted about here: https://delphinqa.ling.washington.edu/t/bug-in-interactive-unification/592 The bug is the following: if there is no possible type for a feature path, but that path does not exist in either of the two input feature structures, then interactive unification does not enforce all constraints (i.e. it produces an incorrect result, rather than reporting unification failure). I wasn?t sure where to report this bug. This is admittedly a rare situation (which is probably why it hasn?t been an issue until now). But it happens when recursive computation types lead to a unification failure. I?ve written a small example to illustrate the problem (see attached file). Note that there is no parsing involved here, just compilation of this file and interactive unification. In more positive news, I can report that when there is no failure, the LKB and interactive unification are both robust to extremely recursive type constraints. I implemented the untyped lambda calculus as a type system, and I tested it using the Ackermann function as a lambda expression on Church numerals (the Ackermann function is non-primitive-recursive, so I thought this would be a good test case). With 10,570 re-entrancies (no that is not a typo), it correctly evaluated A(2,1)=5. 
Best, Guy -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Wed Nov 11 00:32:17 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Tue, 10 Nov 2020 20:32:17 -0300 Subject: [developers] Bug report for ERG In-Reply-To: References: Message-ID: BTW, regardless the tokenisation issue, an invalid MRS should not be produced, right? Best, Alexandre > On 10 Nov 2020, at 18:39, Alexandre Rademaker wrote: > > Hi, > > I am trying to parse the sentences from EWT corpus (https://github.com/universaldependencies/UD_English-EWT) but in the DEV set I have a non-sense sentence with only an url between brackets: > > [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] > > ACE reports an invalid MRS. The error is in the character 2666, so probably the error is the predicate: > > _search_x.htm?csp=34/NN_u_unknown > > But the regex for predicates seems to support dot in the name of the predicate: > > http://moin.delph-in.net/MrsRfc#SerializationFormats > > Anyway, the pre-processing of the sentence seems wrong to me in ERG trunk version, the tokenisation broke the url into many tokens and consumed the protocol `http://` prefix: > > % ace -g ~/hpsg/wn/terg-mac.dat -E > [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] > www.usatoday. com / tech/ science / space/ 2005 ? 03 ? 09 - nasa - search_x.htm?csp=34 > > ERG (2018) produced what I was expecting: > > % ace -g erg-mac.dat -E > [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] > www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 > > ERG (1214) produced what I was expecting: > > % ace -g erg-lingo-mac.dat -E > [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] > [ http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 ] > > >>>> response = ace.parse(grm, '[http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34]') > NOTE: hit RAM limit while unpacking > NOTE: parsed 1 / 1 sentences, avg 1536033k, time 51.15306s > >>>> response.result(0).mrs() > Traceback (most recent call last): > File "", line 1, in > File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/interface.py", line 146, in mrs > mrs = simplemrs.decode(mrs) > File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 112, in decode > return _decode_mrs(lexer) > File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 200, in _decode_mrs > rels.append(_decode_rel(lexer, variables)) > File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 252, in _decode_rel > _, label = lexer.expect((FEATURE, 'LBL'), (SYMBOL, None)) > File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/util.py", line 473, in expect > raise self._errcls('expected: ' + err, > delphin.mrs._exceptions.MRSSyntaxError: > line 1, character 2666 > [ LTOP: h0 INDEX: e2 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] RELS: < [ implicit_conj<8:79> LBL: h1 ARG0: e2 ARG1: e4 [ e SF: prop TENSE: tensed MOOD: indicative ] ARG2: e5 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ] [ unknown<8:21> LBL: h1 ARG0: e4 ARG: u6 ] [ _www.usatoday./JJ_u_unknown<8:21> LBL: h1 ARG0: e7 [ e SF: prop ] ARG1: u6 ] [ implicit_conj<21:79> LBL: h1 ARG0: e5 ARG1: e8 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e9 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ] [ unknown<21:49> LBL: h1 
ARG0: e8 ARG: x10 ] [ udef_q<21:49> LBL: h11 ARG0: x10 RSTR: h12 BODY: h13 ] [ udef_q<21:24> LBL: h14 ARG0: x15 [ x PERS: 3 NUM: sg ] RSTR: h16 BODY: h17 ] [ _com/NN_u_unknown<21:24> LBL: h18 ARG0: x15 ] [ _and_c<24:25> LBL: h19 ARG0: x10 ARG1: x15 ARG2: x20 ] [ udef_q<25:49> LBL: h21 ARG0: x20 RSTR: h22 BODY: h23 ] [ udef_q<25:37> LBL: h24 ARG0: x25 [ x PERS: 3 NUM: sg ] RSTR: h26 BODY: h27 ] [ _tech//JJ_u_unknown<25:30> LBL: h28 ARG0: e29 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ARG1: x25 ] [ _science_n_1<30:37> LBL: h28 ARG0: x25 ] [ _and_c<37:38> LBL: h30 ARG0: x20 ARG1: x25 ARG2: x31 [ x PERS: 3 NUM: sg IND: + ] ] [ proper_q<38:49> LBL: h32 ARG0: x31 RSTR: h33 BODY: h34 ] [ compound<38:49> LBL: h35 ARG0: e36 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x31 ARG2: x37 [ x PT: pt ] ] [ udef_q<38:44> LBL: h38 ARG0: x37 RSTR: h39 BODY: h40 ] [ _space//NN_u_unknown<38:44> LBL: h41 ARG0: x37 ] [ yofc<44:48> LBL: h35 CARG: "2005" ARG0: x31 ] [ implicit_conj<49:79> LBL: h1 ARG0: e9 ARG1: e43 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e44 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ] [ unknown<49:52> LBL: h1 ARG0: e43 ARG: x45 [ x PERS: 3 NUM: sg IND: + ] ] [ proper_q<49:52> LBL: h46 ARG0: x45 RSTR: h47 BODY: h48 ] [ yofc<49:51> LBL: h49 CARG: "03" ARG0: x45 ] [ implicit_conj<52:79> LBL: h1 ARG0: e44 ARG1: e51 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e52 [ e SF: prop-or-ques ] ] [ unknown<52:55> LBL: h1 ARG0: e51 ARG: x53 [ x PERS: 3 NUM: sg IND: + ] ] [ proper_q<52:55> LBL: h54 ARG0: x53 RSTR: h55 BODY: h56 ] [ yofc<52:54> LBL: h57 CARG: "09" ARG0: x53 ] [ unknown<55:79> LBL: h1 ARG0: e52 ARG: x59 [ x PERS: 3 NUM: sg ] ] [ udef_q<55:79> LBL: h60 ARG0: x59 RSTR: h61 BODY: h62 ] [ compound<55:79> LBL: h63 ARG0: e64 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x59 ARG2: x65 [ x PERS: 3 NUM: sg IND: + PT: pt ] ] [ proper_q<55:60> LBL: h66 ARG0: x65 RSTR: h67 BODY: h68 ] [ named<55:59> LBL: h69 CARG: "NASA" ARG0: x65 ] [ _search_x.htm?csp=34/NN_u_unknown<60:79> LBL: h63 ARG0: x59 ] > HCONS: < h0 qeq h1 h12 qeq h19 h16 qeq h18 h22 qeq h30 h26 qeq h28 h33 qeq h35 h39 qeq h41 h47 qeq h49 h55 qeq h57 h61 qeq h63 h67 qeq h69 > ICONS: < > ] > ^ > MRSSyntaxError: expected: a feature > > > Best, > Alexandre > From arademaker at gmail.com Tue Nov 10 22:39:32 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Tue, 10 Nov 2020 18:39:32 -0300 Subject: [developers] Bug report for ERG Message-ID: Hi, I am trying to parse the sentences from EWT corpus (https://github.com/universaldependencies/UD_English-EWT) but in the DEV set I have a non-sense sentence with only an url between brackets: [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] ACE reports an invalid MRS. The error is in the character 2666, so probably the error is the predicate: _search_x.htm?csp=34/NN_u_unknown But the regex for predicates seems to support dot in the name of the predicate: http://moin.delph-in.net/MrsRfc#SerializationFormats Anyway, the pre-processing of the sentence seems wrong to me in ERG trunk version, the tokenisation broke the url into many tokens and consumed the protocol `http://` prefix: % ace -g ~/hpsg/wn/terg-mac.dat -E [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] www.usatoday. com / tech/ science / space/ 2005 ? 03 ? 
09 - nasa - search_x.htm?csp=34 ERG (2018) produced what I was expecting: % ace -g erg-mac.dat -E [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 ERG (1214) produced what I was expecting: % ace -g erg-lingo-mac.dat -E [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] [ http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 ] >>> response = ace.parse(grm, '[http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34]') NOTE: hit RAM limit while unpacking NOTE: parsed 1 / 1 sentences, avg 1536033k, time 51.15306s >>> response.result(0).mrs() Traceback (most recent call last): File "", line 1, in File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/interface.py", line 146, in mrs mrs = simplemrs.decode(mrs) File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 112, in decode return _decode_mrs(lexer) File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 200, in _decode_mrs rels.append(_decode_rel(lexer, variables)) File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 252, in _decode_rel _, label = lexer.expect((FEATURE, 'LBL'), (SYMBOL, None)) File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/util.py", line 473, in expect raise self._errcls('expected: ' + err, delphin.mrs._exceptions.MRSSyntaxError: line 1, character 2666 [ LTOP: h0 INDEX: e2 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] RELS: < [ implicit_conj<8:79> LBL: h1 ARG0: e2 ARG1: e4 [ e SF: prop TENSE: tensed MOOD: indicative ] ARG2: e5 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ] [ unknown<8:21> LBL: h1 ARG0: e4 ARG: u6 ] [ _www.usatoday./JJ_u_unknown<8:21> LBL: h1 ARG0: e7 [ e SF: prop ] ARG1: u6 ] [ implicit_conj<21:79> LBL: h1 ARG0: e5 ARG1: e8 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e9 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ] [ unknown<21:49> LBL: h1 ARG0: e8 ARG: x10 ] [ udef_q<21:49> LBL: h11 ARG0: x10 RSTR: h12 BODY: h13 ] [ udef_q<21:24> LBL: h14 ARG0: x15 [ x PERS: 3 NUM: sg ] RSTR: h16 BODY: h17 ] [ _com/NN_u_unknown<21:24> LBL: h18 ARG0: x15 ] [ _and_c<24:25> LBL: h19 ARG0: x10 ARG1: x15 ARG2: x20 ] [ udef_q<25:49> LBL: h21 ARG0: x20 RSTR: h22 BODY: h23 ] [ udef_q<25:37> LBL: h24 ARG0: x25 [ x PERS: 3 NUM: sg ] RSTR: h26 BODY: h27 ] [ _tech//JJ_u_unknown<25:30> LBL: h28 ARG0: e29 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ARG1: x25 ] [ _science_n_1<30:37> LBL: h28 ARG0: x25 ] [ _and_c<37:38> LBL: h30 ARG0: x20 ARG1: x25 ARG2: x31 [ x PERS: 3 NUM: sg IND: + ] ] [ proper_q<38:49> LBL: h32 ARG0: x31 RSTR: h33 BODY: h34 ] [ compound<38:49> LBL: h35 ARG0: e36 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x31 ARG2: x37 [ x PT: pt ] ] [ udef_q<38:44> LBL: h38 ARG0: x37 RSTR: h39 BODY: h40 ] [ _space//NN_u_unknown<38:44> LBL: h41 ARG0: x37 ] [ yofc<44:48> LBL: h35 CARG: "2005" ARG0: x31 ] [ implicit_conj<49:79> LBL: h1 ARG0: e9 ARG1: e43 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e44 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ] [ unknown<49:52> LBL: h1 ARG0: e43 ARG: x45 [ x PERS: 3 NUM: sg IND: + ] ] [ proper_q<49:52> LBL: h46 ARG0: x45 RSTR: h47 BODY: h48 ] [ yofc<49:51> LBL: h49 CARG: "03" ARG0: x45 ] [ implicit_conj<52:79> LBL: h1 ARG0: e44 ARG1: e51 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e52 [ e 
SF: prop-or-ques ] ] [ unknown<52:55> LBL: h1 ARG0: e51 ARG: x53 [ x PERS: 3 NUM: sg IND: + ] ] [ proper_q<52:55> LBL: h54 ARG0: x53 RSTR: h55 BODY: h56 ] [ yofc<52:54> LBL: h57 CARG: "09" ARG0: x53 ] [ unknown<55:79> LBL: h1 ARG0: e52 ARG: x59 [ x PERS: 3 NUM: sg ] ] [ udef_q<55:79> LBL: h60 ARG0: x59 RSTR: h61 BODY: h62 ] [ compound<55:79> LBL: h63 ARG0: e64 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x59 ARG2: x65 [ x PERS: 3 NUM: sg IND: + PT: pt ] ] [ proper_q<55:60> LBL: h66 ARG0: x65 RSTR: h67 BODY: h68 ] [ named<55:59> LBL: h69 CARG: "NASA" ARG0: x65 ] [ _search_x.htm?csp=34/NN_u_unknown<60:79> LBL: h63 ARG0: x59 ] > HCONS: < h0 qeq h1 h12 qeq h19 h16 qeq h18 h22 qeq h30 h26 qeq h28 h33 qeq h35 h39 qeq h41 h47 qeq h49 h55 qeq h57 h61 qeq h63 h67 qeq h69 > ICONS: < > ] ^ MRSSyntaxError: expected: a feature Best, Alexandre From gete2 at cam.ac.uk Tue Nov 10 17:59:58 2020 From: gete2 at cam.ac.uk (Guy Emerson) Date: Tue, 10 Nov 2020 16:59:58 +0000 Subject: [developers] Bug in interactive unification In-Reply-To: <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> References: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> Message-ID: I also get this behaviour (no LUI window appearing) if I try to display a type that has no features (obviously, this isn't a useful display on its own, but it would be helpful for interactive unification to be able to drag and drop the type). The log file says: YZLUI: Received unknown lkb-protocol top-level command: AVM Am Di., 10. Nov. 2020 um 12:07 Uhr schrieb John Carroll < J.A.Carroll at sussex.ac.uk>: > I noticed that LUI didn't display anything in response to the unification > failure, but didn't know why since I don't get a file in /tmp/. > > I can't see anything else obviously wrong with the LKB code concerned, but > I don't know whether it's sending the right thing to LUI since I haven't > found any documentation on the LKB-LUI interface. > > Woodley, can you shed any light on this? > > John > > On 10 Nov 2020, at 11:50, Guy Emerson wrote: > > After loading that file, instead of displaying the incorrect result, LUI > now displays nothing. The log file (/tmp/yzlui.debug.ubuntu) says: > > process_complete_command(): ` > avm 1 #D[natnum-with-copy-wrapper NATNUM: #D[natnum-with-copy RESULT: > NATNUM]] "natnum-with-copy-wrapper - expanded" > ' > > process_complete_command(): `avm 2 #D[defective-one-wrapper NATNUM: > #D[defective-pos SUCC: ZERO]] "defective-one-wrapper - expanded" > ' > > process_complete_command(): `avm 3 #D[natnum-with-copy-wrapper NATNUM: > #D[defective-pos-with-copy RESULT: #D[pos SUCC: <0>= DEFECTIVE-NATNUM] > SUCC: #D[zero-with-copy RESULT: <0>]]] "Unification Failures (2)" > [#U[constraint 1 [[NATNUM SUCC] ZERO-WITH-COPY ZERO ZERO-WITH-COPY -1] > #U[type 0 [NATNUM SUCC RESULT] DEFECTIVE-NATNUM ZERO 1]] ' > > Item in list is not homogeneous (list type 12, item type 3) > Path of failure was not a list of symbols (type 12) > Item in list is not homogeneous (list type 13, item type 3) > YZLUI: Received unknown lkb-protocol top-level command: AVM > > > Am Di., 10. Nov. 2020 um 08:29 Uhr schrieb John Carroll < > J.A.Carroll at sussex.ac.uk>: > >> Dear Guy, >> >> Thanks for this example showing the problem. I?ve reproduced it: >> unification failure at SUCC.RESULT with LKB native graphics, but successful >> unification with LUI. >> >> What gets executed is very different between the two cases. 
The LKB is >> content to find the first failure path, whereas for LUI the LKB runs a >> completely different ?robust? unifier which records all failure paths. I?ve >> found a bug in the latter which I think accounts for the problem. >> In debug-unify2 in src/glue/dag.lsp, (nconc %failures% failures) does not >> get assigned back to %failures% as it should. This means that currently a >> failure in applying a constraint is only recorded if it's not the first >> unification failure. Hmm... >> >> I attach a patch for the LKB (any version) which fixes the problem you >> observed with LUI interactive unification. I hope it fixes the bug >> completely, but I haven't tested on other examples. Since it's Lisp code, >> you can load it by typing the following at the command line in a running >> LKB: (load "path-to/debug-unify2-patch.lsp") >> >> John >> >> On 9 Nov 2020, at 15:12, Guy Emerson wrote: >> >> Dear all, >> >> I found a bug in interactive unification, which I posted about here: >> https://delphinqa.ling.washington.edu/t/bug-in-interactive-unification/592 >> >> The bug is the following: if there is no possible type for a feature >> path, but that path does not exist in either of the two input feature >> structures, then interactive unification does not enforce all constraints >> (i.e. it produces an incorrect result, rather than reporting unification >> failure). >> >> I wasn?t sure where to report this bug. >> >> This is admittedly a rare situation (which is probably why it hasn?t been >> an issue until now). But it happens when recursive computation types lead >> to a unification failure. I?ve written a small example to illustrate the >> problem (see attached file). Note that there is no parsing involved here, >> just compilation of this file and interactive unification. >> >> In more positive news, I can report that when there is no failure, the >> LKB and interactive unification are both robust to extremely recursive type >> constraints. I implemented the untyped lambda calculus as a type system, >> and I tested it using the Ackermann function as a lambda expression on >> Church numerals (the Ackermann function is non-primitive-recursive, so I >> thought this would be a good test case). With 10,570 re-entrancies (no that >> is not a typo), it correctly evaluated A(2,1)=5. >> >> Best, >> Guy >> >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gete2 at cam.ac.uk Tue Nov 10 21:04:38 2020 From: gete2 at cam.ac.uk (Guy Emerson) Date: Tue, 10 Nov 2020 20:04:38 +0000 Subject: [developers] Bug in interactive unification In-Reply-To: References: <8F5D0001-F235-42E1-8CB6-4FCBAC4D0D57@sussex.ac.uk> Message-ID: Thanks, John, with that patch I can also see a result, showing that zero and defective-natnum fail to unify, at the right place! I find the display a little counter-intuitive, because it gives a different result depending on which direction I do the unification. But that might be a matter of taste. It displays the failure and I can now use LKB+LUI to debug my code! 
For completeness, here is the log file for the two unifications: process_complete_command(): `avm 20 #D[defective-one-wrapper NATNUM: #D[defective-pos-with-copy RESULT: #D[pos SUCC: DEFECTIVE-NATNUM] SUCC: #D[zero-with-copy RESULT: ZERO]]] "Unification Failures (2)" [#U[constraint 8 [NATNUM] DEFECTIVE-POS-WITH-COPY NATNUM-WITH-COPY DEFECTIVE-POS-WITH-COPY -1] #U[type 7 [NATNUM SUCC RESULT] ZERO DEFECTIVE-NATNUM 8]] ' process_complete_command(): `avm 9 #D[natnum-with-copy-wrapper NATNUM: #D[defective-pos-with-copy RESULT: #D[pos SUCC: <0>= DEFECTIVE-NATNUM] SUCC: #D[zero-with-copy RESULT: <0>]]] "Unification Failures (2)" [#U[constraint 10 [NATNUM SUCC] ZERO-WITH-COPY ZERO ZERO-WITH-COPY -1] #U[type 9 [NATNUM SUCC RESULT] DEFECTIVE-NATNUM ZERO 10]] ' a_tag_path: [0]->(null) Woodley, here's the command for a type with no features, along with the error again: process_complete_command(): `avm 1 ZERO "zero - expanded" ' YZLUI: Received unknown lkb-protocol top-level command: AVM Am Di., 10. Nov. 2020 um 17:51 Uhr schrieb Woodley Packard < sweaglesw at sweaglesw.org>: > Good sleuthing work, gentlemen. > > The name of the first constraint sounds like it?s being truncated somehow, > as I would hypothesize that ?vi? is part of ?violation?? Is that > consistent with what is displayed? > > John, it does look like for whatever reason I did not set maclui up to > open a log file ? an unfortunate oversight. I will fix that. > > Guy, when displaying the atomic AVM and getting no window, is there a > corresponding ?process_complete_command? line in the log? > > Regards, Woodley > > On Nov 10, 2020, at 9:30 AM, John Carroll > wrote: > > ? > Aha, the constraint object in your LUI log has unbalanced brackets. > Guessing which bracket is wrong, I've changed another LKB robust unifier > function, and attach a new version of the file debug-unify2-patch.lsp > > With this new patch file, LUI now displays a "Unification Failures" window > with 2 failures: "GLB Type Constraint Vi" and "No GLB Exists". Are these > correct? > > John > > > On 10 Nov 2020, at 12:07, John Carroll wrote: > > I noticed that LUI didn't display anything in response to the unification > failure, but didn't know why since I don't get a file in /tmp/. > > I can't see anything else obviously wrong with the LKB code concerned, but > I don't know whether it's sending the right thing to LUI since I haven't > found any documentation on the LKB-LUI interface. > > Woodley, can you shed any light on this? > > John > > On 10 Nov 2020, at 11:50, Guy Emerson wrote: > > After loading that file, instead of displaying the incorrect result, LUI > now displays nothing. 
The log file (/tmp/yzlui.debug.ubuntu) says: > > process_complete_command(): ` > avm 1 #D[natnum-with-copy-wrapper NATNUM: #D[natnum-with-copy RESULT: > NATNUM]] "natnum-with-copy-wrapper - expanded" > ' > > process_complete_command(): `avm 2 #D[defective-one-wrapper NATNUM: > #D[defective-pos SUCC: ZERO]] "defective-one-wrapper - expanded" > ' > > process_complete_command(): `avm 3 #D[natnum-with-copy-wrapper NATNUM: > #D[defective-pos-with-copy RESULT: #D[pos SUCC: <0>= DEFECTIVE-NATNUM] > SUCC: #D[zero-with-copy RESULT: <0>]]] "Unification Failures (2)" > [#U[constraint 1 [[NATNUM SUCC] ZERO-WITH-COPY ZERO ZERO-WITH-COPY -1] > #U[type 0 [NATNUM SUCC RESULT] DEFECTIVE-NATNUM ZERO 1]] ' > > Item in list is not homogeneous (list type 12, item type 3) > Path of failure was not a list of symbols (type 12) > Item in list is not homogeneous (list type 13, item type 3) > YZLUI: Received unknown lkb-protocol top-level command: AVM > > > Am Di., 10. Nov. 2020 um 08:29 Uhr schrieb John Carroll < > J.A.Carroll at sussex.ac.uk>: > > Dear Guy, > > Thanks for this example showing the problem. I?ve reproduced it: > unification failure at SUCC.RESULT with LKB native graphics, but successful > unification with LUI. > > What gets executed is very different between the two cases. The LKB is > content to find the first failure path, whereas for LUI the LKB runs a > completely different ?robust? unifier which records all failure paths. I?ve > found a bug in the latter which I think accounts for the problem. > In debug-unify2 in src/glue/dag.lsp, (nconc %failures% failures) does not > get assigned back to %failures% as it should. This means that currently a > failure in applying a constraint is only recorded if it's not the first > unification failure. Hmm... > > I attach a patch for the LKB (any version) which fixes the problem you > observed with LUI interactive unification. I hope it fixes the bug > completely, but I haven't tested on other examples. Since it's Lisp code, > you can load it by typing the following at the command line in a running > LKB: (load "path-to/debug-unify2-patch.lsp") > > John > > On 9 Nov 2020, at 15:12, Guy Emerson wrote: > > Dear all, > > I found a bug in interactive unification, which I posted about here: > https://delphinqa.ling.washington.edu/t/bug-in-interactive-unification/592 > > The bug is the following: if there is no possible type for a feature path, > but that path does not exist in either of the two input feature structures, > then interactive unification does not enforce all constraints (i.e. it > produces an incorrect result, rather than reporting unification failure). > > I wasn?t sure where to report this bug. > > This is admittedly a rare situation (which is probably why it hasn?t been > an issue until now). But it happens when recursive computation types lead > to a unification failure. I?ve written a small example to illustrate the > problem (see attached file). Note that there is no parsing involved here, > just compilation of this file and interactive unification. > > In more positive news, I can report that when there is no failure, the LKB > and interactive unification are both robust to extremely recursive type > constraints. I implemented the untyped lambda calculus as a type system, > and I tested it using the Ackermann function as a lambda expression on > Church numerals (the Ackermann function is non-primitive-recursive, so I > thought this would be a good test case). With 10,570 re-entrancies (no that > is not a typo), it correctly evaluated A(2,1)=5. 
> > Best, > Guy > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sweaglesw at sweaglesw.org Tue Nov 10 18:51:32 2020 From: sweaglesw at sweaglesw.org (Woodley Packard) Date: Tue, 10 Nov 2020 09:51:32 -0800 Subject: [developers] Bug in interactive unification In-Reply-To: <8F5D0001-F235-42E1-8CB6-4FCBAC4D0D57@sussex.ac.uk> References: <8F5D0001-F235-42E1-8CB6-4FCBAC4D0D57@sussex.ac.uk> Message-ID: Good sleuthing work, gentlemen. The name of the first constraint sounds like it?s being truncated somehow, as I would hypothesize that ?vi? is part of ?violation?? Is that consistent with what is displayed? John, it does look like for whatever reason I did not set maclui up to open a log file ? an unfortunate oversight. I will fix that. Guy, when displaying the atomic AVM and getting no window, is there a corresponding ?process_complete_command? line in the log? Regards, Woodley > On Nov 10, 2020, at 9:30 AM, John Carroll wrote: > > ? > Aha, the constraint object in your LUI log has unbalanced brackets. Guessing which bracket is wrong, I've changed another LKB robust unifier function, and attach a new version of the file debug-unify2-patch.lsp > > With this new patch file, LUI now displays a "Unification Failures" window with 2 failures: "GLB Type Constraint Vi" and "No GLB Exists". Are these correct? > > John > > >>> On 10 Nov 2020, at 12:07, John Carroll wrote: >>> >>> I noticed that LUI didn't display anything in response to the unification failure, but didn't know why since I don't get a file in /tmp/. >>> >>> I can't see anything else obviously wrong with the LKB code concerned, but I don't know whether it's sending the right thing to LUI since I haven't found any documentation on the LKB-LUI interface. >>> >>> Woodley, can you shed any light on this? >>> >>> John >>> >>> On 10 Nov 2020, at 11:50, Guy Emerson wrote: >>> >>> After loading that file, instead of displaying the incorrect result, LUI now displays nothing. The log file (/tmp/yzlui.debug.ubuntu) says: >>> >>> process_complete_command(): ` >>> avm 1 #D[natnum-with-copy-wrapper NATNUM: #D[natnum-with-copy RESULT: NATNUM]] "natnum-with-copy-wrapper - expanded" >>> ' >>> >>> process_complete_command(): `avm 2 #D[defective-one-wrapper NATNUM: #D[defective-pos SUCC: ZERO]] "defective-one-wrapper - expanded" >>> ' >>> >>> process_complete_command(): `avm 3 #D[natnum-with-copy-wrapper NATNUM: #D[defective-pos-with-copy RESULT: #D[pos SUCC: <0>= DEFECTIVE-NATNUM] SUCC: #D[zero-with-copy RESULT: <0>]]] "Unification Failures (2)" >>> [#U[constraint 1 [[NATNUM SUCC] ZERO-WITH-COPY ZERO ZERO-WITH-COPY -1] #U[type 0 [NATNUM SUCC RESULT] DEFECTIVE-NATNUM ZERO 1]] ' >>> >>> Item in list is not homogeneous (list type 12, item type 3) >>> Path of failure was not a list of symbols (type 12) >>> Item in list is not homogeneous (list type 13, item type 3) >>> YZLUI: Received unknown lkb-protocol top-level command: AVM >>> >>> >>> Am Di., 10. Nov. 2020 um 08:29 Uhr schrieb John Carroll : >>> Dear Guy, >>> >>> Thanks for this example showing the problem. I?ve reproduced it: unification failure at SUCC.RESULT with LKB native graphics, but successful unification with LUI. >>> >>> What gets executed is very different between the two cases. The LKB is content to find the first failure path, whereas for LUI the LKB runs a completely different ?robust? unifier which records all failure paths. I?ve found a bug in the latter which I think accounts for the problem. 
In debug-unify2 in src/glue/dag.lsp, (nconc %failures% failures) does not get assigned back to %failures% as it should. This means that currently a failure in applying a constraint is only recorded if it's not the first unification failure. Hmm... >>> >>> I attach a patch for the LKB (any version) which fixes the problem you observed with LUI interactive unification. I hope it fixes the bug completely, but I haven't tested on other examples. Since it's Lisp code, you can load it by typing the following at the command line in a running LKB: (load "path-to/debug-unify2-patch.lsp") >>> >>> John >>> >>>> On 9 Nov 2020, at 15:12, Guy Emerson wrote: >>>> >>>> Dear all, >>>> >>>> I found a bug in interactive unification, which I posted about here: https://delphinqa.ling.washington.edu/t/bug-in-interactive-unification/592 >>>> >>>> The bug is the following: if there is no possible type for a feature path, but that path does not exist in either of the two input feature structures, then interactive unification does not enforce all constraints (i.e. it produces an incorrect result, rather than reporting unification failure). >>>> >>>> I wasn?t sure where to report this bug. >>>> >>>> This is admittedly a rare situation (which is probably why it hasn?t been an issue until now). But it happens when recursive computation types lead to a unification failure. I?ve written a small example to illustrate the problem (see attached file). Note that there is no parsing involved here, just compilation of this file and interactive unification. >>>> >>>> In more positive news, I can report that when there is no failure, the LKB and interactive unification are both robust to extremely recursive type constraints. I implemented the untyped lambda calculus as a type system, and I tested it using the Ackermann function as a lambda expression on Church numerals (the Ackermann function is non-primitive-recursive, so I thought this would be a good test case). With 10,570 re-entrancies (no that is not a typo), it correctly evaluated A(2,1)=5. >>>> >>>> Best, >>>> Guy >>>> >>> >>> >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From J.A.Carroll at sussex.ac.uk Tue Nov 10 18:30:34 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Tue, 10 Nov 2020 17:30:34 +0000 Subject: [developers] Bug in interactive unification In-Reply-To: <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> References: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> Message-ID: <8F5D0001-F235-42E1-8CB6-4FCBAC4D0D57@sussex.ac.uk> Aha, the constraint object in your LUI log has unbalanced brackets. Guessing which bracket is wrong, I've changed another LKB robust unifier function, and attach a new version of the file debug-unify2-patch.lsp With this new patch file, LUI now displays a "Unification Failures" window with 2 failures: "GLB Type Constraint Vi" and "No GLB Exists". Are these correct? John On 10 Nov 2020, at 12:07, John Carroll > wrote: I noticed that LUI didn't display anything in response to the unification failure, but didn't know why since I don't get a file in /tmp/. I can't see anything else obviously wrong with the LKB code concerned, but I don't know whether it's sending the right thing to LUI since I haven't found any documentation on the LKB-LUI interface. Woodley, can you shed any light on this? 
John On 10 Nov 2020, at 11:50, Guy Emerson > wrote: After loading that file, instead of displaying the incorrect result, LUI now displays nothing. The log file (/tmp/yzlui.debug.ubuntu) says: process_complete_command(): ` avm 1 #D[natnum-with-copy-wrapper NATNUM: #D[natnum-with-copy RESULT: NATNUM]] "natnum-with-copy-wrapper - expanded" ' process_complete_command(): `avm 2 #D[defective-one-wrapper NATNUM: #D[defective-pos SUCC: ZERO]] "defective-one-wrapper - expanded" ' process_complete_command(): `avm 3 #D[natnum-with-copy-wrapper NATNUM: #D[defective-pos-with-copy RESULT: #D[pos SUCC: <0>= DEFECTIVE-NATNUM] SUCC: #D[zero-with-copy RESULT: <0>]]] "Unification Failures (2)" [#U[constraint 1 [[NATNUM SUCC] ZERO-WITH-COPY ZERO ZERO-WITH-COPY -1] #U[type 0 [NATNUM SUCC RESULT] DEFECTIVE-NATNUM ZERO 1]] ' Item in list is not homogeneous (list type 12, item type 3) Path of failure was not a list of symbols (type 12) Item in list is not homogeneous (list type 13, item type 3) YZLUI: Received unknown lkb-protocol top-level command: AVM Am Di., 10. Nov. 2020 um 08:29 Uhr schrieb John Carroll >: Dear Guy, Thanks for this example showing the problem. I?ve reproduced it: unification failure at SUCC.RESULT with LKB native graphics, but successful unification with LUI. What gets executed is very different between the two cases. The LKB is content to find the first failure path, whereas for LUI the LKB runs a completely different ?robust? unifier which records all failure paths. I?ve found a bug in the latter which I think accounts for the problem. In debug-unify2 in src/glue/dag.lsp, (nconc %failures% failures) does not get assigned back to %failures% as it should. This means that currently a failure in applying a constraint is only recorded if it's not the first unification failure. Hmm... I attach a patch for the LKB (any version) which fixes the problem you observed with LUI interactive unification. I hope it fixes the bug completely, but I haven't tested on other examples. Since it's Lisp code, you can load it by typing the following at the command line in a running LKB: (load "path-to/debug-unify2-patch.lsp") John On 9 Nov 2020, at 15:12, Guy Emerson > wrote: Dear all, I found a bug in interactive unification, which I posted about here: https://delphinqa.ling.washington.edu/t/bug-in-interactive-unification/592 The bug is the following: if there is no possible type for a feature path, but that path does not exist in either of the two input feature structures, then interactive unification does not enforce all constraints (i.e. it produces an incorrect result, rather than reporting unification failure). I wasn?t sure where to report this bug. This is admittedly a rare situation (which is probably why it hasn?t been an issue until now). But it happens when recursive computation types lead to a unification failure. I?ve written a small example to illustrate the problem (see attached file). Note that there is no parsing involved here, just compilation of this file and interactive unification. In more positive news, I can report that when there is no failure, the LKB and interactive unification are both robust to extremely recursive type constraints. I implemented the untyped lambda calculus as a type system, and I tested it using the Ackermann function as a lambda expression on Church numerals (the Ackermann function is non-primitive-recursive, so I thought this would be a good test case). With 10,570 re-entrancies (no that is not a typo), it correctly evaluated A(2,1)=5. 
Best, Guy -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: debug-unify2-patch.lsp Type: application/octet-stream Size: 5174 bytes Desc: debug-unify2-patch.lsp URL: From oe at ifi.uio.no Wed Nov 11 17:57:44 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Wed, 11 Nov 2020 17:57:44 +0100 Subject: [developers] Bug in interactive unification In-Reply-To: References: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> Message-ID: hi guy, > I also get this behaviour (no LUI window appearing) if I try to display a type that has no features (obviously, this isn't a useful display on its own, but it would be helpful for interactive unification to be able to drag and drop the type). The log file says: > > YZLUI: Received unknown lkb-protocol top-level command: AVM could you send the complete log output, i.e. including the 'avm' command that LUI fails to recognize? oe From gete2 at cam.ac.uk Wed Nov 11 18:42:19 2020 From: gete2 at cam.ac.uk (Guy Emerson) Date: Wed, 11 Nov 2020 17:42:19 +0000 Subject: [developers] Bug in interactive unification In-Reply-To: References: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> Message-ID: Hi Stephan, The command is: process_complete_command(): `avm 1 ZERO "zero - expanded" ' YZLUI: Received unknown lkb-protocol top-level command: AVM Best, Guy Am Mi., 11. Nov. 2020 um 16:57 Uhr schrieb Stephan Oepen : > hi guy, > > > I also get this behaviour (no LUI window appearing) if I try to display > a type that has no features (obviously, this isn't a useful display on its > own, but it would be helpful for interactive unification to be able to drag > and drop the type). The log file says: > > > > YZLUI: Received unknown lkb-protocol top-level command: AVM > > could you send the complete log output, i.e. including the 'avm' > command that LUI fails to recognize? > > oe > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sweaglesw at sweaglesw.org Wed Nov 11 19:08:14 2020 From: sweaglesw at sweaglesw.org (Woodley Packard) Date: Wed, 11 Nov 2020 10:08:14 -0800 Subject: [developers] Bug in interactive unification In-Reply-To: References: Message-ID: My best guess is LUI expected a #D structure instead of a symbol; e.g. #D[zero] One could argue that LUI should be a bit more forgiving in enforcing type constraints on its commands. Both internally and in the protocol, atomic values are treated differently from feature structures. An atomic value at the top level is unanticipated, but I think it should work just fine if wrapped into a (trivial) feature structure. Woodley >> On Nov 11, 2020, at 9:42 AM, Guy Emerson wrote: > ? > Hi Stephan, > > The command is: > > process_complete_command(): `avm 1 ZERO "zero - expanded" > ' > > YZLUI: Received unknown lkb-protocol top-level command: AVM > > Best, > Guy > >> Am Mi., 11. Nov. 2020 um 16:57 Uhr schrieb Stephan Oepen : >> hi guy, >> >> > I also get this behaviour (no LUI window appearing) if I try to display a type that has no features (obviously, this isn't a useful display on its own, but it would be helpful for interactive unification to be able to drag and drop the type). The log file says: >> > >> > YZLUI: Received unknown lkb-protocol top-level command: AVM >> >> could you send the complete log output, i.e. including the 'avm' >> command that LUI fails to recognize? 
>> >> oe -------------- next part -------------- An HTML attachment was scrubbed... URL: From oe at ifi.uio.no Wed Nov 11 22:49:28 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Wed, 11 Nov 2020 22:49:28 +0100 Subject: [developers] Bug in interactive unification In-Reply-To: References: Message-ID: hi woodley, and all: > One could argue that LUI should be a bit more forgiving in enforcing type constraints on its commands. Both internally and in the protocol, atomic values are treated differently from feature structures. An atomic value at the top level is unanticipated, but I think it should work just fine if wrapped into a (trivial) feature structure. yes, i am almost tempted to make that argument :-). looking over the code, it appears that for the serialization of AVMs the LUI team (in 2003, i would think) decided to piggy-back on what the LKB calls its 'linear' dag output format. that looks like a format invented by ann or john prior to LUI integration, and it does indeed consider any dag without outgoing arcs an 'atomic' feature structure that is serialized without any #D[...] decoration. i am not sure written records of protocol negotiations internal to the LUI team exist, but if the above were by and large how the protocol was defined ... it would not be unreasonable to expect LUI to accept the 'linear' serialization of such atomic dags. on the other hand, the current LUI interpretation of the protocol has a broad and loyal user base, and it is not hard to accommodate its expectations on the LKB side. i just checked in the following work-around (to both the LOGON and FOS branches of the LKB source code), which appears to have the desired effect and should end up in the next round of binary builds then. best wishes, oe Index: src/glue/lui.lsp =================================================================== --- src/glue/lui.lsp (revision 29084) +++ src/glue/lui.lsp (working copy) @@ -307,10 +307,16 @@ (*package* (find-package :lkb))) (lui-parameters :avm) - (let ((string (with-output-to-string (stream) - (format stream "avm ~d " id) - (display-dag1 dag 'linear stream)))) - (format %lui-stream% string)) + (let* ((string (with-output-to-string (stream) + (display-dag1 dag 'linear stream))) + ;; + ;; work around a LUI idiosyncrasy: dress up atomic dags with a + ;; (kind of) spurious outermost decoration. + ;; + (string (if (char= (char string 0) #\#) + string + (concatenate 'string "#D[" string "]")))) + (format %lui-stream% "avm ~d ~a" id string)) #+:null (format %lui-stream% " ~s~%" path) (format %lui-stream% " ~s~%" title) From oe at ifi.uio.no Wed Nov 11 22:52:09 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Wed, 11 Nov 2020 22:52:09 +0100 Subject: [developers] Bug in interactive unification In-Reply-To: <8F5D0001-F235-42E1-8CB6-4FCBAC4D0D57@sussex.ac.uk> References: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> <8F5D0001-F235-42E1-8CB6-4FCBAC4D0D57@sussex.ac.uk> Message-ID: hi john: > Aha, the constraint object in your LUI log has unbalanced brackets. Guessing which bracket is wrong, I've changed another LKB robust unifier function, and attach a new version of the file debug-unify2-patch.lsp many thanks for the quick diagnostics and fixes! i looked over both your patches, and they seem like just the right fix to two genuine bugs that have been lurking (for the past sixteen or so years :-) in the interactive unifier behind the LUI drag-and-drop interface. 
i have just picked them up (and added my own fix for the LUI display of atomic dags) and committed these changes to both the LOGON and FOS repositories. best wishes, oe From goodman.m.w at gmail.com Thu Nov 12 03:19:02 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Thu, 12 Nov 2020 10:19:02 +0800 Subject: [developers] Bug report for ERG In-Reply-To: References: Message-ID: Hi Alexandre, I was able to reproduce the issue using the ERG 2018 (which creates a named EP with the URL as its CARG) and a ~3-month old trunk version of the ERG (which tokenized the URL). I'll leave the question of the ERG's behavior to the pros, and I'll address the MRS syntax problem. PyDelphin reported the syntax error at the '.' character because that's the point at which the SimpleMRS parser was unable to proceed, but the problem is in fact the '_' in the lemma portion of the predicate symbol. Currently there is no agreed-upon way to have a lemma containing '_', as '_' is the delimiter between the lemma and pos fields. The so-called "TypePred" production in the SimpleMRS BNF at http://moin.delph-in.net/MrsRfc#Simple is overly permissive (note: I wrote it, adapting Bec's original). Stephan and I had some discussion about the mini-format of predicate symbols on GitHub (https://github.com/delph-in/pydelphin/issues/302) but unfortunately little of that conversation made it to this list. In short, I propose a character-escaping solution for use in predicate symbols for all serialization formats. For this, we could recycle TSDB's three escapes (\s, \n, and \\), where in this case the separator \s is '_' instead of '@'. This escaping would apply uniformly across the serialization formats (SimpleMRS, MRX, EDS native, etc.). Any other characters that might cause issues in parsing (such as a space or '<' in SimpleMRS, also '[', '{', or '(' in EDS, etc.) would be handled by those formats individually. For SimpleMRS, I suggest quoting any predicate that contains a space or '<' (and quotes are not part of the predicate format, only part of SimpleMRS's), and then escaping quotes (\") inside predicates. This means that abstract predicates (compound, udef_q, etc.) would also be quoted, if they had a space or '<'. In MRX, a predicate with '<' would need to replace it with &lt;, and so on. If we agree on such a change, then both PyDelphin and ACE (and other processors) would need to be modified to get around the issue you're experiencing. Of course, this specific issue could be sidestepped by getting the ERG to put URLs back into CARGs instead of being tokenized and parsed into generic predicate symbols. On Thu, Nov 12, 2020 at 12:54 AM Alexandre Rademaker wrote: > > BTW, regardless the tokenisation issue, an invalid MRS should not be > produced, right? > > Best, > Alexandre > > > On 10 Nov 2020, at 18:39, Alexandre Rademaker > wrote: > > > > Hi, > > > > I am trying to parse the sentences from EWT corpus ( > https://github.com/universaldependencies/UD_English-EWT) but in the DEV > set I have a non-sense sentence with only an url between brackets: > > > > [ > http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 > ] > > > > ACE reports an invalid MRS.
The error is in the character 2666, so > probably the error is the predicate: > > > > _search_x.htm?csp=34/NN_u_unknown > > > > But the regex for predicates seems to support dot in the name of the > predicate: > > > > http://moin.delph-in.net/MrsRfc#SerializationFormats > > > > Anyway, the pre-processing of the sentence seems wrong to me in ERG > trunk version, the tokenisation broke the url into many tokens and consumed > the protocol `http://` prefix: > > > > % ace -g ~/hpsg/wn/terg-mac.dat -E > > [ > http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 > ] > > www.usatoday. com / tech/ science / space/ 2005 ? 03 ? 09 - nasa - > search_x.htm?csp=34 > > > > ERG (2018) produced what I was expecting: > > > > % ace -g erg-mac.dat -E > > [ > http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 > ] > > www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 > > > > ERG (1214) produced what I was expecting: > > > > % ace -g erg-lingo-mac.dat -E > > [ > http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 > ] > > [ > http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 > ] > > > > > >>>> response = ace.parse(grm, '[ > http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] > ') > > NOTE: hit RAM limit while unpacking > > NOTE: parsed 1 / 1 sentences, avg 1536033k, time 51.15306s > > > >>>> response.result(0).mrs() > > Traceback (most recent call last): > > File "", line 1, in > > File > "/Users/ar/.venv/lib/python3.9/site-packages/delphin/interface.py", line > 146, in mrs > > mrs = simplemrs.decode(mrs) > > File > "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", > line 112, in decode > > return _decode_mrs(lexer) > > File > "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", > line 200, in _decode_mrs > > rels.append(_decode_rel(lexer, variables)) > > File > "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", > line 252, in _decode_rel > > _, label = lexer.expect((FEATURE, 'LBL'), (SYMBOL, None)) > > File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/util.py", > line 473, in expect > > raise self._errcls('expected: ' + err, > > delphin.mrs._exceptions.MRSSyntaxError: > > line 1, character 2666 > > [ LTOP: h0 INDEX: e2 [ e SF: prop-or-ques TENSE: tensed MOOD: > indicative ] RELS: < [ implicit_conj<8:79> LBL: h1 ARG0: e2 ARG1: e4 [ e > SF: prop TENSE: tensed MOOD: indicative ] ARG2: e5 [ e SF: prop-or-ques > TENSE: tensed MOOD: indicative ] ] [ unknown<8:21> LBL: h1 ARG0: e4 ARG: > u6 ] [ _www.usatoday./JJ_u_unknown<8:21> LBL: h1 ARG0: e7 [ e SF: prop ] > ARG1: u6 ] [ implicit_conj<21:79> LBL: h1 ARG0: e5 ARG1: e8 [ e SF: > prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e9 [ e SF: prop-or-ques > TENSE: tensed MOOD: indicative ] ] [ unknown<21:49> LBL: h1 ARG0: e8 ARG: > x10 ] [ udef_q<21:49> LBL: h11 ARG0: x10 RSTR: h12 BODY: h13 ] [ > udef_q<21:24> LBL: h14 ARG0: x15 [ x PERS: 3 NUM: sg ] RSTR: h16 BODY: h17 > ] [ _com/NN_u_unknown<21:24> LBL: h18 ARG0: x15 ] [ _and_c<24:25> LBL: > h19 ARG0: x10 ARG1: x15 ARG2: x20 ] [ udef_q<25:49> LBL: h21 ARG0: x20 > RSTR: h22 BODY: h23 ] [ udef_q<25:37> LBL: h24 ARG0: x25 [ x PERS: 3 NUM: > sg ] RSTR: h26 BODY: h27 ] [ _tech//JJ_u_unknown<25:30> LBL: h28 ARG0: e29 > [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ARG1: > x25 ] [ _science_n_1<30:37> LBL: h28 ARG0: x25 ] [ _and_c<37:38> LBL: h30 > ARG0: x20 ARG1: x25 
ARG2: x31 [ x PERS: 3 NUM: sg IND: + ] ] [ > proper_q<38:49> LBL: h32 ARG0: x31 RSTR: h33 BODY: h34 ] [ compound<38:49> > LBL: h35 ARG0: e36 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - > PERF: - ] ARG1: x31 ARG2: x37 [ x PT: pt ] ] [ udef_q<38:44> LBL: h38 > ARG0: x37 RSTR: h39 BODY: h40 ] [ _space//NN_u_unknown<38:44> LBL: h41 > ARG0: x37 ] [ yofc<44:48> LBL: h35 CARG: "2005" ARG0: x31 ] [ > implicit_conj<49:79> LBL: h1 ARG0: e9 ARG1: e43 [ e SF: prop-or-ques TENSE: > tensed MOOD: indicative ] ARG2: e44 [ e SF: prop-or-ques TENSE: tensed > MOOD: indicative ] ] [ unknown<49:52> LBL: h1 ARG0: e43 ARG: x45 [ x PERS: > 3 NUM: sg IND: + ] ] [ proper_q<49:52> LBL: h46 ARG0: x45 RSTR: h47 BODY: > h48 ] [ yofc<49:51> LBL: h49 CARG: "03" ARG0: x45 ] [ > implicit_conj<52:79> LBL: h1 ARG0: e44 ARG1: e51 [ e SF: prop-or-ques > TENSE: tensed MOOD: indicati! > ve ] ARG2: e52 [ e SF: prop-or-ques ] ] [ unknown<52:55> LBL: h1 ARG0: > e51 ARG: x53 [ x PERS: 3 NUM: sg IND: + ] ] [ proper_q<52:55> LBL: h54 > ARG0: x53 RSTR: h55 BODY: h56 ] [ yofc<52:54> LBL: h57 CARG: "09" ARG0: > x53 ] [ unknown<55:79> LBL: h1 ARG0: e52 ARG: x59 [ x PERS: 3 NUM: sg ] ] > [ udef_q<55:79> LBL: h60 ARG0: x59 RSTR: h61 BODY: h62 ] [ compound<55:79> > LBL: h63 ARG0: e64 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - > PERF: - ] ARG1: x59 ARG2: x65 [ x PERS: 3 NUM: sg IND: + PT: pt ] ] [ > proper_q<55:60> LBL: h66 ARG0: x65 RSTR: h67 BODY: h68 ] [ named<55:59> > LBL: h69 CARG: "NASA" ARG0: x65 ] [ > _search_x.htm?csp=34/NN_u_unknown<60:79> LBL: h63 ARG0: x59 ] > HCONS: < h0 > qeq h1 h12 qeq h19 h16 qeq h18 h22 qeq h30 h26 qeq h28 h33 qeq h35 h39 qeq > h41 h47 qeq h49 h55 qeq h57 h61 qeq h63 h67 qeq h69 > ICONS: < > ] > > > > > > > > > > > > > > > > > > > > > > > > > > > > ! > > > > > > > > > ^ > > MRSSyntaxError: expected: a feature > > > > > > Best, > > Alexandre > > > > > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From danf at stanford.edu Thu Nov 12 03:40:44 2020 From: danf at stanford.edu (Dan Flickinger) Date: Thu, 12 Nov 2020 02:40:44 +0000 Subject: [developers] Bug report for ERG In-Reply-To: References: , Message-ID: One of the unfortunate consequences of the change in tokenization for the trunk ERG (treating punctuation marks as separate tokens) is that we no longer correctly handle web addresses in text, because the tokenizer now splits at slashes and periods, `exploding' URLs into many separate tokens. This is obviously not the desired behavior, and Stephan has been leading an effort to get a uniform preprocessing mechanism into the various platforms so we can cope with URLs and the like, by ensuring that they are single tokens by the time the parser sees them. In the meantime, Alexandre, perhaps you can write a little temporary script that replaces URLs with a single simple token before presenting a sentence to ACE for parsing. Dan ________________________________ From: developers-bounces at emmtee.net on behalf of goodman.m.w at gmail.com Sent: Wednesday, November 11, 2020 6:19 PM To: Alexandre Rademaker Cc: developers Subject: Re: [developers] Bug report for ERG Hi Alexandre, I was able to reproduce the issue using the ERG 2018 (which creates a named EP with the URL as its CARG) and a ~3-month old trunk version of the ERG (which tokenized the URL). I'll leave the question of the ERG's behavior to the pros, and I'll address the MRS syntax problem. PyDelphin reported the syntax error at the '.' 
character because that's the point at which the SimpleMRS parser was unable to proceed, but the problem is in fact the '_' in the lemma portion of the predicate symbol. Currently there is no agreed-upon way to have a lemma containing '_', as '_' is the delimiter between the lemma and pos fields. The so-called "TypePred" production in the SimpleMRS BNF at http://moin.delph-in.net/MrsRfc#Simple is overly permissive (note: I wrote it, adapting Bec's original). Stephan and I had some discussion about the mini-format of predicate symbols on GitHub (https://github.com/delph-in/pydelphin/issues/302) but unfortunately little of that conversation made it to this list. In short, I propose a character-escaping solution for use in predicate symbols for all serialization formats. For this, we could recycle TSDB's three escapes (\s, \n, and \\), where in this case the separator \s is '_' instead of '@'. The serialization formats (SimpleMRS, MRX, EDS native, etc.). Any other characters that might cause issues in parsing (such as a space or '<' in SimpleMRS, also '[', '{', or '(' in EDS, etc.) would be handled by those formats individually. For SimpleMRS, I suggest quoting any predicate that contains a space or '<' (and quotes are not part of the predicate format, only part of SimpleMRS's), and then escaping quotes (\") inside predicates. This means that abstract predicates (compound, udef_q, etc) would also be quoted, if they had a space or '<'. In MRX, a predicate with '<' would need to replace it with <, and so on. If we agree on such a change, then both PyDelphin and ACE (and other processors) would need to be modified to get around the issue you're experiencing. Of course, this specific issue could be sidestepped by getting the ERG to put URLs back into CARGs instead of being tokenized and parsed into generic predicate symbols. On Thu, Nov 12, 2020 at 12:54 AM Alexandre Rademaker > wrote: BTW, regardless the tokenisation issue, an invalid MRS should not be produced, right? Best, Alexandre > On 10 Nov 2020, at 18:39, Alexandre Rademaker > wrote: > > Hi, > > I am trying to parse the sentences from EWT corpus (https://github.com/universaldependencies/UD_English-EWT) but in the DEV set I have a non-sense sentence with only an url between brackets: > > [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] > > ACE reports an invalid MRS. The error is in the character 2666, so probably the error is the predicate: > > _search_x.htm?csp=34/NN_u_unknown > > But the regex for predicates seems to support dot in the name of the predicate: > > http://moin.delph-in.net/MrsRfc#SerializationFormats > > Anyway, the pre-processing of the sentence seems wrong to me in ERG trunk version, the tokenisation broke the url into many tokens and consumed the protocol `http://` prefix: > > % ace -g ~/hpsg/wn/terg-mac.dat -E > [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] > www.usatoday. com / tech/ science / space/ 2005 ? 03 ? 
09 - nasa - search_x.htm?csp=34 > > ERG (2018) produced what I was expecting: > > % ace -g erg-mac.dat -E > [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] > www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 > > ERG (1214) produced what I was expecting: > > % ace -g erg-lingo-mac.dat -E > [http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34] > [ http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34 ] > > >>>> response = ace.parse(grm, '[http://www.usatoday.com/tech/science/space/2005-03-09-nasa-search_x.htm?csp=34]') > NOTE: hit RAM limit while unpacking > NOTE: parsed 1 / 1 sentences, avg 1536033k, time 51.15306s > >>>> response.result(0).mrs() > Traceback (most recent call last): > File "", line 1, in > File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/interface.py", line 146, in mrs > mrs = simplemrs.decode(mrs) > File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 112, in decode > return _decode_mrs(lexer) > File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 200, in _decode_mrs > rels.append(_decode_rel(lexer, variables)) > File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/codecs/simplemrs.py", line 252, in _decode_rel > _, label = lexer.expect((FEATURE, 'LBL'), (SYMBOL, None)) > File "/Users/ar/.venv/lib/python3.9/site-packages/delphin/util.py", line 473, in expect > raise self._errcls('expected: ' + err, > delphin.mrs._exceptions.MRSSyntaxError: > line 1, character 2666 > [ LTOP: h0 INDEX: e2 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] RELS: < [ implicit_conj<8:79> LBL: h1 ARG0: e2 ARG1: e4 [ e SF: prop TENSE: tensed MOOD: indicative ] ARG2: e5 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ] [ unknown<8:21> LBL: h1 ARG0: e4 ARG: u6 ] [ _www.usatoday./JJ_u_unknown<8:21> LBL: h1 ARG0: e7 [ e SF: prop ] ARG1: u6 ] [ implicit_conj<21:79> LBL: h1 ARG0: e5 ARG1: e8 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e9 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ] [ unknown<21:49> LBL: h1 ARG0: e8 ARG: x10 ] [ udef_q<21:49> LBL: h11 ARG0: x10 RSTR: h12 BODY: h13 ] [ udef_q<21:24> LBL: h14 ARG0: x15 [ x PERS: 3 NUM: sg ] RSTR: h16 BODY: h17 ] [ _com/NN_u_unknown<21:24> LBL: h18 ARG0: x15 ] [ _and_c<24:25> LBL: h19 ARG0: x10 ARG1: x15 ARG2: x20 ] [ udef_q<25:49> LBL: h21 ARG0: x20 RSTR: h22 BODY: h23 ] [ udef_q<25:37> LBL: h24 ARG0: x25 [ x PERS: 3 NUM: sg ] RSTR: h26 BODY: h27 ] [ _tech//JJ_u_unknown<25:30> LBL: h28 ARG0: e29 [ e SF: prop TENSE: untensed MOOD: indicative PROG: bool PERF: - ] ARG1: x25 ] [ _science_n_1<30:37> LBL: h28 ARG0: x25 ] [ _and_c<37:38> LBL: h30 ARG0: x20 ARG1: x25 ARG2: x31 [ x PERS: 3 NUM: sg IND: + ] ] [ proper_q<38:49> LBL: h32 ARG0: x31 RSTR: h33 BODY: h34 ] [ compound<38:49> LBL: h35 ARG0: e36 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x31 ARG2: x37 [ x PT: pt ] ] [ udef_q<38:44> LBL: h38 ARG0: x37 RSTR: h39 BODY: h40 ] [ _space//NN_u_unknown<38:44> LBL: h41 ARG0: x37 ] [ yofc<44:48> LBL: h35 CARG: "2005" ARG0: x31 ] [ implicit_conj<49:79> LBL: h1 ARG0: e9 ARG1: e43 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ARG2: e44 [ e SF: prop-or-ques TENSE: tensed MOOD: indicative ] ] [ unknown<49:52> LBL: h1 ARG0: e43 ARG: x45 [ x PERS: 3 NUM: sg IND: + ] ] [ proper_q<49:52> LBL: h46 ARG0: x45 RSTR: h47 BODY: h48 ] [ yofc<49:51> LBL: h49 CARG: "03" ARG0: x45 ] [ implicit_conj<52:79> LBL: h1 ARG0: e44 ARG1: e51 [ 
e SF: prop-or-ques TENSE: tensed MOOD: indicati! ve ] ARG2: e52 [ e SF: prop-or-ques ] ] [ unknown<52:55> LBL: h1 ARG0: e51 ARG: x53 [ x PERS: 3 NUM: sg IND: + ] ] [ proper_q<52:55> LBL: h54 ARG0: x53 RSTR: h55 BODY: h56 ] [ yofc<52:54> LBL: h57 CARG: "09" ARG0: x53 ] [ unknown<55:79> LBL: h1 ARG0: e52 ARG: x59 [ x PERS: 3 NUM: sg ] ] [ udef_q<55:79> LBL: h60 ARG0: x59 RSTR: h61 BODY: h62 ] [ compound<55:79> LBL: h63 ARG0: e64 [ e SF: prop TENSE: untensed MOOD: indicative PROG: - PERF: - ] ARG1: x59 ARG2: x65 [ x PERS: 3 NUM: sg IND: + PT: pt ] ] [ proper_q<55:60> LBL: h66 ARG0: x65 RSTR: h67 BODY: h68 ] [ named<55:59> LBL: h69 CARG: "NASA" ARG0: x65 ] [ _search_x.htm?csp=34/NN_u_unknown<60:79> LBL: h63 ARG0: x59 ] > HCONS: < h0 qeq h1 h12 qeq h19 h16 qeq h18 h22 qeq h30 h26 qeq h28 h33 qeq h35 h39 qeq h41 h47 qeq h49 h55 qeq h57 h61 qeq h63 h67 qeq h69 > ICONS: < > ] > ! ^ > MRSSyntaxError: expected: a feature > > > Best, > Alexandre > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From gete2 at cam.ac.uk Fri Nov 13 12:34:58 2020 From: gete2 at cam.ac.uk (Guy Emerson) Date: Fri, 13 Nov 2020 11:34:58 +0000 Subject: [developers] Bug in interactive unification In-Reply-To: References: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> <8F5D0001-F235-42E1-8CB6-4FCBAC4D0D57@sussex.ac.uk> Message-ID: Hi John, Woodley, and Stephan, Thank you all for your fast responses! This has been really helpful. I have come across two further small bugs, both to do with type names consisting entirely of numeric characters. Such names are allowed internally in the LKB, and in terms of unification, they seem to behave exactly as I expect them to. I couldn't find documentation suggesting that numeric characters should be treated differently. However: (1) in the View>Expanded type pop-up window, a string of numeric characters gives the message "Not defined - try again.", even if the type is defined. (When the pop-up window shows a drop-down instead of a text box, such types appear in the list.) (2) Displaying such a type causes LUI to crash. For example, a type named "1" causes a crash, with the following in the log file (where the value of RESULT is 1): process_complete_command(): ` avm 3 #D[null-with-push-1-here RESULT: #D[1 REST: NULL]] "null-with-push-1-here - expanded" ' Type of dag was not a symbol or string (type 2) Best, Guy Am Mi., 11. Nov. 2020 um 21:52 Uhr schrieb Stephan Oepen : > hi john: > > > Aha, the constraint object in your LUI log has unbalanced brackets. > Guessing which bracket is wrong, I've changed another LKB robust unifier > function, and attach a new version of the file debug-unify2-patch.lsp > > many thanks for the quick diagnostics and fixes! i looked over both > your patches, and they seem like just the right fix to two genuine > bugs that have been lurking (for the past sixteen or so years :-) in > the interactive unifier behind the LUI drag-and-drop interface. i > have just picked them up (and added my own fix for the LUI display of > atomic dags) and committed these changes to both the LOGON and FOS > repositories. > > best wishes, oe > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From J.A.Carroll at sussex.ac.uk Sun Nov 15 16:43:37 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Sun, 15 Nov 2020 15:43:37 +0000 Subject: [developers] Bug in interactive unification In-Reply-To: References: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> <8F5D0001-F235-42E1-8CB6-4FCBAC4D0D57@sussex.ac.uk> Message-ID: <6850BE66-CE3A-4059-A855-189A4240C7CE@sussex.ac.uk> Hi Guy, thanks for the bug reports! Responses below. On 13 Nov 2020, at 11:34, Guy Emerson > wrote: Hi John, Woodley, and Stephan, Thank you all for your fast responses! This has been really helpful. I have come across two further small bugs, both to do with type names consisting entirely of numeric characters. Such names are allowed internally in the LKB, and in terms of unification, they seem to behave exactly as I expect them to. I couldn't find documentation suggesting that numeric characters should be treated differently. You're right - types with all-numeric names should work fine. At http://moin.delph-in.net/TdlRfc the relevant clause is Identifier := /[^\s!"#$%&'(),.\/:;<=>[\]^|]+/ which allows any characters apart from whitespace and a few other non-alphanumerics. However: (1) in the View>Expanded type pop-up window, a string of numeric characters gives the message "Not defined - try again.", even if the type is defined. (When the pop-up window shows a drop-down instead of a text box, such types appear in the list.) Yes, this is a bug in the LKB. I'll email you a patch file which you can load to fix it - and I'll commit the changes to the LOGON and FOS branches of the LKB. (2) Displaying such a type causes LUI to crash. For example, a type named "1" causes a crash, with the following in the log file (where the value of RESULT is 1): process_complete_command(): ` avm 3 #D[null-with-push-1-here RESULT: #D[1 REST: NULL]] "null-with-push-1-here - expanded" ' Type of dag was not a symbol or string (type 2) This error message comes from LUI, and I think it needs fixing there. John Best, Guy Am Mi., 11. Nov. 2020 um 21:52 Uhr schrieb Stephan Oepen >: hi john: > Aha, the constraint object in your LUI log has unbalanced brackets. Guessing which bracket is wrong, I've changed another LKB robust unifier function, and attach a new version of the file debug-unify2-patch.lsp many thanks for the quick diagnostics and fixes! i looked over both your patches, and they seem like just the right fix to two genuine bugs that have been lurking (for the past sixteen or so years :-) in the interactive unifier behind the LUI drag-and-drop interface. i have just picked them up (and added my own fix for the LUI display of atomic dags) and committed these changes to both the LOGON and FOS repositories. best wishes, oe -------------- next part -------------- An HTML attachment was scrubbed... URL: From oe at ifi.uio.no Sun Nov 15 16:58:55 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Sun, 15 Nov 2020 16:58:55 +0100 Subject: [developers] Bug in interactive unification In-Reply-To: References: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> <8F5D0001-F235-42E1-8CB6-4FCBAC4D0D57@sussex.ac.uk> Message-ID: further toward perfection in the LKB interfaces :-). guy, can you try asking for the type display by typing in |1| (including the vertical bars). i suspect the prompt windows just uses the lisp read() function, which will interpret a string of digits as a number, rather than as a symbol. 
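for what it's worth, the TdlRfc identifier pattern john quotes above does admit purely numeric names, so the limitation really is in the reader behind the dialog rather than in TDL itself. a quick sanity check of that pattern (a python sketch; the regular expression is copied from the TdlRfc clause quoted above):

import re

# The Identifier pattern from TdlRfc; purely numeric names are legal TDL
# identifiers, so the "Not defined - try again." message comes from the
# reader behind the dialog rather than from TDL itself.
IDENTIFIER = re.compile(r'''[^\s!"#$%&'(),./:;<=>[\]^|]+''')

for name in ('1', 'null-with-push-1-here', 'foo.bar'):
    print(name, bool(IDENTIFIER.fullmatch(name)))
# 1 True
# null-with-push-1-here True
# foo.bar False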
in TDL parsing, however, all (unquoted) type names are interpreted as symbols. the |...| syntax will force symbol interpretation. this would be easy to fix, and likely applies in other input routines that prompt for type or grammar entity names. regarding LUI communication, i am inclined to suggest that this is a bug in the #D[...] reader ... woodley, how could you not agree? cheers, oe ps: i had originally drafted this message yesterday; i suspect the LKB input fix that john has in mind is likely along the lines above? On Fri, 13 Nov 2020 at 12:36 Guy Emerson wrote: > Hi John, Woodley, and Stephan, > > Thank you all for your fast responses! This has been really helpful. > > I have come across two further small bugs, both to do with type names > consisting entirely of numeric characters. Such names are allowed > internally in the LKB, and in terms of unification, they seem to behave > exactly as I expect them to. I couldn't find documentation suggesting that > numeric characters should be treated differently. However: > > (1) in the View>Expanded type pop-up window, a string of numeric > characters gives the message "Not defined - try again.", even if the type > is defined. (When the pop-up window shows a drop-down instead of a text > box, such types appear in the list.) > > (2) Displaying such a type causes LUI to crash. For example, a type named > "1" causes a crash, with the following in the log file (where the value of > RESULT is 1): > > process_complete_command(): ` > avm 3 #D[null-with-push-1-here RESULT: #D[1 REST: NULL]] > "null-with-push-1-here - expanded" > ' > > Type of dag was not a symbol or string (type 2) > > > Best, > Guy > > > Am Mi., 11. Nov. 2020 um 21:52 Uhr schrieb Stephan Oepen : > >> hi john: >> >> > Aha, the constraint object in your LUI log has unbalanced brackets. >> Guessing which bracket is wrong, I've changed another LKB robust unifier >> function, and attach a new version of the file debug-unify2-patch.lsp >> >> many thanks for the quick diagnostics and fixes! i looked over both >> your patches, and they seem like just the right fix to two genuine >> bugs that have been lurking (for the past sixteen or so years :-) in >> the interactive unifier behind the LUI drag-and-drop interface. i >> have just picked them up (and added my own fix for the LUI display of >> atomic dags) and committed these changes to both the LOGON and FOS >> repositories. >> >> best wishes, oe >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From J.A.Carroll at sussex.ac.uk Sun Nov 15 17:13:52 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Sun, 15 Nov 2020 16:13:52 +0000 Subject: [developers] Bug in interactive unification In-Reply-To: References: <821C2158-B968-457B-92DC-F6C158C7F70C@sussex.ac.uk> <3B9E74B2-E09E-452B-B0A4-4F714222E92D@sussex.ac.uk> <8F5D0001-F235-42E1-8CB6-4FCBAC4D0D57@sussex.ac.uk> Message-ID: Stephan, yes, my fix avoids the need to type in the vertical bars. It should apply to all dialogs that ask for an identifier. The same problem occurred with type names starting with a decimal digit but also containing non-numeric characters, such as ??_j in Zhong (where the first two characters are Unicode fullwidth digits). John On 15 Nov 2020, at 15:58, Stephan Oepen > wrote: further toward perfection in the LKB interfaces :-). guy, can you try asking for the type display by typing in |1| (including the vertical bars). 
i suspect the prompt windows just uses the lisp read() function, which will interpret a string of digits as a number, rather than as a symbol. in TDL parsing, however, all (unquoted) type names are interpreted as symbols. the |...| syntax will force symbol interpretation. this would be easy to fix, and likely applies in other input routines that prompt for type or grammar entity names. regarding LUI communication, i am inclined to suggest that this is a bug in the #D[...] reader ... woodley, how could you not agree? cheers, oe ps: i had originally drafted this message yesterday; i suspect the LKB input fix that john has in mind is likely along the lines above? On Fri, 13 Nov 2020 at 12:36 Guy Emerson > wrote: Hi John, Woodley, and Stephan, Thank you all for your fast responses! This has been really helpful. I have come across two further small bugs, both to do with type names consisting entirely of numeric characters. Such names are allowed internally in the LKB, and in terms of unification, they seem to behave exactly as I expect them to. I couldn't find documentation suggesting that numeric characters should be treated differently. However: (1) in the View>Expanded type pop-up window, a string of numeric characters gives the message "Not defined - try again.", even if the type is defined. (When the pop-up window shows a drop-down instead of a text box, such types appear in the list.) (2) Displaying such a type causes LUI to crash. For example, a type named "1" causes a crash, with the following in the log file (where the value of RESULT is 1): process_complete_command(): ` avm 3 #D[null-with-push-1-here RESULT: #D[1 REST: NULL]] "null-with-push-1-here - expanded" ' Type of dag was not a symbol or string (type 2) Best, Guy Am Mi., 11. Nov. 2020 um 21:52 Uhr schrieb Stephan Oepen >: hi john: > Aha, the constraint object in your LUI log has unbalanced brackets. Guessing which bracket is wrong, I've changed another LKB robust unifier function, and attach a new version of the file debug-unify2-patch.lsp many thanks for the quick diagnostics and fixes! i looked over both your patches, and they seem like just the right fix to two genuine bugs that have been lurking (for the past sixteen or so years :-) in the interactive unifier behind the LUI drag-and-drop interface. i have just picked them up (and added my own fix for the LUI display of atomic dags) and committed these changes to both the LOGON and FOS repositories. best wishes, oe -------------- next part -------------- An HTML attachment was scrubbed... URL: From arademaker at gmail.com Fri Nov 20 22:41:19 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Fri, 20 Nov 2020 18:41:19 -0300 Subject: [developers] WQL query language (the WSI interface query language) Message-ID: <301D3F23-7731-4F73-9297-DCC1F8BEAB5D@gmail.com> Hi Stephan, The WSI interface points to [1] for the documentation of the query language. In [2] we also have some more limited documentation. 1. The semeval 2015 page is not working properly, images and CSS can?t be loaded. 2. One particular operator not well defined is the ^ . In [1] we have > The following query demonstrates the use of the top operator (?^?), to retrieve graphs rooted in a coordinate structure, i.e. where the top node has an outgoing dependency matching the pattern ?_*_c? 
(again, assuming the DM representations); here, specification of the role value can be omitted, as there is no predication constraining the argument node: > > ^[_*_c] First the WQL should be representation independent, right? Why the comment about DM? So in an MRS, I am assuming this ^ operator should match the TOP predication, am I right? But the pattern inside the bracket should match the TOP predicate? If so, should I also be able to use other patterns such as lemma pattern, like ?^[+bark]?? I didn?t understand the fragment 'the role value can be omitted, as there is no predication constraining the argument node?. How the role values would be supplied? Is it talking about the roles of _*_c predicate in the example? Why not restrict the argument of the ^ operator to an node id? If I search for sentences where the TOP predicate has lemma bark, I could use: ^[x] x:+bark Does it make sense? 3. There is no proviso for querying VarSort? For instance, find representations where a given verb has as argument a node that is first person singular. We can?t search for verbs in a specific tense or aspect. 4. The ERS fingerprints (http://moin.delph-in.net/ErgSemantics) and WQL are very related, right? Do we have any document that describes ERS fingerprints? My idea is to reimplement the parser of WDL and the transformation to SPARQL [3]. I would like to support MRS, DMRS and EDS initially. The reimplementation will match the new RDF encoding for the semantic structures that I am proposing. The RDF vocabulary is still under construction, in particular, there are parts of the semantic structure that are grammar dependent (for example, the VarSort) and I am still not sure how to deal with that. This is my first very preliminar draft of the WQL BNF is: WQL := predexp predexp := predication | predexp OP predexp | ( predexp ) | ! predexp OP := ?|" | ? " predication := [id ?:?] pattern [ ?[" arglist ?]? ] arglist := argument | argument ?," arglist argument:= rolelabel id rolelabel := wdpattern pattern := wdpattern | lemma_pattern | pos_pattern | sense_pattern lemma_pattern := ?+" wdpattern pos_pattern := ?/" wdpattern sense_pattern := ?=" wdpattern wdpattern := [^?* ][\w]+ Ps: can I potentially implement a HPSG grammar to parse any context free grammar like the one above, right? It would be funny to have grammars to parse this DSL. [1] https://alt.qcri.org/semeval2015/task18/index.php?id=search [2] http://moin.delph-in.net/WeSearch/QueryLanguage [3] https://www.w3.org/TR/sparql11-query/ Best, Alexandre From oe at ifi.uio.no Sat Nov 28 20:01:11 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Sat, 28 Nov 2020 20:01:11 +0100 Subject: [developers] migration of collaboration infrastructure Message-ID: dear colleagues: over the next two weeks, services hosted in the following domains will be migrated to a new system at the university of oslo: + delph-in.net + emmtee. net + nlpl.eu + sigparse. org there may be short interruptions in service availability, delays in mailing list processing, temporary locks on wikis and SVN, and more generally unexpected behavior. please exercise some patience during the migration phase, and feel free to notify me of any surprising behavior you might experience. some services will not be migrated but rather are being discontinued: + ?pet at delph-in.net? list + ?logon at delph-in.net? list + ?wesearch.delph-in.net? regarding the two mailing lists, they have had little traffic in recent years and overlap with the DELPH-IN ?developers? list. 
i encourage active PET or LOGON users to subscribe to that list. the WeSearch semantic index regrettably has become too difficult to upgrade and maintain. fortunately, there is an ongoing initiative at IBM Research to develop a replacement service with similar functionality. best wishes from norway! oe -------------- next part -------------- An HTML attachment was scrubbed... URL: From J.A.Carroll at sussex.ac.uk Tue Dec 15 23:31:14 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Tue, 15 Dec 2020 22:31:14 +0000 Subject: [developers] support for default unification / YADU Message-ID: <4FE421C2-CEB1-42E4-9362-A431EC78B883@sussex.ac.uk> Hi, I'm wondering whether it's worth preserving support for default feature structures in the LKB. Has anyone tried to use defaults in the recent past - and is it likely that anyone will want to do so in the future? Since I've started the LKB-FOS effort I've tried to retain this facility, but I've never been able to verify that it actually works since I've never tested grammars containing defaults. I'm asking about this now, since there's a change I want to make to the LKB that will probably irretrievably break default unification in both parsing and generation. Any opinions? Please reply if you feel strongly one way or the other. Thanks, John From goodman.m.w at gmail.com Wed Dec 16 10:38:55 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Wed, 16 Dec 2020 17:38:55 +0800 Subject: [developers] Serializing EDS without a top Message-ID: Hello developers, It's been a while but I'm returning to a discussion we were having about serializing EDS in the native format when there is no TOP and when there's no INDEX to backoff to. Stephan suggested that EDS is a line-based format (i.e., line breaks are required), while I would like to continue to support single-line EDS in PyDelphin. I think the last word on the subject from Stephan, at least on this list, was mid-September ( http://lists.delph-in.net/archives/developers/2020/003140.html), where he said he'd continue discussion on another thread, which presumably meant the thread from late August ( http://lists.delph-in.net/archives/developers/2020/003127.html). I don't think the discussion did continue, so I'm starting this thread in case anyone is interested. As an example, here's an EDS (without properties) for "It rained." {e2: e2:_rain_v_1<3:9>[] } In PyDelphin, when an EDS has no TOP, I was outputting the first colon anyway, intentionally: {: e2:_rain_v_1<3:9>[] } It's a bit ugly, but it allows me to detect, with 1 token of lookahead, if there's a top or not. If the colon is omitted then it's not clear if "e2:" is the top or the start of the first node. If line breaks are required, we just assume the first line is for the top, whether or not it's there. But for single-line EDS, we need 4 tokens of lookahead to determine if there's a top (assuming the parser treats variables and predicates as the same kinds of tokens): {e2: e2:_rain_v_1<3:9>[]} {e2:_rain_v_1<3:9>[]} Here is the parsing algorithm, once we've consumed the first '{': 1. If the 1st lookahead token is ':', '(fragmented)' (or another graph status), '}', or '|' (node status), then we know that TOP is missing (the ':' is for PyDelphin's current output) 2. Otherwise the 1st and 2nd tokens must be a symbol and a colon, and if the 3rd token is a graph or node status, OR if the 4th token is ':', then the 1st token is the TOP 3. 
Otherwise TOP must be missing I think this covers all the cases but let me know if I've missed anything. -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From oe at ifi.uio.no Thu Dec 17 11:39:46 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Thu, 17 Dec 2020 11:39:46 +0100 Subject: [developers] Serializing EDS without a top In-Reply-To: References: Message-ID: hi mike, yes, i am sorry i now see i never returned to the original thread i had in mind on M$ GitHub! in a nutshell, EDS native serialization is indeed line-oriented, and i am inclined to hold fast on the one-node-per-line convention. i would not want to muddy these waters, since the format has been around since 2002, and there has been some EDS activity beyond DELPH-IN. i know of at least two EDS readers that rely on the presence of line breaks. i do see the benefits of a more compact serialization, however, but would recommend you call that something else (say EDSLines), if you decide to implement it in pyDelphin. you would then be free to make up your own rules, where i could for example imagine either one of the following (assuming a missing top): {_: e2:_rain_v_1<3:9>[] e3:_heavy_a_1<10:42>[ARG1 e2] } {\n e2:_rain_v_1<3:9>[]\n e3:_heavy_a_1<10:42>[ARG1 e2]\n } {: e2:_rain_v_1<3:9>[] e3:_heavy_a_1<10:42>[ARG1 e2] } the above order reflects what i believe would be my personal ranking just now :-). i frequently use underscores for ?anonymous? MRS variables, and the first variant feels maybe most natural: there should be a top identifier, but in this case it is missing. the second variant also would seem to maintain compatibility with the native EDS serialization, only introducing an inline encoding of line breaks. variant #3, on the other hand, i believe would depart from how native serialization deals with missing tops; thus, if you were to opt for this format, it would be even more important to maintain a clear distinction between EDS native serialization and the pyDelphin EDSLines format. i hope the above makes sense to you? oe On Wed, Dec 16, 2020 at 10:41 AM goodman.m.w at gmail.com wrote: > > Hello developers, > > It's been a while but I'm returning to a discussion we were having about serializing EDS in the native format when there is no TOP and when there's no INDEX to backoff to. Stephan suggested that EDS is a line-based format (i.e., line breaks are required), while I would like to continue to support single-line EDS in PyDelphin. I think the last word on the subject from Stephan, at least on this list, was mid-September (http://lists.delph-in.net/archives/developers/2020/003140.html), where he said he'd continue discussion on another thread, which presumably meant the thread from late August (http://lists.delph-in.net/archives/developers/2020/003127.html). I don't think the discussion did continue, so I'm starting this thread in case anyone is interested. > > As an example, here's an EDS (without properties) for "It rained." > > {e2: > e2:_rain_v_1<3:9>[] > } > > In PyDelphin, when an EDS has no TOP, I was outputting the first colon anyway, intentionally: > > {: > e2:_rain_v_1<3:9>[] > } > > It's a bit ugly, but it allows me to detect, with 1 token of lookahead, if there's a top or not. If the colon is omitted then it's not clear if "e2:" is the top or the start of the first node. If line breaks are required, we just assume the first line is for the top, whether or not it's there. 
But for single-line EDS, we need 4 tokens of lookahead to determine if there's a top (assuming the parser treats variables and predicates as the same kinds of tokens): > > {e2: e2:_rain_v_1<3:9>[]} > {e2:_rain_v_1<3:9>[]} > > Here is the parsing algorithm, once we've consumed the first '{': > > 1. If the 1st lookahead token is ':', '(fragmented)' (or another graph status), '}', or '|' (node status), then we know that TOP is missing (the ':' is for PyDelphin's current output) > 2. Otherwise the 1st and 2nd tokens must be a symbol and a colon, and if the 3rd token is a graph or node status, OR if the 4th token is ':', then the 1st token is the TOP > 3. Otherwise TOP must be missing > > I think this covers all the cases but let me know if I've missed anything. > > -- > -Michael Wayne Goodman From goodman.m.w at gmail.com Fri Dec 18 09:10:56 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Fri, 18 Dec 2020 16:10:56 +0800 Subject: [developers] Serializing EDS without a top In-Reply-To: References: Message-ID: Thanks for the response, Stephan, On Thu, Dec 17, 2020 at 6:39 PM Stephan Oepen wrote: > [...] > in a nutshell, EDS native serialization is indeed line-oriented, and i > am inclined to hold fast on the one-node-per-line convention. i would > not want to muddy these waters, since the format has been around since > 2002, and there has been some EDS activity beyond DELPH-IN. i know of > at least two EDS readers that rely on the presence of line breaks. > Ok, sounds good. Then perhaps my previous message may be informative if the maintainer(s) of those two readers ever decide to embrace the convenience of single-line EDS. Other than determining the top of the graph, adapting the readers should be trivial: just treat \n as any other whitespace. i do see the benefits of a more compact serialization, however, but > would recommend you call that something else (say EDSLines), if you > decide to implement it in pyDelphin. It's been implemented for some time now. In fact all codecs have a -lines variant (simplemrs -> simplemrs-lines, dmrx -> dmrx-lines, etc.). E.g., in the case of XML formats, it outputs each item ( or ) on a line and suppresses the root nodes (, ). > [...] > {_: e2:_rain_v_1<3:9>[] e3:_heavy_a_1<10:42>[ARG1 e2] } > {\n e2:_rain_v_1<3:9>[]\n e3:_heavy_a_1<10:42>[ARG1 e2]\n } > {: e2:_rain_v_1<3:9>[] e3:_heavy_a_1<10:42>[ARG1 e2] } > > the above order reflects what i believe would be my personal ranking > just now :-). i frequently use underscores for ?anonymous? MRS > variables, and the first variant feels maybe most natural: there > should be a top identifier, but in this case it is missing. The 'anonymous' node identifier for a fake top is fine and, conveniently, PyDelphin can already read in this variant. The difference is that '_' is a valid identifier in EDS, so it's not actually missing, just unlinked. I think logically an unlinked top is the same as a null top, but this means that PyDelphin may write an EDS that is different (in terms of Python data structures, viz., upon re-reading the serialization) as the source EDS. > The > second variant also would seem to maintain compatibility with the > native EDS serialization, only introducing an inline encoding of line > breaks. Inserting a literal '\' and 'n' is awkward and changes the format, and I don't see how it's compatible at all besides having '\' and 'n' in the same location as your preferred newline characters. 
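For concreteness, the lookahead test from my first message amounts to roughly the following (a standalone sketch with a deliberately coarse tokenizer; this is not PyDelphin's actual codec code):

import re

# Coarse tokenizer: graph statuses, surface spans, punctuation (including
# the '|' node status), and everything else as a symbol -- just enough to
# drive the lookahead test.
TOKEN = re.compile(r'\(\w+\)'            # graph status, e.g. (fragmented)
                   r'|<[^>]*>'           # surface spans, e.g. <3:9>
                   r'|[{}:|\[\]]'        # punctuation and the '|' node status
                   r'|[^\s{}:|\[\]<]+')  # symbols: identifiers and predicates

def top_of(single_line_eds):
    toks = TOKEN.findall(single_line_eds)
    assert toks and toks[0] == '{'
    la = toks[1:5]                       # up to four tokens of lookahead
    statuses = {'}', '|'} | {t for t in la if t.startswith('(')}
    if la[0] == ':' or la[0] in statuses:
        return None                      # rule 1: no top
    assert la[1] == ':'                  # rule 2 precondition: symbol then ':'
    if la[2] in statuses or la[3] == ':':
        return la[0]                     # rule 2: the first symbol is the top
    return None                          # rule 3: no top

print(top_of('{e2: e2:_rain_v_1<3:9>[]}'))   # -> e2
print(top_of('{e2:_rain_v_1<3:9>[]}'))       # -> None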
> variant #3, on the other hand, i believe would depart from > how native serialization deals with missing tops; thus, if you were to > opt for this format, it would be even more important to maintain a > clear distinction between EDS native serialization and the pyDelphin > EDSLines format. > If the thing between the first '{' and the first ':' is the top identifier, then if nothing is there the top is null. This is easy to parse and (I thought) easy to understand. As EDS native serialization from PyDelphin has done this for some time, I will continue to read it in, but going forward I will not write it out. As of the latest commit, I just omit the top entirely, which is what your newline-ful variant would do if it were simply newline-less (see the last EDS of my first message). I have written, but have not yet pushed to GitHub, a change that inserts an anonymous '_' top if the top is null (if '_' is already used by some node, I try '_0', then '_1', etc. until I get an unused one). I have also made the following changes (which I think you'll be happy with): - The default serialization is now indented with newlines (and this is true of all codecs); use eds-lines to get the single-line variant - Conversion from MRS now uses predicate modification by default - Blank lines are inserted between indented EDSs (not sure if your readers actually require this) > > i hope the above makes sense to you? oe > > > On Wed, Dec 16, 2020 at 10:41 AM goodman.m.w at gmail.com > wrote: > > > > Hello developers, > > > > It's been a while but I'm returning to a discussion we were having about > serializing EDS in the native format when there is no TOP and when there's > no INDEX to backoff to. Stephan suggested that EDS is a line-based format > (i.e., line breaks are required), while I would like to continue to support > single-line EDS in PyDelphin. I think the last word on the subject from > Stephan, at least on this list, was mid-September ( > http://lists.delph-in.net/archives/developers/2020/003140.html), where he > said he'd continue discussion on another thread, which presumably meant the > thread from late August ( > http://lists.delph-in.net/archives/developers/2020/003127.html). I don't > think the discussion did continue, so I'm starting this thread in case > anyone is interested. > > > > As an example, here's an EDS (without properties) for "It rained." > > > > {e2: > > e2:_rain_v_1<3:9>[] > > } > > > > In PyDelphin, when an EDS has no TOP, I was outputting the first colon > anyway, intentionally: > > > > {: > > e2:_rain_v_1<3:9>[] > > } > > > > It's a bit ugly, but it allows me to detect, with 1 token of lookahead, > if there's a top or not. If the colon is omitted then it's not clear if > "e2:" is the top or the start of the first node. If line breaks are > required, we just assume the first line is for the top, whether or not it's > there. But for single-line EDS, we need 4 tokens of lookahead to determine > if there's a top (assuming the parser treats variables and predicates as > the same kinds of tokens): > > > > {e2: e2:_rain_v_1<3:9>[]} > > {e2:_rain_v_1<3:9>[]} > > > > Here is the parsing algorithm, once we've consumed the first '{': > > > > 1. If the 1st lookahead token is ':', '(fragmented)' (or another graph > status), '}', or '|' (node status), then we know that TOP is missing (the > ':' is for PyDelphin's current output) > > 2. 
Otherwise the 1st and 2nd tokens must be a symbol and a colon, and if > the 3rd token is a graph or node status, OR if the 4th token is ':', then > the 1st token is the TOP > > 3. Otherwise TOP must be missing > > > > I think this covers all the cases but let me know if I've missed > anything. > > > > -- > > -Michael Wayne Goodman > -- -Michael Wayne Goodman -------------- next part -------------- An HTML attachment was scrubbed... URL: From olzama at uw.edu Sat Dec 19 02:11:12 2020 From: olzama at uw.edu (Olga Zamaraeva) Date: Fri, 18 Dec 2020 17:11:12 -0800 Subject: [developers] Delph-in viz demo page Message-ID: Did something go wrong with the delph-in-viz demo page? http://delph-in.github.io/delphin-viz/demo/#input=Abrams%20knew%20that%20it%20rained.&count=5&grammar=erg2018-uw&tree=true&mrs=true I don't seem to be able to gen any analyses for anything at all. Thanks, -- Olga Zamaraeva -------------- next part -------------- An HTML attachment was scrubbed... URL: From goodman.m.w at gmail.com Sat Dec 19 02:23:20 2020 From: goodman.m.w at gmail.com (Michael Wayne Goodman) Date: Sat, 19 Dec 2020 09:23:20 +0800 Subject: [developers] Delph-in viz demo page In-Reply-To: References: Message-ID: <527c61a6-140e-424f-8053-6046f0c79b29@Spark> Same with the old Demophin site. The UW server hosting them is having some problems. The server?s disk is full, but I?m not sure if that?s it. I?ve contacted Brandon. In the meantime you can get results from the UiO server if you change the grammar, but it only has the 1214 version of the ERG. On Dec 19, 2020, 9:12 AM +0800, Olga Zamaraeva , wrote: > Did something go wrong with the delph-in-viz demo page? > > http://delph-in.github.io/delphin-viz/demo/#input=Abrams%20knew%20that%20it%20rained.&count=5&grammar=erg2018-uw&tree=true&mrs=true > > I don't seem to be able to gen any analyses for anything at all. > > Thanks, > -- > Olga Zamaraeva -------------- next part -------------- An HTML attachment was scrubbed... URL: From oe at ifi.uio.no Sat Dec 19 08:46:55 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Sat, 19 Dec 2020 08:46:55 +0100 Subject: [developers] wanted: collaboration infrastructure task force Message-ID: dear colleagues, the DELPH-IN standing committee is looking for volunteers to help with our shared collaboration infrastructure, e.g. the wiki, mailing lists, discourse forum, code repository, etc. we would like to form a task force of at least two technologically minded DELPH-IN members, to help design and implement modern infrastructure solutions. we have started to discuss modernizing our infrastructure at the 2020 summit; for background, please see: http://moin.delph-in.net/VirtualInfrastructure the most valuable service, arguably, is the DELPH-IN wiki. but the underlying MoinMoin platform is no longer sustainable. we should look into migrating all relevant content into a modern platform, for example MediaWiki. this is where we are most urgently looking for volunteers. the mailing lists are also increasingly difficult to sustain. i wonder whether we still need them? the UW discourse site appears to largely have superseded discussion on the ?developers? list, and for all i know also supports email notification. we should look into ingesting our email archives into the discourse platform. or maybe take discussion and code repositories to M$ GitHub wholesale? please consider volunteering! this need not be very time-consuming, overall, and could be fun in the right group of people. 
we will look for at least one member of the standing committee to guide and support our infrastructure transition into the 21st century. please respond to ?standing at delph-in.net? (one of our three remaining active mailing lists :-). best wishes, oe -------------- next part -------------- An HTML attachment was scrubbed... URL: From oe at ifi.uio.no Sat Dec 19 09:33:49 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Sat, 19 Dec 2020 09:33:49 +0100 Subject: [developers] WQL query language (the WSI interface query language) In-Reply-To: <301D3F23-7731-4F73-9297-DCC1F8BEAB5D@gmail.com> References: <301D3F23-7731-4F73-9297-DCC1F8BEAB5D@gmail.com> Message-ID: hi alexandre, we are not actively working on the WeSearch Infrastructure at UiO, and i am very happy for you to push this work further. my recommendation would be to not worry too much about backward compatibility in this space but maybe rather derive your own solution. on this path, i would suggest you coin different names, or at least make explicit that, say, WQL 2.0 is different from the original query language and search engine. the original SemEval description remains available through the SDP site: http://sdp.delph-in.net/2015/search.html please bear in mind that the above is for the bi-lexical SDP frameworks (CCD, DM, PAS, and PSD). in the current WSI design, there are in fact framework-specific interpretation rules for some elements of the query language. the ?+? (lemma), ?/? (pos), and ?=? (frame or sense) operators, for example, do not apply to EDS or MRS, because these node properties are not defined there. conversely, identifiers are only interpreted as typed (?h?, ?i?, ?e?, and ?x?) in MRS; here, the underlying RDF graph topology is also quite different, e.g. with typed variables as nodes in their own right. the query language hides some of the underlying differences: we use the same node identifier operator (?:?) to denote the LBL of an elementary predication; node labels match its predicate symbol; and the syntax for labeled outgoing edges queries role?argument pairs in the predication. the WQL ?^? (top) operator is straightforwardly defined for the SDP frameworks, and probably for EDS too, where there is an explicit notion of the top node(s) in these graphs. for MRS, i am actually not sure we have defined this operator; i would think it should match the variable that is the TOP element of the MRS. finally, yes, the ErgSemantics fingerprint language is the WQL dialect for MRS search. i am afraid, i believe no documentation is available for this dialect. best wishes, oe On Fri, 20 Nov 2020 at 22:43 Alexandre Rademaker wrote: > Hi Stephan, > > The WSI interface points to [1] for the documentation of the query > language. In [2] we also have some more limited documentation. > > 1. The semeval 2015 page is not working properly, images and CSS can?t be > loaded. > > 2. One particular operator not well defined is the ^ . In [1] we have > > > The following query demonstrates the use of the top operator (?^?), to > retrieve graphs rooted in a coordinate structure, i.e. where the top node > has an outgoing dependency matching the pattern ?_*_c? (again, assuming the > DM representations); here, specification of the role value can be omitted, > as there is no predication constraining the argument node: > > > > ^[_*_c] > > First the WQL should be representation independent, right? Why the comment > about DM? So in an MRS, I am assuming this ^ operator should match the TOP > predication, am I right? 
But the pattern inside the bracket should match > the TOP predicate? If so, should I also be able to use other patterns such > as lemma pattern, like ?^[+bark]?? > > I didn?t understand the fragment 'the role value can be omitted, as there > is no predication constraining the argument node?. How the role values > would be supplied? Is it talking about the roles of _*_c predicate in the > example? Why not restrict the argument of the ^ operator to an node id? If > I search for sentences where the TOP predicate has lemma bark, I could use: > > ^[x] > x:+bark > > Does it make sense? > > 3. There is no proviso for querying VarSort? For instance, find > representations where a given verb has as argument a node that is first > person singular. We can?t search for verbs in a specific tense or aspect. > > 4. The ERS fingerprints (http://moin.delph-in.net/ErgSemantics) and WQL > are very related, right? Do we have any document that describes ERS > fingerprints? > > > My idea is to reimplement the parser of WDL and the transformation to > SPARQL [3]. I would like to support MRS, DMRS and EDS initially. The > reimplementation will match the new RDF encoding for the semantic > structures that I am proposing. The RDF vocabulary is still under > construction, in particular, there are parts of the semantic structure that > are grammar dependent (for example, the VarSort) and I am still not sure > how to deal with that. > > This is my first very preliminar draft of the WQL BNF is: > > WQL := predexp > predexp := predication | predexp OP predexp | ( predexp ) | ! predexp > OP := ?|" | ? " > predication := [id ?:?] pattern [ ?[" arglist ?]? ] > arglist := argument | argument ?," arglist > argument:= rolelabel id > rolelabel := wdpattern > pattern := wdpattern | lemma_pattern | pos_pattern | sense_pattern > lemma_pattern := ?+" wdpattern > pos_pattern := ?/" wdpattern > sense_pattern := ?=" wdpattern > wdpattern := [^?* ][\w]+ > > > Ps: can I potentially implement a HPSG grammar to parse any context free > grammar like the one above, right? It would be funny to have grammars to > parse this DSL. > > > [1] https://alt.qcri.org/semeval2015/task18/index.php?id=search > [2] http://moin.delph-in.net/WeSearch/QueryLanguage > [3] https://www.w3.org/TR/sparql11-query/ > > > Best, > Alexandre > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From J.A.Carroll at sussex.ac.uk Mon Dec 28 20:00:41 2020 From: J.A.Carroll at sussex.ac.uk (John Carroll) Date: Mon, 28 Dec 2020 19:00:41 +0000 Subject: [developers] new release of LKB-FOS Message-ID: Hi all, I've made a new release of LKB-FOS. Pre-packaged binaries etc are at the usual place http://users.sussex.ac.uk/~johnca/lkb_fos.tgz and the LKB SVN fos branch is up to date. There's not a lot new on the surface: it's just a bit more zippy and less buggy. See below for details (taken from the README). John ------ * Made dialog boxes open a little less slowly on macOS, partially working around a widely complained-about graphics issue in XQuartz. * Internal improvements to use more appropriate data structures in quickcheck and generator. More thorough consistency checking when reading quickcheck paths file. * In the parser, setting the parameter *non-idiom-root* had no effect; now, if set to the name of an instance, this is checked against each parsing result to see whether *additional-root-condition* needs testing. In the generator, failure of *additional-root-condition* now only outputs a warning. 
* Reversed a poor decision in the July 2020 version to generalise passive edge top types internally to improve packing; this caused problems with GG. * Updated tsdb and swish++ binaries to March 2020 versions from the LOGON distribution; profiles with fields containing integers longer than 30 bits are now retrieved correctly. [incr tsdb()] failed to follow symbolic links to profiles - fixed. * In LUI, when attempting interactive unification, a failure when applying a type constraint was sometimes ignored - fixed. Also fixed a problem displaying a type with no features in LUI. * In View... command dialogs, grammar entities with names consisting only of digits are now found. Also fixed related bug where the initial suggestion was displayed with vertical bars around it if it started with a digit. * Reading of transfer and MRS rules now follows the revised TDL syntax specification in TdlRfc; error recovery after TDL syntax errors in these rule files is also improved. From oe at ifi.uio.no Mon Dec 28 20:04:15 2020 From: oe at ifi.uio.no (Stephan Oepen) Date: Mon, 28 Dec 2020 20:04:15 +0100 Subject: [developers] migration of collaboration infrastructure In-Reply-To: References: Message-ID: dear colleagues, > over the next two weeks, services hosted in the following domains will be migrated to a new system at the university of oslo: > > + delph-in.net [...] the bulk of the service migration is now complete, and there is both good news and bad news. i would like to invite everyone to take a critical look and let me know if you find anything missing (please recall that the WeSearch semantic query interface has been discontinued). in particular, moving the DELPH-IN wiki turned out more involved than anticipated. for the time being, the wiki is read-only, and (truth be told) i am not sure i will be able to re-enable edit functionality. on the technical side (owing to a dependency on Python 2.x), we had to convert from running in the WSGI framework to a more traditional CGI set-up, which means that URLs in the new wiki now require an extra path component, e.g. http://moin.delph-in.net/FrontPage --> http://moin.delph-in.net/wiki/FrontPage old-style URLs will be automatically rewritten by the server, so in principle the above should not cause any broken links. content-wise, as part of the migration, we holiday-cleaned some 12,538 wiki accounts and 33,481 spam pages. this was done heuristically, but i hope no genuine content has been lost. to regain a fully functional DELPH-IN wiki, i would like to urgently ask for help: i believe we should either find another site to host and maintain a fresh MoinMoin instance (preferably with WSGI support) or migrate the wiki content (and, ideally, user accounts and revision history) into a more modern platform. regarding the latter, i imagine either MediaWiki or GitHub wikis would be strong candidates (and could presumably be hosted on the public M$ GitHub service). now that we are down to a manageable number of pages and users in the DELPH-IN MoinMoin instance, i believe the migration task should not be insurmountable. i will be happy to provide an archive of all content and revision history. i sincerely hope we can find volunteers in the DELPH-IN community to pick up the ball and work toward a modern and sustainable DELPH-IN wiki? 
best wishes, oe From arademaker at gmail.com Mon Dec 28 23:56:15 2020 From: arademaker at gmail.com (Alexandre Rademaker) Date: Mon, 28 Dec 2020 19:56:15 -0300 Subject: [developers] [delph-in] migration of collaboration infrastructure In-Reply-To: References: Message-ID: <4144A6B2-C080-48C1-A3C2-DEB83C905059@gmail.com> Hi, Tomorrow, at 3PM (Brazil, 10AM Pacific Time) we (Olga and me) will have a meeting to discuss alternatives to the current DELPHI-IN wiki running in a MoinMoin instance. Anyone here is welcome to the discussion, our first step is to identify the requirements and alternatives. I am also working to try a temporary solution, that is, moving the MoinMoin to a new machine. Here is the link for the meeting https://ibm.webex.com/meet/alexrad Best, Alexandre > On 28 Dec 2020, at 16:04, Stephan Oepen wrote: > > dear colleagues, > >> over the next two weeks, services hosted in the following domains will be migrated to a new system at the university of oslo: >> >> + delph-in.net > > [...] > > the bulk of the service migration is now complete, and there is both > good news and bad news. i would like to invite everyone to take a > critical look and let me know if you find anything missing (please > recall that the WeSearch semantic query interface has been > discontinued). > > in particular, moving the DELPH-IN wiki turned out more involved than > anticipated. for the time being, the wiki is read-only, and (truth be > told) i am not sure i will be able to re-enable edit functionality. > > on the technical side (owing to a dependency on Python 2.x), we had to > convert from running in the WSGI framework to a more traditional CGI > set-up, which means that URLs in the new wiki now require an extra > path component, e.g. > > http://moin.delph-in.net/FrontPage --> http://moin.delph-in.net/wiki/FrontPage > > old-style URLs will be automatically rewritten by the server, so in > principle the above should not cause any broken links. content-wise, > as part of the migration, we holiday-cleaned some 12,538 wiki accounts > and 33,481 spam pages. this was done heuristically, but i hope no > genuine content has been lost. > > to regain a fully functional DELPH-IN wiki, i would like to urgently > ask for help: i believe we should either find another site to host and > maintain a fresh MoinMoin instance (preferably with WSGI support) or > migrate the wiki content (and, ideally, user accounts and revision > history) into a more modern platform. regarding the latter, i imagine > either MediaWiki or GitHub wikis would be strong candidates (and could > presumably be hosted on the public M$ GitHub service). > > now that we are down to a manageable number of pages and users in the > DELPH-IN MoinMoin instance, i believe the migration task should not be > insurmountable. i will be happy to provide an archive of all content > and revision history. i sincerely hope we can find volunteers in the > DELPH-IN community to pick up the ball and work toward a modern and > sustainable DELPH-IN wiki? > > best wishes, oe From goodman.m.w at gmail.com Wed Dec 30 03:19:10 2020 From: goodman.m.w at gmail.com (goodman.m.w at gmail.com) Date: Wed, 30 Dec 2020 11:19:10 +0800 Subject: [developers] Serializing EDS without a top In-Reply-To: References: Message-ID: Hello, just a brief update about EDS serialization with PyDelphin. I took out, for now, the code that creates an unlinked TOP as described in the previous message. The other changes (indentation, predicate modification, blank lines) remain. 
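For reference, the identifier-picking logic in that removed code was roughly as follows (a sketch only, not the actual implementation):

from itertools import count

# Use '_' as the anonymous top unless some node already uses that name,
# otherwise try '_0', '_1', ... until an unused identifier is found.
def anonymous_top(node_ids):
    used = set(node_ids)
    if '_' not in used:
        return '_'
    for i in count():
        candidate = '_{}'.format(i)
        if candidate not in used:
            return candidate

print(anonymous_top(['e2', 'x5']))         # -> _
print(anonymous_top(['_', '_0', 'e2']))    # -> _1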
On Fri, Dec 18, 2020 at 4:10 PM goodman.m.w at gmail.com wrote:

> Thanks for the response, Stephan,
>
> On Thu, Dec 17, 2020 at 6:39 PM Stephan Oepen wrote:
>
>> [...]
>> in a nutshell, EDS native serialization is indeed line-oriented, and i
>> am inclined to hold fast on the one-node-per-line convention. i would
>> not want to muddy these waters, since the format has been around since
>> 2002, and there has been some EDS activity beyond DELPH-IN. i know of
>> at least two EDS readers that rely on the presence of line breaks.
>
> Ok, sounds good. Then perhaps my previous message may be informative if
> the maintainer(s) of those two readers ever decide to embrace the
> convenience of single-line EDS. Other than determining the top of the
> graph, adapting the readers should be trivial: just treat \n as any other
> whitespace.
>
>> i do see the benefits of a more compact serialization, however, but
>> would recommend you call that something else (say EDSLines), if you
>> decide to implement it in pyDelphin.
>
> It's been implemented for some time now. In fact all codecs have a -lines
> variant (simplemrs -> simplemrs-lines, dmrx -> dmrx-lines, etc.). E.g., in
> the case of XML formats, it outputs each item ( or ) on a line
> and suppresses the root nodes (, ).
>
>> [...]
>> {_: e2:_rain_v_1<3:9>[] e3:_heavy_a_1<10:42>[ARG1 e2] }
>> {\n e2:_rain_v_1<3:9>[]\n e3:_heavy_a_1<10:42>[ARG1 e2]\n }
>> {: e2:_rain_v_1<3:9>[] e3:_heavy_a_1<10:42>[ARG1 e2] }
>>
>> the above order reflects what i believe would be my personal ranking
>> just now :-). i frequently use underscores for 'anonymous' MRS
>> variables, and the first variant feels maybe most natural: there
>> should be a top identifier, but in this case it is missing.
>
> The 'anonymous' node identifier for a fake top is fine and, conveniently,
> PyDelphin can already read in this variant. The difference is that '_' is a
> valid identifier in EDS, so it's not actually missing, just unlinked. I
> think logically an unlinked top is the same as a null top, but this means
> that PyDelphin may write an EDS that is different (in terms of Python data
> structures, viz., upon re-reading the serialization) from the source EDS.
>
>> The
>> second variant also would seem to maintain compatibility with the
>> native EDS serialization, only introducing an inline encoding of line
>> breaks.
>
> Inserting a literal '\' and 'n' is awkward and changes the format, and I
> don't see how it's compatible at all besides having '\' and 'n' in the same
> location as your preferred newline characters.
>
>> variant #3, on the other hand, i believe would depart from
>> how native serialization deals with missing tops; thus, if you were to
>> opt for this format, it would be even more important to maintain a
>> clear distinction between EDS native serialization and the pyDelphin
>> EDSLines format.
>
> If the thing between the first '{' and the first ':' is the top
> identifier, then if nothing is there the top is null. This is easy to parse
> and (I thought) easy to understand.
> As EDS native serialization from PyDelphin has done this for some time, I
> will continue to read it in, but going forward I will not write it out. As
> of the latest commit, I just omit the top entirely, which is what your
> newline-ful variant would do if it were simply newline-less (see the last
> EDS of my first message). I have written, but have not yet pushed to
> GitHub, a change that inserts an anonymous '_' top if the top is null (if
> '_' is already used by some node, I try '_0', then '_1', etc. until I get
> an unused one).
>
> I have also made the following changes (which I think you'll be happy
> with):
> - The default serialization is now indented with newlines (and this is
>   true of all codecs); use eds-lines to get the single-line variant
> - Conversion from MRS now uses predicate modification by default
> - Blank lines are inserted between indented EDSs (not sure if your readers
>   actually require this)
>
>> i hope the above makes sense to you? oe
>>
>> On Wed, Dec 16, 2020 at 10:41 AM goodman.m.w at gmail.com wrote:
>> >
>> > Hello developers,
>> >
>> > It's been a while but I'm returning to a discussion we were having
>> > about serializing EDS in the native format when there is no TOP and
>> > when there's no INDEX to back off to. Stephan suggested that EDS is a
>> > line-based format (i.e., line breaks are required), while I would like
>> > to continue to support single-line EDS in PyDelphin. I think the last
>> > word on the subject from Stephan, at least on this list, was
>> > mid-September
>> > (http://lists.delph-in.net/archives/developers/2020/003140.html), where
>> > he said he'd continue discussion on another thread, which presumably
>> > meant the thread from late August
>> > (http://lists.delph-in.net/archives/developers/2020/003127.html). I
>> > don't think the discussion did continue, so I'm starting this thread
>> > in case anyone is interested.
>> >
>> > As an example, here's an EDS (without properties) for "It rained."
>> >
>> > {e2:
>> >  e2:_rain_v_1<3:9>[]
>> > }
>> >
>> > In PyDelphin, when an EDS has no TOP, I was outputting the first colon
>> > anyway, intentionally:
>> >
>> > {:
>> >  e2:_rain_v_1<3:9>[]
>> > }
>> >
>> > It's a bit ugly, but it allows me to detect, with 1 token of lookahead,
>> > if there's a top or not. If the colon is omitted then it's not clear if
>> > "e2:" is the top or the start of the first node. If line breaks are
>> > required, we just assume the first line is for the top, whether or not
>> > it's there. But for single-line EDS, we need 4 tokens of lookahead to
>> > determine if there's a top (assuming the parser treats variables and
>> > predicates as the same kinds of tokens):
>> >
>> > {e2: e2:_rain_v_1<3:9>[]}
>> > {e2:_rain_v_1<3:9>[]}
>> >
>> > Here is the parsing algorithm, once we've consumed the first '{':
>> >
>> > 1. If the 1st lookahead token is ':', '(fragmented)' (or another graph
>> >    status), '}', or '|' (node status), then we know that TOP is missing
>> >    (the ':' is for PyDelphin's current output)
>> > 2. Otherwise the 1st and 2nd tokens must be a symbol and a colon, and
>> >    if the 3rd token is a graph or node status, OR if the 4th token is
>> >    ':', then the 1st token is the TOP
>> > 3. Otherwise TOP must be missing
>> >
>> > I think this covers all the cases but let me know if I've missed
>> > anything.
>> >
>> > --
>> > -Michael Wayne Goodman
>
> --
> -Michael Wayne Goodman

--
-Michael Wayne Goodman

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
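The lookahead procedure quoted above is compact enough to sketch directly.
The following is a minimal illustration of those three steps, not PyDelphin's
actual implementation; the tokenization (identifiers, ':', '[', ']', '}', and
statuses such as '(fragmented)' and '|' arriving as separate tokens) and the
status inventories are assumptions made for the example.

    # Minimal sketch of the quoted TOP-detection algorithm (steps 1-3).
    # Assumptions: the opening '{' has already been consumed, and the
    # tokenizer yields graph statuses like '(fragmented)', node statuses
    # like '|', identifiers/predicates, ':', '[', ']', and '}' as separate
    # tokens.  The status inventories below are illustrative, not exhaustive.
    GRAPH_STATUS = {'(fragmented)', '(cyclic)'}
    NODE_STATUS = {'|'}

    def detect_top(tokens):
        """Return the TOP identifier, or None if the graph has no TOP."""
        t = tokens
        # Step 1: nothing that could be a TOP identifier follows '{'.
        if t[0] in (':', '}') or t[0] in GRAPH_STATUS or t[0] in NODE_STATUS:
            return None
        # Step 2: t[0] is a symbol and t[1] is ':'; t[0] is the TOP if a
        # status follows, or if the 4th token is ':' (t[2] starts a node).
        if len(t) > 2 and (t[2] in GRAPH_STATUS or t[2] in NODE_STATUS):
            return t[0]
        if len(t) > 3 and t[3] == ':':
            return t[0]
        # Step 3: otherwise TOP is missing and t[0] begins the first node.
        return None

    # The two single-line examples from the quoted message:
    assert detect_top(['e2', ':', 'e2', ':', '_rain_v_1<3:9>', '[', ']', '}']) == 'e2'
    assert detect_top(['e2', ':', '_rain_v_1<3:9>', '[', ']', '}']) is None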