[developers] semantic representations in RDF

Alexandre Rademaker arademaker at gmail.com
Tue Jul 21 22:31:55 CEST 2020


Hi Stephan,

Thank you for your attention to this thread. I am afraid there are more differences than expected between the code running at http://wesearch.delph-in.net/deepbank/search.jsp and the code in the SVN repository http://svn.delph-in.net/wsi/trunk/, which I compiled and am running on my local machine following the steps in http://moin.delph-in.net/WeSearch/Interface.

I am attaching the two SPARQL queries produced by the same search string `x: _fi*[ARG* y]`. In both cases, the query was submitted against the EDS representations.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sparql-wesearch.txt
URL: <http://lists.delph-in.net/archives/developers/attachments/20200721/668479b6/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: sparql-local.txt
URL: <http://lists.delph-in.net/archives/developers/attachments/20200721/668479b6/attachment-0001.txt>

Note how in the local instance, the pattern `_fi*` is transformed into an enumeration of the predicates found in the dataset:

{ ?100 eds:predicate "_fight_n_1"^^xsd:string } UNION { ?100 eds:predicate "_fight_v_1"^^xsd:string }

But in the SPARQL generated by the delph-in.net server, the pattern is transformed into a regex filter:

regex(?100TEXT, "^_fi.*$")

The same happens when I submit the query `h:_fi*[ARG* x]` against the MRS representations.
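
Just to make the contrast concrete, the two query shapes look roughly like the sketch below (simplified: prefix declarations are omitted, the variable names follow the fragments above, and I am only guessing that ?100TEXT gets bound to the predicate string of ?100; the actual queries are in the attached files).

# local instance: the wildcard is expanded into an enumeration of known predicates
SELECT ?100 WHERE {
  { ?100 eds:predicate "_fight_n_1"^^xsd:string }
  UNION
  { ?100 eds:predicate "_fight_v_1"^^xsd:string }
}

# wesearch.delph-in.net: the wildcard is kept as a regular-expression filter
# (assuming ?100TEXT holds the predicate string of ?100)
SELECT ?100 WHERE {
  ?100 eds:predicate ?100TEXT .
  FILTER regex(?100TEXT, "^_fi.*$")
}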

Regarding the SVN-to-Git migration: if you agree, I can repeat the process I used for FFTB (reported in another email today) to create a Git clone of the WSI SVN repository. If Michael adds me to the delph-in GitHub organization, I can create the repository there right away.

Yes, I agree that we could have ontologies/vocabularies defined for each representation, and I could work on that. We could take the discussion at http://moin.delph-in.net/WeSearch/Rdf as a starting point, right? There are also some notes at the end of the page http://moin.delph-in.net/ErgWeSearch.

But first, I want to understand how the 2015 SDP shared-task data formats relate to the current work you are doing at http://mrp.nlpl.eu/2020/index.php and https://github.com/cfmrp/mtool. EDS is one particular format that can be described in the MRP format, right? We also have the SDP tabular format; does it make sense to support all of these formats? If you prefer, we can schedule a call to sync on the goals and possible approaches.

As for code, yes, I don't like Java either. It would be nice to take the opportunity to better understand the Lisp code embedded in the LKB, TSDB, and some other packages in the LOGON repository.

The only change I have made to the code so far is shown below. I am also using apache-jena-3.15.0, the latest version of Jena.

% svn diff
Index: src/common-gui/src/main/webapp/WEB-INF/web.xml
===================================================================
--- src/common-gui/src/main/webapp/WEB-INF/web.xml	(revision 28808)
+++ src/common-gui/src/main/webapp/WEB-INF/web.xml	(working copy)
@@ -18,7 +18,7 @@
 		<servlet-class>no.uio.ifi.wsi.gui.SearchInterface</servlet-class>
 		<init-param>
 			<param-name>DATA_PATH</param-name>
-			<param-value>/ltg/ls/aserve/indices/sdp/</param-value>
+			<param-value>/Users/ar/hpsg/text-entailment/data/</param-value>
 		</init-param>

 		<load-on-startup>1</load-on-startup>
Index: src/rdf-generator/src/main/java/no/uio/ifi/wsi/generator/CreateIndex.java
===================================================================
--- src/rdf-generator/src/main/java/no/uio/ifi/wsi/generator/CreateIndex.java	(revision 28808)
+++ src/rdf-generator/src/main/java/no/uio/ifi/wsi/generator/CreateIndex.java	(working copy)
@@ -27,8 +27,8 @@
 		CountIndexGenerator generator = new CountIndexGenerator(cmlReader.getCountDirectory());
 		generator.index(cmlReader.getRdfDirectory());
 		generator.writeCache();
-		runProcess(new String[] { "apache-jena-2.11.0/bin/tdbloader2", "--loc", cmlReader.getTdbDirectory() + "/1",
-				cmlReader.getRdfDirectory() + "/*" });
+		runProcess(new String[] { "apache-jena/bin/tdbloader2", "--loc", cmlReader.getTdbDirectory() + "1",
+				cmlReader.getRdfDirectory() + "1.nq" });
 	}

 	public static void runProcess(String[] command) throws Exception {


Best,
Alexandre



> On 21 Jul 2020, at 15:57, Stephan Oepen <oe at ifi.uio.no> wrote:
> 
> hi again, alexandre:
> 
>> By current interface I mean the one I was able to run in my local machine taking the current version of the code in:
>> 
>> http://svn.delph-in.net/wsi/trunk
> 
> i see, i had not realized you had gotten so far as to run your own WSI
> instance ... congratulations on that milestone!
> 
>> Documentation of the query language WQL in http://alt.qcri.org/semeval2015/task18/index.php?id=search is not clear about the operators vs format they support.
> 
> yes, in fact there is no complete documentation of the WQL syntax and
> of which operators are restricted to which formats.  the above page
> (the closest we come to WQL documentation, i believe) is from the SDP
> shared tasks, hence only applies to the bi-lexical frameworks (DM,
> PAS, PSD, and CCD).
> 
>> I was expecting that regex would work in the predicates of EDS or MRS. So a query `x: _fight*[ARG* y]` could match a sentence with a predicate `_fight_v_1`.
> 
> yes, that type of wildcarding should indeed be applicable to pretty
> much any query elements and graph formats.  your example query works
> on the DeepBank index for EDS:
> 
> http://wesearch.delph-in.net/deepbank/
> 
> it does not match any results when searching the DeepBank MRSs,
> however.  that is because WQL variables in an MRS index are
> (interpreted as if) typed using the standard MRS conventions, i.e.
> there is no predication whose label is of type 'x' and where there is
> some argument of type 'y'.  it works if you modify the query to comply
> with MRS types: 'h:_fight_*[ARG* x]'.
> 
>> You mentioned that you have an instance of the wsearch interface running too. Are you using the same code of the repository above? Do you know about any update/branch of this code?
> 
> i believe UW is not currently running their own WSI instance, because
> they worry that index performance inside a virtual machine might not
> scale favorably.  the improvements made by the UW MSc student are in
> the WSI trunk, so you (unlike me) are using the latest and greatest
> :-).
> 
> $ svn log http://svn.delph-in.net/wsi/trunk |head
> ------------------------------------------------------------------------
> r27878 | rpearah at uw.edu | 2019-05-25 20:30:59 +0200 (Sat, 25 May 2019) | 1 line
> 
> chore: ? Add missing dependencies to pom.xml
> ------------------------------------------------------------------------
> r27877 | rpearah at uw.edu | 2019-05-25 20:30:54 +0200 (Sat, 25 May 2019) | 1 line
> 
> style: ? Some minor style changes to MRS representation
> ------------------------------------------------------------------------
> r27804 | rpearah at uw.edu | 2019-05-15 23:08:31 +0200 (Wed, 15 May 2019) | 1 line
> 
>> 1. New code (not java based) to transform the semantic representations to RDF
>> 2. New code (not java based) to transform WQL to SPARQL.
> 
> yes, the first of these is also something i have been meaning to do
> natively in lisp, i.e. export directly to RDF, rather than export to
> those [incr tsdb()] ASCII files and then parse these in java, to
> convert to RDF.  i believe we should have turtle 'ontologies' (or
> schemas, if you will) for the various RDF representations, i.e. at
> least MRS, EDS, and DM.  i am tempted to migrate the WSI code from SVN
> to M$ GitHub, and then we could maybe collect these schemas there, and
> you could look into generating the RDF serializations without java?
> 
> as for the second, the WQL parser is fairly tightly integrated with
> the web application and RDF back-end ... here i am not as sure that
> isolating just the parser will be worthwhile?  i take it you are about
> as eager a java person as i am :-)?
> 
> best wishes, oe


