[developers] Increasing parse coverage

Sat Apr 4 23:38:31 CEST 2015

I hope people more knowledgeable than I am will comment, but:

- titles without quotes won't help.  I don't know if there's any mileage
in trying to insert the quotes using a list of movie names.

- there are many very bizarre uses of "..." - any idea where they came
from?  They don't look like normal punctuation to me.  I would be
tempted to just try removing all of them ...
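
A minimal preprocessing sketch of the two ideas above, to be run on the
text before it is passed to the parser.  The function names and the title
list are hypothetical, the title matching is a naive case-sensitive string
match, and removing a sentence-final ellipsis also drops the final
punctuation, so treat this as a starting point rather than a tested fix:

```python
import re

# Matches a run of two or more dots (optionally separated by single
# whitespace characters, e.g. ". . .") or the Unicode ellipsis character,
# together with any whitespace around it.
ELLIPSIS = re.compile(r"\s*(?:\.(?:\s?\.)+|\u2026)\s*")


def strip_ellipses(line):
    """Replace every ellipsis with a single space and trim the result."""
    return ELLIPSIS.sub(" ", line).strip()


def quote_titles(line, titles):
    """Wrap each known title in double quotes unless it is already quoted.

    `titles` is an assumed, user-supplied list of movie names; longer
    titles are tried first so that they win over shorter overlapping ones.
    """
    for title in sorted(titles, key=len, reverse=True):
        # Do not match inside a longer word or next to an existing quote.
        pattern = re.compile(r'(?<!["\w])' + re.escape(title) + r'(?!["\w])')
        line = pattern.sub(lambda m: '"' + m.group(0) + '"', line)
    return line


if __name__ == "__main__":
    example = "I liked Spirited Away ... a lot"
    print(quote_titles(strip_ellipses(example), ["Spirited Away"]))
```

Single full stops are left alone, so ordinary sentence punctuation and
abbreviations are not affected by the ellipsis pattern.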

Ann

On 2015-04-04 17:01, Guy Emerson wrote:
> I'm trying to use the ERG to produce DMRSs for a sentiment analysis
> task.  However, I'm getting relatively low coverage at the moment.
> 
> I have run ACE with a freshly downloaded pre-compiled ERG as follows:
> 
> ace -g erg.dat -1Tq filename
> ace -g erg.dat -1Tq -r "root_strict root_frag root_informal root_inffrag" filename
> 
> In the first case, I got 64.4% coverage, and in the second, 86.4%.
> 
> Are there any further tricks I could use to improve coverage?  I'm
> using the Stanford Sentiment Treebank, and I've put a
> 'sentence'-segmented version of the text here:
> 
> https://raw.githubusercontent.com/guyemerson/Sentimantics/master/data/sentibank.txt
> [1]
> 
> Many lines are noun phrases or adjective phrases.  There are also a
> lot of make-it-up-as-you-go hyphenated tokens.
> 
> Best,
> Guy.
> 
> Links:
> ------
> [1]
> https://raw.githubusercontent.com/guyemerson/Sentimantics/master/data/sentibank.txt