[developers] Increasing parse coverage
aac10 at cam.ac.uk
Sat Apr 4 23:38:31 CEST 2015
I hope people more knowledgeable than I am will comment, but:
- titles without quotes won't help - I don't know if there's any mileage
in trying to insert the quotes via a list of movie names.
- there are many very bizarre uses of ... - any idea where they came
from? It doesn't look like normal punctuation use to me. I would be
tempted to just try removing all of them ...
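
To make the two suggestions above concrete, here is a rough preprocessing sketch in Python. It is not something I have run against the treebank: the function names, the regular expressions and the idea of passing in a plain list of titles are all my own assumptions, and whether either step actually raises coverage would need testing.

```python
import re

# Assumption: an "ellipsis" is a run of two or more dots, possibly
# space-separated (". . ."), or the single-character form (U+2026).
ELLIPSIS = re.compile(r"(?:\.\s*){2,}|\u2026")


def strip_ellipses(line):
    """Replace every ellipsis with a space, then tidy up whitespace."""
    line = ELLIPSIS.sub(" ", line)
    return re.sub(r"\s+", " ", line).strip()


def quote_titles(line, titles):
    """Wrap each known title in double quotes unless it is already quoted.

    `titles` is a hypothetical list of movie names supplied by the caller.
    Longer titles are tried first so that a short title does not split a
    longer one that contains it.
    """
    for title in sorted(titles, key=len, reverse=True):
        # Skip matches that touch a quote mark or sit inside a longer word.
        pattern = re.compile(r'(?<!["\w])' + re.escape(title) + r'(?!["\w])')
        line = pattern.sub('"' + title + '"', line)
    return line


if __name__ == "__main__":
    text = "I liked Spirited Away ... a lot"
    print(quote_titles(strip_ellipses(text), ["Spirited Away"]))
```

Note that a sentence-final ellipsis is dropped along with the rest, so such lines end up with no closing punctuation, and that matching titles is case-sensitive as written.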
On 2015-04-04 17:01, Guy Emerson wrote:
> I'm trying to use the ERG to produce DMRSs for a sentiment analysis
> task. However, I'm getting relatively low coverage at the moment.
> I have run ACE with a freshly downloaded pre-compiled ERG as follows:
> ace -g erg.dat -1Tq filename
> ace -g erg.dat -1Tq -r "root_strict root_frag root_informal
> root_inffrag" filename
> In the first case, I got 64.4% coverage, and in the second, 86.4%.
> Are there any further tricks I could use to improve coverage? I'm
> using the Stanford Sentiment Treebank, and I've put a
> 'sentence'-segmented version of the text here:
> Many lines are noun phrases or adjective phrases. There are also a
> lot of make-it-up-as-you-go hyphenated tokens.