[developers] sentence splitting

Ann Copestake Ann.Copestake at cl.cam.ac.uk
Wed May 9 14:25:00 CEST 2007


Is anyone using an XML-aware sentence splitter? By this I mean something that
can be given XML marked-up text and parameterized to put sentence boundaries
in 'sensible' places given the markup.

e.g., with a file containing:

This sentence is partly in <IT>italics.</IT>

we would like the sentence boundary to be inserted after the </IT>, not between
the full stop and the closing tag.
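
A minimal sketch of the behaviour described above, assuming a toy regex tokenizer rather than a real XML parser (no comments, CDATA, or attributes containing '>'), a naive end-of-sentence rule (., ! or ? followed by whitespace or the end of a text run, with no abbreviation handling), and a hypothetical empty element <S/> as the boundary marker. The "parameterization" is also an assumption: closing tags listed in the hypothetical `inline_tags` set (here IT and B) are kept inside the sentence, so the boundary moves past them, while any other closing tag (e.g. a paragraph) forces the boundary just before it.

```python
import re

# Tokens are either a tag (<...>) or a run of character data.
TOKEN_RE = re.compile(r"<[^>]+>|[^<]+")
# Naive sentence end: ., ! or ? followed by whitespace or end of the text run.
END_RE = re.compile(r"[.!?](?=\s|$)")

BOUNDARY = "<S/>"  # hypothetical empty element marking a sentence boundary


def split_sentences(xml_text, inline_tags=frozenset({"IT", "B"})):
    """Insert BOUNDARY after each sentence end, moving it past closing
    tags of inline elements that immediately follow the punctuation."""
    out = []
    pending = False  # a sentence ended at the very end of the last text run
    for tok in TOKEN_RE.findall(xml_text):
        if tok.startswith("</"):
            name = tok[2:-1].strip()
            if pending and name not in inline_tags:
                # Block-level close: the sentence must end inside this element.
                out.append(BOUNDARY)
                pending = False
            # Inline close: keep it in the sentence and keep deferring.
            out.append(tok)
            continue
        if pending:
            # Anything other than a closing tag ends the deferral.
            out.append(BOUNDARY)
            pending = False
        if tok.startswith("<"):
            # Opening or empty tag: belongs to the following sentence.
            out.append(tok)
            continue
        # Character data: emit boundaries after sentence ends inside the run.
        pos = 0
        for m in END_RE.finditer(tok):
            out.append(tok[pos:m.end()])
            pos = m.end()
            if pos == len(tok):
                pending = True  # a closing tag may still follow
            else:
                out.append(BOUNDARY)
        out.append(tok[pos:])
    if pending:
        out.append(BOUNDARY)
    return "".join(out)


if __name__ == "__main__":
    print(split_sentences("This sentence is partly in <IT>italics.</IT>"))
    print(split_sentences("<P>See <IT>this.</IT> Then go.</P>"))
```

On the example above this yields "This sentence is partly in <IT>italics.</IT><S/>", and for "<P>See <IT>this.</IT> Then go.</P>" the second boundary stays inside the paragraph: "<P>See <IT>this.</IT><S/> Then go.<S/></P>". Deciding which elements count as inline is exactly the kind of per-schema setting a real tool would need to expose.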

I am aware there are lots of issues in doing this properly, but leads would be
appreciated. As usual, we're interested in solutions that are generally
available: preferably Open Source rather than proprietary software.

Ann
