[developers] New ERG with improved tokenization/preprocessing for PET

Stephan Oepen oe at ifi.uio.no
Mon Apr 13 09:49:46 CEST 2009


> I think there may be an issue with TnT in that as far as I know TnT is
> not open, so some DELPH-IN members will not be able to use the
> unknown-word handling.  For example, I am fairly sure I don't have a
> local license for TnT, and then you also (probably) need a WSJ license
> for the model (which I do have).

true, TnT source is not freely available, but binaries are provided
for public download from:

  http://heartofgold.dfki.de/Download.html#TnT

these binaries are also included with the LOGON tree (see LogonTop on
the wiki).  so, as far as i understand it, no need to fax an agreement
to thorsten any longer.

it might be worth looking into truly open-source taggers at some point,
though the current TnT licensing conditions seem suitable for typical
DELPH-IN use.

> Does everything apart from the unknown-words work without TnT?

yes, i should think so.

> Let me be the first to say Wheeeeeeeee!

for more background on this (emerging) new treebank, see:

  http://wiki.delph-in.net/moin/WeScience

                                             happy easter :-)  -  oe

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
+++     CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++       --- oe at ifi.uio.no; oe at csli.stanford.edu; stephan at oepen.net ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


