[developers] New ERG with improved tokenization/preprocessing for PET
Stephan Oepen
oe at ifi.uio.no
Mon Apr 13 09:49:46 CEST 2009
> I think there may be an issue with TnT in that as far as I know TnT is
> not open, so some DELPH-IN members will not be able to use the
> unknown-word handling. For example, I am fairly sure I don't have a
> local license for tnt, and then you also (probably) need a WSJ license
> for the model (which I do have).

true, TnT source is not freely available, but binaries are provided
for public download from:

  http://heartofgold.dfki.de/Download.html#TnT

these binaries are also included with the LOGON tree (see LogonTop on
the wiki). so, as far as i understand it, no need to fax an agreement
to thorsten any longer.

it might be worth looking into truly open-source taggers at some point,
though the current TnT licensing conditions seem suitable for typical
DELPH-IN use.

> Does everything apart from the unknown-words work without TnT?

yes, i should think so.

> Let me be the first to say Wheeeeeeeee!

for more background on this (emerging) new treebank, see:

  http://wiki.delph-in.net/moin/WeScience
happy easter :-) - oe
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
+++ CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++ --- oe at ifi.uio.no; oe at csli.stanford.edu; stephan at oepen.net ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++