[developers] New ERG with improved tokenization/preprocessing for PET
Francis Bond
bond at ieee.org
Sat Jul 18 12:54:50 CEST 2009
G'day,
2009/7/18 Stephan Oepen <stephan.oepen at gmail.com>:
> howdy,
>
> i was expecting the topic to (also) come up naturally on wednesday morning.
> maybe we can see which specific questions we want to address related to
> paraphrasing, and which ones we leave to chart mapping and pre-processing.
> and maybe i'll even manage to summarize before then what we concluded for
> this forthcoming ERG release from the earlier discussion ...
That would be great. I was planning to talk a little bit about (i)
what we wanted to be able to do with unknown words in paraphrasing and
(ii) one possible approach. I would be happy to stop there, and
leave the full discussion for Wednesday.
> i don't suppose you noticed that generating involving unknown words to
> parsing now works (in the original paraphrase setup, i.e. /not/ EnEn)?
You mean passing something like "frodo_n_unk_rel" and hoping it would
generate? Yes, we noticed that, thank you. We rely on it in JaEn.
I also noticed, with sorrow, that this fails:
(mt::parse-interactively "The frodo barks.")
NIL
TSNLP(11): [18:48:47] translate(): read 1 MRS as generator input.
[18:48:47] translate(): processing MRS # 0 (6 EPs).
[18:48:47] translate(): error `invalid predicates: |"_frodo/NN_u_unknown_rel"|'.
[18:48:49] gc-after-hook(): {L#89 N=2.4m O=0 E=80%} [S=1009m R=822m].
So I hope to discuss it a little in paraphrasing, and then more on
Wednesday morning.
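
For reference, the two predicate names above encode the unknown word
differently. Below is a minimal sketch in Python (not part of the LKB or
PET; the function name and the regular expression are hypothetical, and
the pattern is inferred only from the single example in the error
message) that splits the rejected predicate into its surface form and
PoS tag:

```python
import re

# Pattern inferred from the one example in the log above,
# "_frodo/NN_u_unknown_rel"; the real naming scheme may allow more variation.
UNKNOWN_PRED = re.compile(r'^_(?P<form>[^/]+)/(?P<tag>[^_]+)_u_unknown_rel$')

def split_unknown_pred(pred):
    """Return (form, tag) for an unknown-word predicate, or None otherwise."""
    # Drop the surrounding double quotes that appear in the error message.
    match = UNKNOWN_PRED.match(pred.strip('"'))
    if match is None:
        return None
    return match.group('form'), match.group('tag')

print(split_unknown_pred('"_frodo/NN_u_unknown_rel"'))
print(split_unknown_pred('frodo_n_unk_rel'))
```

The second call returns None because "frodo_n_unk_rel" does not follow
the slash-and-tag shape, which is exactly the mismatch between the two
forms.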
--
Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
Division of Linguistics and Multilingual Studies
Nanyang Technological University