[developers] mocking a POS-tagger to handle unk words

Thu Mar 22 19:42:32 CET 2018

Dear developers!

I am looking into the problem of handling unknown roots with LKB and ACE in
a situation where we want to first be able to analyze the word
morphologically (apply lexical rules).

I already sent an email about this a year ago, and Francis and I
actually sat down and went through the process of constructing a minimal
example which showed that there was a problem of some sort preventing us
from analyzing the word morphologically and using the unknown word handling
machinery at the same time.

Alas, I cannot recover any record of this. It is possible that we did that
on Francis's computer...

Anyway, I want to reconstruct this minimal example one more time, this time
hopefully understanding more and producing some actual documentation.

I would like to start by recreating what e.g. the ERG does: treating the
words as full forms and relying on a POS tag that maps each word to a
specific unknown_type.

I have a small grammar to which I added what I was able to identify as
relevant in the ERG (generic lexical entries, unknown onset, etc.). I also
added mtr.tdl and included it in the script.

The next thing I need to understand (I think) is what it actually means to
"mock the POS tagger". How do I make the system aware of that information?

I can see that the tags can be mapped to the generic lexical entries as
described in http://moin.delph-in.net/PetInput. But how do I get the tags
in the first place? Suppose I just want to consider everything the same
POS, for starters.
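
One way to picture "everything gets the same POS", as a rough sketch only:
write out the input as pre-tagged tokens instead of running a real tagger.
The sketch below assumes the YY token format shown on the PetInput page
(id, start, end, <from:to>, path, "form", ipos, "lrule", then "TAG" prob).
The function name mock_yy_tokens, the whitespace tokenization, and the
default tag "NN" are hypothetical choices for illustration. Whether a given
tag is mapped to a generic lexical entry still depends on the grammar's own
settings.

```python
import re


def mock_yy_tokens(sentence, tag="NN"):
    """Render a sentence as YY-style tokens, all carrying the same POS tag.

    Assumption: the token layout follows the YY example on the PetInput
    wiki page; this is a stand-in for a tagger, not a real one.
    """
    tokens = []
    # Split on whitespace, keeping character offsets for the <from:to> link.
    for i, match in enumerate(re.finditer(r"\S+", sentence)):
        # Escape backslashes and double quotes inside the quoted form.
        form = match.group(0).replace("\\", "\\\\").replace('"', '\\"')
        tokens.append(
            '({id}, {start}, {end}, <{cfrom}:{cto}>, 1, "{form}", 0, "null", '
            '"{tag}" 1.0)'.format(
                id=i + 1,           # token identifier, counted from 1
                start=i,            # start vertex in the token lattice
                end=i + 1,          # end vertex in the token lattice
                cfrom=match.start(),
                cto=match.end(),
                form=form,
                tag=tag,            # the same mocked tag for every token
            )
        )
    return " ".join(tokens)


if __name__ == "__main__":
    print(mock_yy_tokens("the dog barks"))
```

The resulting string would then be handed to the parser in its YY input
mode in place of the raw sentence, so that every token arrives with a tag
already attached.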

Thank you!
Olga