[developers] mocking a POS-tagger to handle unk words
sweaglesw at sweaglesw.org
Thu Mar 22 23:40:49 CET 2018
Since, as I understand it, you are primarily interested in a demonstration rather than a real-world system, why not specify the POS tags as part of the input, using YY mode?
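To make that concrete, here is a minimal sketch of producing YY-mode input in which every token carries the same POS tag. It assumes the token layout shown on the PetInput wiki page, i.e. (id, start, end, <from:to>, path, "form", ipos, "lrule", "tag" prob), and simple whitespace tokenization. The function name yy_tokens and the default tag "NN" are made up for illustration; use whatever tag your grammar maps to its generic lexical entries.

```python
def yy_tokens(sentence, pos="NN", prob=1.0):
    """Render a whitespace-tokenized sentence as a YY-mode token string.

    Every token gets the same POS tag, which stands in for a real tagger.
    The field order follows the example on the PetInput wiki page
    (an assumption here; check it against your parser's documentation).
    """
    tokens = []
    offset = 0
    for i, form in enumerate(sentence.split()):
        # Character span of this token in the original string.
        start = sentence.index(form, offset)
        end = start + len(form)
        offset = end
        # Escape backslashes and double quotes inside the quoted form.
        escaped = form.replace("\\", "\\\\").replace('"', '\\"')
        tokens.append(
            f'({i + 1}, {i}, {i + 1}, <{start}:{end}>, 1, '
            f'"{escaped}", 0, "null", "{pos}" {prob:.4f})'
        )
    return " ".join(tokens)


if __name__ == "__main__":
    print(yy_tokens("dogs bark"))
```

The resulting string is what you would feed to the parser in place of the raw sentence. As far as I recall, ACE accepts this kind of input when started with its YY-mode option (-y), but please verify that against the ACE documentation for your version.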
> On Mar 22, 2018, at 11:42 AM, Olga Zamaraeva <olzama at uw.edu> wrote:
> Dear developers!
> I am looking into the problem of handling unknown roots with LKB and ACE in a situation where we want to first be able to analyze the word morphologically (apply lexical rules).
> I sent an email about this a year ago, and Francis and I actually sat down and constructed a minimal example, which showed that some problem was preventing us from analyzing the word morphologically and using the unknown word handling machinery at the same time.
> Alas, I cannot recover any record of this. It is possible that we did that on Francis's computer,...
> Anyway, I want to reconstruct this minimal example one more time, this time hopefully understanding more and producing some actual documentation.
> I would like to start by recreating what e.g. the ERG does: treating the words as full forms and relying on a POS tag which maps the word to a specific unknown_type.
> I have a small grammar to which I added what I was able to identify as relevant in the ERG (generic lexical entries, unknown onset, etc.). I also added mtr.tdl and included it in the script.
> The next thing I need to understand (I think) is what it actually means to "mock the POS tagger". How do I make the system aware of that information?
> I can see that the tags can be mapped to the generic lexical entries as described in http://moin.delph-in.net/PetInput. But how do I get the tags in the first place? Suppose I just want to treat everything as the same POS, for starters.
> Thank you!