<div dir="ltr">Thanks very much, Paul, Woodley, and Michael. Michael, thanks especially for the detailed explanation!<div><br></div><div>I did not notice that YY mode has a field for a POS tag. I will try that then.</div><div><br></div><div>Best,</div><div>Olga</div></div><br><div class="gmail_quote"><div dir="ltr">On Thu, Mar 22, 2018 at 4:11 PM Michael Wayne Goodman &lt;<a href="mailto:goodmami@uw.edu">goodmami@uw.edu</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div><div><div>Following Woodley's suggestion, for YY-mode I can point you to a few things.<br><br></div>In Jacy, we use POS tags from an external morphological analyzer (previously Chasen; recently MeCab). We have a script that takes the output of MeCab and transforms it into the YY format. Note the definition of the pos_info variable---it holds POS data that is
slightly more complex than a simple, e.g., NNS or VBG tag.<br><br> <a href="https://github.com/delph-in/jacy/blob/develop/utils/jpn2yy" target="_blank">https://github.com/delph-in/jacy/blob/develop/utils/jpn2yy</a><br><br>Then see gle.tdl in Jacy, which maps the POS "tags" to generic lexical entries:<br><br> <a href="https://github.com/delph-in/jacy/blob/develop/gle.tdl" target="_blank">https://github.com/delph-in/jacy/blob/develop/gle.tdl</a>.<br><br>For ACE (and presumably other processors) you might also need to define paths to the token info:<br><br> <a href="https://github.com/delph-in/jacy/blob/develop/ace/config.tdl#L143-L151" target="_blank">https://github.com/delph-in/jacy/blob/develop/ace/config.tdl#L143-L151</a><br></div><br></div>When you call ACE you'll need to tell it to expect YY input. I think it's the -y option. There might be some other pieces to this that Woodley or Francis can probably fill in for you. In my experiments, YY mode did help a bit for getting parses where the standard machinery for unknowns failed.<br></div><br></div>If you're working in Python, then PyDelphin's 'tokens' module can help with constructing YY input. 
This section of the relevant unit tests might be informative:<br><br> <a href="https://github.com/delph-in/pydelphin/blob/develop/tests/tokens_test.py#L40-L59" target="_blank">https://github.com/delph-in/pydelphin/blob/develop/tests/tokens_test.py#L40-L59</a><br></div></div><div class="gmail_extra"></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Mar 22, 2018 at 3:40 PM, Woodley Packard <span dir="ltr">&lt;<a href="mailto:sweaglesw@sweaglesw.org" target="_blank">sweaglesw@sweaglesw.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="auto"><div></div><div>Hi Olga,</div><div><br></div><div>Since, from what I understand, you are interested primarily in a demonstration rather than a real-world system, why not specify the POS tags as part of the input, using YY mode?</div><span class="m_6731330848249017968HOEnZb"><font color="#888888"><div><br></div><div>Woodley</div></font></span><div><div class="m_6731330848249017968h5"><div><br>On Mar 22, 2018, at 11:42 AM, Olga Zamaraeva &lt;<a href="mailto:olzama@uw.edu" target="_blank">olzama@uw.edu</a>&gt; wrote:<br><br></div><blockquote type="cite"><div><div dir="ltr">Dear developers!<div><div><br></div><div>I am looking into the problem of handling unknown roots with LKB and ACE in a situation where we want to first be able to analyze the word morphologically (apply lexical rules). </div><div><br></div><div>I sent an email about this a year ago, and Francis and I actually sat down and went through the process of constructing a minimal example which showed that there was a problem of some sort preventing us from analyzing the word morphologically and using the unknown-word handling machinery at the same time.</div><div><br></div><div>Alas, I cannot recover any record of this. 
It is possible that we did that on Francis's computer...</div><div><br></div><div>Anyway, I want to reconstruct this minimal example one more time, this time hopefully understanding more and producing some actual documentation.</div><div><br></div><div>I would like to start by recreating what, e.g., the ERG does: treating the words as full-form, relying on a POS tag which maps the word to a specific unknown_type.</div><div><br></div><div>I have a small grammar to which I added what I was able to detect as relevant in the ERG (generic lexical entries, unknown onset, etc.). I also added mtr.tdl and included it in the script.</div><div><br></div><div>The next thing I need to understand (I think) is what it actually means to "mock the POS tagger". How do I make the system aware of that information? </div></div><div><br></div><div>I can see that the tags can be mapped to the generic lexical entries as described in <a href="http://moin.delph-in.net/PetInput" target="_blank">http://moin.delph-in.net/PetInput</a>. But how do I get the tags in the first place? Suppose I just want to consider everything the same POS, for starters. </div><div><br></div><div>Thank you!</div><div>Olga</div></div>
</div></blockquote></div></div></div></blockquote></div><br><br clear="all"><br></div><div class="gmail_extra">-- <br><div class="m_6731330848249017968gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">Michael Wayne Goodman<div>Ph.D. Candidate, UW Linguistics</div></div></div>
</div></blockquote></div>
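<div><br></div><div>As a rough illustration of the YY-mode suggestion in this thread, here is a minimal Python sketch that "mocks the POS tagger" by giving every word the same tag. It assumes the token layout described on the PetInput page, i.e. (id, start, end, path, "form", ipos, "lrule", "TAG" probability), with the optional link field left out. The helper names yy_token and yy_input are hypothetical, and the tag NN is only a placeholder: it has to match whatever tag the grammar maps to a generic lexical entry.</div>
<pre>
```python
def yy_token(token_id, start, end, form, pos_tags, path=1, ipos=0, lrule="null"):
    """Format one token as a YY-style tuple string (assumed layout, no link field)."""
    # Escape backslashes and double quotes so the form stays a valid quoted string.
    safe_form = form.replace("\\", "\\\\").replace('"', '\\"')
    fields = [
        str(token_id),
        str(start),
        str(end),
        str(path),
        '"{}"'.format(safe_form),
        str(ipos),
        '"{}"'.format(lrule),
    ]
    # POS entries are space-separated pairs of a quoted tag and its probability.
    pos = " ".join('"{}" {:.4f}'.format(tag, prob) for tag, prob in pos_tags)
    if pos:
        fields.append(pos)
    return "(" + ", ".join(fields) + ")"


def yy_input(words, tag="NN"):
    """Build a YY token sequence that assigns the same POS tag to every word."""
    return " ".join(
        yy_token(i + 1, i, i + 1, word, [(tag, 1.0)])
        for i, word in enumerate(words)
    )


if __name__ == "__main__":
    print(yy_input("the dog barks".split()))
```
</pre>
<div>The printed line is what would be passed to the parser in YY mode (for ACE, the -y option mentioned above, if that is indeed the right flag). For anything beyond a demonstration, PyDelphin's tokens module is probably a better starting point than hand-built strings.</div>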