<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">
<META NAME="GENERATOR" CONTENT="GtkHTML/3.10.3">
</HEAD>
<BODY>
On Fri, 2006-11-10 at 12:24 +0100, Yi Zhang wrote:<BR>
<BLOCKQUOTE TYPE=CITE>
<FONT COLOR="#000000">Hi Stephan and all,</FONT><BR>
<BR>
<FONT COLOR="#000000">I do find the changes appropriate :-) thanks for the work. It is true that the forest creation is relatively inexpensive. However, Valia and I are still a little concerned about the potential efficiency loss on the German Grammar. Berthold, could you estimate how large the efficiency loss will be? Is an extra option necessary? </FONT><BR>
<BR>
</BLOCKQUOTE>
I can do a test run next week. But if restricting the number of solutions during forest creation may result in losing the optimal parse, I do not think that sounds too attractive as a performance measure. This can probably only be decided experimentally. <BR>
As I said in another mail to Stephan, attacking the discontinuity issue is probably a more appropriate way to solve most of the remaining efficiency problems with German. <BR>
<BR>
<BR>
Berthold<BR>
<BLOCKQUOTE TYPE=CITE>
<FONT COLOR="#000000">Theoretically, this might lead to a discussion of the necessity of selective (k-best) forest creation. But an extra option for forest creation would be an easy (though non-optimal) solution.</FONT><BR>
<BR>
<FONT COLOR="#000000">Another use of such an option I can think of is in coverage testing, where only the parsability of the sentence is of interest. </FONT><BR>
<FONT COLOR="#000000">In such cases, the creation of the entire parse forest does not seem necessary. </FONT><BR>
<BR>
<FONT COLOR="#000000">Stephan, Berthold and Bernd, what do you think?</FONT><BR>
<BR>
<FONT COLOR="#000000">Best,</FONT><BR>
<FONT COLOR="#000000">yi</FONT><BR>
<BR>
</BLOCKQUOTE>
<BLOCKQUOTE TYPE=CITE>
<FONT COLOR="#000000">On 11/9/06, </FONT><FONT COLOR="#000000"><B>Stephan Oepen</B></FONT><FONT COLOR="#000000"> &lt;<A HREF="mailto:oe@csli.stanford.edu">oe@csli.stanford.edu</A>&gt; wrote:</FONT><BR>
<BLOCKQUOTE>
<FONT COLOR="#000000">hi again,</FONT><BR>
<BR>
<FONT COLOR="#000000">> I also think the use of `-nsolutions' is particularly vague at the </FONT><BR>
<FONT COLOR="#000000">> moment. I believe this is partly due to the split of the parsing</FONT><BR>
<FONT COLOR="#000000">> phases. To PET developers, should the option be split for</FONT><BR>
<FONT COLOR="#000000">> particular phases of parsing?</FONT><BR>
<BR>
<FONT COLOR="#000000">i had to check the code to convince myself the above was true :-). i think </FONT><BR>
<FONT COLOR="#000000">in packing mode, `-nsolutions' should only affect the second phase, and</FONT><BR>
<FONT COLOR="#000000">we should always compute the full forest. i was so sure of this point</FONT><BR>
<FONT COLOR="#000000">of view that i just checked in the code changes to make it so. here is </FONT><BR>
<FONT COLOR="#000000">what i put into the ChangeLog:</FONT><BR>
<BR>
<FONT COLOR="#000000"> - ignore nsolutions limit in forest construction phase when packing</FONT><BR>
<FONT COLOR="#000000"> is on; the rationale here is that (a) forest construction is cheap</FONT><BR>
<FONT COLOR="#000000"> and (b) we need to have the full forest available for selective </FONT><BR>
<FONT COLOR="#000000"> unpacking to compute the correct sequence of n-best results.</FONT><BR>
<BR>
<FONT COLOR="#000000">in fact, what i say about selective unpacking here is equally true for</FONT><BR>
<FONT COLOR="#000000">the exhaustive unpacking mode (which should soon be deprecated, as it</FONT><BR>
<FONT COLOR="#000000">remains restricted to local features). while i write this, i realize</FONT><BR>
<FONT COLOR="#000000">that forest construction may be more expensive in GG, hence my change</FONT><BR>
<FONT COLOR="#000000">might cause berthold a loss in efficiency? a small price for greater</FONT><BR>
<FONT COLOR="#000000">precision, i would hope! berthold, if not, i volunteer to add another </FONT><BR>
<FONT COLOR="#000000">switch, just as zhang yi had suggested.</FONT><BR>
<BR>
<FONT COLOR="#000000">while making this change, i checked in a few more minor updates, viz:</FONT><BR>
<BR>
<FONT COLOR="#000000"> - allow selective unpacking by default when `-packing' is on, i.e. it</FONT><BR>
<FONT COLOR="#000000"> is no longer required to say `-packing=15' (but still `-nsolutions' </FONT><BR>
<FONT COLOR="#000000"> greater than 0 is needed to actually get selective unpacking);</FONT><BR>
<FONT COLOR="#000000"> - fix an error in the YY tokenizer to make it robust to tokens coming</FONT><BR>
<FONT COLOR="#000000"> in out of surface order;</FONT><BR>
<FONT COLOR="#000000"> - complete spring cleaning of identity2() along the lines of my email </FONT><BR>
<FONT COLOR="#000000"> of 31-oct (bernd, i could not test jxchg output, but i am optimistic</FONT><BR>
<FONT COLOR="#000000"> i did the right thing);</FONT><BR>
<FONT COLOR="#000000"> - make the MEM reader robust to various value formats in the global</FONT><BR>
<FONT COLOR="#000000"> parameter section;</FONT><BR>
<FONT COLOR="#000000"> - ditch the (deprecated) *maxent-grandparenting* parameter; its name </FONT><BR>
<FONT COLOR="#000000"> is canonically *feature-grandparenting*, and [incr tsdb()] will use</FONT><BR>
<FONT COLOR="#000000"> that name in generating MEM files.</FONT><BR>
<BR>
<FONT COLOR="#000000">zhang yi and bernd, i hope you will find all of the above agreeable!</FONT><BR>
<BR>
<FONT COLOR="#000000"> best - oe </FONT><BR>
<BR>
<FONT COLOR="#000000">+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++</FONT><BR>
<FONT COLOR="#000000">+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125</FONT><BR>
<FONT COLOR="#000000">+++ CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515 </FONT><BR>
<FONT COLOR="#000000">+++ --- <A HREF="mailto:oe@csli.stanford.edu">oe@csli.stanford.edu</A>; <A HREF="mailto:oe@ifi.uio.no">oe@ifi.uio.no</A>; <A HREF="mailto:stephan@oepen.net">stephan@oepen.net</A> ---</FONT><BR>
<FONT COLOR="#000000">+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ </FONT>
</BLOCKQUOTE>
</BLOCKQUOTE>
</BODY>
</HTML>