[developers] FSPP conventions and PET configuration
Stephan Oepen
oe at ifi.uio.no
Mon Aug 18 16:26:00 CEST 2008
hi richard,
> 1. I noticed that when passing input to the LKB-based FSPP
> which contains a literal "&", this is reproduced literally
> within SMAF XML output, which, of course, makes the XML
> non-well-formed. Ann thinks that the convention is to pass
> input to FSPP in XML-escaped form, but wasn't sure. Can
> anyone confirm this?
i would call the behavior you describe a bug in FSPP (in XML mode). i
would not want to make the assumption that all input to our parsers is
XML-escaped. on the contrary, i would start from the assumption that
the parser input is `plain' text (which might include characters that
are special to XML, or any other mark-up language for that matter, say
`&', `<', and `>'). in other words, the grammar should be able to use
strings like `AT&T' or `P&P' (as stems) in the lexicon. likewise, the
current WebErsatz FSPP rule correctly matches URLs like
http://foo.bar/baz.php?foo=1&bar=2
but, using the LKB and FSPP in SMAF mode, this fails to parse. the ERG
actually has a few FSPP rules to map XML entities to plain characters,
i.e. it is anticipating `enriched' text (including some mark-up) as its
input. so even XML-escaping the input may not help in these examples,
as post-FSPP one might be back to literal ampersands. in my view, the
only solution here is to make FSPP XML-escape its output properly.
ben has been the maintainer of FSPP in recent years (and the one to add
SMAF support), so i wonder whether he has the time to address this bug?
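
A minimal sketch of the fix proposed above, escaping on output rather than requiring escaped input. This is not FSPP or LKB code: the `<token>` element is a hypothetical stand-in and not the actual SMAF schema, `apply_entity_rules` only imitates the effect of the ERG's entity-mapping rules, and `emit_token` is an illustrative name. The only real library calls are `escape` and `unescape` from Python's `xml.sax.saxutils` and the `xml.etree.ElementTree` parser.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape


def apply_entity_rules(text):
    """Stand-in for preprocessing rules that map XML entities to plain characters.

    Assumption: the real rules live in the grammar; unescape() merely
    reproduces their effect for &amp;, &lt; and &gt;.
    """
    return unescape(text)


def emit_token(surface):
    """Wrap a token in a hypothetical <token> element, escaping its content.

    escape() rewrites &, < and > so the result is well-formed XML
    whatever the surface string contains.
    """
    return "<token>%s</token>" % escape(surface)


if __name__ == "__main__":
    url = "http://foo.bar/baz.php?foo=1&bar=2"

    # Plain-text input with a literal ampersand stays parseable after escaping.
    assert ET.fromstring(emit_token(url)).text == url

    # Pre-escaped input does not help: after the entity rules have run,
    # the ampersand is literal again and still has to be escaped on output.
    plain = apply_entity_rules("AT&amp;T")
    assert plain == "AT&T"
    assert ET.fromstring(emit_token(plain)).text == "AT&T"
```

Escaping at the point where the XML is serialized keeps the lexicon and the preprocessing rules working on plain characters, which is why the sketch places `escape` in the output step only.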
> 2. I was wondering how PET interprets the "start-symbols" option,
> and the "-results" and "-nsolutions" options.
>
> The default in the ERG's "english.set" is as follows:
>
> start-symbols := $root_strict $root_frag $root_informal $root_inffrag.
>
> Is this list prioritized in any sense? In the example, are all
> results that match "root_strict" returned BEFORE "root_frag", or
> is the ordering decided strictly by the parse selection module?
no, the set of root nodes is unordered, i.e. there is no way of giving
preference to results licensed by, say, root_strict vs. root_frag; the
parse selection model alone determines the order of results here.
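
To illustrate the point, here is a small sketch in which the start symbols act purely as a membership filter and the ranking comes from a model score alone. This is not PET's implementation: the dictionary layout, the `rank_results` name, and the scores are invented for illustration, and only the four root names are taken from the setting quoted above.

```python
# The start symbols form an unordered set: membership is all that matters.
START_SYMBOLS = frozenset(
    {"root_strict", "root_frag", "root_informal", "root_inffrag"}
)


def rank_results(candidates):
    """Keep analyses licensed by some start symbol, then sort by score only.

    Assumption: each candidate is a dict with a "root" name and a numeric
    "score" standing in for the parse selection model's output.
    """
    licensed = [c for c in candidates if c["root"] in START_SYMBOLS]
    return sorted(licensed, key=lambda c: c["score"], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores, chosen so a fragment analysis outranks a strict one.
    candidates = [
        {"root": "root_strict", "score": 0.4},
        {"root": "root_frag", "score": 0.9},
        {"root": "root_unlisted", "score": 1.0},  # not a start symbol: dropped
    ]
    for result in rank_results(candidates):
        print(result["root"], result["score"])
```

In this sketch a root_frag analysis comes out ahead of a root_strict one whenever its score is higher, since the position of a symbol in the start-symbols list plays no role.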
all best - oe
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++ Universitetet i Oslo (IFI); Boks 1080 Blindern; 0316 Oslo; (+47) 2284 0125
+++ CSLI Stanford; Ventura Hall; Stanford, CA 94305; (+1 650) 723 0515
+++ --- oe at ifi.uio.no; oe at csli.stanford.edu; stephan at oepen.net ---
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++