[developers] ICONS and generation

Sun Feb 7 11:12:27 CET 2016

Thanks! and thanks all!  I've come to a view on this which I think is 
consistent with what everyone has been saying.

First of all, note that in the MRS syntax, we do not distinguish between 
terminated and non-terminated lists/bags.  If we think about it from the 
perspective of typed feature structures, it is clear that there is a 
distinction - for instance a type `list' is the most general type of 
list, and the type `e-list' (empty list) is usually a maximally specific 
type.    Coming back to the notation I used in an earlier message, there 
is a distinction between { ... } (analogous to list in a TFS) and {} (cf 
e-list).

Now, there are two possible interpretations of ICONS as it arises from a 
DELPH-IN grammar (i.e., as it is output after parsing):
1. information structure
2. information structure as it arises from morphosyntax

In the `normal' sentences of the `Kim chased the dog' type, no 
information structure elements arise from morphosyntax.  We can, 
however, expect that various contexts (e.g., discourse) give rise to 
information structure in association with such a sentence.  Hence, with 
respect to interpretation 1, ICONS is not strictly empty but 
underspecified (and similarly a one element ICONS may be underspecified 
with respect to a two-element ICONS and so on).  I think this is 
consistent with what Sanghoun and Emily are saying. Under this 
interpretation, an MRS with no specification of ICONS should indeed 
generate all the variant sentences we've been discussing.  And so on.

However, with respect to interpretation 2, the ICONS emerging from the 
parse of such a sentence is terminated.  Once we've finished parsing, 
we're guaranteeing no more ICONS elements will arise from morphosyntax, 
whatever someone does in discourse.  Under this interpretation, if I say 
I want to generate a sentence with an empty ICONS, I mean I want to 
generate a sentence with no ICONS contribution from morphosyntax.  This 
is also a legitimate use of the realiser, considered as a stand-alone 
module.
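
To make the two readings concrete, here is a minimal Python sketch of how an open ICONS { ... } and a terminated ICONS {} would filter realizations differently. It is an illustration only, not the behaviour of the LKB, ACE or any other DELPH-IN tool: the names `Icons` and `admits`, the triple encoding, and the variable names are all assumptions, and the info-str type hierarchy is deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Icons:
    """Hypothetical ICONS bag: `closed` distinguishes {} from { ... }."""
    elements: frozenset = frozenset()  # (relation, clause, target) triples
    closed: bool = False               # True = terminated, cf. e-list


def admits(spec: Icons, realized: frozenset) -> bool:
    """Return True if a realization with ICONS `realized` satisfies `spec`.

    Simplification: elements match by identity only; the info-str type
    hierarchy (e.g. focus-or-topic vs. focus) is ignored here.
    """
    if not spec.elements <= realized:
        return False  # something requested in the input is missing
    if spec.closed:
        return realized == spec.elements  # no extra elements allowed
    return True  # open bag: extra elements are fine


# Invented ICONS values for two realizations (variable names are made up).
plain = frozenset()                                    # The cat chased that dog.
fronted = frozenset({("focus-or-topic", "e2", "x4")})  # That dog, the cat chased.

open_spec = Icons()               # ICONS { ... }
closed_spec = Icons(closed=True)  # ICONS {}
```

Under these assumptions `open_spec` admits both realizations (interpretation 1), while `closed_spec` admits only the plain one (interpretation 2).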

Since ICONS is something which I have always thought of as on the 
boundary of morphosyntax and discourse, I want to be able to enrich 
ICONS emerging from parsing with discourse processing, so interpretation 
1 makes complete sense.  However, I believe it is also perfectly 
legitimate to be able to divide the world into what the grammar can be 
expected to do and what it can't, and that is consistent with 
interpretation 2.

As a hypothetical move, consider an additional classification of ICONS 
elements according to whether or not they arise from morphosyntax.  Then 
we can see that a single ICONS value could encompass both 
interpretations.  i.e., what would arise from a parse would be a 
terminated list of morphosyntactic-ICONS elements but the ICONS as a 
whole could be non-terminated.
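
As a rough sketch of that hypothetical classification, the Python below tags each ICONS element with whether it is grammar-derived, so that the morphosyntactic part can be treated as terminated while the bag as a whole stays open to discourse enrichment. The class `IconsElement`, the flag `from_grammar` and the function `compatible` are invented for illustration and do not correspond to any existing notation or implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IconsElement:
    relation: str        # e.g. "focus"; matched by identity in this sketch
    clause: str          # made-up variable name for the clause
    target: str          # made-up variable name for the constituent
    from_grammar: bool   # True if contributed by morphosyntax


def compatible(input_icons, candidate_icons, grammar_closed=True):
    """Check a candidate ICONS against an input ICONS.

    Every input element must be present in the candidate. If
    `grammar_closed` is set, the candidate may not add grammar-derived
    elements, but discourse-derived additions are still allowed.
    """
    inp, cand = set(input_icons), set(candidate_icons)
    if not inp <= cand:
        return False
    extra = cand - inp
    if grammar_closed and any(e.from_grammar for e in extra):
        return False
    return True


# Invented examples: the parse of a plain sentence contributes nothing.
parsed_plain = []
discourse_enriched = [IconsElement("focus", "e2", "x4", from_grammar=False)]
fronted = [IconsElement("focus-or-topic", "e2", "x4", from_grammar=True)]
```

Here `compatible(parsed_plain, discourse_enriched)` holds, whereas `compatible(parsed_plain, fronted)` fails unless `grammar_closed=False` is passed.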

I think there may be reasons to be able to distinguish ICONS elements 
according to whether they are intended as grammar-derived or not, though 
I do see this might look messy.  But anyway, I want to first check that 
everyone agrees with this analysis of the situation before trying to 
work out what we might do about it in terms of notation.

Incidentally - re Dan's message - my overly brief comment about 
Sanghoun's use of DMRS earlier was intended to point out that if DMRS 
had the necessary links for demonstrating ICONS, then in principle this 
was something we know how to extract.  But right now, I'm not clear 
whether or not we do need all the underspecified elements, and that's 
something I would like Dan to comment on before we go further.

All best,

Ann

On 06/02/2016 17:59, Sanghoun Song wrote:
> My apologies for my really late reply!
>
> I am not sure whether I fully understand your discussion, but I would 
> like to share several of my ideas on using ICONS for generation.
>
> First, in my analysis (final version), only expressions that 
> contribute to information structure introduce an ICONS element into 
> the list. For example, the following unmarked sentence (a below) has 
> no ICONS element (i.e. empty ICONS).
>
> a. plain: Kim chases the dog.
> b. passivization: The dog is chased by Kim.
> c. fronting: The dog Kim chases.
> d. clefting: It is the dog that Kim chases.
>
> Using the type hierarchy for information structure in my thesis, I 
> can say the following:
>
> (i) The subject Kim and the object the dog in a plain active sentence 
> (a) are in situ. They may or may not be focused depending on which 
> constituent bears a specific accent, but in the sentence-based 
> processing their information structure values had better remain 
> underspecified for flexible representation.
>
> (ii) The promoted argument the dog in the passive sentence (b) is 
> evaluated as conveying focus-or-topic, while the demoted argument Kim 
> is associated with non-topic.
>
> (iii) In (c), the fronted object the dog is assumed to be assigned 
> focus-or-topic in that the sentence conveys a meaning of either "As 
> for the dog, Kim chases it" or (d), while the subject in situ is 
> evaluated as containing neither topic nor focus (i.e. background, bg). 
> (Background may not be implemented in the ERG, I think.)
>
> (iv) The focused NP in (d) carries focus, and the subject of the cleft 
> clause, Kim, is also associated with bg.
>
> Thus, we can create a focus specification hierarchy amongst (a-d) as 
> [clefting > fronting > passivization > plain].
>
> What I want to say is that a set of sentences which share some 
> properties may have subtle shades of meaning depending on how focus is 
> assigned to the sentences. Paraphrasing is made only in the direction 
> from the right to the left of [clefting > fronting > passivization > 
> plain], because paraphrasing in the opposite direction necessarily 
> causes loss of information. For example, a plain sentence such as (a) 
> can be paraphrased into a cleft construction such as (d), but not vice 
> versa.
>
> In a nutshell, a more specific sentence had better not be 
> paraphrased into a less specific sentence in terms of information 
> structure.
>
> Second, I provided many dependency graphs in my thesis. The main 
> reason was that nobody outside of DELPH-IN can fully understand 
> the complex co-indexation in ICONS/MRS. At that time, I didn't work on 
> DMRS with respect to ICONS. If there is a way to represent ICONS in 
> DMRS (direct from TFS or via MRS), I am interested in the formalism.
>
>
> Sanghoun
>
>
> On Sat, Feb 6, 2016 at 1:26 AM, Ann Copestake <aac10 at cam.ac.uk> wrote:
>
>     Briefly (more this evening maybe) - I don't see a particular
>     problem with filling in the ICONS since what you describe are
>     relationships that are overt in the *MRS anyway, aren't they?  I
>     thought, in fact, that these are pretty clear from the DMRS graph
>     - which is why Sanghoun uses it to describe what's going on.
>
>     I believe we can build the DMRS graph direct from the TFS,
>     incidentally - don't need to go via MRS ...
>
>     Cheers,
>
>     Ann
>
>
>     On 05/02/2016 23:40, Dan Flickinger wrote:
>>
>>     As I understand the Song and Bender account, an MRS for a
>>     sentence should include in the ICONS list at least one element
>>     for each individual (eventuality or instance) that is introduced.
>>     In the ERG this would mean that the value of each ARG0 should
>>     appear in at least one ICONS entry, where most of these would be
>>     of the maximally underspecified type `info-str', but possibly
>>     specialized because of syntactic structure or stress/accent or
>>     maybe even discourse structure.
>>
>>
>>     I see the virtue of having these overt ICONS elements even when
>>     of type `info-str', to enable the fine-grained control that
>>     Stephan notes that we want for generation, and also to minimize
>>     the differences between the ERG and grammars being built from the
>>     Matrix which embody Sanghoun's careful work.
>>
>>
>>     If the grammarian is to get away with not explicitly introducing
>>     each of these ICONS elements in the lexical entries, as Sanghoun
>>     does in the Matrix, then it would have to be possible to predict
>>     and perhaps mechanically add the missing ones after composition
>>     was completed.  I used to hope that this would be possible, but
>>     now I'm doubtful, leading me to think that there is no good
>>     alternative to the complication (maybe I should more kindly use
>>     the term `enrichment') of the grammar with the overt introduction
>>     of these guys everywhere.  Here's my reasoning:
>>
>>
>>     I assume that what we'll want in an MRS for an ordinary sentence
>>     is an ICONS list that has exactly one entry for each pair of an
>>     individual `i' and the eventuality which is the ARG0 of each
>>     predication in which `i' appears as an argument.  Thus for `the
>>     cat persuaded the dog to bark' the ICONS list should have four
>>     elements: one for cat/persuade, one for dog/persuade, one for
>>     bark/persuade, and one for dog/bark.  Now if I wanted to have the
>>     grammar continue to only insert ICONS elements during composition
>>     for the non-vanilla info-str phenomena, and fill in the rest
>>     afterward, I would have to know not only the arity of each
>>     eventuality-predication, but which of its arguments was realized
>>     in the sentence, and even worse, which of the realized syntactic
>>     arguments corresponded to semantic arguments (so for example not
>>     the direct object of `believe'). Maybe I give up too soon here,
>>     but this does not seem doable just operating on the MRS resulting
>>     from composition, even with access to the SEM-I.
>>
>>
>>     So if the necessary ICONS elements have to be introduced overtly
>>     by the lexicon/grammar during composition, then I would still
>>     like to explore a middle ground that does not result in the full
>>     set of ICONS elements Song and Bender propose for a sentence.
>>     That is, I wondered whether we could make do with adding to the
>>     ERG the necessary introduction of just those ICONS elements that
>>     would enable us to draw the distinctions between `unmarked',
>>     `topic', and `focus' that we were used to exploiting in the days
>>     of messages.   But since pretty much any preposition's or
>>     adjective's or verb's complement can be extracted, and any verb's
>>     subject can be extracted, and most verbs' direct and indirect
>>     objects can be passivized, I think we'll still end up with an
>>     ICONS entry for each eventuality/argument pair for every
>>     predication-introducing verb, adjective, and preposition in a
>>     sentence, and maybe also for some nouns as in "who is that
>>     picture of?". This still lets us exclude ICONS elements involving
>>     adverbs and maybe also the arguments of conjunctions,
>>     subordinators, modals.  If we went this route, I think it would
>>     be possible to make modest additions to certain of the
>>     constructions, and not have to meddle with lexical types, to get
>>     these ICONS elements into the MRS during composition.
>>
>>
>>     Such a partial approach does not have the purity of Song and
>>     Bender's account, but might be more practical, at least as a
>>     first step, for the ERG.  It would at least enable what I think
>>     is a more consistent interpretation of the ICONS elements for
>>     generation, and should give us the fine-grained control I agree
>>     that we want.  Thus to get the generator to produce all variants
>>     from an MRS produced by parsing a simple declarative, one would
>>     have to remove the info-str ICONS element whose presence excludes
>>     the specialization to focus or topic because of our friend Skolem.
>>
>>
>>     Counsel?
>>
>>
>>      Dan
>>
>>
>>     ------------------------------------------------------------------------
>>     *From:* developers-bounces at emmtee.net on behalf of Ann Copestake
>>     <aac10 at cam.ac.uk>
>>     *Sent:* Friday, February 5, 2016 1:43 PM
>>     *To:* Emily M. Bender; Stephan Oepen
>>     *Cc:* developers; Ann Copestake
>>     *Subject:* Re: [developers] ICONS and generation
>>     Thanks!
>>
>>     On 05/02/2016 21:30, Emily M. Bender wrote:
>>>     Not sure if this answers the question, but a couple of comments:
>>>
>>>     (a) I do think that written English is largely underspecified
>>>     for information structure.
>>>     It's part of what makes good writing good that the information
>>>     structure is made apparent
>>>     somehow.
>>>
>>
>>     OK.  should I understand you as saying that composition (as in,
>>     what we do in the grammars) leaves it mostly underspecified, but
>>     that discourse level factors make it apparent?  or that it really
>>     is underspecified?
>>
>>>     (b) I think the "I want only the unmarked form back" case might
>>>     be handled by either
>>>     a setting which says "no ICONS beyond what is in the input"
>>>     (i.e. your ICONS { }) or
>>>     a pre-processing/generation fix-up rule that takes ICONS { ... }
>>>     and outputs something
>>>     that would be incompatible with anything but the unmarked form. 
>>>     Or maybe the
>>>     subsumption check goes the wrong way for this one?
>>>
>>     Yes, I think the ICONS {} might be a possible way of thinking
>>     about it.  I should make it clear - I don't think there's a
>>     problem with constructing an implementation that produces the
>>     `right' behaviour but I would much prefer that the behaviour is
>>     specifiable cleanly in the formalism rather than as another
>>     parameter to the generator or whatever.
>>
>>>     I hope Sanghoun has something to add here!
>>>
>>>     Emily
>>>
>>>     On Fri, Feb 5, 2016 at 1:01 PM, Stephan Oepen <oe at ifi.uio.no> wrote:
>>>
>>>         colleagues,
>>>
>>>         my ideal would be a set-up where the provider of generator
>>>         inputs has three options: (a) request topicalization (or
>>>         similar), (b) disallow it, or (c) underspecify and get both
>>>         variants.
>>>
>>>         we used to have that level of control (and flexibility) in
>>>         the LOGON days where there were still messages: in the
>>>         message EPs, there were two optional ‘pseudo’ roles (TPC and
>>>         PSV) to control topicalization or passivization of a
>>>         specific instance variable.  effectively, when
>>>         present, these established a binary relation between the
>>>         clause and one of its nominal constituents.  if i recall
>>>         correctly, blocking topicalization was accomplished by
>>>         putting an otherwise unbound ‘anti’-variable into the TPC or
>>>         PSV roles.
>>>
>>>         could one imagine something similar in the ICONS realm, and
>>>         if so, which form would it have to take?
>>>
>>>         best wishes, oe
>>>
>>>
>>>         On Friday, February 5, 2016, Woodley Packard
>>>         <sweaglesw at sweaglesw.org> wrote:
>>>
>>>             I can confirm that under ACE, behavior is what you
>>>             indicate, i.e. generating from parsing the topicalized
>>>             feline-canine-playtime I get just the topicalized
>>>             variant out, but when generating from parsing the
>>>             ordinary word order I get all 5 variants out.
>>>
>>>             I believe this was designed to imitate the long-standing
>>>             condition that the MRS of generation results must be
>>>             subsumed by the input MRS. The observed behavior seems
>>>             to me to be the correct interpretation of the
>>>             subsumption relation with ICONS involved.  Note that an
>>>             MRS with an extra intersective modifier would also be
>>>             subsumed, for example, but such MRS are never actually
>>>             generated since those modifier lexical entries never
>>>             make it into the chart.
>>>
>>>             It’s certainly reasonable to ask whether (this notion
>>>             of) subsumption is really the right test.  I’ve met lots
>>>             of folks who prefer to turn that subsumption test off
>>>             entirely.  I guess it’s also possible that the
>>>             subsumption test is right for the RELS portion of the
>>>             MRS but not for the ICONS, though that seems a bit odd
>>>             to consider.  However, given that we don’t have many
>>>             ideas about truth-conditional implications of ICONS,
>>>             maybe not so odd.
>>>
>>>             I don’t really have much to offer in terms of opinions
>>>             about what the right behavior should be.  I (believe I)
>>>             just implemented what others asked for a couple years
>>>             ago :-)
>>>
>>>             -Woodley
>>>
>>>             > On Feb 5, 2016, at 8:03 AM, Ann Copestake
>>>             <aac10 at cl.cam.ac.uk> wrote:
>>>             >
>>>             > I'm part way through getting ICONS support working in
>>>             Lisp, testing on the version of the ERG available as
>>>             trunk. I have a question about generation. If I
>>>             implemented the behaviour described in
>>>             http://moin.delph-in.net/IconsSpecs there doesn't seem
>>>             to be a way of specifying that I want a `normal'
>>>             ordering for English.
>>>             >
>>>             > e.g., if I take the MRS resulting from
>>>             >
>>>             > that dog, the cat chased.
>>>             >
>>>             > without ICONS check, there are 5 realizations,
>>>             including the `null ICONS' case `The cat chased that
>>>             dog.'  With an exact ICONS check, I can select
>>>             realizations with the same ICONS (modulo order of ICONS
>>>             elements, of course, in the case where there's more than
>>>             one element).  But with the
>>>             http://moin.delph-in.net/IconsSpecs
>>>             behaviour, there's no way of specifying I want a
>>>             `normal' order - if I don't give an ICONS, I will always
>>>             get the 5 realisations. In fact, as I understand it, I
>>>             can always end up with more icons in the realisation
>>>             than in the input, as long as I can match the ones in
>>>             the input.
>>>             >
>>>             > So:
>>>             > - is the IconsSpecs behaviour what is desired for the
>>>             ERG (e.g., because one can rely on the realisation
>>>             ranking to prefer the most `normal' order)?
>>>             > - or does the ERG behave differently from Emily and
>>>             Sanghoun's grammars, such that different generator
>>>             behaviour is desirable? and if so, could we change
>>>             things so we don't need different behaviours?
>>>             >
>>>             > Ann
>>>             >
>>>             >
>>>             >
>>>
>>>
>>>
>>>
>>>
>>>     -- 
>>>     Emily M. Bender
>>>     Professor, Department of Linguistics
>>>     Check out CLMS on facebook!
>>>     http://www.facebook.com/uwclma
>>
>
>
>
>
> -- 
> =================================
> Sanghoun Song
> Assistant Professor
> Dept. of English Language and Literature
> Incheon National University
> http://corpus.mireene.com
> phone: +82-32-835-8129 (office)
> =================================
