[developers] [lkb] Parsing rich data with LKB

Richard Bergmair rb432 at cam.ac.uk
Sun Mar 21 13:42:24 CET 2010


I wouldn't go down that road.

I would use the LKB with a PET backend for parsing.  PET now has
a "chartmapping" lattice-rewriting frontend which allows arbitrary
features to be injected into lexically instantiated feature structures
via an XML input format.
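To make the idea concrete, here is a minimal sketch of generating that kind of XML token lattice, where each input token carries extra features (timestamps, pitch accent, gesture) alongside its surface form. The element and attribute names below are illustrative only, not the exact schema PET's chart-mapping frontend expects; consult the PET documentation for the real format.

```python
# Sketch: build an XML token lattice in which each edge carries a
# feature structure with arbitrary extra features.  Element/attribute
# names are illustrative, not PET's actual input schema.
import xml.etree.ElementTree as ET

def build_lattice(tokens):
    """tokens: list of dicts, each with at least 'form', plus extras
    such as 'start', 'end', 'accent', 'gesture'."""
    root = ET.Element("lattice")
    for i, tok in enumerate(tokens):
        # one edge per token, spanning vertex i -> i+1
        edge = ET.SubElement(root, "edge",
                             source="v%d" % i, target="v%d" % (i + 1))
        fs = ET.SubElement(edge, "fs")
        for name, value in tok.items():
            feat = ET.SubElement(fs, "f", name=name.upper())
            feat.text = str(value)
    return ET.tostring(root, encoding="unicode")

xml = build_lattice([
    {"form": "hello", "start": 0.0, "end": 0.42, "accent": "H*"},
])
```

A downstream chart-mapping rule could then match on features such as ACCENT to rewrite the lattice before parsing.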

You'd have to do a fair amount of programming to get that to work.
I believe Peter Adolphs is the person you want to talk to about that.

I'm forwarding this to the developers list, on the off chance that
the relevant people may not be reading the LKB list.

Richard



On Saturday 20 March 2010 19:28:24 Glenn Slayden wrote:
> I'm certainly no expert on the LKB, but only two approaches come to
> mind. Both are hacky.
>
> 1. You could attach your metadata to lemmas as coded suffixes, and then use
> the morphological machinery of the LKB to map these into the appropriate
> feature structures. This approach would let you use a static grammar
> to parse unseen sentences, as you probably want.
>
> 2. You could programmatically generate TDL containing the feature
> structures for a particular input, and then load that as part of a
> grammar that is specific to parsing that one input.
>
> Best,
>
> Glenn
>
> -----Original Message-----
> From: lkb-bounces at emmtee.net [mailto:lkb-bounces at emmtee.net] On Behalf Of
> Katya Alahverdzhieva
> Sent: Wednesday, March 17, 2010 6:48 AM
> To: lkb at delph-in.net
> Subject: [lkb] Parsing rich data with LKB
>
> Dear LKB people,
>
> How would you go about using the LKB to parse data that is richer than
> plain text, and to define temporal constraints? How do I parse input
> that comes not as a stream of tokens, but as a list of feature
> structures?
>
> I have a corpus of transcriptions of spoken text, annotated with gesture
> and prosody information, including the times at which they were
> performed. I'm
> trying to write a grammar in LKB whose rules take into account the
> timestamps, the pitch accents and the gestures represented as sets of
> feature-values.
>
> For instance, I need to somehow capture in my grammar rules the notion
> of temporal overlap, i.e., whether a gesture is happening at the same
> time as a word/sequence of words. Also, I am trying to parse richer data
> where words are not just tokens, but whole feature structures
> (containing prosody, timestamps and gesture description).
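The temporal-overlap notion in the paragraph above reduces to a standard interval test: two timed spans (a gesture and a word or word sequence) overlap iff each starts before the other ends. A minimal sketch, assuming timestamps are (start, end) pairs in seconds:

```python
# Sketch: temporal overlap between two timed intervals,
# e.g. a gesture span and a word span from the transcription.
def overlaps(a, b):
    """a, b: (start, end) pairs in seconds, with start < end."""
    return a[0] < b[1] and b[0] < a[1]

gesture = (0.30, 0.90)
word    = (0.50, 0.70)
overlaps(gesture, word)   # -> True: the gesture spans the whole word
```

In a grammar, the same condition would be stated as a constraint relating the timestamp features of the two daughters in a rule, rather than as procedural code.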
>
> In practical terms, what approach would you recommend for parsing
> structured data like this and for comparing the timing of
> performances? Are there plugins or other software for this? Does
> anyone know of any examples that I could look at to see how it's done?
>
> Thanks in advance for any hints!
>
> Cheers
> Katya


