[developers] '@' in orthography, tsdb profiles

Thu Mar 21 00:35:05 CET 2013

Hi Joshua,

The '@' character is the field separator in TSDB profiles, so a literal '@' inside a field has to be escaped.  I believe the correct escape sequence is '\s', not '\@' (I'm sure Stephan will correct me if I'm wrong).
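
As a rough illustration of what that escaping could look like in a script (a sketch only, not code from mkprof or [incr tsdb()]): the function names here are made up, and the '\\' and '\n' escapes are my assumption about the rest of the scheme; only '\s' is the one I am suggesting above.

```python
import re

FIELD_SEPARATOR = "@"

# Assumed escape table: '\s' stands for '@'; the backslash and newline
# entries are assumptions added so that the escaping is reversible.
_UNESCAPES = {"\\": "\\", "s": FIELD_SEPARATOR, "n": "\n"}


def escape_field(value: str) -> str:
    """Escape one field value so it contains no raw separator or newline."""
    # Backslashes go first, so the backslashes introduced by the later
    # replacements are not doubled again.
    return (
        value.replace("\\", "\\\\")
        .replace(FIELD_SEPARATOR, "\\s")
        .replace("\n", "\\n")
    )


def unescape_field(value: str) -> str:
    """Reverse escape_field; unknown backslash sequences are left as-is."""
    return re.sub(
        r"\\(.)",
        lambda m: _UNESCAPES.get(m.group(1), m.group(0)),
        value,
        flags=re.DOTALL,
    )


def make_record(fields):
    """Join already-ordered field values into one '@'-separated line."""
    return FIELD_SEPARATOR.join(escape_field(f) for f in fields)


def split_record(line):
    """Split an '@'-separated line back into unescaped field values."""
    # Safe to split on '@' because escaped fields never contain a raw '@'.
    return [unescape_field(f) for f in line.split(FIELD_SEPARATOR)]
```

With this, a transliterated word containing '@' goes through make_record() as an ordinary field value, and split_record() gives the original string back.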

There is a program named "mkprof", bundled with the "art" tool, that does what you are describing: it takes plain lines of text and creates a TSDB profile from them that can be used for parsing, with the escaping handled for you:

http://sweaglesw.org/linguistics/libtsdb/art.html

Good luck,
Woodley

On Mar 20, 2013, at 4:23 PM, Joshua Crowgey wrote:

> Hello developers,
> 
> I'm working on upgrading a script which takes a plain text formatted testsuite and emits [incr tsdb()] item files.
> 
> One of the testsuites used an ASCII transliteration for Old English. Specifically, schwa was represented as '@'.
> 
> I tried escaping the schwa with a backslash where it appeared in the orthography line.  Upon attempting to create an instance using this file, I got 'error processing tsdb(1) podium event'.
> 
> If I refresh all tsdb, I find the profile created but upon trying to browse items, I get 'no data in path/to/testsuite/ matching TSDB Query'.
> 
> If I replace the "\@"s with "E", everything loads without a problem.
> 
> What's the right thing to do with '@'s inside a TSDB field?
> 
> --Joshua