[developers] Parsing linux forum data
Andrew MacKinlay
admackin at gmail.com
Tue Nov 17 16:30:28 CET 2009
Hi,
Following on from my previous post, my more general task is the
parsing of Linux forum data, as some of you may be aware.
Ann suggested I might probe the DELPH-IN hive mind for any suggestions
on dealing with this kind of data, e.g. recommendations for
POS tagging, sentence splitting, etc.
Posts vary wildly in quality, as you might imagine, and mix parseable
text (which may contain entities, such as URLs, that we would like to
treat as atomic) with console output. For example, here is a randomly
selected post that shows some of these features:
=============
This is my second run at this problem (history is at http://www.linuxquestions.org/questi...hreadid=3109)
.
After recompiling kernel 2.4 and doing all the 'make' options, I ran
'lilo'. -
I have one problem and three questions:
-
PROBLEM:
When I run lilo, get the following:
-
Warning: device 0x0306 exceeds 1024 cylinder limit.
Fatal: Sector 51220658 too large for linear mode
(try 'lba32' instead) .
-
Questions:
1.) How do I find out what device 0x0306 is?
2.) How do I find out what is on sector 51220658 and why the system
says it is too big?
3.) I am booting from a floppy, which doesn't seem to care
about device 0x0306 or sector 51220658. Why isn't this a problem using
a floppy?
Thanks to the guys that helped me the first time around and many
thanks for taking another look. Hope I don't seem to be stupid or
thankless, but, as yet, I don't understand.
==============
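To make the two requirements concrete (atomic URLs, and console output
kept apart from prose), here is a rough Python sketch of the kind of
pre-processing I have in mind. Everything in it is an assumption for
illustration only: the placeholder token URLTOKEN, the function names,
and the console-output cue (lines starting with "Warning:", "Fatal:"
or "Error:") are guessed from the sample post above, not taken from
any existing tool.

```python
import re

# Hypothetical placeholder; any string the tagger treats as one token would do.
URL_TOKEN = "URLTOKEN"

# Very loose URL pattern: scheme followed by non-whitespace characters.
URL_RE = re.compile(r"https?://\S+")

# Assumed cue for console output, based only on the sample post.
CONSOLE_RE = re.compile(r"^\s*(Warning|Fatal|Error):")


def mask_urls(text):
    """Replace each URL with URL_TOKEN; return (masked_text, list_of_urls)."""
    urls = []

    def _sub(match):
        raw = match.group(0)
        # Trailing punctuation usually belongs to the sentence, not the URL.
        url = raw.rstrip(").,;:!?")
        urls.append(url)
        return URL_TOKEN + raw[len(url):]

    return URL_RE.sub(_sub, text), urls


def looks_like_console(line):
    """Heuristic: does this line start like a console message?"""
    return bool(CONSOLE_RE.match(line))


def split_post(text):
    """Separate a post into (prose_lines, console_lines)."""
    prose, console = [], []
    for line in text.splitlines():
        (console if looks_like_console(line) else prose).append(line)
    return prose, console
```

The prose lines could then be passed to a sentence splitter and POS
tagger with each URL surviving as a single token, and the original
URLs restored afterwards from the returned list. A cue this crude
misses continuation lines such as "(try 'lba32' instead)", which is
part of why I am asking for better ideas.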
If anyone has any advice they could offer on the basis of experiences
they've had with similar data in the past, it would be gratefully
received.
Thanks,
Andy