[developers] Parsing linux forum data
admackin at gmail.com
Tue Nov 17 16:30:28 CET 2009
Following on from my previous post, my more general task is the
parsing of Linux forum data, as some of you may be aware.
Ann suggested I might probe the DELPH-IN hive mind for any suggestions
on dealing with this kind of data, e.g. recommendations for POS-tagging,
sentence splitting etc.
Posts are, as you might imagine, wildly varying in quality and contain
combinations of parseable text (which may contain entities such as URLs
that we would like to treat as atomic) and console output. For
example, here is a randomly selected post that shows some of these
This is my second run at this problem (history is at http://www.linuxquestions.org/questi...hreadid=3109)
After recompiling kernel 2.4 and doing all the 'make' options, I ran
I have one problem and three questions:
When I run lilo, get the following:
Warning: device 0x0306 exceeds 1024 cylinder limit.
Fatal: Sector 51220658 too large for linear mode
(try 'lba32' instead) .
1.) How do I find out what device 0x0306 is?
2.) How do I find out what is on sector 51220658 and why the system
says it is too big?
3.) I am booting from a floppy, which doesn't seem to care
about device 0x0306 or sector 51220658. Why isn't this a problem using
Thanks to the guys that helped me the first time around and many
thanks for taking another look. Hope I don't seem to be stupid or
thankless, but, as yet, I don't understand.
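
To make the two requirements concrete (atomic URLs, and console output kept apart from parseable text), here is a minimal sketch of the kind of preprocessing I have in mind. It is only an illustration under loud assumptions: the URL pattern, the list of console-line prefixes, the placeholder format and the function names mask_urls and label_lines are all hypothetical choices made for this example, not an existing tool or a tested method.

```python
import re

# Assumption: a URL is a run of non-space characters starting with
# http://, https:// or www., and never includes a closing parenthesis.
URL_RE = re.compile(r"(?:https?://|www\.)[^\s)]+")

# Assumption: lines beginning with one of these prefixes are console output.
# The prefix list is hand-picked from the sample post, not a general rule.
CONSOLE_RE = re.compile(r"^\s*(?:Warning:|Fatal:|Error:|\(try )")


def mask_urls(text):
    """Replace each URL with a placeholder token; return (masked_text, urls)."""
    urls = []

    def repl(match):
        urls.append(match.group(0))
        return "__URL%d__" % (len(urls) - 1)

    return URL_RE.sub(repl, text), urls


def label_lines(lines):
    """Tag each line as 'console' or 'prose' using the prefix heuristic."""
    return [
        ("console" if CONSOLE_RE.match(line) else "prose", line)
        for line in lines
    ]
```

The placeholder can then be handed to the tagger or parser as a single token and swapped back for the original URL afterwards. The prefix heuristic is obviously crude and would miss most console output in other posts, so it marks the problem rather than solving it.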
If anyone has any advice they could offer on the basis of experiences
they've had with similar data in the past, it would be gratefully