Creating a database for text classification

The Internet seems the best choice because we are interested in collecting different types of data. The restriction for my test is that the positive data should contain only one-liners (short jokes). For the classification to work well, the negative data has to have the same structure (short sentences).
1 answer

RSS feeds

For the negative data, I collected article titles from BBC, Google News, and Yahoo News (trustworthy sources). For the positive data, I searched Google for "one liner", "jokes", and "one liner database", and wrote a crawler for each site because most of them are unstructured. I verified that the sources do not contain the same sentences, and after that I randomly chose 100 negative and 100 positive examples to see how good the resulting database is.
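
A minimal sketch of these steps in Python, not the exact code I used. It assumes the feeds are plain RSS 2.0 with `<item><title>` elements and that the jokes have already been scraped into a list. The function names `rss_titles`, `normalize`, and `build_dataset` are made up for illustration, and duplicates are detected only after lowercasing and collapsing whitespace, which is an assumption about what counts as "the same sentence".

```python
import random
import xml.etree.ElementTree as ET


def rss_titles(rss_xml):
    """Return the stripped text of every <item><title> in an RSS 2.0 string."""
    root = ET.fromstring(rss_xml)
    titles = []
    for item in root.iter("item"):
        title = item.findtext("title")
        if title and title.strip():
            titles.append(title.strip())
    return titles


def normalize(sentence):
    """Lowercase and collapse whitespace so near-identical lines compare equal."""
    return " ".join(sentence.lower().split())


def build_dataset(positives, negatives, n_per_class=100, seed=0):
    """Deduplicate across both classes, then sample n_per_class from each.

    Returns a shuffled list of (text, label) pairs, label 1 for one-liners
    and 0 for headlines. A sentence seen in both classes is kept only in
    the positive class, because positives are processed first.
    """
    seen = set()
    cleaned = {1: [], 0: []}
    for label, sentences in ((1, positives), (0, negatives)):
        for sentence in sentences:
            key = normalize(sentence)
            if key and key not in seen:
                seen.add(key)
                cleaned[label].append(sentence.strip())

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for label in (1, 0):
        pool = cleaned[label]
        k = min(n_per_class, len(pool))  # avoid ValueError on small pools
        sample.extend((text, label) for text in rng.sample(pool, k))
    rng.shuffle(sample)
    return sample
```

With the real data, the headlines would come from calling `rss_titles` on each downloaded feed, and `build_dataset(jokes, headlines)` would give the 100 + 100 evaluation sample.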