Reuters Corpus - "Reuters is making available large quantities of Reuters News stories for use in research and development of natural-language-processing, information-retrieval or machine learning systems."
"The Corpus consists of 806,791 XML files in NewsML format. They are distributed in the form of 365 zip files, one per day, over 2 CDs. Approximately 3.7Gb is required for the storage of the uncompressed XML files."
Ian Davis has kindly expressed the region codes and the news topic codes in RDF Schema (roughly 50-100 terms in each).
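For anyone who wants to poke at the data, here is a minimal sketch of reading one of the daily zip files described in the quote, using only the Python standard library. The function names `iter_stories` and `count_stories` are made up for illustration, and the code assumes the story files inside each zip end in `.xml`; it does not rely on any particular NewsML element names, since the quote does not give them.

```python
import zipfile
import xml.etree.ElementTree as ET


def iter_stories(archive):
    """Yield (member_name, root_element) for each XML file in one daily zip.

    `archive` may be a filesystem path or a binary file-like object.
    The `.xml` suffix check is an assumption about how members are named.
    """
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".xml"):
                continue  # skip anything that is not a story file
            with zf.open(name) as fh:
                # Parse straight from the archive, without unpacking to disk.
                yield name, ET.parse(fh).getroot()


def count_stories(archive):
    """Count the XML story files in one daily zip."""
    return sum(1 for _ in iter_stories(archive))
```

Reading directly from the zip avoids needing the 3.7Gb of space for the uncompressed files; summing `count_stories` over all 365 archives would be one way to check the total against the quoted figure.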
