Seren Corpus

About the Corpus

In linguistics, a corpus is a large, structured set of texts used for statistical analysis and hypothesis testing, such as checking occurrences or validating linguistic rules within a specific language territory. (Source: wikipedia.org)

The Seren Corpus is a growing collection of articles taken from the English-language pages of Wikinews, with an emphasis on the latest news items, to reflect current online use of English.

In partnership with the Forensic Linguistics Institute