In partnership with the Forensic Linguistics Institute, we developed a corpus that supports analysis of the occurrence of words. To source the words, we built software that read tens of thousands of Wikinews pages and created a structured database in which users can search for the occurrence and association of words in the English language.
The Corpus will maintain a minimum of 4 million words. You can license the software and corpus for use on your website or on a private web page.
About the Corpus
In linguistics, a corpus is a large and structured set of texts used to do statistical analysis and hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory. (Source: wikipedia.org)
The Seren Corpus is a growing collection of articles taken from the English-language Wikinews pages, with an emphasis on the latest news items so that it reflects current online use of English.
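As a rough illustration of what searching for the occurrence and association of words can involve, the Python sketch below counts word frequencies and nearby words in a small in-memory sample. It is not the Seren software: the function names (`tokenize`, `occurrences`, `associations`), the `window` parameter, and the sample sentences are all assumptions made for this example, and a real corpus would use a database rather than a list of strings.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def occurrences(documents):
    """Count how often each word appears across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


def associations(documents, target, window=2):
    """Count the words found within `window` tokens of `target`."""
    target = target.lower()
    neighbours = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Add every token in the window except the target position itself.
            neighbours.update(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return neighbours


# Hypothetical sample text standing in for news articles.
sample_articles = [
    "The court heard new evidence today.",
    "New evidence was presented to the court.",
]

word_counts = occurrences(sample_articles)
evidence_neighbours = associations(sample_articles, "evidence", window=1)
```

Here `word_counts["court"]` gives the number of times "court" occurs in the sample, and `evidence_neighbours.most_common()` lists the words that appear directly beside "evidence", most frequent first. Widening `window` trades precision for recall when looking for associated words.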