Mike Slater Web Technology

Latest News

Corpus

In partnership with the Forensic Linguistics Institute, we developed a corpus that allows analysis of the occurrence of words. To source the words, we built software that read tens of thousands of Wikinews pages and created a structured database, which lets users search for the occurrence and association of words in the English language.
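
As a rough illustration of what searching for the occurrence and association of words can involve, here is a minimal Python sketch. It is not the software described above: the function names, the sample articles, and the choice to treat two words as "associated" when they share a sentence are all assumptions made for this example.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(articles):
    """Return (occurrence counts, co-occurrence counts) for a list of article texts."""
    occurrences = Counter()
    associations = Counter()
    for article in articles:
        # Assumption: a sentence is the unit within which words are "associated".
        for sentence in re.split(r"[.!?]+", article):
            words = tokenize(sentence)
            occurrences.update(words)
            # Count each unordered pair of distinct words once per sentence.
            for pair in combinations(sorted(set(words)), 2):
                associations[pair] += 1
    return occurrences, associations


def associated_with(word, associations, top=5):
    """List the words that most often share a sentence with `word`."""
    related = Counter()
    for (a, b), n in associations.items():
        if a == word:
            related[b] += n
        elif b == word:
            related[a] += n
    return related.most_common(top)


# Hypothetical sample data standing in for harvested news articles.
sample_articles = [
    "The court heard the evidence. The court adjourned.",
    "New evidence was presented to the court.",
]
occurrences, associations = build_index(sample_articles)
```

With an index like this, `occurrences["court"]` gives the number of times a word appears, and `associated_with("court", associations)` ranks the words that most often appear alongside it. A production corpus would more likely store these counts in a database and filter out very common words such as "the".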

The corpus will be maintained at a minimum of 4 million words. You can license the software and corpus for use on your website or on a private web page.

Seren Corpus

About the Corpus

"In linguistics, a corpus is a large and structured set of texts used to do statistical analysis and hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory." (Source: wikipedia.org)

The Seren Corpus is a growing collection of articles taken from English-language Wikinews pages, with an emphasis on the latest news items so that it reflects current use of English online.

In partnership with the Forensic Linguistics Institute

License now