web technologies

Latest News


In partnership with the Forensic Linguistics Institute, we developed a corpus that supports analysis of how often words occur. To source the words, we built software that read tens of thousands of Wikinews pages and created a structured database in which users can search for the occurrence and association of words in the English language.
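To illustrate what searching for occurrence and association can mean in practice, here is a minimal Python sketch. It is not the Seren Corpus software: the function names `tokenize` and `build_index`, the two sample sentences, and the choice to treat "association" as co-occurrence within a two-word window are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(articles, window=2):
    """Count how often each word occurs and which words appear near it.

    `window` is the number of tokens on each side that count as
    "associated" with a word (an assumption for this sketch).
    """
    occurrences = Counter()
    associations = {}
    for text in articles:
        tokens = tokenize(text)
        occurrences.update(tokens)
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            # Neighbours on the left and right, excluding the word itself.
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            associations.setdefault(word, Counter()).update(neighbours)
    return occurrences, associations


# Hypothetical sample texts standing in for news articles.
articles = [
    "The court heard the evidence on Monday.",
    "The court adjourned after the evidence was read.",
]
occurrences, associations = build_index(articles)

print(occurrences["court"])                  # occurrence count for one word
print(associations["court"].most_common(3))  # words most often found nearby
```

A real corpus would store these counts in a database rather than in memory, but the same two lookups (how often a word occurs, and which words occur near it) are the core of the search.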

The Corpus will maintain a minimum of 4 million words. You can license the software and corpus for use on your website or on a private web page.

Seren Corpus


About the Corpus

In linguistics, a corpus is a large and structured set of texts used to do statistical analysis and hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory. (Source: wikipedia.org)

The Seren Corpus is a growing collection of articles taken from the English-language pages of Wikinews, with an emphasis on the latest news items so that it reflects current online use of English.

In partnership with the Forensic Linguistics Institute

License now

60-day trial, then $25 per month

Please include your name and the best way to contact you