mike slater web technology

Seren Corpus


The corpus currently contains 3,196,934 words.


About the Corpus

In linguistics, a corpus is a large, structured set of texts used for statistical analysis and hypothesis testing, for example checking how often words occur or validating linguistic rules within a specific language territory.

The Seren Corpus is a growing collection of articles taken from English-language Wikinews pages, with an emphasis on the latest news items, so that it reflects current online usage of English.

Other Text Analysis Software

Since 1999, in collaboration with the late Dr John Olsson, we have developed a range of tools to help with the analysis of texts, including tools for counting word occurrences, comparing phrases across two separate texts and calculating the percentage of words two texts have in common. These tools are free to use. Please contact us for licensed versions with no limits on text size and for additional bespoke tools for textual analysis. Mike Slater, thetext.co.uk

Word Occurrence Script

Occurrence of words in a text, grouped by word length
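As a rough illustration of this kind of count, the Python sketch below groups word frequencies by word length. It is not the thetext.co.uk script; the tokenisation rule (lowercase letters and apostrophes only) and the helper names `tokenize` and `occurrences_by_length` are assumptions made for the example.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenisation: lowercase the text and keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def occurrences_by_length(text):
    """Return {word length: {word: count}}, ordered by word length (hypothetical helper)."""
    by_length = defaultdict(Counter)
    for word in tokenize(text):
        by_length[len(word)][word] += 1
    # Sort by length so short words are listed first.
    return {length: dict(counts) for length, counts in sorted(by_length.items())}


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog sat too."
    for length, counts in occurrences_by_length(sample).items():
        print(length, counts)
```

Grouping by length first and counting within each group keeps both views of the data: how many distinct words of each length appear, and how often each one is used.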


Comparing Phrases

Phrases shared by two texts, starting with six-word phrases and then five, four, three and two words
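The phrase comparison can be sketched as a search for shared word n-grams, working down from six words to two. This is a minimal illustration, not the original tool: the tokenisation rule and the names `ngrams` and `common_phrases` are assumptions, and every shared phrase is reported at each length.

```python
import re


def tokenize(text):
    # Assumed tokenisation: lowercase letters and apostrophes only.
    return re.findall(r"[a-z']+", text.lower())


def ngrams(words, n):
    """Return the set of n-word phrases (as tuples) found in a token list."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def common_phrases(text_a, text_b, longest=6, shortest=2):
    """Map phrase length -> sorted list of phrases found in both texts."""
    words_a, words_b = tokenize(text_a), tokenize(text_b)
    result = {}
    # Work from the longest phrase length down to the shortest.
    for n in range(longest, shortest - 1, -1):
        shared = ngrams(words_a, n) & ngrams(words_b, n)
        result[n] = sorted(" ".join(phrase) for phrase in shared)
    return result


if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog"
    b = "a quick brown fox jumps over a sleeping dog"
    for length, phrases in common_phrases(a, b).items():
        print(length, phrases)
```

A fuller tool might suppress shorter phrases that are already part of a longer match; this sketch keeps them so each length can be read on its own.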


Percentage of Words in Common

The number of words two texts have in common, and the number of instances of each shared word
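One way to compute this is sketched below in Python. The page does not say what the percentage is measured against, so this sketch assumes it is the share of each text's distinct words (its vocabulary) that also appear in the other text; the helper name `words_in_common` and the tokenisation rule are likewise assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenisation: lowercase letters and apostrophes only.
    return re.findall(r"[a-z']+", text.lower())


def words_in_common(text_a, text_b):
    """Return (instances, pct_a, pct_b) for the words shared by two texts.

    instances maps each shared word to (count in text_a, count in text_b);
    pct_a / pct_b are the percentage of each text's distinct words that are shared.
    """
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = sorted(counts_a.keys() & counts_b.keys())
    instances = {word: (counts_a[word], counts_b[word]) for word in shared}
    # Guard against empty texts to avoid division by zero.
    pct_a = 100 * len(shared) / len(counts_a) if counts_a else 0.0
    pct_b = 100 * len(shared) / len(counts_b) if counts_b else 0.0
    return instances, pct_a, pct_b


if __name__ == "__main__":
    instances, pct_a, pct_b = words_in_common(
        "the cat sat on the mat", "the dog sat on the log by the door"
    )
    print(instances)
    print(f"{pct_a:.1f}% of text A, {pct_b:.1f}% of text B")
```

Reporting a separate percentage for each text matters when the texts differ in length: the same shared vocabulary can be a large share of a short text and a small share of a long one.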