MikeSlater *Seren

Seren Corpus

Searching 3,196,934 words

About the Corpus

In linguistics, a corpus is a large, structured set of texts used for statistical analysis and hypothesis testing, such as checking occurrences or validating linguistic rules within a specific language territory.

The Seren Corpus is a growing collection of articles taken from the English-language pages of Wikinews, with an emphasis on the latest news items, to reflect how English is currently used online.

Other Text Analysis Software

Since 1999, in collaboration with the late Dr John Olsson, we developed a range of tools to help in the analysis of texts, including tools for counting word occurrences, for comparing phrases in two separate texts, and for measuring the percentage of words that texts have in common. These are free for you to use. Please contact us to license versions with no limits on text size, or for additional bespoke tools for textual analysis. Mike Slater thetext.co.uk

Word Occurrence Script

Occurrence of words in a text, based on word length
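
As an illustration only, and not the Word Occurrence Script itself, the idea of counting words and grouping them by length might be sketched as follows. The function name `words_by_length` and the tokenising rule (lowercase letters, with an optional internal apostrophe) are assumptions made for this example.

```python
import re
from collections import Counter, defaultdict


def words_by_length(text):
    """Count each word, then group the counts by word length.

    Hypothetical helper: the tokenising rule below is an assumption,
    not the rule used by the original script.
    """
    # Lowercase the text and keep runs of letters, allowing one apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(words)

    grouped = defaultdict(dict)
    for word, n in counts.items():
        grouped[len(word)][word] = n
    return dict(grouped)


sample = "The cat sat on the mat"
result = words_by_length(sample)

# Print the groups from the shortest words to the longest.
for length in sorted(result):
    print(length, result[length])
```

Grouping by length first makes it easy to see, for example, how heavily a text relies on short function words compared with longer content words.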

Comparing Phrases

Compares phrases six words long between two texts, then five, four, three and two words long
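
A minimal sketch of one way such a comparison could work, assuming that "comparing" means finding word sequences that appear in both texts. This is not the original tool: `tokenize`, `ngrams` and `shared_phrases` are hypothetical names, and the tokenising rule is an assumption.

```python
import re


def tokenize(text):
    # Assumed rule: lowercase words made of letters, with an optional apostrophe.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ngrams(words, n):
    # Every run of n consecutive words, stored as a set of tuples.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_phrases(text_a, text_b, longest=6, shortest=2):
    """Return the phrases found in both texts, from longest to shortest."""
    a, b = tokenize(text_a), tokenize(text_b)
    return {
        n: sorted(" ".join(p) for p in ngrams(a, n) & ngrams(b, n))
        for n in range(longest, shortest - 1, -1)
    }


text_a = "the quick brown fox jumps over the lazy dog"
text_b = "a quick brown fox jumps high"
shared = shared_phrases(text_a, text_b)

for n, phrases in shared.items():
    print(n, phrases)
```

Starting with the longest phrases puts the strongest evidence of shared wording first, since a six-word match is much less likely to arise by chance than a two-word match.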

Percentage of Words in Common

Number of words and the number of instances of each word in common
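
The following is a hedged sketch of how the words in common might be measured, not the original tool. It assumes the percentage is the number of shared distinct words divided by the number of distinct words in each text; other definitions are possible. The name `common_words` and the tokenising rule are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed rule: lowercase words made of letters, with an optional apostrophe.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def common_words(text_a, text_b):
    """Report the words two texts share and how often each one occurs."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    shared = sorted(set(a) & set(b))

    # Number of instances of each shared word, as (count in A, count in B).
    instances = {word: (a[word], b[word]) for word in shared}

    # Assumed definition: shared distinct words over distinct words in a text.
    pct_a = 100 * len(shared) / len(a) if a else 0.0
    pct_b = 100 * len(shared) / len(b) if b else 0.0
    return instances, pct_a, pct_b


instances, pct_a, pct_b = common_words("the cat sat on the mat", "the dog sat")
print(instances)
print(f"{pct_a:.1f}% of the distinct words in text A also appear in text B")
print(f"{pct_b:.1f}% of the distinct words in text B also appear in text A")
```

Reporting one percentage per text matters when the texts differ in length, because a short text can share most of its vocabulary with a long one without the reverse being true.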

"seren" (noun): star [Welsh]
"myseren" (noun): your star, the space you control

*seren UK trading since 1st July 1995
© 1995 - 2024 Mike Slater / *seren