Short Descriptions

 

The Brown Corpus (The Standard Corpus of Present-Day Edited American English)

  • first computer-readable general corpus of texts for linguistic research on modern English
  • compiled by W. Nelson Francis and Henry Kučera at Brown University
  • compiled: 1963–64
  • period: 1961
  • over 1 million words (500 samples of 2000+ words each)
  • written American English

 

The LOB Corpus (The Lancaster-Oslo/Bergen Corpus), and tagged version

  • British English counterpart of the Brown Corpus
  • compiled by: Geoffrey Leech (project leader), Stig Johansson (project leader), Knut Hofland (head of computing), Roger Garside (head of computing, POS-tagged version)
  • compiled: original version 1970–1978, POS-tagged version 1981–1986
  • period: 1961
  • 1 million words (500 texts of circa 2000 words each)
  • 15 text categories, 9 informative and 6 imaginative
  • British English

 

The Freiburg-Brown Corpus of American English (FROWN), and tagged version

  • Freiburg update of the Brown Corpus
  • intended to match the Brown Corpus as closely as possible in size and composition
  • language of the early 1990s
  • compiled by: Christian Mair, Geoffrey Leech, Nick Smith and their teams
  • compiled: 1992–1996
  • period: 1992
  • 1 million words (500 texts of around 2000 words each, 15 text categories, 9 informative and 6 imaginative)
  • American English

 

The Freiburg-LOB Corpus of British English (FLOB), and tagged version

  • Freiburg update of the LOB Corpus (also written F-LOB)
  • intended to match LOB as closely as possible in size and composition
  • language of the early 1990s
  • compiled by: Christian Mair (original), Geoffrey Leech (POS-tagged version)
  • compiled: 1991–1996
  • period: 1991
  • 1 million words (500 texts of around 2000 words each, 15 text categories)
  • British English

 

The International Corpus of English - East African component

  • for information, see https://www.ice-corpora.uzh.ch/en.html

 

The London-Lund Corpus of Spoken English (LLC)

  • derives from two projects:
    • the Survey of English Usage (SEU) at University College London
    • the Survey of Spoken English (SSE), started at Lund University in 1975 as a sister project of the London survey
  • compiled by: Jan Svartvik (Survey of Spoken English (SSE), Lund University), Randolph Quirk (Survey of English Usage, University College London), Sidney Greenbaum (Survey of English Usage, University College London), Knut Hofland (Norwegian Computing Centre for the Humanities, Bergen)
  • compiled: 1959–1990
  • 500,000 words (100 spoken texts of 5000 words each)
  • spoken British English
  • further references:
    • Svartvik, J. & R. Quirk, eds. 1980. A Corpus of English Conversation. Lund Studies in English, 56. Lund: Liber/Gleerups.
    • Svartvik, J., ed. 1990. The London-Lund Corpus of Spoken English: Description and Research. Lund Studies in English, 82. Lund: Lund University Press.