Martin,
Sorry that I'm responding to this so late.
The 475-million-word Corpus of Historical American English (COHA, https://www.english-corpora.org/coha/) has about 220 million words of fiction from the 1820s-2010s (see the number of texts and words by decade below). Nearly all of the texts from the 1820s-1980s are novels, whereas the 1990s-2010s include more short stories. Full information for all texts can be found here: https://www.english-corpora.org/coha/files/sources-coha-2020.zip
Full-text (downloadable) data can be found at CorpusData.org (https://www.corpusdata.org/) as well as at the University of Stuttgart (https://www.ims.uni-stuttgart.de/en/research/resources/corpora/ccoha/).
Best,
Mark Davies
English-Corpora.org
------------------------------------------
decade   #texts        #words
1820s        90     3,778,554
1830s       179     7,492,464
1840s       243     8,615,569
1850s       151     9,175,764
1860s       249     9,279,356
1870s       217    10,454,445
1880s       264    11,204,077
1890s       257    11,261,720
1900s       266    12,096,794
1910s       296    12,266,683
1920s       281    12,668,146
1930s       533    11,959,731
1940s       420    12,030,426
1950s       470    12,014,411
1960s       403    11,652,761
1970s       335    11,652,921
1980s       334    11,664,130
1990s      1711    13,337,688
2000s      4224    14,624,639
2010s      3672    15,150,555
TOTAL     14595   222,380,834
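For anyone who wants to check these figures or see the shift from novels to short stories in numbers, here is a minimal Python sketch. It only re-enters the per-decade counts from the table above; the names FICTION, totals and mean_words_per_text are made up for illustration and are not part of any COHA tooling.

```python
# Per-decade fiction counts copied by hand from the table above:
# (decade, number of texts, number of words).
FICTION = [
    ("1820s", 90, 3_778_554),
    ("1830s", 179, 7_492_464),
    ("1840s", 243, 8_615_569),
    ("1850s", 151, 9_175_764),
    ("1860s", 249, 9_279_356),
    ("1870s", 217, 10_454_445),
    ("1880s", 264, 11_204_077),
    ("1890s", 257, 11_261_720),
    ("1900s", 266, 12_096_794),
    ("1910s", 296, 12_266_683),
    ("1920s", 281, 12_668_146),
    ("1930s", 533, 11_959_731),
    ("1940s", 420, 12_030_426),
    ("1950s", 470, 12_014_411),
    ("1960s", 403, 11_652_761),
    ("1970s", 335, 11_652_921),
    ("1980s", 334, 11_664_130),
    ("1990s", 1711, 13_337_688),
    ("2000s", 4224, 14_624_639),
    ("2010s", 3672, 15_150_555),
]


def totals(rows):
    """Return (total texts, total words) summed over all decades."""
    return sum(t for _, t, _ in rows), sum(w for _, _, w in rows)


def mean_words_per_text(rows):
    """Map each decade to its average text length in words."""
    return {decade: words / texts for decade, texts, words in rows}


if __name__ == "__main__":
    total_texts, total_words = totals(FICTION)
    print(f"TOTAL {total_texts} texts, {total_words:,} words")
    for decade, mean in mean_words_per_text(FICTION).items():
        print(f"{decade}  {mean:>10,.0f} words per text")
```

The sums agree with the TOTAL row (14595 texts, 222,380,834 words). The average text length drops from roughly 42,000 words in the 1820s to about 3,500 in the 2000s, which is consistent with the larger share of short stories in the later decades.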
From: Martin Wynne via Corpora corpora@list.elra.info
Sent: Sunday, October 27, 2024 8:11 AM
To: corpora@list.elra.info
Subject: [Corpora-List] Corpora of English novels
I have a student who is interested in tracing the development of the English novel from its origins to the present day (or at least to the start of the twentieth century), and I'm trying to gather information about relevant corpora covering this text type and period.
We know about the European Literary Text Collection (ELTeC, https://www.distant-reading.net/eltec/), which will be very useful for the later end of the timescale. We also know it is possible to assemble a corpus from Project Gutenberg, archive.org, Oxford Text Archive, etc., but would be interested in re-using any corpora that people might already have made, which aim to be representative of particular periods within this genre.
The student has some flexibility with her research question, so while the original idea of 'English novels' was probably 'novels in English from Great Britain and Ireland', other related areas such as US novels might be interesting as well.
Any tips and suggestions gratefully received. If we get a number of interesting direct emails, I'll be happy to summarize the results to the list.
Best wishes, Martin
--
Senior Researcher in Corpus Linguistics
Faculty of Linguistics, Philology and Phonetics, University of Oxford
National Co-ordinator, CLARIN-UK
martin.wynne@ling-phil.ox.ac.uk
https://orcid.org/0000-0002-4155-0530
Corpora mailing list -- corpora@list.elra.info
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to corpora-leave@list.elra.info