At English-Corpora.org, we’ve added new AI/LLM-based tools directly into the corpus interface, while still keeping the corpus data at the center of analysis. An overview of the features is available at https://www.english-corpora.org/ai-llms/.
Using nine different LLMs (like GPT, Gemini, and Claude), users can now do things such as:
-- semantically cluster and categorize collocates and phrases, such as the collocates of *identity* or the highly polysemous *bow*, or results for the phrase *soft* NOUN
-- compare words via collocates, such as *quandary*/*predicament*, *provoke*/*incite*, or *completely*/*entirely*
-- analyze differences in frequency or collocates across corpus sections, such as genres, historical periods, or dialects
-- analyze KWIC lines, including semantic prosody, collocates, grammatical patterns, text types, and pragmatic functions
-- generate words and phrases by topic, translation, or rephrasing, and then see their frequency in different sections of the corpus
Users can also:
-- switch easily between LLMs to compare analyses across nine different models
-- view results in 30 different languages
-- select one of 14 "user profiles" (e.g. linguist, translator, teacher, or learner) for customized results
-- save, retrieve, and annotate AI results (categorizations, analyses, and generated words/phrases)
The goal is not to replace careful corpus analysis, but to complement it. The LLMs can suggest patterns, categories, and comparisons -- but the underlying corpus data is always visible, so users can verify, adapt, or challenge the AI output. We hope these tools will be useful for learners, teachers, researchers, translators, and anyone interested in richer ways of exploring corpus data.
============================================
Mark Davies
english-corpora.org
mark-davies.org
============================================