Those interested in how well the predictions of AI / LLMs compare to actual corpus data may want to look at:
https://www.english-corpora.org/ai-llms/
The data includes seven in-depth comparisons of LLMs and corpora (about 90 pages of discussion and examples) for the following topics: word frequency (https://www.english-corpora.org/ai-llms/words.pdf), phrase frequency (https://www.english-corpora.org/ai-llms/phrases.pdf), collocates (https://www.english-corpora.org/ai-llms/collocates.pdf), comparing words via collocates (https://www.english-corpora.org/ai-llms/compare-words.pdf), genre-based variation (https://www.english-corpora.org/ai-llms/genres.pdf), historical variation (https://www.english-corpora.org/ai-llms/historical.pdf), and dialectal variation (https://www.english-corpora.org/ai-llms/dialects.pdf).
Best,
Mark Davies (https://www.mark-davies.org/), English-Corpora.org