Discorpus is a web-based corpus analysis workbench designed for linguists and discourse researchers. It combines classic corpus linguistics tools (KWIC, collocations, n-grams, lexical diversity) with NLP-powered analysis (lemmatization, named entity recognition, topic modelling, sentiment) — all from a single browser interface, with no programming required.
It was developed as part of PhD research in Applied Linguistics (Universitat Politècnica de València), with a focus on Critical Discourse Analysis and the automated study of discriminatory language on digital platforms.
MIT license.
GitHub repository: https://github.com/joserolania-boop/DIscorpus
Feedback from the community is very welcome, as this is an ongoing project aimed at supporting linguistic research and corpus-based work.
The tool is freely available for academic and educational use.
Hi...
That seems like an interesting tool. Since it works with spaCy, can we adapt it to use other models/languages?
Rod.
---
Prof. Dr. Rodrigo Esteves de Lima Lopes
Universidade Estadual de Campinas
Livre Docente em Linguagem e Tecnologia || Habilitated Professor in Language and Technology
CV (Portuguese): http://lattes.cnpq.br/1654734521861377 || ORCID: https://orcid.org/0000-0003-3681-1553 || Google Scholar: https://scholar.google.com.br/citations?user=q1V4jksAAAAJ&hl=pt-BR
(In reply to the announcement by Jose Rolanía Navarro, sent via the Corpora list <corpora@list.elra.info> on Thu, 28 May 2026 at 15:19.)