Discorpus – a new tool for corpus and discourse analysis (feedback welcome) - Corpora

28 May 2026


Discorpus is a web-based corpus analysis workbench designed for linguists and discourse researchers. It combines classic corpus linguistics tools (KWIC concordances, collocations, n-grams, lexical diversity) with NLP-powered analysis (lemmatization, named entity recognition, topic modelling, sentiment analysis), all from a single browser interface, with no programming required.
It was developed as part of PhD research in Applied Linguistics at the Universitat Politècnica de València, with a focus on Critical Discourse Analysis and the automated study of discriminatory language on digital platforms.
Discorpus is released under the MIT license.
GitHub repository: https://github.com/joserolania-boop/DIscorpus
Feedback from the community is very welcome, as this is an ongoing project aimed at supporting linguistic research and corpus-based work.
The tool is freely available for academic and educational use.
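
For readers less familiar with the classic measures mentioned above, here is a minimal Python sketch of what a KWIC concordance, bigram counts, and a type-token ratio (one simple lexical diversity measure) compute. It is an illustration only and is not taken from the Discorpus code base; the function names (`tokenize`, `kwic`, `bigrams`, `type_token_ratio`) and the sample text are made up for this example, and the tool itself may tokenize and count differently.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every occurrence of keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def bigrams(tokens):
    """Count adjacent token pairs (n-grams with n = 2)."""
    return Counter(zip(tokens, tokens[1:]))


def type_token_ratio(tokens):
    """Number of distinct word forms divided by the total number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Hypothetical sample text, used only to exercise the functions.
text = "The cat sat on the mat. The dog sat on the log."
tokens = tokenize(text)

for left, kw, right in kwic(tokens, "sat", window=2):
    print(f"{left:>10} [{kw}] {right}")

print(bigrams(tokens).most_common(2))
print(round(type_token_ratio(tokens), 3))
```

A workbench like Discorpus exposes this kind of analysis through its interface, so none of this code is needed to use it; the sketch only shows what the underlying counts mean.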