Ulysses Tesemõ is a large corpus built specifically for the Brazilian legal domain. It consists of over 3.5 million files, totaling 30.7 GiB of raw text, collected from 159 sources spanning judicial, legislative, academic, news, and other related data.
https://doi.org/10.1007/s10579-024-09762-8
Best regards,
Ellen Souza