BrPoliCorpus - Brazilian Portuguese Political Corpus V 1.0 - Corpora

29 Jul 2024


Dear Colleagues,
I’d like to introduce my current project, BrPoliCorpus
(https://github.com/rll307/BrPoliCorpus), an open data set of Brazilian
political documents. It currently comprises 441,644,155 tokens (roughly
441.6 million) and is intended to be useful for linguists and political
scientists.
At this stage of the project, the corpus is distributed via an R package or
as freely downloadable spreadsheets. The spreadsheets contain the texts and
all their metadata, which provides context for each document. The datasets
include:
- Ordinary congress floor sessions
- Ordinary congress committees
- CPI (Parliamentary Inquiry Commission)
- Presidential inaugural speeches
- Government programmes for the offices of Governor and President of the
Republic
Here is a breakdown of its current status:
Doc                                  Types        Tokens
CPI                              2,615,392     4,563,382
Parliamentary Committees         7,985,000   108,251,624
Floor Parliamentary speeches     3,423,405   322,893,136
Gov. Programmes                    688,342     5,849,807
Inaugural Speeches                  31,959        86,206
Total                           14,744,098   441,644,155
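For anyone who wants to check the figures above, here is a minimal Python sketch that recomputes the two totals and a type/token ratio per dataset. The numbers are copied from the breakdown; the variable and function names are illustrative only and are not part of the BrPoliCorpus R package or spreadsheets. It also assumes that the "Types" column counts distinct word forms within each dataset, so the summed types figure simply adds the per-dataset counts, as the breakdown does.

```python
# Figures copied from the breakdown: dataset -> (types, tokens).
# The names below are illustrative, not part of the BrPoliCorpus distribution.
BREAKDOWN = {
    "CPI": (2_615_392, 4_563_382),
    "Parliamentary Committees": (7_985_000, 108_251_624),
    "Floor Parliamentary speeches": (3_423_405, 322_893_136),
    "Gov. Programmes": (688_342, 5_849_807),
    "Inaugural Speeches": (31_959, 86_206),
}


def totals(breakdown):
    """Return (total_types, total_tokens) summed over all datasets."""
    total_types = sum(types for types, _ in breakdown.values())
    total_tokens = sum(tokens for _, tokens in breakdown.values())
    return total_types, total_tokens


def type_token_ratio(types, tokens):
    """Return types divided by tokens (assumes 'Types' means distinct forms)."""
    return types / tokens


if __name__ == "__main__":
    for name, (types, tokens) in BREAKDOWN.items():
        print(f"{name:<30}{type_token_ratio(types, tokens):.4f}")
    print("Totals (types, tokens):", totals(BREAKDOWN))
```

Both sums agree with the "Total" row of the breakdown.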
The corpus is available at https://github.com/rll307/BrPoliCorpus.
Please contact me if you need any further information.
All the best,
Rodrigo
---
Dr Rodrigo Esteves de Lima Lopes
Universidade Estadual de Campinas (UNICAMP)
Livre Docente em Linguagem e Tecnologia
Professor Associado || Associate Professor
Depto. de Linguística Aplicada || Dept. of Applied Linguistics
CV (Português): http://lattes.cnpq.br/1654734521861377
ORCID: https://orcid.org/0000-0003-3681-1553
Google Scholar: https://scholar.google.com.br/citations?user=q1V4jksAAAAJ&hl=pt-BR
rll307@unicamp.br