Dear Colleagues,

I’d like to introduce BrPoliCorpus, my current project: an open dataset of Brazilian political documents comprising 441,644,155 tokens. It is intended to be useful for linguists and political scientists.
At this stage of the project, the corpus is distributed through an R package or as freely downloadable spreadsheets. The spreadsheets contain the texts together with all their metadata, which provide useful context. The datasets include:
- Ordinary congress floor sessions
- Ordinary congress committees
- CPIs (Parliamentary Inquiry Commissions)
- Presidential inaugural speeches
- Government programmes for the offices of Governor and President of the Republic

Here is a breakdown of its current size:

Subcorpus	Types	Tokens
CPI	2,615,392	4,563,382
Parliamentary Committees	7,985,000	108,251,624
Floor Parliamentary Speeches	3,423,405	322,893,136
Gov. Programmes	688,342	5,849,807
Inaugural Speeches	31,959	86,206
Total	14,744,098	441,644,155

The corpus is available here.

Please contact me if you need any further information.

All the best,

Rodrigo

---

Dr Rodrigo Esteves de Lima Lopes

Universidade Estadual de Campinas (UNICAMP)

Livre Docente em Linguagem e Tecnologia

Professor Associado || Associate Professor

Depto. de Linguística Aplicada || Dept. of Applied Linguistics

CV (Português) || ORCID || Google Scholar

rll307@unicamp.br