https://github.com/rll307/BrPoliCorpus
Dear Colleagues,
I’d like to introduce BrPoliCorpus (https://github.com/rll307/BrPoliCorpus), my current project. It is an open data set of Brazilian political documents comprising 441,644,155 words (about 441.6 million). It is intended to be useful for linguists and political scientists. At this stage of the project, it is distributed via the R package or as freely downloadable spreadsheets. The spreadsheets contain the texts and all the metadata, which is useful for context. The datasets include:
- Ordinary congress floor sessions
- Ordinary congress committees
- CPI (Parliamentary Inquiry Commission)
- Presidential inaugural speeches
- Government programmes for the offices of Governor and President of the Republic
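For anyone starting from the spreadsheets rather than the R package, here is a minimal sketch of counting tokens per document type. It assumes a spreadsheet has been exported to CSV. The column names `doc_type` and `text`, the sample rows, and the helper `tokens_per_doc_type` are all hypothetical and only illustrate the idea; please check the downloaded files for the real column names.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for one spreadsheet exported to CSV.
# The columns "doc_type", "speaker" and "text" are assumptions,
# not the actual BrPoliCorpus schema.
SAMPLE_CSV = """doc_type,speaker,text
Inaugural Speeches,Speaker A,Prometo manter e cumprir a Constituição
Gov. Programmes,Candidate B,Educação e saúde para todos
Inaugural Speeches,Speaker C,O povo brasileiro
"""


def tokens_per_doc_type(handle, type_col="doc_type", text_col="text"):
    """Count whitespace-separated tokens per document type."""
    counts = Counter()
    for row in csv.DictReader(handle):
        counts[row[type_col]] += len(row[text_col].split())
    return dict(counts)


counts = tokens_per_doc_type(io.StringIO(SAMPLE_CSV))
print(counts)
```

Whitespace splitting is only a rough tokenisation, so counts obtained this way will not necessarily match the figures reported for the corpus.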
Here is a breakdown of its current status:

Doc                                Types       Tokens
CPI                            2,615,392    4,563,382
Parliamentary Committees       7,985,000  108,251,624
Floor Parliamentary speeches   3,423,405  322,893,136
Gov. Programmes                  688,342    5,849,807
Inaugural Speeches                31,959       86,206
Total                         14,744,098  441,644,155
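As a quick arithmetic check on the breakdown above, the short sketch below sums the two numeric columns and computes each dataset's share of the total tokens. The figures are copied from the breakdown; the variable names are only illustrative.

```python
# (types, tokens) per dataset, copied from the breakdown above.
breakdown = {
    "CPI": (2615392, 4563382),
    "Parliamentary Committees": (7985000, 108251624),
    "Floor Parliamentary speeches": (3423405, 322893136),
    "Gov. Programmes": (688342, 5849807),
    "Inaugural Speeches": (31959, 86206),
}

# Column totals, to compare against the "Total" row.
total_types = sum(types for types, _ in breakdown.values())
total_tokens = sum(tokens for _, tokens in breakdown.values())
print(total_types, total_tokens)

# Share of all tokens contributed by each dataset, as a percentage.
token_share = {
    name: 100 * tokens / total_tokens
    for name, (_, tokens) in breakdown.items()
}
for name, share in token_share.items():
    print(f"{name}: {share:.2f}%")
```

Both sums agree with the reported totals (14,744,098 and 441,644,155), and the floor speeches account for the largest share of the tokens.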
The corpus is available here: https://github.com/rll307/BrPoliCorpus
Please contact me if you need any further information.
All the best,
Rodrigo
---
Dr Rodrigo Esteves de Lima Lopes
Universidade Estadual de Campinas (UNICAMP)
Livre Docente em Linguagem e Tecnologia
Professor Associado || Associate Professor
Depto. de Linguística Aplicada || Dept. of Applied Linguistics
CV (Português) http://lattes.cnpq.br/1654734521861377 || ORCID https://orcid.org/0000-0003-3681-1553 || Google Scholar https://scholar.google.com.br/citations?user=q1V4jksAAAAJ&hl=pt-BR
rll307@unicamp.br
Dear Sender,
I am currently out of the office and will not be checking emails regularly. I will return on September 9, and will respond to your message as soon as possible after that date.
Best regards,
Charlott Jakob
On 29 Jul 2024, at 18:47, Lima-Lopes, Rodrigo Esteves de via Corpora corpora@list.elra.info wrote:
_______________________________________________
Corpora mailing list -- corpora@list.elra.info
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to corpora-leave@list.elra.info