Good day,
This is to announce the expansion of the collection of open Large Language Models (LLMs) for the Portuguese language with the following models:
- the family of *encoders* is extended with the new _*Albertina 1.5B*_: https://huggingface.co/PORTULAN/albertina-1b5-portuguese-ptpt-encoder (a minimal loading sketch follows this list)
- the family of *decoders* now includes _*Gervásio 7B*_: https://huggingface.co/PORTULAN/gervasio-7b-portuguese-ptpt-decoder
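As a minimal sketch of how the new encoder might be loaded, assuming standard Hugging Face transformers AutoClass usage (only the model identifier is taken from the URL above):

```python
# Minimal sketch: load the Albertina 1.5B encoder with standard
# transformers AutoClasses; only the model identifier comes from
# the announcement above.
from transformers import AutoModel, AutoTokenizer

model_id = "PORTULAN/albertina-1b5-portuguese-ptpt-encoder"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Encode a Portuguese sentence into contextual embeddings.
inputs = tokenizer("A língua portuguesa é falada em vários continentes.",
                   return_tensors="pt")
embeddings = model(**inputs).last_hidden_state
print(embeddings.shape)  # (batch size, number of tokens, hidden size)
```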
This ecosystem now encompasses over ten LLMs specifically developed for the Portuguese language, covering both its European variant, spoken in Portugal (PTPT), and its American variant, spoken in Brazil (PTBR), all of which can be run on consumer-grade hardware.
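For instance, the new decoder can be run locally along the following lines; this is a sketch assuming standard transformers generation calls, with only the model identifier taken from the URL above:

```python
# Minimal sketch: generate text with the Gervásio 7B decoder using
# standard transformers calls; only the model identifier comes from
# the announcement above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PORTULAN/gervasio-7b-portuguese-ptpt-decoder"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "A língua portuguesa"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```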
The Albertina family includes encoders with *100M*, *900M* and *1.5B* parameters.
The Gervásio family, in turn, includes a decoder with *7B* parameters.
All these models are *fully open*: they are open source and openly distributed, free of charge and with no registration required, under an open license that covers both research and commercial purposes.
They are also *fully documented*, including reports on their evaluation scores, which indicate that they are top-performing solutions among fully open models of their class for Portuguese.
These models, their companion datasets and their documentation, for both PTPT and PTBR, can all be found at https://huggingface.co/PORTULAN
Regards,
António Branco
University of Lisbon
NLX Natural Language and Speech Group
Faculdade de Ciências, Departamento de Informática