Good day,
This is to announce the expansion of the collection of open Large Language Models (LLMs) for the Portuguese language with the following models:
- the family of *encoders* is extended with the new _*Albertina 1.5B*_: https://huggingface.co/PORTULAN/albertina-1b5-portuguese-ptpt-encoder (a minimal loading sketch follows this list)
- the family of *decoders* now includes _*Gervásio 7B*_: https://huggingface.co/PORTULAN/gervasio-7b-portuguese-ptpt-decoder
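As a minimal sketch of how the new encoder might be loaded, assuming standard Hugging Face transformers AutoClass usage (only the model identifier is taken from the URL above):

```python
# Minimal sketch: load the Albertina 1.5B encoder with standard
# transformers AutoClasses; only the model identifier comes from
# the announcement above.
from transformers import AutoModel, AutoTokenizer

model_id = "PORTULAN/albertina-1b5-portuguese-ptpt-encoder"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Encode a Portuguese sentence into contextual embeddings.
inputs = tokenizer("A língua portuguesa é falada em vários continentes.",
                   return_tensors="pt")
embeddings = model(**inputs).last_hidden_state
print(embeddings.shape)  # (batch size, number of tokens, hidden size)
```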
This ecosystem now encompasses over ten LLMs specifically developed for the Portuguese language, covering both its European variant, spoken in Portugal (PTPT), and its American variant, spoken in Brazil (PTBR), all of which can be run on consumer-grade hardware.
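For instance, the new decoder can be run locally along the following lines; this is a sketch assuming standard transformers generation calls, with only the model identifier taken from the URL above:

```python
# Minimal sketch: generate text with the Gervásio 7B decoder using
# standard transformers calls; only the model identifier comes from
# the announcement above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PORTULAN/gervasio-7b-portuguese-ptpt-decoder"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "A língua portuguesa"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```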
The Albertina family includes encoders with *100M*, *900M* and *1.5B* parameters.
The Gervásio family, in turn, includes a decoder with *7B* parameters.
All these models are *fully open*: they are open source and openly distributed, free of charge and with no registration required, under an open license that covers both research and commercial purposes.
They are also *fully documented*, including reports on their evaluation scores, which indicate that they are top-performing solutions among fully open models of their class for Portuguese.
These models, their companion datasets and their documentation, for both PTPT and PTBR, can all be found at https://huggingface.co/PORTULAN
Regards,
António Branco
University of Lisbon
NLX Natural Language and Speech Group
Faculdade de Ciências, Departamento de Informática