A Robust Pre-trained Model in French for Biomedical and Clinical domains - Corpora

5 Apr 2023


Dear all,
We are proud to announce DrBERT, our first biomedical language model for French. It is now available on Hugging Face and arXiv (https://arxiv.org/abs/2304.00958).
You can now use the model on your own documents and get state-of-the-art performance in only three lines of code.
Check out:
- Project website: https://drbert.univ-avignon.fr/
- Hugging Face models: https://huggingface.co/Dr-BERT
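
As a rough illustration of the "few lines of code" usage mentioned above, here is a minimal sketch based on the Hugging Face `transformers` fill-mask pipeline. The checkpoint name `Dr-BERT/DrBERT-7GB` and the helper `predict_masked` are assumptions made for this example; please check the Hugging Face page above for the models that are actually published.

```python
# Assumption: this checkpoint name is illustrative; see the Hugging Face
# organization page for the list of available DrBERT models.
MODEL_ID = "Dr-BERT/DrBERT-7GB"


def predict_masked(template, model_id=MODEL_ID, top_k=5):
    """Return (token, score) pairs for the masked position in `template`.

    `template` must contain the placeholder "{mask}", which is replaced by
    the tokenizer's own mask token before prediction.
    """
    # Imported lazily so that defining this helper does not require the
    # transformers package or a network connection.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_id, tokenizer=model_id)
    sentence = template.format(mask=fill.tokenizer.mask_token)
    return [(r["token_str"], r["score"]) for r in fill(sentence, top_k=top_k)]
```

Calling `predict_masked("La patiente est atteinte d'une {mask}.")` downloads the weights on first use and returns the most likely completions with their scores. For downstream tasks such as NER or classification, the same checkpoint would typically be fine-tuned rather than used through the fill-mask pipeline.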
Our model was trained on 128 GPUs of the Jean-Zay supercomputer and evaluated on 11 distinct practical biomedical tasks for French, drawn from both public and private data. These tasks include named entity recognition (NER), part-of-speech (POS) tagging, binary, multi-class, and multi-label classification, and multiple-choice question answering. DrBERT improved performance on most tasks compared to prior approaches, which suggests that pre-training from scratch is still the most effective strategy for BERT language models in the French biomedical domain.
Tutorials on biomedical natural language processing are coming soon, stay tuned!
With Yanis Labrak (LIA / Zenidoc), Adrien Bazoge (LS2N), Richard Dufour (LS2N), Mickael Rouvier (LIA), Emmanuel Morin (LS2N), Béatrice Daille (LS2N) and Pierre-Antoine Gourraud (Nantes University / CHU Nantes).
Best regards.