Hi all,
In the age of decoder-only LLMs, I'd like to ask: are there any competitive encoder-decoder architectures that are known to scale well for multilingual seq2seq tasks? These are the ones I know of (a minimal usage sketch follows the list):
- https://huggingface.co/docs/transformers/en/model_doc/mt5
- https://huggingface.co/facebook/m2m100_418M
- https://huggingface.co/google-bert/bert-base-multilingual-cased + https://www.kaggle.com/code/alvations/neural-plasticity-bert2bert-on-wmt14
- https://huggingface.co/Helsinki-NLP/opus-mt-en-mul + https://huggingface.co/Helsinki-NLP/opus-mt-mul-en
- https://huggingface.co/docs/transformers/en/model_doc/umt5
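For reference, here is a minimal sketch of the kind of multilingual seq2seq usage I have in mind, using M2M100 through the stock Transformers classes (English-to-German is just an arbitrary example pair):

```python
# Minimal sketch: English -> German translation with M2M100 via Transformers.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model_name = "facebook/m2m100_418M"
tokenizer = M2M100Tokenizer.from_pretrained(model_name)
model = M2M100ForConditionalGeneration.from_pretrained(model_name)

tokenizer.src_lang = "en"  # source language code
inputs = tokenizer("Life is like a box of chocolates.", return_tensors="pt")

# Force the decoder to start with the target-language token.
generated = model.generate(**inputs, forced_bos_token_id=tokenizer.get_lang_id("de"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```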
There are also these, which reported state-of-the-art NLI scores but are not known to be multilingual (again, a short sketch follows the list):
- https://huggingface.co/google/ul2
- https://huggingface.co/docs/transformers/en/model_doc/flan-t5
- https://huggingface.co/docs/transformers/en/model_doc/byt5
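These are usable through the same Transformers seq2seq interface; below is a minimal Flan-T5 sketch, where the `google/flan-t5-base` checkpoint and the English-only prompt are just illustrative:

```python
# Minimal sketch: the same seq2seq interface covers Flan-T5 / UL2 / ByT5 checkpoints.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-base"  # small checkpoint, chosen only for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

inputs = tokenizer("Translate English to German: How old are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```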
There are some ideas on building an encoder with Mamba (https://github.com/state-spaces/mamba/issues/78), but it looks like an open question.
Other than the above, are there any competitive encoder-decoder architectures that are known to scale well for multilingual seq2seq tasks?
Thank you in advance for the pointers!
Regards, Liling