Dear all, We are happy to announce the release of our LLMeBench framework. The framework is designed to accelerate and simplify the evaluation and benchmarking of large language models (LLMs). It is modular, language-agnostic, and simple to extend. It currently supports interaction with LLMs through APIs, and it features both zero- and few-shot learning settings. The framework is open-sourced to encourage improvements and extensions from the community.
The framework currently hosts recipes for a diverse set of Arabic NLP tasks using OpenAI's GPT and BLOOMZ models. Specifically, it serves 31 unique NLP tasks (ranging from word-level to sentence-pair tasks), with a particular focus on Arabic, using 53 publicly available datasets. It also comes equipped with 200 prompts for these setups. It includes recipes for 12 languages (Arabic, Bangla, Bulgarian, Dutch, English, French, German, Italian, Polish, Russian, Spanish, and Turkish), with more to come.
We hope this will encourage experimentation with LLMs for multilingual studies. We invite the research community to participate in and improve the framework. We are excited to hear your feedback and suggestions, and we thank you for your contributions.
For further details, please take a look at the repository and the paper below.
Code: https://github.com/qcri/LLMeBench
Paper: https://arxiv.org/pdf/2308.04945.pdf
Regards,
Firoj
................ Firoj Alam, PhD http://sites.google.com/site/firojalam/