OSACT 2026 Workshop, First Call for Papers
11 May 2026, Palma de Mallorca, Spain
https://osact-lrec.github.io
Hosted by LREC 2026 https://lrec2026.info/
Workshop Description
The Open-Source Arabic Corpora and Processing Tools (OSACT) workshop series provides a forum for researchers, practitioners, and students in computational linguistics (CL), natural language processing (NLP), and information retrieval (IR) to share and discuss ongoing work on Arabic language resources and technologies. While Arabic remains comparatively resource-poor in relation to English, recent years have seen the emergence of large, freely available classical and Modern Standard Arabic (MSA) corpora, as well as dialectal corpora and processing tools.
Now in its seventh edition, OSACT7 celebrates this milestone with seven shared tasks, each addressing timely challenges in Arabic NLP and reflecting broader themes relevant to NLP research in general. OSACT7 builds on its long-standing commitment to open-source contributions that advance accessibility, reproducibility, and fairness, and this year it places inclusivity at the heart of its mission. A key focus is to recognize and support minority dialects and underrepresented varieties of Arabic, ensuring that diverse linguistic voices and resources are not only acknowledged but actively valued within the community.
The workshop will cover general topics in CL, NLP, and IR, with special emphasis on Large Language Models (LLMs) and Generative AI, including pre-trained Arabic language models, corpus design and evaluation, and annotated corpora for tasks such as named entity recognition, machine translation, sentiment analysis, and text classification. Additional areas of focus include crowdsourcing for data annotation, tools for language education, tokenization, normalisation, morphological analysis, part-of-speech tagging, dialect identification and translation, fake news detection, and web and social media analytics.
Methodologies for resource creation and annotation, knowledge extraction, ontologies, terminology, knowledge representation, and integration with the Semantic Web (e.g. Linked Data, Knowledge Graphs) will also be explored.
Workshop Topics
The workshop welcomes submissions on topics including, but not limited to, the following areas:
A) Language Resources:
· Pre-trained Arabic language models.
· Surveys and evaluations of existing Arabic corpora and their associated processing tools.
· Development and release of new annotated corpora for NLP and IR tasks such as named entity recognition, machine translation, sentiment analysis, text classification, and language learning.
· Assessing the effectiveness of crowdsourcing platforms for Arabic data annotation.
· Arabic text and speech processing toolkits.
B) Tools and Technologies:
· Language education, including first (L1) and second (L2) language learning applications.
· Pre-training & fine-tuning approaches for Arabic.
· Tokenization, normalisation, segmentation, morphology, and POS tagging.
· Sentiment analysis, dialect ID, & classification.
· Web and social media analytics.
· Arabic LRs for text, speech, sign, gesture, image, & multimodal data.
· Best practices for LR interoperability.
· Construction and annotation of LRs.
· Knowledge extraction, acquisition, and representation.
· Ontologies, terminology, and frameworks.
· LRs and the Semantic Web (Linked Data, Knowledge Graphs).
· Data contamination, synthetic data, and quality issues.
Important Dates
· February 18, 2026: Paper submission deadline
· March 12, 2026: Notification of acceptance
· March 30, 2026: Camera-ready deadline
· May 11, 2026: Workshop date
Submission Instructions
We invite submissions of 4 to 8 pages of content on topics of interest. The 8-page limit does not include acknowledgements, references, potential Ethics Statements, and discussion of Limitations, in line with the policy of the main LREC conference. All submissions must follow the LREC stylesheet (https://lrec2026.info/authors-kit/).
All submissions are double-blind. Any submissions that are not anonymised, are over-length, are poorly formatted, or make excessive use of appendices to circumvent page limits are liable to desk rejection.
At the time of submission, authors are offered the opportunity to share related language resources with the community. All repository entries are linked to the LRE Map (https://lremap.elra.info/), which provides metadata for the resource.
Organizing Committee
· Hend Al-Khalifa, Professor, King Saud University, Riyadh, Saudi Arabia, hendk@ksu.edu.sa
· Mo El-Haj, Reader, VinUniversity, Vietnam, Lancaster University, UK, elhaj.m@vinuni.edu.vn
· Saad Ezzini, Assistant Professor, King Fahd University of Petroleum and Minerals (KFUPM), Saudi Arabia, saad.ezzini@kfupm.edu.sa
Dr. Saad Ezzini
Assistant Professor, King Fahd University of Petroleum and Minerals (KFUPM), Saudi Arabia
http://ezzini.me
http://ezzini.github.io/CSAL