**The 6th Workshop on Open-Source Arabic Corpora and Processing Tools (Hybrid) with shared tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation**
The workshop will be conducted in a *hybrid* format to ensure maximum participation, accommodating attendees both online and in person. Submission deadline: extended to *March 1*, 2024
*Workshop site*: https://osact-lrec.github.io/
*Shared tasks:*
- Task 1: Arabic LLMs Hallucination (contact Hamdy Mubarak), Link: https://sites.google.com/view/arabic-llms-hallucination
- Task 2: Dialect to MSA Machine Translation (contact Kareem Darwish), Link: https://codalab.lisn.upsaclay.fr/competitions/17118
*Co-located with LREC-COLING 2024* https://lrec-coling-2024.org/ Turin, Italy, 20-25 May 2024
*Important Dates*
- Submission deadline: extended to *March 1*, 2024
- Notification of acceptance: March 25, 2024
- Camera-ready papers due: March 30, 2024
- Workshop date: May 25, 2024
*Workshop Description*
In the computational linguistics (CL), natural language processing (NLP), and information retrieval (IR) communities, Arabic is considered relatively resource-poor compared to English. This situation was thought to be the reason for the limited number of studies based on Arabic language resources. However, the past few years have witnessed the emergence of new, considerably large, and freely available Classical and Modern Standard Arabic (MSA) corpora, as well as dialectal corpora and, to a lesser extent, Arabic processing tools.
This workshop follows in the footsteps of previous editions of OSACT in providing a forum for researchers to share and discuss their ongoing work. It is timely given the continued rise in research projects focusing on Arabic language resources. The sixth edition encourages researchers and practitioners of Arabic language technologies, including CL, NLP, and IR, to share and discuss their latest research efforts, corpora, and tools. The workshop will also give special attention to Large Language Models (LLMs) and Generative AI, which are currently very active research areas. In addition to the general topics of CL, NLP, and IR, the workshop will place special emphasis on two shared tasks: Arabic LLMs Hallucination and Dialect to MSA Machine Translation.
*Submission Topics*
Language Resources:
- Pre-trained Arabic language models and their applications.
- Surveying and evaluating the design of available Arabic corpora and their associated processing tools.
- Making available new annotated corpora for NLP and IR applications such as named entity recognition, machine translation, sentiment analysis, text classification, and language learning.
- Evaluating the use of crowdsourcing platforms for Arabic data annotation.
- Open-source Arabic processing toolkits.
Tools and Technologies:
- Language education, e.g., L1 and L2.
- Language modeling and pre-trained models.
- Tokenization, normalization, word segmentation, morphological analysis, part-of-speech tagging, etc.
- Sentiment analysis, dialect identification, and text classification.
- Dialect translation.
- Fake news detection.
- Web and social media search and analytics.

Issues in the design, construction, and use of Arabic LRs (text, speech, sign, gesture, image, in single or multimodal/multimedia data):
- Guidelines, standards, best practices, and models for LRs interoperability.
- Methodologies and tools for LRs construction and annotation.
- Methodologies and tools for extraction and acquisition of knowledge.
- Ontologies, terminology, and knowledge representation.
- LRs and Semantic Web (including Linked Data, Knowledge Graphs, etc.).
*Submissions*
- Submission Instructions: https://lrec-coling-2024.org/authors-kit/
- Submission Link: https://softconf.com/lrec-coling2024/osact2024/
*Workshop organizers*
- Hend Al-Khalifa (King Saud University, KSA)
- Hamdy Mubarak (Qatar Computing Research Institute, Qatar)
- Kareem Darwish (aiXplain Inc., US)
- Tamer Elsayed (Qatar University, Qatar)
- Mona Ali (Northeastern University, Canada)