- Corpora - ELRA lists

Call for Participation: GenoVarDis at IberLEF 2024
by Luis Chiruzzo - Inco 28 Feb '24


*** Call for Participation for GenoVarDis at IberLEF 2024 ***

GenoVarDis: NER in Genomic Variants and related Diseases at IberLEF 2024
https://codalab.lisn.upsaclay.fr/competitions/17733

We look forward to your participation in advancing Spanish biomedical text processing through the GenoVarDis challenge at IberLEF 2024. This task, the first of its kind, addresses the shortage of Spanish resources for NER on genomic variants. By leveraging a unique corpus covering a wide spectrum of mutations and variant-related entities (including gene, disease and symptom) in Spanish (mainly translated from English and curated by human experts), we aim to provide valuable data for training and evaluating NER models in this low-resource domain.

Description of the task:
* Given a text (sequence of tokens), identify the named entities as spans in the text and classify them according to one of: Variant on DNA sequence, RS number, Allele on DNA sequence, Wild type and mutant, Variant with insufficient information, and transcript IDs.

Metrics will include precision, recall, and F1 scores for the task (F-score is the primary metric), considering exact matches.

Example of text:

Neurofibromatosis tipo I. Mutación de splicing detectada por MLPA y secuenciación en la Argentina La neurofibromatosis tipo 1 (NF1) es un desorden genético autosómico dominante, con una prevalencia de 1 en 2500-3000 nacidos vivos. La dificultad diagnóstica se debe al tamaño extenso del gen NF1 con pocos sitios hot-spot, la ausencia de una clara relación genotipo-fenotipo y rasgos clínicos con un espectro muy heterogéneo. Un caso sospechoso de NF1 procedente de la provincia de Jujuy fue analizado por MLPA (multiplex ligation-dependent probe amplification) en nuestro laboratorio. Mujer, adolescente mestiza (Amerindia/Europea), con un osteoma maxilar, lordosis lumbar, neurofibromas cutáneos y manchas café con leche. Por MLPA se detectó una alteración en el exón 13 del gen NF1. Por secuenciación del exón 13 se identificó una mutación “missense” en la posición 1466 del ARNm (NM_000267.3:c.1466A>G) que introduce un sitio de splicing aberrante.

#pmid     start  end  term                                      entity
25919870  0      24   Neurofibromatosis tipo I                  Disease
25919870  101    125  neurofibromatosis tipo 1                  Disease
25919870  127    130  NF1                                       Gene
25919870  291    294  NF1                                       Gene
25919870  447    450  NF1                                       Gene
25919870  640    655  Osteoma maxilar                           Disease
25919870  657    672  Lordosis lumbar                           Disease
25919870  674    696  Neurofibromas cutáneos                    Disease
25919870  699    721  Manchas café con leche                    Disease
25919870  747    771  Alteración en el exón 13                  OtherMutation
25919870  780    783  NF1                                       Gene
25919870  833    872  Mutación “missense” en la posición 1466   OtherMutation
25919870  883    894  NM_000267.3                               Transcript
25919870  895    904  c.1466A>G                                 DNAMutation

How to participate:
If you want to participate in this task, please join our Codalab competition: https://codalab.lisn.upsaclay.fr/competitions/17733

Important Dates:
* March 22, 2024: release training corpus.
* May 24, 2024: release test corpus.
* June 7, 2024: publication of results.
* June 17, 2024: paper submission.
* June 28, 2024: notification of acceptance.
* July 3, 2024: camera ready paper submission.
* September, 2024: IberLEF 2024 Workshop.
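To make the exact-match evaluation concrete, here is a minimal Python sketch of how precision, recall, and F1 could be computed from annotation rows like the ones in the example. It is not the official scorer: it assumes the columns are tab-separated, that a prediction counts only when document id, start, end, and entity label all match a gold row, and the names `Span`, `parse_annotations`, and `exact_match_scores` are hypothetical. The prediction rows are made up for illustration.

```python
from typing import NamedTuple


class Span(NamedTuple):
    pmid: str
    start: int
    end: int
    label: str


def parse_annotations(tsv: str) -> set[Span]:
    """Parse rows of: pmid, start, end, term, entity (assumed tab-separated)."""
    spans = set()
    for line in tsv.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header row
        pmid, start, end, _term, label = line.split("\t")
        spans.add(Span(pmid, int(start), int(end), label))
    return spans


def exact_match_scores(gold: set[Span], pred: set[Span]) -> tuple[float, float, float]:
    """Return (precision, recall, F1), counting only exact span-and-label matches."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Three gold rows taken from the example annotation above.
GOLD = "\n".join([
    "#pmid\tstart\tend\tterm\tentity",
    "25919870\t127\t130\tNF1\tGene",
    "25919870\t883\t894\tNM_000267.3\tTranscript",
    "25919870\t895\t904\tc.1466A>G\tDNAMutation",
])

# Hypothetical system output: one exact match, one span with different boundaries.
PRED = "\n".join([
    "25919870\t127\t130\tNF1\tGene",
    "25919870\t883\t904\tNM_000267.3:c.1466A>G\tDNAMutation",
])

precision, recall, f1 = exact_match_scores(parse_annotations(GOLD), parse_annotations(PRED))
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Under exact matching, the second predicted span earns no credit even though it overlaps two gold entities, which is why boundary errors are costly with this metric.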


1 Postdoc and 2 PhD positions in NLP and ML for healthcare at Amsterdam UMC, University of Amsterdam
by Iacer Calixto 28 Feb '24


*** Apologies for cross-posting ***

Dear colleagues,

The NLP4Health Lab <https://eur04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fnlp4healt…> in the Department of Medical Informatics at the Amsterdam UMC <https://www.amsterdamumc.org/en/research.htm> and University of Amsterdam <https://eur04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.uva.n…> is hiring *one postdoctoral researcher* and *two PhD students* in *Responsible Natural Language Processing (NLP) and Machine Learning (ML) for Healthcare*. All positions are fully funded.

Do you have a strong background in NLP and ML, and a keen interest in large language models and healthcare? Please consider applying. We are accepting applications until March 15.

Please check details and apply via the links below:
- PhD positions: https://werkenbij.amsterdamumc.org/en/vacatures/research/2-phd-positions-in…
- Postdoc researcher position: https://werkenbij.amsterdamumc.org/en/vacatures/research/postdoctoral-resea…

The positions are funded by an NGF AiNed Fellowship Grant for the "CaRe-NLP: Human-Centric and Responsible NLP methods for Dutch healthcare" project. The overall goal of the project is to develop human-centric and responsible NLP and ML methods for healthcare in the Netherlands, Europe, and worldwide. We will design, build, and evaluate state-of-the-art large language models (LLMs) for healthcare data that include (combinations of) free-text clinical notes collected in primary/secondary/intensive care settings, medical images, time series measurements, medical knowledge graphs, and multi-modal electronic health records (EHRs). Our methods aim to ensure privacy and fairness, prevent bias, cope with data scarcity, and be interpretable and explainable. We collaborate with a network of clinicians across multiple specialties, and you will tackle relevant clinical problems with real-world impact.

The project team is led by dr. Iacer Calixto and prof. Ameen Abu-Hanna, and we are housed at the Amsterdam UMC location University of Amsterdam in the beautiful city of Amsterdam.

(Please feel free to share with your students/communities!)

Have a great week,
Iacer.


The ReproHum 2024 Survey of NLP and ML researchers' experience and views of reproducibility
by Thomson, Craig 28 Feb '24


Dear Corpora list members,

As part of the EPSRC UK ReproHum project (https://reprohum.github.io), we are performing a survey of NLP and ML researchers’ experience and views of reproducibility. We would like to hear from as many researchers as possible (NLP or ML), not just those who work on evaluation!

If you completed a similar survey in 2022, you can still complete this one: we are interested in the difference in your experience and views between then and now.

We would be most grateful if you could spend 5-10 minutes taking part in the survey, which can be accessed via the link below:

https://forms.gle/RshrHcvAXxAEEFj59

With thanks and apologies for cross-posting.

Craig Thomson
Research Fellow
Computing Science
University of Aberdeen

The University of Aberdeen is a charity registered in Scotland, No SC013683.


PAN 2024: Shared Tasks on Authorship Analysis, Computational Ethics, and Originality
by Matti Wiegmann 28 Feb '24


PAN 2024: Shared Tasks on Authorship Analysis, Computational Ethics, and Originality

Call for Participation

We'd like to invite you to participate in the following shared tasks at PAN 2024 held in conjunction with the CLEF conference in Grenoble, France.

1. Voight-Kampff Generative AI Authorship Verification. Given two texts, one authored by a human, one by a machine: pick out the human.
   https://pan.webis.de/clef24/pan24-web/generated-content-analysis.html

2. Oppositional Thinking Analysis. Given an online message, is it a conspiracy theory or critical thinking?
   https://pan.webis.de/clef24/pan24-web/oppositional-thinking-analysis.html

3. Multi-Author Writing Style Analysis. Given a document, determine at which positions the author changes.
   https://pan.webis.de/clef24/pan24-web/style-change-detection.html

4. Multilingual Text Detoxification. Given a toxic piece of text, rewrite it in a non-toxic way while preserving the main content as much as possible.
   https://pan.webis.de/clef24/pan24-web/text-detoxification.html

Find out more at https://pan.webis.de/clef24/pan24-web

Important Dates
--------------------------
now              Training Data Released
May 05, 2024     Software submission
May 31, 2024     Participant paper submission
June 24, 2024    Peer review notification
July 08, 2024    Camera-ready participant papers submission
Sep 09-12, 2024  Conference

Links
--------------------------
PAN: https://pan.webis.de
Contact: pan(a)webis.de

We are looking forward to your submission!

The PAN team


CLEF-2024 CheckThat! Lab -- Call for Participation (training set available for all tasks)
by Piotr Przybyła 28 Feb '24


Dear colleague,

We invite you to participate in the 2024 edition of the CheckThat! Lab at CLEF 2024. This year, we feature six tasks (two follow-up and four new) that correspond to important components within and around the full fact-checking pipeline in multiple languages:

Task 1: Check-worthiness in tweets. The goal is to identify claims that could be important to verify on social and mainstream media (the only task that has been organized during all editions of the lab). Available in Arabic, English, Dutch and Spanish.

Task 2: Subjectivity in news articles. The goal is to spot text that should be processed with specific strategies, benefiting the fact-checking pipeline. Available in Arabic, English, German, Italian, and Multilingual.

Task 3: Persuasion Techniques. The goal is to identify text spans in which a persuasion technique is being used to influence the reader. This task is offered in Arabic, Bulgarian, English, Portuguese and Slovene.

Task 4: Detecting hero, villain, and victim from memes. The goal is to predict the role of each entity (hero, villain, victim, or other) given a meme and a list of entities. Available in Arabic, English and Code-mixed.

Task 5: Rumor Verification using Evidence from Authorities. The goal is to retrieve evidence from trusted sources (authorities that have “real knowledge” on the matter) and determine if the rumor is supported, refuted, or unverifiable according to the evidence. Available in Arabic and English.

Task 6: Robustness of Credibility Assessment with Adversarial Examples. The goal is to discover small changes that could be applied to misinformation text, causing the provided classifiers to make wrong predictions. Available for news articles, tweets, propaganda techniques and claims (including regarding COVID-19) in English.
Further information: https://checkthat.gitlab.io/
Datasets: https://gitlab.com/checkthat_lab/clef2024-checkthat-lab
Register and participate: https://clef2024-labs-registration.dei.unipd.it/registrationForm.php

Important Dates
---------------------
- November 2023: Lab registration opens
- January 2024: Release of the training materials
- 22 April 2024: Lab registration closes
- 2 May 2024: Beginning of the evaluation cycle (test sets release)
- 6 May 2024 (23:59 AOE): End of the evaluation cycle (run submission)
- 31 May 2024: Deadline for the submission of working notes
- 10 June 2024: Submission of Condensed Lab Overviews [LNCS]
- 21 June 2024: Camera Ready Copy of Condensed Lab Overviews [LNCS] due
- 24 June 2024: Notification of acceptance of working notes
- 8 July 2024: Deadline for submission of camera-ready working notes
- 22-26 July 2024: Preview of working notes
- 9-12 September 2024: CLEF 2024 Conference in Grenoble, France

Best regards,
The CLEF-2024 CheckThat! Lab Shared Task Organizers


ASONAM 2024: Call for Workshop Proposals
by Rajesh Sharma 28 Feb '24


The 16th International Conference on Advances in Social Networks Analysis and Mining - ASONAM-2024
September 02-05, 2024, Calabria, Italy.

Conference Link: https://asonam.cpsc.ucalgary.ca/2024/
Workshop information: https://asonam.cpsc.ucalgary.ca/2024/CFW.php#key_dates

CALL FOR WORKSHOP PROPOSALS

The 16th International Conference on Advances in Social Networks Analysis and Mining (ASONAM-2024) invites proposals for workshops at its annual conference. ASONAM 2024 will be held on September 02-05, 2024 in Calabria, Italy. ASONAM is an interdisciplinary venue that brings together practitioners and researchers from a variety of Social Network Analysis and Mining fields to promote collaborations and exchange of ideas and practices.

The ASONAM 2024 Committee invites proposals for workshops to be held on September 02-05, 2024 in conjunction with the main ASONAM 2024 conference. Workshops can be scheduled either for a full day (morning and afternoon) or for half a day.

Proposals should include the following information:
- The name of the workshop.
- The names and addresses of the organizers, and a designated contact person.
- Description of the workshop: abstract, objectives, relevance, and expected outcome.
- The names of program committee members and, if applicable, other potential applicants.
- A description of the plans for the workshop (e.g., program, keynotes, highlights, etc.).
- The expected number of attendees and the planned length of the workshop.
- A description of past versions of the workshop, including dates, organizers, submission and acceptance counts, attendance, sites, and any other relevant information.
Important dates:
- Workshop proposal deadline: March 20, 2024 11:59 PM AoE
- Workshop acceptance notification: April 10, 2024 11:59 PM AoE

When planning paper submission, reviewing, and final revisions in your proposal, please consider the following deadlines:
- Workshop paper submission deadline: June 10, 2024 11:59 PM AoE
- Workshop paper acceptance notification: July 10, 2024 11:59 PM AoE
- Workshop paper camera-ready deadline: July 18, 2024 11:59 PM AoE

Organizers of accepted proposals will be responsible for publicizing and running the workshop, including sending out calls for papers, reviewing submissions, producing the camera-ready workshop proceedings, and organizing the meeting days.

Submission Link: https://easychair.org/conferences/?conf=workshopsasonam2024

We look forward to your workshop proposals, which will help make ASONAM 2024 a success!

Kind Regards
Rajesh Sharma
Associate Professor, Head, Computational Social Science Lab,
Institute of Computer Science, University of Tartu, Estonia
https://rajeshsharma.cs.ut.ee/


WojoodNER 2024 The 2nd Arabic Named Entity Recognition Shared Task
by Mustafa Jarrar 28 Feb '24


WojoodNER 2024
The 2nd Arabic Named Entity Recognition Shared Task at ArabicNLP’24
https://dlnlp.ai/st/wojood/

We invite you to participate in the second shared task on named entity recognition in Arabic text. Participants will receive the new Wojood corpus (550K words plus fine-grained entity types). The competition has three tasks, and you may take part in any of them. One of the tasks concerns the war on Gaza, and participants may use external data for it.

Dataset: Wojood-Fine <https://aclanthology.org/2023.arabicnlp-1.25/>
New version: Arabic Fine-Grained Entity Recognition (Wojood + Subtypes of entity types).

Subtask-1 (Closed-Track Flat Fine-Grain NER): We provide the Wojood-Fine Flat train (70%) and development (10%) datasets. The final evaluation will be on the test set (20%). External data is not allowed .... (read more <https://dlnlp.ai/st/wojood/>).

Subtask-2 (Closed-Track Nested Fine-Grain NER): This subtask is similar to Subtask-1. We provide the Wojood-Fine Nested train (70%) and development (10%) datasets. The final evaluation will be on the test set (20%) .... (read more <https://dlnlp.ai/st/wojood/>).

Subtask-3 (Open-Track NER - Gaza War): This subtask lets participants reflect on the utility of NER in the context of real-world events, allows them to use external resources, and encourages them to use generative models in different ways (fine-tuned, zero-shot learning, in-context learning, etc.). The goal of focusing on generative models in this particular subtask is to help the Arabic NLP research community better understand the capabilities and performance gaps of LLMs in information extraction, an area currently understudied. We provide development and test data related to the current War on Gaza. This is motivated by the assumption that discourse about recent global events will involve mentions from a different data distribution. For this subtask, we include data from five different news domains related to the War on Gaza, but we keep the names of the domains hidden.
Participants will be given a development dataset (10K tokens, 2K from each of the five domains), and a testing dataset (50K tokens, 10K from each domain). Both development and testing sets are manually annotated with fine-grain named entities using the same annotation guidelines used in Subtask1 and Subtask2 (also described in Liqreina et al., 2023). .... (read more <https://dlnlp.ai/st/wojood/>).

BASELINES
Two baseline models trained on WojoodFine (flat and nested) are provided (see Liqreina et al., 2023 <https://aclanthology.org/2023.arabicnlp-1.25/>). The code used to produce these baselines is available on GitHub <https://github.com/SinaLab/ArabicNER>.

Subtask                              Precision  Recall  Average Micro-F1
Flat Fine-Grain NER (Subtask 1)      0.8870     0.8966  0.8917
Nested Fine-Grain NER (Subtask 2)    0.9179     0.9279  0.9229

GOOGLE COLAB NOTEBOOKS
To allow you to experiment with the baseline, we authored four Google Colab notebooks that demonstrate how to train and evaluate our baseline models.

[1] Train Flat Fine-Grain NER <https://gist.github.com/mohammedkhalilia/72c3261734d7715094089bdf4de74b4a>: This notebook can be used to train our ArabicNER model on the flat Fine-grain NER task using the sample Wojood_Fine data.
[2] Evaluate Flat Fine-Grain NER <https://gist.github.com/mohammedkhalilia/c807eb1ccb15416b187c32a362001665>: This notebook will use the trained model saved from the notebook above to perform evaluation on an unseen dataset.
[3] Train Nested Fine-Grain NER <https://gist.github.com/mohammedkhalilia/a4d83d4e43682d1efcdf299d41beb3da>: This notebook can be used to train our ArabicNER model on the nested Fine-grain task using the sample Wojood data.
[4] Evaluate Nested Fine-Grain NER <https://gist.github.com/mohammedkhalilia/9134510aa2684464f57de7934c97138b>: This notebook will use the trained model saved from the notebook above to perform evaluation on an unseen dataset.
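As a quick sanity check on the baseline figures, the following Python sketch recomputes F1 from the reported precision and recall. It assumes the reported micro-F1 is the harmonic mean of the reported micro-averaged precision and recall, which the announcement does not state explicitly; `f1_from` and `BASELINES` are illustrative names, and small differences in the last digit are expected because the inputs are already rounded.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported micro-F1) copied from the baseline figures above.
BASELINES = {
    "Flat Fine-Grain NER (Subtask 1)": (0.8870, 0.8966, 0.8917),
    "Nested Fine-Grain NER (Subtask 2)": (0.9179, 0.9279, 0.9229),
}

# Recompute F1 per subtask and keep the results for comparison.
recomputed = {name: f1_from(p, r) for name, (p, r, _reported) in BASELINES.items()}

for name, (_p, _r, reported) in BASELINES.items():
    print(f"{name}: recomputed {recomputed[name]:.4f}, reported {reported:.4f}")
```

The same helper can be applied to your own development-set precision and recall when comparing a system against these baselines.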
REGISTRATION
Participants need to register via this form (NERSharedTask 2024) <https://docs.google.com/forms/d/1ISMILgQYfUug3XuDpxFmuPASXkWaduYOUc3xOZuGwq…>. Participating teams will be provided with common training and development datasets. No external manually labelled datasets are allowed. A blind test set will be used to evaluate the output of the participating teams. Each team is allowed a maximum of 3 submissions. All teams are required to report on the development and test sets (after results are announced) in their write-ups.

FAQ
For any questions related to this task, please check our Frequently Asked Questions <https://docs.google.com/document/d/1W_13FRpP3NbDx_ALYJWA3-ESXPRVomOjNovUuYf…>

IMPORTANT DATES
- February 25, 2024: Shared task announcement.
- March 1, 2024: Release of training data, development sets, scoring script, and Codalab links.
- April 5, 2024: Registration deadline.
- April 26, 2024: Test set made available.
- May 3, 2024: Codalab Test system submission deadline.
- May 10, 2024: Shared task system paper submissions due.
- June 17, 2024: Notification of acceptance.
- July 1, 2024: Camera-ready version.
- August 16, 2024: ArabicNLP 2024 conference in Thailand.

CONTACT
For any questions related to this task, please contact the organizers directly using the following email address: NERSharedtask(a)gmail.com <mailto:NERSharedtask@gmail.com>.

ORGANIZERS
- Mustafa Jarrar, Birzeit University
- Muhammad Abdul-Mageed, University of British Columbia & MBZUAI
- Mohammed Khalilia, Birzeit University
- Bashar Talafha, University of British Columbia
- AbdelRahim Elmadany, University of British Columbia
- Nagham Hamad, Birzeit University

--Mustafa
__________________________
Mustafa Jarrar, PhD
Professor of Artificial Intelligence
Chair, PhD Program in Computer Science
Birzeit University, Palestine
Whatsapp:+972599662258 | mjarrar(a)birzeit.edu <mailto:mjarrar@birzeit.edu>
http://www.jarrar.info <http://www.jarrar.info/>


Extended Deadline [March 10, 2024] The First Workshop on Visualization for Natural Language Processing (Vis4NLP)
by Tariq Yousef 28 Feb '24


Apologies for cross-posting

Extended Deadline: March 10, 2024

The First Workshop on Visualization for Natural Language Processing (Vis4NLP)
May 27th 2024, Odense, Denmark

The workshop will be co-located with EuroVis 2024 <https://www.eurovis.org/eurovis> in Odense, Denmark, and will take place in person on May 27. The workshop aims to create a dedicated space for interdisciplinary collaboration at the intersection of NLP and visualization. Vis4NLP serves as a platform where researchers, practitioners, and academics come together to tackle the evolving challenges and opportunities in NLP visualization.

*Call for papers: http://vis4nlp.com/ <http://vis4nlp.com/>*
*Workshop date: May 27, 2024*
*Venue: Syddansk Universitet - University of Southern Denmark <http://sdu.dk/>*

Important Dates
- Workshop paper due: March 10, 2024 (extended from March 3, 2024)
- Notification of acceptance: April 10, 2024
- Camera-ready papers due: April 20, 2024
- Workshop date: May 27, 2024

All submission deadlines are at 23:59 GMT on the date indicated.

Best regards
*Tariq Yousef*
Assistant Professor of Data Science
Department of Mathematics and Computer Science
Faculty of Science
*University of Southern Denmark*


Final CfP: Games and NLP 2024 Workshop @ LREC-COLING
by Udo Kruschwitz 28 Feb '24


GAMES AND NLP 2024 @ LREC-COLING 2024
=====================================
Co-located with LREC-COLING in Turin, Italy
21st May 2024
https://gamesandnlp.com

*** Deadline extended: Mar 4th ***

Call for Papers
--------------------
The 10th Workshop on Games and Natural Language Processing (Games and NLP 2024), to be held at LREC-COLING 2024, will examine the use of games and gamification for Natural Language Processing (NLP) tasks, as well as how NLP research can advance player engagement and communication within games. The Games and NLP workshop aims to promote and explore the possibilities for research and practical applications of games and gamification that have a core NLP aspect, either to generate resources and perform language tasks or as a game mechanic itself.

This workshop investigates computational and theoretical aspects of natural language research that would be beneficial for designing and building novel game experiences, or for processing texts to conduct formal game studies. NLP would benefit from games in obtaining language resources (e.g., construction of a thesaurus or a parser through a crowdsourcing game), or in learning the linguistic characteristics of game users as compared to those of other domains.
Topics (include, but are not limited to)
--------------------------------------------------
• Games for collecting data useful for NLP
• Gamification of NLP tasks
• Player motivation and experience
• Game design
• Novel uses of natural language processing or generation as a game mechanic
• Natural language in games as an alternative method of input for people with disabilities
• Processing NLP game data
• Analysis of large-scale game-related corpora
• Real-time sentiment analysis of player discourse or chat
• Evaluation of games for NLP
• Serious games for learning languages
• Player immersion in language-enabled mixed reality or physically embodied games
• Narrative plot or text generation of text-based interactive narrative systems
• Natural language understanding and generation of character dialogue
• Ethical and privacy concerns of ownership of text and audio chat in massively multiplayer online games

Submissions:
------------------
Papers should be submitted as a PDF document, conforming to the formatting guidelines provided in the call for papers of the LREC-COLING conference (https://lrec-coling-2024.org/authors-kit/). Submissions are to be made via Softconf/START Conference Manager at https://softconf.com/lrec-coling2024/gamesandnlp2024/

Important Dates
---------------------
• Submission Deadline: Mar 4th (*** extended ***)
• Notification of Acceptance: Mar 26th
• Camera Ready Deadline: Apr 1st
• Workshop: May 21st

Organisation Committee
--------------------------------
• Chris Madge, chair (Queen Mary University of London)
• Jon Chamberlain (University of Essex, UK)
• Karën Fort (Sorbonne Université, France)
• Udo Kruschwitz (University of Regensburg, Germany)
• Stephanie Lukin (U.S. Army Research Laboratory)

Programme Committee
-------------------------------
• Alice Millour (Sorbonne Université)
• Brent Harrison (University of Kentucky, US)
• Ian Horswill (Northwestern University)
• Jonathan Lessard (Université Concordia)
• Luisa Coheur (INESC-ID & Instituto Superior Técnico, University of Lisbon)
• Mariët Theune (University of Twente)
• Massimo Poesio (Queen Mary University, UK)
• Mathieu Lafourcade (LIRMM, France)
• Morteza Behrooz (University of California, Santa Cruz, US)
• Pedro Santos (INESC-ID & Instituto Superior Técnico, University of Lisbon)
• Richard Bartle (University of Essex, UK)
• Seth Cooper (Northeastern University, US)
• Valerio Basile (University of Turin, Italy)
• Fatima Althani (Queen Mary University, UK)


Final CFP and deadline extension: 6th Workshop on Open-Source Arabic Corpora and Processing Tools (Hybrid) with shared tasks @LREC-COLING 2024 in Turin (Italy)
by m.zakiali80＠gmail.com 27 Feb '24


**The 6th Workshop on Open-Source Arabic Corpora and Processing Tools (Hybrid) with shared tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation**

The workshop will be conducted in a *hybrid* format to ensure maximum participation, accommodating attendees both online and in person.

Submission deadline: extended to *March 1*, 2024

*Workshop site*: https://osact-lrec.github.io/

*Shared tasks:*
Task 1: Arabic LLMs Hallucination (contact Hamdy Mubarak), Link: https://sites.google.com/view/arabic-llms-hallucination
Task 2: Dialect to MSA Machine Translation (contact Kareem Darwish), Link: https://codalab.lisn.upsaclay.fr/competitions/17118

*Co-located with LREC-COLING 2024*
https://lrec-coling-2024.org/
Turin, Italy, 20-25 May 2024

*Important Dates*
Submission deadline: extended to *March 1*, 2024
Notification of acceptance: March 25, 2024
Camera-ready papers due: March 30, 2024
Workshop date: May 25, 2024

*Workshop Description*
In the computational linguistics (CL), natural language processing (NLP), and information retrieval (IR) communities, Arabic is considered to be relatively resource-poor compared to English. This situation was thought to be the reason for the limited number of language-resource-based studies in Arabic. However, the past few years have witnessed the emergence of new, considerably large and free classical and Modern Standard Arabic (MSA) corpora, as well as dialectal corpora and, to a lesser extent, Arabic processing tools.

This workshop follows in the footsteps of previous editions of OSACT to provide a forum for researchers to share and discuss their ongoing work. This workshop is timely given the continued rise in research projects focusing on Arabic Language Resources. The sixth workshop encourages researchers and practitioners of Arabic language technologies, including CL, NLP and IR, to share and discuss their latest research efforts, corpora, and tools.
The workshop will also give special attention to Large Language Models (LLMs) and Generative AI, which is a hot topic nowadays. In addition to the general topics of CL, NLP and IR, the workshop will place special emphasis on two shared tasks, namely Arabic LLMs Hallucination and Dialect to MSA Machine Translation.

*Submissions Topics*

Language Resources:
- Pre-trained Arabic language models and their applications.
- Surveying and evaluating the design of available Arabic corpora and their associated processing tools.
- Availing new annotated corpora for NLP and IR applications such as named entity recognition, machine translation, sentiment analysis, text classification, and language learning.
- Evaluating the use of crowdsourcing platforms for Arabic data annotation.
- Open source Arabic processing toolkits.

Tools and Technologies:
- Language education, e.g., L1 and L2.
- Language modeling and pre-trained models.
- Tokenization, normalization, word segmentation, morphological analysis, part-of-speech tagging, etc.
- Sentiment analysis, dialect identification, and text classification.
- Dialect translation.
- Fake news detection.
- Web and social media search and analytics.

Issues in the design, construction, and use of Arabic LRs (text, speech, sign, gesture, image, in single or multimodal/multimedia data):
- Guidelines, standards, best practices, and models for LRs interoperability.
- Methodologies and tools for LRs construction and annotation.
- Methodologies and tools for extraction and acquisition of knowledge.
- Ontologies, terminology, and knowledge representation.
- LRs and Semantic Web (including Linked Data, Knowledge Graphs, etc.).

*Submissions*
- Submission Instructions: https://lrec-coling-2024.org/authors-kit/
- Submission Link: https://softconf.com/lrec-coling2024/osact2024/

*Workshop organizers*
- Hend Al-Khalifa (King Saud University, KSA)
- Hamdy Mubarak (Qatar Computing Research Institute, Qatar)
- Kareem Darwish (aiXplain Inc., US)
- Tamer Elsayed (Qatar University, Qatar)
- Mona Ali (Northeastern University, Canada)

