[CFP] MAHED 2025: the First Shared Task on Multimodal Detection of Hope and Hate Emotions in Arabic Content
by Wajdi Zaghouani 22 Jun '25

We are pleased to announce MAHED 2025, the first multimodal shared task dedicated to Hope and Hate Detection in Arabic content. This novel multimodal challenge will be co-located with EMNLP 2025 at the ArabicNLP 2025 Conference.

MAHED 2025 addresses critical real-world challenges in Arabic natural language processing by focusing on the detection of hate speech, hope speech, and emotions in both Arabic text and memes. This shared task aims to advance research in ethical AI while addressing the linguistic diversity and dialectal variations inherent in Arabic content.

The shared task comprises three subtasks:
* Task 1: Text-based Hope & Hate Speech Classification. Participants will develop models to classify Arabic text as containing hope speech, hate speech, or neutral content.
* Task 2: Multitask Learning for Emotion, Offensive Content, and Hate Detection. This task involves simultaneous detection of emotions, offensive language, and hate speech in Arabic text.
* Task 3: Multimodal Hateful Meme Detection. Participants will work with Arabic memes to detect hateful content using both textual and visual modalities.

Registration Links:
* Task 1: https://www.codabench.org/competitions/9136/
* Task 2: https://www.codabench.org/competitions/9166/
* Task 3: https://www.codabench.org/competitions/9192/

Important Dates:
* June 10, 2025: Training data and evaluation scripts released
* July 20, 2025: Final registration deadline and test set release
* July 25, 2025: Test submission deadline
* November 5-9, 2025: ArabicNLP 2025 Workshop at EMNLP 2025, Suzhou, China

Resources and Registration:
* Website: https://marsadlab.github.io/mahed2025/
* Dataset and Code: https://github.com/marsadlab/MAHED2025Dataset

The 16th IEEE International Conference on Knowledge Graphs (ICKG 2025): Last Call for Papers
by Announce 22 Jun '25

*** Last Call for Papers ***

The 16th IEEE International Conference on Knowledge Graphs (ICKG 2025)
November 13-14, 2025, 5* St. Raphael Resort and Marina, Limassol, Cyprus
https://cyprusconferences.org/ickg2025/

(*** Proceedings to be published by IEEE ***)
(*** Submission Deadline: July 4, 2025 AoE (extended and firm!) ***)

The annual IEEE International Conference on Knowledge Graph (ICKG) provides a premier international forum for presentation of original research results in knowledge discovery and graph learning, discussion of opportunities and challenges, as well as exchange and dissemination of innovative, practical development experiences. The conference covers all aspects of knowledge discovery from data, with a strong focus on graph learning and knowledge graphs, including algorithms, software, and platforms.

ICKG 2025 intends to draw researchers and application developers from a wide range of areas such as knowledge engineering, representation learning, big data analytics, statistics, machine learning, pattern recognition, data mining, knowledge visualization, high performance computing, and the World Wide Web, by promoting novel, high-quality research findings and innovative solutions that address the challenges of learning from data with dependency relationships.

All accepted papers will be published in the conference proceedings by the IEEE Computer Society. Awards, including Best Paper, Best Paper Runner up, Best Student Paper, and Best Student Paper Runner up, will be conferred at the conference, with a check and a certificate for each award. The conference also features a survey track to accept survey papers reviewing recent studies in all aspects of knowledge discovery and graph learning.

At least five high quality papers will be invited for a special issue of the Knowledge and Information Systems Journal, in an expanded and revised form. In addition, at least eight quality papers will be invited for a special issue of Data Intelligence Journal in an expanded and revised form with at least 30% difference.

TOPICS OF INTEREST

Topics of interest include, but are not limited to:
• Foundations, algorithms, models, and theory of knowledge discovery and graph learning
• Knowledge engineering with big data
• Machine learning, data mining, and statistical methods for data science and engineering
• Acquisition, representation and evolution of fragmented knowledge
• Fragmented knowledge modeling and online learning
• Knowledge graphs and knowledge maps
• Graph learning security, privacy, fairness, and trust
• Interpretation, rule, and relationship discovery in graph learning
• Geospatial and temporal knowledge discovery and graph learning
• Ontologies and reasoning
• Topology and fusion on fragmented knowledge
• Visualization, personalization, and recommendation of Knowledge Graph navigation and interaction
• Knowledge Graph systems and platforms, and their efficiency, scalability, and privacy
• Applications and services of knowledge discovery and graph learning in all domains including web, medicine, education, healthcare, and business
• Big knowledge systems and applications
• Crowdsourcing, deep learning and edge computing for graph mining
• Large language models and applications
• Open source platforms and systems supporting knowledge and graph learning
• Datasets and benchmarks for graphs
• Neurosymbolic & Hybrid AI systems
• Graph Retrieval Augmented Generation

SURVEY TRACK

Survey papers reviewing recent studies in key aspects of knowledge discovery and graph learning.

In addition to the above topics, authors can also select and target the following Special Track topics. Each special track is handled by its respective special track chairs, and the papers are also included in the conference proceedings.
• Special Track 01: KGC and Knowledge Graph Building
• Special Track 02: KR and KG Reasoning
• Special Track 03: KG and Large Language Model
• Special Track 04: GNN and Graph Learning
• Special Track 05: QA and Graph Database
• Special Track 06: KG and Multi-modal Learning
• Special Track 07: KG and Knowledge Fusion
• Special Track 08: Industry and Applications

SUBMISSION GUIDELINES

Paper submissions should be no longer than 8 pages, in the IEEE 2-column format, including the bibliography and any possible appendices. Submissions longer than 8 pages will be rejected without review. All submissions will be reviewed by the Program Committee based on technical quality, originality, significance, and clarity.

For survey track papers, please preface the descriptive paper title with “Survey:”, followed by the actual paper title. For example, a paper entitled “A Literature Review of Streaming Knowledge Graph” should be changed to “Survey: A Literature Review of Streaming Knowledge Graph”. This allows the reviewers and chairs to clearly bid on and handle the papers. Once the paper is accepted, the prefix, such as “Survey:”, can be removed from the camera-ready copy.

For special track papers, please preface the descriptive paper title with “SS##:”, where “##” is the two-digit special track ID. For example, a paper entitled “Incremental Knowledge Graph Learning”, intended to target Special Track 01 (KGC and Knowledge Graph Building), should be changed to “SS01: Incremental Knowledge Graph Learning”.

All manuscripts are submitted as full papers and are reviewed based on their scientific merit. The reviewing process is single blind, meaning that each submission should list all authors and affiliations. There is no separate abstract submission step. There are no separate industrial, application, or poster tracks.

Manuscripts must be submitted electronically in the online submission system. No email submission is accepted. To help ensure correct formatting, please use the style files for U.S. Letter as a template for your submission. These include LaTeX and Word.

SUBMISSION LINK

https://wi-lab.com/cyberchair/2025/ickg25/

IMPORTANT DATES

• Paper submission (abstract and full paper): July 4, 2025 (AoE) (extended and firm!)
• Notification of acceptance/rejection: September 5, 2025
• Camera-ready, copyright forms and author registration: September 20, 2025
• Early (non-author) registration: October 10, 2025
• Conference dates: November 13-14, 2025

ORGANISATION

Conference and Local Organising Chair
• George A. Papadopoulos, University of Cyprus

Conference Co-Chair
• Dan Guo, Hefei University of Technology

Program Chairs
• Cesare Alippi, Università della Svizzera italiana
• Shirui Pan, Griffith University

Local Organising Vice Chair
• Irene Kinlanioti, National Technical University of Athens

Finance Chair
• Constantinos Pattichis, University of Cyprus

Steering Committee Chair
• Xindong Wu, Hefei University of Technology

Call for Abstracts – NARNiHS 2026 – 8th Annual Meeting of the North American Research Network in Historical Sociolinguistics
by Lauersdorf, Mark R. 21 Jun '25

*** NARNiHS 2026
*** North American Research Network in Historical Sociolinguistics
*** Eighth Annual Meeting
*** 100% IN PERSON
*** Co-Located with the Linguistic Society of America (LSA) Annual Meeting
*** New Orleans, Louisiana USA
*** 8-11 January 2026

This event offers an opportunity for historical sociolinguistics scholars from all over the world to gather and share leading research. We encourage our fellow historical sociolinguists and scholars in related fields from our global scholarly community to **join us in New Orleans** for our Eighth Annual Meeting.

Consult this Call for Abstracts on the web: https://narnihs.org/?page_id=3135 .

--------------- Call for Abstracts ---------------

Abstract submission online: https://easyabs.linguistlist.org/conference/NARNiHS_26/ .
Deadline: Friday, 15 August 2025, 11:59 PM US Eastern Time.
Late abstracts will not be considered.

The North American Research Network in Historical Sociolinguistics (NARNiHS) is accepting abstracts for its Eighth Annual Meeting in New Orleans, Thursday, January 8 -- Sunday, January 11, 2026.

The 8th edition of this inclusive NARNiHS event seeks to provide a collaborative environment where presenters bring fully developed work for presentation and enrichment. We see the NARNiHS Annual Meeting as a place for showcasing excellent projects in historical sociolinguistics, seeking feedback from peers, and engaging in productive development of the field’s enduring questions.

NARNiHS welcomes papers in all areas of historical sociolinguistics, which is understood as the application and/or development of sociolinguistic theories, methods, and models for the study of historical language variation and change over time, or more broadly, the study of the interaction of language and society in historical periods and from historical perspectives. Thus, a wide range of linguistic areas, subdisciplines, methodologies, and adjacent disciplines easily find their place within historical sociolinguistics, and we encourage submission of abstracts that reflect this broad scope.

Abstracts will be accepted for both 20-minute papers and posters. Please note that, at the NARNiHS annual meeting, poster presentations are an integral part of the conference (not second-tier presentations). Abstracts will be assigned a paper or a poster presentation based on determinations in the review process about the most effective format for the submission. However, if you prefer that your submission be considered primarily for poster presentation, please specify this in your abstract.

Successful abstracts will demonstrate *thorough grounding* in historical sociolinguistics, *scientific rigor* in the formulation of research questions, and promise for rich discussion of ideas. Successful abstracts will be explicit about which *theoretical frameworks*, *methodological protocols*, and *analytical strategies* are being applied or critiqued. *Data sources and examples* should be sufficiently presented, so as to allow reviewers a full understanding of the scope and claims of the research. Please note that the *connection of your research to the field of historical sociolinguistics* should be explicitly outlined in your abstract. Failure to adhere to these criteria will likely result in rejection.

*** Abstract Format Guidelines ***

- Abstracts must be submitted in PDF format.
- Abstracts must fit on one 8.5x11 inch page, with margins no smaller than 1 inch and a font style and size no smaller than Times New Roman 12 point. You are encouraged to use the entire page, providing a full and robust description of the research. All additional supporting content (visualizations, trees, tables, figures, captions, examples, and references) must fit on a single (1) additional page. No exceptions to these requirements are allowed; abstracts longer than one page or with more than one additional page of supporting content will be rejected without review.
- Specify if you prefer your submission be considered primarily for a poster presentation.
- Anonymize your abstract. We realize that sometimes complete anonymity is not attainable, but there is a difference between the nature of the research creating an inability to anonymize and careless non-anonymizing (in citations, references, file names, etc.). Be sure to anonymize your PDF file (you may do so in Adobe Acrobat Reader by clicking on "File", then "Properties", removing your name if it appears in the "Author" line of the "Description" tab, and re-saving the file before submission). Do not use your name when saving your PDF (e.g. Smith_Abstract.pdf); file names will not be automatically anonymized by the EasyAbs system. Rather, use non-identifying information in your file name (e.g. HistSoc4Lyfe.pdf). Your name should only appear in the online form accompanying your abstract submission. Papers that are not sufficiently anonymized wherever possible will be rejected without review.

*** General Requirements ***

- Abstracts must be submitted electronically using the following link: https://easyabs.linguistlist.org/conference/NARNiHS_26/ .
- Authors may submit a maximum of two abstracts: one single-author abstract and one co-authored abstract.
- Authors may not submit identical abstracts for presentation at the NARNiHS annual meeting and the LSA annual meeting or another LSA sister society meeting (ADS, ANS, NAHoLS, SCiL, SPCL, or SSILA).
- After submission, no changes of author, title, or wording of the abstract may occur. If your abstract is accepted, adjustment of typographical errors is permitted before a final version of the abstract is printed in the conference booklet.
- Papers and posters must be delivered as projected in the abstract or represent bona fide developments of the same research.
- Authors are expected to attend the conference in person and present their own papers and posters. This will not be a hybrid event.

Contact us at NARNiHistSoc(a)gmail.com with any questions.

Deadline extension: The 1st Workshop on Large Language Models for Cross-Temporal Research at COLM 2025
by wei.zhao＠abdn.ac.uk 20 Jun '25

We invite you to submit your ongoing, published or pre-reviewed works to our workshop on Large Language Models for Cross-Temporal Research (XTempLLMs) at COLM 2025. Our workshop website is available at https://xtempllms.github.io/2025/

*The deadline for submission has been extended to June 30, 2025 AOE*

Workshop Description:

Large language models (LLMs) have been used for a variety of time-sensitive applications such as temporal reasoning, forecasting and planning. In addition, there has been a growing number of interdisciplinary works that use LLMs for cross-temporal research in several domains, including social science, psychology, cognitive science, environmental science and clinical studies. However, LLMs are hindered in their understanding of time for many reasons, including temporal biases and knowledge conflicts in pretraining and RAG data, as well as a fundamental limitation in LLM tokenization that fragments a date into several meaningless subtokens. Such inadequate understanding of time would lead to inaccurate reasoning, forecasting and planning, and to time-sensitive findings that are potentially misleading.

Our workshop looks for (i) cross-temporal work in the NLP community and (ii) interdisciplinary work that relies on LLMs for cross-temporal studies.

Cross-temporal work in the NLP community:
* Novel benchmarks for evaluating the temporal abilities of LLMs across diverse date and time formats, culturally grounded time systems, and generalization to future contexts;
* Novel methods (e.g., neuro-symbolic approaches) for developing temporally robust, unbiased, and reliable LLMs;
* Data analysis such as the distribution of pretraining data over time and conflicting knowledge in pretraining and RAG data;
* Interpretability regarding how temporal information is processed from tokenization to embedding across different layers, and finally to model output;
* Temporal applications such as reasoning, forecasting and planning;
* Consideration of cross-lingual and cross-cultural perspectives for linguistic and cultural inclusion over time.

Interdisciplinary work that relies on LLMs for cross-temporal studies:
* Time-sensitive discoveries, such as social biases over time and personality testing over time;
* Assessment of time-sensitive discoveries to identify misleading findings if any;
* Interdisciplinary evaluation benchmarks for LLMs’ temporal abilities, e.g., psychological time perception and episodic memory evaluation.

Submission Modes:
* Standard submissions: We invite the submission of papers that will receive up to three double-blind reviews from the XTempLLMs committee, and a final decision of acceptance from the workshop chairs.
* Pre-reviewed submissions: We invite unpublished papers that have already been reviewed either through ACL ARR, or recent AACL/EACL/ACL/EMNLP/COLING venues. These papers will not receive new reviews but will be judged together with their reviews via a meta-review from the workshop chairs.
* Published papers: We invite papers that have been published recently elsewhere to present at XTempLLMs. Please send the details of your paper (paper title, authors, publication venue, abstract, and a link to download the paper) directly to xtempllms(a)gmail.com. This allows such papers to gain more visibility from the workshop audience.

All deadlines are 11.59 pm UTC -12h (“Anywhere on Earth”):
* June 30, 2025: Submission deadline (standard and published papers)
* July 18, 2025: Submission deadline for papers with ARR reviews
* July 24, 2025: Notification of acceptance
* October 10, 2025: Workshop day

Invited Speakers:
* Jose Camacho Collados, Cardiff University, United Kingdom
* Ali Emami, Brock University, Canada
* Alexis Huet, Huawei Technologies, France
* Bahare Fatemi, Google Research, Canada
* Vivek Gupta, Arizona State University, United States

Organizing Committee:
* Wei Zhao, University of Aberdeen, United Kingdom
* Maxime Peyrard, Université Grenoble Alpes & CNRS, France
* Katja Markert, Heidelberg University, Germany

LREC 2026 First Call for Papers
by info＠elda.org 19 Jun '25

[Apologies for cross-postings]

FIRST CALL FOR PAPERS

LREC 2026
Organised by the ELRA Language Resources Association
Palma, Mallorca, Spain
11-16 May 2026

The Fifteenth biennial Language Resources and Evaluation Conference (LREC) will be held at the Palau de Congressos de Palma in Palma, Mallorca, Spain, on 11-16 May 2026.

LREC serves as the primary forum for presentations describing the development, dissemination, and use of language resources involving both traditional and recently developed approaches.

The scientific program will include invited talks, oral presentations, and poster and demo presentations, as well as a keynote address by the winner of the Antonio Zampolli Prize.

Submissions describing all aspects of language resource development and use are invited, including, but not limited to, the following:

Language Resource Development
- Methods and tools for mono- and multi-lingual language resource development and annotation
- Knowledge discovery/representation (knowledge graphs, linked data, terminologies, lexicons, ontologies, etc.)
- Resource development for less-resourced/endangered languages
- Guidelines, standards, best practices, and models for interoperability

Language Resource Use
- Use of language resources in systems and applications for any area of language and speech processing
- Use of language resources in assistive technologies, support for accessibility
- Efficient/low-resource methods for language and speech processing

Evaluation
- Methodologies and protocols for evaluation and benchmarking of language technologies
- Measures for validation of language resources and quality assurance
- Usability of user interfaces and dialogue systems
- Bias, safety, and user satisfaction metrics
- Interpretability/explainability of language models and language and speech processing tools

Language Resources and Large Language Models
- Language resource development for LLMs (monolingual, multilingual, multimodal)
- (Semi-)automatic generation of training data
- Training, fine-tuning, adaptation, alignment, and representation learning
- Guardrails, filters, and modules for generative AI models

Policy and Organizational Considerations
- International and national activities, projects, initiatives, and policies
- Language coverage and diversity
- Replicability and reproducibility
- Organisational, economic, ethical, climate, and legal issues

Separate calls will be issued for Workshops, Tutorials and Industry Track.

Submission

Submissions should be 4 to 8 pages in length (excluding references) and follow the LREC stylesheet, which will soon be available on the conference website.

At the time of submission, authors are offered the opportunity to share related language resources with the community. All repository entries are linked to the LRE Map [https://lremap.elra.info/], which provides metadata for the resource.

Accepted papers will appear in the conference proceedings, which include both oral and poster papers in the same format. Determination of the presentation format (oral vs. poster) is based solely on an assessment of the optimal method of communication (more or less interactive), given the paper content.

Important dates
(All deadlines are 11:59PM UTC-12:00, “anywhere on Earth”)
- Oral and poster (or poster+demo) paper submission: 17 October 2025
- Notification of acceptance: 13 February 2026
- Camera Ready due: 6 March 2026
- Workshop and tutorial proposals submission: 17 October 2025
- LREC 2026 conference: 11-16 May 2026

More information on LREC 2026: https://lrec2026.info/
Contact: info(a)lrec2026.info

Deadline extension: First Workshop on Optimal Reliance and Accountability in Interactions with Generative Language Models
by Nikhil Krishnaswamy 18 Jun '25

The First Workshop on Optimal Reliance and Accountability in Interactions with Generative Language Models (*ORIGen*) will be held in conjunction with the Second Conference on Language Modeling (COLM) at the Palais des Congrès in Montreal, Quebec, Canada, on October 10, 2025!

*The deadline for submission has been extended to June 27, 2025, Anywhere on Earth.*

With the rapid integration of generative AI, exemplified by large language models (LLMs), into personal, educational, business, and even governmental workflows, such systems are increasingly being treated as “collaborators” with humans. In such scenarios, underreliance on or avoidance of AI assistance may forfeit the potential speed, efficiency, or scalability advantages of a human-LLM team; at the same time, there is a risk that subject matter non-experts may overrely on LLMs and trust their outputs uncritically, with consequences ranging from the inconvenient to the catastrophic. Therefore, establishing optimal levels of reliance within an interactive framework is a critical open challenge as language models and related AI technologies rapidly advance.

* What factors influence overreliance on LLMs?
* How can the consequences of overreliance be predicted and guarded against?
* What verifiable methods can be used to apportion accountability for the outcomes of human-LLM interactions?
* What methods can be used to imbue such interactions with appropriate levels of “friction” to ensure that humans think through the decisions they make with LLMs in the loop?

The ORIGen workshop provides a new venue to address these questions and more through a multidisciplinary lens. We seek to bring together broad perspectives from AI, NLP, HCI, cognitive science, psychology, and education to highlight the importance of mediating human-LLM interactions to mitigate overreliance and promote accountability in collaborative human-AI decision-making.

Submissions are due *June 27, 2025*. Please see our call for papers [1] for more!

[1] https://origen-workshop.github.io/submissions/

Organizers:
- Nikhil Krishnaswamy, Colorado State University
- James Pustejovsky, Brandeis University
- Dilek Hakkani-Tür, University of Illinois Urbana Champaign
- Vasanth Sarathy, Tufts University
- Tejas Srinivasan, University of Southern California
- Mariah Bradford, Colorado State University
- Timothy Obiso, Brandeis University
- Mert Inan, Northeastern University

Job Offers: Senior and Junior AI & Language Technologies Specialists – EUSKORPORA (San Sebastián, Spain)
by info 18 Jun '25

Dear colleagues,

EUSKORPORA, a newly created Linguistic Data Center for Basque digital technologies based in San Sebastián (Donostia), Spain, is seeking candidates for two key roles in its Technology area:

1) Senior AI and Language Technologies Specialist
2) Junior AI and Language Technologies Specialist

Both positions are part of the Center's mission to position the Basque language in the global digital space through open-source development and cutting-edge research.

=== SENIOR AI AND LANGUAGE TECHNOLOGIES SPECIALIST ===

EUSKORPORA, the Linguistic Data Center for Basque Digital Technologies, a new association based in Donostia/San Sebastián, is seeking an experienced senior expert in AI technologies applied to natural language processing to lead key tasks related to language technologies applied to the Basque language.

The selected person will be part of an interdisciplinary team and will participate in projects involving the collection, analysis, and annotation of linguistic data, as well as the development of open-source foundational language models (ASR, TTS, MT, NLP) oriented to Basque, in a research and development context closely connected to industry.

Responsibilities:
- Supervise and optimize processes for linguistic corpus collection, annotation, and management
- Lead the design and development of foundational language models applied to Basque (speech recognition, synthesis, translation, text processing, etc.)
- Contribute to the technological architecture of the Center
- Coordinate internal and external teams and mentor junior staff
- Identify innovation opportunities and contribute to proposals, reports, and dissemination
- Establish strategic relationships with ecosystem stakeholders

Requirements:
- Advanced degree (Master or PhD) in Computational Linguistics, NLP, AI, Computer Engineering, Data Science or related fields
- Minimum 5 years of experience in language or speech technologies
- Proven experience with ASR, TTS, MT, or NLP models
- Strong programming skills in Python and familiarity with frameworks such as Hugging Face, PyTorch, TensorFlow, spaCy, Kaldi, ESPnet, Fairseq
- Knowledge of MLOps, Git, and data science best practices
- Familiarity with open repositories and licensing

Languages:
- Basque: desirable, intermediate level (B2 or higher)
- Spanish: fluent
- English: high level (especially technical)

We offer:
- Participation in strategic national and international projects
- Competitive salary according to experience
- Interdisciplinary environment and opportunities for professional growth

=== JUNIOR AI AND LANGUAGE TECHNOLOGIES SPECIALIST ===

EUSKORPORA, the Linguistic Data Center for Basque Digital Technologies, a new association based in Donostia/San Sebastián, is seeking young professionals at the beginning of their careers to support key tasks related to the creation of linguistic resources and language technologies for the Basque language.

Selected individuals will join an interdisciplinary team and participate in projects involving the collection, annotation, and analysis of linguistic data, as well as the development of open-source foundational language models (ASR, TTS, MT, NLP) oriented to Basque, in a research and development context closely connected to industry.

Responsibilities:
- Support the collection, cleaning and annotation of linguistic corpora (text and audio)
- Assist in the training and evaluation of language and speech models
- Collaborate in the documentation and maintenance of language resources
- Contribute to the integration of open-source NLP tools and libraries
- Assist in reports and dissemination activities
- Work in coordination with technical, linguistic and project management profiles

Requirements:
- Degree or Master in Computational Linguistics, Computer Engineering, Data Science, or similar
- Basic knowledge of NLP, language models, or speech technologies
- Python programming (basic/intermediate level)
- Familiarity with linguistic annotation or text processing tools
- Experience with Git and frameworks like Hugging Face or spaCy is a plus

Languages:
- Basque: high level (B2 or higher)
- Spanish: fluent
- English: high level (B2 or higher)

We offer:
- Dynamic and innovative environment based in San Sebastián
- Continuous training in cutting-edge technologies
- Real opportunities for growth within the team
- Competitive salary according to training and experience

For further information or to apply, please contact: info(a)euskorpora.eus

Best regards,
EUSKORPORA

Euskorpora
https://www.euskorpora.eus/
info(a)euskorpora.eus
+(34) 611 02 81 72

Second CfP [New Dates]: Interdisciplinary Workshop on Observations of Misunderstood, Misguided and Malicious Use of Language Models (OMMM 2025)
by Piotr Przybyła 18 Jun '25

We are pleased to invite submissions for the first Interdisciplinary Workshop on Observations of Misunderstood, Misguided and Malicious Use of Language Models (OMMM 2025). The workshop will be held with the RANLP 2025 conference in Varna, Bulgaria, on 11-13 September 2025.

Overview

The use of Large Language Models (LLMs) pervades scientific practices in multiple disciplines beyond the NLP/AI communities. Alongside benefits for productivity and discovery, widespread use often entails misuse due to misalignment of values, lack of knowledge, or, more rarely, malice. LLM misuse has the potential to cause real harm in a variety of settings.

Through this workshop, we aim to gather researchers interested in identifying and mitigating inappropriate and harmful uses of LLMs. These include misunderstood usage (e.g., misrepresentation of LLMs in the scientific literature); misguided usage (e.g., deployment of LLMs without adequate training or privacy safeguards); and malicious usage (e.g., generation of misinformation and plagiarism). Sample topics are listed below, but we welcome submissions on any domain related to the scope of the workshop.

Important Dates

Submission deadline *[NEW]*: *15 July 2025*, at 23:59 Anywhere on Earth
Notification of acceptance: 01 August 2025
Camera-ready papers due: 30 August 2025
Workshop dates: September 11, 12, or 13, 2025

Submission Guidelines

Submissions will be accepted as short papers (4 pages) and as long papers (8 pages), plus additional pages for references. All submissions undergo a double-blind review, so they should not include any identifying information. Submissions should conform to the RANLP guidelines; for further information and templates, please see https://ranlp.org/ranlp2025/index.php/submissions/

We welcome submissions from diverse disciplines, including NLP and AI, psychology, HCI, and philosophy. We particularly encourage reports on negative results that provide interesting perspectives on relevant topics.

In-person presenters will be prioritised when selecting submissions to be presented at the workshop, but the workshop will take place in a hybrid format. Accepted papers will be included in the workshop proceedings in the ACL Anthology.

Papers should be submitted on the RANLP conference system at https://softconf.com/ranlp25/OMMM2025/

Keynote Speaker

We are excited to have Dr. Stefania Druga as the keynote speaker for the inaugural OMMM workshop. Dr. Druga is a Research Scientist at Google DeepMind, where she designs novel multimodal AI applications.

Topics of Interest

We welcome paper submissions on all topics related to inappropriate and harmful uses of LLMs, including but not limited to:

- Misunderstood use (and how to improve understanding):
  - Misrepresentation of LLMs (e.g., anthropomorphic language)
  - Attribution of consciousness
  - Interpretability
  - Overreliance on LLMs
- Misguided use (and how to find alternatives):
  - Underperformance and inappropriate applications
  - Structural limitations and ethical considerations
  - Deployment without proper training or safeguards
- Malicious use (and how to mitigate it):
  - Adversarial attacks, jailbreaking
  - Detection and watermarking of machine-generated content
  - Generation of misinformation or plagiarism
  - Bias mitigation and trust design

For more information, please refer to the workshop website: https://ommm-workshop.github.io/2025/. For any questions, please contact the organisers at ommm-workshop(a)googlegroups.com.

The organisers,
Piotr Przybyła, Universitat Pompeu Fabra
Matthew Shardlow, Manchester Metropolitan University
Clara Colombatto, University of Waterloo
Nanna Inie, IT University of Copenhagen


CfP: Terminology Translation Task @WMT25
by Kirill Semenov 18 Jun '25

18 Jun '25

[Apologies for cross-posting]

Terminology Translation Task at WMT2025 - Call for Participation

We are excited to announce the third Shared Task on Terminology Translation<https://www2.statmt.org/wmt25/terminology.html>, which will be run as part of the 10th Conference on Machine Translation (WMT2025) in Suzhou, China.

TL;DR:
- We test sentence-level and document-level translation of texts in the finance and IT domains, given explicit terminology.
- The language pairs are: English -> {Spanish, German, Russian, Chinese}, Chinese -> English.
- We evaluate the overall quality of translation, terminology success rate and consistency. Additionally, we compare the performance of systems given no terminology, proper terminology and random terms.
- The task starts on 20th June 2025 AOE; the submission deadline is 20th July 2025 AOE.
- Please pre-register via Google Forms here: https://forms.gle/ZSn2pNJkQJAzHFnA6 .

OVERVIEW

Over the last decade, advances in neural MT and LLM-assisted translation have reached nearly human quality in general-domain translation, at least for high-resource languages. However, when it comes to specialized domains like science, finance, or legal texts, where the correct and consistent use of special terms is crucial, the task is far from being solved. The Terminology Shared Task aims to assess the extent to which machine translation models can utilize additional information regarding the translation of terminologies. Compared to the two previous editions (2021 and 2023), the new test data cover more varied test cases, are more consistent in domains for each translation direction, and are broader in language coverage.

TASK DESCRIPTION

Track №1: Sentence/Paragraph-Level Translation

You will be provided with a sequence of input sentences and small terminology dictionaries that correspond only to the terms present in the given sentence.

Language Pairs:
* en-de (English → German)
* en-ru (English → Russian)
* en-es (English → Spanish)

Domain: information technology

Track №2: Document-Level Translation

The setup is similar to Track №1, with two exceptions: the input texts are now whole documents, and the dictionaries correspond to the whole set of input texts (i.e. they are corpus-level). This brings the task closer to the real-life setup (where the dictionaries exist independently of the texts), but it may complicate the implementation (solutions that require storing the whole dictionary will take more memory). Additionally, in the whole-document setup, the consistent usage of terms becomes more important.

Language Pairs:
* en-zh-Hant (English → Traditional Chinese)
* zh-Hant-en (Traditional Chinese → English)

Domain: finance

EVALUATION

Terminology Modes: You are expected to compare your system’s performance under three modes:
1. No terminology: the system is only provided with input sentences/documents.
2. Proper terminology: the system is provided with input texts (same as 1.) and dictionaries of the format {source_term: target_term}.
3. Random terminology: the system is provided with input texts and translation dictionaries of the same format as in 2. The difference is that the dictionary items are not special terms but words randomly drawn from input texts. This mode is of special interest since we want to measure to what extent the proper term translations help to improve the system performance (2.), as opposed to an arbitrary broader input that does not contain the domain-specific terminology.

Metrics:
1. Overall Translation Quality: we will evaluate the general aspects of machine translation outputs such as fluency, adequacy and grammaticality. We will do that with general automatic MT metrics such as BLEU or COMET. In addition to that, we will pay special attention to the grammaticality of the translated terms.
2. Terminology Success Rate: this metric assesses the ability of the system to accurately translate technical terms given the specialized vocabulary. This will be carried out by comparing the occurrences of the correct term translations (i.e. the ones present in the dictionary) to the output terms. A higher success rate indicates closer adherence to the dictionary translations.
3. Terminology Consistency: for domains such as science or legal texts, the consistent use of an introduced term throughout the text is crucial. In other words, we want a system not only to pick a correct term in the target language but also to use it consistently once it is chosen. This will be evaluated by comparing all translations of a given source term in a text and measuring the percentage of deviations from the most consistent translation. This metric is more important for the Document-Level track, but it will be used for both tracks.

IMPORTANT DATES

All dates are end of day, Anywhere on Earth (AoE).
* Data snippets released: 7th May 2025
* Dev data released: 22nd May 2025
* Test data release, task starts: 20th June 2025 (postponed)
* Submission deadline: 20th July 2025 (postponed)
* Paper submission to WMT25: in-line with WMT25
* Camera-ready submission to WMT25: in-line with WMT25
* Conference in Suzhou, China: 05-09 November 2025

SUBMISSION GUIDELINES

0. Please notify us about your participation prior to submission. This is optional, but it will help us estimate our workload after submission. Please do it through this Google Form: https://forms.gle/ZSn2pNJkQJAzHFnA6
1. Check your submission files with the validation script. It will be published when the test set is released.
2. Write a description of your system (optional).
3. Submit your system via Google Forms. The Google form with all necessary submission details will be published on the test set release date.

All details on submission, as well as an FAQ, can be found on the webpage of the shared task.
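
To make the two terminology metrics described under EVALUATION more concrete, here is a minimal Python sketch of how they might be approximated. This is not the official scoring script: the function names, the case-insensitive substring matching, and the input formats are all assumptions made for illustration, and the organisers' implementation may differ (for example, in how inflected forms of a term are matched).

```python
from collections import Counter


def term_success_rate(outputs, term_dicts):
    """Share of dictionary entries whose target term appears in the output.

    outputs    -- list of translated segments (strings)
    term_dicts -- list of {source_term: target_term} dicts, one per segment
    Matching is a naive case-insensitive substring test (an assumption).
    """
    hits = 0
    total = 0
    for output, terms in zip(outputs, term_dicts):
        output_lower = output.lower()
        for target_term in terms.values():
            total += 1
            hits += target_term.lower() in output_lower
    return hits / total if total else 0.0


def term_deviation_rate(observed):
    """Share of term occurrences that deviate from the most frequent translation.

    observed -- {source_term: [translation used at each occurrence in the text]}
    How occurrences are extracted from a system output is left open here.
    """
    agreeing = 0
    total = 0
    for translations in observed.values():
        if not translations:
            continue
        # Count how often the dominant translation of this source term is used.
        agreeing += Counter(translations).most_common(1)[0][1]
        total += len(translations)
    return 1 - agreeing / total if total else 0.0


# Hypothetical English → German example data.
outputs = ["Der Compiler erzeugt Bytecode.", "Das Programm stürzt ab."]
term_dicts = [{"compiler": "Compiler", "bytecode": "Bytecode"},
              {"program": "Anwendung"}]
observed = {"compiler": ["Compiler", "Compiler", "Übersetzer"],
            "bytecode": ["Bytecode"]}

print(term_success_rate(outputs, term_dicts))  # 2 of 3 dictionary terms found
print(term_deviation_rate(observed))           # 1 of 4 occurrences deviates
```

Running the same functions on outputs produced under the three terminology modes (no, proper, and random terminology) would give a rough picture of how much the proper dictionaries help a system.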
ORGANIZERS * Kirill Semenov (University of Zurich), main contact: FirstNаmе [dоt] LаstNаmе {аt} uzh /dоt/ ch * Nathaniel Berger (Heidelberg University) * Pinzhen Chen (University of Edinburgh & Aveni.ai) * Xu Huang (Nanjing University) * Arturo Oncevay (JP Morgan) * Dawei Zhu (Amazon) * Vilém Zouhar (ETH Zurich) WEBSITE: https://www2.statmt.org/wmt25/terminology.html In case of query, please send an email to Kirill Semenov (see email above).


Call for papers: The First Workshop on Natural Language Processing and Language Models for Digital Humanities (LM4DH 2025) @ RANLP_2025
by Nanomi Arachchige, Isuri (Postgraduate Researcher) 18 Jun '25

18 Jun '25

Call for papers: The First Workshop on Natural Language Processing and Language Models for Digital Humanities (LM4DH_2025) @ RANLP_2025

Date: 11th to 13th September 2025 (TBC)
Venue: Varna, Bulgaria
Website: https://www.clarin.eu/event/2025/clarin-workshop-ranlp-2025
Submissions Portal: https://softconf.com/ranlp25/LM4DH2025/

Digital Humanities has emerged as an interdisciplinary field of research, serving as an intersection of computer science with many other fields such as linguistics, social sciences, history, and psychology. With the development of Large Language Models (LLMs), Natural Language Processing (NLP) tasks such as entity recognition, sentiment analysis, and text summarisation have improved significantly. These developments offer powerful tools for analysing and interpreting complex historical, cultural, and social data, including oral histories, archival documents, and literary texts, enabling researchers to identify patterns, extract meaningful relationships, and generate interpretations at unprecedented scale and precision.

This workshop aims to provide a common platform for researchers, practitioners, and students from diverse disciplines to collaboratively explore and apply AI-driven techniques in the Digital Humanities. Through interdisciplinary discussion, the event aims to generate creative approaches, exchange best practices, and create a community committed to furthering AI-based research on human culture and history.

The focus of the workshop is on applying natural language processing techniques to digital humanities research. Topics can be anything of digital humanities interest with a natural language processing or LLM-based application.
We expect contributions related (but not limited) to the following topics:
* Text analysis and processing related to the humanities using computational methods
* Usage of the interpretability of large language models' output for DH-related tasks
* Dataset creation and curation for NLP (e.g. digitisation, datafication, and data preservation)
* Automatic error detection, correction, and normalisation of textual data
* Generation and analysis of literary works such as poetry and novels
* Analysis and detection of text genres
* Emotion analysis for the humanities and literature
* Modelling of information and knowledge in the Humanities, Social Sciences, and Cultural Heritage
* Low-resource and historical language processing
* Search for scientific and/or scholarly literature
* Profiling and authorship attribution

Submission & Publication

All papers must represent original and unpublished work that is not currently under review. Papers will be evaluated according to their significance, originality, technical content, style, clarity, and relevance to the workshop. Submissions must follow the RANLP 2025 submission guidelines<https://ranlp.org/ranlp2025/index.php/submissions/>, using ACL-style templates (LaTeX or MS Word). Papers must be submitted using SoftConf at https://softconf.com/ranlp25/LM4DH2025/

All papers will be double-blind peer reviewed. Authors of accepted papers will present their work in either the oral or the poster session. All accepted papers will appear in the workshop proceedings, which will be published in the ACL Anthology.

Important Dates
* Paper submission deadline: 20th July 2025
* Notification of acceptance: 2nd August 2025
* Camera-ready paper: 20th August 2025
* Workshop date: 11th September 2025

Organising Committee
* Isuri Anuradha, Lancaster University, UK
* Francesca Frontini, CNR-ILC, Italy & CLARIN ERIC
* Paul Rayson, Lancaster University, UK
* Ruslan Mitkov, Lancaster University, UK
* Deshan Sumanathilake, Swansea University, UK

This workshop has been organised with the generous support and coordination of CLARIN-EU.

Email: dhranlp2(a)gmail.com


FIRE 2025- Call for Participation in Tracks - 12 Tracks -17th meeting of the Forum for Information Retrieval Evaluation -Dec. Varanasi
by Thomas Mandl 17 Jun '25

17 Jun '25

*Call for Participation in Tracks*

*FIRE 2025: 17th meeting of the Forum for Information Retrieval Evaluation*
Indian Institute of Technology (BHU) Varanasi
17th - 20th December
Website: fire.irsi.org.in <http://fire.irsi.org.in/>

FIRE 2025 offers the following exciting tracks this year:
* Cross-Lingual Mathematical Information Retrieval (CLMIR) <https://clmir2025.github.io/>
* Code-Mixed Information Retrieval from Social Media Data (CMIR) <https://cmir-iitbhu.github.io/cmir/index.html>
* Hate Speech and Offensive Content Identification in Memes in Bengali, Hindi, Gujarati and Bodo (HASOC-meme) <https://hasocfire.github.io/hasoc/2025/>
* Information Retrieval in Software Engineering (IRSE) <https://sites.google.com/view/irse-2025/home>
* Misinformation Detection and Prompt Recovery (PROMID) <https://promid.github.io/index.html>
* Multilingual Story Illustration: Bridging Cultures through AI Artistry (MUSIA) <https://cse-iitbhu.github.io/MUSIA/index.html>
* Offensive Language Identification in Dravidian Languages (DravidianCodeMix) <https://dravidian-codemix.github.io/2025/dataset.html>
* Opinion Extraction and Question Answering from CryptoCurrency-Related Tweets and Reddit posts (CryptOQA) <https://sites.google.com/view/cryptoqa-2025/>
* Research Highlight Generation from Scientific Papers (SciHigh) <https://sites.google.com/jadavpuruniversity.in/scihigh2025/home>
* Spoken-Query Cross-Lingual Information Retrieval for the Indic Languages (SqCLIR) <https://sites.google.com/view/sqclir-2025>
* Varanasi Tourism in Question Answer System (VATIKA) <https://sites.google.com/view/vatika-2025/>
* Word-Level Identification of Languages in Dravidian Languages (WILD) <https://www.codabench.org/competitions/7902/>

Research groups are invited to participate in the experiments. Please register directly with the organizers.

FIRE 2025 is the 17th edition of the annual meeting of the Forum for Information Retrieval Evaluation (fire.irsi.org.in). Since its inception in 2008, FIRE has had a strong focus on shared tasks similar to those offered at evaluation forums like TREC, CLEF, and NTCIR. The shared tasks focus on solving specific problems in the area of information access and, more importantly, help in generating evaluation datasets for the research community.

Visit fire.irsi.org.in <http://fire.irsi.org.in>


CfP DHOW: Workshop Diffusion of Harmful Content on Online Web Workshop - @ ACMMM 2025
by Thomas Mandl 17 Jun '25

17 Jun '25

The 2nd Workshop on DHOW: Diffusion of Harmful Content on Online Web

The workshop will be conducted in a *hybrid* format to ensure maximum participation, accommodating attendees both *online* and in person.

Submission deadline: *July 11 2025 AOE*
*Workshop site*: https://dhow-workshop.github.io/2025/
*Co-located with ACMMM 2025* https://acmmm2025.org/
Dublin, Ireland, 27-31 October 2025

*Important Dates*
Submission deadline: extended to *July 11, 2025*
Notification of acceptance: August 01, 2025
Camera-ready papers due: August 11, 2025
Workshop date: October 27/28, 2025

*Workshop Description*

With the advancement of digital technologies and gadgets, online content is easily accessible. At the same time, harmful content also spreads. Different kinds of harmful content are available on different platforms and in multiple languages. The topic of harmful content is broad and covers multiple research directions, but from the user's perspective, all of them have an effect. Often, each type is studied in isolation, such as misinformation or hate speech, and research is confined to one platform, one language, and one particular issue. As a result, spreaders of harmful content switch platforms and languages to reach the user base. Harmful content is not limited to social media but also appears in news media. Spreaders share harmful content in posts, news articles, comments, and hyperlinks. So, there is a need to study harmful content by combining cross-platform, cross-lingual, and multimodal data and topics. We will bring the research on harmful content under one umbrella so that research on different topics (hate speech, misinformation, disinformation, self-harm, offensive content, etc.) can bring some novel methods and recommendations for users, leveraging text analysis with image, audio, and video recognition to detect harmful content in diverse formats. The workshop will cover ongoing issues such as war and elections in 2025.

We believe this workshop will provide a unique opportunity for researchers and practitioners to exchange ideas, share the latest developments, and collaborate on addressing the challenges associated with the spread of harmful content across the Web. We expect that the workshop will generate insights and discussions that will help advance the field of societal artificial intelligence (AI) for the development of a safer internet. In addition to attracting high-quality research contributions, one of the aims of the workshop is to mobilise researchers working in related areas to form a community.

*Submissions Topics*
•Studying different types of harmful content
•Computational fact-checking & Misinformation Detection
•Role of Generative AI in Mitigating Harmful Content
•Harassment, Bullying, and Hate Speech Detection
•Explainable AI for Harmful Content Analysis
•Multimodal and Multilingual Harmful Content Detection such as fake news, spam, and troll detection
•Deepfake and Synthetic Media
•Ethical & Societal Implications of AI in Content Moderation
•Both Qualitative and Quantitative study on harmful content
•Psychological effects of harmful content, such as effects on mental health
•Approaches for data collection or data annotation using multimodal large models on harmful content
•User study on the effects of harmful content on human beings

*Submissions*
- Submission Instructions: https://dhow-workshop.github.io/2025/#call
- Submission Link: https://openreview.net/group?id=acmmm.org/ACMMM/2025/Workshop/DHOW

*Workshop organizers*
•Thomas Mandl (University of Hildesheim, Germany)
•Haiming Liu (University of Southampton, United Kingdom)
•Gautam Kishore Shahi (University of Duisburg-Essen, Germany)
•Amit Kumar Jaiswal (University of Surrey, United Kingdom)
•Durgesh Nandini (University of Bayreuth, Germany)

DHOW 2025


Call for papers : Ethical LLMs @ RANLP2025
by Dola Mullage, Damith (Postgraduate Researcher) 17 Jun '25

17 Jun '25

Ethical LLMs 2025: The first Workshop on Ethical Concerns in Training, Evaluating and Deploying Large Language Models<https://sites.google.com/view/ethical-llms-2025> @ RANLP2025<https://ranlp.org/ranlp2025/> Call for papers: Scope Large Language Models (LLMs) represent a transformative leap in Artificial Intelligence (AI), delivering remarkable language-processing capabilities that are reshaping how we interact with technology in our daily lives. With their ability to perform tasks such as summarisation, translation, classification, and text generation, LLMs have demonstrated unparalleled versatility and power. Drawing from vast and diverse knowledge bases, these models hold the potential to revolutionise a wide range of fields, including education, media, law, psychology, and beyond. From assisting educators in creating personalised learning experiences to enabling legal professionals to draft documents or supporting mental health practitioners with preliminary assessments, the applications of LLMs are both expansive and profound. However, alongside their impressive strengths, LLMs also face significant limitations that raise critical ethical questions. Unlike humans, these models lack essential qualities such as emotional intelligence, contextual empathy, and nuanced ethical reasoning. While they can generate coherent and contextually relevant responses, they do not possess the ability to fully understand the emotional or moral implications of their outputs. This gap becomes particularly concerning when LLMs are deployed in sensitive domains where human values, cultural nuances, and ethical considerations are paramount. For example, biases embedded in training data can lead to unfair or discriminatory outcomes, while the absence of ethical reasoning may result in outputs that inadvertently harm individuals or communities. 
These limitations highlight the urgent need for robust research in Natural Language Processing (NLP) to address the ethical dimensions of LLMs. Advancements in NLP research are crucial for developing methods to detect and mitigate biases, enhance transparency in model decision-making, and incorporate ethical frameworks that align with human values. By prioritising ethics in NLP research, we can better understand the societal implications of LLMs and ensure their development and deployment are guided by principles of fairness, accountability, and respect for human dignity. This workshop will dive into these pressing issues, fostering a collaborative effort to shape the future of LLMs as tools that not only excel in technical performance but also uphold the highest ethical standards. Submission Guidelines We follow the RANLP 2025 standards for submission format and guidelines. EthicalLLMs 2025 invites the submission of long papers, up to eight pages in length, and short papers, up to six pages in length. These page limits only apply to the main body of the paper. At the end of the paper (after the conclusions but before the references) papers need to include a mandatory section discussing the limitations of the work and, optionally, a section discussing ethical considerations. Papers can include unlimited pages of references and an unlimited appendix. To prepare your submission, please make sure to use the RANLP 2025 style files available here: * Latex<https://ranlp.org/ranlp2025/wp-content/uploads/2025/05/ranlp2025-LaTeX.zip> * Word<https://ranlp.org/ranlp2025/wp-content/uploads/2025/05/ranlp2025-word.docx> Papers should be submitted through Softconf/START using the following link: https://softconf.com/ranlp25/EthicalLLMs2025/ Topics of interest The workshop invites submissions on a broad range of topics related to the ethical development and evaluation of LLMs, including but not limited to the following. 1. 
Bias Detection and Mitigation in LLMs Research focused on identifying, measuring, and reducing social, cultural, and algorithmic biases in large language models. 2. Ethical Frameworks for LLM Deployment Approaches to integrating ethical principles—such as fairness, accountability, and transparency—into the development and use of LLMs. 3. LLMs in Sensitive Domains: Risks and Safeguards Case studies or methodologies for deploying LLMs in high-stakes fields such as healthcare, law, and education, with an emphasis on ethical implications. 4. Explainability and Transparency in LLM Decision-Making Techniques and tools for improving the interpretability of LLM outputs and understanding model reasoning. 5. Cultural and Contextual Understanding in NLP Systems Strategies for enhancing LLMs’ sensitivity to cultural, linguistic, and social nuances in global and multilingual contexts. 6. Human-in-the-Loop Approaches for Ethical Oversight Collaborative models that involve human expertise in guiding, correcting, or auditing LLM behaviour to ensure responsible use. 7. Mental Health and Emotional AI: Limits of LLM Empathy Discussions on the role of LLMs in mental health support, highlighting the boundary between assistive technology and the need for human empathy. Organisers Damith Premasiri – Lancaster University, UK Tharindu Ranasinghe – Lancaster University, UK Hansi Hettiarachchi – Lancaster University, UK Contact If you have any questions regarding the workshop, please contact Damith: d.dolamullage(a)lancaster.ac.uk


Survey on Queries in Syntactically Annotated Corpora
by Niklas Deworetzki 17 Jun '25

17 Jun '25

Dear all, We are currently doing a project aiming to make querying in syntactically annotated corpora easier and more accessible. For this purpose, we want to know what researchers are actually searching for. If you have a minute of your time, please feel free to fill out this form. https://forms.office.com/e/a8DgETSabB Feel free to reach out to ekavol(a)chalmers.se or nikdew(a)chalmers.se if you have any further questions. Best regards Niklas Deworetzki & Katja Voloshina PhD Students Department of Computer Science and Engineering Chalmers University of Technology | University of Gothenburg SE-412 96 Göteborg, Sweden www.gu.se<http://www.gu.se/> www.chalmers.se<http://www.chalmers.se/>


CLEF 2025 – Registration Open
by JORGE AMANDO CARRILLO DE ALBORNOZ CUADRADO 17 Jun '25

17 Jun '25

CLEF 2025 – Registration Open Conference and Labs of the Evaluation Forum We are pleased to announce CLEF 2025, taking place 9–12 September 2025 in Madrid, Spain at UNED. This peer‑reviewed conference and associated labs foster research in multilingual, multimodal, and cross‑language information access https://clef2025.clef-initiative.eu/. Register now – Early‑bird registration is open! Standard registration opened earlier this year, and early-bird rates are currently available. Why attend? * Present and discuss original research at the main conference. * Engage in innovative labs and challenges, including LifeCLEF, ImageCLEF, EXIST, eRisk, CheckThat!, and more https://clef2025.clef-initiative.eu/index.php?page=Pages/labs.html. * Benefit from rich networking with academic and industry experts in IR, NLP, multimedia retrieval, and evaluation sciences. For detailed conference and lab registration, registration deadlines, and pricing, please visit the official site: https://clef2025.clef-initiative.eu/index.php?page=Pages/registrationConfer… Important Dates * Early‑bird registration ongoing * Registration closes: 31 August 2025 * Conference & labs: 9–12 September 2025, Madrid, Spain We look forward to welcoming participants from across the global community. See you this September in Madrid at CLEF 2025! Jorge Carrillo-de-Albornoz On behalf of the CLEF 2025 Organising Committee


Second call for papers: Special issue on Machine Translation for Low-Resource Languages@Language Resources and Evaluation Journal
by Atul K. Ojha 16 Jun '25

16 Jun '25

Apologies for cross-posting. --------------------------------------------------------------------------- *CALL FOR PAPERS: Language Resources and Evaluation Journal- Special Issue on Machine Translation for Low-Resource Languages* https://link.springer.com/collections/gbdgacbgbg *Guest Editors:* - Atul Kr. Ojha (Insight Research Ireland Centre for Data Analytics, DSI, University of Galway, Ireland) - Chao-Hong Liu (Industrial Technology Research Institute, Potamu Research Ltd.) - Ekaterina Vylomova (University of Melbourne, Australia) - Flammie Pirinen (UiT The Arctic University of Norway, Tromsø) - Jonathan Washington (Swarthmore College, USA) - Nathaniel Oco (De La Salle University, Philippines) - Xiaobing Zhao (Minzu University of China) Machine translation (MT) technologies have been improved significantly in the last decade using neural MT (NMT) approaches. However, most of these methods rely on the availability of large parallel data for training the MT systems, resources which are not available for the majority of language pairs. Hence, current technologies often fall short in their ability to be applied to low-resource languages. Developing MT technologies using relatively small corpora still presents a major challenge for the MT community. In addition, many methods for developing MT systems still rely on several natural language processing (NLP) tools to pre-process texts in source languages and post-process MT outputs in target languages. The performance of these tools often has a great impact on the quality of the resulting translation. The availability of MT technologies and NLP tools can facilitate equal access to information for the speakers of a language and determine on which side of the digital divide they will end up. The lack of these technologies for many of the world's languages provides opportunities both for the field to grow and for making tools available for speakers of low-resource languages. 
In the past few years, several workshops and evaluations have been organized to promote research on low-resource languages. NIST conducted Low Resource Human Language Technology evaluations (LoReHLT) annually from 2016 to 2019. In LoReHLT evaluations, there is no training data in the evaluation language. Participants receive training data in related languages but need to bootstrap systems in the surprise evaluation language at the start of the evaluation. Methods for this include pivoting approaches and taking advantage of linguistic universals. The evaluations are supported by DARPA's Low Resource Languages for Emergent Incidents (LORELEI) program, which seeks to advance technologies that are less dependent on large data resources and that can be quickly pivoted to new languages within a very short amount of time, so that information from any language can be extracted in a timely manner to provide situational awareness for emergent incidents. There are also the Workshop on Technologies for MT of Low-Resource Languages (LoResMT), the Special Interest Group on Under-resourced Languages (SIGUL), the Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia (EURALI), the Workshop on Deep Learning Approaches for Low-Resource Natural Language Processing (DeepLo), AfricaNLP, TurkLang, the Conference on Machine Translation (WMT), and the International Conference on Spoken Language Translation (IWSLT), which provide venues for sharing research and development work in this field. This topical collection solicits original research papers on MT systems/methods and related NLP tools for low-resource languages in general. LoReHLT, LORELEI, LoResMT, SIGUL, EURALI, DeepLo, WMT, and IWSLT participants are very welcome to submit their work to the special issue. 
Summary papers on MT research for specific low-resource languages, as well as extended versions (>40% difference) of published papers from relevant conferences/workshops, are also welcome. Topics of the special issue include, but are not limited to: * Research and review papers on MT systems/methods for low-resource languages * Research and review papers on pre-processing and/or post-processing NLP tools for MT * Word tokenizers/de-tokenizers for low-resource languages * Word/morpheme segmenters for low-resource languages * Use of morphological analyzers and/or morpheme segmenters in MT * Multilingual/cross-lingual NLP tools for MT * Review of available corpora of low-resource languages for MT * Pivot MT for low-resource languages * Zero-shot MT for low-resource languages * Fast building of MT systems for low-resource languages * Re-usability of existing MT systems and/or NLP tools for low-resource languages * Machine translation for language preservation * Techniques that work across many languages and modalities * Techniques that are less dependent on large data resources * Use of language-universal resources * Bootstrap-trained resources for the short development cycle * Entity, relation- and event-extraction * Sentiment detection in MT * MT Summarisation * Processing diverse languages, genres (news, social media, etc.) and modalities (text, speech, video, etc.) 
* Speech Translation for low-resource languages * Multimodal MT for low-resource languages * MT models using LLMs for low-resource languages * Generative AI models for low-resource languages * Evaluation metrics and datasets for low-resource languages

For further information on this initiative, please refer to https://link.springer.com/collections/gbdgacbgbg

*IMPORTANT DATES*
August 26, 2025: Paper submission deadline
December 05, 2025: Revised papers due
March 2026: Publication

*SUBMISSION GUIDELINES*
Authors should follow the Instructions for Authors <https://link.springer.com/journal/10579/submission-guidelines> (or Overleaf <https://link.springer.com/journal/10579/updates/17234296>) on the LRE journal website <https://link.springer.com/journal/10579>.

Thanks,

June 2025 Newsletter - LDC
by Penn LDC 16 Jun '25

In this newsletter:
LDC data and commercial technology development

New publications:
Chinese Sentence Pattern Structure Treebank<https://catalog.ldc.upenn.edu/LDC2025T06>
IWSLT 2022-2023 Shared Task Training, Development and Test Set<https://catalog.ldc.upenn.edu/LDC2025S05>
KAIROS Schema Learning Complex Event Annotation<https://catalog.ldc.upenn.edu/LDC2025T07>
________________________________
LDC data and commercial technology development

For-profit organizations are reminded that an LDC membership is a pre-requisite for obtaining a commercial license to almost all LDC databases. Non-member organizations, including non-member for-profit organizations, cannot use LDC data to develop or test products for commercialization, nor can they use LDC data in any commercial product or for any commercial purpose. LDC data users should consult corpus-specific license agreements for limitations on the use of certain corpora. Visit the Licensing<https://www.ldc.upenn.edu/data-management/using/licensing> page for further information.
________________________________
New publications:

Chinese Sentence Pattern Structure Treebank<https://catalog.ldc.upenn.edu/LDC2025T06> was developed at Beijing Normal University<https://english.bnu.edu.cn/> and Peking University<https://english.pku.edu.cn/>. It contains 5,016 sentences and 119,627 tokens syntactically annotated following the concept of sentence constituent analysis, which emphasizes sentence pattern structure. The source data consists of 27 chapters extracted from modern Mandarin and ancient Chinese works. There are three annotation layers: lexical sense and structural mode for dynamic words; syntactic structure for clauses; and inter-clause relations within complex sentences and sentence clusters. These structures can be visualized using the Jbw-viewer tool<https://github.com/bnucip/jbwviewer>, which is included in the release. 2025 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.

IWSLT 2022-2023 Shared Task Training, Development and Test Set<https://catalog.ldc.upenn.edu/LDC2025S05> was developed by LDC and contains 210 hours of Tunisian<https://catalog.ldc.upenn.edu/LDC2025S05> Arabic conversational telephone speech, transcripts, English translations, speaker metadata, and documentation. This material constitutes the training, development, and test data used in the International Conference on Spoken Language Translation (IWSLT) Dialectal Speech Translation task (2022)<https://iwslt.org/2022/dialect> and the Dialectal and Low-resource track (2023)<https://iwslt.org/2023/low-resource>. The telephone speech was collected by LDC in 2016-2017 from native speakers of Tunisian Arabic in Tunis. Speakers were recruited to make telephone calls to people in their social networks from a variety of noise conditions and handsets. Transcripts are orthographic following Buckwalter<https://catalog.ldc.upenn.edu/LDC2004L02> transliteration and cover 175 hours of the collected speech. IPA transcripts were added to a subset of the data. All transcribed segments were translated into English. 2025 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.

KAIROS Schema Learning Complex Event Annotation<https://catalog.ldc.upenn.edu/LDC2025T07> was developed by LDC to support the DARPA KAIROS program. It contains English and Spanish text, audio, video, and image data labeled for 93 real-world complex events with event, relation, and argument annotations linking to document provenance. Source data was collected from the web; 3,431 root web pages were collected and processed, yielding 1,919 text data files, 24,019 image files, 1,472 video files, and 16 audio files.

The DARPA KAIROS (Knowledge-directed Artificial Intelligence Reasoning Over Schemas) program aimed to build technology capable of understanding and reasoning about complex real-world events in order to provide actionable insights to end users. KAIROS systems utilized formal event representations in the form of schema libraries that specified the steps, preconditions, and constraints for an open set of complex events; schemas were then used in combination with event extraction to characterize and make predictions about real-world events in a large, multilingual, multimedia corpus. 2025 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.

To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.

Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810 Philadelphia, PA 19104

CLIN35 -- Deadline extension
by Vincent Vandeghinste 16 Jun '25

Dear CLIN enthusiasts,

We are extending the submission deadline for CLIN abstracts by one week. The new, final deadline is June 20th. Below you can find the original call for abstracts with a modified date.

Website: https://clin35.ccl.kuleuven.be/

We invite submissions for CLIN35, the 35th edition of the Computational Linguistics in the Netherlands (CLIN) conference, which will take place in Leuven on September 12th, 2025. Abstracts describing theoretical or applied research in any area of computational linguistics and natural language processing are welcome. We especially encourage submissions related to the Dutch language, but contributions on other languages and multilingual approaches are equally welcome. Abstracts must be written in English and should not exceed 500 words.

Submissions should include:
* Name and affiliation of each author
* Contact details
* Presentation title and short abstract (max. 500 words)
* Keywords
* Your presentation format preference (We will do our best to accommodate your preference but may need to make changes to provide a well-balanced program)

Abstracts must be submitted via the form on the website<https://clin35.ccl.kuleuven.be/call-for-abstracts> by Friday, 20th of June 2025. Notifications of acceptance will be sent out by Friday, 4th of July 2025. Accepted abstracts will be presented at the conference as oral or poster presentations. Authors with accepted abstracts will also have the opportunity to submit a full paper after the conference for publication in the CLIN Journal<https://www.clinjournal.org/clinj/>.

Please share this call with your interested colleagues and network! For any questions you can reach us at this email address (clin35(a)kuleuven.be<mailto:clin35@kuleuven.be>). We look forward to your submissions and to welcoming you to CLIN35!

CLIN35 local organizers

EVALITA 2026 - Second Call for Tasks (NEW DEADLINES and TIMELINE)
by Marco Antonio Stranisci 13 Jun '25

****************************************************** ********* EVALITA 2026: Call for tasks ********* ******* NEW DEADLINES and TIMELINE ****** ****************************************************** EVALITA 2026 is an initiative of AILC (Associazione Italiana di Linguistica Computazionale, AILC https://www.ai-lc.it/). As in the previous editions (https://www.evalita.it/), EVALITA 2026 will be organized along a few selected tasks, which provide participants with opportunities to discuss and explore both emerging and traditional areas of Natural Language Processing and Speech. The participation is encouraged for teams working both in academic institutions and industrial organizations. TASK PROPOSAL SUBMISSION Task proposals should be no longer than 4 pages and should include: - task title and acronym; - names and affiliation of the organizers (minimum 2 organizers); - brief task description, including motivations and state of the art; - explanation of the international relevance of the task; - description and examples of the data, including information about their availability, development stage, and issues concerning privacy and data sensitivity. The examples are mandatory because they are intended to give potential participants an idea of what the task data will look like, how it’ll be formatted, etc. - expected number of participants and attendees; - names and contact information of the organizers. We also accept the re-annotation/expansion of datasets from previous years and previous challenges with new annotation levels, and texts from publicly available corpora. However, test annotations must be new and unpublished, as participants must not have access to the test data annotations until the end of EVALITA campaign. For new tasks, organizers must specify in the proposal why it would attract a reasonable number of participants, and why it is needed. For re-runs, organizers must describe the element of novelty from previous challenges. 
In submitting your proposal, please bear in mind that we strongly encourage: - tasks that pose non-trivial challenges and stimulate the creation of innovative systems (i.e., that integrate linguistic insights or external knowledge sources), rather than being easily addressed by off-the-shelf LLM prompting techniques; - tasks focused on multimodality, e.g., considering both textual and visual or any other modality; - tasks characterized by different levels of complexity, e.g., with a straightforward main subtask and one or more sophisticated additional subtasks; - to consider providing competitive baselines (e.g., small-scale LLMs in zero-shot setups), which participants are expected to improve upon, in order to encourage the design of advanced solutions; - application-oriented tasks, that is, tasks that have a clearly defined end-user application showcasing; - multilingual tasks, i.e. with data both in Italian and in other languages; - industrial tasks, i.e. tasks with real data provided by companies. The organizers of the accepted tasks should take care of planning, according to the scheduled deadlines (see below): - the development and distribution of datasets needed for the contest, i.e. 
data for training and development, and data for testing; the scorer to be used to evaluate the submitted systems should be included in the release of development data; - the development of task guidelines, where all the instructions for the participation are made clear, together with a detailed description of data and evaluation metrics applied for the evaluation of the participants' results; - the collection of participants' results; - the evaluation of participants' results according to standard metrics and baseline(s); - the solicitation of participation and submissions; - the reviewing process of the papers describing the participants' approach and results (according to the template to be made available by the EVALITA 2026 chairs); - the production of a paper describing the task (according to the template to be made available by the EVALITA 2026 chairs).

*** Email your proposal in PDF format to evalitacampaign(a)gmail.com with "EVALITA 2026 TASK Proposal" as the subject line by the submission deadline: July 28th 2025. ***

Please feel free to contact the EVALITA 2026 chairs at evalitacampaign(a)gmail.com in case of any questions or suggestions.

Deadlines of the task proposal:
- July 28th 2025 (previously July 21st 2025): submission of task proposals
- August 7th 2025 (previously July 31st 2025): notification of task proposal acceptance

Timeline of EVALITA 2026:
- 22nd September 2025: development data available to participants
- 3rd - 17th November 2025: evaluation windows
- 28th November 2025: assessments returned to participants
- 15th December 2025: final reports (from participants) due to task organizers
- 22nd December 2025: final reports (from task organizers) due to EVALITA chairs
- 19th January 2026: review deadline
- 2nd February 2026: camera-ready version deadline
- 26th - 27th February 2026: final workshop in Bari

EVALITA 2026 CHAIRS
Francesco Cutugno (Università di Napoli)
Alessio Miaschi (Istituto di Linguistica Computazionale “A. Zampolli” - CNR)
Alessio Palmero Aprosio (Università di Trento)
Giulia Rambelli (Università di Bologna)
Lucia Siciliani (Università di Bari)
Marco Antonio Stranisci (Università di Torino)

FURTHER INFORMATION
Website: https://www.evalita.it/campaigns/evalita-2026/call-for-tasks/
Mail: evalitacampaign(a)gmail.com

Marco, UNITO <https://www.unito.it/persone/mstranis> and aequa-tech <https://aequa-tech.com/>

Lancaster free live streamed sessions on CL methodology
by Brezina, Vaclav 13 Jun '25

Dear all,

We are pleased to invite you to join a series of free, livestreamed public lectures during the week of 16 June 2025, as part of the International Lancaster Summer Schools in Corpus Linguistics 2025<https://www.lancaster.ac.uk/corpussummerschools/>. The lectures are hosted by the ESRC Centre for Corpus Approaches to Social Science (CASS) at Lancaster University, a global leader in corpus approaches to language. Lancaster Linguistics is currently ranked 3rd in the world (QS World University Rankings 2025).

Please see the links below for live streamed sessions and join us for this exciting opportunity! All times are given as UK times.

Best,
Vaclav

Professor Vaclav Brezina
Professor in Corpus Linguistics
Co-Director of ESRC Centre for Corpus Approaches to Social Science
Lancaster University
Lancaster, LA1 4YD
Office: County South, room C05
T: +44 (0)1524 510828
@vaclavbrezina <http://www.lancaster.ac.uk/arts-and-social-sciences/about-us/people/vaclav-…>

Monday 16 June
1-2pm UK time Frequency and concordancing: https://teams.microsoft.com/l/meetup-join/19%3ameeting_NzYxODdkNjItODNmMS00…

Tuesday 17 June
10.30-11.30am UK time Building your own corpus: https://teams.microsoft.com/l/meetup-join/19%3ameeting_YjJiYTliOGMtZmM3YS00…
1.30-2.30pm UK time Introduction to corpus statistics: https://teams.microsoft.com/l/meetup-join/19%3ameeting_ZDZkYjk3N2YtNTVlNi00…
3-4pm UK time Validity, AI and language testing: https://teams.microsoft.com/l/meetup-join/19%3ameeting_YWVjMDU5ZjEtYWZlOS00…

Wednesday 18 June
10.30-11.30am UK time Statistics & data visualisation: https://teams.microsoft.com/l/meetup-join/19%3ameeting_MTVjZTUwNjEtZTg0OC00…

Thursday 19 June
10.30-11.30am UK time Analysing & visualising collocations: https://teams.microsoft.com/l/meetup-join/19%3ameeting_ZDkyNTJlNDktMzU3MC00…
12.35-12.50pm UK time Official launch of #LancsBox X v. 5.5 - an innovative software tool: https://teams.microsoft.com/l/meetup-join/19%3ameeting_MmIxMzU2OTUtYjU4Ni00…

Fully funded PhD Student and/or Postdoc Position in the field of NLP/ML, UKP Lab, TU Darmstadt
by Niemann, Elisabeth 12 Jun '25

The UKP Lab at the Department of Computer Science, Technical University Darmstadt, Germany, is looking for *** two fully funded PhD Students and/or Postdocs *** for an exciting project in machine-generated text detection. This is a unique opportunity to join the UKP Lab at the intersection of AI Safety, Natural Language Processing and Machine Learning. If you're excited about shaping the future of Large Language Models, AI Agents, human-AI interaction, building novel prototypes, and publishing at top-tier venues of NLP, ML and AI, we’d love to hear from you.

🔗 More information: https://www.informatik.tu-darmstadt.de/ukp/ukp_home/jobs_ukp/2025_phd_ukp.e…
📩 Apply here: https://careers.ukp.informatik.tu-darmstadt.de/ukprecruitment
📅 Application deadline: June 29th, 2025

--------------------------------------------------------------------
Prof. Dr. Iryna Gurevych
UKP Lab
Technical University Darmstadt, Germany
http://www.ukp.tu-darmstadt.de/

Third call for papers Sixth Workshop on Resources for African Indigenous Language (RAIL)
by Menno Van Zaanen 12 Jun '25

Third call for papers
Sixth Workshop on Resources for African Indigenous Languages (RAIL)
Co-located with DHASA 2025
https://sadilar.org/rail-2025/

RAIL Workshop date: 10 November 2025
DHASA Conference dates: 10-14 November 2025
Venue: CSIR International Convention Centre
The sixth RAIL workshop website: https://sadilar.org/rail-2025/
DHASA website: https://digitalhumanities.org.za/

The sixth Resources for African Indigenous Languages (RAIL) workshop will be co-located with the Digital Humanities Association of Southern Africa (DHASA) 2025 conference at the CSIR International Convention Centre in Pretoria, South Africa, on 10 November 2025. The RAIL workshop is an interdisciplinary platform for researchers working on African indigenous language resources such as natural language processing (NLP) tools, Human Language Technologies (HLT), data collections, and annotations. This workshop aims to foster a scientific community of practice that focuses on computational linguistic tools and data that are designed for or applied to the indigenous languages of Africa.

Many African languages are under-resourced while only a few are considered to be somewhat better resourced. These languages often share interesting properties such as writing systems, making them different from most high-resourced languages. From a computational perspective, these languages lack enough corpora to undertake high-level development of NLP and HLT tools, which in turn impedes the development of African languages in these areas. During previous workshops, it was noted that the problems and solutions presented were not only applicable to African languages but were also relevant to many other low-resource languages across the world. Because these languages share similar challenges, this workshop provides researchers with opportunities to work collaboratively on issues of language resource development and learn from each other.

The RAIL workshop has several aims. 
First, the workshop brings together researchers who work on African indigenous languages, forming a community of practice for people working on indigenous languages. Second, the workshop aims to reveal currently unknown or unpublished existing resources (corpora, NLP tools, and applications), resulting in a better overview of the current state-of-the-art, and also allows for discussions on novel, desired resources for future research in this area. Third, it enhances sharing of knowledge on the development of low-resource languages. Finally, it enables discussions on how to improve the quality as well as availability of the resources.

The workshop has “Language resources in the age of large language models” as its theme, but submissions on any topic related to properties of African indigenous languages (including related non-African languages) may be accepted. Suggested topics include (but are not limited to) the following:
* Digital representations of linguistic structures
* Descriptions of corpora or other data sets of African indigenous languages
* Building resources for (under-resourced) African indigenous languages
* Developing and using African indigenous languages in the digital age
* Effectiveness of digital technologies for the development of African indigenous languages
* Revealing unknown or unpublished existing resources for African indigenous languages
* Developing desired resources for African indigenous languages
* Improving quality, availability and accessibility of African indigenous language resources

Submission requirements:
We invite papers on original, unpublished work related to the topics of the workshop. Submissions, presenting completed work, may consist of up to eight (8) pages of content plus additional pages of references. The final camera-ready versions of accepted long papers are allowed one additional page of content (up to 9 pages) so that reviewers’ feedback can be incorporated. 
Papers should be formatted according to the DHASA style sheet, which is provided on the Journal of the Digital Humanities Association of Southern Africa website (https://upjournals.up.ac.za/index.php/dhasa/about). Reviewing is double-blind, so make sure to anonymise your submission (e.g., do not provide author names, affiliations, project names, etc.). Limit the number of self-citations (anonymised citations should not be used). The RAIL workshop follows the DHASA submission requirements. Please submit papers in PDF format (the submission link will be available soon). Accepted papers will be published in proceedings linked to the DHASA conference.

Important dates:
Submission deadline: 14 July 2025
Date of notification: 16 September 2025
Camera ready copy deadline: 24 October 2025
Workshop: 10 November 2025
DHASA conference: 10 November 2025 - 14 November 2025

Organising Committee
Rooweither Mabuya, South African Centre for Digital Language Resources (SADiLaR), South Africa
Muzi Matfunjwa, South African Centre for Digital Language Resources (SADiLaR), South Africa
Mmasibidi Setaka, South African Centre for Digital Language Resources (SADiLaR), South Africa
Menno van Zaanen, South African Centre for Digital Language Resources (SADiLaR), South Africa

--
Prof Menno van Zaanen menno.vanzaanen(a)nwu.ac.za
Professor in Digital Humanities
South African Centre for Digital Language Resources
https://www.sadilar.org

Third call for papers DHASA Conference 2025
by Menno Van Zaanen 12 Jun '25

Third call for papers
DHASA Conference 2025
https://dh2025.digitalhumanities.org.za

Theme: The role of humanities in digital humanities and artificial intelligence

The Digital Humanities Association of Southern Africa (DHASA) is pleased to announce its fifth conference, focusing on the theme "The role of humanities in digital humanities and artificial intelligence". In a region where the field of Digital Humanities is still relatively underdeveloped, this conference aims to address this gap and foster growth and collaboration in the field. The conference offers an opportunity for researchers interested in showcasing their work in the broad field of Digital Humanities to come together. By doing so, the conference provides a comprehensive overview of the current state-of-the-art in Digital Humanities, particularly within the Southern Africa region. As such, we welcome submissions related to Digital Humanities research conducted by individuals from Southern Africa or research focused on the geographical area of Southern Africa in the broad sense.

Furthermore, the conference serves as a platform for information sharing and networking among researchers passionate about Digital Humanities. By bringing together experts working on Digital Humanities in Southern Africa or with a focus on Southern Africa, we aim to promote collaboration and facilitate further research in this dynamic field.

In addition to the main conference, affiliated workshops and tutorials will be organised, providing researchers with valuable insights into novel technologies and tools. These supplementary events are designed for researchers interested in specific aspects of Digital Humanities or seeking practical information to enter or advance their knowledge in the field. 
The DHASA conference welcomes interdisciplinary contributions from researchers in various domains of Digital Humanities, including, but not limited to, language, literature, visual art, performance and theatre studies, media studies, music, history, sociology, psychology, language technologies, library studies, philosophy, methodologies, software and computation, AI, and more. Our goal is to cultivate an inclusive scientific community of practice within Digital Humanities.

Suggested topics include the following:
* The role of AI in digital humanities, the role of Digital Humanities in shaping AI, and the broader role of the humanities in both AI and DH projects;
* Digital archives and the preservation of marginalised voices;
* Intersectionality and the digital humanities: exploring the intersections of race, gender, sexuality, culture, and class in digital research and activism;
* Activism and social change through digital media: how digital humanities tools and methodologies can be used to promote inclusion;
* Engaging marginalised communities in the creation and use of digital tools, resources, and AI;
* Exploring the role of digital humanities in decolonising knowledge and promoting indigenous perspectives;
* The ethics of data collection and analysis in digital humanities and AI research;
* The role of digital humanities and AI in promoting inclusive and equitable pedagogy;
* Digital humanities and inclusion in the context of African and global perspectives and international collaborations;
* Critical approaches to digital humanities and inclusion: examining the limitations and possibilities of digital tools and methodologies in promoting inclusion;
* Collaborative digital humanities projects with non-profit organisations, community groups, and cultural institutions;
* Development of digital and AI tools for supporting digital humanities;
* Novel utilisation of digital and AI tools for performing digital humanities research;
* The role of digital humanities in the classroom: reimagining literacy and AI fluency;
* Digital humanities data and project management;
* The role of librarians in the digital humanities project;
* Any other digital humanities-related topic that serves the Southern African community.

Submission Guidelines
The DHASA conference 2025 asks for three types of submissions:
* Long papers: Authors may submit long papers with a maximum of 8 content pages and unlimited pages for references and appendices. The final versions of accepted long papers will be granted an additional page (leading to a total of up to 9 content pages) to incorporate reviewers' comments. Long papers accepted for the conference will be presented in 30-minute time slots (which includes 10 minutes for questions).
* Short papers: Authors may submit short papers with a maximum of 5 content pages and unlimited pages for references and appendices. The final versions of accepted short papers will be allowed an extra page (leading to a total of up to 6 content pages) to accommodate reviewers' comments. Short papers accepted for the conference will be presented in 15-minute time slots (which includes 5 minutes for questions).
* Executive summaries: Authors can submit an executive summary for work in progress, limited to 1 page. Executive summaries accepted for the conference will be presented as posters during a dedicated poster presentation slot.

All accepted long and short paper submissions that are presented at the conference will be published in the JDHASA journal, see https://upjournals.up.ac.za/index.php/dhasa. In addition, the executive summaries for the poster presentations will be published in a book of executive summaries before the conference. We particularly encourage student submissions where the first author is a student. All submissions should adhere to the ACL style guide: https://acl-org.github.io/ACLPUB/formatting.html Submissions should be submitted in PDF format. 
Submissions that do not adhere to the prescribed style guide will be rejected. Follow this link to go to the submission platform: https://dh2025.digitalhumanities.org.za/submission/ Authors are encouraged to upload their datasets to the SADiLaR repository: https://repo.sadilar.org/. In case of difficulties uploading the datasets, please reach out to Benito Trollip (benito.trollip(a)nwu.ac.za).

Important dates
Submission deadline: 14 July 2025
Date of notification: 16 September 2025
Camera-ready copy deadline: 24 October 2025
Conference: 10 November 2025 - 14 November 2025
Conference venue: CSIR ICC, Pretoria, South Africa

Co-located events
Several co-located events are currently being prepared, including workshops and tutorials. These will be updated on the conference website.

Organising Committee
Aby Louw, Council for Scientific and Industrial Research
Andiswa Bukula, South African Centre for Digital Language Resources
Avi Moodley, Council for Scientific and Industrial Research
Franco Mak, Council for Scientific and Industrial Research
Franziska Pannach, Rijksuniversiteit Groningen
Ilana Wilken, Council for Scientific and Industrial Research
Johannes Sibeko, Nelson Mandela University
Juan Steyn, South African Centre for Digital Language Resources
Laurette Marais, Council for Scientific and Industrial Research
Marissa Griesel, South African Centre for Digital Language Resources
Menno van Zaanen, South African Centre for Digital Language Resources
Privolin Naidoo, Council for Scientific and Industrial Research
Sthembiso Mkhwanazi, Council for Scientific and Industrial Research

--
Prof Menno van Zaanen menno.vanzaanen(a)nwu.ac.za
Professor in Digital Humanities
South African Centre for Digital Language Resources
https://www.sadilar.org

[CFP] Call for Papers for the WiNLP 2025 workshop in conjunction with EMNLP 2025
by ACL Announcements 12 Jun '25

WINLP 2025 WORKSHOP

The Widening NLP (WiNLP) workshop aims to foster an inclusive environment that highlights the contributions of researchers from underrepresented groups in NLP. Anyone who self-identifies as being from an underrepresented background (based on gender, ethnicity, nationality, sexual orientation, disability, or otherwise) is encouraged to submit. In 2025, WiNLP will continue to emphasize access, disability, and diversity across scientific backgrounds, disciplines, training, and underrepresented languages.

Our annual Widening Natural Language Processing Workshop (WiNLP) will be held in conjunction with EMNLP 2025 in Suzhou, China. Since EMNLP anticipates a hybrid format for its conference, we also anticipate that our workshop will be hybrid, with both online and in-person attendees. The one-day workshop will take place during EMNLP's workshop period; the exact date will be announced soon.

The full-day event includes invited talks, oral presentations, and poster sessions. The workshop provides an excellent opportunity for junior members of the community to showcase their work and connect with senior mentors for feedback and career advice. It also offers recruitment opportunities with leading industrial labs. Most importantly, the workshop will provide an inclusive and accepting space and work to lower structural barriers to joining and collaborating with the NLP community at large.

Submission guidelines are available at: https://www.winlp.org/call-for-submissions-2025/

PRE-SUBMISSION MENTORSHIP PROGRAM

WiNLP offers an optional pre-submission mentorship program to help authors improve the quality of their writing and presentation before final submission. The program focuses on enhancing the clarity and structure of the paper, not on critiquing the research content.

* Submission: Authors must submit a draft of their paper via the designated Google Form (https://forms.gle/J33K2ea6VruN82ke9) by June 20, 2025. The draft should adhere to the same formatting and length guidelines as final submissions.
* Mentor Assignment: Organizers will check the draft for compliance with formatting requirements before assigning a mentor. The mentor will not be involved in reviewing the final submission.
* Feedback: Mentors will provide feedback by July 18, 2025, offering suggestions to improve writing and presentation. Authors are encouraged to incorporate this feedback before the final submission deadline.
* Non-Anonymous: The mentorship process is not anonymized.
* Final Submission: Authors who participate in the mentorship program should submit their final paper as a new submission via OpenReview by August 1, 2025, to be considered for the WiNLP workshop.

Participation in the mentorship program is not a prerequisite for submitting a paper to WiNLP.

TRAVEL SUPPORT

WiNLP offers a limited number of travel grants to support one author per accepted submission. Grants may cover expenses such as registration, travel, lodging, or visa costs. Funded authors may choose to attend virtually if preferred.

* Travel grant application deadline: September 26, 2025
* Notification: October 6, 2025
* Eligibility: One author per accepted submission is eligible. The funded author must be identified in the travel grant application form.

Additional funding for virtual attendance by other authors may be considered if surplus funds are available, but in-person attendance for additional authors is not guaranteed. Travel expenses are handled via reimbursement (primarily through USD check or PayPal). Authors unable to front travel costs should contact the organizers early to discuss alternatives. Authors are encouraged to explore local funding options (e.g., institutional support) to maximize the reach of WiNLP's limited funds. We recommend that additional student authors watch for the EMNLP call for student volunteers and the call for D&I subsidies as opportunities for further funding.

IMPORTANT DATES

All deadlines are 11:59 PM UTC-12:00 ("Anywhere on Earth").

* Pre-submission mentoring deadline: June 20, 2025
* Pre-submission feedback returned: July 18, 2025
* Paper submission deadline: August 1, 2025
* Acceptance notifications: September 15, 2025
* Travel grant applications due: September 26, 2025
* Camera-ready deadline: October 1, 2025
* Travel grant notifications: October 6, 2025

CONTACT INFORMATION

Website: https://www.winlp.org/call-for-submissions-2025/
Twitter: @winlpworkshop [1]
Facebook: Widening NLP [2]
LinkedIn: Widening NLP [3]
E-mail: winlp-chairs(a)googlegroups.com

Links:
------
[1] https://twitter.com/WiNLPWorkshop
[2] https://www.facebook.com/WideningNLP
[3] https://www.linkedin.com/company/winlp

[Workshop CFP] - (R2LM) From Rules to Language Models: Comparative Performance Evaluation @ RANLP
by alicia.picazo＠ua.es 12 Jun '25

[CFP] - (R2LM) From Rules to Language Models: Comparative Performance Evaluation @ RANLP 2025 (Varna, Bulgaria) - 11-13 September 2025

https://r2lm2025.github.io/R2LM/

Workshop Description

Deep learning (DL) and large language models (LLMs) have driven major advances in natural language processing (NLP), enabling impressive performance across many tasks. However, they continue to face key challenges in handling complex linguistic phenomena such as multiword expressions, long-context reasoning, and robustness to adversarial inputs. In parallel, concerns remain about the scalability, interpretability, and domain adaptability of these models, particularly in applications requiring high precision, such as grammar checking, legal analysis, or medical NLP. These limitations have sparked renewed interest in rule-based and knowledge-based approaches, which often offer better explainability and remain competitive, especially in low-resource or high-stakes scenarios.

Our workshop aims to gather contributions that deal with the following topics:

• Role of rule-based and knowledge-based NLP methods in modern applications
• Comparative analysis of rule-based, machine-learning, deep-learning and large language models for different NLP tasks
• Emerging trends in NLP research beyond deep learning and large language models
• Limitations and performance bottlenecks in scalability and accuracy of deep learning models

Submission Details

• Long papers: up to 8 pages (excluding references)
• Short papers: up to 4 pages (excluding references)
• Format: ACL-style (LaTeX or MS Word)
• Submission portal and template info available on the RANLP 2025 website

Important dates

Paper Submission Deadline: 6 July 2025
Notification of Acceptance: 31 July 2025
Workshop date: 11, 12 or 13 September 2025

Organising Committee:

Alicia Picazo-Izquierdo, University of Alicante, Spain
Ernesto Luis Estevanell-Valladares, University of Alicante, Spain
Rafael Muñoz Guillena, University of Alicante, Spain
Ruslan Mitkov, Lancaster University, UK
Raúl García Cerdá, University of Alicante, Spain
