***Call for Papers WASP @ IJCNLP-AACL 2025***
https://ui.adsabs.harvard.edu/WIESP/2025/
Building on the success of the First Workshop on Information Extraction from Scientific Publications (WIESP) at AACL-IJCNLP 2022 and the Second WIESP at IJCNLP-AACL 2023, the Third Workshop on Artificial Intelligence for Scientific Publications (WASP) at IJCNLP-AACL 2025 aims to establish itself as a pivotal platform for promoting discussion and research in Natural Language Processing (NLP) and Artificial Intelligence (AI). The workshop will bring together esteemed experts and renowned organizations with students and early-career researchers who are interested and invested in efforts to extract and mine the world’s scientific knowledge from research papers. Their collaboration will focus on developing advanced algorithms, models, and tools that will lay the foundation for future machine comprehension of scientific literature. The third iteration of WASP will concentrate on topics related to Artificial Intelligence research for scientific publications:
***Topics (not limited to)***
Scientific document parsing and structured information extraction
Scientific named-entity recognition and concept identification
Citation context/span extraction and citation-based knowledge mining
Argument extraction and scientific discourse analysis
Scientific article summarization and headline generation
Question-answering and fact retrieval from scientific literature
Prompt engineering and retrieval-augmented generation (RAG) for science Q&A
Chain-of-thought reasoning and scientific problem-solving with LLMs
LLM-powered information extraction from scientific texts
Pretraining and fine-tuning LLMs on scientific corpora
Evaluation and alignment of LLMs for scientific understanding
AI-assisted scientific discovery and hypothesis generation
Ethical and responsible use of LLMs in scientific publishing
Large Language Reasoning Models for Scientific Discovery
LLM hallucinations and their impact on scientific knowledge and publications
Challenges and the future of AI in scientific publishing
AI, Peer Review, and Scientific Publishing
Impact of Generative AI on Scientific Publishing
In addition to papers, WASP will also host a shared task.
***Telescope Reference and Astronomy Categorization Shared Task (TRACS)***
https://ui.adsabs.harvard.edu/WIESP/2025/shared_task
We will publish a separate CfP on the shared task. Shared task authors will be invited to write their system descriptions, which will then undergo light peer review.
All accepted papers and shared task system papers will be published in the WASP proceedings as part of IJCNLP-AACL 2025 and indexed in the ACL Anthology.
***Important Dates***
Paper submission deadline (WASP+TRACS): September 29, 2025
ARR commitment deadline: October 27, 2025
Notification of paper acceptance (WASP+TRACS): November 3, 2025
Camera-ready submission deadline (WASP+TRACS): November 11, 2025
Workshop: December 23, 2025 (hybrid)
All submission deadlines are 11:59 pm UTC-12 (“Anywhere on Earth”).
***Paper Submission Site***
https://openreview.net/group?id=aclweb.org/AACL-IJCNLP/2025/Workshop/WASP
Submissions will be made via OpenReview. Submissions should follow the ACLPUB formatting guidelines (Paper formatting guidelines - ACLPUB) and use the provided template files.
Submissions (Long and Short Papers) will be subject to a double-blind peer-review process. We follow the same policies as IJCNLP-AACL 2025 regarding anonymity, preprints, and double submissions.
Please reach out to the organizers (cc'ed) for any queries.
Thank you!
--
+++++++++++++++++++++++++++++++++++
Dr. Tirthankar Ghosal
Scientist (NLP/AI and HPC)
National Center for Computational Sciences (NCCS)
Oak Ridge National Laboratory, United States
&
Affiliate Faculty (NLP/AI)
University of Tennessee Knoxville
United States
https://www.tirthankarghosal.com
++++++++++++++++++++++++++++++++++++
We have a number of permanent academic Assistant/Associate Professor posts open in the School of Computer Science, in all areas of Computer Science including Artificial Intelligence and Natural Language Processing. Applications are due September 30th. Please see the following link for more details:
https://www.jobs.ac.uk/job/DOI907/assistant-or-associate-professor-in-compu…
With best regards,
Mark
Mark Lee
DPVC (India)
Professor of Artificial Intelligence
http://www.cs.bham.ac.uk/~mgl
University of Birmingham
ConsILR-2025 - deadline extension
the 20th edition of the International Conference on Linguistic Resources and Tools for Natural Language Processing (https://conferences.info.uaic.ro/consilr/2025)
Important Dates:
September 10, 2025 (extended from August 23, 2025) – abstract submission (max 300 words)
September 13, 2025 (extended from August 31, 2025) – paper submission
September 20, 2025 – authors’ notification
October 8-10, 2025 – ConsILR Conference
October 17, 2025 – final form submission
Venue: Casa Academiei Române (House of the Romanian Academy), 13, Calea 13
Septembrie, Bucharest, Romania and ONLINE
We invite papers presenting original and unpublished research, as well as
descriptions of accomplished or in-progress work, in all areas of natural
language processing. We welcome contributions covering a range of topics,
including but not limited to:
- Natural Language Processing (NLP) Techniques and Applications
- Large Language Models (LLMs) and Applications
- Digital Humanities in Language Technology
- (Mono- or multimodal) Language Resources and Tools for text, speech, images and videos
- Computational Models and Algorithms in Language Processing
- Applied Linguistics and NLP Integration
- Morphosyntactic Structures in Language Processing
- Semantic and Pragmatic Analysis in NLP
- Multi-word Expressions and Idiomatic Language in NLP
- Cultural and Contextual Factors in Language Technology
- Romanian Language Processing and Contrastive Linguistics
Authors are encouraged to submit, in addition to the papers per se,
open-source linguistic resources, such as corpora (or corpus examples),
demo code, video and sound files.
Confirmed invited speakers:
Agata Savary <https://perso.limsi.fr/savary/>
Amalia Todirașcu <https://fr.linkedin.com/in/amalia-todirascu>
Marius Ursache <https://www.linkedin.com/in/mariusursache/>
Paula Gradu <https://www.linkedin.com/in/paula-gradu-7505591b0>
Organisers:
- “Mihai Drăgănescu” Research Institute for Artificial Intelligence of the Romanian Academy
- Institute of Computer Science of the Romanian Academy – Iași Branch
- Faculty of Computer Science of the “Alexandru Ioan Cuza” University of Iași
- “Alexandru Philippide” Institute of Philology of the Romanian Academy – Iași Branch
- Romanian Association of Computational Linguistics
- Academy of Technical Sciences of Romania
The abstracts (max. 300 words) and papers (an even number of pages, between
6 and 12, including references) must be written in British English.
Details about the paper format are available on the conference website.
The Proceedings of the Conference will be sent for indexing to Clarivate
Analytics.
Further information can be found on the conference website: https://conferences.info.uaic.ro/consilr/2025.
On behalf of the ConsILR-2025 Organising Committee,
Dr. Elena Irimia
Due to many requests, the new extended deadline is 1st November 2025.
Apologies for cross-posting.
---------------------------------------------------------------------------
CALL FOR PAPERS: Language Resources and Evaluation Journal - Special Issue on Advancing Arabic Language Models: Resources, Evaluation, and Applications in the Era of Large Language Models
Springer Nature: https://link.springer.com/collections/ieheibhacc
Guest Editors:
Wassim El Hajj (American University of Beirut - Mediterraneo, Cyprus)
Hend S. Al-Khalifa (King Saud University, KSA)
Ahmed Ali (Saudi Authority for Data and Artificial Intelligence (SDAIA), KSA)
Overview
This special issue aims to explore the latest advancements, challenges, and future directions in Arabic Language Models (LMs), with a particular focus on Large Language Models (LLMs). It addresses the critical need for robust language resources, comprehensive evaluation frameworks, and innovative applications that cater to the unique linguistic characteristics of Arabic, including its dialects. The issue brings together researchers, linguists, and practitioners to discuss state-of-the-art methodologies, datasets, and evaluation metrics that contribute to the development of more accurate, culturally aligned, and ethically sound Arabic LLMs. By focusing on the intersection of Arabic linguistics, artificial intelligence, and cultural studies, this special issue will provide a comprehensive overview of the current state and future prospects of Arabic LLMs, contributing significantly to the field of language resources and evaluation.
Topics of Interest
We invite submissions on topics including, but not limited to, the following areas. Both original research contributions and substantial extensions of previously published work are welcome.
· Development of large-scale datasets for Arabic, including dialectal varieties.
· Novel approaches to Arabic language modeling, including deep learning and hybrid methodologies.
· Evaluation frameworks and benchmarks for Arabic LLMs, with a focus on comprehensive assessment across dialects and tasks.
· Cultural and ethical considerations in developing and deploying Arabic LLMs.
· Applications of Arabic LLMs in education, communication, and other domains.
· Challenges and solutions in handling the syntactic and dialectic variations of Arabic in LLMs.
· Comparative studies of Arabic LLMs with other language models.
· Techniques for improving the efficiency and performance of Arabic LLMs
· Interpreting and explaining Arabic LLMs.
For further information on this initiative, please refer to https://link.springer.com/collections/ieheibhacc
IMPORTANT DATES
Submission Deadline: November 1, 2025
Final Decisions: End of February 2026
Publication: Second Quarter of 2026
SUBMISSION GUIDELINES
Authors should follow the "Instructions for Authors (https://link.springer.com/journal/10579/submission-guidelines)" on the LRE journal website.
Best Regards on behalf of the Guest Editors,
Wassim El Hajj
Hend S. Al-Khalifa
Ahmed Ali
*** Last Mile for Workshop Proposals Submission ***
The Annual ACM Conference on Intelligent User Interfaces (IUI 2026)
March 23-26, 2026, 5* Coral Beach Hotel & Resort, Paphos, Cyprus
https://iui.hosting.acm.org/2026/
(*** Submission Deadline: August 29, 2025 (extended and final!) ***)
We are pleased to invite proposals for workshops to be held in conjunction
with the Annual International ACM Conference on Intelligent User Interfaces (ACM IUI
2026), Paphos, Cyprus.
Workshops aim to provide a venue for presenting research on emerging or specialized
topics of interest and to offer an informal forum for discussing research questions and
challenges. Potential workshop topics should be related to the general theme of the
conference (“Where HCI meets AI”).
We welcome proposals for a wide range of *full-day* or *half-day* workshops, including
but not limited to:
• Mini Conferences: Workshops that focus on a specific topic and may have their own
paper submission and review processes.
• Interactive Formats: Workshops that encourage active participation and hands-on
experiences through break-out sessions or group work to explore specific topics. They
may have their own paper submission and review process or target a report summarizing
the discussions and outcomes.
• Emerging Work Sessions: Workshops that foster discussion around emerging ideas.
Organizers may raise specific topics and invite position papers, late-breaking results, or
extended abstracts.
• Project-Centric Formats: Workshops tied closely to a specific existing large-scale
funded project (e.g., NSF, EU) with the goal of engaging a broader community.
• Interactive Competitions: Formats that invite individuals and teams to participate in
challenges or hackathons on selected topics relevant to IUI.
Review and Oversight by Workshop Chairs
Proposals will be reviewed and evaluated by the Workshop Chairs. It is possible that
workshops may be cancelled, shortened, merged, or restructured if there are insufficient
submissions.
Workshop summaries will be included in the ACM Digital Library for ACM IUI 2026. We will
also publish joint workshop proceedings for accepted workshop submissions
(through CEUR or a similar venue).
Responsibilities of Workshop Organizers
• Coordinate the Call for Papers, including solicitation, submission handling, and peer
review process.
• Create and maintain a dedicated website with workshop information. The IUI 2026
website will link to this page.
• Prepare and communicate a Call for Participation, targeting both IUI and broader relevant
communities (e.g., via mailing lists, social media, newsgroups, or offline events).
• Facilitate the planned activities, including paper presentations, discussions, and/or
interactive elements.
• Submit a workshop summary for inclusion in the ACM Digital Library.
• Collect camera-ready papers and author agreements from workshop participants for the
joint workshop proceedings (CEUR or similar).
Note that for the joint proceedings (CEUR or similar), submissions should be peer-reviewed
and will need to meet publishers’ guidelines. CEUR, for example, requires a 5-page
minimum per contribution. Note that not all workshop formats listed above may meet
these requirements, and we may not be able to include them.
IUI 2026 is an in-person event, and we expect workshop organizers to attend, allowing the
workshop to be conducted on-site. One author per paper is expected to attend in person
to present the work.
Proposal Format
Workshop proposals should be a maximum of four pages long (single-column format).
Prepare your submission using the latest templates: Word Submission Template
(https://authors.acm.org/binaries/content/assets/publications/taps/acm_submi…),
or the LaTeX Template
(https://authors.acm.org/proceedings/production-information/preparing-your-a…).
For LaTeX, please use “\documentclass[manuscript,review]{acmart}”.
The proposals should be organized as follows:
• Name and title: A one-word acronym and a full title. Please indicate “(Workshop)” after
the title.
• Abstract: A brief summary of the workshop.
• Description of workshop topic: Should discuss the relevance of the proposed topic to
IUI and its interest for the IUI 2026 audience. Include a concise discussion of why this
workshop is particularly relevant for the intended audience and how it will complement
and enhance topics covered at the main conference.
• Previous history: List of previous workshops on this topic, including the conferences
that hosted them and the number of participants. If available, report on past editions of
the workshop (including URLs), along with a brief statement of the workshop
series (e.g., covering topics, number of paper submissions, and participants), as well as
post-workshop publications over the years and acceptance statistics. If this is the first
edition of the workshop, describe how it differs from others on similar topics (e.g., by
including conference names and years).
• Organizer(s): Names, affiliations, emails, and web pages of the organizer(s). Provide a
brief description of the background of the organizer(s). Strong proposals normally include
organizers who bring differing perspectives on the topic and are actively connected to the
communities of potential participants. Please indicate the primary contact person and the
organizers who will attend the workshop. Also, please provide a list of other workshops
organized by workshop organizers in the past.
• Workshop program committee: Names and affiliation of the members of the (tentative)
workshop program committee that will evaluate the workshop submissions.
• Participants: Include a statement of how many participants you expect and how you plan
to invite participants for the workshop. If possible, include the names of at least 10 people
who have expressed interest in participating in the workshop or tutorial.
• Workshop activities: A brief description of the format regarding the mix of
events or activities, such as paper presentations, invited talks, panels, demonstrations,
teaching activities, hands-on practical exercises, and general discussion.
• Planned outcomes of the workshop: What are you hoping to achieve by the end of the
workshop? Please list here any planned publications or other outcomes expected.
• Length: Full-day or half-day.
Submission Platform
• All materials must be submitted electronically to PCS 2.0
http://new.precisionconference.com/~sigchi by the proposal submission deadline.
• In PCS 2.0, first click "Submissions" at the top of the page, from the dropdown menus for
society, conference, and track, select "SIGCHI", "IUI 2026", and then "IUI 2026 Workshops",
and press "Go".
We encourage both researchers and industry practitioners to submit workshop proposals.
To support diverse perspectives in the workshops, we strongly recommend including
organizers from varied institutions and backgrounds.
Furthermore, we welcome workshops with an innovative structure that can attract diverse
types of contributions and foster valuable interactions.
Prospective organizers are encouraged to contact the Workshop Chairs in
advance (workshops2026(a)iui.acm.org) to discuss ideas, receive feedback, or seek
assistance in preparing engaging proposals. Especially for workshop proposals featuring
innovative interactive formats, we are happy to help further develop and implement the
ideas.
Important Dates (AoE)
• Workshop Proposals: August 29, 2025 (extended and final!)
• Decision Notification: September 19, 2025
• Camera-ready Summaries: February 6, 2026
Workshop Chairs
Karthik Dinakar, Pienso, USA
Werner Geyer, IBM Research, USA
Patricia Kahr, University of Zurich, Switzerland
Antonela Tommasel, CONICET, Argentina
The Centre for Translation Studies (CTS) at the University of Surrey invites applications for a place on our stand-alone course "Introduction to Artificial Intelligence for Translators and Interpreters". This course introduces students to the fundamentals of Artificial Intelligence (AI) and its applications in the field of translation and interpreting. The course covers a wide range of topics, from the basic concepts of AI to more advanced areas and techniques, including machine learning, large language models (LLMs) and how to leverage them, and the customisation of automatic speech recognition (ASR) engines. Students will be taught different prompting techniques which allow them to interact with LLMs like ChatGPT, so they can develop advanced problem-solving skills.
Students will tackle AI-related tasks that are relevant in the fields of translation and interpreting, such as machine translation, customisation of ASR engines and the use of machine assistance in tasks requiring creativity skills (e.g. transcreation). They will also explore the ethical implications of AI and the potential impact of AI on the future of the language industry.
The module is offered in synchronous online mode and will run for 11 weeks, starting on 23rd September. The tentative time slot is Tuesdays, 4-6pm UK time. You can find more details about the module and how to register at https://www.surrey.ac.uk/cpd-and-short-courses/tram511-introduction-artific…
The full list of standalone courses we offer this year is available at https://www.surrey.ac.uk/centre-translation-studies/continuing-professional…
Should you have any questions, do not hesitate to get in touch.
---
Prof Constantin Orăsan
Professor of Language and Translation Technologies
Centre for Translation Studies<https://www.surrey.ac.uk/centre-translation-studies> | School of Literature and Languages<https://www.surrey.ac.uk/school-literature-languages>
Personal page: https://www.surrey.ac.uk/people/constantin-orasan
Office: 06LC03, Email: C.Orasan(a)surrey.ac.uk
Library and Learning Centre, University of Surrey, Guildford, Surrey, GU2 7XH, UK
TL;DR: SHROOM-CAP is an Indic-centric shared task co-located with CHOMPS-2025 to advance the SOTA in hallucination detection for scientific content generated with LLMs. We have annotated hallucinated content in 4* high-resource languages and 3* surprise low-resource Indic languages using top-tier LLMs. Participate in as many languages as you like by accurately detecting the presence of hallucinated content.
Stay informed by joining our Google group!
Full Invitation
We are excited to announce the SHROOM-CAP shared task on cross-lingual hallucination detection for scientific publication (link to website). We invite participants to detect whether or not there is hallucination in the outputs of instruction-tuned LLMs within a cross-lingual scientific context.
About: This shared task builds upon our previous iteration, SHROOM, with three key highlights: it is LLM-centered, it provides cross-lingual annotations, and it targets both hallucination and fluency prediction.
LLMs frequently produce "hallucinations," where models generate plausible but incorrect outputs, while the existing metrics prioritize fluency over correctness. This results in an issue of growing concern as these models are increasingly adopted by the public.
With SHROOM-CAP, we want to advance the state of the art in detecting hallucinated scientific content. This new iteration of the shared task is held in a cross-lingual and multi-model context: we provide data produced by a variety of open-weights LLMs in 4*+3* different high- and low-resource languages (English, French, Spanish, Hindi, and Indic languages to be revealed later).
Participants are invited to take part in any of the available languages and are expected to develop systems that can accurately identify hallucinations in generated scientific content. Additionally, participants will be invited to submit system description papers, with the option to present them in oral/poster format during the CHOMPS workshop (co-located with IJCNLP-AACL 2025, Mumbai, India). Participants who elect to write a system description paper will be asked to review their peers’ submissions (max 2 papers per author).
Key Dates:
All deadlines are “anywhere on Earth” (23:59 UTC-12).
- Dev set available by: 31.07.2025
- Test set available by: 05.10.2025
- Evaluation phase ends: 15.10.2025
- System description papers due: 25.10.2025 (TBC)
- Notification of acceptance: 05.11.2025 (TBC)
- Camera-ready due: 11.11.2025 (TBC)
- Proceedings due: 01.12.2025 (TBC)
- CHOMPS workshop: 23/24 December 2025 (co-located with IJCNLP-AACL 2025)
Evaluation Metrics: Participants will be ranked along two criteria: 1. factuality mistakes, measured via macro-F1 between the gold reference and predicted labels; 2. fluency mistakes, measured via macro-F1 between the gold reference and predicted labels, based on our annotations.
Rankings and submissions will be done separately per language: you are welcome to focus only on the languages you are interested in!
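For reference, the following is a minimal sketch of how a macro-F1 score of this kind could be computed in Python with scikit-learn. It assumes binary hallucination labels; the label scheme, data format, and official SHROOM-CAP scorer may differ.

# Minimal sketch (assumption: binary labels, 1 = hallucinated, 0 = not).
# The official SHROOM-CAP scorer and label scheme may differ.
from sklearn.metrics import f1_score

gold = [1, 0, 1, 1, 0]         # hypothetical gold reference annotations
predicted = [1, 0, 0, 1, 0]    # hypothetical system predictions

print("macro-F1:", f1_score(gold, predicted, average="macro"))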
How to Participate:
- Register: please register your team at https://forms.gle/hWR9jwTBjZQmFKAE7 and join our Google group: https://groups.google.com/g/shroomcap
- Submit results: use our platform to submit your results before 15.10.2025
- Submit your system description: system description papers should be submitted by 25.10.2025 (TBC; further details will be announced at a later date)
Want to be kept in the loop?
Join our Google group mailing list! We look forward to your participation and to the exciting research that will emerge from this task.
Best regards,
SHROOM-CAP organizers
We welcome you to the next Natural Language Processing and Vision (NLPV) seminars at the University of Exeter.
Talk 1
Scheduled: Thursday 21 Aug 2025, 16:00 to 17:00 (GMT+1)
Location: https://Universityofexeter.zoom.us/j/97587944439?pwd=h4rnPO0PafT9oRrrqQsezG… (Meeting ID: 975 8794 4439 Password: 064414)
Title: Trustworthy Optimization of Pre-Trained Models for Healthcare: Generalizability, Adaptability, and Security
Abstract: Pre-trained language models have opened new possibilities in healthcare, showing promise in mining scientific literature, analyzing large-scale clinical data, identifying patterns in emerging diseases, and automating workflows, positioning themselves as intelligent research assistants. However, general-purpose models, typically trained on web-scale corpora, often lack the clinical grounding necessary for reliable deployment in high-stakes domains like healthcare. To be effective, they must be adapted to meet domain-specific requirements. My PhD thesis addresses three core challenges in leveraging pre-trained models for healthcare: (i) the scarcity of labeled data for fine-tuning, (ii) the evolving nature of healthcare data, and (iii) the need to ensure transparency and traceability of AI-generated content. In this talk, I will focus on the third challenge: enabling traceability of content generated by large language models. I will begin with an overview of prior watermarking approaches and then present our proposed solution. We introduce a watermarking algorithm applied at inference time that perturbs the model’s logits to bias generation toward a subset of vocabulary tokens determined by a secret key. To ensure that watermarking does not compromise generation quality, we propose a multi-objective optimization (MOO) framework that employs lightweight networks to produce token-specific watermarking logits and splitting ratios, specifying how many tokens to bias and by how much. This approach effectively balances watermark detectability with semantic coherence. Experimental results show that our method significantly improves detectability and robustness against removal attacks while preserving the semantics of the generated text, outperforming existing watermarking techniques.
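As background for the talk, the following is a minimal sketch of the general inference-time "green-list" logit-bias idea that watermarking schemes of this kind build on; it is not the speaker's multi-objective method, and all names and values are hypothetical. A secret key, combined with the previous token, seeds a pseudo-random subset of the vocabulary, and those tokens' logits receive a small positive bias that detection can later test for.

# Sketch of generic green-list watermarking at inference time (not the
# speaker's MOO-based method); names and values are hypothetical.
import hashlib
import numpy as np

def greenlist_bias(logits, prev_token_id, secret_key, gamma=0.5, delta=2.0):
    # Seed a PRNG from the secret key and the previous token id.
    digest = hashlib.sha256(f"{secret_key}:{prev_token_id}".encode()).hexdigest()
    rng = np.random.default_rng(int(digest, 16) % (2**32))
    # Pick a fraction gamma of the vocabulary as the "green" list.
    vocab_size = logits.shape[0]
    green = rng.choice(vocab_size, size=int(gamma * vocab_size), replace=False)
    # Bias green-list logits by delta before sampling the next token.
    biased = logits.copy()
    biased[green] += delta
    return biased

# Example with a toy vocabulary of 50 tokens:
logits = np.random.randn(50)
watermarked = greenlist_bias(logits, prev_token_id=7, secret_key="s3cret")

Detection then checks whether generated tokens fall in the key-derived green lists more often than the chance rate gamma.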
Speaker's bio: Dr. Sai Ashish Somayajula is a Senior Applied Scientist in Generative AI at Oracle Cloud Infrastructure, where he develops large-scale foundation models for enterprise applications. He earned his PhD in Electrical and Computer Engineering from the University of California (UC), San Diego. His research focused on addressing key challenges in adapting and utilizing pre-trained models for healthcare. Specifically, his work spanned three core areas: (1) synthetic data generation using meta-learning-based feedback mechanisms, (2) continual learning for handling dynamic data streams without catastrophic forgetting, and (3) token-level watermarking techniques to ensure content provenance and security. His research has been published in premier venues, including the International Conference on Machine Learning (ICML), Annual Meeting of the Association for Computational Linguistics (ACL), Transactions of the Association for Computational Linguistics (TACL), Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Scientific Reports (Nature Portfolio), and Transactions of Machine Learning Research (TMLR). He is a recipient of the Jacobs School of Engineering Departmental Fellowship at UC San Diego. Ashish has collaborated with leading industrial research labs through internships at Apple and Tencent AI Lab. He holds a Bachelor's degree in Electrical Engineering with a minor in Computer Science from the Indian Institute of Technology, Hyderabad, where he was twice awarded the Academic Excellence Award, and a Master’s in Intelligent Systems and Robotics from UC San Diego.
Talk 2
Scheduled: Thursday 4 Sep 2025, 13:00 to 14:00 (GMT+1)
Location: https://Universityofexeter.zoom.us/j/95827730937?pwd=Te1wejfgr68A5lplwLQjxw…
(Meeting ID: 958 2773 0937 Password: 879296)
Title: Towards end-to-end tokenization and adaptive memory in foundation models
Abstract: Foundation models (FMs) process information as a sequence of internal representations; however, the length of this sequence is fixed and entirely determined by tokenization. This essentially decouples representation granularity from information content, which exacerbates the deployment costs of FMs and narrows their “horizons” in long sequences. What if, instead, we could free FMs from tokenizers by modelling bytes directly, while making them faster than current tokenizer-bound FMs? I argue that a recipe to achieve this goal already exists. In particular, I helped prototype how to: 1) dynamically pool representations in internal layers, progressively learning abstractions from raw data; 2) compress the KV cache of Transformers during generation without loss of performance; 3) predict multiple bytes per time step in an efficient yet expressive way; 4) retrofit existing tokenizer-bound FMs into byte-level FMs through cross-tokenizer distillation. By blending these ingredients, we may soon witness the emergence of efficient byte-level FMs.
Speaker's short bio (based on website): Edoardo Ponti is an assistant professor in Natural Language Processing at the University of Edinburgh and a visiting professor at NVIDIA. His research focuses on efficient architectures (see the NeurIPS 2024 tutorial on dynamic sparsity), modular deep learning (designing neural architectures that route information to specialised modules, e.g., sparse subnetworks), and computational typology (understanding how languages vary, across the world and its cultures, within a computational and mathematical framework). Previously, Edoardo was a visiting postdoctoral scholar at Stanford University and a postdoctoral fellow in computer science at Mila - Quebec AI Institute in Montreal. In 2021, Edoardo obtained a PhD from the University of Cambridge, St John’s College. Once upon a time, Edoardo studied typological and historical linguistics at the University of Pavia. Edoardo’s research has been featured in The Economist and Scientific American, among others. Edoardo received a Google Research Faculty Award and two Best Paper Awards, at EMNLP 2021 and RepL4NLP 2019. Edoardo is a board member of SIGTYP, the ACL special interest group for computational typology, a Scholar of the European Lab for Learning and Intelligent Systems (ELLIS), and part of the TACL journal editorial team.
We will update future talks at the website: https://sites.google.com/view/neurocognit-lang-viz-group/seminars
Joining our *Google group* for future seminar and research information: https://groups.google.com/g/neurocognition-language-and-vision-processing-g…
RANLP 2025 TUTORIALS (6-7 September)
Call for Participation
Website - https://ranlp.org/ranlp2025/index.php/tutorials/
RANLP 2025 belongs to a sequence of events with a similar name and continues the tradition of successful training events that have been held in Bulgaria since 1989.
RANLP 2025 plans 4 half-day tutorials, each with a duration of 185 minutes, distributed as follows: 45 min presentation + 20 min break + 45 min presentation + 30 min coffee break + 45 min presentation.
Tutorial Presenters
* Burcu Can Buglalilar (University of Stirling, UK)
* Salima Lamsiyah (University of Luxembourg, Luxembourg)
* Tharindu Ranasinghe and Damith Dola Mullage Premasiri (Lancaster University, UK)
* Anna Rogers and Max Müller-Eberstein (IT University of Copenhagen, Denmark)
Programme
6th September 2025, 9am
Tharindu Ranasinghe and Damith Premasiri: Legal NLP in the LLM era
This tutorial examines the transformation of Legal NLP in the era of large language models, beginning with key principles of task formulation and data preparation. We will discuss retrieval and judgment prediction in detail, exploring their methodologies, challenges, and applications in legal contexts. We conclude with a forward-looking discussion on the future of Legal AI and the ethical considerations surrounding its applications in the practice of law.
6th September 2025, 2pm
Burcu Can Buglalilar: From Large to Small: Building Affordable Language Models with Limited Resources
This tutorial aims to question the limitations and harms of Large Language Models, followed by a comprehensive review of Small Language Models, covering prominent examples, their key techniques, and their capabilities. It will also give an overview of even smaller ‘baby’ language models. Finally, the tutorial will conclude by presenting some recent studies in which we developed baby language models using a very small amount of data.
7th September 2025, 9am
Anna Rogers and Max Müller-Eberstein: Studying Generalization in the Age of Contamination
The tutorial will discuss the challenges of doing NLP research in the age of LLMs, when we can no longer be sure that the test data was not observed in training. We will cover the main approaches to studying generalization in various settings, and present a new framework for working with controlled test-train splits across linguistically annotated data at scale.
7th September 2025, 2pm
Salima Lamsiyah: AI Content in NLP: Trends, Detection, and Applications
This tutorial provides a comprehensive overview of AI-generated content in Natural Language Processing (NLP). It covers recent trends in text generation, methods for detecting AI-generated text, and practical applications of such content. The content includes an exploration of state-of-the-art models and techniques for text generation, approaches to identifying machine-generated text, a review of key benchmarks and datasets, and a discussion of open research challenges.
We are looking forward to your participation!
The organisers of RANLP 2025