TokShop: Tokenization Workshop (ICML 2025)
Submission to the Tokenization Workshop begins on April 14, 2025, via OpenReview. The deadline for submissions is May 30, 2025, at 11:59pm (anywhere on earth). Notifications of acceptance will be sent out on June 9, 2025, and camera-ready papers will be due shortly afterward at 11:59pm (anywhere on earth). The workshop will take place on July 18, 2025.
Workshop Description
The Tokenization Workshop (TokShop) at ICML aims to bring together researchers and practitioners from all corners of machine learning to explore tokenization in its broadest sense. We will discuss innovations, challenges, and future directions for tokenization across diverse data types and modalities.
Call for Papers
Topics of interest include:
- Subword Tokenization in NLP: Analysis of techniques such as BPE, WordPiece, and UnigramLM, as well as improvements for efficiency, interpretability, and adaptability.
- Multimodal Tokenization: Tokenization strategies for images, audio, video, and other modalities, including methods to align representations across different types of data.
- Multilingual Tokenization: Development of tokenizers that work robustly across languages and scripts, and investigation into failure modes tied to tokenization.
- Tokenizer Modification Post-Training: Methods for updating tokenizers after model training to boost performance and/or efficiency without retraining from scratch.
- Alternative Input Representations: Exploration of non-traditional tokenization approaches, such as byte-level, pixel-level, or patch-based representations.
- Statistical Perspectives on Tokenization: Empirical analysis of token distributions, compression properties, and correlations with model behavior.
By broadening the scope of tokenization research beyond language, this workshop seeks to foster cross-disciplinary dialogue and inspire new advances at the intersection of representation learning, data efficiency, and model design.
Submission guidelines
Our author guidelines follow the ICML requirements unless otherwise specified.
- Paper submission is hosted on OpenReview.
- Each submission should contain up to 9 pages, not including references or appendix (shorter submissions are also welcome).
- Please use the provided LaTeX template (Style Files) for your submission. Please follow the paper formatting guidelines general to ICML as specified in the style files. Authors may not modify the style files or use templates designed for other conferences.
- The paper should be anonymized and uploaded to OpenReview as a single PDF.
- You may use as many pages of references and appendix as you wish, but reviewers are not required to read the appendix.
- Posting papers on preprint servers like arXiv is permitted.
- We encourage each submission to discuss the limitations as well as ethical and societal implications of the work, wherever applicable (neither is required). These sections do not count towards the page limit.
- This workshop offers both archival and non-archival options for submissions. Archival papers will be indexed with proceedings, while non-archival submissions will not.
- The review process will be double-blind.
Read more: https://tokenization-workshop.github.io/
(apologies for multiple postings)
Dear colleagues,
We would like to inform you that registration for the eLex 2025 conference is now open (https://elex.link/elex2025/registration/). The deadline for the early-bird fee is 5 September 2025.
A call for Hornby bursary applications is also out (https://elex.link/elex2025/hornby-bursary/). The bursaries cover participants' registration fee, so if you intend to apply, please wait for results before paying the registration fees (you can still complete all the steps of the registration process and pay later).
Finally, the special rates for rooms at the venue and partner hotels are available: https://elex.link/elex2025/venue/. There are a limited number of rooms available so early booking is advisable (there is a very friendly cancellation option).
Please monitor the conference website for further updates on the programme, proceedings and related news.
Looking forward to seeing you at the conference.
Best regards
Iztok Kosem
Head of the eLex 2025 organising committee
----------------------------
HealTAC 2025
June 16-18th, 2025, Glasgow (UK)
https://healtac2025.github.io/
----------------------------
Call for participation
----------------------------
The 8th Healthcare Text Analytics Conference (HealTAC 2025) invites everyone for three days of state-of-the-art discussions on healthcare text analytics. The programme features:
-- keynotes on "Addressing the Missing Context Problem in Foundation Models for Healthcare" (by Jason Fries, Stanford University) and "AI for Healthcare: Text as a Medium for Multimodal datasets" (by Alison O'Neil, Canon Medical Research Europe);
-- panels on "Opportunities and challenges in LLMs for health research: A multidisciplinary perspective on surfacing social inequalities, bias detection, and mitigation" and "Challenges in AI deployment within NHS";
-- 18 talks describing current PhD projects;
-- a workshop on "NLP in mental healthcare and research" (June 16th);
-- 4 demos and 9 lightning talks;
-- 24 posters presenting healthcare text analytics research.
The detailed programme is available at: https://healtac2025.github.io/programme/
----------------------------
Registration fees
----------------------------
Due to generous support from Health Data Research UK, CogStack, Frontiers, DataMind, University of Glasgow, Research Data Scotland and Healtex, the registration fee is only £100 (for students) and £200 (for everyone else), and includes the full 3-day programme, lunches, the conference dinner and even breakfast on day 1.
This is the early registration fee until May 29th. Registration details: https://healtac2025.github.io/registration/
----------------------------
Accommodation and travel
----------------------------
University accommodation is available to registered participants for only £43 per night. All details are available at: https://healtac2025.github.io/accommodation/
Follow the conference announcements on social media at #HEALTAC2025. We are looking forward to welcoming you to HealTAC 2025.
The 2nd LLMs4Subjects Shared Task: LLM-based Subject Tagging for the TIB Technical Library's Digital Catalog
Theme: The Development of Energy- and Compute-Efficient LLM Systems
Organized as part of the German Evaluation (GermEval 2025) Shared Task Series
September 10-12, 2025
Hildesheim, Germany
(co-located with KONVENS 2025 - Conference on Natural Language Processing)
2nd LLMs4Subjects Shared Task: https://sites.google.com/view/llms4subjects-germeval/
Join the Codabench Competition: https://www.codabench.org/competitions/8373/
KONVENS 2025: https://konvens-2025.hs-hannover.de/about/
Task Overview
LLMs4Subjects challenges the research community to develop cutting-edge LLM-based solutions for subject tagging of technical records from Leibniz University's Technical Library (TIBKAT). Participants are tasked with leveraging large language models (LLMs) to tag technical records using the GND taxonomy. The task involves bilingual language modeling, as systems must process technical documents in both German and English. Successful solutions may be integrated into the operational workflows of TIB, the Leibniz Information Centre for Science and Technology.
With the rapid advancements in LLMs, the focus is shifting toward making these models more energy- and compute-efficient while maintaining high performance. Recent innovations, such as the DeepSeek series, have demonstrated how techniques like mixture-of-experts (MoE) and model distillation can significantly reduce computational costs without sacrificing effectiveness.
The 2nd LLMs4Subjects shared task highlights the importance of efficiency in LLMs, encouraging participants to explore strategies that enhance model performance while optimizing for energy consumption and inference speed. We welcome approaches including, but not limited to, those that leverage model compression, quantization, efficient fine-tuning, and adaptive computation techniques to push the boundaries of sustainable AI development.
Subtasks
The 2nd LLMs4Subjects shared task organizes the following two subtasks:
Subtask 1 - Multi-Domain Classification of Library Records
Subtask 2 - Large-scale Multilabel Subject Indexing of Library Records
Important Dates
* Release of training data: March 8, 2025
* Release of testing data: May 30, 2025
* Deadline for system submissions: June 2, 2025
* Evaluation end: June 27, 2025
* Paper submission deadline: July 7, 2025
* Notification of acceptance: June 28, 2025
* Camera-ready paper due: August 15, 2025
* Workshop/KONVENS: September 10 - 12, 2025 (TBA)
Note: Submit your system outputs on our Codabench live leaderboard at https://www.codabench.org/competitions/8373/
(Apologies for cross-posting)
*SEM2025: The 14th Joint Conference on Lexical and Computational Semantics, Suzhou, China. (Co-located with EMNLP)
https://starsem2025.github.io/
Third and Final Call for Papers
*SEM brings together researchers interested in the semantics of natural languages and its computational modelling. The conference embraces a wide range of approaches including data-driven, neural, probabilistic and symbolic; practical applications as well as theoretical contributions are welcome. The long-term goal of *SEM is to provide a forum for NLP researchers working on any aspect of natural language semantics.
*SEM invites submissions related to the computational modelling of natural language semantics (understood broadly) and its application. Relevant areas include (but are not limited to) theoretical aspects of computational semantics, empirical and data-driven approaches, resources, evaluation and applications/tools.
*SEM encourages authors to consider ethical aspects of their work, and to address and discuss ethical questions and implications relevant to their research. *SEM also values reproducibility and particularly welcomes submissions that adhere to the reproducibility guidelines as specified here<https://folk.idi.ntnu.no/odderik/reproducibility_guidelines.pdf>.
Submission Instructions
Submissions must describe unpublished work and be written in English. We solicit both long and short papers. Long papers describe original research and may consist of up to eight (8) pages of content, plus unlimited pages for references. Appendices are allowed after the references, but the paper should be self-contained and reviewers will not be required to check the appendices, if any. Final versions of long papers will be given one additional page of content (up to 9 pages) so that reviewers' comments can be taken into account. Short papers describe original focused research and may consist of up to four (4) pages, plus unlimited pages for references. Upon acceptance, short papers will be given five (5) content pages in the proceedings. Authors are encouraged to use this additional page to address reviewers' comments in their final versions.
Limitations and Ethics Statement sections are allowed and encouraged, but are not mandatory. These sections should be placed after the conclusion and will not count towards the overall page limit.
Submissions should follow the ARR formatting requirements<https://github.com/acl-org/acl-style-files>.
Submission routes and deadlines
*SEM solicits both direct submissions and ACL Rolling Review (ARR) commitments. The deadline for direct submissions is May 30, 2025, and these submissions will be reviewed by the *SEM2025 program committee. ACL Rolling Review (ARR) submissions can be committed to *SEM up to August 22, 2025 (authors of ARR-reviewed papers need to include their OpenReview link with reviews in the submission form). Both types of submissions are made through OpenReview.
Direct submission link:
https://openreview.net/group?id=aclweb.org/StarSEM/2025/Conference
Multiple submission policy: *SEM does not prohibit the submission of work that is under consideration for another venue at the same time as the *SEM review period. However, authors of such papers will be asked to declare this at submission time.
Important Dates
(All deadlines are 11:59pm UTC-12h, AoE)
Direct submission deadline (long & short papers): May 30, 2025
ARR-reviewed submission deadline (long & short papers): August 22, 2025
Notification of acceptance: September 5, 2025
Camera-ready deadline: September 26, 2025
Conference date: TBA (co-located with EMNLP 2025)
Following the ACL and ARR policies<https://www.aclweb.org/portal/content/report-acl-committee-anonymity-policy>, there is no anonymity period requirement.
Kemal Kurniawan | Research Fellow | (he/him) PhD
School of Computing and Information Systems | Faculty of Engineering and IT
Level 4, Melbourne Connect, 700 Swanston St
The University of Melbourne, Victoria 3010 Australia
E: kurniawan.k(a)unimelb.edu.au<mailto:kurniawan.k@unimelb.edu.au>
Dear All,
We are pleased to invite you to the one-day interdisciplinary workshop titled Language, Linguistics, and Natural Language Processing: An Interdisciplinary Approach.
* Date: 10th June 2025
* Location: Cardiff University (face-to-face; talks will also be made available online)
* Registration: the event is free to attend, but registration is required – please book your ticket here<https://www.eventbrite.co.uk/e/language-linguistics-and-natural-language-pr…>
The workshop will bring together researchers from language and applied linguistics, communication and NLP/computational sciences to explore opportunities for interdisciplinary research and teaching collaborations that account for the technological changes in our everyday and professional communication practices. We will:
* Promote discussion and idea generation for interdisciplinary research collaboration
* Identify complementary areas of interest and opportunities for joint projects
* Build a shared language to promote cross-disciplinary understanding
* Identify avenues for collaborative teaching that enhance postgraduate research and transferable skills
We invite early career and senior researchers, as well as postgraduate students, of communication-related disciplines and computer sciences to join us for a day of discussions, networking and knowledge exchange activities.
Invited speakers:
Dr Alistair Baron<https://www.lancaster.ac.uk/scc/about-us/people/alistair-baron> – Lancaster University – Spelling variation: problems, solutions, and analysis.
Dr Emma McClaughlin<https://www.nottingham.ac.uk/english/people/emma.mcclaughlin> – The University of Nottingham – Language in Health Contexts: Interdisciplinary Applications of Corpus Linguistics
If you are interested in the intersection of language, communication and digital technologies, we hope to see you there!
Best wishes,
Sara – on behalf of Dawn Knight, Hui Sun, Carla Pérez-Almendros.
Dr Sara Vilar-Lluch
Lecturer in Language and Linguistics
School of English, Communication and Philosophy
Cardiff University
John Percival Building
Column Drive
Cardiff CF10 3EG
Email: VilarLluchS(a)cardiff.ac.uk<mailto:VilarLluchS@cardiff.ac.uk>
Dear all,
As the deadline is fast approaching, I would like to inform you about a
call for papers for a thematic track at FedCSIS 2025 (IEEE #61123)
called "AI in Digital Humanities, Computational Social Sciences and
Economics Research (AI-HuSo)". FedCSIS 2025 will be held in Kraków,
Poland, 14-17 September 2025.
Paper submission (no extensions): 25.05.2025
See https://2025.fedcsis.org/thematic/ai-huso for details.
This thematic session is dedicated to the computational study of Social
Sciences, Economics and Humanities, including all subjects like, for
example, education, labour market, history, religious studies, theology,
cultural heritage, and informative predictions for decision-making and
behavioural-science perspectives. While digital methods, intelligence
systems, and AI have been emerging topics in these fields for several
decades, this thematic session is not only limited to discoveries in
these domains, but also dedicated to the reflections of these methods
and results within the field of computer science. Thus, we are in
particular interested in interdisciplinary exchange and dissemination
with a clear focus on computational and AI methods for intelligence systems.
Since there is a clear methodological overlap between these three
domains, and similar algorithms and AI approaches are often considered,
we see this thematic session as a place for interdisciplinary learning
and for discussing a joint toolbox that supports scholars from these
fields with human and context-aware agents.
The aim of this thematic session is thus to bridge the gap between
scientific domains, foster interdisciplinary exchange and discuss how
research questions from other domains challenge current computer
science. In particular, we are interested in communications between
researchers from different fields of computer science, social sciences,
economics, humanities, and practitioners from different fields.
Topics
======
The list of topics includes, but is not limited to:
- AI and computational approaches for the interdisciplinary work of
the social sciences, economics, and humanities: report on theoretical,
methodological, experimental, and applied research.
- AI and computational approaches for linking data from different
digital resources, including online social networks, web and data
mining, Knowledge Graphs, Ontologies.
- AI and computational methods for text mining and textual analysis,
for example texts within social sciences, digital literary studies,
computational stylistics and stylometry.
- Text encoding, computational linguistics, annotation guidelines,
OCR for humanities, economics, and social sciences.
- Network analysis, including social and historical network analysis.
  - Ethical and philosophical considerations of AI in society,
education and humanities research.
In general, applications of interest include, but are not limited to,
the following:
- Labour market research and qualification, including
behavioral-science perspectives.
- Education: Digital methods and systems, e-learning, adult
education, etc.
  - Contributions to the application of technology to culture, history,
and societal issues: For example, computational text analysis,
analytics and visualization, databases, etc.
- In particular, we welcome submissions which focus on a critical
reflection of digital methods in the humanities, economics and social
sciences within computer science.
- Linking of digital resources, a discussion of data sets, their
quality and reliability, combining quantitative and qualitative data,
anonymization and data protection.
Contact: ai-huso(a)fedcsis.org
Submission rules
================
- Authors should submit their papers as Postscript, PDF or MSWord files.
- The total length of a paper should not exceed 12 pages IEEE style
(including tables, figures and references). More pages can be added, for
an additional fee. IEEE style templates are available here.
- Papers will be refereed and accepted on the basis of their
scientific merit and relevance to the Topical Area.
- Preprints containing accepted papers will be published online.
- Only papers presented at the conference will be published in
Conference Proceedings and submitted for inclusion in the IEEE Xplore®
database.
- Conference proceedings will be published in a volume with ISBN,
ISSN and DOI numbers and posted at the conference WWW site.
- Conference proceedings will be submitted for indexation according
to information here.
  - Organizers reserve the right to move accepted papers between FedCSIS
Sessions.
Extended versions of selected papers presented at the conference may be
published in a volume entitled "Advances in Computational Social
Sciences: AI, Computational Methods and Applications for the Study of
Society", to be published by Springer. In addition, selected papers may
be submitted to a special issue of AI entitled "Integrating Data Sources
for Smarter Interdisciplinary AI Solutions: Challenges and Opportunities".
Only *10 days left* to apply for AthNLP 2025, taking place at the campus
of NCSR Demokritos in Athens, Greece on 4-10 September 2025. Don't
miss the chance to be part of one of the most exciting events in Europe
for NLPers, one of the most dynamic research communities nowadays, learn
about exciting technologies such as LLMs, and take the opportunity to
present your work in the poster sessions!
> Are you passionate about NLP and Machine Learning?
>
> Join us this September in Athens for a week full of lectures, hands-on
> sessions, and networking opportunities with top researchers in the
> field. AUEB Natural Language Processing Group invites everyone
> interested in Natural Language Processing and Machine Learning to
> attend the 3rd Athens Natural Language Processing Summer School -
> AthNLP 2025! More information can be found at:
> https://athnlp.github.io/2025/
>
> *Important Dates*
>
> * Application Deadline: *May 30, 2025*
> * Decision: June 10, 2025
> * Registration: June 17, 2025
> * Summer School: September 4-10, 2025
>
> Stay tuned – more details about the programme coming soon!
>
> *AthNLP 2025 - Athens Natural Language Processing Summer School*
> <https://athnlp.github.io/2025/index.html>
> We invite everyone interested in Natural Language Processing and
> Machine Learning to participate in the *3rd Athens Natural Language
> Processing Summer School* taking place in *Athens, Greece* *at NCSR
> Demokritos Campus* between *4-10 September 2025.*
>
> *Application Deadline: 30 May 2025*
> *Submit your application here: https://athnlp.github.io/2025/cfp.html*
> The full programme and list of speakers will be announced soon.
>
> _Important Dates_
> Application Deadline: *30 May 2025*
> Decision announcement: *10 June 2025*
> Registration until: *17 June 2025*
> Summer School: *4-10 September 2025*
>
> Following successful editions in 2019 and 2024, *AthNLP 2025* returns
> to the campus of *NCSR Demokritos* in Athens. The summer school is
> organised by *NCSR Demokritos*, the *Athens University of Economics
> and Business*, *RC ATHENA*, and *Heriot-Watt University*, in close
> collaboration with *LxMLS* (Lisbon, 19–25 July 2025).
>
> The school focuses on *_machine learning methods for NLP_*, offering:
>
> - Morning lectures on theory
> - Afternoon hands-on lab sessions
> - Evening research talks, poster sessions, and demos
>
> Participants will also have the opportunity to present their work in
> *poster sessions* throughout the week.
>
>
> *_Target Audience:_*
> - Students and researchers in NLP and Computational Linguistics
> - Computer scientists with an interest in NLP and ML
> - Industry professionals seeking a deeper understanding of these fields
>
> No prior experience in NLP or ML is required, just basic math and
> Python.
>
> *_Features of AthNLP:_*
> * Attendance at the Social Event, daily lunch as well as morning and
> afternoon coffee breaks are included in the application fee.
> * Lecturers are leading researchers in machine learning and natural
> language processing.
> * Students will be able to (optionally) show their current work in
> poster sessions during coffee breaks.
> * In the demo day, students will be able to interact with technical
> companies and research institutions working in machine learning.
>
>
> _Fees_
> *300 EUR for students
> 400 EUR for University professors or researchers at public Institutes
> 500 EUR for everyone else*
>
> Any questions should be directed to: athnlp2024(a)athenarc.gr
>
> We are looking forward to your participation!
>
> /The Organising Committee of AthNLP 2025/
>
> --
> Pelagia Drosaki
>
> Communication team
> Institute of Informatics & Telecommunications
> NCSR Demokritos Ag. Paraskevi, Greece
> +30 210 650 3197
> www.iit.demokritos.gr <http://www.iit.demokritos.gr/>
>
Second call for papers: Sixth Workshop on Resources for African
Indigenous Languages (RAIL)
Co-located with DHASA 2025
https://sadilar.org/rail-2025/
RAIL Workshop date: 10 November 2025
DHASA Conference dates: 10-14 November 2025
Venue: CSIR International Convention Centre.
The sixth RAIL workshop website: https://sadilar.org/rail-2025/
DHASA website: https://digitalhumanities.org.za/
The sixth Resources for African Indigenous Languages (RAIL) workshop
will be co-located with the Digital Humanities Association of Southern
Africa (DHASA) 2025 conference at the CSIR International Convention
Centre in Pretoria, South Africa, on 10 November 2025. The RAIL
workshop is an interdisciplinary platform for researchers working on
African indigenous language resources such as natural language
processing (NLP) tools, Human Language Technologies (HLT), data
collections, and annotations. This workshop aims to foster a
scientific community of practice that focuses on computational
linguistic tools and data that are designed for or applied to the
indigenous languages of Africa.
Many African languages are under-resourced while only a few are
considered to be somewhat better resourced. These languages often share
interesting properties such as writing systems, making them different
from most high-resourced languages. From a computational perspective,
these languages lack enough corpora to undertake high level development
of NLP and HLT tools, which in turn impedes the development of African
languages in these areas. During previous workshops, it was noted that
the problems and solutions presented were not only applicable to
African languages but were also relevant to many other low-resource
languages across the world. Because these languages share similar
challenges, this workshop provides researchers with opportunities to
work collaboratively on issues of language resource development and
learn from each other.
The RAIL workshop has several aims. First, the workshop brings together
researchers who work on African indigenous languages, forming a
community of practice for people working on indigenous languages.
Second, the workshop aims to reveal currently unknown or unpublished
existing resources (corpora, NLP tools, and applications), resulting in
a better overview of the current state-of-the-art, and also allows for
discussions on novel, desired resources for future research in this
area. Third, it enhances sharing of knowledge on the development of
low-resource languages. Finally, it enables discussions on how to
improve the quality as well as availability of the resources.
The workshop has “Language resources in the age of large language
models” as its theme, but submissions on any topic related to
properties of African indigenous languages (including related non-
African languages) may be accepted. Suggested topics include (but are
not limited to) the following:
* Digital representations of linguistic structures
* Descriptions of corpora or other data sets of African indigenous
languages
* Building resources for (under-resourced) African indigenous languages
* Developing and using African indigenous languages in the digital age
* Effectiveness of digital technologies for the development of African
indigenous languages
* Revealing unknown or unpublished existing resources for African
indigenous languages
* Developing desired resources for African indigenous languages
* Improving quality, availability and accessibility of African
indigenous language resources
Submission requirements:
We invite papers on original, unpublished work related to the topics of
the workshop. Submissions, presenting completed work, may consist of up
to eight (8) pages of content plus additional pages of references. The
final camera-ready versions of accepted long papers are allowed one
additional page of content (up to 9 pages) so that reviewers’ feedback
can be incorporated. Papers should be formatted according to the DHASA
style sheet which is provided on the Journal of the Digital Humanities
Association of Southern Africa website
(https://upjournals.up.ac.za/index.php/dhasa/about). Reviewing is
double-blind, so make sure to anonymise your submission (e.g., do not
provide author names, affiliations, project names, etc.). Limit the
amount of self-citations (anonymised citations should not be used). The
RAIL workshop follows the DHASA submission requirements.
Please submit papers in PDF format (the submission link will be
available soon). Accepted papers will be published in proceedings
linked to the DHASA conference.
Important dates:
Submission deadline: 14 July 2025
Date of notification: 16 September 2025
Camera ready copy deadline: 24 October 2025
Workshop: 10 November 2025
DHASA conference: 10 November 2025-14 November 2025
Organising Committee
Rooweither Mabuya, South African Centre for Digital Language Resources
(SADiLaR), South Africa
Muzi Matfunjwa, South African Centre for Digital Language Resources
(SADiLaR), South Africa
Mmasibidi Setaka, South African Centre for Digital Language Resources
(SADiLaR), South Africa
Menno van Zaanen, South African Centre for Digital Language Resources
(SADiLaR), South Africa
--
Prof Menno van Zaanen menno.vanzaanen(a)nwu.ac.za
Professor in Digital Humanities
South African Centre for Digital Language Resources
https://www.sadilar.org
🚀 Second Call for Interest: DISRPT 2025 Shared Task on Discourse Relation Parsing and Treebanking.
🛎️ Sample data has been released!
In conjunction with CODI-CRAC & EMNLP 2025 - Suzhou, China, Nov. 5-9.
This year, we are organizing the fourth edition of the DISRPT shared task on discourse processing across formalisms, for a variety of languages and genres, with three subtasks:
* Task 1: Discourse segmentation
* Task 2: Connective identification
* Task 3: Relation classification
We will provide training, development and test datasets from (almost) all available languages in RST / eRST, SDRT, PDTB, ISO 24617, and discourse dependencies, using a uniform format. Because different corpora, languages, and frameworks use different guidelines, the shared task will promote the design of flexible methods for dealing with various guidelines, and will help to push forward the discussion of converging standards for discourse units. For datasets which have treebanks, we will evaluate segmentation in two different scenarios: with and without gold syntax. An automatically parsed version is provided for all corpora without a gold parse.
This year, the shared task will feature:
* The inclusion of more frameworks, with datasets from: RST / eRST, SDRT, PDTB, ISO 24617, and discourse dependencies
* The inclusion of new corpora and new languages, some of them kept a surprise!
* A unified set of labels for the discourse relations, to make evaluation across datasets easier
* A new constraint: only one multilingual model should be submitted per task, and it should be small! This will make our replication work easier, but more importantly, it will simplify using such a model and test the robustness of your solution.
Today, we’re excited to announce the release of the sample data for the DISRPT 2025 Shared Task! You can now access the data, format documentation, and tools on our GitHub 🔗 https://github.com/disrpt/sharedtask2025
The sample covers five discourse frameworks (RST / eRST, PDTB, SDRT, and Discourse Dependencies) across 12 languages: English, Basque, French, Dutch, Italian, Portuguese, Spanish, Farsi, Chinese, Russian, Turkish, and Thai.
We invite researchers and teams interested in participating to register now. Registered participants will be added to our mailing list and receive all future updates.
📅 The full training data will be released on June 16, 2025 — stay tuned!
To join the mailing list and stay informed, please email us at:
📧 disrpt_chairs(a)googlegroups.com
Let us know you're interested — we’d love to have you on board!
**Important dates**
* May 16 2025 – Sample data release [NOW]
* June 16 2025 – Training data release
* July 14 2025 – Test data release
* August 1 2025 – System + paper submissions due
* September 12 2025 – Notification of acceptance
* September 19 2025 – Camera ready papers
* November 8-9 2025 – CODI at EMNLP
All deadlines are 11.59 pm UTC -12h (AoE, "Anywhere on Earth").
**Information:**
Contact the organizers: disrpt_chairs(a)googlegroups.com
Official website: https://sites.google.com/view/disrpt2025/
Google group for participants, please join us on: disrpt2025_participants(a)googlegroups.com
**Organization:**
Peter Bourgonje (Universität Potsdam, Germany)
Chloé Braud (CNRS - IRIT, University of Toulouse, France)
Chuyuan Li (University of British Columbia, Canada)
Janet Yang Liu (LMU Munich, Germany)
Philippe Muller (CNRS - University of Toulouse, France)
Amir Zeldes (Georgetown University, Washington DC, USA)