ACL 2025 Call for Papers
MAIN CONFERENCE
ACL 2025
Website: https://2025.aclweb.org/ [1]
Submission Deadline: February 15, 2025
Conference Dates: July 27 to August 1, 2025
Location: Vienna, Austria
Special Theme: "Generalization of NLP Models"
Contact:
* Roberto Navigli [2] (General Chair)
* Wanxiang Che [3], Joyce Nabende [4], Mohammad Taher Pilehvar [5],
Ekaterina Shutova [6] (Program Chairs):
For questions related to paper submission, email:
editors(a)aclrollingreview.org
For all other questions, email: acl2025pcs(a)gmail.com
OVERVIEW
ACL 2025 invites the submission of long and short papers featuring
substantial, original, and unpublished research in all aspects of
Computational Linguistics and Natural Language Processing. ACL 2025 has
a goal of a diverse technical program--in addition to traditional
research results, papers may contribute negative findings, survey an
area, announce the creation of a new resource, argue a position, report
novel linguistic insights derived using existing computational
techniques, and reproduce, or fail to reproduce, previous results. As in
recent years, some of the presentations at the conference will be of
papers accepted by the Transactions of the ACL (TACL) and by the
Computational Linguistics (CL) journals.
Papers submitted to ACL 2025, but not selected for the main conference,
will also automatically be considered for publication in the Findings of
the Association for Computational Linguistics.
PAPER SUBMISSION INFORMATION
Papers may be submitted to the ARR 2025 February cycle. Papers that have
received reviews and a meta-review from ARR (whether from the ARR 2025
February cycle or an earlier ARR cycle) may be committed to ACL 2025 via
the conference commitment site (TBA).
SUBMISSION TOPICS
ACL 2025 aims to have a broad technical program. Relevant topics for the
conference include, but are not limited to, the following areas (in
alphabetical order):
* Computational Social Science and Cultural Analytics
* Dialogue and Interactive Systems
* Discourse and Pragmatics
* Efficient/Low-Resource Methods for NLP
* Ethics, Bias, and Fairness
* Generation
* Information Extraction
* Information Retrieval and Text Mining
* Interpretability and Analysis of Models for NLP
* Language Modeling
* Linguistic theories, Cognitive Modeling and Psycholinguistics
* Machine Learning for NLP
* Machine Translation
* Multilinguality and Language Diversity
* Multimodality and Language Grounding to Vision, Robotics and Beyond
* NLP Applications
* Phonology, Morphology and Word Segmentation
* Question Answering
* Resources and Evaluation
* Semantics: Lexical and Sentence-Level
* Sentiment Analysis, Stylistic Analysis, and Argument Mining
* Speech recognition, text-to-speech and spoken language understanding
* Summarization
* Syntax: Tagging, Chunking and Parsing
* Special Theme: Generalization of NLP Models
ACL 2025 Theme Track: Generalization of NLP Models
Following the success of the ACL 2020-2024 Theme tracks, we are happy to
announce that ACL 2025 will have a new theme with the goal of reflecting
and stimulating discussion about the current state of development of the
field of NLP.
Generalization is crucial for ensuring that models behave robustly,
reliably, and fairly when making predictions on data different from
their training data. Achieving good generalization is critically
important for models used in real-world applications, as they should
emulate human-like behavior. Humans are known for their ability to
generalize well, and models should aspire to this standard.
The theme track invites empirical and theoretical research and position
and survey papers reflecting on the Generalization of NLP Models. The
possible topics of discussion include (but are not limited to) the
following:
* How can we enhance the generalization of NLP models across various
dimensions--compositional, structural, cross-task, cross-lingual,
cross-domain, and robustness?
* What factors affect the generalization of NLP models?
* What are the most effective methods for evaluating the
generalization capabilities of NLP models?
* While Large Language Models (LLMs) significantly enhance the
generalization of NLP models, what are the key limitations of LLMs in
this regard?
The theme track submissions can be either long or short. We anticipate
having a special session for this theme at the conference and a Thematic
Paper Award in addition to other categories of awards.
TWO-STAGE REVIEW: SUBMISSION TO ARR, COMMITMENT TO ACL 2025
ACL 2025 will use ACL Rolling Review [7] (ARR)
https://aclrollingreview.org/cfp as a reviewing system, but final
decisions will be made by the conference. Both submissions of articles
for review and commitment of reviewed articles to the conference will be
performed via the Open Review [8] platform. Specifically, authors will
follow a two-step process:
* Authors submit articles to ARR, where submissions receive reviews
and meta-reviews from ARR reviewers and action editors;
* Authors commit their reviewed articles to a publication venue (e.g.,
ACL 2025), where Senior Area Chairs and Program Chairs make acceptance
decisions from the ARR reviews and meta-reviews.
ACL 2025 has chosen this approach in coordination with *CL 2024
conferences, which are adopting the same procedure and a coordinated
submission plan to allow maximum flexibility during their submission
periods for the authors. At each cycle, after a paper has been fully
reviewed, authors have the option to commit their paper to a conference
or revise and resubmit for another round of reviews.
The reviewing process will continue to be double-blind. Reviewers will
not see authors, nor will authors see reviewers, and reviews on ARR will
not be made publicly visible. However, authors will be given the option
through ARR to make their anonymized submitted articles publicly
visible.
MANDATORY REVIEWING WORKLOAD
As the pace of research in the field continues to increase, we need to
strengthen the commitment to reviewing for each paper submission. During
the ARR submission process, authors will be required to specify which
co-authors are committing to cover reviewing in this reviewing cycle.
Please see the new ARR policy regarding reviewing workload. As this
is an ARR-wide policy for all *CL conferences, questions or
clarifications should be addressed to ARR directly.
IMPORTANT DATES
Submission deadline (all papers are submitted to ARR): February 15, 2025
ARR reviews & meta-reviews available to authors of the February cycle: April 15, 2025
Commitment deadline for ACL 2025: April 20, 2025
Notification of acceptance: May 15, 2025
Withdrawal deadline: May 30, 2025
Camera-ready papers due: May 30, 2025
Tutorials: July 27, 2025
Conference: July 28 - 30, 2025
Workshops: July 31 - August 1, 2025
Note: All deadlines are 11:59PM UTC-12:00 ("anywhere on Earth").
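Because "anywhere on Earth" is a fixed offset of UTC-12:00, a deadline of 11:59PM AoE falls on the following calendar day in UTC. The short Python sketch below converts one of the dates listed above; the helper name aoe_deadline_to_utc is illustrative only and is not part of any ACL or ARR tooling.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" (AoE) is the fixed offset UTC-12:00.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_to_utc(year: int, month: int, day: int) -> datetime:
    """Return the UTC instant of 11:59 PM AoE on the given calendar date."""
    local = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local.astimezone(timezone.utc)


# Hypothetical usage with the submission deadline listed above.
submission_utc = aoe_deadline_to_utc(2025, 2, 15)
print(submission_utc.isoformat())  # 2025-02-16T11:59:00+00:00
```

Authors in other time zones can pass their own tzinfo to astimezone in place of timezone.utc to see the deadline in local time.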
Paper Submission Details
Both long and short paper submissions should follow all of the ARR
submission requirements [9], including:
* Long Papers [10] (8 pages) and Short Papers [11] (4 pages)
* Instructions for Two-Way Anonymized Review [12]
* Authorship [13]
* Citation and Comparison [14]
* Multiple Submission Policy [15], Resubmission Policy [16], and
Withdrawal Policy [17]
* Ethics Policy [18] including the responsible NLP research checklist
[19]
* Limitations [20]
* Paper Submission and Templates [21]
* Optional Supplementary Materials [22]
Final versions of accepted papers will be given one additional page of
content (up to 9 pages for long papers, up to 5 pages for short papers)
to address reviewers' comments.
Following the ACL and ARR policies
(https://www.aclweb.org/portal/content/report-acl-committee-anonymity-policy)
[23], there is no anonymity period requirement.
At the time of submission to ARR, authors will be asked to select a
preferred venue (e.g., ACL 2025). This is used only to calculate
acceptance rates. Authors who selected ACL 2025 as a preferred venue
when submitting to ARR may choose not to commit to ACL 2025 after
receiving their reviews, and authors who selected a preferred venue
other than ACL 2025 when submitting to ARR are still welcome to commit
to ACL 2025.
Presentation at the Conference
All accepted papers must be presented at the conference to appear in the
proceedings. The conference will include both in-person and virtual
presentation options. Papers without at least one presenting author
registered by the early registration deadline may be subject to desk
rejection. Long and short papers will be presented orally or as posters
as determined by the program committee. While short papers will be
distinguished from long papers in the proceedings, there will be no
distinction in the proceedings between papers presented orally and
papers presented as posters.
Links:
------
[1] https://2025.aclweb.org/
[2] https://www.diag.uniroma1.it/navigli/
[3] http://ir.hit.edu.cn/~car/
[4] https://sites.google.com/view/jnabende/home?authuser=0
[5] https://pilehvar.github.io/
[6] https://www.shutova.org/
[7] https://aclrollingreview.org/cfp
[8] https://openreview.net/
[9] https://aclrollingreview.org/cfp#paper-submission-information
[10] https://aclrollingreview.org/cfp#long-papers
[11] https://aclrollingreview.org/cfp#short-papers
[12] https://aclrollingreview.org/cfp#instructions-for-two-way-anonymized-review
[13] https://aclrollingreview.org/cfp#authorship
[14] https://aclrollingreview.org/cfp#citation-and-comparison
[15] https://aclrollingreview.org/cfp#multiple-submission-policy
[16] https://aclrollingreview.org/cfp#resubmission-policy
[17] https://aclrollingreview.org/cfp#withdrawal-policy
[18] https://aclrollingreview.org/cfp#ethics-policy
[19] https://aclrollingreview.org/responsibleNLPresearch
[20] https://aclrollingreview.org/cfp#limitations
[21] https://aclrollingreview.org/cfp#paper-submission-and-templates
[22] https://aclrollingreview.org/cfp#optional-supplementary-materials-appendice…
[23] https://www.aclweb.org/portal/content/report-acl-committee-anonymity-policy
Second Forlì International Workshop on Corpus-based Interpreting Studies and Applications: At the Interface of Data and Technology
8–10 May 2025, Forlì, Italy, and online
Call for Papers
The Department of Interpreting and Translation at the University of Bologna is organising the Second Forlì International Workshop on Corpus-based Interpreting Studies and Applications on 8–10 May 2025 in Forlì, Italy, and online.
Background
The Forlì International Workshop series was launched in 2015 to stimulate the creation of interpreting corpora and corpus-based research projects. The first workshop gathered more than 100 scholars from around the world, resulting in a volume of state-of-the-art research (Russo et al. 2018<https://link.springer.com/book/10.1007/978-981-10-6199-8>) and a special issue (Bendazzoli et al. 2018<https://www.intralinea.org/specials/cbis>). The tenth anniversary of the first workshop marks an opportune occasion to take stock of recent developments and chart new directions in light of corpora’s fundamental role in technological advancements.
Theme
Interpreting corpora serve as the descriptive foundation of research and the ground truth against which machine interpreting technologies are evaluated. Corpus-based interpreting studies, as envisaged by Shlesinger (1998<https://doi.org/10.7202/004136ar>), have developed into a highly productive line of inquiry with theoretical inputs from cognitive linguistics and sociology and methodological contributions from natural language processing, prosody research, and multimodality. Recently, large interpreting corpora have fuelled the deployment of machine interpreting technologies, together with deep learning algorithms that synthesise signing images and texts (e.g. Saunders et al. 2022<https://doi.org/10.1109/CVPR52688.2022.00508>). Amid shifting conceptual boundaries (Pöchhacker 2024<https://doi.org/10.1075/ts.23028.poc>), methodological developments, and a changing technological landscape, a field-wide reflection on the role of corpora is necessary.
In this context, we view the Second Forlì International Workshop as an opportunity to bring together researchers who create, analyse, and use corpora to study interpreting and develop tools and applications for corpus-based research, computer-assisted interpreting, machine interpreting, automated interpreting quality assessment, pedagogy, and other related domains.
Submissions
We particularly welcome abstracts addressing the following topics:
* Theory testing and comparisons using corpora
* Replication studies using corpora
* Quantitative and qualitative approaches to corpus-based studies
* Triangulating corpora with other data sources (e.g. user response, eye-tracking, EEG, and workplace ethnography)
* Underrepresented corpora, e.g. those involving signed language interpreting and onsite versus remote interpreting
* Corpora of constrained communication (Kotze & van Rooy 2024<https://doi.org/10.1075/coll.60.01kot>), e.g. those by untrained interpreters and learners
* Multimodal and multiple interpreting corpora
* Computer-assisted interpreting software and applications
* Machine interpreting systems
* Automated interpreting quality assessment based on corpora
* Corpus design according to principles of scientific data management (e.g. FAIR principles; Wilkinson et al. 2016<https://doi.org/10.1038/sdata.2016.18>)
The first page of the submission should contain each author’s name, affiliation (if any), and e-mail address. The abstract should be placed from the second page onwards, including the title, a text between 300 and 400 words (including examples, excluding references) that clearly states the research questions, methods, data, and (preliminary) results, and up to five keywords. The abstract should not contain any identifying information to allow for double-blind peer reviewing.
There will be three types of presentations:
* Full paper (20 minutes + 10 minutes Q & A)
* Poster
* Software demonstration
Abstracts should be submitted to unic(a)dipintra.it<mailto:unic@dipintra.it>. Authors may indicate their preferred presentation type (full paper, poster, or software demonstration) upon submission. Posters will be displayed on Day 1 of the Workshop, and a time slot will be reserved in the programme for participants to discuss them with the presenters. Software demonstrations will be held on Day 2 in a dedicated time slot.
A selection of papers will be published in an edited volume and a special issue in a scientific journal.
Language of the conference
English
Important dates
* Submission deadline: 1 February 2025
* Notification of acceptance and rejection: 1 March 2025
* Pre-workshop session on the Unified Interpreting Corpus (UNIC; https://unic.dipintra.it/) platform: 15:00–17:30, 8 May 2025
* Workshop: 9–10 May 2025
Pre-workshop session convenors
* Nannan Liu
* Mariachiara Russo
Scientific committee members
Claudia Angelelli (Heriot-Watt University)
Alberto Barrón-Cedeño (University of Bologna)
Claudio Bendazzoli (University of Verona)
Silvia Bernardini (University of Bologna)
Sabine Braun (University of Surrey)
Agnieszka Chmiel (Adam Mickiewicz University)
Elena Davitti (University of Surrey)
Bart Defrancq (Ghent University)
Adriano Ferraresi (University of Bologna)
Min-hua Liu (Hong Kong Baptist University)
Nannan Liu (University of Bologna)
Bernd Meyer (Johannes Gutenberg University of Mainz)
Maja Miličević Petrović (University of Bologna)
Koen Plevoets (Ghent University)
Bianca Prandi (University of Bologna)
Mariachiara Russo (University of Bologna)
Annalisa Sandrelli (University of the International Studies of Rome)
Elisabet Tiselius (Stockholm University)
Ira Torresi (University of Bologna)
Kim Wallmach (Stellenbosch University)
Organising committee members
Alberto Barrón-Cedeño (University of Bologna)
Michela Bertozzi (University of Bologna)
Silvia Bernardini (University of Bologna)
Francesca D’Angelo (University of Bologna)
Bart Defrancq (Ghent University)
Adriano Ferraresi (University of Bologna)
Serena Ghiselli (University of Bologna)
Nannan Liu (University of Bologna)
Marco Lobascio (University of Bologna)
Natacha Niemants (University of Bologna)
Maja Miličević Petrović (University of Bologna)
Bianca Prandi (University of Bologna)
Mariachiara Russo (University of Bologna)
Nicoletta Spinolo (University of Bologna)
Han Wang (University of Bologna)
Contact
unic(a)dipintra.it<mailto:unic@dipintra.it>
Workshop website
Website coming soon.
Dr Nannan Liu
Marie Curie Fellow
Project FAITH<https://cordis.europa.eu/project/id/101108651>
Department of Interpreting and Translation
University of Bologna
Dear colleagues,
We cordially invite submissions of proposals for shared tasks, workshops, and tutorials to be held at the SwissText 2025 conference. SwissText will take place on June 17-18, 2025 at ZHAW in Winterthur.
ABOUT SwissText
SwissText is an annual conference that brings together text analytics experts from industry and academia. It is organized by the Swiss Association for Natural Language Processing (SwissNLP) in collaboration with the Zurich University of Applied Sciences (ZHAW).
SPECIAL EDITION
This edition of SwissText will be special, since we are celebrating its 10th anniversary! This is a great opportunity to look back. Hence, in addition to novel ideas for shared tasks, we also invite previous organizers of shared tasks to re-submit their ideas: What has changed since the last run of the shared task? Is the task still relevant? Do LLMs solve everything now? This gives us a chance to see what progress has been made over the years.
To give you some ideas, here is a list of previous shared tasks:
* NLP for Sustainable Development Goals Monitoring
* Swissdox Hackathon
* Detecting greenwashing signals through a comparison of ESG reports and public media
* Swiss German Speech to Standard German Text Shared Task
* Low-Resource Speech-to-Text
* The Sentence End and Punctuation Prediction in NLG text (SEPP-NLG)
* German Text Summarization Challenge
* ... and many more (see the SwissText website archive)!
FORMAT FOR PROPOSALS
Proposals for shared tasks should contain:
* a title and a brief description of the topic of the task
* a description of the data sets that will be used in the shared task and their readiness
* a sketch of how the submitted systems will be evaluated
* a tentative timeline
Proposals for workshops should contain:
* a title and a brief description of the topic
* a description of the intended audience
* workshop format (paper presentations, poster session, etc.)
* a tentative timeline
Proposals for tutorials should contain:
* a title and a brief description of the topic and the goal of the tutorial
* an introduction of the tutorial speakers and their background and expertise
* a description of the intended audience and the required level of expertise (beginners, experts, etc.)
* a tentative outline of the tutorial schedule
Note that the organization and running of the shared tasks, workshops, and tutorials are in the hands of the respective organizers. The SwissText organizers will provide infrastructure (rooms, paper submission platform) and assist where they can.
Interested? We are looking forward to your proposals. Submit your proposals by email to info(a)swisstext.org<mailto:info@swisstext.org> no later than November 30, 2024. Notifications will be sent out by December 15, 2024.
Kind regards,
Don
________________________________
ZHAW School of Engineering / CAI
Dr. Don Tuggener
Technikumstrasse 71
Postfach
8401 Winterthur
Tel: +41 58 934 78 55
Web: https://www.zhaw.ch/de/ueber-uns/person/tuge/
Call for Applications: Georgetown University MS and Ph.D. programs in
Computational Linguistics
Georgetown University (Washington, DC) invites applications to our MS and
Ph.D. programs for students wishing to study Computational Linguistics
starting in Fall 2025. Georgetown is strategically located in the nation's
capital with proximity to government institutions, a thriving tech
industry, and other major universities. Our programs offer advanced
training and cutting-edge research opportunities in topics such as natural
language processing, syntactic/semantic analysis, corpora, contemporary
machine learning/deep learning methods for language, and applications of
Large Language Models and AI technologies. Current research priorities
include computational psycholinguistics, computational models of discourse,
and linguistics-based language modeling. Students have the opportunity to
learn from faculty across a range of departments and to participate in our
interdisciplinary research community (http://gucl.georgetown.edu/).
Additionally, students who apply to the Ph.D. program in Computational
Linguistics may elect to participate in the university’s Interdisciplinary
Concentration in Cognitive Science.
Admitted Ph.D. applicants are offered full funding. Admissions and funding
decisions are made without regard to domestic vs. international status. Our
programs include:
- MS in Computational Linguistics
  <https://linguistics.georgetown.edu/graduate/master-degree-programs/master-o…>
  (Dept. of Linguistics, Jan. 15 deadline)
- Ph.D. in Computational Linguistics
  <https://linguistics.georgetown.edu/graduate/phd-programs> (Dept. of
  Linguistics, Dec. 1 deadline)
- Optional: Interdisciplinary Concentration in Cognitive Science
  <https://cogsci.georgetown.edu/concentration/>
Requirements: A bachelor’s degree by August 2025 is required to enter
either program. Both programs include core courses in Linguistics, so a
prior degree in Linguistics is not required. Ph.D. applicants are expected
to have some programming experience and a commitment to participating in
computational linguistics research. See recommendations for the Ph.D.
Statement of Purpose
<https://medium.com/@nschneid/inside-ph-d-admissions-what-readers-look-for-i…>.
Applications: https://linguistics.georgetown.edu/programs/apply/
Questions about application requirements and procedures should be directed
to Erin Esch Pereira (eee8(a)georgetown.edu), Graduate Program Coordinator,
Dept. of Linguistics.
Sincerely,
Nathan Schneider <http://nathan.cl/>, Depts. of Linguistics and Computer
Science; concentration director
Amir Zeldes <https://gucorpling.org/amir/>, Dept. of Linguistics,
Computational Linguistics
Ethan Gotlieb Wilcox <https://wilcoxeg.github.io/>, Dept. of Linguistics,
Computational Linguistics
--
Ethan Gotlieb Wilcox
Assistant Professor, Computational Linguistics
Georgetown University
www.wilcoxeg.github.io
The next meeting of the Edge Hill Corpus Research Group will take place online (via MS Teams) on Friday 15 November 2024, 2-4 pm (GMT).
Topic: Discourse-Oriented Corpus Studies
2-3 pm
Katia Adimora (Edge Hill University)
Mexican immigration/immigrants in American and Mexican newspapers
3-4 pm
Dan Malone (Edge Hill University)
When is the extreme also typical? Using prototypicality to investigate representations of the lone-wolf terrorist
Attendance is free. The abstracts and registration link are here: https://sites.edgehill.ac.uk/crg/next
Registration closes on Wednesday 13 November, 11 am (GMT).
If you have any questions, please contact Costas Gabrielatos (gabrielc(a)edgehill.ac.uk<mailto:gabrielc@edgehill.ac.uk>).
Hello Everyone,
Based on the emails received, we have extended the submission deadline to
November 7, 2024. Please read below for more information on the workshop
and the updated timeline.
----
Don’t miss this unique opportunity to discuss key issues and contribute to
the advancement of language processing in the South Asian region, home to
25% of the world’s population and rich in linguistic and cultural diversity.
Submit your papers by November 7, 2024, and join us at the first Workshop
on Challenges in Processing South Asian Languages (CHiPSAL), taking place
at COLING 2025 on January 19, 2025.
Please submit your papers via
https://softconf.com/coling2025/CHiPSAL25
----
*CHiPSAL 2025*, the First Workshop on Challenges in Processing South Asian
Languages (CHiPSAL), will be held as part of the 31st International
Conference on Computational Linguistics (COLING 2025) in Abu Dhabi, UAE, on
*January 19, 2025*. The workshop will be conducted in *virtual mode*.
CHiPSAL 2025 invites the submission of original research papers,
review/opinion papers, and system demonstration papers, in short or long
forms, on topics that highlight the challenges related to South Asian
languages, including but not limited to the following areas:
- Encoding and Unicode Issues in South Asian Scripts
- Orthographic Complexities and Their Impact on Language Technology
- Morphological Analysis and Generation in South Asian Languages
- Dialectal Variations and Language Standardisation
- Code-Mixing and Multilingualism in South Asian Contexts
- Building Linguistic Resources for South Asian Languages
- Speech Recognition and Synthesis for South Asian Languages
- Preserving Linguistic Heritage through Technology
- Benchmarking Models for South Asian Languages
*Important Dates*
All deadlines are 11:59PM UTC-12:00 (“Anywhere on Earth”).
The First CFP: Monday, 15 July 2024
Submission Deadline: November 7, 2024 (extended from October 30, 2024)
Notification of acceptance: November 29, 2024
Camera-ready papers: December 13, 2024
Pre-recorded video due: January 5, 2025
Workshop (Virtual): January 19, 2025
*For more information: https://sites.google.com/view/chipsal*
Opening of the Faetar Low-Resource ASR Challenge 2025
We are pleased to officially announce the opening of the Faetar Low-Resource ASR Challenge 2025. While we were not able to secure a special session dedicated to the challenge at the conference, we strongly encourage submission of papers describing your systems to Interspeech 2025. As such, we plan to adhere to a timeline that will allow us to return test results and announce winners in time for participants to prepare Interspeech papers (see below).
Challenge website: https://perceptimatic.github.io/faetarspeech/
The Faetar Low-Resource ASR Challenge aims to focus researchers’ attention on several issues which are common to many archival collections of speech data:
- noisy field recordings
- lack of standard orthography, leading to noise in the transcriptions in the form of transcriber inconsistencies
- only a few hours of transcribed data
- a larger collection of untranscribed data
- no additional data in the language (textual or speech) that is easily available
- “dirty” transcriptions in documents, which contain matter that needs to be filtered out
By focusing multiple research groups on a single corpus of this kind, we aim to gain deeper insights into these problems than can be achieved otherwise.
The challenge uses the Faetar ASR Benchmark Corpus. Faetar (pronounced [fajdar]) is a variety of the Franco-Provençal language which developed in isolation in Italy, far from other speakers of Franco-Provençal, and in close contact with Italian. Faetar has fewer than 1000 speakers around the world, in Italy and in the diaspora. It is endangered, and preservation, learning, and documentation are a priority for many community members. The benchmark data represents the majority of all archived speech recordings of Faetar in existence, and it is not available from any other source.
We propose four tracks:
- Constrained ASR. Participants should focus on the challenge of improving ASR architectures to work with small, poor-quality sets. Participants may not use any resources to train / fine-tune their models beyond the files contained in the provided train set. No external pre-trained acoustic models or language models are allowed, and the use of the unlabelled portion of the Faetar challenge data set is not allowed either.
Three other “thematic tracks” can be explored, and should not be considered mutually exclusive:
- Using pre-trained acoustic models or language models. Participants focus on the most effective way to make use of models pre-trained on other languages.
- Using unlabelled data. The challenge data also includes ~20 hrs of unlabelled data. Participants focus on finding the most effective way to make use of it.
- Dirty data. The training data was extracted and automatically aligned from long-form audio and partial transcriptions in “cluttered” word processor files, relying on (error-prone) VAD, scraping, and alignment. Participants focus on improving the pipeline for extracting useful training data, with the ultimate goal of improving performance.
Submissions will be evaluated on phone error rate (PER) on the test set. Participants are provided with a dev kit allowing them to calculate the PER on dev and train, as well as reproduce the baselines.
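For readers unfamiliar with the metric, phone error rate is commonly defined as the total number of phone substitutions, deletions, and insertions needed to turn each hypothesis into its reference, divided by the total number of reference phones. The Python sketch below follows that common definition. It is not taken from the challenge dev kit, which remains the authoritative implementation; the function names are illustrative, and treating each character of a word as one phone is a simplification made only for the example.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference phones yields the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phone_error_rate(refs: Sequence[Sequence[str]],
                     hyps: Sequence[Sequence[str]]) -> float:
    """Corpus-level PER: total edit errors divided by total reference phones."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of utterances")
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("reference contains no phones")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / total


# Illustrative only: one phone per character; the hypothesis drops one phone.
refs = [list("fajdar")]
hyps = [list("fadar")]
print(phone_error_rate(refs, hyps))  # one deletion over six reference phones
```

Scores for comparison with the baselines should be computed with the provided dev kit, since details such as phone inventory and normalisation may differ from this sketch.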
For more information, and to register and obtain the data and the dev kit, please visit the challenge website:
https://perceptimatic.github.io/faetarspeech/
For more information, or for questions, please contact us by writing to faetarasrchallenge(a)gmail.com.
CALL FOR PAPERS: THE 1ST WORKSHOP ON NLP FOR LANGUAGES USING ARABIC
SCRIPT (ABJADNLP 2025)
Co-located with COLING 2025 Conference, Abu Dhabi, UAE (19-20 January
2025)
https://wp.lancs.ac.uk/abjad/
AbjadNLP is dedicated to advancing innovation and gaining deeper
insights into Natural Language Processing (NLP) for languages that use
the Arabic script. Our primary focus is on Abjad and Ajami languages
that utilise the Arabic script or its variations. Traditionally
associated with Semitic languages, Abjad scripts represent consonants in
every syllable. In contrast, Ajami scripts denote the alphabetic use of
the Arabic script in various African contexts, representing non-Arabic
languages. We are interested in research on languages that fall under
the Abjad or Ajami categories that use the Arabic script or any
variations of it.
We invite contributions, discussions, and explorations that delve deep
into the unique linguistic structures, resources, challenges, and
untapped potential presented by Abjad and Ajami languages within the
realm of NLP and language resources. Our goal is to create synergies
among researchers by addressing the diverse phenomena and challenges
inherent in these rich linguistic traditions.
The workshop is proud to highlight our connections with the Masakhane
NLP community and collaborations with institutions worldwide, such as
COMSATS on Urdu, and the long-standing UCREL NLP Group at Lancaster
University, whose work encompasses over 20 languages worldwide,
including Abjad and Ajami languages.
Note: We chose the name Abjad for simplicity, but our focus includes
Abjad and other languages that have adopted the Arabic and Perso-Arabic
scripts, as well as Ajami languages. We acknowledge that Sorani Kurdish,
when written in Arabic script, follows an alphabet style rather than an
Abjad style.
TOPICS OF INTEREST:
* Core Technologies: morphological analysis, disambiguation,
tokenisation, POS tagging, named entity detection, chunking, parsing,
semantic role labelling, sentiment analysis, language modelling, etc.
* Applications: machine translation, speech recognition, speech
synthesis, optical character recognition, assistive technologies, social
media, etc.
* Resources and Tools: dictionaries, annotated data, corpora,
orthography descriptions, font technology, glyph rendering, text input
methodologies, spell-checking, speech-to-text solutions, BLARK
descriptions, open access corpora.
* Cultural and Sociolinguistic Considerations: text processing,
transliteration challenges and solutions, cultural contexts in NLP
applications.
SUBMISSION GUIDELINES:
We follow the COLING 2025 standards for submission format and
guidelines. Submissions should conform to the following types:
* Long papers: Up to eight (8) pages, presenting substantial,
original, completed, and unpublished work.
* Short papers: Up to four (4) pages, describing a small focused
contribution, negative results, system demonstrations, etc.
KEY DATES:
* 1st Call for Papers Announcement: 16 July 2024
* 2nd Call for Papers Announcement: 16 August 2024
* Paper Submission Deadline: 2 December 2024
* Workshop Date: 19 or 20 January 2025
ORGANISING COMMITTEE:
General Chair: Mo El-Haj, Lancaster University
Programme Chairs:
* Hugh Paterson III, Collaborative Scholar
* Saad Ezzini, Lancaster University
* Ignatius Ezeani, Lancaster University
Review Committee:
* Mahum Hayat Khan, University of La Rioja
* Muhammad Sharjeel, COMSATS University Islamabad
Publication Chair: Sina Ahmadi, University of Zurich
Publicity Chairs:
* Cynthia Amol, Maseno University
* Amal Haddad Haddad, University of Granada
* Jaleh Delfani, University of Surrey
Advisory Committee:
* Ruslan Mitkov, Lancaster University
* Paul Rayson, Lancaster University
Links:
------
[1] https://softconf.com/coling2025/AbjadNLP25/
UMRs in Boston Summer School – 2nd Call for Applications
June 9-13, 2025
Brandeis University, Massachusetts, USA
URL: https://umr4nlp.github.io/web/SummerSchool2025.html
We invite applications for a five-day summer school on Uniform Meaning Representations (UMR).
Impressive progress has been made in many aspects of natural language processing (NLP) in recent years. Most notably, the achievements of transformer-based large language models such as ChatGPT would seem to obviate the need for any type of semantic representation beyond what can be encoded as contextualized word embeddings of surface text. Advances have been particularly notable in areas where large training data sets exist, and it is advantageous to build an end-to-end training architecture without resorting to intermediate representations. For any truly interactive NLP applications, however, a more complete understanding of the information conveyed by each sentence is needed to advance the state of the art. Here, "understanding" entails the use of some form of meaning representation. NLP techniques that can accurately capture the required elements of the meaning of each utterance in a formal representation are critical to making progress in these areas and have long been a central goal of the field. As with end-to-end NLP applications, the dominant approach for deriving meaning representations from raw textual data is through the use of machine learning and appropriate training data. This allows the development of systems that can assign appropriate meaning representations to previously unseen text.
In this five-day course, instructors from the University of Colorado and Brandeis University will describe the framework of Uniform Meaning Representations (UMRs), a recent cross-lingual, multi-sentence incarnation of Abstract Meaning Representations (AMRs), that addresses these issues and comprises such a transformative representation. Incorporating Named Entity tagging, discourse relations, intra-sentential coreference, negation and modality, and the popular PropBank-style predicate argument structures with semantic role labels into a single directed acyclic graph structure, UMR builds on AMR and keeps the essential characteristics of AMR while making it cross-lingual and extending it to be a document-level representation. It also adds aspect, multi-sentence coreference and temporal relations, and scope. Each day will include lectures and hands-on practice.
Topics to be covered may include the following, among others:
1. The basic structural representation of UMR and its application to multiple languages;
2. How UMR encodes different types of MWE (multi-word expressions), discourse and temporal relations, and TAM (tense-aspect-modality) information in multiple languages, and differences between AMR and UMR;
3. Going from IGT (interlinear glossed text) to UMR graphs semi-automatically;
4. Formal semantic interpretation of UMR incorporating a continuation-based semantics for scope phenomena involving modality, negation, and quantification;
5. Extension to UMR for encoding gesture in multimodal dialogue, Gesture AMR (GAMR), which aligns with speech-based UMR to account for situated grounding in dialogue;
6. UMR parsing and applications.
To apply, please complete this form by Nov. 15, 2024.
https://www.colorado.edu/linguistics/umrs-boston-summer-school-application
Other important dates:
● Notification of acceptance: Dec. 15, 2024
● Confirmation of participation: Jan. 31, 2025
Participation will be fully funded (reasonable airfare, lodging, and meals). This summer school has been made possible by funding from NSF Collaborative Research: Building a Broad Infrastructure for Uniform Meaning Representations (Award # 2213805), with additional support from Brandeis University.
**** We apologize for the multiple copies of this email. If you are
already registered for the next webinar, you do not need to register
again. ****
------------------------------------------------------------------------
Dear colleague,
We are happy to announce the next webinar in the Language Technology
webinar series organized by the HiTZ Chair of AI (https://hitz.eus).
You can check the videos of previous webinars and the schedule for
upcoming webinars here: http://www.hitz.eus/webinars
Next webinar:
*Speaker:* Elena Sokolova (Amazon Text-to-Speech Group)
*Title:* How we do research in Speech at Amazon
*Date:* Thursday, November 7, 2024 - 15:00 CET
*Summary:* In this talk we will present how Speech technology has
developed in the past 20 years. We will take a deep dive into the
research that we do at Amazon in our Text-to-Speech lab, describe the
challenges that we face and how we solve them at scale. We will also
give an overview of the internship opportunities we have in our
department for those of you who want to join our team in 2025.
*Bio:* Elena is a Machine Learning team manager at Amazon, where she
leads novel research in the field of speech technology. Over the past
five years, she has overseen the deployment of machine learning projects
into production and collaborated with her team to publish cutting-edge
research on text-to-speech technology. Before joining Amazon, Elena
completed her PhD at Radboud University Nijmegen in the Netherlands and
gained industry experience as a Senior Machine Learning Scientist at
Booking.com.
*Upcoming webinars:*
· Javier de la Rosa (December 12, 2024)
· Ekaterina Shutova (January 30, 2025)
· Sebastian Ruder (February 6, 2025)
If you are interested in participating, please complete this
registration form: http://www.hitz.eus/webinar_izenematea
If you cannot attend this seminar, but you want to be informed of the
following HiTZ webinars, please complete this registration form instead:
http://www.hitz.eus/webinar_info
Best wishes,
HiTZ Zentroa
P.S.: HiTZ will not grant any type of certificate for attendance at these
webinars.