Dear colleagues,
We cordially invite submissions of proposals for shared tasks, workshops, and tutorials to be held at the SwissText 2025 conference. SwissText will take place on June 17-18, 2025 at ZHAW in Winterthur.
ABOUT SwissText
SwissText is an annual conference that brings together text analytics experts from industry and academia. It is organized by the Swiss Association for Natural Language Processing (SwissNLP) in collaboration with the Zurich University of Applied Sciences (ZHAW).
SPECIAL EDITION
This edition of SwissText will be special, since we celebrate its 10th anniversary! This is a great opportunity to look back. Hence, in addition to novel ideas for shared tasks, we also invite previous organizers of shared tasks to re-submit their ideas: What has changed since the last run of the shared task? Is the task still relevant? Do LLMs solve everything now? This gives us the chance to see what progress has been made over the years.
To give you some ideas, here is a list of previous shared tasks:
* NLP for Sustainable Development Goals Monitoring
* Swissdox Hackathon
* Detecting greenwashing signals through a comparison of ESG reports and public media
* Swiss German Speech to Standard German Text Shared Task
* Low-Resource Speech-to-Text
* The Sentence End and Punctuation Prediction in NLG text (SEPP-NLG)
* German Text Summarization Challenge
* ... and many more (see the SwissText website archive)!
FORMAT FOR PROPOSALS
Proposals for shared tasks should contain:
* a title and a brief description of the topic of the task
* a description of the data sets that will be used in the shared task and their readiness
* a sketch of how the submitted systems will be evaluated
* a tentative timeline
Proposals for workshops should contain:
* a title and a brief description of the topic
* a description of the intended audience
* workshop format (paper presentations, poster session, etc.)
* a tentative timeline
Proposals for tutorials should contain:
* a title and a brief description of the topic and the goal of the tutorial
* an introduction of the tutorial speakers and their background and expertise
* a description of the intended audience and the required level of expertise (beginners, experts, etc.)
* a tentative outline of the tutorial schedule
Note that the organization and running of the shared tasks, workshops, and tutorials is in the hands of the respective organizers. The SwissText organizers will provide infrastructure (rooms, paper submission platform) and assist where they can, of course.
Interested? We are looking forward to your proposals. Submit your proposals by email to info@swisstext.org no later than November 30, 2024. Notifications will be sent out by December 15, 2024.
Kind regards,
Don
________________________________
ZHAW School of Engineering / CAI
Dr. Don Tuggener
Technikumstrasse 71
Postfach
8401 Winterthur
Tel: +41 58 934 78 55
Web: https://www.zhaw.ch/de/ueber-uns/person/tuge/
Call for Applications: Georgetown University MS and Ph.D. programs in
Computational Linguistics
Georgetown University (Washington, DC) invites applications to our MS and
Ph.D. programs for students wishing to study Computational Linguistics
starting in Fall 2025. Georgetown is strategically located in the nation's
capital with proximity to government institutions, a thriving tech
industry, and other major universities. Our programs offer advanced
training and cutting-edge research opportunities in topics such as natural
language processing, syntactic/semantic analysis, corpora, contemporary
machine learning/deep learning methods for language, and applications of
Large Language Models and AI technologies. Current research priorities
include computational psycholinguistics, computational models of discourse,
and linguistics-based language modeling. Students have the opportunity to
learn from faculty across a range of departments and to participate in our
interdisciplinary research community (http://gucl.georgetown.edu/).
Additionally, students who apply to the Ph.D. program in Computational
Linguistics may elect to participate in the university’s Interdisciplinary
Concentration in Cognitive Science.
Admitted Ph.D. applicants are offered full funding. Admissions and funding
decisions are made without regard to domestic vs. international status. Our
programs include:
- MS in Computational Linguistics
  <https://linguistics.georgetown.edu/graduate/master-degree-programs/master-o…>
  (Dept. of Linguistics, Jan. 15 deadline)
- Ph.D. in Computational Linguistics
  <https://linguistics.georgetown.edu/graduate/phd-programs> (Dept. of
  Linguistics, Dec. 1 deadline)
- Optional: Interdisciplinary Concentration in Cognitive Science
  <https://cogsci.georgetown.edu/concentration/>
Requirements: A bachelor’s degree by August 2025 is required to enter
either program. Both programs include core courses in Linguistics, so a
prior degree in Linguistics is not required. Ph.D. applicants are expected
to have some programming experience and a commitment to participating in
computational linguistics research. See recommendations for the Ph.D.
Statement of Purpose
<https://medium.com/@nschneid/inside-ph-d-admissions-what-readers-look-for-i…>.
Applications: https://linguistics.georgetown.edu/programs/apply/
Questions about application requirements and procedures should be directed
to Erin Esch Pereira (eee8@georgetown.edu), Graduate Program Coordinator,
Dept. of Linguistics.
Sincerely,
Nathan Schneider <http://nathan.cl/>, Depts. of Linguistics and Computer
Science; concentration director
Amir Zeldes <https://gucorpling.org/amir/>, Dept. of Linguistics,
Computational Linguistics
Ethan Gotlieb Wilcox <https://wilcoxeg.github.io/>, Dept. of Linguistics,
Computational Linguistics
--
Ethan Gotlieb Wilcox
Assistant Professor, Computational Linguistics
Georgetown University
www.wilcoxeg.github.io
The next meeting of the Edge Hill Corpus Research Group will take place online (via MS Teams) on Friday 15 November 2024, 2-4 pm (GMT).
Topic: Discourse-Oriented Corpus Studies
2-3 pm
Katia Adimora (Edge Hill University)
Mexican immigration/immigrants in American and Mexican newspapers
3-4 pm
Dan Malone (Edge Hill University)
When is the extreme also typical? Using prototypicality to investigate representations of the lone-wolf terrorist
Attendance is free. The abstracts and registration link are here: https://sites.edgehill.ac.uk/crg/next
Registration closes on Wednesday 13 November, 11 am (GMT).
If you have any questions, please contact Costas Gabrielatos (gabrielc@edgehill.ac.uk).
Hello Everyone,
Based on the emails received, we have extended the submission deadline to
November 7th, 2024. Please read below for more information on the workshop
and the updated timeline.
----
Don’t miss this unique opportunity to discuss key issues and contribute to
the advancement of language processing in the South Asian region, home to
25% of the world’s population and rich in linguistic and cultural diversity.
Submit your papers by November 7, 2024, and join us at the first Workshop
on Challenges in Processing South Asian Languages (CHiPSAL), taking place
at COLING 2025 on January 19, 2025.
Please submit your papers via
https://softconf.com/coling2025/CHiPSAL25
----
*CHiPSAL 2025*, the first Workshop on Challenges in Processing South Asian
Languages (CHiPSAL), will be held as part of the 31st International
Conference on Computational Linguistics (COLING 2025) in Abu Dhabi,
UAE, on *January 19, 2025*. The workshop will be conducted in *virtual mode*.
CHiPSAL 2025 invites the submission of original research papers,
review/opinion papers, and system demonstration papers, in short or long
forms, on topics that highlight the challenges related to South Asian
languages, including but not limited to the following areas:
- Encoding and Unicode Issues in South Asian Scripts
- Orthographic Complexities and Their Impact on Language Technology
- Morphological Analysis and Generation in South Asian Languages
- Dialectal Variations and Language Standardisation
- Code-Mixing and Multilingualism in South Asian Contexts
- Building Linguistic Resources for South Asian Languages
- Speech Recognition and Synthesis for South Asian Languages
- Preserving Linguistic Heritage through Technology
- Benchmarking Models for South Asian Languages
*Important Dates*
All deadlines are 11:59PM UTC-12:00 (“Anywhere on Earth”).
First CFP                    Monday, 15 July 2024
Submission deadline          *November 7, 2024* (extended from October 30, 2024)
Notification of acceptance   November 29, 2024
Camera-ready papers          December 13, 2024
Pre-recorded video due       January 5, 2025
Workshop (virtual)           January 19, 2025
For more information: https://sites.google.com/view/chipsal
Opening of the Faetar Low-Resource ASR Challenge 2025
We are pleased to officially announce the opening of the Faetar Low-Resource ASR Challenge 2025. While we were not able to secure a special session dedicated to the challenge at the conference, we strongly encourage submission of papers describing your systems to Interspeech 2025. As such, we plan to adhere to a timeline that will allow us to return test results and announce winners in time for participants to prepare Interspeech papers (see below).
Challenge website: https://perceptimatic.github.io/faetarspeech/
The Faetar Low-Resource ASR Challenge aims to focus researchers’ attention on several issues which are common to many archival collections of speech data:
- noisy field recordings
- lack of standard orthography, leading to noise in the transcriptions in the form of transcriber inconsistencies
- only a few hours of transcribed data
- a larger collection of untranscribed data
- no additional data in the language (textual or speech) that is easily available
- “dirty” transcriptions in documents, which contain matter that needs to be filtered out
By focusing multiple research groups on a single corpus of this kind, we aim to gain deeper insights into these problems than can be achieved otherwise.
The challenge uses the Faetar ASR Benchmark Corpus. Faetar (pronounced [fajdar]) is a variety of the Franco-Provençal language which developed in isolation in Italy, far from other speakers of Franco-Provençal, and in close contact with Italian. Faetar has fewer than 1000 speakers around the world, in Italy and in the diaspora. It is endangered, and preservation, learning, and documentation are a priority for many community members. The benchmark data represents the majority of all archived speech recordings of Faetar in existence, and it is not available from any other source.
We propose four tracks:
- Constrained ASR. Participants should focus on the challenge of improving ASR architectures to work with small, poor-quality sets. Participants may not use any resources to train / fine-tune their models beyond the files contained in the provided train set. No external pre-trained acoustic models or language models are allowed, and the use of the unlabelled portion of the Faetar challenge data set is not allowed either.
Three other “thematic tracks” can be explored, and should not be considered mutually exclusive:
- Using pre-trained acoustic models or language models. Participants focus on the most effective way to make use of models pre-trained on other languages.
- Using unlabelled data. The challenge data also includes ~20 hrs of unlabelled data. Participants focus on finding the most effective way to make use of it.
- Dirty data. The training data was extracted and automatically aligned from long-form audio and partial transcriptions in “cluttered” word processor files, relying on (error-prone) VAD, scraping, and alignment. Participants focus on improving the pipeline for extracting useful training data, with the ultimate goal of improving performance.
Submissions will be evaluated on phone error rate (PER) on the test set. Participants are provided with a dev kit allowing them to calculate the PER on dev and train, as well as reproduce the baselines.
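For readers unfamiliar with the metric, here is a minimal sketch of how a phone error rate is commonly computed: the Levenshtein (edit) distance between reference and hypothesis phone sequences, summed over utterances and divided by the total number of reference phones. This is an illustration only and not the challenge's official scorer. The function names, the unit edit costs, and the corpus-level (rather than per-utterance) normalisation are assumptions, so the dev kit may differ in tokenisation and scoring details.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs assumed)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # aligning i reference phones to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference phone missing in hypothesis
                curr[j - 1] + 1,     # insertion: extra phone in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(refs: Sequence[Sequence[str]],
                     hyps: Sequence[Sequence[str]]) -> float:
    """Corpus-level PER: total edits divided by total reference phones."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of utterances")
    total_phones = sum(len(r) for r in refs)
    if total_phones == 0:
        raise ValueError("reference contains no phones")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_phones


# Hypothetical example: one phone ("j") is missing from the hypothesis.
reference = [["f", "a", "j", "d", "a", "r"]]
hypothesis = [["f", "a", "d", "a", "r"]]
per = phone_error_rate(reference, hypothesis)  # 1 edit over 6 reference phones
```

Because insertions count as errors, a PER computed this way can exceed 1.0 when a hypothesis is much longer than its reference.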
For more information, and to register and obtain the data and the dev kit, please visit the challenge website:
https://perceptimatic.github.io/faetarspeech/
For more information, or for questions, please contact us by writing to faetarasrchallenge@gmail.com.