*CFP: The 3rd Annual Meeting of the ELRA-ISCA Special Interest Group on
Under-resourced Languages (SIGUL2024)*
* Workshop website: https://sigul-2024.ilc.cnr.it/
* When: Monday and Tuesday, May 20th-21st, 2024
* Where: Torino, Italy (co-located with LREC-COLING 2024)
* Deadline for submissions: February 26th, 2024
* Paper submission link: https://softconf.com/lrec-coling2024/sigul2024/
* Deadline for camera-ready papers: April 5th, 2024
The 3rd Annual Meeting of the ELRA <http://www.elra.info/> / ISCA
<https://www.isca-speech.org/iscaweb/index.php> Special Interest Group on
Under-Resourced Languages <http://www.elra.info/en/sig/sigul/> (SIGUL2024)
will provide a forum for
the presentation and discussion of cutting-edge research in language
processing for under-resourced languages by academic and industry
researchers.
SIGUL2024 is held over two days to allow for extended discussions and
interaction.
Far from being just a smaller version of a conference, SIGUL2024 aims to
foster an exchange of knowledge and a comparison of needs and
perspectives between research and practice in the field.
We invite contributions (regular long papers of 8 pages or short papers
of 4 pages) on any of the following (non-exhaustive) topics:
* Processing any under-resourced languages (covering less-resourced,
  under-resourced, endangered, minority, and minoritized languages)
* Cognitive and linguistic studies of under-resourced languages
* Fast resources acquisition: text and speech corpora, parallel texts,
  dictionaries, grammars, and language models
* Zero and few-shot methodologies and self-supervised learning in
  language and speech technologies
* Cross-lingual and multilingual acoustic and lexical modeling
* Speech recognition and synthesis for under-resourced languages and
  dialects
* Machine translation and speech-to-speech translation
* Spoken dialogue systems
* Applications of language technologies for under-resourced languages
* Large language models and under-resourced languages
* Special topic: text and speech resources and technologies for the
  languages of Italy
Special Session on languages of Italy and language technologies
Italy is known for its linguistic diversity that reflects its long and
varied history. To celebrate it, SIGUL2024 will provide a special
session or forum for researchers interested in developing language
resources and technologies for the many languages of Italy (regional,
minority, or heritage languages, including those of the neighboring
countries).
Submissions
Authors can choose among three paper categories:
* Regular long papers – up to eight (8) pages, presenting
  substantial, original, completed, and unpublished work.
* Short papers – up to four (4) pages, describing work-in-progress
  projects in the early stage of development, new resources, negative
  results, system demonstrations, and early-career/student work.
* Position papers – up to eight (8) pages, for reflective
  considerations of methodological, best practice, and institutional
  issues (e.g., ethics, data ownership, speakers’ community
  involvement, de-colonizing approaches).
The above page limits exclude any number of additional pages that may be
needed for references.
Accepted papers may be presented orally or as posters; the proceedings
make no distinction between the two.
Submission is NOT anonymous and the official LREC-COLING 2024 format
must be adopted. Each paper will be reviewed by three independent reviewers.
Invited speakers
Eddie Avila, GlobalVoices
Jean Maillard, FAIR, META
Important Dates
• 26 February 2024: submission due
• 18 March 2024: reviews due
• 22 March 2024: notifications to authors
• 5 April 2024: camera-ready (PDF) due
Diversity & Inclusion Subsidies
SIGUL2024 is providing funds for registration and travel or for
bandwidth/VPN. We encourage citizens of developing countries and members
of marginalised communities to apply for subsidies. Details on the
application procedure will be available on the workshop website. For
inquiries, please contact claudia.soria[AT]ilc.cnr.it.
Workshop Organizers
Maite Melero, Sakriani Sakti, Claudia Soria
Program Committee
* Mohammad A. M. Abushariah (The University of Jordan, Jordan)
* Manex Aguirrezabal (University of Copenhagen – Center for
  Sprogteknologi | Center for Language Technology, Denmark)
* Shyam S. Agrawal (KIIT, Gurugram, India)
* Begoña Altuna (HiTZ Center - Ixa, Euskal Herriko Unibertsitatea |
  University of the Basque Country, Spain)
* Antti Arppe (University of Alberta, Canada)
* Martin Benjamin (Kamusi Project International)
* Delphine Bernhard (Université de Strasbourg, LiLPa, France)
* Steven Bird (Charles Darwin University, Australia)
* Claudia Borg (University of Malta)
* Matt Coler (University of Groningen, Campus Fryslân, The Netherlands)
* Dan Cristea (Romanian Academy, Romania)
* Pradip Kumar Das (IIT Guwahati, India)
* A. Seza Doğruöz (Universiteit Gent, België | Ghent University, Belgium)
* Stefano Ghazzali (Language Technologies Unit Bangor University
  Prifysgol Bangor | Bangor University, Bangor, Gwynedd)
* Itziar Gonzalez-Dios (HiTZ Basque Center for Language Technologies -
  Ixa, University of the Basque Country UPV/EHU)
* Lars Hellan (Norwegian University of Science and Technology, Norway)
* Mélanie Jouitteau (IKER, CNRS, France)
* Ritesh Kumar (UnReaL-TecE LLP, India)
* Richard Littauer
* Teresa Lynn (Mohamed bin Zayed University of Artificial
  Intelligence, United Arab Emirates)
* Nina Markl (University of Essex, UK)
* Maite Melero (Barcelona Supercomputing Center, Espanya | Spain)
* Peter Mihajlik (Budapest University of Technology and Economics,
  Hungary)
* Win Pa Pa (UCS Yangon, Myanmar)
* Sandy Ritchie (Google Research)
* Sakriani Sakti (JAIST, Japan)
* Nay San (Stanford University, USA)
* Claudia Soria (CNR-ILC, Italia | Italy)
* Daan Van Esch (Google Research)
* Menno van Zaanen (South African Centre for Digital Language
  Resources, South Africa)
* Jenifer Vega Rodriguez (GIPSA-lab, Université Grenoble Alpes, France)
* Marcely Zanon Boito (NAVER Labs Europe, France)
Identify, Describe and Share your LRs!
When submitting a paper from the START page, authors will be asked to
provide essential information about resources (in a broad sense, i.e.
also technologies, standards, evaluation kits, etc.) that have been used
for the work described in the paper or are a new result of your
research. Moreover, ELRA encourages all LREC-COLING authors to share the
described LRs (data, tools, services, etc.) to enable their reuse and
replicability of experiments (including evaluation ones).
Contact
claudia.soria[AT]ilc.cnr.it
Please write “SIGUL2024” in the subject of your e-mail.
--
facebook <https://www.facebook.com/CNRsocialFB> twitter
<https://twitter.com/CNRsocial_> instagram
<https://www.instagram.com/cnrsocial/> linkedin
<https://www.linkedin.com/company/283032>
Claudia Soria
CNR, ISTITUTO DI LINGUISTICA COMPUTAZIONALE "ANTONIO ZAMPOLLI"
claudia.soria(a)ilc.cnr.it
Tel. 0503153166
Via Giuseppe Moruzzi, 1, 56124 – Pisa
www.ilc.cnr.it
*www.cnr.it* <http://www.cnr.it/>
Donate your 5×1000 to CNR
CF 80054330586
The fifth workshop on Resources for African Indigenous Languages (RAIL)
Co-located with LREC-COLING 2024
https://bit.ly/rail2024
New: deadline and article submission type
Conference dates: 20-25 May 2024
Workshop date: 25 May 2024
Venue: Lingotto Conference Centre, Torino (Italy)
The fifth RAIL workshop website: https://bit.ly/rail2024
LREC-COLING 2024 website: https://lrec-coling-2024.org/
Submission website: https://softconf.com/lrec-coling2024/rail2024/
The fifth Resources for African Indigenous Languages (RAIL) workshop will be co-located with LREC-COLING 2024 at the Lingotto Conference Centre, Torino, Italy, on 25 May 2024. The RAIL workshop is an interdisciplinary platform for researchers working on resources (data collections, tools, etc.) specifically targeted towards African indigenous languages. In particular, it aims to create the conditions for the emergence of a scientific community of practice that focuses on data, as well as computational linguistic tools specifically designed for or applied to indigenous languages found in Africa.
Many African languages are under-resourced, while only a few of them are somewhat better resourced. These languages often share interesting properties, such as writing systems or tone, that make them different from most high-resourced languages. From a computational perspective, these languages lack enough corpora to undertake high-level development of Human Language Technologies (HLT) and Natural Language Processing (NLP) tools, which in turn impedes the development of African languages in these areas. During previous workshops, it has become clear that the problems and solutions presented are not only applicable to African languages but are also relevant to many other low-resource languages. Because these languages share similar challenges, this workshop provides researchers with opportunities to work collaboratively on issues of language resource development and learn from each other.
The RAIL workshop has several aims. First, the workshop brings together researchers who work on African indigenous languages, forming a community of practice for people working on indigenous languages. Second, the workshop aims to reveal currently unknown or unpublished existing resources (corpora, NLP tools, and applications), resulting in a better overview of the current state-of-the-art, and also allows for discussions on novel, desired resources for future research in this area. Third, it enhances sharing of knowledge on the development of low-resource languages. Finally, it enables discussions on how to improve the quality as well as availability of the resources.
The workshop has “Creating resources for less-resourced languages” as its theme, but submissions on any topic related to properties of African indigenous languages (including non-African languages) may be accepted. Suggested topics include (but are not limited to) the following:
* Digital representations of linguistic structures
* Descriptions of corpora or other data sets of African indigenous languages
* Building resources for (under-resourced) African indigenous languages
* Developing and using African indigenous languages in the digital age
* Effectiveness of digital technologies for the development of African indigenous languages
* Revealing unknown or unpublished existing resources for African indigenous languages
* Developing desired resources for African indigenous languages
* Improving quality, availability and accessibility of African indigenous language resources
Submission requirements:
We invite papers on original, unpublished work related to the topics of the workshop. Submissions presenting completed work may consist of up to eight (8) pages of content for a long submission and up to four (4) pages of content for a short submission, plus additional pages of references. The final camera-ready versions of accepted long papers are allowed one additional page of content (up to 9 pages) so that reviewers’ feedback can be incorporated. Papers should be formatted according to the LREC-COLING style sheet (https://lrec-coling-2024.org/authors-kit/), which is provided on the LREC-COLING 2024 website (https://lrec-coling-2024.org/). Reviewing is double-blind, so make sure to anonymise your submission (e.g., do not provide author names, affiliations, project names, etc.). Limit the number of self-citations (anonymised citations should not be used). The RAIL workshop follows the LREC-COLING submission requirements.
Please submit papers in PDF format to the START account (https://softconf.com/lrec-coling2024/rail2024/). Accepted papers will be published in proceedings linked to the LREC-COLING conference.
Important dates:
Submission deadline: 23 February 2024
Date of notification: 15 March 2024
Camera ready deadline: 29 March 2024
RAIL workshop: 25 May 2024
Organising Committee
Rooweither Mabuya, South African Centre for Digital Language Resources (SADiLaR), South Africa
Muzi Matfunjwa, South African Centre for Digital Language Resources (SADiLaR), South Africa
Mmasibidi Setaka, South African Centre for Digital Language Resources (SADiLaR), South Africa
Menno van Zaanen, South African Centre for Digital Language Resources (SADiLaR), South Africa
--
Prof Menno van Zaanen menno.vanzaanen(a)nwu.ac.za
Professor in Digital Humanities
South African Centre for Digital Language Resources https://www.sadilar.org
________________________________
NWU PRIVACY STATEMENT:
http://www.nwu.ac.za/it/gov-man/disclaimer.html
DISCLAIMER: This e-mail message and attachments thereto are intended solely for the recipient(s) and may contain confidential and privileged information. Any unauthorised review, use, disclosure, or distribution is prohibited. If you have received the e-mail by mistake, please contact the sender or reply e-mail and delete the e-mail and its attachments (where appropriate) from your system.
________________________________
Apologies for cross-posting
--------------------------------------
2nd Workshop on Resources and Technologies for Indigenous, Endangered and
Lesser-resourced Languages in Eurasia (EURALI) @ LREC-COLING 2024
Date: 25 May, 2024
Venue: *Lingotto Conference Centre - Torino (Italia)*
Main website: https://sites.google.com/view/eurali/
LREC-COLING 2024 website: https://lrec-coling-2024.org/
*Submission website:* https://softconf.com/lrec-coling2024/eurali2024/
——————————————————————————————————
Workshop overview and objectives
This workshop will focus on the development of language technology
resources and tools for indigenous, endangered and lesser-resourced
languages on the Eurasian continent.
In a media-centric world where language technology allows people to break
cultural and language barriers, it is important that speakers of endangered
and indigenous languages can be empowered to use this technology to
continue to share their knowledge and culture with the world. With the hope
of bridging this gap, the goal of this workshop is to increase visibility
and promote research for lesser-resourced and under-represented languages
in Europe and Asia. Through collaboration between NLP researchers, language
experts and linguists working for the benefit of endangered languages in
these communities, we aim to create language technology resources that will
help to preserve and revive these languages for future generations.
Furthermore, the workshop aims to promote the emergence of new methods that
benefit linguists (for instance, by automating analysis and validation
processes), field linguists (by facilitating data collection and analysis
processes), and computational linguists (by developing new techniques
necessary for linguistic analysis, such as supervised or weakly supervised
methods for the analysis of poorly written or undocumented languages).
The main objective of the workshop is to create basic resources and develop
tools for Eurasiatic languages, including but not limited to the following
topics:
- identifying languages and variants spoken in these regions
- creation of language resources and applications, e.g. sentiment
analysis, named entity recognition, and syntactic parsing
- standardization for endangered languages
- automatic identification and classification of lexical variation and
language varieties
- adaptation of fundamental NLP tools for these languages, e.g.,
morphological analysis, taggers and parsers
- reusability of language resources in NLP applications, e.g. machine
translation, and POS tagging
- machine translation between closely related languages
- evaluation of language resources and tools when applied to
lesser-resourced languages in the same language families
- corpora, resources, and tools for closely related languages
- linguistic and textual similarities among languages in Eurasia
- digitalization of endangered languages
- challenges in the creation of language resources and tools from
linguistic perspectives (including any formal-theory perspective)
Submissions
We are seeking submissions in the following categories:
Full papers: 8 pages+unlimited references
Short papers (work in progress): 4 pages+unlimited references
Posters (innovative ideas/proposals, student research ideas): 4
pages+unlimited references
Demo (of working online/standalone systems): 2 pages
Papers must describe original, completed or in-progress, and unpublished
work. Accepted papers will be published as full papers, short papers, or
posters in the workshop proceedings and will be presented orally or as
posters.
Papers should be formatted according to the LREC-COLING style sheet
(https://lrec-coling-2024.org/authors-kit/), which is provided on the
LREC-COLING 2024 website (https://lrec-coling-2024.org/). Please submit
papers in PDF format to the START account
(https://softconf.com/lrec-coling2024/eurali2024/). For further information
on this initiative, please refer to https://sites.google.com/view/eurali/.
Important Dates (tentative)
February 23, 2024: Paper submissions due
March 22, 2024: Paper notification of acceptance
May 25, 2024: Workshop
Workshop Chairs
Atul Kr. Ojha, Sina Ahmadi,
Chao-Hong Liu, Potamu Research Ltd, Dublin (Ireland)
John P. McCrae, University of Galway, Galway (Ireland)
Theodorus Fransen, Università Cattolica del Sacro Cuore, Milan (Italy)
Silvie Cinková, Charles University, Prague (Czech Republic)
Programme Committee (to be updated)
Abigail Walsh, Dublin City University, Dublin (Ireland)
Aiala Rosá, Universidad de la República - Uruguay, Montevideo (Uruguay)
Aryaman Arora, Stanford University, Stanford, California (USA)
A. Seza Doğruöz, Ghent University, Ghent (Belgium)
Alina Karakanta, University of Leiden, Leiden (Netherlands)
Alina Wróblewska, Institute of Computer Science, Jana Kazimierza, Warszawa
(Poland)
Akanksha Bansal, Panlingua, Delhi (India)
Atul Kr. Ojha, University of Galway, Galway (Ireland) & Panlingua, (India)
Bharathi Raja Chakravarthi, University of Galway, Galway (Ireland)
Bogdan Babych, Heidelberg University, Heidelberg (Germany)
Çağrı Çöltekin, University of Tübingen, Tübingen (Germany)
Chao-Hong Liu, Potamu Research Ltd, Dublin (Ireland)
Chihiro Taguchi, the University of Notre Dame, Notre Dame (USA)
Daan van Esch, Google, Amsterdam (Netherlands)
Daniel Zeman, Charles University, Prague (Czech Republic)
Deepak Alok, IIT-Delhi, Delhi (India)
Dorothee Beermann, Norwegian University of Science and Technology,
Trøndelag (Norway)
Esha Banerjee, J.P. Morgan, Bengaluru (India)
Ekaterina Vylomova, University of Melbourne, Melbourne (Australia)
George Rehm, GmbH, Berlin (Germany)
Hiwa Asadpour, Goethe University, Frankfurt (Germany)
Jamal Abdul Nasir, University of Galway, Galway (Ireland)
Joakim Nivre, Uppsala University, (Sweden)
John P. McCrae, University of Galway, (Ireland)
John E. Ortega, New York University (USA)
Jonathan Washington, Swarthmore College, Swarthmore (USA)
Joseph Mariani, LIMSI-CNRS, Paris (France)
Kaja Dobrovoljc, University of Ljubljana, Ljubljana (Slovenia)
Khalid Choukri, ELDA/ELRA, Paris (France)
Luke D. Gessler, University of Colorado at Boulder (USA)
Maitrey Mehta, University of Utah, Utah (USA)
Marie-Catherine de Marneffe, UCLouvain, Collège Léon Dupriez (Belgium)
Olesea Caftanatov, Vladimir Andrunachievici Institute of Mathematics and
Computer Science, Chişinău (Moldova)
Ranka Stanković, University of Belgrade, Belgrade (Serbia)
Rico Sennrich, University of Zurich, Zurich (Switzerland)
Ritesh Kumar, Agra University, Agra (India)
Rute Costa, the Universidade NOVA de Lisboa, Lisbon (Portugal)
Saliha Muradoglu, Australian National University, Canberra (Australia)
Sarah Moeller, University of Florida, Gainesville, FL (USA)
Silvie Cinková, Charles University, Prague (Czech Republic)
Sina Ahmadi, George Mason University, (USA)
Stella Markantonatou, Athena RC, Athens (Greece)
Sourabrata Mukherjee, Charles University, Prague (Czech Republic)
Theodorus Fransen, Università Cattolica del Sacro Cuore, Milan (Italy)
Valentin Malykh, MTS AI / ITMO University
Verginica Barbu Mititelu, Research Institute for Artificial Intelligence,
Bucharest (Romania)
Victoria Bobicev, University of Moldova, Chișinău (Moldova)
Voula Giouli, Institute for Language and Speech Processing, Athens (Greece)
Code-mixing, the dynamic interplay of multiple languages within a single
discourse, is a widespread linguistic phenomenon observed in multilingual
societies. Code-mixing is particularly intriguing when observed in closely
related languages.
We invite you to participate in our shared task at the WILDRE workshop,
which is co-located with LREC-COLING 2024. This shared task addresses the
complexities of code-mixed data from less-resourced similar languages for
sentiment analysis. We will provide annotated data for the following
code-mixed languages:
1. Magahi-Hindi-English
2. Bangla-English-Hindi
3. Hindi-English
The evaluation will be in two different Tracks:
*A. Track 1:* Given training and validation data, determine each comment's
polarity (positive, negative, neutral or mixed) in the same code-mixed
setting.
1. Hindi-English
2. Magahi-Hindi-English
3. Bangla-English
4. All the language pairs combined (1+2+3)
*B. Track 2:* Given unlabelled test data for the code-mixed Maithili
language (Maithili-Hindi-English), leverage any or all of the available
training datasets in Track 1 to determine the sentiment of a comment in the
target language.
Important Links:
- Registration Link <https://forms.gle/HVRK1W1hHqBwtgpu6>
- WILDRE Workshop Link <http://sanskrit.jnu.ac.in/conf/wildre7/index.jsp>
- GitHub
<https://github.com/wildre-workshop/wildre-7_code-mixed-sentiment-analysis>
Important Dates:
- Dec 22, 2023: Registration
- Jan 10, 2024: Train and Validation Data set Release [to get the data,
please register]
- Feb 15, 2024: Test Set Release
- Feb 23, 2024: System Submission Due
- Feb 29, 2024: System Results
- March 15, 2024: System Description Paper Due
- March 28, 2024: Paper notification of acceptance
Apologies for cross-posting.
----------------------------------------
*The International Conference on Spoken Language Translation*
*21st IWSLT 2024 – **Second** Call for Participation*
*August 15-16, 2024 – Bangkok, Thailand*
*http://iwslt.org <http://iwslt.org/>*
The International Conference on Spoken Language Translation (IWSLT) is the
premier annual conference for all aspects of Spoken Language Translation.
Every year, the conference organizes and sponsors open evaluation campaigns
around key challenges in simultaneous and consecutive translation, under
real-time/low latency or offline conditions and under low-resource or
multilingual constraints. System descriptions and results from
participants’ systems and scientific papers related to key algorithmic
advances and best practices are presented.
IWSLT is the venue of SIGSLT, the Special Interest Group on Spoken
Language Translation of ACL, ISCA and ELRA. With a track record of 20
years, IWSLT benchmarks and proceedings serve as a reference for all
researchers and practitioners working on speech translation and related
fields.
The 21st edition of IWSLT <https://iwslt.org/2024/> will be run as an
*ELRA/ACL* event and co-located with ACL 2024 <https://2024.aclweb.org/> on
August 15-16, 2024. It will be run as a hybrid event.
Important Dates
January 15, 2024: Release of shared task training and dev data
April 01-15, 2024: Evaluation period
April 29, 2024: Paper submission due (all papers)
June 4, 2024: Notification of acceptance
June 24, 2024: Camera-ready paper due
July 22, 2024: Pre-recorded video due
August 15-16, 2024: Conference
Evaluation
IWSLT 2024 features shared tasks <https://iwslt.org/2024/#shared-tasks>
that address the following focus areas:
- Speech-to-speech track
- Simultaneous track
- Subtitling track
- Offline track
- Dubbing track
- Low-resource track
- Indic track
Training, development and test data for each shared task will be prepared
and released by the respective organizers (for further information on this
initiative, please refer to the website <https://iwslt.org/2024/>).
Participants will receive instructions about how to submit their runs. In
addition, participants have the opportunity to present their work through a
system paper that will be published in the ACL Proceedings.
Conference
IWSLT also invites submissions of scientific papers to be published in the
ACL Proceedings and presented either in oral or poster format. The
conference selects high-quality, original contributions on theoretical and
practical issues of spoken language translation research, technologies and
applications. For further information on this initiative, please refer to
the website <https://iwslt.org/2024/#paper-submission>.
Contact
Please send an email to iwslt-evaluation-campaign(a)googlegroups.com if you
have any questions related to the shared tasks.
Thanks,
Marine, Marcello, Alex, Jan, Sebastian, Elizabeth, Atul
(IWSLT organisers)
The fifth workshop on Resources for African Indigenous Languages (RAIL)
Co-located with LREC-COLING 2024
https://bit.ly/rail2024
Conference dates: 20-25 May 2024
Workshop date: 25 May 2024
Venue: Lingotto Conference Centre, Torino (Italy)
The fifth RAIL workshop website: https://bit.ly/rail2024
LREC-COLING 2024 website: https://lrec-coling-2024.org/
Submission website: https://softconf.com/lrec-coling2024/rail2024/
The fifth Resources for African Indigenous Languages (RAIL) workshop
will be co-located with LREC-COLING 2024 at the Lingotto Conference Centre,
Torino, Italy, on 25 May 2024. The RAIL workshop is an interdisciplinary
platform for researchers working on resources (data collections, tools,
etc.) specifically targeted towards African indigenous languages. In
particular, it aims to create the conditions for the emergence of a
scientific community of practice that focuses on data, as well as
computational linguistic tools specifically designed for or applied to
indigenous languages found in Africa.
Many African languages are under-resourced, while only a few of them are
somewhat better resourced. These languages often share interesting
properties, such as writing systems or tone, that make them different from
most high-resourced languages. From a computational perspective, these
languages lack enough corpora to undertake high-level development of
Human Language Technologies (HLT) and Natural Language Processing (NLP)
tools, which in turn impedes the development of African languages in
these areas. During previous workshops, it has become clear that the
problems and solutions presented are not only applicable to African
languages but are also relevant to many other low-resource languages.
Because these languages share similar challenges, this workshop
provides researchers with opportunities to work collaboratively on
issues of language resource development and learn from each other.
The RAIL workshop has several aims. First, the workshop brings together
researchers who work on African indigenous languages, forming a
community of practice for people working on indigenous languages.
Second, the workshop aims to reveal currently unknown or unpublished
existing resources (corpora, NLP tools, and applications), resulting in
a better overview of the current state-of-the-art, and also allows for
discussions on novel, desired resources for future research in this
area. Third, it enhances sharing of knowledge on the development of
low-resource languages. Finally, it enables discussions on how to
improve the quality as well as availability of the resources.
The workshop has “Creating resources for less-resourced languages” as
its theme, but submissions on any topic related to properties of
African indigenous languages (including non-African languages) may be
accepted. Suggested topics include (but are not limited to) the
following:
* Digital representations of linguistic structures
* Descriptions of corpora or other data sets of African indigenous
languages
* Building resources for (under-resourced) African indigenous languages
* Developing and using African indigenous languages in the digital age
* Effectiveness of digital technologies for the development of African
indigenous languages
* Revealing unknown or unpublished existing resources for African
indigenous languages
* Developing desired resources for African indigenous languages
* Improving quality, availability and accessibility of African
indigenous language resources
Submission requirements:
We invite papers on original, unpublished work related to the topics of
the workshop. Submissions presenting completed work may consist of up
to eight (8) pages of content plus additional pages of references. The
final camera-ready versions of accepted long papers are allowed one
additional page of content (up to 9 pages) so that reviewers’ feedback
can be incorporated. Papers should be formatted according to the
LREC-COLING style sheet (https://lrec-coling-2024.org/authors-kit/), which
is provided on the LREC-COLING 2024 website
(https://lrec-coling-2024.org/). Reviewing is double-blind, so make
sure to anonymise your submission (e.g., do not provide author names,
affiliations, project names, etc.). Limit the number of self-citations
(anonymised citations should not be used). The RAIL workshop follows
the LREC-COLING submission requirements.
Please submit papers in PDF format to the START account
(https://softconf.com/lrec-coling2024/rail2024/). Accepted papers will
be published in proceedings linked to the LREC-COLING conference.
Important dates:
Submission deadline: 16 February 2024
Date of notification: 15 March 2024
Camera ready deadline: 29 March 2024
RAIL workshop: 25 May 2024
Organising Committee
Rooweither Mabuya, South African Centre for Digital Language Resources
(SADiLaR), South Africa
Muzi Matfunjwa, South African Centre for Digital Language Resources
(SADiLaR), South Africa
Mmasibidi Setaka, South African Centre for Digital Language Resources
(SADiLaR), South Africa
Menno van Zaanen, South African Centre for Digital Language Resources
(SADiLaR), South Africa
--
Prof Menno van Zaanen menno.vanzaanen(a)nwu.ac.za
Professor in Digital Humanities
South African Centre for Digital Language Resources
https://www.sadilar.org
Hello everyone,
My name is Pranay and I am a PhD candidate in my final year at the Language & Translation Technology Team at Ghent University, Belgium.
My area of research is efficient language modelling for low-resourced languages, and I have worked on various digital humanities projects as well,
assisting with technical expertise for ancient languages such as Byzantine Greek and CUNE-IIIFORM. (Google Scholar link <https://scholar.google.com/citations?user=8KSmDe4AAAAJ&hl=en>)
I would like to request to join SIGUL, since my work is highly related to the research interests of the group.
I have also previously published at and attended the SIGUL workshop co-located with LREC’22 in Marseille, and will be attending the SIGUL workshop at LREC-COLING’24 as well.
I look forward to hearing from you.
Best Regards,
Pranaydeep Singh
Doctoral Candidate
Language & Translation Technology Team,
Ghent University
International Conference ‘New Trends in Translation and Technology’ (NeTTT’2024)
Varna, Bulgaria, 3-6 July 2024
Second Call for Papers
The conference
The second edition of the forthcoming International Conference ‘New Trends in Translation and Technology’ (NeTTT’2024) will take place in Varna, Bulgaria, 3-6 July 2024.
The objective of the conference is (i) to bridge the gap between academia and industry in the field of translation and interpreting by bringing together academics in linguistics, translation studies, machine translation and natural language processing, developers, practitioners, language service providers and vendors who work on or are interested in different aspects of technology for translation and interpreting, and (ii) to be a distinctive event for discussing the latest developments and practices. NeTTT’2024 invites all professionals who would like to learn about the new trends, present their latest work and/or share their experience in the field, and who would like to establish business and research contacts, collaborations and new ventures.
The conference will take the form of presentations (peer-reviewed research and user presentations, keynote speeches), and posters; it will also feature panel discussions. The accepted papers will be published as open-access conference e-proceedings.
Conference topics
Contributions are invited on any topic related to the latest technology and practices in machine translation, translation, subtitling, localisation and interpreting.
NeTTT’2024 will feature a Special Theme Track "Future of Translation Technology in the Era of LLMs and Generative AI".
The conference topics include but are not limited to:
CAT tools
- Translation Memory (TM) systems
- NLP and MT for translation memory systems
- Terminology extraction tools
- Localisation tools
Machine Translation
- Latest developments in Neural Machine Translation
- MT for under-resourced languages
- MT with low computing resources
- Multimodal MT
- Integration of MT in TM systems
- Resources for MT
Technologies for MT deployment
- MT evaluation techniques, metrics and evaluation results
- Human evaluations of MT output
- Evaluating MT in a real-world setting
- Quality estimation for MT
- Domain adaptation
Translation Studies
- Corpus-based studies applied to translation
- Corpora and resources for translation
- Translationese
- Cognitive effort and eye-tracking experiments in translation
Interpreting studies
- Corpus-based studies applied to interpreting
- Corpora and resources for interpreting
- Interpretese
- Resources for interpreting and interpreting technology applications
- Cognitive effort and eye-tracking experiments in interpreting
Interpreting technology
- Machine interpreting
- Computer-aided interpreting
- NLP for dialogue interpreting
- Development of NLP based applications for communication in public service settings (healthcare, education, law, emergency services)
Emerging Areas in Translation and Interpreting
- MT and translation tools for literary texts and creative texts
- MT for social media and real-time conversations
- Sign language recognition and translation
Subtitling
- NLP and MT for subtitling
- Latest technology for subtitling
User needs
- Analysis of translators’ and interpreters’ needs in terms of translation and interpreting technology
- User requirements for interpreting and translation tools
- Incorporating human knowledge into translation and interpreting technology
- What existing translators’ (including subtitlers’) and interpreters’ tools do not offer
- User requirements for electronic resources for translators and interpreters
- Translation and interpreting workflows in larger organisations and the tools for translation and interpreting employed
The business of translation and interpreting
- Translation workflow and management
- Technology adoption by translators and industry
- Setting up a translation / interpreting / language provider company
Teaching translation and interpreting
- Teaching Machine Translation
- Teaching translation technology
- Teaching interpreting technology
- Latest AI developments in the syllabi of translation and interpreting curricula
Ethical issues in translation and technology
- Bias and fairness in MT
- Privacy and security in cloud MT systems
- Transparency and explainability of MT systems
- Environmental impact of MT systems
Special Theme Track - Future of Translation Technology in the Era of LLMs and Generative AI
We are excited to share that NeTTT’2024 will have a special theme with the goal of stimulating discussion around Large Language Models, Generative AI and the Future of Translation and Interpreting Technology. While the new generation of Large Language Models such as ChatGPT and LLaMA showcase remarkable advancements in language generation and understanding, we find ourselves in uncharted territory when it comes to their performance on various Translation and Interpreting Technology tasks with regard to fairness, interpretability, ethics and transparency.
The theme track invites studies on how LLMs perform on Translation and Interpreting Technology tasks and applications, and what this means for the future of the field. The possible topics of discussion include (but are not limited to) the following:
- Changes in the translators’ and interpreters’ professions in the new AI era, especially as a result of the latest developments in LLMs and Generative AI
- Generative AI and translation
- Generative AI and interpreting
- Augmenting machine translation systems with generative AI
- Domain and terminology adaptation with Large Language Models
- Literary translation with Large Language Models
- Improving Machine Translation Quality with Contextual Prompts in Large Language Models
- Prompt engineering for translation
- Generative AI for professional translation
- Generative AI for professional interpreting
We anticipate having a special session on this theme at the conference.
Submissions and publication
NeTTT’2024 invites the following types of submissions:
User papers – for industry and practitioners. References to related work are optional. Allowed paper length: between 1 and 4 pages.
Academic submissions, in three different categories (have to follow formatting requirements, references to related work are required):
• (academic) full papers – describing original completed research. Allowed paper length: maximum 12 pages + unlimited references.
• (academic) work-in-progress papers/posters – describing work in progress, late breaking research, papers at a more conceptual stage, and other types of papers that do not fit in the ‘full’ papers category. Allowed paper length: maximum 7 pages + unlimited references.
• (academic) demo papers – describing working systems. Allowed paper length: maximum 5 pages + unlimited references. In addition to the papers, the authors will be expected to demonstrate the systems at the conference.
Abstract-only submissions will not be considered or evaluated.
Each submission will be reviewed by three members of the Programme Committee. Submission is organised via the Softconf START conference management system at https://softconf.com/n/nettt2024.
For submitting the papers, we invite the authors to comply with the Springer format, following the templates:
• LaTeX,
• Overleaf,
• Word.
All accepted papers will be published in the conference e-proceedings, with an assigned ISBN and DOI, and made available online on the conference website. Authors of accepted papers will receive guidelines on how to produce camera-ready versions of their papers.
Schedule
Submission deadline: 31 March 2024
Notification: 5 June 2024
Final version due: 20 June 2024
All deadlines are at 23:59 Anywhere on Earth (AoE).
Venue
The conference will take place at Conference Hotel Cherno More, Varna, situated only 200 m away from the fine sandy Black Sea beach.
Further information and contact details
Registration will open on 15 January 2024.
The follow-up calls will list keynote speakers and members of the programme committee once confirmed.
The conference website is https://nettt-conference.com and will be updated on a regular basis. For further information, please contact us at nettt2024(a)nettt-conference.com
Dear colleagues,
[apologies for cross-posting]
We would like to remind you that this year SIGTYP is hosting a Shared Task
on Word Embedding Evaluation for Ancient and Historical Languages:
https://github.com/sigtyp/ST2024/
Test data has been released, and CodaLab competitions are up and running,
so we encourage you to register if you haven't yet! There is still a week
before the deadline. :)
*Summary*
In recent years, sets of downstream tasks called benchmarks have become a
very popular, if not default, method to evaluate general-purpose word and
sentence embeddings. Starting with decaNLP (McCann et al., 2018) and
SentEval (Conneau & Kiela, 2018), multitask benchmarks for NLU keep
appearing and improving every year. However, even the largest multilingual
benchmarks, such as XGLUE, XTREME, XTREME-R or XTREME-UP (Hu et al., 2020;
Liang et al., 2020; Ruder et al., 2021, 2023), only include modern
languages. When it comes to ancient and historical languages, scholars
mostly adapt/translate intrinsic evaluation datasets from modern languages
or create their own diagnostic tests. We argue that there is a need for a
universal evaluation benchmark for embeddings learned from ancient and
historical language data and view this shared task as a proving ground for
it.
The shared task involves solving the following problems for 12+ ancient and
historical languages that belong to 4 language families and use 6 different
scripts. Participants will be invited to describe their system in a paper
for the SIGTYP workshop proceedings. The task organizers will write an
overview paper that describes the task, summarizes the different
approaches taken, and analyzes their results.
*Subtasks*
For subtask A, participants are not allowed to use any additional data;
however, they can reduce and balance provided training datasets if they see
fit. For subtask B, participants are allowed to use any additional data in
any language, including pre-trained embeddings and LLMs.
A. Constrained
1. POS-tagging
2. Full morphological annotation
3. Lemmatisation
B. Unconstrained
1. POS-tagging
2. Detailed morphological annotation
3. Lemmatisation
4. Filling the gaps
- Word-level
- Character-level
*Important links*
- *Registration form*
<https://docs.google.com/forms/d/e/1FAIpQLSdINgMfzzZGIZ-uBVQhvyndB6yeaaj-wT7…>
- Detailed description, incl. submission format:
https://github.com/sigtyp/ST2024
- Constrained subtask on CodaLab:
https://codalab.lisn.upsaclay.fr/competitions/16822
- Unconstrained subtask on CodaLab:
https://codalab.lisn.upsaclay.fr/competitions/16818
*Important dates*
*05 Nov 2023*: Release of training and validation data
*02 Jan 2024*: Release of test data
*09 Jan 2024*: Submission of results for Phase 1 of the Constrained
Subtask
*12 Jan 2024*: Submission of results for Phase 2 of the Constrained
Subtask and for the Unconstrained Subtask
*13 Jan 2024*: Notification of results
*20 Jan 2024*: Submission of shared task papers
*27 Jan 2024*: Notification of acceptance to authors
*03 Feb 2024*: Camera-ready
*15 Mar 2024*: Video recordings due
*21/22 Mar 2024*: SIGTYP workshop
Kind regards,
Oksana and the organisers' team
--
Oksana Dereza | PhD student on the Cardamom
<http://cardamom.insight-centre.org/> project | Unit for Linguistic Data |
Insight Centre for Data Analytics | Data Science Institute | University of
Galway
Oksana Dereza | Iarrthóir PhD ar thionscadal Cardamom
<http://cardamom.insight-centre.org/> | An tAonad um Shonraí Teangeolaíocha
| Insight, Ionad na hAnailísíochta Sonraí | Institiúid Eolaíochta Sonraí |
Ollscoil na Gaillimhe