Event: 11th Workshop on the Representation and Processing of Sign Languages (sign-lang@LREC 2024)
Deadline: 22 February 2024
Website: https://www.sign-lang.uni-hamburg.de/lrec2024/
Submission page: https://softconf.com/lrec-coling2024/signlang2024/
CALL FOR PAPERS
Submissions are invited for a full day workshop on sign language resources, to take place on 20 May 2024 as a satellite event of LREC-COLING 2024 in Turin, Italy.
During the past years, a number of large-scale sign language corpus projects have started. Some have already been completed, but many more projects are about to start. At the same time, sign language technologies are maturing and are promising to support the time-consuming basic annotation. The workshop aims at bringing together those researchers who already work with multimodal sign language corpora (and those who see the need for empirical underpinnings of their current research) with those who develop sign language technologies. It provides the platform to compare competing approaches.
As sign language resource technologies build to a large extent on methodologies and tools used in the language resource community in general, but add very specific perspectives (e.g. no writing system established, use of video as data source) and work with a different modality of human language, sign language research is able to feed back to the language resource community at large. At the same time, as the raw data are in the visual domain, the field naturally bridges into Computer Vision. Thus, researchers use Machine Learning methods on both visual and linguistic data.
We invite submissions of papers to be presented either on stage (20 minutes plus 10 minutes discussion) or as posters (with or without demonstrations) on the following topics:
2024 SPECIAL TOPIC: EVALUATION OF SIGN LANGUAGE RESOURCES
With the field maturing, it becomes an urgent issue to assess the quality of sign language resources for a large variety of tasks. We invite contributions on both automatic and human-based evaluation procedures for all kinds of sign language resources and tools.
GENERAL ISSUES ON SIGN LANGUAGE CORPORA AND TOOLS
• Avatar technology as a tool in sign language corpora and corpus data feeding into advances in avatar technology
• Experiences in building sign language corpora
• Elicitation methodology appropriate for corpus collection
• Proposals for standards for linguistic annotation or for metadata descriptions
• Experiences from linguistic research using corpora
• Use of (parallel) corpora and lexicons in translation studies
• Language documentation and long-term accessibility for sign language data
• Annotation and visualization tools
• Linking corpora and lexicons and integrated presentation of corpus and dictionary contents
• “Internet as a corpus” for sign languages
• Sign language corpus mining
• Crowd and community sourcing for corpus work
• Multi-lingual sign language resources and connecting sign language resources to language resources for spoken languages
• FAIR, CARE and OpenScience for sign language data
In the tradition of LREC, oral/signed presentations and poster presentations (with or without demonstrations) have equal status, and authors are encouraged to suggest the presentation format best suited to communicate their ideas. Papers (4-8 pages) of all accepted submissions to this workshop will be published as workshop proceedings on the conference website, regardless of whether you have a poster or an oral/signed presentation. The workshop does not differentiate between long, short, or position papers.
Please submit your paper through the LREC START system (https://softconf.com/lrec-coling2024/signlang2024/) no later than 22 February 2024, indicating whether you prefer an oral/signed presentation, a poster presentation or a poster presentation with demo. Unlike the main conference, the workshop will be reviewed single-blind, so submissions SHOULD NOT BE ANONYMOUS.
ATTENTION Please note that you are expected to submit the full paper, not an extended abstract as in previous years!
IMPORTANT DATES
• Deadline for submissions: 22 February 2024 (11:59PM UTC-12:00 “anywhere on Earth”)
• Notification of acceptance: 22 March 2024
• Early bird registration ends: tbd
• Camera ready version of the paper (for both oral/signed presentations and posters): 8 April 2024
• Submission of slides for interpreters' preparation (oral/signed presentations only): 10 May 2024
• This workshop: 20 May 2024
• LREC main conference: 22–24 May 2024
• LREC workshops: 20, 21 & 25 May 2024
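The "anywhere on Earth" deadline in the list above is easy to misread, so here is a minimal Python sketch of the conversion. It assumes only that AoE means the fixed offset UTC-12:00, as the deadline entry states; the helper name `deadline_in` is hypothetical and used for illustration only.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" (AoE) is treated as the fixed offset UTC-12:00.
AOE = timezone(timedelta(hours=-12), name="AoE")

# Submission deadline from the list above: 22 February 2024, 11:59 PM AoE.
deadline_aoe = datetime(2024, 2, 22, 23, 59, tzinfo=AOE)


def deadline_in(offset_hours: float) -> datetime:
    """Return the deadline as local time for a given UTC offset in hours."""
    return deadline_aoe.astimezone(timezone(timedelta(hours=offset_hours)))


deadline_utc = deadline_in(0)
print(deadline_utc.isoformat())    # 2024-02-23T11:59:00+00:00
print(deadline_in(1).isoformat())  # 2024-02-23T12:59:00+01:00 (e.g. Central European Time)
```

In other words, the deadline falls around midday on 23 February for submitters in European time zones.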
Due to the delay in EACL notifications we extend the deadline for ARR commitments to the First Workshop on Uncertainty-Aware NLP (UncertaiNLP) to January 20.
More info on the workshop: https://uncertainlp.github.io/
ARR commitment link: https://openreview.net/group?id=eacl.org/EACL/2024/Workshop/UncertaiNLP_ARR…
We welcome submissions of papers that did not receive a positive decision from the main EACL conference, as well as accepted Findings papers whose authors would like to apply for a presentation slot in the workshop.
——————————————
Jörg Tiedemann
University of Helsinki
https://blogs.helsinki.fi/language-technology/
We’re excited to launch FreeTxt – a free bilingual online toolkit for analysing and visualising free-text data (from surveys, questionnaires etc.) in English and Welsh. FreeTxt draws on some of the corpus-based utilities and methodologies from CorCenCC and ACC (Welsh Automatic Text Summarisation), repackaging these to enable new audiences and user-groups to analyse their own feedback data. Co-designed in collaboration with National Trust Wales, Museum Wales, Cadw, WJEC, and National Centre for Learning Welsh, FreeTxt is accessible to anyone in any sector in Wales and beyond. FreeTxt:
* indicates if your data is positive and/or negative (sentiment analysis) and provides downloadable visualisations of results.
* allows you to explore/visualise common words, phrases and themes in your data (in tables, word clouds etc.).
* enables you to summarise free-text data, and examine word use and relationships.
FreeTxt is available open source with an Apache 2.0 licence (https://github.com/UCREL/FreeTxt-Flask), and via a hosted web demo interface at http://www.freetxt.app/. It incorporates other open source tools from our previous projects such as CyTag (Welsh POS tagger), a Welsh summariser, and PyMUSAS (for English and Welsh); see https://www.freetxt.app/about for more details.
FreeTxt was developed as part of the AHRC-funded collaborative research project 'FreeTxt supporting bilingual free-text survey and questionnaire data analysis', involving colleagues from Cardiff University and Lancaster University (Grant Number AH/W004844/1). The team included PI - Dawn Knight; CIs - Paul Rayson, Mo El-Haj; RAs - Ignatius Ezeani, Nouran Khallaf and Steve Morris. The Project Advisory Group included representatives from: National Trust Wales, Cadw, Museum Wales, CBAC | WJEC and National Centre for Learning Welsh.
--
Paul Rayson
Director of UCREL and Professor of Natural Language Processing
SCC Data Theme Lead
School of Computing and Communications, InfoLab21, Lancaster University, Lancaster, LA1 4WA, UK.
Web: https://www.research.lancs.ac.uk/portal/en/people/Paul-Rayson/
Tel: +44 1524 510357
Contact me on Teams<https://teams.microsoft.com/l/chat/0/0?users=p.rayson@lancaster.ac.uk>
CODI, 5th Workshop on Computational Approaches to Discourse
2024-03-21 - EACL 2024 - Malta
** Direct Submission deadline: January 22nd, 2024 **
Direct submission: We now open submissions for papers rejected at another main conference.
The deadline has been updated to account for the delay in EACL notifications. Note that notifications will be sent on January 25 for direct submissions, and camera-ready will be due on January 30.
Website link: https://sites.google.com/view/codi2024
CODI considers for publication papers rejected at one of the main conferences. Authors will have to submit both the paper and the reviews as a supplementary PDF file. If modifications have been made since the original submission, please submit an additional file briefly describing the modifications made. The organizers will decide on the acceptance of the papers based on the quality of the paper and its fit with the workshop.
As a reminder, CODI also invites presentations of papers accepted at another main conference. They will be included in the workshop program and handbook, but will not appear in the workshop proceedings.
Please submit your workshop papers (category: "direct submission") at https://softconf.com/eacl2024/CODI-2024/
In this newsletter:
Renew your LDC membership today
New publications:
KASET - Kurmanji and Sorani Kurdish Speech and Transcripts<https://catalog.ldc.upenn.edu/LDC2024S01>
LORELEI Farsi Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2024T01>
________________________________
Renew your LDC membership today
The importance of curated resources for language-related education, research, and technology development drives LDC's mission to create them, to accept data contributions from researchers across the globe, and to broadly share such resources through the LDC Catalog. LDC members enjoy no-cost access to new corpora released annually, as well as the ability to license legacy data sets from among our 950+ holdings at reduced fees. Ensure that your data needs continue to be met by renewing your LDC membership or by joining the Consortium today.
Now through March 1, 2024, 2023 members receive a 10% discount on 2024 membership, and new or returning organizations receive a 5% discount. Membership remains the most economical way to access current and past LDC releases. Consult Join LDC<https://www.ldc.upenn.edu/communications/newsletter/january-2022-newsletter> for more details on membership options and benefits.
________________________________
New publications:
KASET - Kurmanji and Sorani Kurdish Speech and Transcripts<https://catalog.ldc.upenn.edu/LDC2024S01> consists of 147 hours of telephone conversations (289 recordings) and broadcast news (410 recordings) in two Kurdish dialects, Kurmanji Kurdish and Sorani Kurdish, along with transcripts covering 60 hours of those recordings. Kurdish is spoken primarily in Turkey, Iran, Iraq, and Syria. Sorani and Kurmanji are the two widely spoken dialects of the Kurdish language.
The telephone speech was generated from calls by native Kurdish speakers in the United States to North American acquaintances in their social network. The broadcast news audio was collected from multiple streaming radio and television broadcast programs (narrowband and wideband audio), many of which contained a mix of Kurmanji and Sorani Kurdish. Native speaker auditors identified a 5-10 minute span from each broadcast recording for transcription. Full telephone recordings that passed the native speaker audit were transcribed. This release includes speaker information, such as gender, year of birth, and language.
2024 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
LORELEI Farsi Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2024T01> was developed by LDC and comprises approximately 250 million words of Farsi monolingual text, 120,000 Farsi words translated from English data, and 751,000 words of found Farsi-English parallel text. Approximately 75,000 words were annotated for named entities and up to 22,000 words were annotated for entity discovery and linking and situation frames (identifying entities, needs, and issues). Data was collected from discussion forums, news, reference sources, social networks, and weblogs.
The LORELEI (Low Resource Languages for Emergent Incidents) program was concerned with building human language technology for low resource languages in the context of emergent situations. Representative languages were selected to provide broad typological coverage.
The knowledge base for entity linking annotation is available separately as LORELEI Entity Detection and Linking Knowledge Base (LDC2020T10)<https://catalog.ldc.upenn.edu/LDC2020T10>.
2024 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
*The Third Ukrainian Natural Language Processing Workshop (UNLP 2024)*
<https://unlp.org.ua/>
UNLP 2024 features the first *Shared Task on Fine-Tuning Large Language
Models for Ukrainian*.
This Shared Task aims to challenge and assess LLMs' capabilities to
understand and generate Ukrainian, paving the way for LLM development in
Slavic languages.
*Task Description*
In this shared task, your goal is to instruction-tune a large language
model that can answer questions and perform tasks in Ukrainian. The model
should possess knowledge of Ukrainian history, language, and literature, as
well as common knowledge, and should be capable of generating fluent and
factually accurate responses.
The evaluation will be two-fold: accuracy of answers to multiple-choice
questions and human evaluation on a selection of text generation tasks.
You can find the instructions, sample data, and scripts at
https://github.com/unlp-workshop/unlp-2024-shared-task.
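
As a rough illustration of the first evaluation component, accuracy on
multiple-choice questions can be computed as sketched below. This is not the
official scoring script: the function name, the question IDs and the
dictionary-based data format are all assumptions made for the example, and
the scripts in the repository above define the actual format.

```python
def multiple_choice_accuracy(gold: dict[str, str], predicted: dict[str, str]) -> float:
    """Fraction of questions whose predicted option matches the gold option.

    Questions missing from `predicted` are counted as wrong (an assumption).
    """
    if not gold:
        raise ValueError("gold answers must not be empty")
    correct = sum(1 for qid, answer in gold.items() if predicted.get(qid) == answer)
    return correct / len(gold)


# Hypothetical question IDs and answer options, for illustration only.
gold = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
predicted = {"q1": "A", "q2": "B", "q3": "B"}  # q4 left unanswered

score = multiple_choice_accuracy(gold, predicted)
print(f"{score:.2f}")  # 0.50
```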
*Registration*
Teams that intend to participate should register by filling in this form
<https://forms.gle/MiC7pWsWbwBdSmoX9>.
*Publication*
Participants in the shared task are invited to submit a paper to the UNLP
2024 <https://unlp.org.ua/call-for-papers/> workshop. Submitting a paper is
*not mandatory* for participating in the Shared Task. Papers must follow the
workshop submission instructions and will undergo regular peer review.
Their acceptance will not depend on the results obtained in the shared
task, but on the quality of the paper. Accepted papers will appear in the
ACL Anthology and will be presented at a session of UNLP 2024 specially
dedicated to the Shared Task.
*Important Dates*
January 15, 2024 — Shared task announcement
February 15, 2024 — Release of test data to registered participants
February 22, 2024 — Registration deadline
February 23, 2024 — Submission of system responses
March 1, 2024 — Shared Task paper due
March 7, 2024 — Results of the Shared Task announced
March 29, 2024 — Notification of acceptance
TBD (mid-April) — Camera-ready Shared Task papers due
May 25, 2024 — Workshop date
*Contact*
Discord for the shared task: https://discord.gg/kCc6xgWbCJ
Email: info(a)unlp.org.ua
Website: https://unlp.org.ua/
Twitter: https://twitter.com/UNLP_workshop
Telegram: https://t.me/UNLP_workshop
Facebook: https://www.facebook.com/UNLPworkshop
TEICAI, Towards Ethical and Inclusive Conversational AI: Language Attitudes, Linguistic Diversity, and Language Rights, at EACL 2024 in Malta, March 17-22, 2024.
Workshop website: https://sites.google.com/view/teicai2024
Submission link: https://softconf.com/eacl2024/TEICAI-2024/
Submission Deadline: 17 January 2024 (anywhere on earth)
Submissions are now being accepted for papers that were previously rejected at a major conference. Authors are required to submit their paper alongside the reviews it received, provided as a supplementary PDF file. In cases where the paper has undergone revisions since its original submission, authors should also include a separate file briefly outlining the changes made. The acceptance of these papers for the workshop will be determined by the organizers, based on the paper's quality and relevance to the workshop's theme.
We are also pleased to announce that our sponsor, e-COST ACTION Language in the Human-Machine Era (LITHME), is offering two to three travel grants for authors of selected accepted papers. More information about LITHME can be found at https://lithme.eu/.
Workshop Organizers:
Sviatlana Höhn, LuxAI, Luxembourg
Nina Hosseini-Kivanani, Faculty of Science, Technology and Medicine (FSTM), University of Luxembourg, Luxembourg
Dimitra Anastasiou, Luxembourg Institute of Science and Technology, Luxembourg
Angela Soltan, State University of Moldova, Moldova
Bettina Migge, University College Dublin, Ireland
Doris Dippold, University of Surrey, UK
Fred Philippy, Zortify, Luxembourg
Ekaterina Kamlovskaya, Translatables
Program Committee:
A list of program committee members is available on the workshop website.
For any preliminary questions, you're welcome to reach out to teicai2024(a)gmail.com .
You can follow us on LinkedIn (TEICAI) and Twitter (teicai2024) to get more updates about the workshop.
On behalf of the organizers
Nina Hosseini-Kivanani
University of Luxembourg
Apologies for cross-posting.
----------------------------------------
*The International Conference on Spoken Language Translation*
*21st IWSLT 2024 – **Second** Call for Participation*
*August 15-16, 2024 – Bangkok, Thailand*
*http://iwslt.org <http://iwslt.org/>*
The International Conference on Spoken Language Translation (IWSLT) is the
premier annual conference for all aspects of Spoken Language Translation.
Every year, the conference organizes and sponsors open evaluation campaigns
around key challenges in simultaneous and consecutive translation, under
real-time/low latency or offline conditions and under low-resource or
multilingual constraints. System descriptions and results from
participants’ systems and scientific papers related to key algorithmic
advances and best practices are presented.
IWSLT is the venue of SIGSLT, the Special Interest Group on Spoken
Language Translation of ACL, ISCA and ELRA. With a track record of 20
years, IWSLT benchmarks and proceedings serve as a reference for all
researchers and practitioners working on speech translation and related
fields.
The 21st edition of IWSLT <https://iwslt.org/2024/> will be run as an
*ELRA/ACL* event and co-located with ACL 2024 <https://2024.aclweb.org/> on
August 15-16, 2024. It will be run as a hybrid event.
Important Dates
January 15, 2024: Release of shared task training and dev data
April 01-15, 2024: Evaluation period
April 29, 2024: Paper submission due (all papers)
June 4, 2024: Notification of acceptance
June 24, 2024: Camera-ready paper due
July 22, 2024: Pre-recorded video due
August 15-16, 2024: Conference
Evaluation
The IWSLT 2024 features shared tasks <https://iwslt.org/2024/#shared-tasks>
that address the following focus areas:
- Speech-to-speech track
- Simultaneous track
- Subtitling track
- Offline track
- Dubbing track
- Low-resource track
- Indic track
Training, development and test data for each shared task will be prepared
and released by the respective organizers (for further information on this
initiative, please refer to the website <https://iwslt.org/2024/>).
Participants will receive instructions about how to submit their runs. In
addition, participants have the opportunity to present their work through a
system paper that will be published in the ACL Proceedings.
Conference
IWSLT also invites submissions of scientific papers to be published in the
ACL Proceedings and presented either in oral or poster format. The
conference selects high-quality, original contributions on theoretical and
practical issues of spoken language translation research, technologies and
applications. For further information on this initiative, please refer to
the website <https://iwslt.org/2024/#paper-submission>.
Contact
Please send an email to iwslt-evaluation-campaign(a)googlegroups.com if you
have any questions related to the shared tasks.
Thanks,
Marine, Marcello, Alex, Jan, Sebastian, Elizabeth, Atul
(IWSLT organisers)