Due to the delay in EACL notifications we extend the deadline for ARR commitments to the First Workshop on Uncertainty-Aware NLP (UncertaiNLP) to January 20.
More info on the workshop: https://uncertainlp.github.io/
ARR commitment link: https://openreview.net/group?id=eacl.org/EACL/2024/Workshop/UncertaiNLP_ARR…
We welcome submissions of papers that did not receive a positive decision at the main EACL conference, as well as accepted Findings papers whose authors would like to apply for a presentation slot at the workshop.
——————————————
Jörg Tiedemann
University of Helsinki
https://blogs.helsinki.fi/language-technology/
We’re excited to launch FreeTxt – a free bilingual online toolkit for analysing and visualising free-text data (from surveys, questionnaires etc.) in English and Welsh. FreeTxt draws on some of the corpus-based utilities and methodologies from CorCenCC and ACC (Welsh Automatic Text Summarisation), repackaging these to enable new audiences and user-groups to analyse their own feedback data. Co-designed in collaboration with National Trust Wales, Museum Wales, Cadw, WJEC, and National Centre for Learning Welsh, FreeTxt is accessible to anyone in any sector in Wales and beyond. FreeTxt:
* indicates if your data is positive and/or negative (sentiment analysis) and provides downloadable visualisations of results.
* allows you to explore/visualise common words, phrases and themes in your data (in tables, word clouds etc.).
* enables you to summarise free-text data, and examine word use and relationships.
FreeTxt is available as open source under an Apache 2.0 licence (https://github.com/UCREL/FreeTxt-Flask), and via a hosted web demo interface at www.freetxt.app. It incorporates other open source tools from our previous projects, such as CyTag (a Welsh POS tagger), a Welsh summariser, and PyMUSAS (for English and Welsh); see https://www.freetxt.app/about for more details.
FreeTxt was developed as part of an AHRC-funded collaborative research project, 'FreeTxt supporting bilingual free-text survey and questionnaire data analysis', involving colleagues from Cardiff University and Lancaster University (Grant Number AH/W004844/1). The team included PI - Dawn Knight; CIs - Paul Rayson, Mo El-Haj; RAs - Ignatius Ezeani, Nouran Khallaf and Steve Morris. The Project Advisory Group included representatives from: National Trust Wales, Cadw, Museum Wales, CBAC | WJEC and National Centre for Learning Welsh.
--
Paul Rayson
Director of UCREL and Professor of Natural Language Processing
SCC Data Theme Lead
School of Computing and Communications, InfoLab21, Lancaster University, Lancaster, LA1 4WA, UK.
Web: https://www.research.lancs.ac.uk/portal/en/people/Paul-Rayson/
Tel: +44 1524 510357
Contact me on Teams<https://teams.microsoft.com/l/chat/0/0?users=p.rayson@lancaster.ac.uk>
CODI, 5th Workshop on Computational Approaches to Discourse
2024-03-21 - EACL 2024 - Malta
** Direct Submission deadline: January 22, 2024 **
Direct submission: we are now accepting submissions of papers rejected at another main conference.
The deadline has been updated to account for the delay in EACL notifications. Note that notifications will be sent on January 25 for direct submissions, and camera-ready will be due on January 30.
Website link: https://sites.google.com/view/codi2024
CODI considers for publication papers rejected at one of the main conferences. Authors must submit both the paper and its reviews, the latter as a supplementary PDF file. If modifications have been made since the original submission, please also submit an additional file briefly describing them. The organizers will decide on the acceptance of the papers based on the quality of the paper and its fit with the workshop.
As a reminder, CODI also invites presentations of papers accepted at another main conference. These will be included in the workshop program and handbook but will not appear in the workshop proceedings.
Please submit your workshop papers (category: "direct submission") at https://softconf.com/eacl2024/CODI-2024/
In this newsletter:
Renew your LDC membership today
New publications:
KASET - Kurmanji and Sorani Kurdish Speech and Transcripts<https://catalog.ldc.upenn.edu/LDC2024S01>
LORELEI Farsi Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2024T01>
________________________________
Renew your LDC membership today
The importance of curated resources for language-related education, research, and technology development drives LDC's mission to create them, to accept data contributions from researchers across the globe, and to broadly share such resources through the LDC Catalog. LDC members enjoy no-cost access to new corpora released annually, as well as the ability to license legacy data sets from among our 950+ holdings at reduced fees. Ensure that your data needs continue to be met by renewing your LDC membership or by joining the Consortium today.
Through March 1, 2024, organizations that were members in 2023 receive a 10% discount on 2024 membership, and new or returning organizations receive a 5% discount. Membership remains the most economical way to access current and past LDC releases. Consult Join LDC<https://www.ldc.upenn.edu/communications/newsletter/january-2022-newsletter> for more details on membership options and benefits.
________________________________
New publications:
KASET - Kurmanji and Sorani Kurdish Speech and Transcripts<https://catalog.ldc.upenn.edu/LDC2024S01> consists of 147 hours of telephone conversations (289 recordings) and broadcast news (410 recordings) in two Kurdish dialects, Kurmanji Kurdish and Sorani Kurdish, along with transcripts covering 60 hours of those recordings. Kurdish is spoken primarily in Turkey, Iran, Iraq, and Syria. Sorani and Kurmanji are the two most widely spoken dialects of the Kurdish language.
The telephone speech was generated from calls by native Kurdish speakers in the United States to North American acquaintances in their social network. The broadcast news audio was collected from multiple streaming radio and television broadcast programs (narrowband and wideband audio), many of which contained a mix of Kurmanji and Sorani Kurdish. Native speaker auditors identified a 5-10 minute span from each broadcast recording for transcription. Full telephone recordings that passed the native speaker audit were transcribed. This release includes speaker information, such as gender, year of birth, and language.
2024 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
LORELEI Farsi Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2024T01> was developed by LDC and comprises approximately 250 million words of Farsi monolingual text, 120,000 Farsi words translated from English data, and 751,000 words of found Farsi-English parallel text. Approximately 75,000 words were annotated for named entities, and up to 22,000 words were annotated for entity discovery and linking and for situation frames (identifying entities, needs, and issues). Data was collected from discussion forums, news, reference material, social networks, and weblogs.
The LORELEI (Low Resource Languages for Emergent Incidents) program was concerned with building human language technology for low resource languages in the context of emergent situations. Representative languages were selected to provide broad typological coverage.
The knowledge base for entity linking annotation is available separately as LORELEI Entity Detection and Linking Knowledge Base (LDC2020T10)<https://catalog.ldc.upenn.edu/LDC2020T10>.
2024 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
*The Third Ukrainian Natural Language Processing Workshop (UNLP 2024)*
<https://unlp.org.ua/>
UNLP 2024 features the first *Shared Task on Fine-Tuning Large Language
Models for Ukrainian*.
This Shared Task aims to challenge and assess LLMs' capabilities to
understand and generate Ukrainian, paving the way for LLM development in
Slavic languages.
*Task Description*
In this shared task, your goal is to instruction-tune a large language
model that can answer questions and perform tasks in Ukrainian. The model
should possess knowledge of Ukrainian history, language, and literature, as
well as common knowledge, and should be capable of generating fluent and
factually accurate responses.
The evaluation will be two-fold: accuracy of answers to multiple-choice
questions and human evaluation on a selection of text generation tasks.
You can find the instructions, sample data, and scripts at
https://github.com/unlp-workshop/unlp-2024-shared-task.
*Registration*
Teams that intend to participate should register by filling in this form
<https://forms.gle/MiC7pWsWbwBdSmoX9>.
*Publication*
Participants in the shared task are invited to submit a paper to the UNLP
2024 <https://unlp.org.ua/call-for-papers/> workshop. Submitting a paper is
*not mandatory* for participating in the Shared Task. Papers must follow the
workshop submission instructions and will undergo regular peer review.
Their acceptance will not depend on the results obtained in the shared
task, but on the quality of the paper. Accepted papers will appear in the
ACL Anthology and will be presented at a session of UNLP 2024 specially
dedicated to the Shared Task.
*Important Dates*
January 15, 2024 — Shared task announcement
February 15, 2024 — Release of test data to registered participants
February 22, 2024 — Registration deadline
February 23, 2024 — Submission of system responses
March 1, 2024 — Shared Task paper due
March 7, 2024 — Results of the Shared Task announced
March 29, 2024 — Notification of acceptance
TBD (mid-April) — Camera-ready Shared Task papers due
May 25, 2024 — Workshop date
*Contact*
Discord for the shared task: https://discord.gg/kCc6xgWbCJ
Email: info(a)unlp.org.ua
Website: https://unlp.org.ua/
Twitter: https://twitter.com/UNLP_workshop
Telegram: https://t.me/UNLP_workshop
Facebook: https://www.facebook.com/UNLPworkshop
TEICAI, Towards Ethical and Inclusive Conversational AI: Language Attitudes, Linguistic Diversity, and Language Rights, at EACL 2024 in Malta, March 17-22, 2024.
Workshop website: https://sites.google.com/view/teicai2024
Submission link: https://softconf.com/eacl2024/TEICAI-2024/
Submission Deadline: 17 January 2024 (anywhere on earth)
Submissions are now being accepted for papers that were previously rejected at a major conference. Authors are required to submit their paper alongside the reviews it received, provided as a supplementary PDF file. In cases where the paper has undergone revisions since its original submission, authors should also include a separate file briefly outlining the changes made. The acceptance of these papers for the workshop will be determined by the organizers, based on the paper's quality and relevance to the workshop's theme.
We are also pleased to announce that our sponsor, the COST Action Language in the Human-Machine Era (LITHME), is offering two to three travel grants for authors of selected accepted papers. More information about LITHME can be found at https://lithme.eu/.
Workshop Organizers:
Sviatlana Höhn, LuxAI, Luxembourg
Nina Hosseini-Kivanani, Faculty of Science, Technology and Medicine (FSTM), University of Luxembourg, Luxembourg
Dimitra Anastasiou, Luxembourg Institute of Science and Technology, Luxembourg
Angela Soltan, State University of Moldova, Moldova
Bettina Migge, University College Dublin, Ireland
Doris Dippold, University of Surrey, UK
Fred Philippy, Zortify, Luxembourg
Ekaterina Kamlovskaya, Translatables
Program Committee:
A list of program committee members is available on the workshop website.
For any preliminary questions, you're welcome to reach out to teicai2024(a)gmail.com.
You can follow us on LinkedIn (TEICAI) and Twitter (teicai2024) to get more updates about the workshop.
On behalf of the organizers
Nina Hosseini-Kivanani
University of Luxembourg
Apologies for cross-posting.
----------------------------------------
*The International Conference on Spoken Language Translation*
*21st IWSLT 2024 – Second Call for Participation*
*August 15-16, 2024 – Bangkok, Thailand*
*http://iwslt.org <http://iwslt.org/>*
The International Conference on Spoken Language Translation (IWSLT) is the
premier annual conference for all aspects of Spoken Language Translation.
Every year, the conference organizes and sponsors open evaluation campaigns
around key challenges in simultaneous and consecutive translation, under
real-time/low latency or offline conditions and under low-resource or
multilingual constraints. Participants present system descriptions and
results, alongside scientific papers on key algorithmic advances and best
practices.
IWSLT is the venue of SIGSLT, the Special Interest Group on Spoken
Language Translation of ACL, ISCA and ELRA. With a track record of 20
years, IWSLT benchmarks and proceedings serve as a reference for all
researchers and practitioners working on speech translation and related
fields.
The 21st edition of IWSLT <https://iwslt.org/2024/> will be run as an
*ELRA/ACL* event and co-located with ACL 2024 <https://2024.aclweb.org/> on
August 15-16, 2024. It will be run as a hybrid event.
Important Dates
January 15, 2024: Release of shared task training and dev data
April 01-15, 2024: Evaluation period
April 29, 2024: Paper submission due (all papers)
June 4, 2024: Notification of acceptance
June 24, 2024: Camera-ready paper due
July 22, 2024: Pre-recorded video due
August 15-16, 2024: Conference
Evaluation
The IWSLT 2024 features shared tasks <https://iwslt.org/2024/#shared-tasks>
that address the following focus areas:
- Speech-to-speech track
- Simultaneous track
- Subtitling track
- Offline track
- Dubbing track
- Low-resource track
- Indic track
Training, development and test data for each shared task will be prepared
and released by the respective organizers (for further information on this
initiative, please refer to the website <https://iwslt.org/2024/>).
Participants will receive instructions on how to submit their runs. In
addition, participants have the opportunity to present their work
through a system paper that will be published in the ACL Proceedings.
Conference
IWSLT also invites submissions of scientific papers to be published in the
ACL Proceedings and presented either in oral or poster format. The
conference selects high-quality, original contributions on theoretical and
practical issues of spoken language translation research, technologies and
applications. For further information on this initiative, please refer to
the website <https://iwslt.org/2024/#paper-submission>.
Contact
Please send an email to iwslt-evaluation-campaign(a)googlegroups.com if you
have any questions related to the shared tasks.
Thanks,
Marine, Marcello, Alex, Jan, Sebastian, Elizabeth, Atul
(IWSLT organisers)
The International Congress of Linguists (ICL) is organized once every five years as the meeting place for international linguistics, where all areas and sub-disciplines of linguistics as well as interdisciplinary topics can be discussed. Its 21st edition (https://icl2024poznan.pl/) will be held from 8 to 14 September 2024 in Poznań and now invites abstracts for Sections, Focus streams, and Workshops.
Call for Abstracts: Corpus Linguistics
Focus stream 8 invites abstracts of papers that examine the methods and applications of corpus linguistics. Topics may include the design and construction of corpora, the analysis and interpretation of corpus data, the use of corpus tools and software, and the implications of corpus findings for various linguistic domains and disciplines. The focus stream also explores the challenges and opportunities of corpus linguistics in the era of big data, artificial intelligence, and natural language processing.
Abstracts should clearly state the research question(s), approach, method, data, and (expected) results. They should not display the names of the presenters, nor their affiliations or addresses, or any other information that could reveal their authorship. They should contain the title, five keywords, and a text between 300 and 400 words (including examples, excluding references).
Each abstract will be reviewed anonymously by two reviewers (section/focus stream/workshop convenor + external reviewer).
Important dates
Feb 1, 2024: (Extended) submission deadline (12.00 PM CET). Submission link: https://easychair.org/conferences/?conf=icl2024poznan
Apr 15, 2024: Notification of acceptance.
Sep 11, 2024: Focus stream date
Presentations and posters
Authors may apply, upon abstract submission, for a presentation or a poster. Presentations will be organized in 30-minute slots (20 min. presentation, 7 min. discussion, 3 min. room change). Each poster will be displayed for one full day, and separate time slots will be included in the program in which participants can discuss with the poster presenters.
Best regards –
Maciej Ogrodniczuk
Convenor of FS8: Corpus Linguistics at ICL 2024