Dear Colleagues,
We are delighted to announce the second CfP for the First Workshop on Evaluation of Multi-Modal Generation @ COLING 2025. Submissions are now open and close on 20th November 2024 (11:59PM AoE, UTC-12).
Website link: https://evalmg.github.io/
The 1st Evaluation of Multi-Modal Generation Workshop @ COLING 2025
----------------------------------------------------------------------------------------------------------
Multimodal generation techniques have opened new avenues for creative content generation. However, evaluating the quality of multimodal generation remains underexplored, and some key questions are unanswered, such as the contribution of each modality, the utility of pre-trained large language models for multimodal generation, and how to measure faithfulness and fairness in multimodal outputs. This workshop aims to foster discussions and research efforts by bringing together researchers and practitioners in natural language processing, computer vision, and multimodal AI. Our goal is to establish evaluation methods for multimodal research and advance research efforts in this direction.
Call for Papers
----------------------
Both long and short papers (up to 8 pages and 4 pages respectively, with unlimited references and appendices) are welcome for submission.
Topics relevant to this workshop include (but are not limited to):
- Evaluation metrics for multimodal text generation for assessing informativeness, factuality and faithfulness
- New benchmark datasets, evaluation protocols and annotations
- Challenges in evaluating multimodal coherence, relevance and contribution of modalities and inter- and intra-interactions
- Assessing information integration and aggregation across multiple modalities
- Adversarial evaluation approaches for testing the robustness and reliability of multimodal generation systems
- Ethical considerations in the evaluation of multimodal text generation, including bias detection and mitigation strategies
- Multilingual multimodal text generation systems for low-resource languages
- Evaluating fairness and privacy in multimodal learning and applications
Important Dates
---------------------------
- Nov 20, 2024: Paper submission due date
- Dec 05, 2024: Notification of acceptance
- Dec 11, 2024: Camera-ready version due
- Jan 20, 2025: Workshop Date
Note: All deadlines are 11:59PM UTC-12:00 (“Anywhere on Earth”)
Submission Instructions
-------------------------------------
You are invited to submit your papers via our START/SoftConf submission portal. All submitted papers must be anonymized for double-blind review. The content of the paper should be no longer than 8 pages for long papers and 4 pages for short papers, strictly following the COLING 2025 templates; the mandatory limitations section does not count towards the page limit. Supplementary material and appendices (either as separate files or appended after the main submission) are allowed. We encourage including code links for reproducibility.
Submission Link: https://softconf.com/coling2025/EvalMG25
Style templates and submission guidelines: https://coling2025.org/calls/submission_guidlines/
Non-archival Option
-------------------------------
To promote discussions within the community, our workshop includes a non-archival track. Authors have the flexibility to submit their unpublished work or papers accepted to the COLING main conference to our workshop. The organisers may offer the opportunity to give an oral or poster presentation.
Organisers
-------------------------------
* Wei Emma Zhang, The University of Adelaide
* Xiang Dai, CSIRO
* Desmond Elliott, University of Copenhagen
* Byron Fang, CSIRO
* Haojie Zhuang, The University of Adelaide
* Mong Yuan Sim, The University of Adelaide & CSIRO
* Weitong Chen, The University of Adelaide
Kind regards,
COLING25 EvalMG Organisers
ComputEL-8: Eighth Workshop on the Use of Computational Methods in the
Study of Endangered Languages
FINAL CALL FOR PAPERS for REGULAR SESSION (and SPECIAL SESSION)
Submission deadline (POSTPONED): October 14, 2024
Submission link: https://easychair.org/conferences/?conf=computel8
REGULAR SESSION
(For details about Special Session, scroll further below.)
We encourage submissions that explore the interface and intersection of
computational linguistics, documentary linguistics, and community-based
efforts in language revitalization and reclamation. This includes
submissions that:
(i) propose or demonstrate new methods or technologies for tasks or
applications focused on low-resource settings, and in particular,
endangered languages
(ii) examine the use of specific methods in the analysis of data from
low-resource languages, or propose new methods for analysis of such
data, oriented toward the goals of language reclamation and revitalization
(iii) propose new models for the collection, management, and
mobilization of language data in community settings, with attention to
e.g. issues of data sovereignty and community protocols
(iv) explore concrete steps for a more fruitful interaction among
computer scientists, documentary linguists, and language communities
IMPORTANT DATES
14-Oct-2024 Deadline for submission of papers or extended abstracts
22-Nov-2024 Notification of Acceptance
10-Jan-2025 Camera-ready papers due
4 & 5 March 2025 Workshop
PRESENTATIONS
Presentation of accepted papers will be in both oral sessions and a
poster session. The decision on whether a presentation for a paper will
be oral and/or poster will be made by the Organizing Committee on the
advice of the Program Committee, taking into account the subject matter
and how the content might be best conveyed. Oral and poster
presentations will not be distinguished in the Proceedings.
SUBMISSIONS
In line with our goal of reaching multiple overlapping communities, we
offer two modes of submission: extended abstract and full paper. The
mode of submission does not influence the likelihood of acceptance.
Either can be submitted to one of the workshop’s tracks: (a) language
community perspective and (b) academic perspective.
Submissions must be uploaded to EasyChair
(https://easychair.org/conferences/?conf=computel8) no later than
October 14, 2024 11:59PM (UTC-12, “anywhere on earth”). Submissions may
be considered for both the regular session and the special session.
All submissions must be anonymous following ACL guidelines and will be
peer-reviewed by the scientific Program Committee.
A. Extended Abstract:
Please submit anonymous abstracts of up to 1500 words, excluding
references. Extended abstracts must be submitted as attached documents.
B. Full Paper:
Please submit anonymously either a) a long paper - max. 8 pages
excluding references and appendices; or b) a short paper - max. 4 pages
excluding references, following the style and formatting guidelines
provided by the ACL style files (download template files for LaTeX or
Microsoft Word: https://github.com/acl-org/acl-style-files).
PROCEEDINGS
The authors of selected accepted full papers (long or short) will be
invited by the Organizing Committee to submit their papers for online
publication via the open-access ACL Anthology. Final versions of long
and short papers will be allotted one additional page (altogether 9 and
5 pages, respectively), excluding references.
Proceedings papers should be revised and improved versions of the work
that was submitted for, and which underwent, review. Any revisions
should concern responses to reviewer comments or the addition of
relevant details and clarifications, but not entirely new, unreviewed
content. Camera-ready versions of the articles for publication will be
due on January 10, 2025.
Please see the ComputEL-8 website for further information:
https://computel-workshop.org/computel-8/
SPECIAL THEME SESSION - BUILDING TOOLS TOGETHER
In addition to the main session, ComputEL-8 invites self-identified
submissions to a special themed session on “Building Tools Together”,
oriented toward amplifying our shared understanding of how best to work
together across disciplinary and cultural boundaries to build
technological tools that support community language revitalization.
We invite presentations that: (1) describe collaborations in the
development of new tools and technologies; and/or (2) describe or
identify technological or computational needs within community language
reclamation contexts, and/or propose solutions.
1. For presentations that describe a collaboration among language
communities, academic researchers, and (in some cases) industry or
non-governmental organizations towards the development of new tools,
resources, and technologies, we encourage submissions which address
questions such as:
a. How did the idea for the tool or technology come about?
b. How did the team members meet and come to work together?
c. What has been the impact of this tool? How are you evaluating it? How
has the project benefitted community efforts at language maintenance
and revitalization?
d. What are some challenges (logistical, technical, interdisciplinary,
intercultural) that you encountered, and how did you address them?
e. How have you balanced the needs and priorities of different team
members through the lifespan of the project?
f. What lessons have you learned that might benefit similar collaborations?
2. For presentations that identify technological or computational needs
within community language reclamation contexts, and/or propose
solutions, we encourage submissions which address questions such as:
a. What is the need that this tool would meet? Who will it serve?
b. What is the blue-sky version of this tool? What is the minimum viable
product version?
c. What kinds of data, digital assets, or media content would be
required to create the tool, and how would they be assembled?
d. What challenges might the team face in the development process?
e. How do you anticipate the collaborative process will best incorporate
diverse areas of expertise, from cultural and community-grounded
knowledge to academic, technical, and production-oriented knowledge?
Please submit anonymous extended abstracts of up to 1500 words,
excluding references.
Submissions representing community-led collaborations are strongly
encouraged.
Submissions must be uploaded to EasyChair
(https://easychair.org/conferences/?conf=computel8) no later than
October 14, 2024 11:59PM (UTC-12, “anywhere on earth”). Submissions may
be considered for both the regular session and the special session.
Notification of acceptance to the Special Session will be sent out by
November 22, 2024.
All authors of papers in the Special Theme Session will be invited to
contribute to a follow-up paper that synthesizes the findings of the
Session.
Please see the ComputEL-8 website for further information:
https://computel-workshop.org/special-theme-session-building-tools-together/
ORGANIZING COMMITTEE
Godfred Agyapong (University of Florida)
Antti Arppe (University of Alberta)
Aditi Chaudhary (Google DeepMind)
Jordan Lachler (University of Alberta)
Sarah Moeller (University of Florida)
Shruti Rijhwani (Google DeepMind)
Daisy Rosenblum (University of British Columbia)
Olivia Waring (University of Hawai'i Mānoa)
CONTACT US
WEB: https://computel-workshop.org/ComputEL-8/
EMAIL: computel.workshop(a)gmail.com
--
======================================================================
Antti Arppe - Ph.D (General Linguistics), M.Sc. (Engineering)
Professor of Quantitative Linguistics
Director, Alberta Language Technology Lab (ALTLab)
Project Director, 21st Century Tools for Indigenous Languages (21C)
Past President, ACL SIG for Endangered Languages (SIGEL)
Department of Linguistics, University of Alberta
E-mail: arppe(a)ualberta.ca, antti.arppe(a)iki.fi
WWW: www.ualberta.ca/~arppe, altlab.artsrn.ualberta.ca
Mānahtu ina rēdûti ihza ummânūti ihannaq - dulum ugulak úmun ingul
----------------------------------------------------------------------
(Apologies for cross-posting)
Version 2.0 of the HPLT Datasets is now published, with web-derived
corpora in 193 languages.
These collections are available under the Creative Commons CC0 license
and bring significant improvements compared to previous releases
(version 1.2). As with 1.2, the release comes in two variants:
de-duplicated (21 TB in size) and cleaned (15 TB in size). The cleaned
variant contains the same documents as the de-duplicated one, minus
those filtered out by our cleaning heuristics. The cleaned variant is
the recommended one, unless you want to try your own cleaning pipelines.
Download the corpora here:
https://hplt-project.org/datasets/v2.0
Similar to the previous releases, the version 2.0 datasets are hosted by
the Sigma2 NIRD Data Lake (https://www.sigma2.no/service/nird-data-lake),
and the text extraction pipeline was run on the LUMI supercomputer
(https://lumi-supercomputer.eu/).
*What's new*
- The size of the source web collections has increased 2.5x: 4.5
petabytes of compressed web data in total, mostly from Internet Archive
(https://archive.org/), but also from Common Crawl
(https://commoncrawl.org/).
- The text extraction pipeline now uses Trafilatura
(https://trafilatura.readthedocs.io/), which results in more effective
boilerplate removal and thus less noise in the data.
- Language identification now uses a refined version of OpenLID
(https://aclanthology.org/2023.acl-short.75/).
- This, in turn, allowed us to publish data in 193 languages, compared
to 75 languages in version 1.2.
- We switched from two-letter ISO 639-1 language codes to three-letter
ISO 639-3 language codes, augmented with a postfix denoting writing
system. For example, `pol_Latn` is Polish written in Latin script.
Mapping from the old to the new codes is available at
https://github.com/hplt-project/warc2text-runner/blob/main/stats/_langs/lan….
- The documents are now annotated with their compliance with the
robots.txt file of the original website. This metadata field can be used
to filter out documents explicitly forbidden for crawling by website
owners, making the resulting corpora somewhat less prone to copyright
violations. The cleaned variant contains only robots.txt-compliant
documents. More details at
https://github.com/hplt-project/monotextor-slurm/blob/main/README.md#robots…
- De-duplication is done at the collection level, not at the dataset level.
- Documents have also been annotated for PII with the
multilingual-pii-tool
(https://github.com/mmanteli/multilingual-PII-tool). Matches are
identified as Unicode character offsets.
- Segment-level language-model-based scores have been replaced by
document quality scores computed with web-docs-scorer.
- Filtering and cleaning criteria have been simplified
(https://github.com/hplt-project/monotextor-slurm?tab=readme-ov-file#filters).
HPLT Monolingual Datasets version 2.0 (the de-duplicated variant)
feature about 7.6 trillion whitespace-separated words and about 52
trillion characters extracted from 21 billion documents, compared to 5.6
trillion words and 42 trillion characters extracted from 5 billion
documents in version 1.2. All in all, you can expect less noise and
boilerplate, fewer duplicates, more unique documents, and generally
better-quality texts for training language models and for other NLP tasks.
*How was this dataset produced*
You may want to read section 3 of our deliverable, HPLT pipelines and
tools
(https://hplt-project.org/HPLT_D7_2___HPLT_pipelines_and_tools.pdf), for
a full description of how we produced this dataset. If you
don't have much time for reading, this chart may be enough for your
purposes:
https://hplt-project.org/_next/static/media/dataset-pipeline-light.c2521ee1…
Each language is accompanied by an HPLT Analytics report. These
automated reports provide useful information and statistics about the
clean version of the HPLT v.2.0 datasets. They are the result of running
the HPLT Analytics Tool
(https://github.com/hplt-project/data-analytics-tool) on them. They are
helpful for inspecting the datasets even before downloading them.
*What is HPLT?*
HPLT (High Performance Language Technologies) is an EU Horizon
Europe-funded project that aims to collect large quantities of data in
many languages and to train powerful and efficient language and
translation models. An important feature of HPLT is openness and
transparency: all artifacts of the project are publicly available under
permissive licenses.
https://hplt-project.org/
--
Andrey
Language Technology Group (LTG)
University of Oslo
Neural language models have revolutionised natural language processing (NLP) and have provided state-of-the-art results for many tasks. However, their effectiveness is largely dependent on the pre-training resources. Therefore, language models (LMs) often struggle with low-resource languages in both training and evaluation. Recently, there has been a growing trend in developing and adopting LMs for low-resource languages. LoResLM aims to provide a forum for researchers to share and discuss their ongoing work on LMs for low-resource languages.
>> Topics
LoResLM 2025 invites submissions on a broad range of topics related to the development and evaluation of neural language models for low-resource languages, including but not limited to the following.
* Building language models for low-resource languages.
* Adapting/extending existing language models/large language models for low-resource languages.
* Corpora creation and curation technologies for training language models/large language models for low-resource languages.
* Benchmarks to evaluate language models/large language models in low-resource languages.
* Prompting/in-context learning strategies for low-resource languages with large language models.
* Review of available corpora to train/fine-tune language models/large language models for low-resource languages.
* Multilingual/cross-lingual language models/large language models for low-resource languages.
* Applications of language models/large language models for low-resource languages (e.g. machine translation, chatbots, content moderation, etc.).
>> Important Dates
* Paper submission due – 5th November 2024
* Notification of acceptance – 25th November 2024
* Camera-ready due – 13th December 2024
* LoResLM 2025 workshop – 19th/20th January 2025, co-located with COLING 2025
>> Submission Guidelines
We follow the COLING 2025 standards for submission format and guidelines. LoResLM 2025 invites the submission of long papers of up to eight pages and short papers of up to four pages. These page limits only apply to the main body of the paper. At the end of the paper (after the conclusions but before the references), papers need to include a mandatory section discussing the limitations of the work and, optionally, a section discussing ethical considerations. Papers can include unlimited pages of references and an unlimited appendix.
To prepare your submission, please make sure to use the COLING 2025 style files available here:
* LaTeX - https://coling2025.org/downloads/coling-2025.zip
* Word - https://coling2025.org/downloads/coling-2025.docx
* Overleaf - https://www.overleaf.com/latex/templates/instructions-for-coling-2025-proce…
Papers should be submitted through Softconf/START using the following link: https://softconf.com/coling2025/LoResLM25/
>> Organising Committee
* Hansi Hettiarachchi, Lancaster University, UK
* Tharindu Ranasinghe, Lancaster University, UK
* Paul Rayson, Lancaster University, UK
* Ruslan Mitkov, Lancaster University, UK
* Mohamed Gaber, Birmingham City University, UK
* Damith Premasiri, Lancaster University, UK
* Fiona Anting Tan, National University of Singapore, Singapore
* Lasitha Uyangodage, University of Münster, Germany
>> Programme Committee
* Burcu Can - University of Stirling, UK
* Çağrı Çöltekin - University of Tübingen, Germany
* Debashish Das - Birmingham City University, UK
* Alphaeus Dmonte - George Mason University, USA
* Daan van Esch - Google
* Ignatius Ezeani - Lancaster University, UK
* Anna Furtado - University of Galway, Ireland
* Amal Htait - Aston University, UK
* Ali Hürriyetoğlu - Wageningen University & Research, Netherlands
* Diptesh Kanojia - University of Surrey, UK
* Jean Maillard - Meta
* Maite Melero - Barcelona Supercomputing Centre, Spain
* Muhidin Mohamed - Aston University, UK
* Nadeesha Pathirana - Aston University, UK
* Alistair Plum - University of Luxembourg, Luxembourg
* Sandaru Seneviratne - Australian National University, Australia
* Ravi Shekhar - University of Essex, UK
* Taro Watanabe - Nara Institute of Science and Technology, Japan
* Phil Weber - Aston University, UK
URL - https://loreslm.github.io/
Twitter - https://x.com/LoResLM2025
Best Regards
Tharindu Ranasinghe
[apologies if you receive multiple copies of this call]
Dear colleagues and friends,
*We are pleased to release the 1st Call for Participation to the
LLMs4Subjects Shared Task organized as part of SemEval 2025.*
*Overview:* As the first of its kind, LLMs4Subjects invites the research
community to *develop cutting-edge LLM-based semantic solutions for the
subject tagging of the Leibniz University's Technical Library's open-access
collection*. The shared task provides an opportunity for the research
community to creatively utilize LLMs for subject tagging of technical
records. *Systems need to demonstrate bilingual language modeling in
understanding technical documents in both German and English.* Moreover,
successful solutions may be directly integrated into the operational
workflows of the TIB Leibniz Information Centre for Science and Technology
University Library.
*What we provide to participants:* a human-readable form of the subjects
taxonomy (this is the *GND* or *Gemeinsame Normdatei*, the integrated
authority file used for cataloging in German-speaking countries) and a
large collection of technical records tagged with these subjects from the
TIB's open-access collection called *TIBKAT*.
More details on the task website:
https://sites.google.com/view/llms4subjects/
*LLMs4Subjects defines the following three tasks:*
- Task 1: Learn the GND
- Task 2: Align subject tagging to the TIBKAT collection
- (Optional and Fun) Task 3: Develop Elegant Frontend Interfaces for
Subject Tagging
*LLMs4Subjects will have three separate evaluations:*
- Evaluation 1: Quantitative Metrics-based Evaluations
- Evaluation 2: Qualitative Evaluations by the Human Subject Specialists
- (Optional) Evaluation 3: HCI evaluations for subject indexing interfaces
submitted
*To participate in the LLMs4Subjects shared task,*
1. please submit your interest to participate using our online form (
https://forms.gle/YQzupcoySAyJi45c6),
2. sign up to the shared task Google Groups (
https://groups.google.com/u/6/g/llms4subjects) for FAQs, news, and
announcements, and,
3. last but not least, download the datasets (
https://github.com/jd-coderepos/llms4subjects/) to begin development.
*Dates*
Training and validation datasets available: October 2, 2024
Test data available/Evaluation starts: January 10, 2025
Evaluation ends: January 31, 2025
Participant paper submissions due: February 28, 2025
Notification to authors: March 31, 2025
Camera ready due: April 21, 2025
SemEval workshop: TBD
*Task Organizers*
Jennifer D'Souza, Sameer Sadruddin, Holger Israel, Mathias Begoin et al.
All organizers are affiliated with the TIB Leibniz Information Centre for
Science and Technology - Germany (https://www.tib.eu/en/)
*We look forward to having you on board!*
*Contact: * llms4subjects [at] gmail.com
Dear all,
This is the first announcement for the eLex 2025 conference on electronic lexicography, so please mark your calendars.
Dates: 18-20 November 2025
Location: Bled, Slovenia
Website: https://elex.link/elex2025/
Organiser: Centre for Language Resources and Technologies, University of Ljubljana
The theme of next year’s conference will be “Intelligent lexicography”. More information on the theme and the first call for papers will be published soon. We are also accepting proposals for pre- or postconference workshops.
Looking forward to seeing you in Slovenia!
Iztok Kosem
On behalf of the organising committee
-----------------------------------------------------
Workshop on Multilingual Counterspeech Generation
-----------------------------------------------------
Background and Scope
---------------------
While interest in automatic approaches to Counterspeech generation has been steadily growing,
including studies on data curation (Chung et al., 2019a; Fanton et al., 2021), detection (Chung
et al., 2021a; Mathew et al., 2018), and generation (Tekiroglu et al., 2020; Chung et al., 2021b;
Zhu and Bhat, 2021; Tekiroglu et al., 2022), the large majority of the published experimental work on automatic Counterspeech generation has been carried out for English. This is due both to the scarcity of non-English manually curated training data and to the crushing predominance of English in the generative Large Language Models (LLMs) ecosystem. A workshop on Multilingual Counterspeech Generation is therefore proposed to promote and encourage research on multilingual approaches to this challenging topic.
Thus, this workshop aims to test monolingual and multilingual LLMs in particular and Language Technology in general to automatically generate counterspeech not only in English but also in languages with fewer resources. In this sense, an important goal of the workshop will be to understand the impact of using LLMs, considering for example how to deal with pressing issues such as biases, hallucinated content, data scarcity or data contamination.
We seek to maximize the scientific and social impact of this workshop by promoting the
creation of a community of researchers from diverse fields, such as computer and social sciences, as well as policy makers and other stakeholders interested in automatic counterspeech generation. By doing so we aim to gain a deeper understanding of how counterspeech is currently used to tackle abuse by individuals, activists, and organizations
and how Natural Language Processing (NLP) and Generation (NLG) may be best applied to counteract it.
Call for Papers
---------------------
We welcome submissions on the following topics (but not limited to):
- Models and methods for generating counterspeech in different languages.
- Automatic Counterspeech generation for low resource languages with scarce training data.
- Dialogue agents that use counterspeech to combat offensive messages that are directed to individuals or groups, targeted based on various aspects such as ideology, gender, sexual orientation and religion.
- Methods for human and automatic evaluation of counterspeech.
- Multidisciplinary studies providing different perspectives on the topic such as computer science, social science, psychology, etc.
- Development of taxonomies and quality datasets for counterspeech in multiple languages.
- Potentials and limitations (e.g., fairness, biases, hallucinated content) of applying different NLP methods, such as LLMs, to generate counterspeech.
- Social impact and empirical studies of counterspeech in social networks, including research on the effectiveness and consequences for users of using counterspeech to combat hate online.
Submission
---------------------
We welcome two types of papers: regular workshop papers and non-archival submissions. Regular workshop papers will be included in the workshop proceedings. All submissions must be in PDF format and made through START [https://softconf.com/coling2025/MCG25/]
- Regular workshop papers: Authors can submit papers up to 8 pages, with unlimited pages for references. Authors may submit up to 100 MB of supplementary materials separately, as well as their code for reproducibility. All submissions undergo a double-blind, single-track review. Accepted papers will be presented as posters, with the possibility of oral presentations.
- Non-archival submissions: Cross-submissions are welcome. Accepted papers will be presented at the workshop, but will not be included in the workshop proceedings. Papers must be in PDF format and will be reviewed in a double-blind fashion by workshop reviewers. We also welcome extended abstracts (up to 2 pages) of papers that are work in progress, under review or to be submitted to other venues. Papers in this category need to follow the COLING format.
Important Dates
---------------------
- Submission: November 20th, 2024
- Notification of Acceptance: December 2nd, 2024
- Camera-Ready Papers Due: December 10th, 2024
-----------------------------------------------------
Shared Task on Multilingual Counterspeech Generation
-----------------------------------------------------
*TRAIN DATA RELEASED!*
In addition to paper contributions, we are organizing a shared task on multilingual counterspeech generation with the aim of sharing in a central space current efforts, especially those for languages different to English.
It is envisaged that the shared task would allow the community to study how we can improve counterspeech generation for both lower resource languages but also to reinforce the strong body of research already existing for English.
The counterspeech generated by participants should be respectful, non-offensive, and contain information that is specific and truthful with respect to the following targets: Jews, LGBT+ people, immigrants, people of color, and women.
Data
---------------------
We release new data consisting of 596 Hate Speech-Counter Narrative (HS-CN) pairs. In this dataset, the HS are taken from MTCONAN [https://github.com/marcoguerini/CONAN/tree/master/Multitarget-CONAN], while the CNs are newly generated. Together with each HS-CN pair, we also provide 5 background knowledge sentences, some of which are relevant for obtaining the Counter Narratives. The dataset is available in 4 languages (Basque, English, Italian and Spanish) and divided into the following splits:
- Development: 100 pairs. [AVAILABLE NOW!] [https://huggingface.co/datasets/LanD-FBK/ML_MTCONAN_KN]
- Train: 396 pairs [AVAILABLE NOW!] [https://huggingface.co/datasets/LanD-FBK/ML_MTCONAN_KN]
- Test: 100 pairs [AVAILABLE ON 21st OCTOBER]
In order to score participants' submissions, the CNs will be kept hidden during the shared task, while the HS and the background knowledge will be released for participants to prepare their submissions.
The languages (Basque, English, Italian and Spanish) offer a varied spectrum of complexity, including an agglutinative language isolate (Basque), two Romance languages (Italian, Spanish) and a Germanic one (English). The choice of languages reflects the linguistic expertise among the organizers required to successfully run the shared task.
Additional Multilingual Data
---------------------
Participants also have available the English, Italian, Basque and Spanish CONAN manually curated data:
- CONAN (English and Italian) [https://github.com/marcoguerini/CONAN/tree/master/CONAN]
- CONAN (Basque and Spanish) [https://huggingface.co/datasets/HiTZ/CONAN-EUS]
- CONAN-MT-SP (Spanish) [https://github.com/sinai-uja/CONAN-MT-SP] or [https://huggingface.co/datasets/SINAI/CONAN-MT-SP]
- Multitarget-CONAN (English) [https://github.com/marcoguerini/CONAN/tree/master/Multitarget-CONAN]
Participation
---------------------
The aim of the shared task is, given a HS (and optionally any additional knowledge the participants may like to use), generate a CN to counteract the HS.
Examples of CNs appear below:
- English:
+ HS: We should ban all homosexuals.
+ CN: When will the love prosper and the hatred start to dissipate? I will not only respect my fellow LGBT+ people, I will promote their rights.
- Italian:
+ HS: Dovremmo espellere tutti gli omosessuali.
+ CN: Quand'è che l'amore prospererà e l'odio inizierà a dissiparsi? Non solo rispetterò i miei compagni LGBT+, ma promuoverò i loro diritti.
- Spanish:
+ HS: Deberíamos prohibir a todos los homosexuales.
+ CN: ¿Cuándo prosperará el amor y empezará a disiparse el odio? No sólo respetaré a mis compañeros LGBT+, sino que promoveré sus derechos.
- Basque:
+ HS: Homosexual guztiak debekatu beharko genituzke.
+ CN: Noiz hasiko da maitasuna irabazten eta gorrotoa desagertzen? LGBT+ pertsonak errespetatzeaz gain, haien eskubideak sustatuko ditut.
Participants will download the test HS for the 4 languages and generate at most three different CNs per HS for each language. The test window will last 5 days.
Participants are allowed to use any resource (language model, data, etc.) to generate the CN.
**Note:** If you are going to participate in the shared task, please fill in the following form: [https://docs.google.com/forms/d/e/1FAIpQLSeAZTJsrEXt35HfFFchPNdPi289q5kKerq…]
Evaluation
---------------------
The CNs submitted by the participants will be evaluated as follows:
- Using traditional automatic metrics as in Tekiroğlu et al. (2022), which include BLEU, ROUGE, Novelty and Repetition Rate.
- Using an LLM-as-a-judge approach, following the method described in this paper: https://arxiv.org/abs/2406.15227
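For intuition, the corpus-level Novelty and Repetition Rate metrics can be sketched with plain n-gram statistics. This is a simplified illustration under our own assumptions (whitespace tokenization, a single n-gram order); the official evaluation follows the definitions in Tekiroğlu et al. (2022), which may differ in detail:

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def novelty(generated, training_corpus, n=4):
    """Fraction of generated n-grams never seen in the training corpus
    (simplified: whitespace tokens, a single n-gram order)."""
    train = set()
    for sent in training_corpus:
        train.update(ngrams(sent.split(), n))
    gen = ngrams(generated.split(), n)
    if not gen:
        return 0.0
    return sum(g not in train for g in gen) / len(gen)

def repetition_rate(generated, n=2):
    """Share of n-grams in the generated text that are repeats of an
    n-gram already produced in the same text."""
    gen = ngrams(generated.split(), n)
    if not gen:
        return 0.0
    return 1 - len(set(gen)) / len(gen)
```

Intuitively, higher Novelty indicates the system is not copying training counter-narratives verbatim, while a high Repetition Rate flags degenerate, repetitive generations.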
Important Dates
---------------------
- Test dataset release: October 21st, 2024
- Results submission: October 25th, 2024
- Results notification: November 10th, 2024
- Working papers submission: November 20th, 2024
- Notification of Acceptance: December 3rd, 2024
- Camera-Ready Papers Due: December 10th, 2024
- Workshop: January 19th, 2025
For more information, you can join the Google group [https://groups.google.com/g/multilingual-cs-generation-coling2025] or visit our website [https://sites.google.com/view/multilang-counterspeech-gen/home] [https://sites.google.com/view/multilang-counterspeech-gen/shared-task]
Best regards,
The Multilingual Counterspeech Generation Workshop Organizers.
*** Apologies for cross-postings ***
*Researcher positions in Speech and Natural Language Processing (Junior &
Senior Positions) @ Vicomtech, San Sebastian/Bilbao, Spain*
Vicomtech (https://www.vicomtech.org/en/), an international applied
research centre specialised in Artificial Intelligence, Visual Computing
and Interaction located in Spain, has several research positions in the
field of speech and natural language processing.
We are seeking talented and motivated individuals to join our dynamic
Speech and Natural Language Technologies team in either our Donostia - San
Sebastián or Bilbao premises. If you have experience in speech and/or
natural language processing technologies and are passionate about applying
cutting-edge research to solve real-world needs through advanced
prototypes, this opportunity is for you!
Whether you are a junior researcher (BSc/MSc graduate) looking to kickstart
your career or a senior researcher (PhD graduate) eager to take on research
leadership roles, we are interested in your profile. We offer the perfect
environment with outstanding equipment and the best human team for growth.
You will participate in advanced research and development projects, with
opportunities to manage high-profile projects and/or lead technical teams
depending on your experience.
*Key Responsibilities: *
- Conduct cutting-edge research in Speech and Natural Language
Processing (NLP) technologies such as automatic speech recognition and
synthesis, audio deep fake detection, information extraction, machine
translation, text simplification and dialogue systems, among others.
- Contribute to national and international research projects.
- Develop advanced prototypes that transfer technology to businesses and
institutions.
- Manage or lead research projects, depending on experience.
*Requirements: *
- Bachelor’s or Master’s degree in Computer Science, Telecommunications
Engineering or related fields.
- For senior profiles, a PhD in Speech Processing, NLP, AI or related
disciplines is preferred. A PhD is not required for junior candidates.
- Strong programming skills (Python, Bash).
- Fluency in both spoken and written Spanish and English.
*Preferred Skills (Not Required but Valued): *
- Experience with speech and natural language processing tools and
libraries (e.g. Kaldi, Whisper, Marian NMT, HuggingFace Transformers, Rasa,
etc.). Deep learning frameworks (Pytorch, Tensorflow, ONNX).
- Virtualization technologies (Docker, Kubernetes).
- Experience in industrial and/or European research projects.
*What We Offer: *
- A vibrant, innovative research environment with state-of-the-art AI,
Visual Computing, and Interaction technologies.
- Exciting national and international research projects. A
multidisciplinary and renowned team in Speech and Language Technologies.
- Creative freedom in research, aligned with the centre’s goals.
- Opportunities for personal development through continuous learning.
- Clear career progression paths and leadership opportunities.
- Work-life balance policies and a commitment to equal employment
opportunities.
If you are passionate about research and eager to apply or develop your
expertise to real-world challenges, we encourage you to send us your CV and
join our forward-thinking team!
To apply via LinkedIn: https://www.linkedin.com/jobs/view/4034768411
--
[image: Vicomtech] <https://www.vicomtech.org>
Dr. Arantza del Pozo Echezarreta
Director of Speech and Natural Language Technologies
adelpozo(a)vicomtech.org
+(34) 943 30 92 30
+(34) 619 910 422
TL;DR
Mu-SHROOM<https://helsinki-nlp.github.io/shroom/> is a non-English-centric SemEval-2025 shared task to advance the SOTA in hallucination detection for content generated with LLMs. We’ve annotated hallucinated content in 10 different languages from top-tier LLMs. Participate in as many languages as you’d like by accurately identifying spans of hallucinated content. Stay informed by joining our Google group<https://groups.google.com/g/semeval-2025-task-3-mu-shroom> or our Slack<https://join.slack.com/t/shroom-shared-task/shared_invite/zt-2mmn4i8h2-HvRB…>, or follow our Twitter account<https://x.com/mushroomtask>!
Full Invitation
We are excited to announce the Mu-SHROOM shared task on hallucination detection (link to website<https://helsinki-nlp.github.io/shroom/>). We invite participants to detect hallucination spans in the outputs of instruction-tuned LLMs in a multilingual context.
About
This shared task builds upon our previous iteration, SHROOM<https://helsinki-nlp.github.io/shroom/2024>, with three key improvements: an LLM-centered setup, multilingual annotations, and hallucination-span prediction.
LLMs frequently produce "hallucinations": plausible but incorrect outputs. Existing metrics prioritize fluency over correctness, an issue of growing concern as these models are increasingly adopted by the public.
With Mu-SHROOM, we want to advance the state-of-the-art in detecting hallucinated content. This new iteration of the shared task is held in a multilingual and multi-model context: we provide data produced by a variety of open-weights LLMs in 10 different languages (Modern Standard Arabic, Mandarin Chinese, English, Finnish, French, German, Hindi, Italian, Spanish, and Swedish).
Participants may take part in any of the available languages and are expected to develop systems that accurately identify spans of hallucinated content.
As is usual with SemEval shared tasks, participants will be invited to submit system description papers, with the option to present them in poster format during the next SemEval workshop (co-located with an upcoming *ACL conference). Participants who elect to write a system description paper will be asked to review their peers’ submissions (max 2 papers per author).
Key Dates:
All deadlines are “anywhere on Earth” (23:59 UTC-12).
* Dev set available by: 02.09.2024
* Test set available by: 01.01.2025
* Evaluation phase ends: 31.01.2025
* System description papers due: 28.02.2025 (TBC)
* Notification of acceptance: 31.03.2025 (TBC)
* Camera-ready due: 21.04.2025 (TBC)
* SemEval workshop: Summer 2025 (co-located with an upcoming *ACL conference)
Evaluation Metrics:
Participants will be ranked along two (character-level) metrics:
1. intersection-over-union of characters marked as hallucinations in the gold reference vs. predicted as such
2. how well the probability assigned by the participants' system that a character is part of a hallucination correlates with the empirical probabilities observed in our annotations.
Rankings and submissions will be done separately per language: you are welcome to focus only on the languages you are interested in!
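For intuition, the first metric (character-level intersection-over-union) can be sketched as below. This is an illustrative reimplementation, not the official scorer; the (start, end) span convention with exclusive end offsets is our own assumption:

```python
def char_iou(gold_spans, pred_spans):
    """Intersection-over-union over the sets of character positions
    covered by gold vs. predicted hallucination spans.

    Spans are (start, end) character offsets, end exclusive."""
    gold = {i for start, end in gold_spans for i in range(start, end)}
    pred = {i for start, end in pred_spans for i in range(start, end)}
    union = gold | pred
    if not union:
        # Convention chosen here: if neither side marks any characters,
        # the systems agree perfectly.
        return 1.0
    return len(gold & pred) / len(union)
```

For example, a predicted span (3, 8) against a gold span (0, 5) overlaps on 2 of the 8 characters in the union, giving an IoU of 0.25.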
How to Participate:
* Register: Please register your team before making a submission on https://mushroomeval.pythonanywhere.com
* Submit results: use our platform to submit your results before 31.01.2025
* Submit your system description: system description papers should be submitted by 28.02.2025 (TBC, further details will be announced at a later date).
Want to be kept in the loop?
Join our Google group mailing list<https://groups.google.com/g/semeval-2025-task-3-mu-shroom> or the shared task Slack<https://join.slack.com/t/shroom-shared-task/shared_invite/zt-2mmn4i8h2-HvRB…>! You can also follow us on Twitter<https://x.com/mushroomtask>. We look forward to your participation and to the exciting research that will emerge from this task.
Best regards,
Raúl Vázquez and Timothee Mickus
On behalf of all the Mu-SHROOM organizers
We invite submissions to the *1st Workshop on Ecology, Environment, and
Natural Language Processing <https://econlpws2025.di.unito.it/>*. This
workshop will bring together the NLP community and stakeholders from
various disciplines to explore how computational linguistics and NLP tools,
methods, and applications can help address pressing climate change and
environment-related challenges. We are particularly interested in
contributions that push the boundaries of linguistics and NLP research in
the context of ecological and environmental crises and that foster
interdisciplinary collaboration.
The *topics of interests* include, but are not limited to:
*Sentiment Analysis of Environmental Topics*:
Evaluating public opinions on environmental issues across platforms such as
social media, news outlets, and other media (e.g., Bosco et al., 2023
<https://ceur-ws.org/Vol-3596/paper11.pdf>; Ibrohimelmustafaelmustafa et
al., 2023 <https://dl.acm.org/doi/pdf/10.1145/3604605>).
*Automated Linguistic Analysis*:
Studying grammatical, syntactical and lexical patterns from an
ecolinguistic perspective (e.g., Widanti, 2022
<http://influence-journal.com/index.php/influence/article/view/18>),
including analyses of corporate environmental reports and other
institutional communications (e.g., Gong, 2019
<https://helda.helsinki.fi/server/api/core/bitstreams/5a38650d-71c2-4a62-a33…>
).
*Detection of Anthropocentric and Speciesist Biases*:
Identifying harmful biases in language and NLP applications, and developing
methods to mitigate them (e.g., Leach et al., 2021
<https://bpspsychub.onlinelibrary.wiley.com/doi/pdfdirect/10.1111/bjso.12561>;
Takeshita et al., 2022
<https://www.sciencedirect.com/science/article/pii/S0306457322001558?casa_to…>
).
*Topic Modeling & Discourse/Frame Analysis*:
Investigating how environmental issues are framed in media and political
discourse and how these frames influence public perception and policymaking
(e.g., Dehler-Holland et al., 2021
<https://www.sciencedirect.com/science/article/pii/S2666389920302336>).
*Geo-tagging and sentiment mapping of environmental discussions*:
Mapping environmental discussions and sentiments across geographical
locations (e.g., Yao & Wang, 2020).
*Ecofeminism, environmental justice, and language*:
Exploring the intersections of gender, justice, and ecological narratives,
and how NLP can help analyze language in these contexts.
*Text Classification in Environmental Contexts*:
Categorizing texts into specific environmental subfields such as
biodiversity, climate change, and conservation, and using NLP to monitor
compliance with environmental regulations (e.g., Schimanski et al., 2023
<https://aclanthology.org/2023.emnlp-main.975.pdf>; Grasso & Locci, 2024
<https://link.springer.com/chapter/10.1007/978-3-031-70242-6_29>).
*Entity Recognition, Relation Extraction, and Environmental Monitoring*:
Identifying and tracking mentions of species, habitats, pollutants, and
ecological phenomena in text (e.g., Abdelmageed et al., 2022).
*Fact-checking & Greenwashing Detection*:
Analyzing corporate sustainability reports for accuracy and detecting
greenwashing practices (e.g., Moodaley & Telukdarie, 2023
<https://www.mdpi.com/2071-1050/15/2/1481>; Cojoianu et al., 2020
<https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3627157>).
Further topics include:
- Ecolinguistic applications of NLP.
- Application of Large Language Models (LLMs) in the climate change and
environmental domain.
- Analyzing Social Media for Harmful Environmental Narratives.
- Corpora creation and annotation.
- Fairness and ethics in environmental data analysis.
- Environmental communication in low-resource languages.
- Multimodal analysis for ecological and environmental challenges.
- Lexical analysis in the context of sustainability and environmental
discourse.
- Linked Data and Knowledge Graphs on ecological topics.
- Language diversity and inclusion in environmental narratives.
- Cognitive models and ecological narratives.
- NLP for understanding indigenous knowledge in environmental contexts.
- Machine learning techniques for analyzing environmental communication.
- NLP for tracking environmental legislation and policy discourse.
- NLP for analyzing environmental education and awareness campaigns.
- Speech recognition technologies to support ecological field research.
- Development of educational chatbots or FAQs for raising environmental
awareness.
Key Dates
- *Paper Submission Deadline*: December 16, 2024
- *Notification of Acceptance*: TBA
- *Camera-Ready Deadline*: February 3, 2025
- *Workshop Date*: March 2, 2025
*Submission Instructions:*
The workshop will accept *archival* submissions, *non-archival*
submissions, as well as *research communications*. *Non-archival
submissions* refer to new work that will not appear in the proceedings,
while *research communications* consist of work already published at other
venues (e.g., conferences, journals) that can be presented at the workshop
but will not be included in the proceedings.
Submissions should follow the *NoDaLiDa/Baltic-HLT 2025
<https://www.nodalida-bhlt2025.eu/call-for-papers> *formatting templates
and guidelines. We invite paper submissions of three types:
- Regular papers (up to 8 pages)
- Short papers (up to 4 pages)
- Demo papers (up to 4 pages)
For all three submission types, these page limits do not include additional
pages with bibliographic references. We do not allow any extra pages for
appendices.
*Submission and reviewing* will be conducted through *OpenReview* (link to
submission TBA)
All submissions will undergo *double-blind peer review*, adhering to
professional standards.