Dear Colleagues,
We invite you to submit your research and perspectives to ALL 4 Health 2024
– The First Workshop on Applying LLMs in LMICs for Healthcare Solutions.
Submission Deadline: March 1st, 2024
Website: https://www.nivi.io/all4health
Contact: all-4-health(a)googlegroups.com
ALL 4 Health will be held at the University of Florida on June 3rd in
conjunction with the IEEE International Conference on Healthcare Informatics
(ICHI 2024, <https://ieeeichi2024.github.io/>). There has been substantial and growing
interest and funding from the development sector in applying Large Language
Model (LLM) technologies in Low- and Middle-Income Countries (LMICs) to
address healthcare and other social good challenges.[1] Simultaneously,
the software industry and NLP researchers have acknowledged that
state-of-the-art LLMs are heavily influenced by Western / developed-world
data and have significant capability gaps between high- and low-resource
languages.[2,3,4] Additional research and collaboration are required to
bridge this gap.
The goal of this workshop is to bring together researchers and
practitioners from diverse disciplinary backgrounds to discuss challenges
and opportunities for applying LLMs for health applications in low-resource
settings, and to share findings on gaps, pitfalls, best practices, and
opportunities for impact.
We invite novel approaches, works in progress, comparative analyses of
tools, and advancing state-of-the-art work relevant to applying LLMs for
health applications in low-resource languages and settings. Specific topics
of interest include, but are not limited to:
* Evaluations of LLMs in contexts with substantial code-switching
* Comparisons of LLM accuracy/suitability between high- and low-resource
languages
* Approaches to localizing the health information processing of LLMs in the
context of the laws, culture, service availability, and public health
realities in specific LMICs
* Data sources for training or tuning LLMs for use on low-resource
languages or in LMIC contexts
* Studies demonstrating the health or health knowledge impact of LLM
applications in low-resource language and/or LMIC contexts
* Equity- and Diversity-based evaluations of LLM performance on health
domain tasks
* Evidence-based position papers on best practices
We will accept full papers (4-6 pages, including references) and abstracts
(2 pages, including references). Full papers will be eligible for a Best
Paper Award with a $300 (USD) prize sponsored by MSD for Mothers
<https://www.msdformothers.com/>.
Please see https://www.nivi.io/all4health for further information including
submission instructions.
Best wishes,
The ALL 4 Health organizing committee
all-4-health(a)googlegroups.com
https://www.nivi.io/all4health
References:
1. R. Shrivastava. "Gates Foundation Funds Nearly 50 Generative AI Projects
In Low And Middle Income Countries." Forbes, 10 August 2023,
https://www.forbes.com/sites/rashishrivastava/2023/08/10/gates-foundation-f…
2. Viet Dac Lai, et al. "ChatGPT beyond English: Towards a comprehensive
evaluation of large language models in multilingual learning." arXiv
preprint arXiv:2304.05613 (2023). <https://arxiv.org/abs/2304.05613>
3. J. Dodge, et al. "Documenting large webtext corpora: A case study on the
colossal clean crawled corpus." arXiv preprint arXiv:2104.08758 (2021).
<https://arxiv.org/abs/2104.08758>
4. N.R. Robertson, et al. "ChatGPT MT: Competitive for High- (but not Low-)
Resource Languages." arXiv preprint arXiv:2309.07423 (2023).
<https://arxiv.org/abs/2309.07423>
Second call for papers for the LREC-COLING2024 pre-conference workshop:
Holocaust Testimonies as Language Resources
Date: 21 May 2024 (full day)
Venue: Lingotto Conference Centre, Turin, Italy
Webpage: https://www.clarin.eu/HTRes2024
Submission Deadline: 21 February 2024
Submission Portal: https://softconf.com/lrec-coling2024/htres2024/
*Workshop description*
Holocaust testimonies serve as a bridge between survivors and history’s
darkest chapters, providing a connection to the profound experiences of
the past. Testimonies stand as the primary source of information that
describes the Holocaust, offering first-hand accounts and personal
narratives of those who experienced it. The majority of testimonies are
captured in an oral format, as survivors vividly explain and share their
personal experiences and observations from that time period.
Transforming Holocaust testimonies into a machine-processable digital
format can be a difficult task owing to the unstructured nature of the
text. The creation of accessible, comprehensive, and well-annotated
Holocaust testimony collections is of paramount importance to our
society. These collections empower researchers and historians to
validate the accuracy of socially and historically significant
information, enabling them to share critical insights and trends derived
from these data. This workshop will investigate a number of ways in
which techniques and tools from natural language processing and corpus
linguistics can contribute to the exploration, analysis, dissemination
and preservation of Holocaust testimonies.
The workshop is supported by CLARIN and the European Holocaust Research
Infrastructure (EHRI).
We expect contributions related to the following topics:
* Creation of datasets and development of tools for the study of
  Holocaust testimonies:
  o Creation of language corpora of Holocaust testimonies
  o Digitization and enhancement of oral and written testimonies
    (including automatic speech recognition, alignment of text and
    speech, format conversion, OCR, handwriting recognition, machine
    translation)
  o Named entity recognition for identifying people, places, and events
    in testimonies
  o Standards, representation formats, and guidelines for annotations
    and vocabularies relevant to the Holocaust testimonies
  o Creation, adaptation and tuning of software applications for the
    creation, annotation, enhancement and use of Holocaust testimonies
    as language resources
* Research using Holocaust testimonies:
  o Applications of NLP in analysing Holocaust survivor testimonies
  o Sentiment analysis and emotional content extraction from
    survivor narratives
* Data Visualisation, Knowledge Representation and Information
  Extraction:
  o Visualising complex data structures from Holocaust testimonies
  o Building knowledge graphs and networks to represent historical
    relationships
  o Interactive data visualisations for education and research
  o Extracting biographical and temporal information relevant to the
    Holocaust
  o Deep learning and large language models
* Digital Archiving and Long-Term Preservation:
  o Methods and tools for digitising and preserving Holocaust
    testimonies
  o Best practices for metadata standards and cataloguing
  o Ensuring long-term accessibility and data integrity
* Ethical Considerations and Privacy:
  o Ethical challenges in digitising and sharing sensitive testimonies
  o Anonymisation and privacy protection in Holocaust data
  o Community engagement and consent in digital projects
* User and application aspects:
  o Development of tools and interfaces for the search, analysis and
    exploration of Holocaust testimonies
  o Other relevant use cases and application scenarios
All papers must clearly state and explain their relevance to the topic
of 'Holocaust Testimonies as Language Resources'.
*Submission & Publication*
All papers must represent original and unpublished work that is not
currently under review. Papers will be evaluated according to their
significance, originality, technical content, style, clarity, and
relevance to the workshop. We welcome the following types of contributions:
* Standard research papers (up to 8 pages, plus more pages for
references if needed);
* Short research papers (from 4 to 6 pages, plus more pages for
references if needed).
Submissions must be anonymous and strictly follow the LREC2024 stylesheet
formatting guidelines <https://lrec-coling-2024.org/authors-kit/>. All
papers should be electronically submitted in PDF format via START, the
main conference platform
<https://softconf.com/lrec-coling2024/htres2024/>.
*Important Dates*
* *Paper submission deadline:* 21 February 2024
* *Notification of acceptance:* 20 March 2024
* *Camera-ready paper:* 15 April 2024
* *Workshop date:* 21 May 2024
*Organising Committee*
* Isuri Anuradha, University of Wolverhampton, UK
* Ingo Frommholz, University of Wolverhampton
* Francesca Frontini, CNR-ILC, Italy & CLARIN
* Martin Wynne, Oxford University, UK
* Ruslan Mitkov, Lancaster University, UK
* Paul Rayson, Lancaster University, UK
* Alistair Plum, University of Luxembourg, Luxembourg
*Programme Committee*
* Le An Ha, Ho Chi Minh City University of Foreign Languages and
Information Technology, Vietnam
* Federico Boschetti, CNR-Istituto di Linguistica Computazionale “A.
Zampolli”, Italy
* Estelle Bunout, University of Luxembourg, Luxembourg
* Martin Bulin, University of West Bohemia, Czech Republic
* Tim Cole, University of Bristol, UK
* Angelo Mario Del Grosso, CNR-Istituto di Linguistica Computazionale
“A. Zampolli”, Italy
* Maria Dermentzi, King’s College London, UK
* Robert Ehrenreich, USHMM, USA
* Ignatius Ezeani, Lancaster University, UK
* Ian Gregory, Lancaster University, UK
* Wolf Gruner, Shoah Foundation, USA
* Arjan van Hessen, Radboud University
* Henk van den Heuvel, Radboud University & CLARIN ERIC
* Renana Keydar, The Hebrew University of Jerusalem, Israel
* William J.B. Mattingly, USHMM, USA
* Patricia Murrieta-Flores, Lancaster University, UK
* Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy
of Sciences, Poland
* Maciej Piasecki, Wroclaw University of Science and Technology, Poland
* Rachel Pistol, King’s College London, UK
* Johannes-Dieter Steinert, University of Wolverhampton, UK
* Jan Svec, University of West Bohemia
* Gabor Toth, University of Luxembourg, Luxembourg
* Eveline Wandl-Vogt, Austrian Academy of Sciences, Vienna
--
Senior Researcher in Corpus Linguistics
Faculty of Linguistics, Philology and Phonetics, University of Oxford
National Co-ordinator, CLARIN-UK
martin.wynne(a)ling-phil.ox.ac.uk
https://orcid.org/0000-0002-4155-0530
I am looking for two researchers to join my new group at the faculty of computer science at the University of Göttingen. The group is part of the Campus Institute Data Science (CIDAS). Our research is interdisciplinary at its core and we cooperate closely with colleagues from other faculties (e.g., psychology, humanities).
We take a human-centered perspective on natural language processing research and focus on the following topics:
* the cognitive plausibility, interpretability, and generalization capabilities of language processing models
* cross-lingual transfer and typological diversity in multilingual models
* exploring how language processing differences between humans and computers can guide the development of more efficient models
* language technology for education (e.g., readability and simplification, exercise generation, learner modeling)
The full descriptions can be found here:
PostDoc: https://www.uni-goettingen.de/en/644546.html?&details=74388
PhD: https://www.uni-goettingen.de/en/644546.html?&details=74387
Please forward this info to potential candidates. Don't hesitate to contact me if you have any questions about the positions.
Best regards,
Lisa Beinborn
Dear colleagues,
We would like to invite you to submit the unpublished results of your
research on Knowledge Graphs and Large Language Models to:
The 1st Workshop on Knowledge Graphs and Large Language Models (KaLLM), to
be held on August 15, 2024, co-located with ACL 2024, Bangkok, Thailand.
First Call for Participation
Submission Deadline: May 10, 2024 at 23:59 AoE (UTC-12)
Website: https://kallmworkshop.github.io/kallm2024/
Contact email: kallmworkshop2024(a)googlegroups.com
The workshop intends to provide a platform for researchers, practitioners,
and industry professionals to explore the synergies between LLMs and KGs.
We aim to provide a space for the LLM community and the community of KG
researchers to interact and explore how these two communities could
collaborate and support one another.
Important Dates
Submission Starts: Feb 05, 2024
Submission Deadline: May 10, 2024
Author Notifications: June 17, 2024
Camera-Ready Deadline: July 1, 2024
Workshop Date: August 15, 2024
Submission Guidelines:
Papers must be submitted in PDF format using the official ACL template.
More details are available on the website.
Scope of the workshop:
KaLLM invites quality research contributions as short or long papers and
resource papers. All submissions will undergo a double-blind review
process, and accepted submissions will be presented at the workshop.
The submissions should focus on the interaction between LLMs and KGs in the
context of NLP. The workshop will cover a diverse range of topics related
to the integration of LLMs and KGs, including but not limited to:
- Knowledge-enhanced language generation
- KG-based question answering using LLMs
- Fact validation and bias mitigation
- KG creation and completion using LLMs
- Privacy considerations in LLM-KG integration
- Interpretability and explainability
- Cross-domain applications
- KG-based text summarisation with LLMs
- Ethical implications of LLM-KG technologies
- Multimodality of KGs and LLMs
- Multilingual LLMs for KGs and vice-versa
We look forward to receiving your submissions and to your valuable
contributions to the success of the workshop. If you have any questions or
require further information, please do not hesitate to contact us at
kallmworkshop2024(a)googlegroups.com or visit
https://kallmworkshop.github.io/kallm2024/.
Thank you and best regards,
Russa Biswas on behalf of Workshop Organisers
Postdoctoral Researcher
Hasso Plattner Institute, Germany
KONVENS 2024: First Call for Papers
We warmly welcome the submission of papers for KONVENS 2024, scheduled from September 9 to 13, 2024, at the University of Vienna, Austria. In addition to its technical program, KONVENS will facilitate dynamic interactions among academic researchers and industry peers, offering workshops, tutorials, shared tasks, and networking events.
PAPER SUBMISSION INFORMATION
We invite submissions of original and unpublished works in the fields of research, development, applications, and evaluation, encompassing all aspects of natural language processing, from fundamental inquiries to the practical implementation of natural language resources, components, and systems. We particularly encourage submissions of NLP approaches dedicated to the German language, including survey papers that provide insights into the current state of the art in German language and speech processing. We welcome contributions from both academic and industry professionals.
We welcome the following types of paper submissions:
* Long papers (8 pages plus references and appendix), describing original research with substantial new results.
* Short papers (4 pages plus references and appendix), including small focused contributions, work in progress, as well as descriptions of projects, systems and resources.
Accepted papers will be presented orally or as posters, as determined by the program chairs. This decision will be based on the nature rather than the quality of the work. The conference language is English. Only contributions written in English will be accepted. Each submission must include a mandatory discussion of Ethical Considerations as well as a section on Limitations (neither section counts towards the page limit). Papers without these sections will be desk-rejected. The review process will be double-blind. Submissions must be anonymized accordingly. The conference proceedings are planned to be published via the ACL Anthology.
Papers must be formatted in accordance with the ACL style sheets<https://github.com/acl-org/acl-style-files>. We strongly encourage authors to use LaTeX in preparing their document. Papers must be submitted electronically. The submission link will be provided soon.
IMPORTANT DATES
* April 30th, 2024: Paper submission due (all submission types)
* June 30th, 2024: Notification of acceptance
* July 15th, 2024: Camera-ready papers due
* September 9th-13th, 2024: KONVENS
Mit freundlichen Grüßen / Best regards
the KONVENS-2024 organization team
konvens-2024(a)googlegroups.com
https://konvens-2024.univie.ac.at/
*** Last Call for Workshop Papers ***
36th International Conference on Advanced Information Systems Engineering
(CAiSE'24)
June 3-7, 2024, 5* St. Raphael Resort and Marina, Limassol, Cyprus
https://cyprusconferences.org/caise2024/
(*** Submission Deadline: 26th February, 2024 AoE ***)
CAiSE is a well-established, highly visible conference series on Advanced Information Systems
(IS) Engineering. It covers all relevant topics in the area, including methodologies and
approaches for IS engineering, innovative platforms, architectures and technologies, and
engineering of specific kinds of IS. CAiSE conferences also have the tradition of hosting
workshops in related fields. Workshops are intended to focus on particular topics and provide
ample room for discussions of new ideas and developments.
CAiSE'24, the 36th edition of the CAiSE series, will host the following workshops. For more
information on each workshop, please visit its website.
CAiSE'24 WORKSHOPS
• 3rd International Workshop on Agile Methods for Information Systems Engineering (Agil-ISE)
https://agilise.github.io/2024/index.html
• International Workshop on Blockchain for Information Systems (BC4IS24) and Blockchain for
Trusted Data Sharing (B4TDS)
https://pros.unicam.it/bc4isb4tds/
• 2nd International Workshop on Hybrid Artificial Intelligence and Enterprise Modelling for
Intelligent Information Systems (HybridAIMS)
https://hybridaims.com/
• 2nd Workshop on Knowledge Graphs for Semantics-driven Systems Engineering
https://www.omilab.org/activities/events/caise2024_kg4sdse/
• 16th International Workshop on Enterprise & Organizational Modeling and Simulation
(EOMAS 2024)
https://eomas2024.fel.cvut.cz/
• Digital Transformation with Business Process Mining (DigPro2024)
https://digpro.iiita.ac.in/
IMPORTANT DATES
• Paper Submission Deadline: 26th February, 2024 (AoE)
• Notification of Acceptance: 27th March, 2024
• Camera-ready Deadline: 5th April, 2024
• Author Registration Deadline: 5th April, 2024
WORKSHOP CHAIRS
• João Paulo A. Almeida, Federal University of Espírito Santo, Brazil
• Claudio di Ciccio, Sapienza University of Rome, Italy
• Christos Kalloniatis, University of the Aegean, Greece
Please consider contributing and/or forwarding to appropriate colleagues
and groups.
--------------------------------------------------------------------------------------------------------------------
Call for Participation
--------------------------------------------------------------------------------------------------------------------
DETESTS-Dis IberLEF 2024
Task: DETESTS-Dis (DETEction and classification of racial Stereotypes in
Spanish – Learning with Disagreement)
This task will be part of IberLEF 2024
<https://sites.google.com/view/iberlef-2024/home?authuser=0>, the 6th
Workshop on Iberian Languages Evaluation Forum at the SEPLN 2024
Conference, which will be held in Valladolid, Spain, on September 24th.
-------------------------------------------------------------------------------------------------------------------
Here, we introduce the second edition of the DETESTS task (Ariza-Casabona,
2022
<http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6442>),
which was first presented at IberLEF 2022. The aim of the new edition,
DETESTS-Dis, is to detect and classify explicit and implicit stereotypes in
texts from social media and comments on news articles, incorporating
learning with disagreement techniques. The two subtasks are described
below:
- Subtask 1, Stereotype Identification: This is a binary classification
task whose aim is to determine whether a comment or sentence
contains at least one stereotype or none, considering the full distribution
of labels provided by the annotators. This subtask follows the SemEval 2021
Task 12 (Uma et al., 2021 <https://aclanthology.org/2021.semeval-1.41/>)
proposal about learning with disagreement, in which the authors state that
there does not necessarily exist a single gold label for every sample in
the dataset. This fact is particularly evident when multiple contradictory
annotations arise at the data labeling stage due to “debatable, subjective,
or linguistic ambiguity”. The actual gold label of this subtask is used
only as a proxy to determine the subset of comments that will be evaluated
in the subsequent subtask.
- Subtask 2 (Optional), Implicitness Identification: This subtask
introduces a novel binary classification problem to determine whether the
stereotype is manifested or latent within the text, that is, whether the
stereotype is implicit or explicit. The added difficulty in this case is
that implicit stereotypes are not directly expressed in the text, and a
process of inference must be applied by the annotators. Moreover, there are
different strategies in which an implicit stereotype can be coded, such as
metaphors, irony and other figures of speech, evaluations of the in-group,
and the overgeneralization of a social group from features of some of its
members. This subtask will be presented as a hierarchical binary
classification problem.
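As an illustration of what "considering the full distribution of labels
provided by the annotators" in Subtask 1 can mean in practice, here is a
minimal Python sketch. It is not the official task format or scorer: the
annotation lists, the function names, and the use of cross-entropy as the
comparison are all assumptions made for the example.

```python
from collections import Counter
from math import log


def soft_label(annotations, classes=(0, 1)):
    """Turn the individual annotator votes for one comment into a
    probability distribution over the classes (a 'soft label')."""
    counts = Counter(annotations)
    total = len(annotations)
    return [counts[c] / total for c in classes]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a soft target and a predicted distribution.
    Lower is better; eps guards against log(0)."""
    return -sum(t * log(max(p, eps)) for t, p in zip(target, predicted))


# Hypothetical example: three annotators, two of whom saw a stereotype (1).
votes = [1, 1, 0]
target = soft_label(votes)             # distribution over (no stereotype, stereotype)
hard = max((0, 1), key=votes.count)    # majority vote, for comparison

close_prediction = [0.3, 0.7]          # hypothetical system output
confident_wrong = [0.9, 0.1]           # hypothetical system output
print(target, hard)
print(cross_entropy(target, close_prediction))
print(cross_entropy(target, confident_wrong))
```

A system whose predicted distribution stays close to the annotators'
distribution gets a lower cross-entropy than one that is confidently wrong,
even when both would be scored against the same majority label.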
Although we recommend participating in both subtasks, participants may
take part in just one of them (e.g., subtask 1).
Teams will be allowed (and encouraged) to submit multiple runs (max. 5).
To avoid any conflict with the sources of the comments regarding their
intellectual property rights (IPR), the data will be sent privately to each
participant who is interested in the task. The corpus will only be made
available for research purposes.
Important dates (All deadlines are 11:59 PM UTC-12:00):
Training dataset release: March 04, 2024
Test dataset release: April 15, 2024
Systems results: April 29, 2024
Results notification: May 13, 2024
Working papers submission: June 3, 2024
Working papers (peer-)reviewed: June 17, 2024
Camera-ready versions: July 4, 2024
Workshop: September 24, 2024
Task organizers:
- Mariona Taulé (Universitat de Barcelona, UB)
- Wolfgang Schmeisser (Universitat de Barcelona, UB)
- Alejandro Ariza (Universitat de Barcelona, UB)
- Pol Pastells (Universitat de Barcelona, UB)
- Mireia Farrús (Universitat de Barcelona, UB)
- Simona Frenda (Università degli Studi di Torino, UniTo)
- Paolo Rosso (Universitat Politècnica de València, UPV)
Contact:
Contact the organizers by writing to: detests.iberlef(a)gmail.com
Web page: https://detests-dis.github.io/
We invite participants to join our Google Groups
<https://groups.google.com/u/1/g/detests-dis> to be kept up to date with
the latest news related to the task.
Our NLP department is expected to grow to 20 full-time faculty, and we are
now seeking applications for new faculty appointments. We invite candidates
at all levels to apply.
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) is the
world’s first AI university. It is a graduate-level research-oriented
university with 200 Master’s students and 85 PhD students (from over 40
nationalities; 31% female), and over 60 world-leading faculty members. It
is ranked 17th in the world for AI, and is one of only 8 universities in
the world to be ranked in the top-30 for all of CV, ML, and NLP (
csrankings.org). The university started with three departments (machine
learning, computer vision, and natural language processing), and has
recently launched two new departments (robotics and computer science).
The NLP department undertakes rigorous, high-impact, and original research.
Topics of interest include, but are not limited to, speech processing,
spoken language understanding and dialog, large language models and their
responsible applications, cognitive computational linguistics, multilingual
and multimodal models, countering misinformation and fake news, and the
synergies between NLP and AI in general, for example, embodied cognition.
Applicants must have a PhD in Speech/Natural Language Processing or in a
closely related area prior to the start of the appointment, and should have
demonstrated excellence in conducting innovative and impactful research,
and an interest in mentoring graduate students. MBZUAI offers an attractive
remuneration package and generous start-up funds to stimulate cutting-edge
research and support faculty in building strong research programs.
Interested applicants must submit their materials (cover letter, Curriculum
Vitae, research statement, teaching statement, and contact information for
three references) via this link:
http://apply.interfolio.com/137880
We strongly encourage early application for appointment during 2024.
Applications will be accepted and reviewed on a rolling basis. For more
information about MBZUAI, see https://mbzuai.ac.ae/
Dear Sir/Madam,
Greetings of the day. We invite you to serve as a reviewer for the Springer
5th International Conference on Advances in Information Communication
Technology & Computing (https://www.aictc.in/), 29th to 30th April 2024. If
you are interested, please fill in the form below. We will provide you with
a certificate for your reviewing work. Reviews will be handled on the CMT
platform.
https://forms.gle/2QtdZCWCQ3uzEU346
We look forward to your positive response and your contribution to academia.
-------------------
Best Regards
Dr.Rituraj Soni
Assistant Professor, Senior Member IEEE
Department of Computer Science & Engineering,
Room No. F-50, C Wing,
Engineering College Bikaner
Karni Industrial Area, Pugal Road,
Bikaner, Rajasthan
9414059125,
https://scholar.google.com/citations?user=TZfsmQ0AAAAJ&hl=en
https://sites.google.com/view/riturajsoni/home?authuser=0
Dear all,
The 5th International Workshop on Computational Approaches to Historical
Language Change (https://www.changeiskey.org/event/2024-acl-lchange/,
co-located with ACL'24) is hosting a shared task on _explainable
semantic change modeling_: AXOLOTL-24.
AXOLOTL-24 stands for "Ascertain and eXplain Overhauls of the Lexicon
Over Time at LChange'24" and you are welcome to participate!
https://github.com/ltgoslo/axolotl24_shared_task will serve as the main
information hub for the shared task. Examples of the datasets, processing
and evaluation scripts, etc. will appear in this GitHub repository in due
time according to the timeline below.
If you are interested in AXOLOTL-24, please also join our Google Group:
https://groups.google.com/g/axolotl-24/
========
Timeline
========
- February 1 2024 - training data published
- March 25 2024 - test data published
- April 9 2024 - deadline for submission of the systems’ predictions
- April 10 2024 - AXOLOTL'24 test results published
- May 10 2024 - paper submission deadline (same procedure as with other
LChange'24 papers)
============
Introduction
============
This shared task builds on the existing tradition of competitions in
diachronic semantic change detection, such as (Schlechtweg et al., 2020) and
many others. However, this time we focus on explaining diachronic
semantic changes, even if on a very basic level (for now).
In particular, we challenge the participants to implement a semantic
change modeling system which, given two historical corpora and a sense
inventory corresponding to one of the periods, is able to:
1. Find the target word usages associated with new, gained senses
2. Describe these senses in a way that facilitates understanding and
lexicographical research.
Thus, the task is to identify which exact senses were gained between two
time periods and generate reasonable descriptions (definitions) of these
senses.
To be able to use high-quality gold data, we use a simplified setup
where instead of asking the participants to retrieve and analyze all
target word usages in raw corpora, we provide two manually checked sets
of usage examples (still of considerable size). Below, we still call
them "corpora", for clarity.
The shared task will feature Finnish and Russian data, but you do not
have to speak these languages to participate. There will also be a
smaller dataset for a surprise language at the test stage. For all
these languages, we will use gold, manually annotated data to evaluate
the predictions of the participant systems.
The shared task will consist of two subtasks. The participants are
welcome to choose one of them or both, as they wish.
===============================
Subtask 1. Bridging diachronic word uses and a synchronic dictionary
===============================
The participants are offered two corpora, belonging to different time
periods. In addition to this, they are provided with a set of dictionary
entries (sense inventories) for the target words describing their senses
in the first time period (accompanied by definitions). The task is to
find all usages of the target words belonging to newly gained senses,
i.e., senses not covered by the provided sense inventory.
The assumption is that sense definitions from the dictionary, even
though not always covering all word senses even from the same time
period, may still be a useful additional source of information. The goal
is to map word usages to the dictionary senses. This is very similar to
Word Sense Disambiguation, with the difference being that the usages
corresponding to word senses absent from the dictionary should be
grouped into novel sense clusters (this is more similar to Word Sense
Induction). In a way, this subtask is a mixture of WSD and WSI.
- Inputs: a set of target words, two sets of usages for each target word
(a usage is a text fragment containing a target word); target word
dictionary entries with sense ids for the first of two time periods.
- Predictions: sense id for every word usage of the second time period
(either re-using an id from the provided dictionary or adding a novel one).
- Metrics: Adjusted Rand Index (ARI) for all usages and macro-F1 for
usages with existing senses
- Ground truth: manually annotated sense inventories
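To make the two Subtask 1 metrics concrete, here is a minimal pure-Python
sketch of the Adjusted Rand Index over all usages and macro-F1 over usages
whose gold sense is already in the dictionary. This is not the official
evaluation script: the sense ids, the toy data, and the way usages are
filtered for macro-F1 are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(gold, pred):
    """ARI between two clusterings given as parallel lists of labels.
    Only the grouping matters, so novel-sense ids can be named freely."""
    total_pairs = comb(len(gold), 2)
    if total_pairs == 0:
        return 1.0
    same_in_both = sum(comb(c, 2) for c in Counter(zip(gold, pred)).values())
    same_in_gold = sum(comb(c, 2) for c in Counter(gold).values())
    same_in_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = same_in_gold * same_in_pred / total_pairs
    maximum = (same_in_gold + same_in_pred) / 2
    if maximum == expected:
        return 1.0
    return (same_in_both - expected) / (maximum - expected)


def macro_f1(gold, pred, labels):
    """Unweighted mean of per-label F1 scores."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical usages of one target word in the second time period.
# "s1" and "s2" come from the provided dictionary; "new" is a gained sense.
gold = ["s1", "s1", "s2", "new", "new", "s2"]
pred = ["s1", "s1", "s2", "novel_0", "novel_0", "s1"]

ari = adjusted_rand_index(gold, pred)

# Assumption: macro-F1 is computed only on usages whose gold sense exists
# in the dictionary.
known = {"s1", "s2"}
old = [(g, p) for g, p in zip(gold, pred) if g in known]
f1 = macro_f1([g for g, _ in old], [p for _, p in old], sorted(known))
print(round(ari, 4), round(f1, 4))
```

Because ARI compares groupings rather than label names, a system is not
penalized for calling a novel sense "novel_0" when the gold annotation uses
a different identifier.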
==============================
Subtask 2. Definition generation for novel word senses
==============================
This subtask challenges the participants to submit good
descriptions/definitions for the novel senses they found in subtask 1.
The definitions can be generated from scratch or retrieved from existing
ontologies: this is completely up to the participants. The organizers
will map the predicted definitions to the gold standard ones and
evaluate their quality with the standard NLG metrics.
- Inputs: Same as subtask 1
- Predictions: Same as subtask 1 plus a dictionary-like definition for
every novel sense of the target word (a sense not present in the
dictionary entry from the first time period)
- Metrics: BLEU/ROUGE and BERTScore. The final score is averaged across
target words
- Ground truth: definitions from our gold standard sense inventories
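The phrase "averaged across target words" can be sketched as follows. This
is only an illustration, not the official scorer: a simple unigram-overlap
F1 stands in for BLEU/ROUGE/BERTScore, the nested-dictionary layout is
hypothetical, and predicted definitions are assumed to be already mapped to
gold sense ids.

```python
from collections import Counter


def unigram_f1(reference, candidate):
    """Token-overlap F1 between two definitions (a stand-in for the
    standard NLG metrics named in the call)."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def score_by_target_word(gold, predicted):
    """gold and predicted map target word -> {sense id -> definition}.
    Scores are averaged within each target word first, then across words,
    so words with many novel senses do not dominate the final score."""
    per_word = []
    for word, gold_senses in gold.items():
        if not gold_senses:
            continue
        pred_senses = predicted.get(word, {})
        scores = [
            unigram_f1(definition, pred_senses.get(sense_id, ""))
            for sense_id, definition in gold_senses.items()
        ]
        per_word.append(sum(scores) / len(scores))
    return sum(per_word) / len(per_word) if per_word else 0.0


# Hypothetical English examples; the shared task data is Finnish and Russian.
gold_defs = {
    "mouse": {"n1": "a handheld device for moving a cursor"},
    "cloud": {"n1": "remote servers used for storage"},
}
pred_defs = {
    "mouse": {"n1": "a device for moving the cursor"},
    # no definition submitted for "cloud": it scores 0 for that word
}
final_score = score_by_target_word(gold_defs, pred_defs)
print(round(final_score, 4))
```

A missing definition simply contributes a zero for that sense, so skipping
a target word lowers the average rather than removing the word from it.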
==========
Organizers
==========
- Mariia Fedorova (University of Oslo)
- Andrey Kutuzov (University of Oslo)
- Timothee Mickus (University of Helsinki)
- Niko Partanen (University of Helsinki)
- Janine Siewert (University of Helsinki)
==========
References
==========
1. Diachronic word embeddings and semantic shifts: a survey (Kutuzov et
al., COLING 2018)
2. SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection
(Schlechtweg et al., SemEval 2020)
3. Computational approaches to semantic change (Tahmasebi et al.,
LangSci Press 2021)
4. Semeval-2022 Task 1: CODWOE – Comparing Dictionaries and Word
Embeddings (Mickus et al., SemEval 2022)
5. Interpretable Word Sense Representations via Definition Generation:
The Case of Semantic Change Analysis (Giulianelli et al., ACL 2023)
--
Andrey
Language Technology Group (LTG)
University of Oslo