[Apologies for cross-postings]
CALL FOR PAPERS FOR THE
SECOND INTERNATIONAL WORKSHOP TOWARDS DIGITAL LANGUAGE EQUALITY (TDLE):
FOCUSING ON SUSTAINABILITY
co-located with LREC-COLING 2024, Saturday 25th May 2024, Turin (Italy)
https://european-language-equality.eu/tdle-2024/
1 DESCRIPTION AND AIMS OF THE WORKSHOP
The key aim of this half-day workshop co-located with LREC-COLING 2024
(https://lrec-coling-2024.org/), to be held in Turin (Italy) on Saturday
25th May 2024, is to discuss and promote the importance of
sustainability in the design, development, creation, use, distribution
and sharing of language data, resources, platforms, infrastructures,
tools and technologies, with the intention of achieving Digital Language
Equality (DLE). While some important work has recently addressed these
crucial areas (e.g. Fort and Couillault, 2016; Hessenthaler et al.,
2022; Ramesh et al., 2023; Castilho et al., forthcoming), the relevant
contributions seem to be as yet unsystematic and relatively isolated.
The workshop intends to provide an inclusive forum to encourage in-depth
debate and facilitate collaborations to promote the sustainability of
resources and technologies in any (combination of) languages, in support
of multilingualism and of the overarching goal of DLE.
_The sustainability of language resources and technologies is key to
enabling multilingualism and digital language equality in the age of
Artificial Intelligence._
2 TOPICS OF INTEREST
The _Second International Workshop Towards Digital Language Equality
(TDLE)_ focuses on sustainability in relation to the design,
development, creation, use, distribution and sharing of language data,
resources, platforms, infrastructures, tools and technologies, with a
view to promoting the broader goal of Digital Language Equality (DLE).
The concept of DLE has been firmly established in relation to all
languages of Europe (Rehm and Way, 2023), and has the potential to also
benefit other languages throughout the world, to support the prosperity
of the respective communities at a time of impressive - but as yet very
unevenly distributed and severely imbalanced - progress in
language-centric Artificial Intelligence (AI), e.g. through large
language models (LLMs). The workshop places particular emphasis on
multilingualism and on leveling up digital support for languages,
domains and applications that have so far been underserved, and wishes
to explore ways to develop policies and funding streams to work towards
sustainability in connection with DLE, especially in support of
regional, minority and territorial languages.
To this end, recognizing that the sustainability of Language Resources
and Technologies (LRTs) is key to enabling multilingualism and DLE in
the age of AI, we invite original contributions covering any
(combination of) languages. Topics of particular interest include, but
are not limited to, the following:
* research on the factors affecting DLE and the sustainability of
LRTs;
* best practices, case studies and validated guidelines related to the
design, implementation and improvement of sustainability of written,
oral/spoken, signed and/or multimodal LRTs (including LLMs),
particularly in support of DLE;
* how multilingual LLM technology can support DLE;
* retrospectively assessing the sustainability of legacy LRTs, and
future-proofing new LRTs in the interest of DLE;
* analyzing the costs and benefits of foregrounding sustainability for
LRTs;
* the role of metadata, accompanying documentation and licenses in
showing and improving the sustainability of LRTs;
* sustainability, fairness and accessibility (e.g. for users with
physical or cognitive disabilities, limited computing resources and
connectivity) of platforms and infrastructures hosting, distributing and
sharing LRTs in the interest of DLE;
* how current data and computing access inequality is affecting DLE
(in particular regarding LLMs);
* ecological sustainability and environmental fairness of developing
and deploying state-of-the-art LRTs, e.g. LLMs with regard to energy
consumption, global warming and climate change;
* developing data- and parameter-efficient methods to train or adapt
language models to new languages;
* how to evaluate, measure, compare and improve the sustainability of
LRTs;
* establishing benchmarks and protocols to ensure the sustainability
of LRTs;
* how to avoid the potential dangers of developing and using _un_fair
and _un_sustainable LRTs, e.g. for malicious, ill-intentioned or harmful
purposes;
* ethical, legal, cultural and/or socio-economic implications of
(ignoring) fairness and sustainability of LRTs;
* developing and implementing forward-looking policies to promote
fairness and long-term sustainability of LRTs to achieve DLE;
* education and training needs and experiences in relation to
promoting fairness and sustainability of LRTs and ways to raise broad
awareness of DLE and related topics, e.g. among the general public,
policy- and decision-makers.
Given this wide-ranging and inclusive remit, the workshop intends to
bring together developers, creators, vendors, distributors, brokers,
users, evaluators and researchers of written, oral/spoken, signed and/or
multimodal LRTs in any (combination of) languages.
3 BACKGROUND AND FIRST TDLE WORKSHOP HELD IN 2022
This second (2024) edition of the workshop builds on the success of the
first _Towards Digital Language Equality (TDLE) workshop_,[1] which was
held at LREC 2022 in Marseille (France) on 20 June 2022, and whose
accepted papers were published in a dedicated volume of proceedings,
Aldabe et al. (2022).[2]
Following this well-received inaugural workshop held in June 2022, the
second event in the series will be co-located with LREC-COLING 2024 in
Turin (Italy) on Saturday 25th May 2024, and will focus specifically on
the highly relevant topic of the sustainability of LRTs in connection
with multilingualism and DLE.
4 SUBMISSIONS
Up-to-date information on the workshop, including materials for authors,
guidelines, templates, stylesheet and key dates can be found at the
dedicated website https://european-language-equality.eu/tdle-2024/. To
contact the organizing committee of the workshop directly, you can email
tdle2024.hitz(a)ehu.eus.
Papers submitted to the workshop should be completely anonymous for
double-blind peer review, written in English, and prepared using the
official LREC-COLING 2024 author's kit and submission
stylesheet/template available at
https://lrec-coling-2024.org/authors-kit/. The submissions to the
workshop should not exceed 8 pages, excluding references, and be saved
in unprotected PDF format. Papers should be submitted no later than 23
February 2024 through the START submission management system available
at https://softconf.com/lrec-coling2024/tdle2024/.
The workshop seeks original papers, i.e. it does not accept submissions
that have been, or will be, published elsewhere. The workshop allows
simultaneous submissions, and in these cases the authors should clearly
indicate in the manuscript to which other conference, workshop or venue
they have submitted the paper for review. Each paper submitted to the
workshop will receive three double-blind peer reviews. Papers accepted
for presentation will be included in the proceedings of the workshop.
In light of the LREC-COLING 2024 Map and the "Share your LRs!"
initiative, when submitting their papers through the START system
authors will be asked to provide essential information about resources
(in a broad sense, i.e. also technologies, standards, evaluation kits,
etc.) that have been used for the work described in the paper or are a
new result of their research. Moreover, ELRA encourages all LREC-COLING
authors to share the described LRs (data, tools, services, etc.) to
enable their reuse and replicability of experiments (including
evaluation ones).
5 KEY DATES
Paper submission deadline: 23 February 2024
Notification of acceptance: 19 March 2024
Camera-ready papers due: 8 April 2024
Half-day workshop date: Saturday, 25th May 2024
6 WORKSHOP ORGANIZERS
* Itziar Aldabe (HiTZ Basque Center for Language Technology - Ixa,
University of the Basque Country, Spain)
* Begoña Altuna (HiTZ Basque Center for Language Technology - Ixa,
University of the Basque Country, Spain)
* Aritz Farwell (HiTZ Basque Center for Language Technology - Ixa,
University of the Basque Country, Spain)
* Federico Gaspari (University of Naples "Federico II", Italy & ADAPT
Centre, Dublin City University, Ireland - co-chair)
* Joss Moorkens (School of Applied Language & Intercultural
Studies/ADAPT Centre, Dublin City University, Ireland - co-chair)
* Stelios Piperidis (Institute of Language and Speech Processing,
Athena Research and Innovation Center in Information, Communication and
Knowledge Technologies, Greece)
* Georg Rehm (Speech and Language Technology Lab, Deutsches
Forschungszentrum für Künstliche Intelligenz, Germany)
* German Rigau (HiTZ Basque Center for Language Technology - Ixa,
University of the Basque Country, Spain)
7 PROGRAM COMMITTEE
* Antonios Anastasopoulos (GMU, USA)
* Anya Belz (ADAPT, DCU, Ireland)
* Steven Bird (CDU, Australia)
* Fred Blain (Uni. Tilburg, Netherlands)
* Franco Cutugno (Uni. Naples "Federico II", Italy)
* Bessie Dendrinos (NKUA, Greece & ECSPM, Denmark)
* Félix do Carmo (Uni. Surrey, UK)
* Annika Grützner-Zahn (DFKI, Germany)
* Ana Guerberof-Arenas (Uni. Groningen, Netherlands)
* Davyth Hicks (ELEN, Belgium)
* Monja Jannet (ADAPT, DCU, Ireland)
* John Judge (ADAPT, DCU, Ireland)
* Dorothy Kenny (SALIS/CTTS/ADAPT, DCU, Ireland)
* Sabine Kirchmeier (EFNIL, Luxembourg)
* Teresa Lynn (MBZUAI, United Arab Emirates)
* Maite Melero (BSC, Spain)
* Helena Moniz (Uni. Lisbon, Portugal & EAMT)
* Johanna Monti (UniOR, Italy)
* Rachele Raus (UniBO, Italy)
* Wessel Reijers (Uni. Paderborn, Germany)
* Celia Rico Pérez (Universidad Complutense de Madrid, Spain)
* Dimitar Shterionov (TU, Netherlands)
* Carlos S. C. Teixeira (IOTA Localisation Services & Uni. Rovira i
Virgili, Spain)
* Antonio Toral (Uni. Groningen, Netherlands)
* Vincent Vandeghinste (Instituut voor de Nederlandse Taal,
Netherlands & KU Leuven, Belgium)
REFERENCES
Itziar Aldabe, Begoña Altuna, Aritz Farwell and German Rigau, editors.
2022. _Proceedings of the Workshop Towards Digital Language Equality
(TDLE)_ [1]. European Language Resources Association, Marseille, France.
Sheila Castilho, Federico Gaspari, Joss Moorkens, Maja Popović and
Antonio Toral, editors. Forthcoming. _Journal of Specialised
Translation_ [2]. Special Issue n. 41 on "Translation Automation and
Sustainability".
Karën Fort and Alain Couillault, 2016. "Yes, We Care! Results of the
Ethics and Natural Language Processing Surveys [3]". _Proceedings of the
Tenth International Conference on Language Resources and Evaluation
(LREC'16)_ [4]. European Language Resources Association, Portorož,
Slovenia. 1593-1600.
Marius Hessenthaler, Emma Strubell, Dirk Hovy and Anne Lauscher, 2022.
"Bridging Fairness and Environmental Sustainability in Natural Language
Processing [5]". _Proceedings of the 2022 Conference on Empirical
Methods in Natural Language Processing_ [6], Abu Dhabi, United Arab
Emirates. 7817-7836.
András Kornai, 2013. "Digital Language Death [7]". _PLoS ONE_,
8(10):e77056.
Krithika Ramesh, Sunayana Sitaram and Monojit Choudhury, 2023. "Fairness
in Language Models Beyond English: Gaps and Challenges [8]". _Findings
of the Association for Computational Linguistics: EACL 2023_ [9].
Association for Computational Linguistics, Dubrovnik, Croatia.
2106-2119.
Georg Rehm and Andy Way, editors. 2023. _European Language Equality: A
Strategic Agenda for Digital Language Equality_ [10]. Berlin: Springer.
[1] https://european-language-equality.eu/tdle-2022/
[2]
www.lrec-conf.org/proceedings/lrec2022/workshops/TDLE/2022.tdle-1.0.pdf
Links:
------
[1] https://aclanthology.org/2022.tdle-1.pdf
[2] https://www.jostrans.org/
[3] https://aclanthology.org/L16-1252.pdf
[4] https://aclanthology.org/volumes/L16-1/
[5] https://aclanthology.org/2022.emnlp-main.533.pdf
[6] https://aclanthology.org/volumes/2022.emnlp-main/
[7]
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0077056
[8] https://aclanthology.org/2023.findings-eacl.157.pdf
[9] https://aclanthology.org/2023.findings-eacl.pdf
[10] https://link.springer.com/book/10.1007/978-3-031-28819-7
Job offer: Researcher for Multimodal Fake-News and Disinformation Detection at DFKI Berlin
The German Research Center for Artificial Intelligence (DFKI) has operated as a non-profit Public-Private Partnership (PPP) since 1988. DFKI combines scientific excellence and commercially oriented value creation with social awareness and is recognized as a major "Center of Excellence" by the international scientific community. As Germany's biggest public and independent organisation dedicated to AI research and development, DFKI has focused on the goal of human-centric AI for more than 30 years. Its research is committed to essential, future-oriented areas of application and socially relevant topics.
We are looking for a highly motivated research assistant to join our existing team and work on a project focused on fake-news and disinformation detection from speech and multimedia data. Content authenticity verification of speech, combined with other modalities such as text, visuals or metadata, will be a central part of the work. Explainable AI (XAI) and bias analysis are also highly relevant to the position.
The successful candidate will work closely with high-impact partners in this field, e.g. Technical University of Berlin, RBB (Berlin TV and news broadcaster), Deutsche Welle (Germany's international broadcaster), and five other partners.
Responsibilities will include developing and testing different AI/NLP models and techniques, analyzing the performance of machine learning models in the context of practical fake-news and disinformation detection for journalists, and communicating project progress and results to relevant stakeholders. The position offers opportunities for pursuing a doctorate and publishing research results in scientific journals and conferences.
Qualified candidates will have a completed university degree in (technical) computer science or computational linguistics, excellent programming skills in Python, and a strong background in machine learning/AI and signal processing or NLP. Previous experience in the field of fake-news or spoofing / authenticity detection of multimedia data is an advantage.
DFKI offers an agile and lively international and interdisciplinary environment for working in a self-determined manner. If you are interested in contributing to cutting-edge research and working with a dynamic team, please apply!
More details and link: https://jobs.dfki.de/en/vacancy/researcher-m-f-d-547585.html
Application deadline: Jan 23, 2024.
If you have any questions, please don't hesitate to contact tim.polzehl(a)dfki.de
--
Dr.-Ing. Tim Polzehl
Senior Researcher
Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI)
German Research Center for Artificial Intelligence
Speech & Language Technology
Associate Senior Researcher
Technische Universität Berlin
Quality and Usability Lab
DFKI Labor Berlin
Alt-Moabit 91c, D-10559 Berlin, Germany
Tel.: +49.30.238951863
Fax: +49 30 23895 1810
E-Mail: tim.polzehl(a)dfki.de
Apologies for cross-posting
------------------------------------------------------
Dear colleagues,
We invite you to submit to the special session on “Emergent Phenomena in
Deep Representations and Large Language Models” as part of IJCNN 2024
and IEEE WCCI 2024, which will be held in Yokohama, Japan.
We are looking forward to your contributions.
Please find the CfP below.
Best wishes,
On behalf of the Organising Committee
Özge Alacam
------------------------------------------------------
First Call for Papers: Special Session on Emergent Phenomena in Deep
Representations and Large Language Models @IJCNN 2024 & IEEE WCCI 2024:
Deep learning models trained on large datasets have shown spectacular
performance in a wide range of tasks, as demonstrated by current
applications of Large Language Models. However, recent works have shown
that the abilities large machine learning models acquire often emerge
unpredictably with increasing model complexity or training dataset size.
These emergent phenomena include the unexpected appearance of abilities
for which the model was not explicitly trained, but they may also take
the form of unexpected performance boosts due to increased model
complexity. Emergent phenomena are not always beneficial: larger models
may pick up new biases from the training data or start hallucinating.
To move towards increasingly sustainable, reliable, and explainable
applications of AI systems, it is necessary to increase the
understanding of the mechanisms surrounding emergent phenomena.
Moreover, this effort provides increased insight into how large models
acquire the abilities to perform specific tasks. Important research
questions relate to the definition of emergent phenomena, their causes
(what controls which abilities are acquired, and when?), training
efficiency and training data quality (e.g., acquiring desired abilities
with less computational effort), prompting strategies to elicit or test
for desired model behaviour (e.g., chain of thought), and further
methods for verifying model abilities and properties.
The primary goal of this special session is (i) to discuss the emergent
abilities and risks in deep neural networks and representations from
very different angles and (ii) to facilitate networking and encourage
collaboration between the various research fields that approach this
issue from different perspectives, such as computational linguistics, ethics in
AI, computer science, physics, etc.
Topics of interest include, but are not limited to:
• The definition of emergence in the context of NLP and ML
• Prompting strategies
• Physics-based/inspired analyses (e.g. phase transitions in ML
models)
• Explainability and interpretability (XAI)
• Evaluation measures for model ability, monitoring strategies,
assessment of model abilities (e.g. technical or psychology-based)
• Knowledge distillation, model pruning, energy-efficient models.
• Mitigation strategies for emergent risks and model deterioration.
• Fine-tuning and Retrieval-augmented generation (RAG)
• Papers focusing on specific emergent phenomena (reasoning,
creativity, double descent phenomena etc.)
The website for the call for papers is accessible at
https://sites.google.com/view/emergenn/call-for-papers
Organising Committee:
------------------------------
• Dr. Özge Alacam (Ludwig-Maximilian University & Uni Bielefeld,
Germany)
• Dr. Michiel Straat (Uni Bielefeld, Germany)
• Prof. Dr. Hinrich Schütze (Ludwig-Maximilian University, Germany)
• Prof. Dr. Alessandro Sperduti (University of Padova, Italy)
Important Dates:
------------------------------
• January 15, 2024 - Paper Submission Deadline
• March 15, 2024 - Notification of Acceptance
• May 1, 2024 - Camera-ready Deadline & Early
Registration Deadline
• June 30 - July 5, 2024 - Main Conference (IEEE WCCI 2024,
Yokohama, Japan)
* All deadlines are 11:59 PM UTC-12:00 ("anywhere on Earth")
Submission Format and Platform:
------------------------------
• Submissions will be through the IEEE WCCI 2024 Submission page
<https://edas.info/login.php?rurl=aHR0cHM6Ly9lZGFzLmluZm8vTjMxNjE0P2M9MzE2MT…>.
• Each paper is limited to 8 pages, including figures, tables,
and references. Please refer to the author guidelines provided by IEEE
WCCI 2024.
• Please specify during the submission that your paper is
intended for the Special Session: Emergent Phenomena in Deep
Representations and Large Language Models.
• Special session webpage:
https://sites.google.com/view/emergenn/call-for-papers
• IEEE WCCI 2024 webpage: https://2024.ieeewcci.org/
Contact information:
------------------------------
• Özge Alacam : oezge.alacam(a)uni-bielefeld.de
• Michiel Straat : mstraat(a)techfak.uni-bielefeld.de
CODI, 5th Workshop on Computational Approaches to Discourse
2024-03-21 or 22 - EACL 2024 - Malta
** Direct Submission deadline: January 17th, 2024 **
Direct submission: We are now accepting submissions of papers rejected at another main conference.
Website link: https://sites.google.com/view/codi2024
CODI considers for publication papers rejected at one of the main conferences; authors will have to submit both the paper and the reviews as a supplementary PDF file. If modifications have been made since the original submission, please submit an additional file briefly describing the modifications made. The organizers will decide on the acceptance of the papers based on the quality of the paper and its fit with the workshop.
As a reminder, CODI also invites presentations of papers accepted at another main conference. They will be included in the workshop program and handbook, but will not appear in the workshop proceedings.
Please submit your workshop papers (category: "direct submission") at https://softconf.com/eacl2024/CODI-2024/
DSTL (Defence Science and Technology Laboratory, part of the UK Civil Service) is advertising for a computational/corpus linguist to work in their 'Behavioural and Social Science Group'. They are looking for a linguist who understands the potential for (and limitations of) computational approaches to discourse and who would be comfortable interacting with computer / data scientists.
Details can be found at: https://www.civilservicejobs.service.gov.uk/csr/index.cgi?SID=b3duZXJ0eXBlP…
Regards
Paul Thompson
= = = = = = = = = = = = =
Dr Paul Thompson
Reader in Applied Corpus Linguistics
Co-Director, Centre for Corpus Research
Head of Department, English Language and Linguistics
University of Birmingham
Birmingham B15 2TT, UK
Editor-in-Chief, Applied Corpus Linguistics journal
= = = = = = = = = = = = =
Apologies for cross-posting!
GeoLD2024: 6th International Workshop on Geospatial Linked Data
Hersonissos, Greece, May 26-27, 2024
Conference website https://i3mainz.github.io/GeoLD2024/
Submission link https://easychair.org/conferences/?conf=geold2024
Submission deadline March 10, 2024
GeoLD2024
*6th International Workshop on Geospatial Linked Data* at ESWC 2024
<https://2024.eswc-conferences.org/>
Geospatial data is vital for both traditional applications like navigation,
logistics, and tourism and emerging areas like autonomous vehicles, smart
buildings and GIS on demand. Spatial linked data has recently transitioned
from experimental prototypes to national infrastructure. However, the next
generation of spatial knowledge graphs will integrate multiple spatial
datasets with the large number of general datasets that contain some
geospatial references (e.g., DBpedia, Wikidata). This integration, either
on the public Web or within organizations, has immense socio-economic as
well as academic benefits. The upsurge in Linked Data-related presentations
at the recent Eurogeographics data quality workshop shows the deep interest
in Geospatial Linked Data (GLD) among national mapping agencies. GLD enables
a web-based, interoperable geospatial infrastructure. This is especially
relevant for delivering the INSPIRE directive in Europe. Moreover,
geospatial information systems benefit from Linked Data principles in
building the next generation of spatial data applications, e.g., federated
smart buildings, self-piloted vehicles, delivery drones or automated local
authority services.
This workshop invites papers covering the challenges and solutions for
handling GLD, especially for building high-quality, adaptable geospatial
infrastructures and next-generation spatial applications. We aim
to demonstrate the latest approaches and implementations and to discuss the
solutions to challenges and issues arising from research and industrial
organizations.
The following topics of interest are covered by GeoLD2024.
*Interoperability and Integration*
- Geospatial Linked Data vocabularies and standards (GeoSPARQL, INSPIRE,
W3C, OGC)
- Extraction/transformation of Geospatial Linked Data from native
geospatial data sources
- Integration (schema mapping, interlinking, fusion) techniques for
Geospatial RDF Data
- Enrichment, quality and evolution of Linked Data with Geospatial
information
- Machine Learning improving Geospatial Linked Data processing
- Natural Language Processing, especially Large Language Models for
improving GLD processing
*Big Geospatial Data Management*
- Distributed solutions for Geospatial Linked Data management (storing,
querying, mapping)
- Algorithms and tools for large scale, scalable Geospatial Linked Data
management
- Efficient Indexing and Querying of Geospatial Linked Data
- Geospatial-specific Reasoning on RDF Data
- Ranking techniques on querying Geospatial RDF Data
- Advanced querying capabilities on Geospatial RDF Data
*Utilization of Geospatial Linked Data*
- Benchmarking of Geospatial Linked Data applications
- Geospatial Linked Data in social web platforms and applications
- Geospatial linked data applications for indoor navigation
- Visualization models/interfaces for browsing/authoring/querying
Geospatial Linked Data
- Real-world applications/use cases/paradigms using Geospatial Linked
Data
- Evaluation/comparison of tools/libraries/frameworks for Geospatial
Linked Data
- Data governance models for Geospatial Linked Data
Submission Guidelines
All papers must be original and not simultaneously submitted to another
journal or conference. The following paper categories are welcome:
- *Long papers (up to 12 pages)*: Presenting novel scientific research
pertaining to geospatial Linked Data.
- *Short papers (up to 6 pages)*: Position papers, System, Library, API
and Dataset descriptions, relevant to the topics of interest.
- *Demo/Tutorial papers (up to 4 pages)*: Describe a demo or hands-on
tutorial of a tool on the workshop topics
Organizing committee
- Timo Homburg (i3mainz -- Institute for Spatial Information and
Surveying Technology, Mainz University of Applied Sciences, Germany)
- Dr. Beyza Yaman (ADAPT Centre, Trinity College Dublin, Ireland)
- Dr. Mohamed Ahmed Sherif (University of Paderborn, Germany)
- Prof. Dr. Axel-Cyrille Ngonga Ngomo (University of Paderborn, Germany)
Contact
All questions about submissions should be emailed to
Timo.Homburg(a)hs-mainz.de
STAND Workshop on Standardizing Tasks, meAsures and NLP Datasets
https://stand4nlp.github.io/
Full-day workshop in Paris, France, January 29th 2024 (+ partial hybrid)
Abstract submission deadline: January 24th 2024, but earlier submissions
are welcome
Scientific context:
The current lack of standardized practices and definitions in NLP systems
hinders the progress of the field. Indeed, there is not always consensus on
which evaluation methods are meaningful and fruitful, or which of their
implementations are to be used with which parameters (e.g. SacreBLEU, Post
2018).
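As a concrete illustration of this parameter-reporting problem, the short
Python sketch below (an illustration only, assuming the sacrebleu package
in its 2.x API; the hypothesis and reference strings are invented) shows
how SacreBLEU attaches a signature recording its parameters to the
reported score, which is exactly the kind of information needed to make
evaluations reproducible:

# Minimal sketch: scoring with SacreBLEU and printing its signature,
# which records the parameters (tokenizer, smoothing, number of
# references, version) needed to reproduce the reported score.
from sacrebleu.metrics import BLEU

hypotheses = ["the cat sat on the mat"]           # system outputs (invented)
references = [["the cat is sitting on the mat"]]  # one inner list per reference stream

bleu = BLEU()  # default parameters
result = bleu.corpus_score(hypotheses, references)
print(result.score)          # the BLEU score as a float
print(bleu.get_signature())  # e.g. tokenizer, smoothing and version info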
In some cases, there is no general agreement on the very definition of a
task.
This situation calls for work on *standardizing* NLP practices.
The International Organization for Standardization (ISO) has just created *a
dedicated working group on NLP* (as a joint effort of the AI and Language
committees), and *2 standards* are already under way. Topics under
consideration by the ISO standardization committees include NLP
terminology, evaluation metrics, interoperability, annotation guidelines,
good practices in NLP development/evaluation/corpora, and documentation.
These topics are already heavily discussed in academia, and a number of
informal guidelines have already been proposed. We believe that the
creation of NLP standards can significantly benefit from the input of both NLP
academics and industry NLP practitioners.
Reciprocally, NLP researchers would benefit from getting involved in the
standardization effort, thus ensuring that academia's views are listened
to, in particular in the context of the *AI Act* (the European regulation
on AI that was finalized in December), whose enforcement will strongly
rely on those standards.
The STAND workshop is a research initiative whose goal is:
- to foster discussion on existing standards, their creation and use
- to assess the current needs of the community for standardization
- to share experience on the impact on the research activities when
lacking good practices
- to collect existing good practices (and propose new ones)
We invite contributions from NLP practitioners from both the industry and
academia, as well as standardization experts.
We invite two types of submission:
* short abstract: 1 page
* long abstract: 3 pages
Accepted submissions will be presented as posters. Authors accepted in the
long-abstract track will be invited to submit a full paper (5-10 pages)
after the workshop.
Topics for submissions include, but are not limited to:
- Comparability and reproducibility of evaluation setup
- Annotation guidelines
- Evaluation metrics
- Good practices for building, annotating and maintaining corpora
- Good practices for system evaluation
- Interoperability
- Ethical guidelines
- Guidelines for documenting corpora and models
Submission instructions:
- Submissions are expected in PDF form by email at stand4nlp(a)inria.fr
- All submissions should be formatted using the ACL 2023 style files
https://2023.aclweb.org/calls/style_and_formatting/.
============
PROGRAM AT A GLANCE:
[09:00-10:00] Welcome, introduction to standardization, ongoing activities
in NLP standardization, and the AI Act context
[10:15-11:50] Academic keynote (*Joakim Nivre*) and invited talks (*Matt
Post*, other speaker TBC)
[11:50-13:30] Poster session (with boosters) & lunch
[13:30-14:40] Industry keynote (speaker TBC) and invited talk (*Dirk Hovy*)
[15:00-16:30] Moderator-led breakout discussions. Potential topics that
will be discussed include:
- [sharing / drafting] Standardizing good practices for evaluation
- [sharing / drafting] Standardizing good practices for corpus
management (collection, annotation, versioning)
- [sharing / drafting] Standardizing evaluation metrics (definitions,
implementation, sharing scripts)
- [sharing / drafting] Standardizing annotation schemes (formats and
guidelines)
- [debate] Explainability and ethics in NLP: what needs for standards?
- [debate] Comparing standardization needs with limitations of the
state-of-the-art: how to bridge the gap?
- [debate] Towards standardizing translations of technical terminology
in NLP: how to organize i18n?
[16:30-17:30] Reports from breakouts, definition of community-level actions
& wrap-up. Example outcomes that are envisioned include:
- Collection and drafting of existing good practices
- Preparation of a joint submission for a position paper
- Creation of common repositories for evaluation scripts, corpus
documentation
Participants in the workshop will be offered the opportunity to attend a
standardization committee's meeting, which has been scheduled for the day
after the workshop (January 30th). The outputs of that meeting will be used
in direct support of the AI Act.
Remote access will be offered for part of the workshop only. In-person
participation is recommended if possible.
Posters will be in-person only.
IMPORTANT DATES:
Abstract submission: Anytime by January 24
Notification of acceptance: Within a few days of submission
Workshop: January 29
Standardization committee meeting: January 30
ORGANISING COMMITTEE:
Lauriane Aufrant, Timothée Bernard, Maximin Coavoux, Yoann Dupont, Arnaud
Ferré, Taras Holoyad, Rania Wazir
MORE INFORMATION
For the latest information see the workshop page at
https://stand4nlp.github.io/; for any questions contact stand4nlp(a)inria.fr.
Dear all,
The department DRX of DFKI Berlin has opened two student assistant
positions for a research project related to LLM evaluation and
usability. We would be grateful if you could circulate the following
student assistant positions to students who may be interested:
* Student Assistant (m/f/d) on the topic of Frontend-Development and
Design
https://jobs.dfki.de/en/vacancy/en-student-assistant-m-f-d-frontenddevelopm…
* Student Assistant (m/f/d) on the topic of UX-Design
https://jobs.dfki.de/en/vacancy/en-student-assistant-m-f-d-545837.html
Deadline is January 31st
best
Lefteris
--
Eleftherios Avramidis, senior researcher
German Research Center for Artificial Intelligence (DFKI)
departments: Design Research eXplorations, Speech and Language Technology
short name: Lefteris, (pronouns: he/him), languages: English, German, Greek
Website: https://www.dfki.de/~elav01
Address: Alt Moabit 91c, 10559 Berlin, Germany
Tel.: +49 30 23895 1806
Sec.: +49 30 23895 1800
Fax.: +49 30 23895 1810
tl;dr:
-
submission deadline for research track papers via Softconf: December 18th
2023
-
submission deadline for research track submissions already reviewed via
ARR: January 17th 2024
https://openreview.net/group?id=eacl.org/EACL/2024/Workshop/SCI-CHAT_ARR_Co…
-
submission deadline for shared task systems: January 20th 2024
https://forms.gle/r7HgxZKgqdencRrHA
-
submission deadline for shared task system descriptions via SoftConf:
January 26th 2024
https://sites.google.com/view/dialogue-evaluation/
Call for Papers
The aim of this workshop is to bring together experts working in the area
of open-domain dialogue. In this rapidly advancing research area many
challenges still exist, such as learning information from conversations,
engaging in realistic and convincing simulation of human intelligence,
reasoning, and so on.
SCI-CHAT follows previous workshops on open-domain dialogue, but with a
focus on the simulation of intelligent conversation, including the ability
to follow a challenging topic over a multi-turn conversation and the
ability to pose questions, refute and reason, with live human evaluation
employed as the primary mechanism for evaluating models. The workshop will
include a research track and a shared task:
SCI-CHAT's research track aims to explore recent advances and challenges in
open-domain dialogue research. Researchers working on all aspects of
open-domain dialogue are invited to submit papers on recent advances,
resources, tools, analysis, evaluation, and challenges on the broad theme
of open-domain dialogues.
The topics of the workshop include but are not limited to the following:
-
Intelligent conversation, chit-chat, open-domain dialogue;
-
Automatic and human evaluation of open-domain dialogue;
-
Limitations, risks and safety in open-domain dialogue;
-
Instruction-tuned and instruction-enabled models;
-
Any other topic of interest to the dialogue community.
SCI-CHAT's shared task will focus on simulating intelligent conversations;
participants will be asked to submit (access to the APIs of) automated
dialogue agents with the aim of carrying out nuanced conversations over
multiple dialogue turns. Participating systems will be interactively
evaluated in a live human evaluation. All data acquired within the context
of the shared task will be made public, providing an important resource for
improving metrics and systems in this research area.
Submission guidelines:
Authors are invited to submit their unpublished work that represents novel
research through either direct submission or ARR commitment. Papers should
consist of up to 8 pages of content, plus unlimited pages for references
and appendix. Authors should make use of the EACL Latex Template
<https://2023.eacl.org/calls/styles/> alongside supplementary materials,
including technical appendices, links to source code, datasets, and
multimedia appendices.
Papers can also be submitted as non-archival, so that their content can be
reused for other venues by adding "(NON-ARCHIVAL)" to the title of their
submission. Previously published work can also be submitted as non-archival
in the same way, with the additional requirement to state such on the first
page.
-
Direct paper submissions must be submitted through the SoftConf submission
link: https://softconf.com/eacl2024/SCI-CHAT-2024/
Submitting the same paper to more than one EACL workshop is forbidden.
All papers will be double-blind peer-reviewed by at least 2 program
committee members. As such, all submissions, including the main paper and
its supplementary materials, should be fully anonymized. For more
information on formatting and anonymity guidelines, please refer to EACL
guidelines <https://eacl.org/index.html>.
Organizers
-
Yvette Graham (Trinity College Dublin, Ireland)
-
Qun Liu (Huawei Noah's Ark Lab, China)
-
Gerasimos Lampouras (Huawei Noah's Ark Lab, UK)
-
Ignacio Iacobacci (Huawei Noah's Ark Lab, UK)
-
Sinead Madden (Trinity College Dublin, Ireland)
-
Haider Khalid (Trinity College Dublin, Ireland)
-
Rameez Qureshi (Trinity College Dublin, Ireland)
Important Dates
Regarding Research Track:
-
Research paper via Softconf: December 18th 2023
-
Pre-reviewed ARR commitment deadline: January 17th 2024
-
Notification of research paper acceptance: January 20th, 2024
-
Camera-ready papers due: January 30th 2024
Regarding Shared Task:
-
Release of training and development data: November 9th 2023
-
Release of baseline systems: November 9th 2023
-
Preliminary System submission deadline: January 13th 2024 (optional - if
you want help testing your API, please submit early)
-
System submission (API) deadline: January 20th 2024
-
System description paper via SoftConf: January 26th 2024
-
Camera-ready papers due: January 30th 2024
Overview of results at one-day workshop: March 21 or 22, 2024
CONTACT: sci-chat(a)adaptcentre.ie
Dear corpora-list members,
We are announcing the first SemEval shared task on Semantic Textual
Relatedness (STR): A shared task on automatically detecting the degree of
semantic relatedness (closeness in meaning) between pairs of sentences.
The semantic relatedness of two language units has long been considered
fundamental to understanding meaning (Halliday and Hasan, 1976; Miller and
Charles, 1991), and automatically determining relatedness has many
applications such as evaluating sentence representation methods, question
answering, and summarization.
Two sentences are considered semantically similar when they have a
paraphrasal or entailment relation. On the other hand, relatedness is a
much broader concept that accounts for all the commonalities between two
sentences: whether they are on the same topic, express the same view,
originate from the same time period, one elaborates on (or follows from)
the other, etc. For instance, consider the following sentence pairs:
-
Pair 1: a. There was a lemon tree next to the house. b. The boy enjoyed
reading under the lemon tree.
-
Pair 2: a. There was a lemon tree next to the house. b. The boy was an
excellent football player.
Most people will agree that the sentences in pair 1 are more related than
the sentences in pair 2.
In this task, new textual datasets will be provided for Afrikaans
<https://en.wikipedia.org/wiki/Afrikaans>, Algerian Arabic
<https://en.wikipedia.org/wiki/Algerian_Arabic>, Amharic
<https://en.wikipedia.org/wiki/Amharic>, English, Hausa
<https://en.wikipedia.org/wiki/Hausa_language>, Hindi
<https://en.wikipedia.org/wiki/Hindi>, Indonesian
<https://en.wikipedia.org/wiki/Indonesian_language>, Kinyarwanda
<https://en.wikipedia.org/wiki/Kinyarwanda>, Marathi
<https://en.wikipedia.org/wiki/Marathi_language>, Moroccan Arabic
<https://en.wikipedia.org/wiki/Moroccan_Arabic>, Modern Standard Arabic
<https://en.wikipedia.org/wiki/Modern_Standard_Arabic>, Punjabi
<https://en.wikipedia.org/wiki/Punjabi_language>, Spanish
<https://en.wikipedia.org/wiki/Spanish_language>, and Telugu
<https://en.wikipedia.org/wiki/Telugu_language>.
Data
Each instance in the training, development, and test sets is a sentence
pair. The instance is labeled with a score representing the degree of
semantic textual relatedness between the two sentences. The scores can
range from 0 (maximally unrelated) to 1 (maximally related). These gold
label scores have been determined through manual annotation. Specifically,
a comparative annotation approach was used to avoid known limitations of
traditional rating-scale annotation methods. This comparative annotation
process (which avoids several biases of traditional rating scales) led to
high reliability of the final relatedness rankings.
Further details about the task, the method of data annotation, how STR is
different from semantic textual similarity, applications of semantic
textual relatedness, etc. can be found in this paper:
https://aclanthology.org/2023.eacl-main.55.pdf
Tracks
Each team can provide submissions for one, two or all of the tracks shown
below:
Track A: Supervised
Participants are to submit systems that have been trained using the labeled
training datasets provided. Participating teams are allowed to use any
publicly available datasets (e.g., other relatedness and similarity
datasets or datasets in any other languages). However, they must report
any additional data they used, and ideally report how impactful each resource
was on the final results.
Track B: Unsupervised
Participants are to submit systems that have been developed without the use
of any labeled datasets pertaining to semantic relatedness or semantic
similarity between units of text more than two words long in any language.
The use of unigram or bigram relatedness datasets (from any language) is
permitted.
Track C: Cross-lingual
Participants are to submit systems that have been developed without the use
of any labeled semantic similarity or semantic relatedness datasets in the
target language and with the use of labeled dataset(s) from at least one
other language. Note: Using labeled data from another track is mandatory
for submission to this track.
Deciding which track a submission should go to:
-
If a submission uses labeled data in the target language: submit to
Track A
-
If a submission does not use labeled data in the target language but
uses labeled data from another language: submit to Track C
-
If a submission does not use labeled data in any language: submit to
Track B
** Here ‘labeled data’ refers to labeled datasets pertaining to semantic
relatedness or semantic similarity between units of text more than two
words long.
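To make these rules concrete, the following minimal Python sketch encodes
the same decision logic (the function name and boolean flags are purely
illustrative and not part of the official submission process):

# Minimal sketch of the track-selection rules stated above.
# 'Labeled data' means labeled semantic relatedness/similarity datasets
# for units of text more than two words long, in any language.
def choose_track(uses_labeled_target_lang: bool,
                 uses_labeled_other_lang: bool) -> str:
    if uses_labeled_target_lang:
        return "Track A (Supervised)"
    if uses_labeled_other_lang:
        return "Track C (Cross-lingual)"
    return "Track B (Unsupervised)"

print(choose_track(False, True))  # -> Track C (Cross-lingual)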
Evaluation
The official evaluation metric for this task is the Spearman rank
correlation coefficient, which captures how well the system-predicted
rankings of test instances align with human judgments. You can find the
evaluation script for this shared task on our Github page
<https://github.com/semantic-textual-relatedness/Semantic_Relatedness_SemEva…>
.
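For orientation, the short Python sketch below (an illustration only, not
the official evaluation script linked above) shows how the Spearman rank
correlation between gold relatedness scores and system predictions can be
computed with scipy; the score lists are invented:

# Minimal sketch: Spearman rank correlation between gold relatedness
# scores (in [0, 1]) and system predictions for the same test instances.
from scipy.stats import spearmanr

gold_scores = [0.10, 0.45, 0.80, 0.95]  # invented gold labels
pred_scores = [0.20, 0.40, 0.70, 0.90]  # invented system predictions

correlation, p_value = spearmanr(gold_scores, pred_scores)
print(f"Spearman correlation: {correlation:.3f}")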
Helpful Links
-
Competition Website: https://codalab.lisn.upsaclay.fr/competitions/15704
-
Task Website: https://semantic-textual-relatedness.github.io
-
Twitter/X: https://twitter.com/SemRel2024
-
Contact organisers: semrel-semeval-organisers(a)googlegroups.com
-
Google group for participants: semrel-semeval-participants(a)googlegroups.com
Important Dates
-
Training data ready: 11 September 2023
-
Evaluation Starts: *20 January 2024*
-
Evaluation End: 31 January 2024
-
System Description Paper Due: 19 February 2024
- Notification of acceptance: 01 April 2024
-
Camera-ready Due: 22 April 2024
- SemEval workshop: 16-21 June (co-located with NAACL 2024)
NB. We will organise a QA mentorship tomorrow (January 16th 2024 from 4 to
5 pm GMT) and a system description writing tutorial in February for all
participants, especially students and junior researchers. The zoom links
will be shared by email and on Slack.
References
-
Shima Asaadi, Saif Mohammad, Svetlana Kiritchenko. 2019. Big BiRD: A
Large, Fine-Grained, Bigram Relatedness Dataset for Examining Semantic
Composition. Proceedings of the 2019 Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language
Technologies.
-
M. A. K. Halliday and R. Hasan. 1976. Cohesion in English. London:
Longman.
-
George A Miller and Walter G Charles. 1991. Contextual Correlates of
Semantic Similarity. Language and Cognitive Processes, 6(1):1–28
-
Mohamed Abdalla, Krishnapriya Vishnubhotla, and Saif Mohammad. 2023.
What Makes Sentences Semantically Related? A Textual Relatedness Dataset
and Empirical Study. In Proceedings of the 17th Conference of the European
Chapter of the Association for Computational Linguistics, pages 782–796,
Dubrovnik, Croatia. Association for Computational Linguistics.
Task Organizers
Nedjma Ousidhoum
Shamsuddeen Hassan Muhammad
Mohamed Abdalla
Krishnapriya Vishnubhotla
Vladimir Araujo
Meriem Beloucif
Idris Abdulmumin
Seid Muhie Yimam
Nirmal Surange
Christine De Kock
Sanchit Ahuja
Oumaima Hourrane
Manish Shrivastava
Alham Fikri Aji
Thamar Solorio
Saif M. Mohammad