The 2nd Workshop on Practical LLM-assisted Data-to-Text Generation
(Practical D2T 2024)
While large language models (LLMs) promise to become a viable alternative to
traditional rule-based data-to-text (D2T) natural language generation
(NLG), they still suffer from well-known neural model issues, such as lack
of controllability and the risk of producing harmful text. Many potential
solutions to these problems are still open for discussion.
The Practical D2T workshop at INLG 2024 aims to build a space for
researchers to discuss and present innovative work on D2T systems using
LLMs. Building upon the 2023 edition’s hackathon, Practical D2T 2024 opens
up a broader range of activities, including a special track for
neuro-symbolic D2T approaches and a shared task in D2T evaluation focused
on semantic accuracy.
Website: https://practicald2t.github.io/
Practical D2T 2023 at INLG 2023: https://practicald2t.github.io/2023/
Workshop Topic and Content
Practical D2T 2024 will be a full-day, in-person-only event. We welcome
both archival submissions of original, unpublished work and non-archival
submissions, in the form of long (8 pages) or short (4 pages) papers, on
topics including but not limited to:
- Design, implementation and evaluation of LLM-assisted D2T systems
- Cross-domain adaptation of LLMs for D2T
- User perceptions and acceptance of LLM-generated text in D2T
- Bias, fairness and red-teaming issues in LLM-assisted D2T systems
- Leveraging LLMs for D2T in low-resource languages and domains
- Error analysis and debugging techniques for LLM-assisted D2T
- Human-in-the-loop approaches for improving LLM-assisted D2T
- Comparison between LLM-assisted D2T and traditional symbolic approaches
Special Track: Neuro-Symbolic D2T
Research is currently seeing renewed interest in systems that combine
neural and symbolic approaches to improve explainability and reduce
dependence on training data. Practical D2T 2024 will feature a
special track on neuro-symbolic approaches to D2T. Submissions for papers
in the special track follow the same requirements and procedure as the main
workshop submissions.
Shared task: Improving Semantic Accuracy in LLM-assisted D2T
This year will feature a shared task on improving semantic accuracy of D2T
systems. Participants will build an LLM-assisted D2T system to generate
textual reports from various domains, such as weather forecasting, product
descriptions or sports reports. We will provide test data obtained from
public APIs, to limit the chance that the LLMs used have already been
exposed to it.
We encourage participants to focus on system robustness and objective
evaluation, rather than on metric scores. For this reason, participants will
receive an initial evaluation script, which they are encouraged to modify
and improve. All submitted systems' outputs will be evaluated with every
submitted custom evaluation, and the results will be correlated with human
ratings. The system reaching the highest correlation with human ratings
will be declared the winner of the competition. Results and participants'
system descriptions will be featured in the workshop proceedings.
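The ranking step of the shared task can be illustrated with a minimal Python sketch. The call does not say which correlation coefficient is used or whether it is computed per output or per system, so this sketch assumes Pearson correlation over the pooled output-level scores; the function names, evaluation names and toy data are all hypothetical.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sd_x * sd_y)


def rank_evaluations(metric_scores: dict[str, list[float]],
                     human_scores: list[float]) -> list[tuple[str, float]]:
    """Rank custom evaluations by their correlation with human ratings.

    metric_scores maps each (hypothetical) evaluation name to the scores it
    assigns to the pooled outputs of all submitted systems; human_scores holds
    the human ratings of the same outputs, in the same order.
    """
    ranked = [(name, pearson(scores, human_scores))
              for name, scores in metric_scores.items()]
    # Highest correlation first; the top entry would be the winner.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Toy example: five outputs rated by humans and by three hypothetical evaluations.
human = [1, 2, 3, 4, 5]
evaluations = {
    "eval_a": [2, 4, 6, 8, 10],  # perfectly linear in the human ratings
    "eval_b": [5, 4, 3, 2, 1],   # perfectly inverted
    "eval_c": [1, 3, 2, 5, 4],   # mostly agrees with humans
}
ranking = rank_evaluations(evaluations, human)
for name, r in ranking:
    print(f"{name}: r = {r:.2f}")
```

With this toy data, eval_a ranks first (r = 1.00), followed by eval_c (r = 0.80) and eval_b (r = -1.00). A rank correlation such as Spearman's could be swapped in for `pearson` if the organisers use one; the call does not specify this.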
For more info, visit the workshop website:
https://practicald2t.github.io/pages/cfp
Important dates
Note: all deadlines are 23:59 UTC-12.
- Evaluation script and data release for known domains (shared task): 24 June
- Regular paper submission (main & special track, archival & non-archival): 22 July
- Known domains system output submission & surprise domain data release: 29 July
- Surprise domain system outputs submission: 5 August
- System description submission (shared task): 12 August
- Notification of acceptance (main, special track and shared task): 19 August
- Camera-ready (main, special track and shared task): 28 August
- Workshop: 23/24 September (to be announced)
Contacts and more info:
Find detailed information about submission, deadlines and contacts on the
official Practical D2T 2024 website: https://practicald2t.github.io/
For any queries, contact the organisers at d2t2024(a)googlegroups.com
If you have any problems with the above mailing group, contact
balloccu(a)ufal.mff.cuni.cz
Organisers
Simone Balloccu, Ondřej Dušek, Patrícia Schmidtová, Zdeněk Kasner, Kristýna
Onderková, Ondřej Plátek, Mateusz Lango - Charles University (CZ)
Ehud Reiter - University of Aberdeen (UK)
Lucie Flek - University of Bonn (DE)
Simon Mille - ADAPT Centre (IE)
Dimitra Gkatzia - Edinburgh Napier University (UK)
*Call for Papers: *The First Workshop on Natural Language Argument-Based
Explanations (ArgNLE - https://argnle.github.io/ECAI-ArgNLE/)
Co-located with ECAI 2024 (https://www.ecai2024.eu/). Universidad de
Santiago de Compostela, Spain.
*Workshop description*
Explainability and Computational Argumentation have usually been
approached as separate, independent research topics, which neglects many
aspects arising from considering the interdependencies between them. To
be effective for human users, explanations are required to be formulated
in natural language, possibly in an argumentative fashion. This workshop on
Natural Language Argument-based Explanations aims to investigate this
challenging topic, at the crossroads of these different research fields.
Providing high-quality explanations for AI predictions
based on machine learning is a challenging and complex task. To work
well it requires, among other factors: selecting a proper level of
generality/specificity of the explanation; considering assumptions about
the familiarity of the explanation beneficiary with the AI task under
consideration; referring to specific elements that have contributed to
the decision; making use of additional knowledge (e.g., metadata) which
might not be part of the prediction process; selecting appropriate
examples; providing evidence supporting negative hypotheses. Finally,
the system needs to formulate the explanation in a clearly
interpretable, and possibly convincing, way.
Given these considerations, the workshop welcomes contributions showing
an integrated vision of Explainable AI (XAI), where low-level
characteristics of the deep learning process are combined with
higher-level schemas typical of human argumentation. This integrated
vision relies on three main considerations: i) In neural
architectures the correlation between internal states of the network and
the justification of the network classification outcome is not well
studied; ii) High quality explanations are crucially based on
argumentation mechanisms (e.g., provide supporting examples and rejected
alternatives); iii) In real settings, providing explanations is
inherently an interactive process involving the system and the user.
Accordingly, the workshop calls for cross-disciplinary contributions in
three areas, i.e., deep learning, argumentation and interactivity, to
support a broader and innovative view of explainable AI. More precisely,
the workshop is intended to discuss research challenges that will help
advance the state of the art in explainable AI. Providing explanations to
support a certain conclusion has been extensively studied in logic, as a
fundamental characteristic of human reasoning. As a result, both
theoretical and computational models of human argumentation have been
investigated. The recent resurgence of AI has highlighted the idea that
low-level system behaviors not only need to be interpretable (e.g., showing
those elements that most contributed to the system decision), but also
need to fit high-level human schemas to produce convincing arguments.
*Topics of interest*
* Natural language argument-based explanations
* Dialectical, dialogical and conversational explanations
* AI methods to support argumentative explainability
* User-acceptance and evaluation of argumentation-based explanations
* Tools that provide argumentation-based explanations
* Use of argument-based explanations for research from the social
sciences, digital humanities, and related fields
* Real-world applications
The workshop solicits the submission of three types of contributions
relevant to the workshop topics and suitable to generate discussion:
* Original, unpublished contributions
* Dataset-related submissions (presenting a dataset or a corpus
related to the workshop topics that has been developed or is currently
under development; these papers may have already been published in
another venue).
* Project-related submissions (presenting funded projects or lines of
work within the topics of the workshop, both academic and industrial).
*Invited speaker*
Professor Francesca Toni, Faculty of Engineering, Department of
Computing, Imperial College London, UK.
(https://www.imperial.ac.uk/people/f.toni)
*Important Dates*
* Paper submission: 31 May 2024
* Notification of acceptance: 1 July 2024
* Camera-ready papers: 31 July 2024
* ArgNLE workshop: 19 or 20 October 2024
*Submission Instructions*
Papers must be written in English, be prepared for double-blind review
using the ECAI LaTeX template, and not exceed 7 pages (not including
references). The ECAI LaTeX Template can be found at
https://ecai2024.eu/download/ecai-template.zip. Papers should be
submitted via EasyChair: https://easychair.org/conferences/?conf=argnle2024
*Workshop Organizers:*
* Rodrigo Agerri <https://ragerri.github.io/> - HiTZ Center - Ixa,
University of the Basque Country UPV/EHU, Spain
* Elena Cabrio <https://www-sop.inria.fr/members/Elena.Cabrio/> -
Université Côte d’Azur, Inria, CNRS, I3S, France
* Serena Villata <https://webusers.i3s.unice.fr/~villata/Home.html> -
Université Côte d’Azur, Inria, CNRS, I3S, France
* Marcin Lewinski <https://ifilnova.pt/en/people/marcin-lewinski/> -
IFILNOVA, Universidade Nova de Lisboa, Portugal
* Bernardo Magnini <http://hlt.fbk.eu/people/magnini> - Fondazione
Bruno Kessler, Italy
* Marie-Francine Moens <https://people.cs.kuleuven.be/~sien.moens/> -
KU Leuven, Belgium
The Institute of Artificial Intelligence at Leibniz University Hannover invites applications for the position of a
DOCTORAL OR POSTDOCTORAL RESEARCHER (M/F/D)
ON THE TOPIC OF NATURAL LANGUAGE PROCESSING (NLP) FOR SOCIAL GOOD
(SALARY SCALE 13 TV-L, 100%)
starting in September 2024 or soon afterwards. The position is limited to a period of three years with the possibility of extension.
TASKS
The goal of the offered position is to carry out innovative research on NLP, aiming for scientific publications at reputed international venues. The research should involve LARGE LANGUAGE MODELS (LLMs) related to NLP FOR SOCIAL GOOD. We support candidates in developing their own research directions within this broad context.
The position also comes with a teaching duty of four hours per week; the candidate is expected to lead tutorials and/or programming labs as well as to support the supervision of bachelor's and master’s students.
We are looking for highly motivated candidates with a passion for creativity and learning who seek to make a positive impact through open and independent research in a young team.
YOUR PROFILE
- Completed academic degree (Master or comparable) in computer science, computational linguistics, artificial intelligence, or related disciplines
- Solid understanding of machine learning with hands-on experience, ideally in the context of NLP and LLMs
- Proficient programming skills in Python
- Good scientific writing skills (for example, demonstrated by a very good master’s thesis)
- Strong communication skills in English, both oral and written
TEAM
The position will be placed in the NLP Group at the Institute of Artificial Intelligence. We are a diverse and international team, studying how humans express their views and intentions in language, and how LLMs can understand and create such language in a fair, trustworthy, and explainable way.
Our research tackles interdisciplinary questions from the humanities and social sciences, while building on state-of-the-art NLP techniques, such as instruction fine-tuning and contrastive learning. We seek to do cutting-edge research on artificial intelligence methods that have a positive impact on society and the world.
OUR OFFER
- Creative and innovative work in a diverse and international team
- Possibility to obtain a Ph.D. degree or to shape your Postdoc profile
- State-of-the-art research facilities, including top-notch computing clusters
- Participation in international scientific events and research collaborations
- Salary at the level of 100% of salary scale 13 according to the Collective Agreement for the Public Service of the Länder (TV-L)
D&I
Leibniz University Hannover considers itself a family-friendly university and therefore promotes a balance between work and family responsibilities. Part-time employment can be arranged upon request.
The university aims to promote equality between women and men. For this purpose, the university strives to reduce under-representation in areas where a certain gender is under-represented. Women are under-represented in the salary scale of the advertised position. Therefore, qualified women are encouraged to apply. Moreover, we welcome applications from qualified men. Preference will be given to equally qualified applicants with disabilities.
QUESTIONS
In case you have questions, please contact Maja Stahl (email: m.stahl(a)ai.uni-hannover.de). Further information about the NLP Group can be found at: https://www.ai.uni-hannover.de/en/institute/research-groups/nlp
For information on the salary scales, see: https://oeffentlicher-dienst.info/c/t/rechner/tv-l/west?id=tv-l-2023&matrix…
APPLICATION
Please submit your application with supporting documents (including CV, full set of transcripts, a brief statement of at most 1 page of why you apply to the NLP Group, and possibly further qualifications) by June 23, 2024 as A SINGLE PDF FILE to
Email: office(a)ai.uni-hannover.de (subject: “[ai-nlp] Application”)
or alternatively by post to:
Gottfried Wilhelm Leibniz Universität Hannover
Institute of Artificial Intelligence
Prof. Dr. Henning Wachsmuth
Welfengarten 1, 30167 Hannover
Germany
http://www.uni-hannover.de/jobs
Information on the collection of personal data according to article 13 GDPR can be found at https://www.uni-hannover.de/en/datenschutzhinweis-bewerbungen/.
9th Symposium on Corpus Approaches to Lexicogrammar (LxGr2024)
5-6 July 2024. Online. Attendance is free.
Symposium programme: https://sites.edgehill.ac.uk/lxgr/lxgr2023
Registration is now open:
https://store.edgehill.ac.uk/conferences-and-events/conferences/conferences…
If you have any questions, or if you want to be added to the LxGr mailing list, contact: lxgr(a)edgehill.ac.uk.
KGLLM 2024 : Special session on Knowledge Graphs and Large Language Models
Oct 19, 2024
Trento, Italy
Link: https://www.icnlsp.org/2024welcome/#special_session
The Special session on Knowledge Graphs and Large Language Models will be
held within the 7th International Conference on Natural Language and Speech
Processing (ICNLSP 2024 <https://www.icnlsp.org/2024welcome/>) on October
19, 2024.
** DESCRIPTION **
“In recent years, the fields of Knowledge Graphs (KGs) and Large Language
Models (LLMs) have witnessed remarkable advancements, revolutionizing the
landscape of artificial intelligence and natural language processing. KGs,
structured representations of knowledge, and LLMs, powerful language models
trained on vast amounts of text data, have individually demonstrated their
prowess in various applications.
However, the integration and synergy between KGs and LLMs have emerged as a
new frontier, offering unprecedented opportunities for enhancing knowledge
representation, understanding, and generation. This integration not only
enriches the semantic understanding of textual data but also empowers AI
systems with the ability to reason, infer, and generate contextually
relevant responses.
** TOPICS **
This special session aims to delve into the theoretical foundations,
historical perspectives, and practical applications of the fusion between
Knowledge Graphs and Large Language Models. We invite contributions that
explore the following areas:
1- Theoretical Frameworks: Papers elucidating the theoretical underpinnings
of integrating KGs and LLMs, including methodologies, algorithms, and models
for knowledge-enhanced language understanding and generation.
2- Historical Perspectives: Insights into the evolution of KGs and LLMs,
tracing their development trajectories, seminal works, and transformative
milestones leading to their integration.
3- Design and Implementation: Research articles focusing on the design
principles, architectures, and techniques for effectively combining KGs and
LLMs to facilitate tasks such as information retrieval, question answering,
knowledge inference, and natural language understanding.
4- Explanatory Capabilities: Explorations into how the fusion of KGs and
LLMs enables the development of explainable AI systems, providing
transparent and interpretable insights into model decisions and outputs.
5- Human-Centered Intelligent Systems: Studies examining the design and
deployment of interactive AI systems that leverage KGs and LLMs to
facilitate seamless human-computer interaction, catering not only to
experts but also to a broader lay audience.
We encourage submissions that contribute to advancing our understanding of
the synergistic relationship between Knowledge Graphs and Large Language
Models, fostering interdisciplinary collaborations across computer science,
artificial intelligence, linguistics, cognitive science, and beyond. By
shedding light on this burgeoning area of research, this special session
aims to propel the field forward and inspire future innovations in
AI-driven knowledge representation and natural language processing.”
** SESSION ORGANIZERS **
Gérard Chollet, CNRS-SAMOVAR Institut Polytechnique de Paris, France.
Hugues Sansen, Institut Polytechnique de Paris, France.
** IMPORTANT DEADLINES **
Submission deadline: 30 June 2024 11:59 PM (GMT)
Notification of acceptance: 15 September 2024
Camera-ready paper due: 25 September 2024
** PUBLICATION **
The accepted papers will be included in the ICNLSP Conference proceedings,
which will be published in the ACL Anthology. The extended versions will be
published in a special issue of the Machine Learning and Knowledge
Extraction Journal (MAKE), indexed in Web of Science, Scopus, etc.
** CONTACT **
icnlsp(at)gmail(dot)com
Title: Structural Biases for Compositional Semantic Prediction
# Scientific context
Compositionality is a foundational hypothesis in formal semantics and
states that the semantic interpretation of an utterance is a function of
the interpretations of its parts and how they are combined (i.e. their
syntactic structure). In NLP, the current dominant paradigm is to design
end-to-end models with no intermediate linguistically interpretable
representations, which is often motivated by the fact that pretrained
language models implicitly encode latent syntactic representations.
However, recent studies suggest that the syntactic information learned by
language models is insufficient and that, in their current form, they are
unable to exploit the syntactic information provided in their input when
they need to generate a structured output.
Strikingly, most systems that obtained decent results on compositional
generalization benchmarks either (i) include some data augmentation methods
that increase the exposure of the model to diverse syntactic structures at
training time, or (ii) resort to a natural language parser and hand-crafted
rules to derive the semantic representation from the syntactic tree. These
two approaches are effective, but they still have limitations that need to
be addressed. Firstly, data augmentation bypasses the issue altogether, is
tied to a particular dataset or task and requires additional computation,
both for generating new data and for re-training or fine-tuning models.
Secondly, approach (ii) leaves the seq2seq framework for a more
conceptually complex framework, and often uses architectures that are tied
to specific data or tasks. In contrast, we believe that with proper
built-in inductive biases, a seq2seq model might provide a simple, yet
effective solution to the structural compositionality issue.
# PhD Proposal
The goal of this PhD will be to explore inductive biases related to
linguistic structures, in an attempt to build small NLP models with
compositional skills, i.e. models with built-in knowledge making them able
to infer generalization rules from few data points. Research directions
will be defined together with the successful applicant (who is encouraged
to bring their own ideas!) and may include:
- Learning invariant language representations. A risk of learning from
little data or rare phenomena is that a model may rely on spurious
correlations and be unable to generalize outside a specific context.
Developing representations that are invariant to noise has been proposed as
a way of improving generalization (Peyrard et al., 2022). We propose to
formalize invariants related to syntactic and semantic structures and
explore ways to integrate them during the training phase.
- Syntactically constrained decoders. Unlike parsers, seq2seq models are
unable to generate structures unseen at training time. We propose to explore
the use of structural constraints to guide decoding in seq2seq models.
# Important information:
- Starting date: between September and December 2024 (duration 3 years)
- Place of work: Laboratoire d’Informatique de Grenoble, CNRS, Grenoble,
France
- Funding: ANR project “COMPO: Inductive Biases for
Compositionality-capable Deep Learning Models of Natural Language”
(2024-2028)
- Partners: Université Paris Cité, Université Aix-Marseille, Université
Grenoble Alpes
- The PhD will be supervised by Éric Gaussier and Maximin Coavoux, in close
collaboration with other partners from the COMPO consortium. The PhD
candidate will be part of two LIG teams: GETALP and APTIKAL.
- Salary: ~2300€ gross/month
- Profile: Master’s degree in NLP or computer science; experience in NLP and
machine learning
To apply, please send your CV, cover letter and most recent academic
transcripts to eric.gaussier(a)univ-grenoble-alpes.fr and
maximin.coavoux(a)univ-grenoble-alpes.fr
References:
SLOG: A Structural Generalization Benchmark for Semantic Parsing
Bingzhi Li, Lucia Donatelli, Alexander Koller, Tal Linzen, Yuekun Yao,
Najoung Kim
<https://aclanthology.org/2023.emnlp-main.194/>
Structural generalization is hard for sequence-to-sequence models
Yuekun Yao, Alexander Koller
<https://aclanthology.org/2022.emnlp-main.337/>
Compositional Generalization Requires Compositional Parsers
Pia Weißenhorn, Yuekun Yao, Lucia Donatelli, Alexander Koller
<https://arxiv.org/abs/2202.11937>
Invariant Language Modeling
Maxime Peyrard, Sarvjeet Ghotra, Martin Josifoski, Vidhan Agarwal, Barun
Patra, Dean Carignan, Emre Kiciman, Saurabh Tiwary, Robert West
<https://aclanthology.org/2022.emnlp-main.387/>
We are delighted to invite you to ICNLSP 2024
<https://www.icnlsp.org/2024welcome/>, the 7th edition of the International
Conference on Natural Language and Speech Processing, which will be held at
the University of Trento from October 19th to 20th, 2024 (*HYBRID*).
*Topics*
- Signal processing, acoustic modeling.
- Speech recognition (Architecture, search methods, lexical modeling,
language modeling, language model adaptation, multimodal systems,
applications in education and learning, zero-resource speech recognition,
etc.).
- Speech Analysis.
- Paralinguistics in Speech and Language (Perception of paralinguistic
phenomena, analysis of speaker states and traits, etc.).
- Spoken Dialog Systems and Conversational Analysis
- Speech Translation.
- Speech synthesis.
- Speaker verification and identification.
- Language identification
- Speech coding.
- Speech enhancement
- Speech intelligibility
- Speech Perception
- Speech Production
- Brain studies on speech
- Phonetics, phonology and prosody.
- Speech and hearing disorders.
- Paralinguistics of pathological speech and language.
- Speech technology for disordered speech/hearing.
- Cognition and natural language processing.
- Machine translation.
- Text categorization.
- Summarization.
- Sentiment analysis and opinion mining.
- Computational Social Web.
- Arabic dialects processing.
- Under-resourced languages: tools and corpora.
- Large language models.
- Arabic OCR.
- NLP tools for software requirements and engineering.
- Knowledge fundamentals.
- Knowledge management systems.
- Information extraction.
- Data mining and information retrieval.
- Lexical semantics and knowledge representation.
- Requirements engineering and NLP.
- NLP for Arabic heritage documents.
*Submission*
Papers must be submitted via the link:
https://cmt3.research.microsoft.com/ICNLSP2024/
<https://cmt3.research.microsoft.com/ICNLSP2024/>
Each submitted paper will be reviewed by three program committee members. The
reviewing process is double-blind. Authors can use the *ACL format*: *LaTeX*
<https://www.icnlsp.org/ACL%202023%20Proceedings%20Template.zip> or Word.
Authors can submit their papers as long or short papers. Long papers
consist of up to 8 pages of content + references; short papers, of up to 4
pages of content + references.
*Important dates*
*Submission deadline:* *30 June 2024 11:59 PM (GMT)*
*Notification of acceptance:* 15 September 2024
*Camera-ready paper due:* 25 September 2024
*Conference dates:* 19, 20 October 2024
*Publication*
*1- All accepted papers will be published in the ACL Anthology <https://aclanthology.org/>.*
*2- Selected papers will be published (after extension) in:*
*2-a-* A *SPECIAL ISSUE* <https://www.mdpi.com/journal/make/special_issues/POB4VNE0QP> of the Machine Learning and Knowledge Extraction Journal <https://www.mdpi.com/journal/make> (MAKE), indexed in *Web of Science* <https://mjl.clarivate.com/search-results>, *Scopus* <https://www.scopus.com/sources.uri>, etc.
*Special issue title*: *Knowledge Graphs and Large Language Models* <https://www.mdpi.com/journal/make/special_issues/POB4VNE0QP>
*2-b-* Signals and Communication Technology (Springer), indexed in *Scopus* <https://www.scopus.com/> and *zbMATH* <https://zbmath.org/>.
Dear all,
we are happy to invite you to participate in the Shared Task on Quality Estimation at WMT'24.
The details of the task can be found at: https://www2.statmt.org/wmt24/qe-task.html
New this year:
* We introduce a new language pair (zero-shot): English-Spanish
* Continuing from the previous edition, we will also analyse the robustness of submitted QE systems to a range of phenomena, from hallucinations and biases to localized errors, that can significantly impact real-world applications.
* We also introduce a new task, seeking not only to detect but also to correct errors: Quality-aware Automatic Post-Editing! We invite participants to submit systems capable of automatically generating QE predictions for machine-translated text and the corresponding output corrections.
2024 QE Tasks:
Task 1 -- Sentence-level quality estimation
This task follows the same format as last year but with fresh test sets and a new language pair: English-Spanish. We will test the following language pairs:
* English to German (MQM)
* English to Spanish (MQM)
* English to Hindi (MQM & DA)
* English to Gujarati (DA)
* English to Telugu (DA)
* English to Tamil (DA)
More details: https://www2.statmt.org/wmt24/qe-subtask1.html
Task 2 -- Fine-grained error span detection
Sequence labelling task: predict the error spans in each translation and the associated error severity: Major or Minor.
We will test the following language pairs:
* English to German (MQM)
* English to Spanish (MQM)
* English to Hindi (MQM)
More details: https://www2.statmt.org/wmt24/qe-subtask2.html
Task 3 -- Quality-aware Automatic Post-editing
We expect submissions of post-edits correcting the detected error spans of the original translation. Although the task is focused on quality-informed APE, we also allow participants to submit APE output without QE predictions, to understand the impact of their QE system. Submissions without QE predictions will also be considered official.
We will test the following language pairs:
* English to Hindi
* English to Tamil
More details: https://www2.statmt.org/wmt24/qe-subtask3.html
Important dates:
1. Test sets will be released on July 15th.
2. Participants can submit their systems by July 23rd on CodaLab.
3. System paper submissions are due by August 20th [aligned with WMT deadlines].
Note: Like last year, we aligned with the General MT and Metrics shared tasks to facilitate cross-submission on the common language pairs: English-German, English-Spanish, and English-Hindi (MQM).
We look forward to your submissions and feel free to contact us if you have any more questions!
Best wishes,
on behalf of the organisers.
The original post is here: https://www.informatik.tu-darmstadt.de/ukp/ukp_home/jobs_ukp/2021_associate…
Are you passionate about making a difference in the field of mental health through cutting-edge research in AI and Natural Language Processing? Do you have a strong background in computer science, data science, or a related field? If so, we invite you to join our dynamic and interdisciplinary team at the Technical University of Darmstadt!
Position: Full-Time Research Assistant (i.e., doctoral candidate or PhD student)
Duration: 1.10.2024 or soon afterward - 31.12.2027 with the possibility of extension.
Location: Department of Computer Science, Technical University of Darmstadt
Responsibilities:
- Conduct cutting-edge research in NLP with a focus on mental health applications.
- Focus on research topics, such as NLP and knowledge discovery for mental health, large language models for clinical applications, and multimodal clinical data analysis.
- Develop and implement algorithms for analyzing therapist-patient conversations.
- Collaborate with a diverse team of researchers from TU Darmstadt and other partner institutions.
Ecosystem: We are part of DYNAMIC, the newly approved interdisciplinary LOEWE-funded center “Dynamic Network Approach of Mental Health to Stimulate Innovations for Change.” Our mission is to advance the understanding and treatment of mental health disorders using AI, NLP, and multimodal data analysis.
Team: Dr. Shaoxiong Ji (https://www.helsinki.fi/~shaoxion/) will join TU Darmstadt this fall and establish a junior independent research group focusing on foundation models and their applications in areas such as healthcare. His research spans a wide range of directions, including NLP for health, multilingual LLMs, and learning methods such as federated learning, multitask learning, and meta-learning. The newly established research group will closely collaborate with the research labs led by Prof. Iryna Gurevych and Prof. Kristian Kersting, as well as with partners under the umbrella of the DYNAMIC project.
Qualifications:
- A Master’s degree in Computer Science, Data Science, AI, NLP, or a related field.
- Strong programming skills in Python or other relevant languages.
- Experience with deep learning frameworks
- Excellent problem-solving abilities and a passion for research.
- Previous experience in clinical NLP or multimodal data analysis is a plus but not required.
- Strong communication skills and the ability to work effectively in a collaborative environment.
What We Offer:
- An exciting opportunity to contribute to impactful research in mental health.
- A supportive and collaborative research environment.
- Opportunities for professional development and growth within the DYNAMIC project and beyond.
How to Apply: If you are enthusiastic about joining our team and contributing to groundbreaking research, please submit the following documents:
- Detailed CV
- Master’s degree certificate, plus Bachelor’s and Master’s transcripts
- Cover letter outlining your motivation and relevant experience
- Contact information for at least two academic or professional references
Please send your application to Shaoxiong Ji <shaoxiong.ji(a)outlook.com> by July 31st, 2024. After that, the positions will remain open until filled. We will consider applications as soon as they are submitted.
Join us in making a real-world impact on mental health through the power of AI and NLP!
Shared task on Multilingual Grammatical Error Correction (MultiGEC-2025)
We invite you to participate in the shared task on Multilingual Grammatical Error Correction, MultiGEC-2025, covering over 10 languages, including Czech, English, Estonian, German, Icelandic, Italian, Latvian, Slovene, Swedish and Ukrainian.
The results will be presented on March 5 (or March 2), 2025, at the NLP4CALL workshop, co-located with the NoDaLiDa conference (https://www.nodalida-bhlt2025.eu/conference), which will be held in Tallinn, Estonia, on 2–5 March 2025.
The publication venue for system descriptions will be the proceedings of the NLP4CALL workshop.
Official system evaluation will be carried out on CodaLab.
* TASK DESCRIPTION
In this shared task, your goal is to rewrite learner-written texts so that they are either grammatically correct, or both grammatically correct and idiomatic. In other words, you may either adhere to the "minimal correction" principle or apply fluency edits.
For instance, the text
> My mother became very sad, no food. But my sister better five months later.
can be corrected minimally as
> My mother became very sad, and ate no food. But my sister felt better five months later.
or with fluency edits as
> My mother was very distressed and refused to eat. Luckily, my sister recovered five months later.
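One rough way to see the difference between the two correction styles is to count how much of the text each one changes. The sketch below is purely illustrative. The helper `edited_token_ratio` is hypothetical and is not one of the shared task's metrics. It assumes naive whitespace tokenization and uses Python's standard `difflib.SequenceMatcher` to align source and corrected tokens.

```python
from difflib import SequenceMatcher


def edited_token_ratio(source: str, corrected: str) -> float:
    """Hypothetical helper: share of tokens touched by the correction.

    Assumes whitespace tokenization; this is an illustration only,
    not an official MultiGEC-2025 metric.
    """
    src = source.split()
    cor = corrected.split()
    matcher = SequenceMatcher(a=src, b=cor, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A replace/insert/delete touches the larger of the two spans.
            changed += max(i2 - i1, j2 - j1)
    return changed / max(len(src), len(cor), 1)


if __name__ == "__main__":
    learner = "My mother became very sad, no food. But my sister better five months later."
    minimal = "My mother became very sad, and ate no food. But my sister felt better five months later."
    fluent = ("My mother was very distressed and refused to eat. "
              "Luckily, my sister recovered five months later.")
    print(f"minimal: {edited_token_ratio(learner, minimal):.2f}")
    print(f"fluency: {edited_token_ratio(learner, fluent):.2f}")
```

In the minimal correction, only a few inserted tokens differ from the learner text, so its ratio is much lower than that of the fluency-edited version, which rewrites most of the first sentence.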
To evaluate both approaches to the correction task fairly, we will provide two evaluation metrics: one favoring minimal correction and one suited to fluency-edited output (read more under Evaluation).
We particularly encourage the development of multilingual systems that can process all (or several) languages with a single model, but this is not a requirement for participating in the task.
* DATA
We provide training, development and test data for each of the languages. The training and development splits will be made available through GitHub. Evaluation will be performed on a separate test set.
See the website for more detailed information: https://github.com/spraakbanken/multigec-2025/
* EVALUATION
During the shared task, evaluation will be based on cross-lingually applicable automatic metrics, primarily:
- GLEU score (reference-based)
- Scribendi score (reference-free)
For comparability with previous results, we will also provide F0.5 scores.
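F0.5 is the F-beta measure with beta = 0.5, which weights precision more heavily than recall. For grammatical error correction, this means it rewards systems that avoid unnecessary edits. The minimal sketch below shows only the arithmetic. It assumes that edit-level true positive, false positive and false negative counts are already available, for example from an edit-alignment scorer. The function name and the convention of returning 1.0 for an empty precision or recall denominator are assumptions, and the official scorer may differ.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta score from edit-level counts (illustrative sketch).

    With beta < 1, precision counts more than recall.
    Empty denominators default to 1.0 (an assumed convention).
    """
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # 3 correct edits, 1 spurious edit, 3 missed edits:
    # precision = 0.75, recall = 0.5
    print(f"F0.5 = {f_beta(3, 1, 3):.4f}")
    print(f"F1   = {f_beta(3, 1, 3, beta=1.0):.4f}")
```

For these counts, F0.5 (about 0.68) is higher than F1 (0.60) because the system's precision exceeds its recall.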
After the shared task, we also plan to carry out a human evaluation experiment on a subset of the submitted results.
* TIMELINE (preliminary)
- June 18, 2024 - first call for participation
- September 20, 2024 - second call for participation
- October 20, 2024 - third call for participation. Training and validation data released, CodaLab opens for team registrations
- October 30, 2024 - reminder. Validation server released online
- November 13, 2024 - test data released
- November 20, 2024 - system submission deadline (system output)
- November 29, 2024 - results announced
- December 20, 2024 - paper submission deadline with system descriptions
- January 20, 2025 - paper reviews sent to the authors
- February 7, 2025 - camera-ready deadline
- March 5 (or March 2), 2025 - presentations of the systems at the NLP4CALL workshop
* PUBLICATION
We encourage you to submit a paper with your system description to the NLP4CALL workshop special track. We follow the same requirements for paper submissions as the NLP4CALL workshop, i.e. we use the same template and apply the same page limit. All papers will be reviewed by the organizing committee. Upon paper publication, we encourage you to share models, code, fact sheets, extra data, etc. with the community through GitHub or other repositories.
* ORGANIZERS
- Arianna Masciolini, University of Gothenburg, Sweden
- Andrew Caines, University of Cambridge, UK
- Orphée De Clercq, Ghent University, Belgium
- Murathan Kurfali, Stockholm University, Sweden
- Ricardo Muñoz Sánchez, University of Gothenburg, Sweden
- Elena Volodina, University of Gothenburg, Sweden
- Robert Östling, Stockholm University, Sweden
* DATA PROVIDERS (more languages to come)
- Czech: Alexandr Rosen, Charles University, Prague
- English: Andrew Caines, University of Cambridge
- Estonian:
-- Mark Fishel, University of Tartu, Estonia
-- Kais Allkivi-Metsoja, Tallinn University, Estonia
-- Kristjan Suluste, Eesti Keele Instituut, Estonia
- German:
-- Torsten Zesch, Fernuniversität in Hagen, Germany
-- Andrea Horbach, Fernuniversität in Hagen, Germany
- Icelandic: Isidora Glisič, University of Iceland
- Italian: Jennifer-Carmen Frey, Eurac Research Bolzano, Italy
- Latvian:
-- Roberts Darģis, University of Latvia
-- Ilze Auzina, University of Latvia
- Slovene: Špela Arhar Holdt, University of Ljubljana, Slovenia
- Swedish: Arianna Masciolini, University of Gothenburg, Sweden
- Ukrainian:
-- Oleksiy Syvokon, Microsoft
-- Mariana Romanyshyn, Grammarly
* CONTACT
Please join the MultiGEC-2025 Google group (https://groups.google.com/g/multigec-2025) in order to ask questions, hold discussions and browse for already answered questions.