****We apologize for multiple postings of this e-mail****
CALL FOR PARTICIPATION
FIRE 2024 Task - CoLI-Dravidian: Word-level Code-Mixed Language
Identification in Dravidian Languages
Held as a shared task in the 16th meeting of Forum for Information
Retrieval Evaluation (FIRE 2024 <http://fire.irsi.org.in/fire/2024/home>)
December 12-15, 2024. DAIICT, Gandhinagar, India
Website:
https://sites.google.com/view/coli-dravidian-2024/datasets?authuser=0
Codalab link: https://codalab.lisn.upsaclay.fr/competitions/19357
Dear All,
We are inviting researchers and students to participate in the shared task
CoLI-Dravidian: Word-level Code-Mixed Language Identification in Dravidian
Languages, which is held as a shared task in the 16th meeting of Forum for
Information Retrieval Evaluation (FIRE 2024
<http://fire.irsi.org.in/fire/2024/home>).
Language Identification (LI) involves detecting the language(s) used in a
given text, which is a preliminary step for many applications such as
sentiment analysis, machine translation, information retrieval, and natural
language understanding. In multilingual India, especially among the youth,
social media often features code-mixed text, blending local languages with
English at various levels. However, this poses significant challenges for
LI, particularly when languages are mixed within a single word. Dravidian
languages, extensively spoken in southern India, are under-resourced
despite their rich morphological structure. These languages face
technological challenges, especially in script representation on digital
platforms, leading users to prefer Roman or hybrid scripts for
communication. This prevalent code-mixing offers vast linguistic data for
research yet remains understudied.
To address word-level LI challenges in code-mixed Dravidian languages, we
are conducting a shared task by providing code-mixed datasets for four
languages - Kannada, Tamil, Malayalam, and Tulu, to encourage the
development of advanced LI models.
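For orientation only: word-level LI means assigning a language tag to every token of a code-mixed sentence. The sketch below is a hypothetical baseline (not the official starter code; the real data, labels and tag set come from the CodaLab page above, and scikit-learn is assumed here purely for illustration). Character n-gram features are a common choice for short Romanized words:

# Minimal word-level language identification baseline (illustrative sketch only,
# not the official CoLI-Dravidian starter code). Toy words and toy tags; the real
# training data and tag set come from the CodaLab competition page.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_words = ["nanna", "friend", "bandidane", "super", "maga"]   # hypothetical tokens
train_tags  = ["kn",    "en",     "kn",        "en",    "kn"]     # hypothetical tags

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 4)),  # character n-grams per word
    LogisticRegression(max_iter=1000),
)
model.fit(train_words, train_tags)
print(model.predict(["machaa", "movie"]))  # one tag per word

Participants are of course free to use any approach, including transformer-based taggers.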
There will be a real-time leaderboard, and the participants will be allowed
to make a maximum of 10 submissions in the training phase and 5 submissions
in the testing phase through CodaLab. Each team will have to select the
best submission for ranking.
To download the data and participate, go to:
https://codalab.lisn.upsaclay.fr/competitions/19357.
Best regards,
The CoLI-Dravidian 2024 Organizing Committee
Important dates
- 14th June 2024 – track websites open and training data release
- 1st July 2024 – test data release
- 25th July 2024 – run submission deadline
- 27th July 2024 – results declared
- 27th August 2024 – working notes due
- 10th September 2024 – reviews
- 30th October 2024 – camera-ready copies of working notes
NOTE: All dates mentioned here are in the AoE (Anywhere on Earth) zone.
Organizing Committee
- Shashirekha Hosahalli Lakshmaiah, Department of Computer Science, Mangalore University, India
- Ameeta Agrawal, Department of Computer Science, Portland State University, USA
- Fazlourrahman Balouchzahi, CIC, IPN, Mexico
- Asha Hegde, Department of Computer Science, Mangalore University, India
- Sabur Butt, IFE, Tecnologico de Monterrey, Mexico
- Sharal Coelho, Department of Computer Science, Mangalore University, India
- Kavya G, Department of Computer Science, Mangalore University, India
- Harshitha, Department of Computer Science, Mangalore University, India
- Sonith D, Department of Computer Science, Mangalore University, India
*Sabur Butt, Ph.D. *(He/Him)
Institute for the Future of Education (IFE)
*Tecnológico de Monterrey, Mexico*
Address: Av. Eugenio Garza Sada 2501 Sur Tecnológico, 64849 Monterrey, N.L.
LinkedIn <https://www.linkedin.com/in/saburb> - GitHub
<https://github.com/saburbutt> - Scholar
<https://scholar.google.com/citations?user=re7md-0AAAAJ&hl=en> - Website
<https://saburbutt.github.io/>
*GenBench: The second workshop on generalisation (benchmarking) in NLP*
*Workshop description*
The ability to generalise well is often mentioned as one of the primary desiderata for models of natural language processing (NLP).
Yet, there are still many open questions related to what it means for an
NLP model to generalise well, and how generalisation should be evaluated.
LLMs, trained on gigantic training corpora that are – at best – hard to
analyse or not publicly available at all, bring a new set of challenges to
the topic.
The second GenBench workshop aims to serve as a cornerstone to catalyse
research on generalisation in the NLP community.
The workshop aims to bring together different expert communities to discuss
challenging questions relating to generalisation in NLP, crowd-source
challenging generalisation benchmarks for LLMs, and make progress on open
questions related to generalisation.
Topics of interest include, but are not limited to:
- Opinion or position papers about generalisation and how it should be
evaluated;
- Analyses of how existing or new models generalise;
- Empirical studies that propose new paradigms to evaluate
generalisation;
- Meta-analyses that compare results from different generalisation studies;
- Meta-analyses that study how different types of generalisation are
related;
- Papers that discuss how generalisation of LLMs can be evaluated;
- Papers that discuss why generalisation is (not) important in the era
of LLMs;
- Studies on the relationship between generalisation and fairness or
robustness.
The second GenBench workshop on generalisation (benchmarking) in NLP will
be co-located with EMNLP 2024.
*Submission types*
We call for two types of submissions: regular workshop submissions and
collaborative benchmarking task submissions.
The latter will consist of a data/task artefact and a companion paper
motivating and evaluating the submission.
In both cases, we accept archival papers and extended abstracts.
*1. Regular workshop submissions*
Regular workshop submissions present papers on the topic of generalisation
(see examples listed above).
Regular workshop papers may be submitted as archival papers, when they report on completed, original and unpublished research, or otherwise as shorter extended abstracts.
More details on this category can be found below.
If you are unsure whether a specific topic is well-suited for submission,
feel free to reach out to the organisers of the workshop at
genbench(a)googlegroups.com.
*2. Collaborative Benchmarking Task (CBT) submissions*
The goal of this year's CBT is to generate versions of existing evaluation
datasets for LLMs which, given a particular training corpus, have a larger
distribution shift than the original test set, or – in other words –
evaluate generalisation to a stronger degree than the original dataset.
For this particular challenge, we focus on three training corpora: C4,
RedPajama-Data-1T, and Dolma.
All three corpora are publicly available, and they can be searched via the
What's in My Big Data API (https://github.com/allenai/wimbd).
We will focus on three popular evaluation datasets: MMLU, HumanEval, and
SiQA.
Submitters to the CBT are asked to design a way to assess distribution
shift for one or more of these evaluation datasets, given particular
features of the training corpus, and then generate one or more versions of
the dataset that have a larger distribution shift according to this method.
Newly generated sets do not have to have the same size as the original test
set, but should have at least 200 examples.
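Purely as an illustration of what such a protocol could look like (this is not a required or endorsed method; the corpus and candidate items below are toy placeholders, and a real submission would obtain corpus statistics, e.g. through the WIMBD tooling mentioned above):

# Illustrative sketch of one possible distribution-shift proxy for the CBT:
# rank candidate test items by n-gram overlap with a (toy, in-memory) pretraining
# corpus and keep the least-overlapping ones. A real submission would query actual
# corpus statistics instead of this stand-in.
from itertools import islice

def ngrams(text, n=3):
    toks = text.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

corpus_ngrams = ngrams("toy pretraining corpus text goes here for illustration only")  # placeholder

def overlap(example, n=3):
    ex = ngrams(example, n)
    return len(ex & corpus_ngrams) / max(len(ex), 1)

candidates = ["a toy candidate test question about corpus text",
              "another toy candidate item with different wording"]  # placeholders
shifted = sorted(candidates, key=overlap)   # lowest overlap first = largest assumed shift
subset = list(islice(shifted, 200))         # a real submission needs at least 200 items
print(subset)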
Practically speaking, CBT submissions consist of:
1. the data/task artefact, submitted through
https://github.com/GenBench/genbench_cbt
2. a paper describing the dataset and its method of construction,
submitted through
https://openreview.net/group?id=GenBench.org/2024/Workshop
We accept submissions that consider only one pretraining dataset and evaluation dataset, but encourage submitters to apply their suggested protocols to all three training corpora.
We also suggest that submitters include model results for models trained on
these datasets.
Suggestions are provided on the CBT website: https://genbench.org/cbt.
Given enough high-quality submissions, we aim to write a paper with the combined results, to which submitters can be co-authors if they wish.
More detailed guidelines will be given on https://genbench.org/cbt.
*Archival vs extended abstract*
Archival papers are up to 8 pages excluding references and report on
completed, original and unpublished research.
They follow the requirements of regular EMNLP 2024 submissions.
Accepted papers will be published in the workshop proceedings and are
expected to be presented at the workshop.
The papers will undergo double-blind peer review and should thus be
anonymised.
Extended abstracts can be up to 2 pages excluding references, and may
report on work in progress or be cross-submissions of work that has already
appeared in another venue.
Abstract titles will be posted on the workshop website, but will not be
included in the proceedings.
*Submission instructions*
For both archival papers and extended abstracts, we refer to the EMNLP 2024 website for paper templates and requirements.
Additional requirements for both regular workshop papers and collaborative
benchmarking task submissions can be found on our website.
All submissions can be submitted through OpenReview:
https://openreview.net/group?id=GenBench.org/2024/Workshop.
*Important dates*
- August 15, 2024: Paper submission deadline
- September 20, 2024: Notification deadline
- October 4, 2024: Camera-ready deadline
- November 15 or 16, 2024: Workshop
Note: all deadlines are 11:59 PM UTC-12:00. Check the website for final
updates to these deadlines (https://genbench.org/workshop).
*Preprints*
We do not have an anonymity deadline; preprints are allowed both before and after the submission deadline.
*Contact*
Email address: genbench(a)googlegroups.com
Website: https://genbench.org/workshop
*On behalf of the organisers*
Dieuwke Hupkes
Verna Dankers
Khuyagbaatar Batsuren
Amirhossein Kazemnejad
Christos Christodoulopoulos
Mario Giulianelli
Ryan Cotterell
*Apologies for crossposting*
LLMs Beyond the Cutoff: 1st International Workshop on Computational Methods
Beyond the Temporal Borders of Training Data
https://llmsbeyondthecutoff2024.wordpress.com
Collocated with CIKM 2024
October 25, 2024 — Boise (Idaho), USA
* July 29, 2024: Paper submission deadline
* August 30, 2024: Paper acceptance notification
* September 15, 2024: Camera ready versions submission
* October 25, 2024: Workshop date
=== NEWS ===
* LLMs Beyond the CutOff will be published as a volume of Springer Nature’s
post-proceedings
* Submission via EasyChair:
https://easychair.org/conferences/?conf=llmsbeyondthecut0ff
* Springer guidelines for authors:
https://www.springer.com/gp/computer-science/lncs/conference-proceedings-gu…
SUMMARY
LLMs are trained on large amounts of web data that extend temporally up to
a specific moment in time. For instance, ChatGPT's LLM "knows" the world
before May 2023, with no real-time access to information beyond this limit
other than a browsing tool, similar to a search engine, that enables simple
lookups. However, in many scenarios, being able to analyze and reason about
novel, emerging events and topics is crucial for facing the challenges of
rapidly evolving landscapes of information.
The workshop provides an interdisciplinary forum for discussing the
temporal limitations of LLMs and proposing technical solutions of how to
apply and develop LLMs beyond their cutoff dates. We explore two prominent
scenarios, where contexts tend to evolve faster than the LLMs that are used
to analyze them: (1) journalism and (2) industry. In terms of (1) the goal
is to propose methods of detecting, classifying and reasoning with emerging
topics that infuse public discourse on social or mainstream media. An
example of such a topic is COVID-19 at the onset of the pandemic.
Downstream tasks of interest are fake news detection and fact-checking on
novel topics, including claim analysis, opinion mining and narratives
extraction. With regard to (2), the goal is to shed light on the limits of
LLMs for companies in sectors such as international geopolitical monitoring
and corporate intelligence, finance and stock market trading or insurance,
where companies need to track their interests and products in real time.
This does not address the inclusion of corporate data into the LLMs, but
rather proposes solutions by using publicly available and constantly
growing data. An overarching problem that will be studied is that of the
cross-language and cross-country specificities of emerging data, where
novel information in underrepresented languages or contexts may be more
challenging to analyze. We welcome insights and parallels from the field of
knowledge representation, where the similar problem with cutoff dates of
knowledge graphs (dynamics and regular updates) is well understood.
The expected outcomes are: 1) insights on the temporal limitations of LLMs,
where the workshop will outline concrete challenges and bottlenecks in the
identified scenarios; 2) novel methodological and technical solutions in
terms of (incremental) machine learning models when dealing with
(reasoning, extracting and classifying) information beyond the cutoff dates
of current LLMs.
TOPICS OF INTEREST
* Analysis of emerging topics and events, including counterfactual/what-if
reasoning
* Methods for few-shot or zero-shot learning
* Large language models for online discourse
* Large language models for corporate near real-time data analysis
* Large language models for multimodal understanding and generation
* Multilingual and cross-country emerging information extraction
* Computational journalism, disinformation spread, fact-checking and fake
news detection
* Stance and viewpoint discovery for novel information
* Detection and classification of claims within emerging narratives
* Social, ethical and legal aspects of LLMs' up-to-dateness
* Interpretability / explainability of computational methods beyond the cutoff
* Linking and enrichment of data beyond the LLM cutoff
* Foundational models for knowledge graph building and entity alignment
* Recommender systems for novel information
* Quality, provenance, uncertainty and trust of emerging information and
data
* Use-cases, applications and cross-community interfaces
* Evaluation frameworks and benchmarks
SUBMISSION
We welcome the following types of contributions:
* Full papers (12-15 pages including references): contain original research.
* Short papers (up to 11 pages including references): contain original
research in progress.
* Demo papers (up to 11 pages including references): contain descriptions
of prototypes, demos or software systems.
* Data papers (up to 11 pages including references): contain descriptions
of resources related to the workshop topics, such as datasets, knowledge
graphs, corpora, annotation protocols, etc.
* Position papers (up to 11 pages including references): discuss vision
statements or research directions.
Workshop papers must be self-contained and in English. They should not have
been previously published and should not be under consideration or review
for another workshop, conference, or journal.
Manuscripts should be submitted via EasyChair (
https://easychair.org/conferences/?conf=llmsbeyondthecut0ff) in PDF format,
using the Springer LNCS format. For full authors instructions, please check
Springer’s website:
https://www.springer.com/gp/computer-science/lncs/conference-proceedings-gu….
The review of manuscripts will be double-blind. Papers will be evaluated
according to their significance, originality, technical content, style,
clarity, and relevance to the workshop. At least one author of each
accepted contribution must register for the workshop and present the paper.
Pre-prints of all contributions will be made available during the
conference. The accepted papers will appear as a volume of Springer
Nature’s LNCS post-proceedings.
Submission via EasyChair:
https://easychair.org/conferences/?conf=llmsbeyondthecut0ff
Springer guidelines for authors:
https://www.springer.com/gp/computer-science/lncs/conference-proceedings-gu…
For any enquiries, please contact the workshop organizers:
todorov(a)lirmm.fr, rettinger(a)uni-trier.de, jmgomez(a)expert.ai,
croitoru(a)lirmm.fr,
IMPORTANT DATES
* July 29, 2024: Paper submission deadline
* August 30, 2024: Paper acceptance notification
* September 15, 2024: Camera ready versions submission
* October 25, 2024: Workshop date
All submission deadlines are end-of-day in the Anywhere on Earth (AoE) time
zone.
KEYNOTES
* TBA
AWARD
* All contributions are eligible for the "Best Paper" award
ORGANIZING COMMITTEE
* Konstantin Todorov (University of Montpellier, CNRS, LIRMM, France)
* José Manuel Gómez Pérez (Expert.ai, Spain)
* Madalina Croitoru (University of Montpellier, CNRS, LIRMM, France)
* Achim Rettinger (University of Trier, Germany)
PROGRAM COMMITTEE
* Preslav Nakov, MBZUAI, United Arab Emirates
* Serena Villata, I3S, CNRS, France
* Ronald Denaux, Amazon, USA
* Filip Ilievski, Vrije Universiteit Amsterdam, The Netherlands
* Elena Montiel, Universidad Politécnica de Madrid, Spain
* Sandra Bringay, University Paul Valéry, France
* Carlos Badenes, Universidad Politécnica de Madrid, Spain
* Ioana Manolescu, Inria Saclay, France
* Dino Ienco, INRAE, France
* Colin Porlezza, Univ. della Svizzera Italiana, Switzerland
* Katarina Boland, Heinrich Heine Universität, Germany
* Gabriella Lapesa, GESIS, Germany
* Jonas Fegert, FZI, Germany
* Michael Färber, TU-Dresden, Germany
* Salim Hafid, University of Montpellier, France
* Pavlos Fafalios, FORTH, Greece
* Andrés García Silva, Expert.ai, Spain
* Sarah Labelle, University Paul Valéry, France
* Pablo Calleja, Universidad Politécnica de Madrid, Spain
*Patricia Martín Chozas*
*Assistant Professor *at the Applied Linguistics Department
*Postdoctoral Researcher *at the Ontology Engineering Group
(Artificial Intelligence Department)
ETSI Informáticos - Universidad Politécnica de Madrid
Phone: (+34) 910673091
===============
===============
* We apologize if you receive multiple copies of this Tutorial program *
* For the online version of this program, visit: https://cikm2024.org/tutorials/
===============
CIKM 2024: 33rd ACM International Conference on Information and Knowledge Management
Boise, Idaho, USA
October 21–25, 2024
===============
The tutorial program of CIKM 2024 has been published. Tutorials are planned to take place on 21 October 2024.
Here you can find a summary of each accepted tutorial.
===============
Systems for Scalable Graph Analytics and Machine Learning
===============
Da Yan (Indiana University Bloomington), Lyuheng Yuan (Indiana University Bloomington), Akhlaque Ahmad (Indiana University Bloomington) and Saugat Adhikari (Indiana University Bloomington)
Graph-theoretic algorithms and graph machine learning models are essential tools for addressing many real-life problems, such as social network analysis and bioinformatics. To support large-scale graph analytics, graph-parallel systems have been actively developed for over a decade, such as Google’s Pregel and Spark’s GraphX, which (i) promote a think-like-a-vertex computing model, (ii) target iterative algorithms, and (iii) address problems that output a value for each vertex. However, this model is too restricted to support the rich set of heterogeneous operations for graph analytics and machine learning that many real applications demand.
In recent years, two new trends have emerged in graph-parallel systems research: (1) a novel think-like-a-task computing model that can efficiently support the various computationally expensive problems of subgraph search; and (2) scalable systems for learning graph neural networks. These systems effectively complement the diverse needs for graph-parallel tools that can flexibly work together in a comprehensive graph processing pipeline for real applications, with the capability of capturing structural features. This tutorial will provide an effective categorization of the recent systems in these two directions based on their computing models and adopted techniques, and will review the key design ideas of these systems.
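For readers unfamiliar with the think-like-a-vertex model mentioned above, here is a rough single-process sketch (toy data and a simplified loop; real systems such as Pregel or GraphX distribute this computation and exchange explicit messages between workers):

# Toy sketch of a think-like-a-vertex (Pregel-style) computation: every vertex
# repeatedly adopts the smallest label seen among its neighbours, which converges
# to connected-component labels. Each sweep over the vertices loosely corresponds
# to a superstep; real systems run this in parallel across machines.
graph = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}   # adjacency lists (toy graph)
label = {v: v for v in graph}                         # each vertex outputs one value

changed = True
while changed:
    changed = False
    for v, neighbours in graph.items():
        incoming = [label[u] for u in neighbours]     # "messages" from neighbours
        best = min(incoming + [label[v]])
        if best < label[v]:
            label[v] = best
            changed = True

print(label)  # e.g. {0: 0, 1: 0, 2: 0, 3: 3, 4: 3}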
===============
Fairness in Large Language Models: Recent Advances and Future
===============
Thang Viet Doan (Florida International University), Zichong Wang (Florida International University), Minh Nhat Nguyen (Florida International University) and Wenbin Zhang (Florida International University)
Large Language Models (LLMs) have demonstrated remarkable success across various domains but often lack fairness considerations, potentially leading to discriminatory outcomes against marginalized populations. On the other hand, fairness in LLMs, in contrast to fairness in traditional machine learning, entails exclusive backgrounds, taxonomies, and fulfillment techniques. In this tutorial, we give a systematic overview of recent advances in the existing literature concerning fair LLMs. Specifically, a series of real-world case studies serve as a brief introduction to LLMs, and then an analysis of bias causes based on their training process follows. Additionally, the concept of fairness in LLMs is discussed categorically, summarizing metrics for evaluating bias in LLMs and existing algorithms for promoting fairness. Furthermore, resources for evaluating bias in LLMs, including toolkits and datasets, are summarized. Finally, current research challenges and open questions are discussed.
===============
Unifying Graph Neural Networks across Spatial and Spectral Domains
===============
Zhiqian Chen (Mississippi State University), Lei Zhang (Virginia Tech) and Liang Zhao (Emory University)
Over recent years, Graph Neural Networks (GNNs) have garnered significant attention. However, the proliferation of diverse GNN models, underpinned by various theoretical approaches, complicates model selection, as they are not readily comprehensible within a uniform framework. Early GNNs were implemented using spectral theory, while others were based on spatial theory. This divergence renders direct comparisons challenging. Moreover, the multitude of models within each domain further complicates evaluation.
In this half-day tutorial, we examine state-of-the-art GNNs and introduce a comprehensive framework bridging spatial and spectral domains, elucidating their interrelationship. This framework enhances our understanding of GNN operations. The tutorial explores key paradigms, such as spatial and spectral methods, through a synthesis of spectral graph theory and approximation theory. We provide an in-depth analysis of recent research developments, including emerging issues like over-smoothing, using well-established GNN models to illustrate our framework's universality.
===============
Tabular Data-centric AI: Challenges, Techniques and Future Perspectives
===============
Yanjie Fu (Arizona State University), Dongjie Wang (University of Kansas), Hui Xiong (Hong Kong University of Science and Technology (Guangzhou)) and Kunpeng Liu (Portland State University)
Tabular data is ubiquitous across various application domains such as biology, ecology, and material science. Tabular data-centric AI aims to enhance the predictive power of AI through better utilization of tabular data, improving its readiness at structural, predictive, interaction, and expression levels. This tutorial targets professionals in AI, machine learning, and data mining, as well as researchers from specific application areas. We will cover the settings, challenges, existing methods, and future directions of tabular data-centric AI. The tutorial includes a hands-on session to develop, evaluate, and visualize techniques in this emerging field, equipping attendees with a thorough understanding of its key challenges and techniques for integration into their research.
===============
Frontiers of Large Language Model-Based Agentic Systems
===============
Reshmi Ghosh (Microsoft), Jia He (Microsoft Corp.), Kabir Walia (Microsoft), Jieqiu Chen (Microsoft), Tushar Dhadiwal (Microsoft), April Hazel (Microsoft) and Chandra Inguva (Microsoft)
Large Language Models (LLMs) have recently demonstrated remarkable potential in achieving human-level intelligence, sparking a surge of interest in LLM-based autonomous agents. However, there
is a noticeable absence of a thorough guide that methodically compiles the latest methods for building LLM-agents, their assessment, and the associated challenges. As a pioneering initiative, this tutorial delves into the intricacies of constructing LLM-based agents, providing a systematic exploration of key components and recent innovations. We dissect agent design using an established taxonomy, focusing on essential keywords prevalent in agent-related framework discussions. Key components include profiling, perception, memory, planning, and action. We unravel the intricacies of each element, emphasizing state-of-the-art techniques. Beyond individual agents, we explore the extension from single-agent paradigms to multi-agent frameworks. Participants will gain insights into orchestrating collaborative intelligence within complex environments.
Additionally, we introduce and compare popular open-source frameworks for LLM-based agent development, enabling practitioners to choose the right tools for their projects. We discuss evaluation methodologies for assessing agent systems, addressing efficiency and safety concerns. We present a unified framework that consolidates existing work, making it a valuable resource for practitioners and researchers alike.
===============
Hands-On Introduction to Quantum Machine Learning
===============
Samuel Yen-Chi Chen (Wells Fargo) and Joongheon Kim (Korea University)
This tutorial offers a hands-on introduction into the captivating field of quantum machine learning (QML). Beginning with the bedrock of quantum information science (QIS)—including essential elements like qubits, single and multiple qubit gates, measurements, and entanglement—the session swiftly progresses to foundational QML concepts. Participants will explore parametrized or variational circuits, data encoding or embedding techniques, and quantum circuit design principles.
Delving deeper, attendees will examine various QML models, including the quantum support vector machine (QSVM), quantum feed-forward neural network (QNN), and quantum convolutional neural network (QCNN). Pushing boundaries, the tutorial delves into cutting-edge QML models such as quantum recurrent neural networks (QRNN) and quantum reinforcement learning (QRL), alongside privacy-preserving techniques like quantum federated machine learning, bolstered by concrete programming examples.
Throughout the tutorial, all topics and concepts are brought to life through practical demonstrations executed on a quantum computer simulator. Designed with novices in mind, the content caters to those eager to embark on their journey into QML. Attendees will also receive guidance on further reading materials, as well as software packages and frameworks to explore beyond the session.
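As a concrete taste of the "parametrized or variational circuits" and "data encoding" topics listed above, here is a minimal sketch using PennyLane as an assumed example framework (the tutorial itself may rely on different software): input features are angle-encoded, a trainable entangling block follows, and a Pauli-Z expectation value serves as the model output.

# Illustrative variational quantum circuit on a simulator (PennyLane is assumed
# here only as an example framework; the tutorial may use other tools).
import pennylane as qml
import numpy as np

dev = qml.device("default.qubit", wires=2)               # 2-qubit simulator

@qml.qnode(dev)
def circuit(weights, x):
    qml.AngleEmbedding(x, wires=[0, 1])                  # data encoding / embedding
    qml.StronglyEntanglingLayers(weights, wires=[0, 1])  # trainable variational layers
    return qml.expval(qml.PauliZ(0))                     # measurement -> model output

shape = qml.StronglyEntanglingLayers.shape(n_layers=2, n_wires=2)
weights = np.random.random(shape)                        # parameters to be optimised
x = np.array([0.3, 0.7])                                 # toy input features
print(circuit(weights, x))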
===============
On the Use of Large Language Models for Table Tasks
===============
Yuyang Dong (NEC), Masafumi Oyamada (NEC), Chuan Xiao (Osaka University, Nagoya University) and Haochen Zhang (Osaka University)
The proliferation of LLMs has catalyzed a diverse array of applications. This tutorial delves into the application of LLMs for tabular data and targets a variety of table-related tasks, such as table understanding, text-to-SQL conversion, and tabular data preprocessing. It surveys LLM solutions to these tasks in five classes, categorized by their underpinning techniques: prompting, fine-tuning, RAG, agents, and multimodal methods. It discusses how LLMs offer innovative ways to interpret, augment, query, and cleanse tabular data, featuring academic contributions and their practical use in the industrial sector. It emphasizes the versatility and effectiveness of LLMs in handling complex table tasks, showcasing their ability to improve data quality, enhance analytical capabilities, and facilitate more intuitive data interactions. By surveying different approaches, this tutorial highlights the strengths of LLMs in enriching table tasks with more accuracy and usability, setting a foundation for future research and application in data science and AI-driven analytics.
===============
Data Quality-aware Graph Machine Learning
===============
Yu Wang (Vanderbilt University), Kaize Ding (Northwestern University), Xiaorui Liu (North Carolina State University), Jian Kang (University of Rochester), Ryan Rossi (Adobe Research) and Tyler Derr (Vanderbilt University)
Recent years have seen a significant shift in Artificial Intelligence from model-centric to data-centric approaches, highlighted by the success of large foundational models. Following this trend, despite numerous innovations in graph machine learning model design, graph-structured data often suffers from data quality issues, which jeopardizes the progress of Data-centric AI in graph-structured applications. Our proposed tutorial aims to address this gap by raising awareness about data quality issues within the graph machine-learning community. We provide an overview of existing issues, including topology, imbalance, bias, limited data, and abnormalities in graph data. Additionally, we highlight previous studies and recent developments in foundational graph models that focus on identifying, investigating, mitigating, and resolving these issues.
===============
Towards Efficient Temporal Graph Learning: Algorithms, Frameworks, and Tools
===============
Ruijie Wang (University of Illinois Urbana-Champaign), Wanyu Zhao (University of Illinois Urbana-Champaign), Dachun Sun (University of Illinois Urbana-Champaign), Charith Mendis (University of Illinois Urbana-Champaign) and Tarek Abdelzaher (University of Illinois Urbana-Champaign)
Temporal graphs capture dynamic node relations via temporal edges, finding extensive utility in wide domains where time-varying patterns are crucial. Temporal Graph Neural Networks (TGNNs) have gained significant attention for their effectiveness in representing temporal graphs. However, TGNNs still face significant efficiency challenges in real-world low-resource settings. First, from a data-efficiency standpoint, training TGNNs requires sufficient temporal edges and data labels, which is problematic in practical scenarios with limited data collection and annotation. Second, from a resource-efficiency perspective, TGNN training and inference are computationally demanding due to complex encoding operations, especially on large-scale temporal graphs. Minimizing resource consumption while preserving effectiveness is essential. Inspired by these efficiency challenges, this tutorial systematically introduces state-of-the-art data-efficient and resource-efficient TGNNs, focusing on algorithms, frameworks, and tools, and discusses promising yet under-explored research directions in efficient temporal graph learning. This tutorial aims to benefit researchers and practitioners in data mining, machine learning, and artificial intelligence.
===============
Landing Generative AI in Industrial Social and E-commerce Recsys
===============
Da Xu (LinkedIn), Danqing Zhang (Amazon), Lingling Zheng (Microsoft), Bo Yang (Amazon), Guangyu Yang (TikTok), Shuyuan Xu (TikTok) and Cindy Liang (LinkedIn)
Over the past two years, GAI has evolved rapidly, influencing various fields including social and e-commerce Recsys. Despite exciting advances, landing these innovations in real-world Recsys remains challenging due to the sophistication of modern industrial products and systems. Our tutorial begins with a brief overview of building industrial Recsys and GAI fundamentals, followed by the ongoing efforts and opportunities to enhance personalized recommendations with foundation models.
We then explore the integration of curation capabilities into Recsys, such as repurposing raw content, incorporating external knowledge, and generating personalized insights/explanations to foster transparency and trust. Next, the tutorial illustrates how AI agents can transform Recsys through interactive reasoning and action loops, shifting away from traditional passive feedback models. Finally, we shed insights on real-world solutions for human-AI alignment and responsible GAI practices.
A critical component of the tutorial is detailing the AI, Infrastructure, LLMOps, and Product roadmap (including the evaluation and responsible AI practices) derived from the production solutions in LinkedIn, Amazon, TikTok, and Microsoft. While GAI in Recsys is still in its early stages, this tutorial provides valuable insights and practical solutions for the Recsys and GAI communities.
===============
Transforming Digital Forensics with Large Language Models
===============
Eric Xu (University of Maryland, College Park), Wenbin Zhang (Florida International University) and Weifeng Xu (University of Baltimore)
In the pursuit of justice and accountability in the digital age, the integration of Large Language Models (LLMs) with digital forensics holds immense promise. This half-day tutorial provides a comprehensive exploration of the transformative potential of LLMs in automating digital investigations and uncovering hidden insights. Through a combination of real-world case studies, interactive exercises, and hands-on labs, participants will gain a deep understanding of how to harness LLMs for evidence analysis, entity identification, and knowledge graph reconstruction. By fostering a collaborative learning environment, this tutorial aims to empower professionals, researchers, and students with the skills and knowledge needed to drive innovation in digital forensics. As LLMs continue to revolutionize the field, this tutorial will have far-reaching implications for enhancing justice outcomes, promoting accountability, and shaping the future of digital investigations.
===============
Collecting and Analyzing Public Data from Mastodon
===============
Haris Bin Zia (Queen Mary University of London), Ignacio Castro (none) and Gareth Tyson (Hong Kong University of Science and Technology)
Understanding online behaviors, communities, and trends through social media analytics is becoming increasingly important. Recent changes in the accessibility of platforms like Twitter have made Mastodon a valuable alternative for researchers. In this tutorial, we will explore methods for collecting and analyzing public data from Mastodon, a decentralized micro-blogging social network. Participants will learn about the architecture of Mastodon, techniques and best practices for data collection, and various analytical methods to derive insights from the collected data. This session aims to equip researchers with the skills necessary to harness the potential of Mastodon data in computational social science and social data science research.
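As a flavour of the kind of collection the tutorial covers, here is a minimal sketch assuming an instance that exposes Mastodon's standard, unauthenticated public-timeline endpoint (the instance URL is only an example; rate limits and instance policies still apply):

# Minimal sketch of collecting public posts from a Mastodon instance via its
# standard REST API (unauthenticated public timeline; respect rate limits and
# the instance's terms of use).
import requests

INSTANCE = "https://mastodon.social"   # example instance
resp = requests.get(
    f"{INSTANCE}/api/v1/timelines/public",
    params={"limit": 40, "local": "true"},   # up to 40 posts local to this instance
    timeout=30,
)
resp.raise_for_status()
for status in resp.json():
    print(status["created_at"], status["account"]["acct"], status["content"][:80])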
Registration for ECAI-2024, the 27th European Conference on Artificial Intelligence, is now open. The early registration period will end on Monday, 19 August 2024.
https://www.ecai2024.eu/registration
Please join us during 19-24 October 2024 in Santiago de Compostela to mark the 50th anniversary of the first AI conference held in Europe, back in 1974.
We are looking forward to an exciting programme with some 600 accepted papers across all areas of AI, as well as lots of special events, including invited talks, panel sessions, satellite workshops, tutorials, and more.
--
Luis Magdalena
Publicity Chair of the European Conference on Artificial Intelligence (ECAI-2024)
Neural language models have revolutionised natural language processing (NLP) and have provided state-of-the-art results for many tasks. However, their effectiveness is largely dependent on the pre-training resources. Therefore, language models (LMs) often struggle with low-resource languages in both training and evaluation. Recently, there has been a growing trend in developing and adopting LMs for low-resource languages. LoResLM aims to provide a forum for researchers to share and discuss their ongoing work on LMs for low-resource languages.
>> Topics
LoResLM 2025 invites submissions on a broad range of topics related to the development and evaluation of neural language models for low-resource languages, including but not limited to the following.
* Building language models for low-resource languages.
* Adapting/extending existing language models/large language models for low-resource languages.
* Corpora creation and curation technologies for training language models/large language models for low-resource languages.
* Benchmarks to evaluate language models/large language models in low-resource languages.
* Prompting/in-context learning strategies for low-resource languages with large language models.
* Review of available corpora to train/fine-tune language models/large language models for low-resource languages.
* Multilingual/cross-lingual language models/large language models for low-resource languages.
* Applications of language models/large language models for low-resource languages (e.g. machine translation, chatbots, content moderation, etc.).
>> Important Dates
* Paper submission due – 5th November 2024
* Notification of acceptance – 25th November 2024
* Camera-ready due – 13th December 2024
* LoResLM 2025 workshop – 19th/20th January 2025, co-located with COLING 2025
>> Submission Guidelines
We follow the COLING 2025 standards for submission format and guidelines. LoResLM 2025 invites the submission of long papers of up to eight pages and short papers of up to four pages. These page limits only apply to the main body of the paper. At the end of the paper (after the conclusions but before the references), papers need to include a mandatory section discussing the limitations of the work and, optionally, a section discussing ethical considerations. Papers can include unlimited pages of references and an unlimited appendix.
To prepare your submission, please make sure to use the COLING 2025 style files available here:
* LaTeX - https://coling2025.org/downloads/coling-2025.zip
* Word - https://coling2025.org/downloads/coling-2025.docx
* Overleaf - https://www.overleaf.com/latex/templates/instructions-for-coling-2025-proce…
Papers should be submitted through Softconf/START using the following link: https://softconf.com/coling2025/LoResLM25/
>> Organising Committee
* Hansi Hettiarachchi, Lancaster University, UK
* Tharindu Ranasinghe, Lancaster University, UK
* Paul Rayson, Lancaster University, UK
* Ruslan Mitkov, Lancaster University, UK
* Mohamed Gaber, Birmingham City University, UK
* Damith Premasiri, Lancaster University, UK
* Fiona Anting Tan, National University of Singapore, Singapore
* Lasitha Uyangodage, University of Münster, Germany
URL - https://loreslm.github.io/
Twitter - https://x.com/LoResLM2025
Best Regards
Tharindu Ranasinghe
FIRST CALL FOR PARTICIPATION
Advanced Language Processing School (ALPS) 2025
March 30th - April 4th 2025
Aussois (French Alps)
We are pleased to announce the 5th edition of ALPS - the Advanced NLP
School to be held in the French Alps from March 30th to April 4th 2025.
This school targets advanced research students in Natural Language
Processing and related fields and brings together world leading experts and
motivated students. The programme comprises lectures, poster presentations,
practical lab sessions and nature activities - the venue is located near a
National Park.
Important Dates
- Oct 15th 2024: Application deadline
- Nov 15th 2024: Acceptance notification
- Jan 15th 2025: Registration deadline
- March 30th 2025: Start of school
Confirmed speakers so far:
- Kyunghyun Cho (New York University & Prescient Design)
- Titouan Parcollet (Cambridge University & Samsung AI Center)
- Barbara Plank (LMU Munich)
- François Yvon (ISIR CNRS)
Website and online application: https://alps.imag.fr/
Questions: alps(a)univ-grenoble-alpes.fr
The registration fees for the event encompass accommodation and full board
at the conference venue, the Centre Paul Langevin
<https://lig-alps.imag.fr/index.php/venue/>. We will announce the fee
amounts later, and they will vary depending on the participant's
background: students, academia, and industry. Student fees will be set at
or below €600, including twin room accommodation. We will have a limited
number of scholarships for the registration: if you are interested, please
mark this in the application form. The rates for academia and industry will
be higher, as is customary, and will include accommodation in a single room.
Call For Papers: The International Conference on Intelligent Multilingual
Information Processing 2024 (IMLIP 2024)
The International Conference on Intelligent Multilingual Information
Processing 2024 (IMLIP 2024) will take place in Beijing, China, on 16-17
November 2024, hosted by Beijing Institute of Technology (
https://english.bit.edu.cn/).
As a professional committee of the Chinese Association for Artificial
Intelligence, the Institute of Multilingual Intelligent Information
Processing (IMLIP) focuses on multilingual intelligent information
processing and its applications. The aim of the IMLIP 2024 conference is to
bring together experts from industry, academia, and research in the
community, to provide a platform for academic exchange and collaborative
research for scholars from around the world, and also to promote linguistic
research and natural language processing studies related to China's ethnic
minorities and countries.
Conference Website: http://www.imlip.org/
Topics
IMLIP 2024 welcomes original research and applications related to
multilingual intelligent information processing. We encourage
interdisciplinary studies and the integration of humanities and sciences.
Topics of interest include, but are not limited to, the following:
Linguistics
Cross-lingual processing
Large language models
Computational linguistics theory
Resource and corpus construction
Evaluation
Multilingual language understanding
Machine translation
Multimodal intelligent information processing, including multilingual
speech recognition and text processing
Intelligent processing in international Chinese education
Applications of multilingual intelligent information processing
Keynote speakers
Academician Nima Tashi, Professor, Tibet University, Tibetan multilingual
processing
Professor Kim Gerdes, University of Saclay, France
Important Dates
Paper Submission System Open: June 30, 2024
Paper submission Deadline: August 30, 2024
Notification of Acceptance: September 30, 2024
Conference Dates: November 16-17, 2024
Submissions
Papers submitted to IMLIP 2024 can be in Chinese or English. An accepted
paper will be presented either as an oral talk or as a poster, as
determined by the Program Committee. Accepted Chinese papers will be
recommended to "Corpus Linguistics" and "AppliedTechnology" based on
circumstances, with further review required to determine final acceptance.
Accepted English papers will be published in the Springer conference
proceedings (EIindexed). The authors of accepted papers must revise the
papers according to the review before publication. At least one author of
the accepted paper must attend the conference.
Format
Please use the Word or LaTeX templates provided. Papers may consist of up
to 8 pages of content, plus unlimited references. Papers will be
double-blindly reviewed without the authors’ names and affiliations
included. Furthermore, self-references that reveal the author’s identity,
e.g., “We previously showed (Smith, 1991) …”, must be avoided. Instead, use
citations such as “Smith (1991) previously showed …”. Papers that do not
conform to these requirements will be rejected without review. For Chinese
submission, please download the template at
http://jcip.cipsc.org.cn/CN/item/downloadFile.do?id=79.
For English submission, please download the template at
https://github.com/acl-org/acl-style-files.
Submission Website: Submission will be electronic in PDF format through
https://openreview.net/group?id=IMLIP.org/2024/Conference.
Multiple-Submission Policy
IMLIP 2024 allows authors to submit manuscripts to leading NLP
international conferences simultaneously only if the conferences have
established similar multiple-submission policies. Papers that have been or
will be submitted to other conferences must indicate this at the submission
time. Authors of papers accepted for presentation in IMLIP 2024 must notify
the program chairs by the camera-ready deadline as to whether the paper
will be presented at IMLIP 2024. Once confirmed, the paper must be
withdrawn from other venues. We will not accept papers that are identical
or overlap significantly in content or results with papers that will be (or
have been) published elsewhere except the arXiv preprint version.
Awards and Funds
IMLIP 2024 will grant Best Paper Awards for Chinese and English papers, respectively.
Contact
For further information, visit the conference website at
http://www.imlip.org/
Venue: Cultural and Museum Center, Liangxiang Campus, Beijing Institute of Technology
ICLC-11
11TH INTERNATIONAL CONTRASTIVE LINGUISTICS CONFERENCE
First Call for Abstracts
September 17–19, 2025
Prague, Czech Republic
The Faculty of Arts at Charles University in Prague is pleased to announce the 11th International Contrastive Linguistics Conference. The ICLC conference series, running since 1998, aims to promote fine-grained cross-linguistic research comprising two or more languages from a broad range of theoretical and methodological perspectives. Following the success of ICLC-10 in Mannheim in 2023, ICLC-11 aims to bring together researchers from different linguistic subfields and neighbouring disciplines to continue the interdisciplinary dialogue on comparing languages, to foster the development of an international community, and to advance possible new areas of cross-linguistic research. See https://iclc11.ff.cuni.cz/ for more information and note the submission deadline of February 24, 2025.
We invite abstracts on a broad range of topics, including but not limited to:
(1) Comparison of phenomena in two or more languages focused on any area and level of linguistic analysis:
* lexicon
* phonetics and phonology
* morphology, syntax and morphosyntax, linguistic complexity
* semantics, pragmatics, register and socio-cultural context
(2) Methodological challenges and solutions in cross-linguistic research:
* language corpora (multilingual, learner, and multimodal) and issues of linguistic annotation (e.g., Universal Dependencies)
* comparability issues, tertia comparationis, language universals; experimental and naturalistic interaction data
* AI and new digital tools in linguistic analysis
* low-resourced languages
(3) Contrastive linguistics in touch with related disciplines:
* generative, model-theoretic, functional or cognitive (e.g., constructional) approaches
* historical, sociolinguistic and variationist perspectives; registers, multimodality, pragmatics, interculturality; language contact; language policy
* cognitive and psycholinguistic approaches to bilingualism and multilingualism; language acquisition, language teaching and learning
* translation studies
The abstracts should present empirical research, well-defined research questions or hypotheses, details of the research approach and methods, theoretical insights, and (preliminary or expected) results. For details see https://iclc11.ff.cuni.cz/calls-and-circulars/call-for-papers/.
PRELIMINARY PROGRAM
* Parallel Oral Sessions
* Poster Sessions
* Keynote Speakers:
Sabine De Knop (Université Saint-Louis, Bruxelles, Belgium)
Volker Gast (Friedrich-Schiller-University, Jena, Germany)
Dan Zeman (Charles University, Prague)
* Panel Discussion
IMPORTANT DATES
24.02.2025: Deadline for abstract submission
26.05.2025: Notification of acceptance
02.06.2025: Registration opens
16.06.2025: Deadline for revised abstract submission
30.06.2025: Last day for early bird registration
01.09.2025: Online registration closes
16.09.2025: Arrival, Registration, Get-together
17–19.09.2025: Conference
ORGANIZING COMMITTEE
* Mirjam Fried (chair) 1)
* Viktor Elšík 1)
* Jana Kocková 2)
* Michal Křen 1)
* Olga Nádvorníková 1)
* Alexandr Rosen 1)
1) Charles University, Faculty of Arts
2) Czech Academy of Sciences, Institute of Slavonic Studies
PROGRAM COMMITTEE: tba
CONTACT INFORMATION
Website: https://iclc11.ff.cuni.cz/
Email: iclc11(a)ff.cuni.cz
Non thematic issue of the TAL journal: 2025 Volume 66-1
http://tal-66-1.sciencesconf.org/
Editors: Maxime Amblard, Cécile Fabre, Benoit Favre and Sophie Rosset
The call for volume 66-1 is open until December 31, 2024.
NEW since 2023: Non-thematic issues of the TAL journal are handled "on the fly". Each paper in issue 66-1 will be evaluated as soon as it is submitted and, subject to acceptance, published within an indicative period of six months after submission.
THEMES
The TAL journal has an open call for papers. Submissions may concern contributions, both theoretical and experimental, on all aspects of written, spoken, and signed language processing and computational linguistics, for example:
Computational models of language
Linguistic resources
Statistical learning and modeling
Intermodality and multimodality
Language multiplicity and diversity
Semantics and comprehension
Information access and text mining
Language production and processing/generation/synthesis
Evaluation
Explicability and reproducibility
NLP in interaction with other disciplines, digital humanities
This list is indicative. On all topics, it is essential that the aspects related to natural language processing are emphasized.
We also welcome position papers and survey papers.
LANGUAGE
Manuscripts may be submitted in English or French.
THE TAL JOURNAL
TAL (Traitement Automatique des Langues / Natural Language Processing) is an international journal published by ATALA (French Association for Natural Language Processing, https://www.atala.org/revuetal) since 1959. TAL has an electronic mode of publication with immediate free access to published articles.
SCHEDULE
Submission deadline: on the fly until December 31, 2024
Notification to the authors after first review: two months after submission
Notification to the authors after second review: two months after the first review
Publication: two months after the second review
FORMAT SUBMISSION
Papers must be between 20 and 25 pages long, including references and appendices (no exceptions to this length limit).
TAL is a double-blind review journal: it is thus necessary to anonymise the manuscript and the name of the pdf file. Self-references that reveal the author's identity must be avoided.
Style sheets are available for download on the Web site of the journal.
More information on: http://tal-66-1.sciencesconf.org/
*Apologies for cross-posting*
Dear colleague,
We cordially invite you to participate in the 34th Meeting of Computational Linguistics in The Netherlands (CLIN34) which takes place in Leiden on Friday 30 August 2024.
Besides a large and diverse programme of posters and oral presentations, we are happy to report that CLIN34 will have two keynote talks by:
* Diana Maynard, Sheffield University
* Dominique Blok and Erik de Graaf, TNO
If you wish to participate, please register via the conference website: https://clin34.leidenuniv.nl/
The programme can also be found at: https://clin34.leidenuniv.nl/program/
We hope to see you in Leiden in August!
The CLIN34 organizers
Leiden University
18th WORKSHOP ON BUILDING AND USING COMPARABLE CORPORA
WITH SHARED TASK ON MULTILINGUAL TERMINOLOGY EXTRACTION
FROM COMPARABLE CORPORA
Co-located with COLING 2025 (Abu Dhabi)
Paper submission deadline: 30 November, 2024
Workshop website: https://comparable.lisn.upsaclay.fr/bucc2025/
COLING website: https://coling2025.org/
Keynote speaker: Preslav Nakov, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi
**************************************************************
* Motivation
In the language engineering and linguistics communities, research in
comparable corpora has been motivated by two main reasons. In language
engineering, on the one hand, it is chiefly motivated by the need to
use comparable corpora as training data for statistical NLP
applications such as statistical and neural machine translation or
cross-lingual retrieval. In linguistics, on the other hand, comparable
corpora are of interest because they enable cross-language discoveries
and comparisons. It is generally accepted in both communities that
comparable corpora consist of documents that are comparable in content
and form in various degrees and dimensions across several
languages. Parallel corpora are on the one end of this spectrum, and
unrelated corpora are on the other.
In recent years, the use of comparable corpora for pre-training Large
Language Models (LLMs) has led to their impressive multilingual and
cross-lingual abilities, which are relevant to a range of applications,
including Information Retrieval, Machine Translation, Cross-lingual text
classification, etc. The linguistic definitions and observations related
to comparable corpora can improve methods to mine such corpora or
to improve cross-lingual transfer of LLMs. Therefore, it is of great interest
to bring together builders and users of such corpora.
* Shared Task
This year we will run a shared task aimed at detecting translations of
terms via comparable corpora. Please see the website for details: https://comparable.limsi.fr/bucc2025/bucc2025-task.html
* Topics
We solicit contributions on all topics related to comparable (and parallel) corpora, including but not limited to the following:
Building Comparable Corpora:
- Automatic and semi-automatic methods
- Methods to mine parallel and non-parallel corpora from the web
- Tools and criteria to evaluate the comparability of corpora
- Parallel vs non-parallel corpora, monolingual corpora
- Rare and minority languages, across language families
- Multi-media/multi-modal comparable corpora
Applications of comparable corpora:
- Human translation
- Language learning
- Cross-language information retrieval & document categorization
- Bilingual and multilingual projections
- (Unsupervised) Machine translation
- Writing assistance
- Machine learning techniques using comparable corpora
Mining from Comparable Corpora:
- Cross-language distributional semantics, word embeddings and
pre-trained multilingual transformer models
- Extraction of parallel segments or paraphrases from comparable corpora
- Methods to derive parallel from non-parallel corpora (e.g. to provide
for low-resource languages in neural machine translation)
- Extraction of bilingual and multilingual translations of single words,
multi-word expressions, proper names, named entities, sentences, and
paraphrases from comparable corpora, etc.
- Induction of morphological, grammatical, and translation rules from
comparable corpora
- Induction of multilingual word classes from comparable corpora
Comparable Corpora in the Humanities:
- Comparing linguistic phenomena across languages in contrastive
linguistics
- Analyzing properties of translated language in translation studies
- Studying language change over time in diachronic linguistics
- Assigning texts to authors via authors' corpora in forensic
linguistics
- Comparing rhetorical features in discourse analysis
- Studying cultural differences in sociolinguistics
- Analyzing language universals in typological research
* Workshop Organizers
- Serge Sharoff (University of Leeds)
- Ayla Rigouts Terryn (Université de Montréal (UdeM), Mila)
- Pierre Zweigenbaum (Université Paris-Saclay, CNRS, LISN, Orsay)
- Reinhard Rapp (University of Mainz, Germany)
* Program Committee
- Ebrahim Ansari (Institute for Advanced Studies in Basic Sciences,
Iran)
- Eleftherios Avramidis (DFKI, Germany)
- Gabriel Bernier-Colborne (National Research Council, Canada)
- Thierry Etchegoyhen (Vicomtech, Spain)
- Alex Fraser (University of Munich, Germany)
- Natalia Grabar (University of Lille, France)
- Amal Haddad Haddad (Universidad de Granada, Spain)
- Amir Hazem (University of Tokyo, Japan)
- Kyo Kageura (University of Tokyo, Japan)
- Natalie Kübler (Université Paris Cité, France)
- Philippe Langlais (Université de Montréal, Canada)
- Yves Lepage (Waseda University, Japan).
- Shervin Malmasi (Amazon, USA)
- Michael Mohler (Language Computer Corporation, USA)
- Emmanuel Morin (Nantes Université, France)
- Dragos Stefan Munteanu (RWS, USA)
- Ted Pedersen (University of Minnesota, Duluth, USA)
- Nasredine Semmar (CEA LIST, Paris, France)
- Silvia Severini (Leonardo Labs, Italy)
- Pranaydeep Singh (Ghent University, Belgium)
- Richard Sproat (Google, USA)
- Marko Tadić (University of Zagreb, Croatia)
- François Yvon (Sorbonne Université, France)
We are recruiting PhD researchers for the UKRI/RAi UK Keystone project
AdSoLve on Addressing Sociotechnical Limitations of LLMs:
https://adsolve.github.io/
Up to four funded positions are available in a joint collaboration between
Queen Mary University of London (QMUL) and the Imperial College London CDT
in healthcare AI - a great opportunity to work with leading academics in
NLP, AI, healthcare, and responsible AI. AdSoLve offers collaborations
across 4 universities, a large consortium and a network of over 21
non-academic partners. QMUL has one of the UK's leading NLP research
groups, with 8 core faculty and a group of c.40 researchers.
APPLICATION DEADLINE 28th July 2024
Interviews 5th & 6th September 2024
For details see:
https://www.findaphd.com/phds/programme/phd-opportunities-in-addressing-soc…
https://adsolve.github.io/assets/other/phd_advert_QMUL.pdf
--
Matthew Purver - http://www.eecs.qmul.ac.uk/~mpurver/
Computational Linguistics Lab - http://compling.eecs.qmul.ac.uk/
Cognitive Science Research Group - http://cogsci.eecs.qmul.ac.uk/
School of Electronic Engineering and Computer Science
Queen Mary University of London, London E1 4NS, UK
My working days for QMUL are Tuesday-Thursday; responses to mail on
other days may be delayed.
Dear Corpora-list,
We are advertising a post-doctoral position in ML/XAI: 18 months at IMT
Mines Alès (south of France) or IMT Business School, Evry (near Paris).
Subject: Evaluation of the impact of XAI techniques on Human-Machine
collaboration
Context: ENFIELD project, Horizon-funded European AI Network of
Excellence on adaptive, sustainable, human-centered and trustworthy AI.
Objectives:
Evaluate the impact of XAI methods on Human-Machine collaboration
through the study of:
- the performance of the human operator in performing a task in different
contexts: alone, with the help of a predictive model whose decisions are
explained or not explained, or with the help of an XAI technique;
- the types of human-machine collaboration (e.g. delegation, substitution,
mediation);
- the potential biases induced by XAI techniques.
The work will focus on specific study contexts (e.g., image
classification or NLP tasks, and XAI techniques based on local
interpretability using attribution methods).
You will contribute to:
- defining the study contexts (e.g. games, image classification) and the
test protocols to be considered;
- selecting and implementing predictive models and XAI techniques;
- setting up the tools needed to carry out the experiments covered by the
study protocols (e.g. development of simple games, decision interfaces);
- implementing the above-mentioned protocols on cohorts of human operators;
- evaluating and promoting the results obtained.
Deadline for applications: 20/09/2024
Desired start date: 01/11/2024
Application and additional info:
https://institutminestelecom.recruitee.com/o/post-doctorant-post-doctorante…
Contacts:
Sébastien Harispe, Associate Professor
sebastien.harispe(a)mines-ales.fr
Nicolas Soulié, Associate Professor
nicolas.soulie(a)imt-bs.eu
Best regards,
--
Andon Tchechmedjiev, PhD. Associate Professor of Artificial Intelligence
and Computer Engineering at EuroMov Digital Health in Motion, IMT Mines
Alès. Taxonomy and Semantics of Movement (SemTaxM) co-lead, Learning and
Complexity group member. Research expertise: Deep Learning, Knowledge
Engineering, Computational Linguistics and Semantics, Biomedical
Informatics, Neuroengineering and Human Movement Processing
Postdoctoral Researcher – Defining Authentic Inclusive Communication
Insight SFI Research Centre for Data Analytics
Data Science Institute Ref. No. 010548
JOB ADVERTISEMENT
Applications are invited from suitably qualified candidates for a
full-time, fixed-term position as a Postdoctoral Researcher with the Data
Science Institute <https://www.universityofgalway.ie/dsi/> at the
University of Galway, Ireland.
This position is funded by Science Foundation Ireland and is available
from 1st October 2024 to a contract end date of 30th September 2025.
Salary: Postdoctoral salary scale €44,346 – €56,764 per annum
(subject to the project’s funding limitations), pro rata for shorter
and/or part-time contracts.
Closing date for receipt of applications is 17:00 (Irish Time) on 5th
August 2024. It will not be possible to consider applications received after
the closing date.
ELIGIBILITY REQUIREMENTS
Essential Requirements:
- PhD in Natural Language Processing (NLP) or Linguistics
- Published at top conferences in the NLP field or in high-impact-factor journals
- Excellent understanding of experimental design and scientific
methodologies
- Strong command of oral and written English
- Good programming skills
Desirable Requirements:
- Strong knowledge of equality, diversity, and inclusion in NLP
- Experience engaging in research collaborations with industry
- Experience in writing grant proposals
- Experience of working in national and/or EU research projects
To apply: Jobs – University of Galway
<https://www.universityofgalway.ie/about-us/jobs/>. Applications must be
submitted online.
How to apply guide
<https://www.universityofgalway.ie/human-resources/recruitment-and-selection…>
- For informal enquiries, please contact Bharathi Raja Chakravarthi
(bharathi.raja(a)universityofgalway.ie) and cc Dr Meghann L. Drury-Grogan
(Meghann.Drury-Grogan(a)atu.ie)
- University’s Strategic Plan
<https://www.universityofgalway.ie/strategy2025/>
- Working in Research at University of Galway
<https://www.universityofgalway.ie/our-research/>
- Moving to Ireland (Euraxess) <https://www.euraxess.ie/>
- Applicant Information
<https://www.universityofgalway.ie/human-resources/recruitment-and-selection…>
- We reserve the right to re-advertise or extend the closing date for
this position
- University of Galway is an equal opportunities employer
- All positions are recruited in line with Open, Transparent, Merit
(OTM) and Competency based recruitment
With regards,
Dr. Bharathi Raja Chakravarthi,
Assistant Professor / Lecturer-above-the-bar
School of Computer Science, University of Galway, Ireland
Insight SFI Research Centre for Data Analytics, Data Science Institute,
University of Galway, Ireland
E-mail: bharathiraja.akr(a)gmail.com, bharathi.raja(a)universityofgalway.ie
Google Scholar: https://scholar.google.com/citations?user=irCl028AAAAJ&hl=en
Website:
https://www.universityofgalway.ie/our-research/people/computer-science/bhar…
Dear all,
LIACS currently has a vacancy for two assistant professor positions, which might be of interest to some people on this list.
Here’s the beginning of the vacancy:
"The Faculty of Science, Leiden Institute of Advanced Computer Science (LIACS), is seeking candidates for two Assistant Professors (0.8-1.0 FTE), one in generative AI and another in Human-centered AI. We seek to appoint an expert in the research area of Generative AI with focus on software systems and engineering (code generation, bug detection and repair, refactoring, and optimization but also at a larger scale such as architecture reconstruction and impact analysis for changes), prompt engineering (for content creation and data analysis), or diffusion models (for transforming the creation of high-fidelity data, such as images and simulations). Additionally, we seek to appoint an expert in Human-centered AI with focus on the designing of AI systems that prioritize human needs, usability, and collaboration, and/or on the involvement of humans in the training and refining processes (interactive machine learning)."
Here’s the full vacancy: https://www.universiteitleiden.nl/en/vacancies/2024/q3/150312-assistant-pro…
Best,
dr. Gijs Wijnholds
Assistant Professor in Natural Language Processing
Text Mining and Retrieval Group<https://tmr.liacs.nl/>
Leiden Institute of Advanced Computer Science
https://gijswijnholds.github.io
Dear all,
the QE shared task 2024 is ON!
You can now submit and test your quality estimation system(s) on a set of different languages and tasks: to predict translation quality at sentence level, to detect error spans, or even to correct translations!
For information on how to access the test data and the submission platforms, visit the shared task's webpage:
https://www2.statmt.org/wmt24/qe-task.html
Deadline to participate is July 31 (AoE).
Looking forward to receiving your predictions!
--
Best wishes,
on behalf of the organisers.
Dear all,
we are happy to invite you to participate in the Shared Task on Quality Estimation at WMT'24.
The details of the task can be found at: https://www2.statmt.org/wmt24/qe-task.html
New this year:
* We introduce a new language pair (zero-shot): English-Spanish
* Continuing from the previous edition, we will also analyse the robustness of submitted QE systems to a set of phenomena ranging from hallucinations and biases to localized errors, all of which can significantly impact real-world applications.
* We also introduce a new task, seeking not only to detect but also to correct errors: Quality-aware Automatic Post-Editing! We invite participants to submit systems capable of automatically generating QE predictions for machine-translated text and the corresponding output corrections.
2024 QE Tasks:
Task 1 -- Sentence-level quality estimation
This task follows the same format as last year but with fresh test sets and a new language pair: English-Spanish. We will test the following language pairs:
* English to German (MQM)
* English to Spanish (MQM)
* English to Hindi (MQM & DA)
* English to Gujarati (DA)
* English to Telugu (DA)
* English to Tamil (DA)
More details: https://www2.statmt.org/wmt24/qe-subtask1.html
Task 2 -- Fine-grained error span detection
Sequence labelling task: predict the error spans in each translation and the associated error severity: Major or Minor.
We will test the following language pairs:
* English to German (MQM)
* English to Spanish (MQM)
* English to Hindi (MQM)
More details: https://www2.statmt.org/wmt24/qe-subtask2.html
Task 3 -- Quality-aware Automatic Post-editing
We expect submissions of post-edits correcting the detected error spans of the original translation. Although the task is focused on quality-informed APE, we also allow participants to submit APE output without QE predictions, to understand the impact of their QE system. Submissions without QE predictions will also be considered official.
We will test the following language pairs:
* English to Hindi
* English to Tamil
More details: https://www2.statmt.org/wmt24/qe-subtask3.html
Important dates:
1. Test sets will be released on July 15th.
2. Participants can submit their systems by July 23rd on CodaLab.
3. System paper submissions are due by 20th August [aligned with WMT deadlines].
Note: Like last year, we aligned with the General MT and Metrics shared tasks to facilitate cross-submission on the common language pairs: English-German, English-Spanish, and English-Hindi (MQM).
We look forward to your submissions and feel free to contact us if you have any more questions!
Best wishes,
on behalf of the organisers.
First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security (NLPAICS 2024)
Lancaster, UK, 29-30 July 2024
Call for Participation
We are pleased to share the NLPAICS 2024 conference programme, which you can view by clicking here - https://nlpaics.com/programme-2/.
To register, please visit https://nlpaics.com/registration/.
We very much hope to welcome you to NLPAICS 2024 at Lancaster!
The conference
Recent advances in Natural Language Processing (NLP), Deep Learning and Large Language Models (LLMs) have resulted in improved performance of applications. In particular, there has been a growing interest in employing AI methods in different Cyber Security applications.
In today's digital world, Cyber Security has emerged as a heightened priority for both individual users and organisations. As the volume of online information grows exponentially, traditional security approaches often struggle to identify and prevent evolving security threats. The inadequacy of conventional security frameworks highlights the need for innovative solutions that can effectively navigate the complex digital landscape for ensuring robust security. NLP and AI in Cyber Security have vast potential to significantly enhance threat detection and mitigation by fostering the development of advanced security systems for autonomous identification, assessment, and response to security threats in real-time. Recognising this challenge and the capabilities of NLP and AI approaches to fortify Cyber Security systems, the First International Conference on Natural Language Processing (NLP) and Artificial Intelligence (AI) for Cyber Security (NLPAICS’2024) serves as a gathering place for researchers in NLP and AI methods for Cyber Security. We invite contributions that present the latest NLP and AI solutions for mitigating risks in processing digital information.
Venue
The First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security (NLPAICS’2024) will take place at Lancaster University and is organised by the Lancaster University UCREL NLP research group.
Keynote speakers
We are delighted to announce the NLPAICS’2024 keynote speakers
- Iva Gumnishka (Humans in the Loop)
- Sevil Şen (Hacettepe University)
- Paolo Rosso (Universitat Politècnica de València)
- Jacques Klein (University of Luxembourg)
Sponsors
We are proud to announce the conference sponsors:
CodeAgent – Collaborative Agents for Software Engineering
Further information and contact details
The conference website is https://nlpaics.com/ and will be updated on a regular basis. The conference updates will also be available on social media (X - https://x.com/nlpaics, LinkedIn - https://linkedin.com/company/nlpaics/ )
Regards
Tharindu Ranasinghe
Second Call for Papers
NLP for Positive Impact Workshop
Miami, USA
November 15 or 16, 2024
(co-located with EMNLP 2024 <https://2024.emnlp.org/>)
https://sites.google.com/view/nlp4positiveimpact
*Submission*
Direct submission via ARR: link
<https://openreview.net/group?id=EMNLP/2024/Workshop/NLP4PI_Direct_Submission>
Deadline: August 15th
For papers submitted to the June (or earlier) ARR cycle, the commitment
deadline is August 20, 2024. Commit to the workshop via this link:
<https://openreview.net/group?id=EMNLP/2024/Workshop/NLP4PI_ARR_Commitment>
Notification of Acceptance: September 20, 2024
Camera-Ready Papers Due: October 3, 2024
Workshop Date: either November 15 or 16
All deadlines are 11:59 PM (Anywhere on Earth
<https://www.timeanddate.com/time/zones/aoe>)
*Submission Information*
We are using the EMNLP Submission Guidelines
<https://2024.emnlp.org/calls/main_conference_papers/#paper-submission-detai…>
for the workshop. Authors are invited to submit a full paper of up to 8
pages of content with unlimited pages for references. We also invite short
papers of up to 4 pages of content, also with unlimited pages for
references. Final camera-ready versions of accepted papers will be given an
additional page of content to address reviewer comments.
Summary
The widespread and indispensable use of language-oriented AI systems
presents new opportunities to have a positive social impact. NLP
technologies are starting to mature to the point where they could have an
even broader impact, supporting the UN sustainability goals
<https://sdgs.un.org/goals> by helping to address big problems such as
poverty, hunger, healthcare, education, inequality, COVID-19 and climate
change.
Our workshop aims to promote innovative NLP research that will positively
impact society, focusing on responsible methods and new applications. We
will encourage submissions from areas including (but not limited to):
-
Work that grounds the impact of NLP: Beyond developing a
better-performing NLP model, can we make a step further to connect the
model to actual social impact? Example directions include: case studies
of real-world deployments; or improving the deployment and maintenance of
NLP models in practice.
-
In addition to commonly recognized NLP for social good areas such as NLP
for healthcare, mental well-being, and many others, we also call for work
on neglected areas such as NLP for poverty, hunger, energy, climate change,
among others.
-
We also highly value work that builds on interdisciplinary expertise,
and encourages submissions of case studies or worked examples that seek to
expand the social impact of NLP through collaboration with other fields
(e.g., philanthropy, social science, political science, economics, HCI).
Special theme: This year, we would like to encourage submissions providing
solutions or concepts to address digital violence. Digital violence
encompasses various forms of violence that utilize digital tools and media,
such as cell phones, apps, internet applications, and emails, and occurs
within digital spaces like online portals and social platforms. We aim to
explore how modern NLP and AI technologies can contribute to enhancing
safety in digital environments. At the workshop, you will have an
opportunity to connect and share your results with NGO representatives from
this field!
Submission types:
We would appreciate seeing various types of work on this (but not only
this) topic, such as:
-
automatic identification of various social needs, their corresponding
sizes and demographics of people affected;
-
position papers to propose promising new tasks or directions that the
field should pursue;
-
literature review of a subfield;
-
philosophical discussions of how positive impact can be achieved
with NLP methods;
-
approaches to interdisciplinary collaboration;
-
user study designs, user surveys;
-
ethical considerations, and other related topics.
Note that we want submissions to our workshop to have distinctive
social-good implications, beyond those of a general NLP paper. We
will require each submission to discuss the ethical and societal
implications of the work, and we encourage a discussion of what "positive
impact" means in that work.
Organizers
Zhijing Jin (Max Planck Institute & ETH Zurich)
Daryna Dementieva (Technical University of Munich)
Giorgio Piatti (ETH Zürich)
Steven Wilson (Oakland University)
Oana Ignat (Santa Clara University)
Jieyu Zhao (University of Maryland, College Park)
Joel Tetreault (Dataminr, Inc.)
Rada Mihalcea (University of Michigan)
Contact Email
-
nlp4pi.workshop(a)gmail.com
[With apologies for cross-posting]
We are excited to announce the 22nd International Workshop on Treebanks and Linguistic Theories (TLT 2024), which will bring together developers and users of linguistically annotated natural language corpora. The workshop is endorsed by ACL SIGPARSE and will be hosted by Universität Hamburg in Germany on December 5th-6th, 2024.
-----------------------------
VENUE
-----------------------------
TLT 2024 will take place at the guest house of Universität Hamburg. In order to support rich discussions and networking, TLT 2024 will primarily be an in-person event; we will, however, accommodate a limited number of live / synchronous remote presentations, prioritizing those with circumstances that prevent travel.
Universität Hamburg and its guest house are conveniently located near the Dammtor train station / metro station Stephansplatz which are well-connected with many parts of the city and beyond, providing an easy commute for attendees.
Hamburg is a vibrant city known for its rich maritime history as one of the leading cities in the medieval Hanseatic League, as well as its modern cultural diversity, including events at the world-famous Elbphilharmonie Concert Hall. The city is easily accessible by train or plane (Hamburg Airport (HAM); Bremen Airport (BRE) and Hannover Airport (HAJ) are about a 1 to 1.5 hour train ride away).
-----------------------------
SUBMISSION INFORMATION
-----------------------------
TLT addresses all aspects of treebank design, development, and use. As ‘treebanks’ we consider any pairing of natural language data (spoken, signed, or written) with annotations of linguistic structure at various levels of analysis, including, e.g., morpho-phonology, syntax, semantics, and discourse. Annotations can take any form (including trees or general graphs), but they should be encoded in a way that enables computational processing. Reflections on the design of linguistic annotations, methodology studies, resource announcements or updates, annotation or conversion tool development, or reports on treebank usage including probing the leakage of treebanks into large language models are but some examples of the types of papers we anticipate for TLT.
Papers should describe original work; they should emphasize completed work rather than intended work, and should indicate clearly the state of completion of the reported results. Submissions will be judged on correctness, originality, technical strength, significance and relevance to the conference, and interest to the attendees.
We invite paper submissions in two distinct tracks:
* regular papers on substantial and original research, including empirical evaluation results, where appropriate;
* short papers on smaller, focused contributions, work in progress, negative results, surveys, or opinion pieces.
Submissions (in both tracks) may either be archival—in case of unpublished work—or non-archival, based on the wish of the authors. All archival papers accepted for presentation at the workshop will be included in the TLT 2024 proceedings volume, which will be part of the ACL Anthology. Non-archival papers must have been published or accepted for publication at another CL conference.
Long papers may consist of up to 8 pages of content (excluding references and appendices). Short papers may consist of up to 4 pages of content (excluding references and appendices). Accepted papers will be given an additional page to address reviewer comments.
All submissions should follow the two-column format and the ACL style guidelines. We strongly recommend the use of the LaTeX style files, OpenDocument, or Microsoft Word templates created for ACL: https://github.com/acl-org/acl-style-files
Submissions will be reviewed double-blind, and all full and short papers must be anonymous, i.e. they must not reveal the author(s) on the title page or through self-references. For example, “We previously showed (Smith, 2020) …” should be avoided; instead, use citations such as “Smith (2020) previously showed …”. Papers must be submitted digitally, in PDF, and uploaded through the on-line conference system (link forthcoming).
Submissions that violate these requirements will be rejected without review.
-----------------------------
IMPORTANT DATES
-----------------------------
* Long and short paper submission deadlines: August 15th, 2024
* Reviews Due: September 26th, 2024
* Notification of acceptance: October 6th, 2024
* Final version of papers due: November 6th, 2024
* TLT2024: December 5th-6th, 2024 in Hamburg
-----------------------------
TLT2024 WORKSHOP CHAIRS
-----------------------------
Daniel Dakota, Indiana University
Sandra Kübler, Indiana University
Heike Zinsmeister, Universität Hamburg
-----------------------------
TLT2024 COMMUNICATION CHAIR
-----------------------------
Sarah Jablotschkin, Universität Hamburg
Contact: tlt2024.gw(a)uni-hamburg.de
Website: https://www.korpuslab.uni-hamburg.de/en/tlt2024.html
---------------------------
Prof. Dr. Heike Zinsmeister (she/her)
German Linguistics / Corpus Linguistics
Universität Hamburg, Institut für Germanistik, Room C7012
Von-Melle-Park 6, Postfach #15, D-20146 Hamburg
Tel.: 040 42838-7119
heike.zinsmeister(a)uni-hamburg.de
http://www.slm.uni-hamburg.de/germanistik/personen/zinsmeister.html
Friday, November 8 - Saturday, November 9
Brown Computer Science Department, Providence, RI
https://cs.brown.edu/people/in-memorium/eugene_charniak/
Brown University invites you to attend an academic memorial event to
commemorate the research and legacy of Eugene Charniak. Eugene, an ACL
Lifetime Achievement Award winner and ACL fellow, passed away in June
2023. His colleagues and students have organized a two-day workshop of
invited presentations of cutting-edge research with an emphasis on the
themes which defined Eugene's career: the legacy of classic statistical
NLP/ML, the sometimes-surprising effectiveness of simple baselines,
clever tricks for dealing with data sparsity such as self-training or
distant supervision, and unsupervised learning.
A full program will be posted later this summer. Mark Johnson will give
a keynote presentation, along with research talks by Regina Barzilay,
Michael Collins, Jason Eisner, Lillian Lee, Ani Nenkova, Ellie Pavlick,
Brian Roark, Chris Tanner and Byron Wallace. There will also be
opportunities to remember Eugene in a social setting, and a panel
discussion of the workshop's research themes.
The event will take place at the Brown Computer Science Department in
Providence, RI; attendees are responsible for finding their own
accommodations. Instructions for travel to Providence are available
here: https://cs.brown.edu/about/directions/. The program will begin at
9am on Friday the 8th, and conclude at 1:30pm on Saturday the 9th. All
members of the ACL community are welcome, whether you knew Eugene well
or not. Please mark your calendars now!
To stay in the loop about the event, please fill out this form:
https://docs.google.com/forms/d/e/1FAIpQLSe_7LZBSjP3Ur2XCTtsDtwnL_Jbxgh5Wfi…
If you have questions about the event, contact the organizers, Micha
Elsner (melsner0(a)gmail.com) and David McClosky
(david.mcclosky(a)gmail.com).
Hi all,
I have a postdoc job to share, see below. Thank you for sharing the offer in potentially interested circles.
Best,
Peeter
----
Postdoc position(s) available in text and data mining and social history / sustainability transitions.
The project “The Crisis and Transformation of Industrial Modernity, 1900-2055”, is a five-year project at the University of Tartu. It is based on the Deep Transitions framework which theorizes industrialization as a long-term co-evolution of various socio-technical systems.
Website: https://www.deeptransitions.ut.ee/.
Job description: https://www.deeptransitions.ut.ee/jobs/
Job call PDF: http://tiny.cc/dt_postdoc_call_2024
1.5 FTE is available across the positions, with some flexibility in workload and work location depending on the exact focus. Contact laur.kanger(a)ut.ee for more details.
** Apologies for cross-posting **
Dear Colleagues,
This is the last call for tutorial proposals for COLING 2025.
Due: July 31, 2024
Call for Tutorials:
The 2025 International Conference on Computational Linguistics (COLING 2025) invites proposals for tutorials to be held in conjunction with the conference. We seek proposals in all areas of natural language processing and computation, language resources (LRs) and evaluation, including spoken language, sign language, and multimodal interaction.
We invite proposals for three types of tutorials, and we especially encourage submissions from early-career researchers:
Cutting-edge: tutorials that cover advances in newly emerging areas. The tutorials are expected to give a brief introduction to the topic, but participants are assumed to have some prior knowledge of the topic. The focus of the class will be on discussing the most recent developments in the field, and it will spend a considerable amount of time pointing out open research questions and important novel research directions.
Introductory to computational linguistics/NLP topics: tutorials that provide introductions to topics that are established in the COLING communities. The lecturers provide an overview of the development of the field from the beginning until now. Attendees are not expected to come with prior knowledge. They acquire sufficient understanding of the topic to understand the most recent research in the field.
Introduction to Key Concepts in Linguistics including Semantics, Syntax, Psycholinguistics, Neurolinguistics, and Sociolinguistics: tutorials that provide introductions to topics that are established or emerging in areas adjacent to CL/NLP. The lecturers provide an overview of the development of the field from the beginning until now. Attendees are not expected to come with prior knowledge. They acquire a sufficient understanding of the topic to understand the most recent research in the field and the relevance for the CL/NLP domains.
Each of these types of tutorials can either be half-day (4h long including a coffee break (30m long)) or full-day (8h long including two coffee breaks (1h long in total) but excluding a lunch break).
In all cases, the aim of a tutorial is primarily to help understand a scientific problem, its tractability, and its theoretical and practical implications. Presentations of particular technological solutions or systems are welcome, provided that they serve as illustrations of broader scientific considerations. None of the tutorial types are expected to be “self-invited” long talks – the content should be a good balance between research from multiple groups and perspectives, not only from the teachers of the tutorial.
The tutorials will be held at COLING 2025 in Abu Dhabi, UAE, on 19 and 20 January, 2025.
Important Dates
All deadlines are 11:59PM UTC-12:00 (“anywhere on Earth”).
Proposal submission due: July 31, 2024
Notification of acceptance: August 31, 2024
COLING 2025 tutorials: January 19-20, 2025
COLING 2025 conference: January 21-24, 2025
Diversity and Inclusion
We particularly encourage submissions from underrepresented groups in computational linguistics, including researchers from any demographic or geographic minority, researchers with disabilities, or others. In evaluating the proposals, we will take these aspects into account to create a varied and balanced set of tutorials.
This includes several aspects of diversity, namely (1) how the topic of the tutorial contributes to improved diversity and increased fairness in the field, (2) whether the topic is particularly relevant for a specific underrepresented group of potential participants, and (3) whether the presenters are from an underrepresented group.
Submission Details
Proposals should contain:
A title that helps the potential attendees to understand what the tutorial will be about.
An abstract that summarizes the topics, goals, target audience, and type (see above) of the tutorial (this abstract will also be on the COLING 2025 website).
A section called “Introduction” that explains the topic and summarizes the starting point and relevance for our community and in general.
A section called “Target Audience” that explains for whom the tutorial will be developed and what the expected prior knowledge is. Clearly specify what attendees should know and be able to practically do to get the most out of your tutorial. Examples of what to specify include prior mathematical knowledge, knowledge of specific modeling approaches and methods, programming skills, or adjacent areas like computer vision. Also specify the number of expected participants.
A section called “Outline” in which the various topics are explained. This can be a list of bullet points or a set of paragraphs explaining the content. Explain what you intend and how long the tutorial will be.
A section called “Diversity Considerations”, discussing each of the three aspects of diversity mentioned above or others.
A section called “Reading List”: What are introductory papers or books that potential attendees can read to get a first impression of the tutorial content? What do you expect them to have read before attending? What provides further information beyond the content of the tutorial?
A section called “Presenters” in which each tutorial presenter is briefly introduced in one paragraph, including their research interests, their areas of expertise for the tutorial topic, and their experience in teaching a diverse and international audience.
A section called “Other Information” which should include information on how many people are expected to participate and how you came to this estimate. You can also explain any other aspects that you find important, including special equipment that you would need.
A section called “Ethics Statement” which discusses ethical considerations related to the topics of the tutorial.
The proposals should be submitted no later than 31 July, 2024, 11:59 PM Samoa Standard Time (SST) (UTC/GMT-11, “anywhere on Earth”).
Submission is electronic. Please submit the proposals using the START system at this URL: https://softconf.com/coling2025/tutorialsCL25
Please note that tutorials should either be 100% in-person or 100% virtual; hybrid formats will not be allowed. For in-person tutorials, at least one tutorial organiser should be physically present to run the tutorial at COLING.
Evaluation Criteria
The tutorial proposals will be evaluated according to their originality and impact, the expected interest level of participants, as well as the quality of the organizing team and Program Committee and their contribution to the diversity of the conference.
Each tutorial will be evaluated regarding its clarity and preparedness, novelty or timely character of the topic, the instructor’s experience, the audience interest, and the potential to increase diversity in our community.
Instructor Responsibilities
Accepted tutorial presenters will be notified by the date mentioned above. They must then provide abstracts of their tutorials for inclusion in the conference registration material by the specific deadlines. The abstract needs to be provided in ASCII format. The summary will be submitted in PDF format and can be updated from the version submitted for review. The instructors will make their material available in an appropriate way, for instance, by setting up a website. They will be invited to submit their slides to the ACL Anthology.
Tutorial Chairs
Email: coling25tutorialchairs(a)gmail.com
The tutorial chairs are:
Djamé Seddah, Senior Researcher, INRIA, Paris, France (on leave from Sorbonne University)
Shaonan Wang, Associate Professor at the Institute of Automation, Chinese Academy of Sciences, Beijing, China