*GenBench: The second workshop on generalisation (benchmarking) in NLP*
*Workshop description*
The ability to generalise well is often mentioned as
one of the primary desiderata for models of natural language processing
(NLP).
Yet, there are still many open questions related to what it means for an
NLP model to generalise well, and how generalisation should be evaluated.
LLMs, trained on gigantic training corpora that are – at best – hard to
analyse or not publicly available at all, bring a new set of challenges to
the topic.
The second GenBench workshop aims to serve as a cornerstone to catalyse
research on generalisation in the NLP community.
The workshop aims to bring together different expert communities to discuss
challenging questions relating to generalisation in NLP, crowd-source
challenging generalisation benchmarks for LLMs, and make progress on open
questions related to generalisation.
Topics of interest include, but are not limited to:
- Opinion or position papers about generalisation and how it should be
evaluated;
- Analyses of how existing or new models generalise;
- Empirical studies that propose new paradigms to evaluate
generalisation;
- Meta-analyses that compare results across different generalisation studies;
- Meta-analyses that study how different types of generalisation are
related;
- Papers that discuss how generalisation of LLMs can be evaluated;
- Papers that discuss why generalisation is (not) important in the era
of LLMs;
- Studies on the relationship between generalisation and fairness or
robustness.
The second GenBench workshop on generalisation (benchmarking) in NLP will
be co-located with EMNLP 2024.
*Submission types*
We call for two types of submissions: regular workshop submissions and
collaborative benchmarking task submissions.
The latter will consist of a data/task artefact and a companion paper
motivating and evaluating the submission.
In both cases, we accept archival papers and extended abstracts.
*1. Regular workshop submissions*
Regular workshop submissions present papers on the topic of generalisation
(see examples listed above).
Regular workshop papers may be submitted as archival papers, when they report on completed, original and unpublished research, or otherwise as shorter extended abstracts.
More details on this category can be found below.
If you are unsure whether a specific topic is well-suited for submission,
feel free to reach out to the organisers of the workshop at
genbench@googlegroups.com.
*2. Collaborative Benchmarking Task (CBT) submissions*
The goal of this year's CBT is to generate versions of existing evaluation
datasets for LLMs which, given a particular training corpus, have a larger
distribution shift than the original test set, or – in other words –
evaluate generalisation to a stronger degree than the original dataset.
For this particular challenge, we focus on three training corpora: C4,
RedPajama-Data-1T, and Dolma.
All three corpora are publicly available, and they can be searched via the
What's in My Big Data API (https://github.com/allenai/wimbd).
We will focus on three popular evaluation datasets: MMLU, HumanEval, and
SiQA.
Submitters to the CBT are asked to design a way to assess distribution
shift for one or more of these evaluation datasets, given particular
features of the training corpus, and then generate one or more versions of
the dataset that have a larger distribution shift according to this method.
Newly generated sets do not have to have the same size as the original test
set, but should have at least 200 examples.
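To make the task concrete, here is a minimal Python sketch, under our own simplifying assumptions, of one possible overlap-based shift proxy (not an official CBT protocol): test examples whose n-grams are poorly covered by the pretraining corpus are treated as further out of distribution. The corpus lookup is a plain dictionary here; in practice it would be backed by a corpus search service such as the WIMBD tooling mentioned above, and submitters are free to define shift very differently.

def ngrams(text, n=3):
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def coverage_score(example, corpus_count, n=3):
    # Fraction of the example's n-grams that occur at least once in the
    # pretraining corpus; lower coverage = larger (lexical) distribution shift.
    grams = ngrams(example, n)
    if not grams:
        return 1.0
    return sum(1 for g in grams if corpus_count.get(g, 0) > 0) / len(grams)

def select_shifted_subset(examples, corpus_count, k=200, n=3):
    # Keep the k least-covered examples (the CBT asks for at least 200).
    return sorted(examples, key=lambda ex: coverage_score(ex, corpus_count, n))[:k]

# Toy usage: corpus_count stands in for n-gram counts retrieved from the
# pretraining corpus; here it is just an in-memory dict for illustration.
corpus_count = {"the capital of": 120, "capital of france": 30}
examples = ["What is the capital of France?",
            "Name an obscure 17th-century lute tuning."]
print(select_shifted_subset(examples, corpus_count, k=1))

Any comparable proxy (perplexity under a reference model, embedding distance, topic coverage, etc.) would fit the same selection loop.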
Practically speaking, CBT submissions consist of:
1. the data/task artefact, submitted through
https://github.com/GenBench/genbench_cbt
2. a paper describing the dataset and its method of construction,
submitted through
https://openreview.net/group?id=GenBench.org/2024/Workshop
We accept submissions that consider only one pretraining dataset and evaluation dataset, but encourage submitters to apply their suggested protocols to all three pretraining corpora.
We also suggest that submitters include model results for models trained on
these datasets.
Suggestions are provided on the CBT website: https://genbench.org/cbt.
Given enough high-quality submissions, we aim to write a paper with the combined results, on which submitters can be co-authors if they wish.
More detailed guidelines will be given on https://genbench.org/cbt.
*Archival vs extended abstract*
Archival papers are up to 8 pages excluding references and report on
completed, original and unpublished research.
They follow the requirements of regular EMNLP 2024 submissions.
Accepted papers will be published in the workshop proceedings and are
expected to be presented at the workshop.
The papers will undergo double-blind peer review and should thus be
anonymised.
Extended abstracts can be up to 2 pages excluding references, and may
report on work in progress or be cross-submissions of work that has already
appeared in another venue.
Abstract titles will be posted on the workshop website, but will not be
included in the proceedings.
*Submission instructions*
For both archival papers and extended abstracts,
we refer to the EMNLP 2024 website for paper templates and requirements.
Additional requirements for both regular workshop papers and collaborative
benchmarking task submissions can be found on our website.
All submissions can be submitted through OpenReview:
https://openreview.net/group?id=GenBench.org/2024/Workshop.
*Important dates*
- August 15, 2024: Paper submission deadline
- September 20, 2024: Notification deadline
- October 4, 2024: Camera-ready deadline
- November 15 or 16, 2024: Workshop
Note: all deadlines are 11:59 PM UTC-12:00. Check the website for final
updates to these deadlines (https://genbench.org/workshop).
*Preprints*
We do not have an anonymity deadline; preprints are allowed both before and after the submission deadline.
*Contact*
Email address: genbench@googlegroups.com
Website: https://genbench.org/workshop
*On behalf of the organisers*
Dieuwke Hupkes
Verna Dankers
Khuyagbaatar Batsuren
Amirhossein Kazemnejad
Christos Christodoulopoulos
Mario Giulianelli
Ryan Cotterell
===============
===============
* For the online version of this program, visit: https://cikm2024.org/tutorials/
===============
CIKM 2024: 33rd ACM International Conference on Information and Knowledge Management
Boise, Idaho, USA
October 21–25, 2024
===============
The tutorial program of CIKM 2024 has been published. Tutorials are planned to take place on 21 October 2024.
Here you can find a summary of each accepted tutorial.
===============
Systems for Scalable Graph Analytics and Machine Learning
===============
Da Yan (Indiana University Bloomington), Lyuheng Yuan (Indiana University Bloomington), Akhlaque Ahmad (Indiana University Bloomington) and Saugat Adhikari (Indiana University Bloomington)
Graph-theoretic algorithms and graph machine learning models are essential tools for addressing many real-life problems, such as social network analysis and bioinformatics. To support large-scale graph analytics, graph-parallel systems have been actively developed for over a decade, such as Google’s Pregel and Spark’s GraphX, which (i) promote a think-like-a-vertex computing model and target (ii) iterative algorithms and (iii) problems that output a value for each vertex. However, this model is too restrictive to support the rich set of heterogeneous operations for graph analytics and machine learning that many real applications demand.
In recent years, two new trends have emerged in graph-parallel systems research: (1) a novel think-like-a-task computing model that can efficiently support the various computationally expensive problems of subgraph search; and (2) scalable systems for learning graph neural networks. These systems effectively address the diverse needs for graph-parallel tools that can flexibly work together in a comprehensive graph processing pipeline for real applications, with the capability of capturing structural features. This tutorial will provide an effective categorization of the recent systems in these two directions based on their computing models and adopted techniques, and will review the key design ideas of these systems.
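As a rough illustration of the think-like-a-vertex model the abstract refers to, the sketch below simulates Pregel-style supersteps for PageRank on a single machine in Python; the toy graph, damping factor and superstep count are invented for the example, and real systems of course distribute the vertices and their messages across workers.

def pagerank_supersteps(adj, num_supersteps=20, d=0.85):
    # adj: dict mapping vertex -> list of out-neighbours
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}          # initial per-vertex state
    for _ in range(num_supersteps):
        # message phase: each vertex sends rank/out_degree to its neighbours
        inbox = {v: [] for v in adj}
        for v, neighbours in adj.items():
            if neighbours:
                share = rank[v] / len(neighbours)
                for u in neighbours:
                    inbox[u].append(share)
        # compute phase: each vertex updates its own value from its inbox
        rank = {v: (1 - d) / n + d * sum(msgs) for v, msgs in inbox.items()}
    return rank  # one output value per vertex, as in point (iii) above

toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(pagerank_supersteps(toy_graph))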
===============
Fairness in Large Language Models: Recent Advances and Future
===============
Thang Viet Doan (Florida International University), Zichong Wang (Florida International University), Minh Nhat Nguyen (Florida International University) and Wenbin Zhang (Florida International University)
Large Language Models (LLMs) have demonstrated remarkable success across various domains but often lack fairness considerations, potentially leading to discriminatory outcomes against marginalized populations. Moreover, fairness in LLMs, in contrast to fairness in traditional machine learning, entails distinct backgrounds, taxonomies, and fulfillment techniques. In this tutorial, we give a systematic overview of recent advances in the existing literature concerning fair LLMs. Specifically, a series of real-world case studies serve as a brief introduction to LLMs, followed by an analysis of bias causes based on their training process. Additionally, the concept of fairness in LLMs is discussed categorically, summarizing metrics for evaluating bias in LLMs and existing algorithms for promoting fairness. Furthermore, resources for evaluating bias in LLMs, including toolkits and datasets, are summarized. Finally, current research challenges and open questions are discussed.
===============
Unifying Graph Neural Networks across Spatial and Spectral Domains
===============
Zhiqian Chen (Mississippi State University), Lei Zhang (Virginia Tech) and Liang Zhao (Emory University)
Over recent years, Graph Neural Networks (GNNs) have garnered significant attention. However, the proliferation of diverse GNN models, underpinned by various theoretical approaches, complicates model selection, as they are not readily comprehensible within a uniform framework. Early GNNs were implemented using spectral theory, while others were based on spatial theory. This divergence renders direct comparisons challenging. Moreover, the multitude of models within each domain further complicates evaluation.
In this half-day tutorial, we examine state-of-the-art GNNs and introduce a comprehensive framework bridging spatial and spectral domains, elucidating their interrelationship. This framework enhances our understanding of GNN operations. The tutorial explores key paradigms, such as spatial and spectral methods, through a synthesis of spectral graph theory and approximation theory. We provide an in-depth analysis of recent research developments, including emerging issues like over-smoothing, using well-established GNN models to illustrate our framework's universality.
===============
Tabular Data-centric AI: Challenges, Techniques and Future Perspectives
===============
Yanjie Fu (Arizona State University), Dongjie Wang (University of Kansas), Hui Xiong (Hong Kong University of Science and Technology (Guangzhou)) and Kunpeng Liu (Portland State University)
Tabular data is ubiquitous across various application domains such as biology, ecology, and material science. Tabular data-centric AI aims to enhance the predictive power of AI through better utilization of tabular data, improving its readiness at structural, predictive, interaction, and expression levels. This tutorial targets professionals in AI, machine learning, and data mining, as well as researchers from specific application areas. We will cover the settings, challenges, existing methods, and future directions of tabular data-centric AI. The tutorial includes a hands-on session to develop, evaluate, and visualize techniques in this emerging field, equipping attendees with a thorough understanding of its key challenges and techniques for integration into their research.
===============
Frontiers of Large Language Model-Based Agentic Systems
===============
Reshmi Ghosh (Microsoft), Jia He (Microsoft Corp.), Kabir Walia (Microsoft), Jieqiu Chen (Microsoft), Tushar Dhadiwal (Microsoft), April Hazel (Microsoft) and Chandra Inguva (Microsoft)
Large Language Models (LLMs) have recently demonstrated remarkable potential in achieving human-level intelligence, sparking a surge of interest in LLM-based autonomous agents. However, there
is a noticeable absence of a thorough guide that methodically compiles the latest methods for building LLM-agents, their assessment, and the associated challenges. As a pioneering initiative, this tutorial delves into the intricacies of constructing LLM-based agents, providing a systematic exploration of key components and recent innovations. We dissect agent design using an established taxonomy, focusing on essential keywords prevalent in agent-related framework discussions. Key components include profiling, perception, memory, planning, and action. We unravel the intricacies of each element, emphasizing state-of-the-art techniques. Beyond individual agents, we explore the extension from single-agent paradigms to multi-agent frameworks. Participants will gain insights into orchestrating collaborative intelligence within complex environments.
Additionally, we introduce and compare popular open-source frameworks for LLM-based agent development, enabling practitioners to choose the right tools for their projects. We discuss evaluation methodologies for assessing agent systems, addressing efficiency and safety concerns. We present a unified framework that consolidates existing work, making it a valuable resource for practitioners and researchers alike.
===============
Hands-On Introduction to Quantum Machine Learning
===============
Samuel Yen-Chi Chen (Wells Fargo) and Joongheon Kim (Korea University)
This tutorial offers a hands-on introduction to the captivating field of quantum machine learning (QML). Beginning with the bedrock of quantum information science (QIS)—including essential elements like qubits, single and multiple qubit gates, measurements, and entanglement—the session swiftly progresses to foundational QML concepts. Participants will explore parametrized or variational circuits, data encoding or embedding techniques, and quantum circuit design principles.
Delving deeper, attendees will examine various QML models, including the quantum support vector machine (QSVM), quantum feed-forward neural network (QNN), and quantum convolutional neural network (QCNN). Pushing boundaries, the tutorial delves into cutting-edge QML models such as quantum recurrent neural networks (QRNN) and quantum reinforcement learning (QRL), alongside privacy-preserving techniques like quantum federated machine learning, bolstered by concrete programming examples.
Throughout the tutorial, all topics and concepts are brought to life through practical demonstrations executed on a quantum computer simulator. Designed with novices in mind, the content caters to those eager to embark on their journey into QML. Attendees will also receive guidance on further reading materials, as well as software packages and frameworks to explore beyond the session.
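The following is a minimal NumPy sketch (not taken from the tutorial materials) of two of the ideas named above: data encoding via a rotation and a trainable parametrized gate, with a Pauli-Z expectation as readout. The data value, sweep range and single-qubit setting are invented for illustration; a real QML workflow would use a quantum SDK and a proper optimiser.

import numpy as np

def ry(angle):
    # RY rotation matrix acting on a single qubit
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])

Z = np.diag([1.0, -1.0])  # Pauli-Z observable

def circuit_expectation(x, theta):
    state = np.array([1.0, 0.0])             # start in |0>
    state = ry(x) @ state                     # data encoding rotation
    state = ry(theta) @ state                 # trainable variational layer
    return float(state.conj() @ Z @ state)    # <Z> measurement

# crude parameter sweep standing in for a variational optimisation loop
x = 0.7
best_theta = min(np.linspace(-np.pi, np.pi, 201),
                 key=lambda t: circuit_expectation(x, t))
print(best_theta, circuit_expectation(x, best_theta))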
===============
On the Use of Large Language Models for Table Tasks
===============
Yuyang Dong (NEC), Masafumi Oyamada (NEC), Chuan Xiao (Osaka University, Nagoya University) and Haochen Zhang (Osaka University)
The proliferation of LLMs has catalyzed a diverse array of applications. This tutorial delves into the application of LLMs for tabular data and targets a variety of table-related tasks, such as table understanding, text-to-SQL conversion, and tabular data preprocessing. It surveys LLM solutions to these tasks in five classes, categorized by their underpinning techniques: prompting, fine-tuning, RAG, agents, and multimodal methods. It discusses how LLMs offer innovative ways to interpret, augment, query, and cleanse tabular data, featuring academic contributions and their practical use in the industrial sector. It emphasizes the versatility and effectiveness of LLMs in handling complex table tasks, showcasing their ability to improve data quality, enhance analytical capabilities, and facilitate more intuitive data interactions. By surveying different approaches, this tutorial highlights the strengths of LLMs in enriching table tasks with more accuracy and usability, setting a foundation for future research and application in data science and AI-driven analytics.
===============
Data Quality-aware Graph Machine Learning
===============
Yu Wang (Vanderbilt University), Kaize Ding (Northwestern University), Xiaorui Liu (North Carolina State University), Jian Kang (University of Rochester), Ryan Rossi (Adobe Research) and Tyler Derr (Vanderbilt University)
Recent years have seen a significant shift in Artificial Intelligence from model-centric to data-centric approaches, highlighted by the success of large foundational models. Despite this trend and numerous innovations in graph machine learning model design, graph-structured data often suffers from data quality issues, which jeopardize the progress of Data-centric AI in graph-structured applications. Our proposed tutorial aims to address this gap by raising awareness about data quality issues within the graph machine learning community. We provide an overview of existing issues, including topology, imbalance, bias, limited data, and abnormalities in graph data. Additionally, we highlight previous studies and recent developments in foundational graph models that focus on identifying, investigating, mitigating, and resolving these issues.
===============
Towards Efficient Temporal Graph Learning: Algorithms, Frameworks, and Tools
===============
Ruijie Wang (University of Illinois Urbana-Champaign), Wanyu Zhao (University of Illinois Urbana-Champaign), Dachun Sun (University of Illinois Urbana-Champaign), Charith Mendis (University of Illinois Urbana-Champaign) and Tarek Abdelzaher (University of Illinois Urbana-Champaign)
Temporal graphs capture dynamic node relations via temporal edges, finding extensive utility in wide domains where time-varying patterns are crucial. Temporal Graph Neural Networks (TGNNs) have gained significant attention for their effectiveness in representing temporal graphs. However, TGNNs still face significant efficiency challenges in real-world low-resource settings. First, from a data-efficiency standpoint, training TGNNs requires sufficient temporal edges and data labels, which is problematic in practical scenarios with limited data collection and annotation. Second, from a resource-efficiency perspective, TGNN training and inference are computationally demanding due to complex encoding operations, especially on large-scale temporal graphs. Minimizing resource consumption while preserving effectiveness is essential. Inspired by these efficiency challenges, this tutorial systematically introduces state-of-the-art data-efficient and resource-efficient TGNNs, focusing on algorithms, frameworks, and tools, and discusses promising yet under-explored research directions in efficient temporal graph learning. This tutorial aims to benefit researchers and practitioners in data mining, machine learning, and artificial intelligence.
===============
Landing Generative AI in Industrial Social and E-commerce Recsys
===============
Da Xu (LinkedIn), Danqing Zhang (Amazon), Lingling Zheng (Microsoft), Bo Yang (Amazon), Guangyu Yang (TikTok), Shuyuan Xu (TikTok) and Cindy Liang (LinkedIn)
Over the past two years, GAI has evolved rapidly, influencing various fields including social and e-commerce Recsys. Despite exciting advances, landing these innovations in real-world Recsys remains challenging due to the sophistication of modern industrial products and systems. Our tutorial begins with a brief overview of building industrial Recsys and GAI fundamentals, followed by the ongoing efforts and opportunities to enhance personalized recommendations with foundation models.
We then explore the integration of curation capabilities into Recsys, such as repurposing raw content, incorporating external knowledge, and generating personalized insights/explanations to foster transparency and trust. Next, the tutorial illustrates how AI agents can transform Recsys through interactive reasoning and action loops, shifting away from traditional passive feedback models. Finally, we shed insights on real-world solutions for human-AI alignment and responsible GAI practices.
A critical component of the tutorial is detailing the AI, Infrastructure, LLMOps, and Product roadmap (including the evaluation and responsible AI practices) derived from the production solutions in LinkedIn, Amazon, TikTok, and Microsoft. While GAI in Recsys is still in its early stages, this tutorial provides valuable insights and practical solutions for the Recsys and GAI communities.
===============
Transforming Digital Forensics with Large Language Models
===============
Eric Xu (University of Maryland, College Park), Wenbin Zhang (Florida International University) and Weifeng Xu (University of Baltimore)
In the pursuit of justice and accountability in the digital age, the integration of Large Language Models (LLMs) with digital forensics holds immense promise. This half-day tutorial provides a comprehensive exploration of the transformative potential of LLMs in automating digital investigations and uncovering hidden insights. Through a combination of real-world case studies, interactive exercises, and hands-on labs, participants will gain a deep understanding of how to harness LLMs for evidence analysis, entity identification, and knowledge graph reconstruction. By fostering a collaborative learning environment, this tutorial aims to empower professionals, researchers, and students with the skills and knowledge needed to drive innovation in digital forensics. As LLMs continue to revolutionize the field, this tutorial will have far-reaching implications for enhancing justice outcomes, promoting accountability, and shaping the future of digital investigations.
===============
Collecting and Analyzing Public Data from Mastodon
===============
Haris Bin Zia (Queen Mary University of London), Ignacio Castro (none) and Gareth Tyson (Hong Kong University of Science and Technology)
Understanding online behaviors, communities, and trends through social media analytics is becoming increasingly important. Recent changes in the accessibility of platforms like Twitter have made Mastodon a valuable alternative for researchers. In this tutorial, we will explore methods for collecting and analyzing public data from Mastodon, a decentralized micro-blogging social network. Participants will learn about the architecture of Mastodon, techniques and best practices for data collection, and various analytical methods to derive insights from the collected data. This session aims to equip researchers with the skills necessary to harness the potential of Mastodon data in computational social science and social data science research.
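As a small illustration of the kind of collection the tutorial covers, the Python sketch below pulls recent public posts from an instance's public timeline endpoint (GET /api/v1/timelines/public of the Mastodon API). The instance URL and limits are example values only, and any real crawl should respect each instance's terms and rate limits.

import requests

INSTANCE = "https://mastodon.social"  # example instance; swap for your target

def fetch_public_timeline(limit=20, local=False):
    # Fetch recent public statuses from the instance's public timeline.
    resp = requests.get(
        f"{INSTANCE}/api/v1/timelines/public",
        params={"limit": limit, "local": str(local).lower()},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # list of status objects (JSON)

for status in fetch_public_timeline(limit=5):
    print(status["created_at"], status["account"]["acct"], status["url"])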
*Apologies for crossposting*
LLMs Beyond the Cutoff: 1st International Workshop on Computational Methods
Beyond the Temporal Borders of Training Data
https://llmsbeyondthecutoff2024.wordpress.com
Collocated with CIKM 2024
October 25, 2024 — Boise (Idaho), USA
* July 29, 2024: Paper submission deadline
* August 30, 2024: Paper acceptance notification
* September 15, 2024: Camera ready versions submission
* October 25, 2024: Workshop date
=== NEWS ===
* LLMs Beyond the CutOff will be published as a volume of Springer Nature’s
post-proceedings
* Submission via EasyChair:
https://easychair.org/conferences/?conf=llmsbeyondthecut0ff
* Springer guidelines for authors:
https://www.springer.com/gp/computer-science/lncs/conference-proceedings-gu…
SUMMARY
LLMs are trained on large amounts of web data that extend temporally up to a specific moment in time. For instance, ChatGPT's LLM "knows" the world before May 2023, with no real-time access to information beyond this limit,
other than a browsing tool similar to a search engine enabling simple
lookup. However, in many scenarios, being able to analyze and reason with
novel emerging events and topics is crucial to face the challenges of
rapidly evolving landscapes of information.
The workshop provides an interdisciplinary forum for discussing the
temporal limitations of LLMs and proposing technical solutions for applying and developing LLMs beyond their cutoff dates. We explore two prominent
scenarios, where contexts tend to evolve faster than the LLMs that are used
to analyze them: (1) journalism and (2) industry. In terms of (1) the goal
is to propose methods of detecting, classifying and reasoning with emerging
topics that infuse public discourse on social or mainstream media. An
example of such a topic is COVID-19 at the outbreak of the pandemic.
Downstream tasks of interest are fake news detection and fact-checking on
novel topics, including claim analysis, opinion mining and narrative extraction. With regard to (2), the goal is to shed light on the limits of
LLMs for companies in sectors such as international geopolitical monitoring
and corporate intelligence, finance and stock market trading or insurance,
where companies need to track their interests and products in real time.
This does not address the inclusion of corporate data into LLMs, but rather proposes solutions that use publicly available and constantly
growing data. An overarching problem that will be studied is that of the
cross-language and cross-country specificities of emerging data, where
novel information in underrepresented languages or contexts may be more
challenging to analyze. We welcome insights and parallels from the field of
knowledge representation, where a similar problem with the cutoff dates of knowledge graphs (dynamics and regular updates) is well understood.
The expected outcomes are: 1) insights on the temporal limitations of LLMs,
where the workshop will outline concrete challenges and bottlenecks in the
identified scenarios; 2) novel methodological and technical solutions in
terms of (incremental) machine learning models when dealing with
(reasoning, extracting and classifying) information beyond the cutoff dates
of current LLMs.
TOPICS OF INTEREST
* Analysis of emerging topics and events, including counterfactual/what-if
reasoning
* Methods for few-shot or zero-shot learning
* Large language models for online discourse
* Large language models for corporate near real-time data analysis
* Large language models for multimodal understanding and generation
* Multilingual and cross-country emerging information extraction
* Computational journalism, disinformation spread, fact-checking and fake
news detection
* Stance and viewpoint discovery for novel information
* Detection and classification of claims within emerging narratives
* Social, ethical and legal aspects of LLMs' up-to-dateness
* Interpretability / explainability of computational methods beyond the cutoff
* Linking and enrichment of data beyond the LLM cutoff
* Foundational models for knowledge graph building and entity alignment
* Recommender systems for novel information
* Quality, provenance, uncertainty and trust of emerging information and
data
* Use-cases, applications and cross-community interfaces
* Evaluation frameworks and benchmarks
SUBMISSION
We welcome the following types of contributions:
* Full papers (12-15 pages including references): contain original research.
* Short papers (up to 11 pages including references): contain original
research in progress.
* Demo papers (up to 11 pages including references): contain descriptions
of prototypes, demos or software systems.
* Data papers (up to 11 pages including references): contain descriptions
of resources related to the workshop topics, such as datasets, knowledge
graphs, corpora, annotation protocols, etc.
* Position papers (up to 11 pages including references): discuss vision
statements or research directions.
Workshop papers must be self-contained and written in English. They must not have been previously published, must not be under consideration for publication elsewhere, and must not be under review for another workshop, conference, or journal.
Manuscripts should be submitted via EasyChair (
https://easychair.org/conferences/?conf=llmsbeyondthecut0ff) in PDF format,
using the Springer LNCS format. For full authors instructions, please check
Springer’s website:
https://www.springer.com/gp/computer-science/lncs/conference-proceedings-gu….
The review of manuscripts will be double-blind. Papers will be evaluated
according to their significance, originality, technical content, style,
clarity, and relevance to the workshop. At least one author of each
accepted contribution must register for the workshop and present the paper.
Pre-prints of all contributions will be made available during the
conference. The accepted papers will appear as a volume of Springer
Nature’s LNCS post-proceedings.
Submission via EasyChair:
https://easychair.org/conferences/?conf=llmsbeyondthecut0ff
Springer guidelines for authors:
https://www.springer.com/gp/computer-science/lncs/conference-proceedings-gu…
For any enquiries, please contact the workshop organizers:
todorov@lirmm.fr, rettinger@uni-trier.de, jmgomez@expert.ai, croitoru@lirmm.fr
IMPORTANT DATES
* July 29, 2024: Paper submission deadline
* August 30, 2024: Paper acceptance notification
* September 15, 2024: Camera ready versions submission
* October 25, 2024: Workshop date
All submission deadlines are end-of-day in the Anywhere on Earth (AoE) time
zone.
KEYNOTES
* TBA
AWARD
* All contributions are eligible for the "Best Paper" award
ORGANIZING COMMITTEE
* Konstantin Todorov (University of Montpellier, CNRS, LIRMM, France)
* José Manuel Gomèz Perèz (Expert.ai, Spain)
* Madalina Croitoru (University of Montpellier, CNRS, LIRMM, France)
* Achim Rettinger (University of Trier, Germany)
PROGRAM COMMITTEE
* Preslav Nakov, MBZUAI, United Arab Emirates
* Serena Villata, I3S, CNRS, France
* Ronald Denaux, Amazon, USA
* Filip Ilievski, Vrije Universiteit Amsterdam, The Netherlands
* Elena Montiel, Universidad Politécnica de Madrid, Spain
* Sandra Bringay, University Paul Valéry, France
* Carlos Badenes, Universidad Politécnica de Madrid, Spain
* Ioana Manolescu, Inria Saclay, France
* Dino Ienco, INRAE, France
* Colin Porlezza, Univ. della Svizzera Italiana, Switzerland
* Katarina Boland, Heinrich Heine Universität, Germany
* Gabriella Lapesa, GESIS, Germany
* Jonas Fegert, FZI, Germany
* Michael Färber, TU-Dresden, Germany
* Salim Hafid, University of Montpellier, France
* Pavlos Fafalios, FORTH, Greece
* Andrés García Silva, Expert.ai, Spain
* Sarah Labelle, University Paul Valéry, France
* Pablo Calleja, Universidad Politécnica de Madrid, Spain
FIRST CALL FOR PARTICIPATION
Advanced Language Processing School (ALPS) 2025
March 30th - April 4th 2025
Aussois (French Alps)
We are pleased to announce the 5th edition of ALPS - the Advanced NLP
School to be held in the French Alps from March 30th to April 4th 2025.
This school targets advanced research students in Natural Language
Processing and related fields and brings together world leading experts and
motivated students. The programme comprises lectures, poster presentations,
practical lab sessions and nature activities - the venue is located near a
National Park.
Important Dates
- Oct 15th 2024: Application deadline
- Nov 15th 2024: Acceptance notification
- Jan 15th 2025: Registration deadline
- March 30th 2025: Start of School
Confirmed speakers so far:
- Kyunghyun Cho (New York University & Prescient Design)
- Titouan Parcollet (Cambridge University & Samsung AI Center)
- Barbara Plank (LMU Munich)
- François Yvon (ISIR CNRS)
Website and online application: https://alps.imag.fr/
Questions: alps(a)univ-grenoble-alpes.fr
The registration fees for the event encompass accommodation and full board
at the conference venue, the Centre Paul Langevin (https://lig-alps.imag.fr/index.php/venue/). We will announce the fee
amounts later, and they will vary depending on the participant's
background: students, academia, and industry. Student fees will be set at
or below €600, including twin room accommodation. We will have a limited number of scholarships for registration: if you are interested, please mark this in the application form. The rates for academia and industry will
be higher, as is customary, and will include accommodation in a single room.
LoResLM 2025: Workshop on Language Models for Low-Resource Languages (co-located with COLING 2025)
Neural language models have revolutionised natural language processing (NLP) and have provided state-of-the-art results for many tasks. However, their effectiveness is largely dependent on the pre-training resources. Therefore, language models (LMs) often struggle with low-resource languages in both training and evaluation. Recently, there has been a growing trend in developing and adopting LMs for low-resource languages. LoResLM aims to provide a forum for researchers to share and discuss their ongoing work on LMs for low-resource languages.
>> Topics
LoResLM 2025 invites submissions on a broad range of topics related to the development and evaluation of neural language models for low-resource languages, including but not limited to the following.
* Building language models for low-resource languages.
* Adapting/extending existing language models/large language models for low-resource languages.
* Corpora creation and curation technologies for training language models/large language models for low-resource languages.
* Benchmarks to evaluate language models/large language models in low-resource languages.
* Prompting/in-context learning strategies for low-resource languages with large language models.
* Review of available corpora to train/fine-tune language models/large language models for low-resource languages.
* Multilingual/cross-lingual language models/large language models for low-resource languages.
* Applications of language models/large language models for low-resource languages (e.g. machine translation, chatbots, content moderation, etc.).
>> Important Dates
* Paper submission due – 5th November 2024
* Notification of acceptance – 25th November 2024
* Camera-ready due – 13th December 2024
* LoResLM 2025 workshop – 19th / 20th January 2025, co-located with COLING 2025
>> Submission Guidelines
We follow the COLING 2025 standards for submission format and guidelines. LoResLM 2025 invites the submission of long papers of up to eight pages and short papers of up to four pages. These page limits only apply to the main body of the paper. At the end of the paper (after the conclusions but before the references), papers need to include a mandatory section discussing the limitations of the work and, optionally, a section discussing ethical considerations. Papers can include unlimited pages of references and an unlimited appendix.
To prepare your submission, please make sure to use the COLING 2025 style files available here:
* LaTeX - https://coling2025.org/downloads/coling-2025.zip
* Word - https://coling2025.org/downloads/coling-2025.docx
* Overleaf - https://www.overleaf.com/latex/templates/instructions-for-coling-2025-proce…
Papers should be submitted through Softconf/START using the following link: https://softconf.com/coling2025/LoResLM25/
>> Organising Committee
* Hansi Hettiarachchi, Lancaster University, UK
* Tharindu Ranasinghe, Lancaster University, UK
* Paul Rayson, Lancaster University, UK
* Ruslan Mitkov, Lancaster University, UK
* Mohamed Gaber, Birmingham City University, UK
* Damith Premasiri, Lancaster University, UK
* Fiona Anting Tan, National University of Singapore, Singapore
* Lasitha Uyangodage, University of Münster, Germany
URL - https://loreslm.github.io/
Twitter - https://x.com/LoResLM2025
Best Regards
Tharindu Ranasinghe
Registration for ECAI-2024, the 27th European Conference on Artificial Intelligence, is now open. The early registration period will end on Monday, 19 August 2024.
https://www.ecai2024.eu/registration
Please join us on 19-24 October 2024 in Santiago de Compostela to mark the 50th anniversary of the first AI conference held in Europe, back in 1974.
We are looking forward to an exciting programme with some 600 accepted papers across all areas of AI, as well as lots of special events, including invited talks, panel sessions, satellite workshops, tutorials, and more.
--
Luis Magdalena
Publicity Chair of the European Conference on Artificial Intelligence (ECAI-2024)
*Apologies for cross-posting*
Dear colleague,
We cordially invite you to participate in the 34th Meeting of Computational Linguistics in The Netherlands (CLIN34) which takes place in Leiden on Friday 30 August 2024.
Besides a large and diverse programme of posters and oral presentations, we are happy to report that CLIN34 will have two keynote talks by:
* Diana Maynard, Sheffield University
* Dominique Blok and Erik de Graaf, TNO
If you wish to participate, please register via the conference website: https://clin34.leidenuniv.nl/
The programme can also be found at: https://clin34.leidenuniv.nl/program/
We hope to see you in Leiden in August!
The CLIN34 organizers
Leiden University
Non-thematic issue of the TAL journal: 2025, Volume 66-1
http://tal-66-1.sciencesconf.org/
Editors: Maxime Amblard, Cécile Fabre, Benoit Favre and Sophie Rosset
The call for volume 66-1 is open until December 31, 2024.
NEW since 2023: non-thematic issues of the TAL journal are now handled "on the fly". Each paper in issue 66-1 will be evaluated as soon as it is submitted and will be published, subject to acceptance, within an indicative period of six months after submission.
THEMES
The journal Automatic Language Processing has an open call for papers. Submissions may concern theoretical and experimental contributions on all aspects of written, spoken, and signed language processing and computational linguistics, for example:
Computational models of language
Linguistic resources
Statistical learning and modeling
Intermodality and multimodality
Language multiplicity and diversity
Semantics and comprehension
Information access and text mining
Language production and processing/generation/synthesis
Evaluation
Explainability and reproducibility
NLP in interaction with other disciplines, digital humanities
This list is indicative. On all topics, it is essential that the aspects related to natural language processing are emphasized.
We also welcome position papers and survey papers.
LANGUAGE
Manuscripts may be submitted in English or French.
THE TAL JOURNAL
TAL (Traitement Automatique des Langues / Natural Language Processing) is an international journal published by ATALA (French Association for Natural Language Processing, https://www.atala.org/revuetal) since 1959. TAL has an electronic mode of publication with immediate free access to published articles.
SCHEDULE
Submission deadline: on the fly until December 31, 2024
Notification to the authors after first review: two months after submission
Notification to the authors after second review: two months after the first review
Publication: two months after the second review
FORMAT SUBMISSION
Papers must be between 20 and 25 pages long, including references and appendices (no exceptions to these length limits).
TAL is a double-blind review journal: it is thus necessary to anonymise the manuscript and the name of the pdf file. Self-references that reveal the author's identity must be avoided.
Style sheets are available for download on the Web site of the journal.
More information on: http://tal-66-1.sciencesconf.org/
ICLC-11
11TH INTERNATIONAL CONTRASTIVE LINGUISTICS CONFERENCE
First Call for Abstracts
September 17–19, 2025
Prague, Czech Republic
The Faculty of Arts at Charles University in Prague is pleased to announce the 11th International Contrastive Linguistics Conference. The ICLC conference series, running since 1998, aims to promote fine-grained cross-linguistic research comprising two or more languages from a broad range of theoretical and methodological perspectives. Following the success of ICLC-10 in Mannheim 2023, ICLC-11 wants to bring together researchers from different linguistic subfields and neighbouring disciplines to continue the interdisciplinary dialog on comparing languages, to foster the development of an international community and to advance possible new areas of cross-linguistic research. See https://iclc11.ff.cuni.cz/ for more and note the submission deadline of February 24, 2025.
We invite abstracts on a broad range of topics, including but not limited to:
(1) Comparison of phenomena in two or more languages focused on any area and level of linguistic analysis:
* lexicon
* phonetics and phonology
* morphology, syntax and morphosyntax, linguistic complexity
* semantics, pragmatics, register and socio-cultural context
(2) Methodological challenges and solutions in cross-linguistic research:
* language corpora (multilingual, learner, and multimodal) and issues of linguistic annotation (e.g., Universal Dependencies)
* comparability issues, tertia comparationis, language universals; experimental and naturalistic interaction data
* AI and new digital tools in linguistic analysis
* low-resourced languages
(3) Contrastive linguistics in touch with related disciplines:
* generative, model-theoretic, functional or cognitive (e.g., constructional) approaches
* historical, sociolinguistic and variationist perspectives; registers, multimodality, pragmatics, interculturality; language contact; language policy
* cognitive and psycholinguistic approaches to bilingualism and multilingualism; language acquisition, language teaching and learning
* translation studies
The abstracts should present empirical research, well-defined research questions or hypotheses, details of the research approach and methods, theoretical insights, and (preliminary or expected) results. For details see https://iclc11.ff.cuni.cz/calls-and-circulars/call-for-papers/.
PRELIMINARY PROGRAM
* Parallel Oral Sessions
* Poster Sessions
* Keynote Speakers:
Sabine De Knop (Université Saint-Louis, Bruxelles, Belgium)
Volker Gast (Friedrich-Schiller-University, Jena, Germany)
Dan Zeman (Charles University, Prague)
* Panel Discussion
IMPORTANT DATES
24.02.2025: Deadline for abstract submission
26.05.2025: Notification of acceptance
02.06.2025: Registration opens
16.06.2025: Deadline for revised abstract submission
30.06.2025: Last day for early bird registration
01.09.2025: Online registration closes
16.09.2025: Arrival, Registration, Get-together
17–19.09.2025: Conference
ORGANIZING COMMITTEE
* Mirjam Fried (chair) 1)
* Viktor Elšík 1)
* Jana Kocková 2)
* Michal Křen 1)
* Olga Nádvorníková 1)
* Alexandr Rosen 1)
1) Charles University, Faculty of Arts
2) Czech Academy of Sciences, Institute of Slavonic Studies
PROGRAM COMMITTEE: tba
CONTACT INFORMATION
Website: https://iclc11.ff.cuni.cz/
Email: iclc11@ff.cuni.cz
Call For Papers: The International Conference on Intelligent Multilingual
Information Processing 2024 (IMLIP 2024)
The International Conference on Intelligent Multilingual Information
Processing 2024 (IMLIP 2024) will take place in Beijing, China, on 16-17
November 2024, hosted by Beijing Institute of Technology (
https://english.bit.edu.cn/).
As a professional committee of the Chinese Association of Artificial
Intelligence, the Institute of Multilingual Intelligent Information Processing (IMLIP) focuses on multilingual intelligent information
processing and its applications. The aim of the conference IMLIP 2024 is to
bring together experts from industry, academia, and research in the
community, to provide a platform for academic exchange and collaborative
research for scholars from around the world, and also to promote linguistic
research and natural language processing studies related to China's ethnic
minorities and countries.
Conference Website: http://www.imlip.org/
Topics
IMLIP 2024 welcomes original research and applications related to
multilingual intelligent information processing. We encourage
interdisciplinary studies and the integration of humanities and sciences.
Topics of interest include, but are not limited to, the following:
Linguistics
Cross-lingual processing
Large language models
Computational linguistics theory
Resource and corpus construction
Evaluation
Multilingual language understanding
Machine translation
Multimodal intelligent information processing, including multilingual
speech recognition and text processing
Intelligent processing in international Chinese education
Applications of multilingual intelligent information processing
Keynote speakers
Academician Nima Tashi, Professor, Tibet University, Tibetan multilingual
processing
Professor Kim Gerdes, University of Saclay, France
Important Dates
Paper Submission System Open: June 30, 2024
Paper submission Deadline: August 30, 2024
Notification of Acceptance: September 30, 2024
Conference Dates: November 16-17, 2024
Submissions
Papers submitted to IMLIP 2024 can be in Chinese or English. An accepted
paper will be presented either as an oral talk or as a poster, as
determined by the Program Committee. Accepted Chinese papers will be
recommended to "Corpus Linguistics" and "AppliedTechnology" based on
circumstances, with further review required to determine final acceptance.
Accepted English papers will be published in the Springer conference
proceedings (EI-indexed). The authors of accepted papers must revise their papers according to the reviews before publication. At least one author of
the accepted paper must attend the conference.
Format
Please use the Word or LaTeX templates provided. Papers may consist of up
to 8 pages of content, plus unlimited references. Papers will be
double-blindly reviewed without the authors’ names and affiliations
included. Furthermore, self-references that reveal the author’s identity,
e.g., “We previously showed (Smith, 1991) …”, must be avoided. Instead, use
citations such as “Smith (1991) previously showed …”. Papers that do not
conform to these requirements will be rejected without review. For Chinese
submission, please download the template at
http://jcip.cipsc.org.cn/CN/item/downloadFile.do?id=79.
For English submission, please download the template at
https://github.com/acl-org/acl-style-files.
Submission Website: Submission will be electronic in PDF format through
https://openreview.net/group?id=IMLIP.org/2024/Conference.
Multiple-Submission Policy
IMLIP 2024 allows authors to submit manuscripts to leading NLP
international conferences simultaneously only if the conferences have
established similar multiple-submission policies. Papers that have been or
will be submitted to other conferences must indicate this at the submission
time. Authors of papers accepted for presentation in IMLIP 2024 must notify
the program chairs by the camera-ready deadline as to whether the paper
will be presented at IMLIP 2024. Once confirmed, the paper must be
withdrawn from other venues. We will not accept papers that are identical
or overlap significantly in content or results with papers that will be (or
have been) published elsewhere except the arXiv preprint version.
Awards and Funds
IMLIP 2024 will grant Best Paper Awards for Chinese and English papers, respectively.
Contact
For further information, visit the conference website at
http://www.imlip.org/
Venue: Cultural and Museum Center, Liangxiang Campus, Beijing Institute of Technology