- Corpora - ELRA lists

COLING 2025 Workshop on Detecting AI Generated Content, 1st Call for participation
by Firoj Alam 22 Aug '24

22 Aug '24

(apologies for cross-posting) Dear colleague, We seek submissions of long and short papers on original and unpublished work (same page limit as the COLING 2025 main conference). In addition, there will be two shared tasks. All accepted submissions will be presented as talks and/or posters at the workshop, prior to the COLING 2025 main conference. Research Papers We invite original research papers from a wide range of topics, including but not limited to: - Detection methods for text, image, speech and other modalities - Multilingual detection methods - Detection Methods Image Modality - Detection Methods Multimodal Content - Real-time detection systems: real-time systems for detecting AI-generated content in live scenarios. - Attacks for detection systems - Datasets and resources - Benchmarking for AI generated content detection - AI generated fake news detection - Deep Fakes in audio, videos and images - Ethical and legal implications of AI generated content Shared Tasks We plan to run two shared tasks: - Task 1: Binary Multilingual Machine-Generated Text Detection (Human vs. Machine) - Task 2: AI vs. Human -- Academic Essay Authenticity Challenge Important dates: Regular Submission (including shared tasks) Deadline: November 15, 2024 (dual-submission allowed) Resubmission (with a rebuttal) Deadline: December 2, 2024 Acceptance Notification: December 7, 2024 Camera-Ready Deadline: December 13, 2025 Workshop Day: January 19-20, 2025 All deadlines are 11:59PM UTC-12:00 (“anywhere on Earth”). Submission Details: Papers must describe original, completed or in-progress, and unpublished work. All papers will be refereed through a double-blind peer review process by multiple reviewers with final acceptance decisions made by the workshop organizers. Accepted papers will be given up to 9 pages (for full papers), 5 pages (for short papers and posters) in the workshop proceedings, and will be presented as oral paper or poster. We are seeking submissions under the following categories: - Full/long papers (8 pages) - Short papers (work in progress, innovative ideas/proposals: 4 pages) - Shared task papers (4 pages) Long, short, and shared task papers must follow the two-column format of *ACL conferences, using the official templates. The templates can be downloaded in style files and formatting. Please do not modify these style files, nor should you use templates designed for other conferences. Submissions that do not conform to the required styles, including paper size, margin width, and font size restrictions, will be rejected without review. Verification to guarantee conformance to publication standards, we will be using the ACL pubcheck tool. The PDFs of camera-ready papers must be run through this tool prior to their final submission, and we recommend its use also at submission time. Submissions are open to all, and are to be submitted anonymously. For the anonymity, double-blind submission, and reproducibility criteria please follow the COLING 2025 instructions. **If you have published in *ACL conferences previously, and are interested in helping out in the program committee to review papers, please fill up this form. Submission portal Submissions must be made using the START portal. We will make the URL to the portal available soon. Best regards The Organizers

2 1

Third Call for Participation - CoLI-Dravidian@FIRE 2024: Word-level Code-Mixed Language Identification in Dravidian Languages
by Sabur B 22 Aug '24

22 Aug '24

****We apologize for multiple postings of this e-mail**** CALL FOR PARTICIPATION FIRE 2024 Task - CoLI-Dravidian: Word-level Code-Mixed Language Identification in Dravidian Languages Held as a shared task in the 16th meeting of Forum for Information Retrieval Evaluation (FIRE 2024 <http://fire.irsi.org.in/fire/2024/home>) December 12-15, 2024. DAIICT, Gandhinagar, India Website: https://sites.google.com/view/coli-dravidian-2024/datasets?authuser=0 Codalab link: https://codalab.lisn.upsaclay.fr/competitions/19357 Dear All, We are inviting researchers and students to participate in the shared task CoLI-Dravidian: Word-level Code-Mixed Language Identification in Dravidian Languages, which is held as a shared task in the 16th meeting of Forum for Information Retrieval Evaluation (FIRE 2024 <http://fire.irsi.org.in/fire/2024/home>). Language Identification (LI) involves detecting the language(s) used in a given text, which is a preliminary step for many applications such as sentiment analysis, machine translation, information retrieval, and natural language understanding. In multilingual India, especially among the youth, social media often features code-mixed text, blending local languages with English at various levels. However, this poses significant challenges for LI, particularly when languages are mixed within a single word. Dravidian languages, extensively spoken in southern India, are under-resourced despite their rich morphological structure. These languages face technological challenges, especially in script representation on digital platforms, leading users to prefer Roman or hybrid scripts for communication. This prevalent code-mixing offers vast linguistic data for research yet remains understudied. To address word-level LI challenges in code-mixed Dravidian languages, we are conducting a shared task by providing code-mixed datasets for four languages - Kannada, Tamil, Malayalam, and Tulu, to encourage the development of advanced LI models. There will be a real-time leaderboard, and the participants will be allowed to make a maximum of 10 submissions in the training phase and 5 submissions in the testing phase through CodaLab. Each team will have to select the best submission for ranking. To download the data and participate, go to: https://codalab.lisn.upsaclay.fr/competitions/19357. Best regards, The CoLI-Dravidian 2024 Organizing Committee Important dates - 14th June 2024 - open track websites and training data release - 1st July 2024– test data release - 25th July – run submission deadline - 27th July – results declared - 27th August – Working notes due - 10th September - Reviews - 30th October – Camera-ready copies of working notes NOTE: All dates mentioned here are in the AoE (Anywhere on Earth) zone. Organizing Committee - Shashirekha Hosahalli Lakshmaiah, Department of Computer Science, Mangalore University, India. - Ameeta Agrawal, Department of Computer Science, Portland State University, USA. - Fazlourrahman Balouchzahi, CIC, IPN, Mexico. - Asha Hegde, Department of Computer Science, Mangalore University, India. - Sabur Butt, IFE, Tecnologico de Monterrey, Mexico. - Sharal Coelho, Department of Computer Science, Mangalore University, India. - Kavya G, Department of Computer Science, Mangalore University, India. - Harshitha, Department of Computer Science, Mangalore University, India. - Sonith D, Department of Computer Science, Mangalore University, India. *Sabur Butt, Ph.D. *(He/Him) Institute for the Future of Education (IFE) *Tecnológico de Monterrey, Mexico* Address: Av. Eugenio Garza Sada 2501 Sur Tecnológico, 64849 Monterrey, N.L. LinkedIn <https://www.linkedin.com/in/saburb> - GitHub <https://github.com/saburbutt> - Scholar <https://scholar.google.com/citations?user=re7md-0AAAAJ&hl=en> - Website <https://saburbutt.github.io/>

2 1

Second Call for Papers: GenBench, the second workshop on generalisation (benchmarking) in NLP @ EMNLP 2024
by Verna 22 Aug '24

22 Aug '24

*GenBench: The second workshop on generalisation (benchmarking) in NLP* *Workshop description*The ability to generalise well is often mentioned as one of the primary desiderata for models of natural language processing (NLP). Yet, there are still many open questions related to what it means for an NLP model to generalise well, and how generalisation should be evaluated. LLMs, trained on gigantic training corpora that are – at best – hard to analyse or not publicly available at all, bring a new set of challenges to the topic. The second GenBench workshop aims to serve as a cornerstone to catalyse research on generalisation in the NLP community. The workshop aims to bring together different expert communities to discuss challenging questions relating to generalisation in NLP, crowd-source challenging generalisation benchmarks for LLMs, and make progress on open questions related to generalisation. Topics of interest include, but are not limited to: - Opinion or position papers about generalisation and how it should be evaluated; - Analyses of how existing or new models generalise; - Empirical studies that propose new paradigms to evaluate generalisation; - Meta-analyses that compare how results from different generalisation studies compare; - Meta-analyses that study how different types of generalisation are related; - Papers that discuss how generalisation of LLMs can be evaluated; - Papers that discuss why generalisation is (not) important in the era of LLMs; - Studies on the relationship between generalisation and fairness or robustness. The second GenBench workshop on generalisation (benchmarking) in NLP will be co-located with EMNLP 2024. *Submission types* We call for two types of submissions: regular workshop submissions and collaborative benchmarking task submissions. The latter will consist of a data/task artefact and a companion paper motivating and evaluating the submission. In both cases, we accept archival papers and extended abstracts. *1. Regular workshop submissions* Regular workshop submissions present papers on the topic of generalisation (see examples listed above). Regular workshop papers may be submitted as an archival paper, when they report on completed, original and unpublished research, or as a shorter extended abstract, otherwise. More details on this category can be found below. If you are unsure whether a specific topic is well-suited for submission, feel free to reach out to the organisers of the workshop at genbench(a)googlegroups.com. *2. Collaborative Benchmarking Task (CBT) submissions* The goal of this year's CBT is to generate versions of existing evaluation datasets for LLMs which, given a particular training corpus, have a larger distribution shift than the original test set, or – in other words – evaluate generalisation to a stronger degree than the original dataset. For this particular challenge, we focus on three training corpora: C4, RedPajama-Data-1T, and Dolma. All three corpora are publicly available, and they can be searched via the What's in My Big Data API (https://github.com/allenai/wimbd). We will focus on three popular evaluation datasets: MMLU, HumanEval, and SiQA. Submitters to the CBT are asked to design a way to assess distribution shift for one or more of these evaluation datasets, given particular features of the training corpus, and then generate one or more versions of the dataset that have a larger distribution shift according to this method. Newly generated sets do not have to have the same size as the original test set, but should have at least 200 examples. Practically speaking, CBT submissions consist of: 1. the data/task artefact, submitted through https://github.com/GenBench/genbench_cbt 2. a paper describing the dataset and its method of construction, submitted through https://openreview.net/group?id=GenBench.org/2024/Workshop We accept submissions that consider only one pretraining dataset and evaluation dataset, but encourage submitters to apply their suggested protocols to both pretraining datasets. We also suggest that submitters include model results for models trained on these datasets. Suggestions are provided on the CBT website: https://genbench.org/cbt. Given enough high-quality submissions, we aim to write a paper with the combined results, to which submitters can be co-authors, if they wish so. More detailed guidelines will be given on https://genbench.org/cbt. *Archival vs extended abstract* Archival papers are up to 8 pages excluding references and report on completed, original and unpublished research. They follow the requirements of regular EMNLP 2024 submissions. Accepted papers will be published in the workshop proceedings and are expected to be presented at the workshop. The papers will undergo double-blind peer review and should thus be anonymised. Extended abstracts can be up to 2 pages excluding references, and may report on work in progress or be cross-submissions of work that has already appeared in another venue. Abstract titles will be posted on the workshop website, but will not be included in the proceedings. *Submission instructions*For both archival papers and extended abstracts, we refer to the EMNLP 2024 website for paper templates and requirements. Additional requirements for both regular workshop papers and collaborative benchmarking task submissions can be found on our website. All submissions can be submitted through OpenReview: https://openreview.net/group?id=GenBench.org/2024/Workshop. *Important dates* - August 15, 2024: Paper submission deadline - September 20, 2024: Notification deadline - October 4, 2024: Camera-ready deadline - November 15 or 16, 2024: Workshop Note: all deadlines are 11:59 PM UTC-12:00. Check the website for final updates to these deadlines (https://genbench.org/workshop). *Preprints* We do not have an anonymity deadline, preprints are allowed, both before the submission deadline as well as after. *Contact* Email address: genbench(a)googlegroups.com Website: https://genbench.org/workshop *On behalf of the organisers*Dieuwke Hupkes Verna Dankers Khuyagbaatar Batsuren Amirhossein Kazemnejad Christos Christodoulopoulos Mario Giulianelli Ryan Cotterell

2 1

[CIKM-2024] - Tutorial program announced
by antonela.tommasel＠isistan.unicen.edu.ar 22 Aug '24

22 Aug '24

=============== =============== * We apologize if you receive multiple copies of this Tutorial program * * For the online version of this program, visit: https://cikm2024.org/tutorials/ =============== CIKM 2024: 33rd ACM International Conference on Information and Knowledge Management Boise, Idaho, USA October 21–25, 2024 =============== The tutorial program of CIKM 2024 has been published. Tutorials are planned to take place on 21 October 2024. Here you can find a summary of each accepted tutorial. =============== Systems for Scalable Graph Analytics and Machine Learning =============== Da Yan (Indiana University Bloomington), Lyuheng Yuan (Indiana University Bloomington), Akhlaque Ahmad (Indiana University Bloomington) and Saugat Adhikari (Indiana University Bloomington) Graph-theoretic algorithms and graph machine learning models are essential tools for addressing many real-life problems, such as social network analysis and bioinformatics. To support large-scale graph analytics, graph-parallel systems have been actively developed for over one decade, such as Google’s Pregel and Spark’s GraphX, which (i) promote a think-like-a-vertex computing model and target (ii) iterative algorithms and (iii) those problems that output a value for each vertex. However, this model is too restricted for supporting the rich set of heterogeneous operations for graph analytics and machine learning that many real applications demand. In recent years, two new trends emerge in graph-parallel systems research: (1) a novel think-like-a-task computing model that can efficiently support the various computationally expensive problems of subgraph search; and (2) scalable systems for learning graph neural networks. These systems effectively complement the diversity needs of graph-parallel tools that can flexibly work together in a comprehensive graph processing pipeline for real applications, with the capability of capturing structural features. This tutorial will provide an effective categorization of the recent systems in these two directions based on their computing models and adopted techniques, and will review the key design ideas of these systems. =============== Fairness in Large Language Models: Recent Advances and Future =============== Thang Viet Doan (Florida International University), Zichong Wang (Florida International University), Minh Nhat Nguyen (Florida International University) and Wenbin Zhang (Florida International University) Large Language Models (LLMs) have demonstrated remarkable success across various domains but often lack fairness considerations, potentially leading to discriminatory outcomes against marginalized populations. On the other hand, fairness in LLMs, in contrast to fairness in traditional machine learning, entails exclusive backgrounds, taxonomies, and fulfillment techniques. In this tutorial, we give a systematic overview of recent advances in the existing literature concerning fair LLMs. Specifically, a series of real-world case studies serve as a brief introduction to LLMs, and then an analysis of bias causes based on their training process follows. Additionally, the concept of fairness in LLMs is discussed categorically, summarizing metrics for evaluating bias in LLMs and existing algorithms for promoting fairness. Furthermore, resources for evaluating bias in LLMs, including toolkits and datasets, are summarized. Finally, current research challenges and open questions are discussed. =============== Unifying Graph Neural Networks across Spatial and Spectral Domains =============== Zhiqian Chen (Mississippi State University), Lei Zhang (Virginia Tech) and Liang Zhao (Emory University) Over recent years, Graph Neural Networks (GNNs) have garnered significant attention. However, the proliferation of diverse GNN models, underpinned by various theoretical approaches, complicates model selection, as they are not readily comprehensible within a uniform framework. Early GNNs were implemented using spectral theory, while others were based on spatial theory. This divergence renders direct comparisons challenging. Moreover, the multitude of models within each domain further complicates evaluation. In this half-day tutorial, we examine state-of-the-art GNNs and introduce a comprehensive framework bridging spatial and spectral domains, elucidating their interrelationship. This framework enhances our understanding of GNN operations. The tutorial explores key paradigms, such as spatial and spectral methods, through a synthesis of spectral graph theory and approximation theory. We provide an in-depth analysis of recent research developments, including emerging issues like over-smoothing, using well-established GNN models to illustrate our framework's universality. =============== Tabular Data-centric AI: Challenges, Techniques and Future Perspectives =============== Yanjie Fu (Arizona State University), Dongjie Wang (University of Kansas), Hui Xiong (Hong Kong University of Science and Technology (Guangzhou)) and Kunpeng Liu (Portland State University) Tabular data is ubiquitous across various application domains such as biology, ecology, and material science. Tabular data-centric AI aims to enhance the predictive power of AI through better utilization of tabular data, improving its readiness at structural, predictive, interaction, and expression levels. This tutorial targets professionals in AI, machine learning, and data mining, as well as researchers from specific application areas. We will cover the settings, challenges, existing methods, and future directions of tabular data-centric AI. The tutorial includes a hands-on session to develop, evaluate, and visualize techniques in this emerging field, equipping attendees with a thorough understanding of its key challenges and techniques for integration into their research. =============== Frontiers of Large Language Model-Based Agentic Systems =============== Reshmi Ghosh (Microsoft), Jia He (Microsoft Corp.), Kabir Walia (Microsoft), Jieqiu Chen (Microsoft), Tushar Dhadiwal (Microsoft), April Hazel (Microsoft) and Chandra Inguva (Microsoft) Large Language Models (LLMs) have recently demonstrated remarkable potential in achieving human-level intelligence, sparking a surge of interest in LLM-based autonomous agents. However, there is a noticeable absence of a thorough guide that methodically compiles the latest methods for building LLM-agents, their assessment, and the associated challenges. As a pioneering initiative, this tutorial delves into the intricacies of constructing LLM-based agents, providing a systematic exploration of key components and recent innovations. We dissect agent design using an established taxonomy, focusing on essential keywords prevalent in agent-related framework discussions. Key components include profiling, perception, memory, planning, and action. We unravel the intricacies of each element, emphasizing state-of-the-art techniques. Beyond individual agents, we explore the extension from single-agent paradigms to multi-agent frameworks. Participants will gain insights into orchestrating collaborative intelligence within complex environments. Additionally, we introduce and compare popular open-source frameworks for LLM-based agent development, enabling practitioners to choose the right tools for their projects. We discuss evaluation methodologies for assessing agent systems, addressing efficiency and safety concerns. We present a unified framework that consolidates existing work, making it a valuable resource for practitioners and researchers alike. =============== Hands-On Introduction to Quantum Machine Learning =============== Samuel Yen-Chi Chen (Wells Fargo) and Joongheon Kim (Korea University) This tutorial offers a hands-on introduction into the captivating field of quantum machine learning (QML). Beginning with the bedrock of quantum information science (QIS)—including essential elements like qubits, single and multiple qubit gates, measurements, and entanglement—the session swiftly progresses to foundational QML concepts. Participants will explore parametrized or variational circuits, data encoding or embedding techniques, and quantum circuit design principles. Delving deeper, attendees will examine various QML models, including the quantum support vector machine (QSVM), quantum feed-forward neural network (QNN), and quantum convolutional neural network (QCNN). Pushing boundaries, the tutorial delves into cutting-edge QML models such as quantum recurrent neural networks (QRNN) and quantum reinforcement learning (QRL), alongside privacy-preserving techniques like quantum federated machine learning, bolstered by concrete programming examples. Throughout the tutorial, all topics and concepts are brought to life through practical demonstrations executed on a quantum computer simulator. Designed with novices in mind, the content caters to those eager to embark on their journey into QML. Attendees will also receive guidance on further reading materials, as well as software packages and frameworks to explore beyond the session. =============== On the Use of Large Language Models for Table Tasks =============== Yuyang Dong (NEC), Masafumi Oyamada (NEC), Chuan Xiao (Osaka University, Nagoya University) and Haochen Zhang (Osaka University) The proliferation of LLMs has catalyzed a diverse array of applications. This tutorial delves into the application of LLMs for tabular data and targets a variety of table-related tasks, such as table understanding, text-to-SQL conversion, and tabular data preprocessing. It surveys LLM solutions to these tasks in five classes, categorized by their underpinning techniques: prompting, fine-tuning, RAG, agents, and multimodal methods. It discusses how LLMs offer innovative ways to interpret, augment, query, and cleanse tabular data, featuring academic contributions and their practical use in the industrial sector. It emphasizes the versatility and effectiveness of LLMs in handling complex table tasks, showcasing their ability to improve data quality, enhance analytical capabilities, and facilitate more intuitive data interactions. By surveying different approaches, this tutorial highlights the strengths of LLMs in enriching table tasks with more accuracy and usability, setting a foundation for future research and application in data science and AI-driven analytics. =============== Data Quality-aware Graph Machine Learning =============== Yu Wang (Vanderbilt University), Kaize Ding (Northwestern University), Xiaorui Liu (North Carolina State University), Jian Kang (University of Rochester), Ryan Rossi (Adobe Research) and Tyler Derr (Vanderbilt University) Recent years have seen a significant shift in Artificial Intelligence from model-centric to data-centric approaches, highlighted by the success of large foundational models. Following this trend, despite numerous innovations in graph machine learning model design, graph-structured data often suffers from data quality issues, which jeopardizes the progress of Data-centric AI in graph-structured applications. Our proposed tutorial aims to address this gap by raising awareness about data quality issues within the graph machine-learning community. We provide an overview of existing issues, including topology, imbalance, bias, limited data, and abnormalities in graph data. Additionally, we highlight previous studies and recent developments in foundational graph models that focus on identifying, investigating, mitigating, and resolving these issues. =============== Towards Efficient Temporal Graph Learning: Algorithms, Frameworks, and Tools =============== Ruijie Wang (University of Illinois Urbana-Champaign), Wanyu Zhao (University of Illinois Urbana-Champaign), Dachun Sun (University of Illinois Urbana-Champaign), Charith Mendis (University of Illinois Urbana-Champaign) and Tarek Abdelzaher (University of Illinois Urbana-Champaign) Temporal graphs capture dynamic node relations via temporal edges, finding extensive utility in wide domains where time-varying patterns are crucial. Temporal Graph Neural Networks (TGNNs) have gained significant attention for their effectiveness in representing temporal graphs. However, TGNNs still face significant efficiency challenges in real-world low-resource settings. First, from a data-efficiency standpoint, training TGNNs requires sufficient temporal edges and data labels, which is problematic in practical scenarios with limited data collection and annotation. Second, from a resource-efficiency perspective, TGNN training and inference are computationally demanding due to complex encoding operations, especially on large-scale temporal graphs. Minimizing resource consumption while preserving effectiveness is essential. Inspired by these efficiency challenges, this tutorial systematically introduces state-of-the-art data-efficient and resource-efficient TGNNs, focusing on algorithms, frameworks, and tools, and discusses promising yet under-explored research directions in efficient temporal graph learning. This tutorial aims to benefit researchers and practitioners in data mining, machine learning, and artificial intelligence. =============== Landing Generative AI in Industrial Social and E-commerce Recsys =============== Da Xu (LinkedIn), Danqing Zhang (Amazon), Lingling Zheng (Microsoft), Bo Yang (Amazon), Guangyu Yang (TikTok), Shuyuan Xu (TikTok) and Cindy Liang (LinkedIn) Over the past two years, GAI has evolved rapidly, influencing various fields including social and e-commerce Recsys. Despite exciting advances, landing these innovations in real-world Recsys remains challenging due to the sophistication of modern industrial product and systems. Our tutorial begins with a brief overview of building industrial Recsys and GAI fundamentals, followed by the ongoing efforts and opportunities to enhance personalized recommendations with foundation models. We then explore the integration of curation capabilities into Recsys, such as repurposing raw content, incorporating external knowledge, and generating personalized insights/explanations to foster transparency and trust. Next, the tutorial illustrates how AI agents can transform Recsys through interactive reasoning and action loops, shifting away from traditional passive feedback models. Finally, we shed insights on real-world solutions for human-AI alignment and responsible GAI practices. A critical component of the tutorial is detailing the AI, Infrastructure, LLMOps, and Product roadmap (including the evaluation and responsible AI practices) derived from the production solutions in LinkedIn, Amazon, TikTok, and Microsoft. While GAI in Recsys is still in its early stages, this tutorial provides valuable insights and practical solutions for the Recsys and GAI communities. =============== Transforming Digital Forensics with Large Language Models =============== Eric Xu (University of Maryland, College Park), Wenbin Zhang (Florida International University) and Weifeng Xu (University of Baltimore) In the pursuit of justice and accountability in the digital age, the integration of Large Language Models (LLMs) with digital forensics holds immense promise. This half-day tutorial provides a comprehensive exploration of the transformative potential of LLMs in automating digital investigations and uncovering hidden insights. Through a combination of real-world case studies, interactive exercises, and hands-on labs, participants will gain a deep understanding of how to harness LLMs for evidence analysis, entity identification, and knowledge graph reconstruction. By fostering a collaborative learning environment, this tutorial aims to empower professionals, researchers, and students with the skills and knowledge needed to drive innovation in digital forensics. As LLMs continue to revolutionize the field, this tutorial will have far-reaching implications for enhancing justice outcomes, promoting accountability, and shaping the future of digital investigations. =============== Collecting and Analyzing Public Data from Mastodon =============== Haris Bin Zia (Queen Mary University of London), Ignacio Castro (none) and Gareth Tyson (Hong Kong University of Science and Technology) Understanding online behaviors, communities, and trends through social media analytics is becoming increasingly important. Recent changes in the accessibility of platforms like Twitter have made Mastodon a valuable alternative for researchers. In this tutorial, we will explore methods for collecting and analyzing public data from Mastodon, a decentralized micro-blogging social network. Participants will learn about the architecture of Mastodon, techniques and best practices for data collection, and various analytical methods to derive insights from the collected data. This session aims to equip researchers with the skills necessary to harness the potential of Mastodon data in computational social science and social data science research.

2 1

[CfP] 2nd CfP - LLMs Beyond the Cutoff workshop @ CIKM 2024
by Patricia Martín Chozas 22 Aug '24

22 Aug '24

*Apologies for crossposting* LLMs Beyond the Cutoff: 1st International Workshop on Computational Methods Beyond the Temporal Borders of Training Data https://llmsbeyondthecutoff2024.wordpress.com Collocated with CIKM 2024 October 25, 2024 — Boise (Idaho), USA * July 29, 2024: Paper submission deadline * August 30, 2024: Paper acceptance notification * September 15, 2024: Camera ready versions submission * October 25, 2024: Workshop date === NEWS === * LLMs Beyond the CutOff will be published as a volume of Springer Nature’s post-proceedings * Submission via EasyChair: https://easychair.org/conferences/?conf=llmsbeyondthecut0ff * Springer guidelines for authors: https://www.springer.com/gp/computer-science/lncs/conference-proceedings-gu… SUMMARY LLMs are trained on large amounts of web data that spread temporally up to a specific moment in time. For instance, chatGPT’s LLM “knows” the world before May 2023 with no real time access to information beyond this limit, other than a browsing tool similar to a search engine enabling simple lookup. However, in many scenarios, being able to analyze and reason with novel emerging events and topics is crucial to face the challenges of rapidly evolving landscapes of information. The workshop provides an interdisciplinary forum for discussing the temporal limitations of LLMs and proposing technical solutions of how to apply and develop LLMs beyond their cutoff dates. We explore two prominent scenarios, where contexts tend to evolve faster than the LLMs that are used to analyze them: (1) journalism and (2) industry. In terms of (1) the goal is to propose methods of detecting, classifying and reasoning with emerging topics that infuse public discourse on social or mainstream media. An example of such a topic is COVID-19 at the dawn of the pandemics outbreak. Downstream tasks of interest are fake news detection and fact-checking on novel topics, including claim analysis, opinion mining and narratives extraction. With regard to (2), the goal is to shed light on the limits of LLMs for companies in sectors such as international geopolitical monitoring and corporate intelligence, finance and stock market trading or insurance, where companies need to track their interests and products in real time. This does not address the inclusion of corporate data into the LLMs, but rather proposes solutions by using publicly available and constantly growing data. An overarching problem that will be studied is that of the cross-language and cross-country specificities of emerging data, where novel information in underrepresented languages or contexts may be more challenging to analyze. We welcome insights and parallels from the field of knowledge representation, where the similar problem with cutoff dates of knowledge graphs (dynamics and regular updates) is well understood. The expected outcomes are: 1) insights on the temporal limitations of LLMs, where the workshop will outline concrete challenges and bottlenecks in the identified scenarios; 2) novel methodological and technical solutions in terms of (incremental) machine learning models when dealing with (reasoning, extracting and classifying) information beyond the cutoff dates of current LLMs. TOPICS OF INTEREST * Analysis of emerging topics and events, including counterfactual/what-if reasoning * Methods for few-shot or zero-shot learning * Large language models for online discourse * Large language models for corporate near real-time data analysis * Large language models for multimodal understanding and generation * Multilingual and cross-country emerging information extraction * Computational journalism, disinformation spread, fact-checking and fake news detection * Stance and viewpoint discovery for novel information * Detection and classification of claims within emerging narratives * Social, ethical and legal aspects of LLMs up-to-dateness * Interpretability / explainability of computational methods beyond the cut off * Linking and enrichment of data beyond LLM cut off * Foundational models for knowledge graph building and entity alignment * Recommender systems for novel information * Quality, provenance, uncertainty and trust of emerging information and data * Use-cases, applications and cross-community interfaces * Evaluation frameworks and benchmarks SUBMISSION We welcome the following types of contributions: * Full papers (12-15 pages including references): contain original research. * Short papers (up to 11 pages including references): contain original research in progress. * Demo papers (up to 11 pages including references): contain descriptions of prototypes, demos or software systems. * Data papers (up to 11 pages including references): contain descriptions of resources related to the workshop topics, such as datasets, knowledge graphs, corpora, annotation protocols, etc. * Position papers (up to 11 pages including references): discuss vision statements or research directions. Workshop papers must be self-contained and in English. They should not have been previously published, should not be considered for publication, and should not be under review for another workshop, conference, or journal. Manuscripts should be submitted via EasyChair ( https://easychair.org/conferences/?conf=llmsbeyondthecut0ff) in PDF format, using the Springer LNCS format. For full authors instructions, please check Springer’s website: https://www.springer.com/gp/computer-science/lncs/conference-proceedings-gu…. The review of manuscripts will be double-blind. Papers will be evaluated according to their significance, originality, technical content, style, clarity, and relevance to the workshop. At least one author of each accepted contribution must register for the workshop and present the paper. Pre-prints of all contributions will be made available during the conference. The accepted papers will appear as a volume of Springer Nature’s LNCS post-proceedings. Submission via EasyChair: https://easychair.org/conferences/?conf=llmsbeyondthecut0ff Springer guidelines for authors: https://www.springer.com/gp/computer-science/lncs/conference-proceedings-gu… For any enquiries, please contact the workshop organizers: todorov(a)lirmm.fr, rettinger(a)uni-trier.de, jmgomez(a)expert.ai, croitoru(a)lirmm.fr, IMPORTANT DATES * July 29, 2024: Paper submission deadline * August 30, 2024: Paper acceptance notification * September 15, 2024: Camera ready versions submission * October 25, 2024: Workshop date All submission deadlines are end-of-day in the Anywhere on Earth (AoE) time zone. KEYNOTES * TBA AWARD * All contributions are eligible for the "Best Paper" award ORGANIZING COMMITTEE * Konstantin Todorov (University of Montpellier, CNRS, LIRMM, France) * José Manuel Gomèz Perèz (Expert.ai, Spain) * Madalina Croitoru (University of Montpellier, CNRS, LIRMM, France) * Achim Rettinger (University of Trier, Germany) PROGRAM COMMITTEE * Preslav Nakov, MBZUAI, United Arabe Emirates * Serena Villata, I3S, CNRS, France * Ronald Denaux, Amazon, USA * Filip Ilievski, Vrije Universiteit Amsterdam, The Netherlands * Elena Montiel, Universidad Politécnica de Madrid, Spain * Sandra Bringay, University Paul Valéry, France * Carlos Badenes, Universidad Politécnica de Madrid, Spain * Ioana Manolescu, Inria Saclay, France * Dino Ienco, INRAE, France * Colin Porlezza, Univ. della Svizzera Italiana, Switzerland * Katarina Boland, Heinrich Heine Universität, Germany * Gabriella Lapesa, GESIS, Germany * Jonas Fegert, FZI, Germany * Michael Färber, TU-Dresden, Germany * Salim Hafid, University of Montpellier, France * Pavlos Fafalios, FORTH, Greece * Andrés García Silva, Expert.ai, Spain * Sarah Labelle, University Paul Valéry, France * Pablo Calleja, Universidad Politécnica de Madrid, Spain *Patricia Martín Chozas* *Assistant Professor *at the Applied Linguistics Department *Postdoctoral Researcher *at the Ontology Engineering Group (Artificial Intelligence Department) ETSI Informáticos - Universidad Politécnica de Madrid Phone: (+34) 910673091

2 1

Call for Participation: NLP Winter School -- 5th ALPS
by Matthias Gallé 22 Aug '24

22 Aug '24

FIRST CALL FOR PARTICIPATION Advanced Language Processing School (ALPS) 2025 March 30th - April 4th 2025 Aussois (French Alps) We are pleased to announce the 5th edition of ALPS - the Advanced NLP School to be held in the French Alps from March 30th to April 4th 2025. This school targets advanced research students in Natural Language Processing and related fields and brings together world leading experts and motivated students. The programme comprises lectures, poster presentations, practical lab sessions and nature activities - the venue is located near a National Park. Important Dates - Oct 15th 2024: Application deadline - Nov 15th 2024: acceptance notification - Jan 15th 2025: registration deadline - March 30th 2025: Start of School Confirmed speakers so far: - Kyunghyun Cho (New York University & Prescient Design) - Titouan Parcollet (Cambridge University & Samsung AI Center) - Barbara Plank (LMU Munich) - François Yvon (ISIR CNRS) Website and online application: https://alps.imag.fr/ <http://alps.imag.fr/> Questions: alps(a)univ-grenoble-alpes.fr The registration fees for the event encompass accommodation and full board at the conference venue, the Centre Paul Langevin <https://lig-alps.imag.fr/index.php/venue/>. We will announce the fee amounts later, and they will vary depending on the participant's background: students, academia, and industry. Student fees will be set at or below €600, including twin room accommodation. We will have a limited amount of scholarships for the registration: if you are interested please mark this in the application form. The rates for academia and industry will be higher, as is customary, and will include accommodation in a single room.

2 1

First Call for Papers: The First Workshop on Language Models for Low-Resource Languages (LoResLM 2025@COLING)
by Ranasinghe, Tharindu 22 Aug '24

22 Aug '24

Neural language models have revolutionised natural language processing (NLP) and have provided state-of-the-art results for many tasks. However, their effectiveness is largely dependent on the pre-training resources. Therefore, language models (LMs) often struggle with low-resource languages in both training and evaluation. Recently, there has been a growing trend in developing and adopting LMs for low-resource languages. LoResLM aims to provide a forum for researchers to share and discuss their ongoing work on LMs for low-resource languages. >> Topics LoResLM 2025 invites submissions on a broad range of topics related to the development and evaluation of neural language models for low-resource languages, including but not limited to the following. * Building language models for low-resource languages. * Adapting/extending existing language models/large language models for low-resource languages. * Corpora creation and curation technologies for training language models/large language models for low-resource languages. * Benchmarks to evaluate language models/large language models in low-resource languages. * Prompting/in-context learning strategies for low-resource languages with large language models. * Review of available corpora to train/fine-tune language models/large language models for low-resource languages. * Multilingual/cross-lingual language models/large language models for low-resource languages. * Applications of language models/large language models for low-resource languages (i.e. machine translation, chatbots, content moderation, etc. >> Important Dates * Paper submission due – 5th November 2024 * Notification of acceptance – 25th November 2024 * Camera-ready due – 13th December 2024 * LoResLM 2025 workshop – 19th / 20th January 2025 co-located with COLING 2025 >> Submission Guidelines We follow the COLING 2025 standards for submission format and guidelines. LoResLM 2025 invites the submission of long papers of up to eight pages and short papers of up to four pages. These page limits only apply to the main body of the paper. At the end of the paper (after the conclusions but before the references), papers need to include a mandatory section discussing the limitations of the work and, optionally, a section discussing ethical considerations. Papers can include unlimited pages of references and an unlimited appendix. To prepare your submission, please make sure to use the COLING 2025 style files available here: * Latex - https://coling2025.org/downloads/coling-2025.zip * Word - https://coling2025.org/downloads/coling-2025.docx * Overleaf - https://www.overleaf.com/latex/templates/instructions-for-coling-2025-proce… Papers should be submitted through Softconf/START using the following link: https://softconf.com/coling2025/LoResLM25/ >> Organising Committee * Hansi Hettiarachchi, Lancaster University, UK * Tharindu Ranasinghe, Lancaster University, UK * Paul Rayson, Lancaster University, UK * Ruslan Mitkov, Lancaster University, UK * Mohamed Gaber, Birmingham City University, UK * Damith Premasiri, Lancaster University, UK * Fiona Anting Tan, National University of Singapore, Singapore * Lasitha Uyangodage, University of Münster, Germany URL - https://loreslm.github.io/ Twitter - https://x.com/LoResLM2025 Best Regards Tharindu Ranasinghe

2 1

[ECAI-2024] Call for Participation: 27th European Conference on Artificial Intelligence
by luis.magdalena＠gmail.com 22 Aug '24

22 Aug '24

Registration for ECAI-2024, the 27th European Conference on Artificial Intelligence, is now open. The early registration period will end on Monday, 19 August 2024. https://urldefense.com/v3/__https://www.ecai2024.eu/registration__;!!D9dNQw… Please join us during 19-24 October 2024 in Santiago de Compostela to mark the 50th anniversary since the first AI conference was held in Europe back in 1974. We are looking forward to an exciting programme with some 600 accepted papers across all areas of AI, as well as lots of special events, including invited talks, panel sessions, satellite workshops, tutorials, and more. -- Luis Magdalena Publicity Chair of the European Conference on Artificial Intelligence (ECAI-2024)

2 1

Call for participation: the 34th Meeting on Computational Linguistics in the Netherlands, Leiden University
by Wijnholds, G.J. (Gijs) 22 Aug '24

22 Aug '24

*Apologies for cross-posting* Dear colleague, We cordially invite you to participate in the 34th Meeting of Computational Linguistics in The Netherlands (CLIN34) which takes place in Leiden on Friday 30 August 2024. Besides a large and diverse programme of posters and oral presentations, we are happy to report that CLIN34 will have two keynote talks by: * Diana Maynard, Sheffield University * Dominique Blok and Erik de Graaf, TNO If you wish to participate, please register via the conference website: clin34.leidenuniv.nl<http://clin34.leidenuniv.nl/> The programme can also be found at: clin34.leidenuniv.nl/program/<https://clin34.leidenuniv.nl/program/> We hope to see you in Leiden in August! The CLIN34 organizers Leiden University

2 1

TAL journal - Non thematic issue - Call for papers
by Cecile Fabre 22 Aug '24

22 Aug '24

Non thematic issue of the TAL journal: 2025 Volume 66-1 http://tal-66-1.sciencesconf.org/ Editors: Maxime Amblard, Cécile Fabre, Benoit Favre and Sophie Rosset The call for volume 66-1 is open until December 31, 2024. NEW since 2023: Non-thematic issues of the Automatic Language Processing journal become "on the fly". Each paper in issue 66-1 will be evaluated as soon as it is submitted and will be published, subject to its acceptance, within an indicative period of six months after its submission. THEMES The journal Automatic Language Processing has an open call for papers. Submissions may concern theoretical and experimental contributions on all aspects of written, spoken, and signed language processing and computational linguistics, both theoretical and experimental, for example: Computational models of language Linguistic resources Statistical learning and modeling Intermodality and multimodality Language multiplicity and diversity Semantics and comprehension Information access and text mining Language production and processing/generation/synthesis Evaluation Explicability and reproducibility NLP in interaction with other disciplines, digital humanities This list is indicative. On all topics, it is essential that the aspects related to natural language processing are emphasized. We also welcome position papers and survey papers. LANGUAGE Manuscripts may be submitted in English or French. THE TAL JOURNAL TAL (Traitement Automatique des Langues / Natural Language Processing) is an international journal published by ATALA (French Association for Natural Language Processing, https://www.atala.org/revuetal) since 1959. TAL has an electronic mode of publication with immediate free access to published articles. SCHEDULE Submission deadline: on the fly until December 31, 2024 Notification to the authors after first review: two months after submission Notification to the authors after second review: two months after the first review Publication : two months after the second review FORMAT SUBMISSION Papers must be between 20 and 25 pages long, including references and appendices (with no possible derogation on the length). TAL is a double-blind review journal: it is thus necessary to anonymise the manuscript and the name of the pdf file. Self-references that reveal the author's identity must be avoided. Style sheets are available for download on the Web site of the journal. More information on: http://tal-66-1.sciencesconf.org/

2 1

2025

2024

2023

2022