===============
* We apologize if you receive multiple copies of this tutorial program *
* For the online version of this program, visit: https://cikm2024.org/tutorials/ *
===============
CIKM 2024: 33rd ACM International Conference on Information and Knowledge Management
Boise, Idaho, USA
October 21–25, 2024
===============
The tutorial program of CIKM 2024 has been published. Tutorials will take place on 21 October 2024. Below is a summary of each accepted tutorial.
=============== Systems for Scalable Graph Analytics and Machine Learning =============== Da Yan (Indiana University Bloomington), Lyuheng Yuan (Indiana University Bloomington), Akhlaque Ahmad (Indiana University Bloomington) and Saugat Adhikari (Indiana University Bloomington)
Graph-theoretic algorithms and graph machine learning models are essential tools for addressing many real-life problems, such as social network analysis and bioinformatics. To support large-scale graph analytics, graph-parallel systems such as Google’s Pregel and Spark’s GraphX have been actively developed for over a decade; these systems (i) promote a think-like-a-vertex computing model and target (ii) iterative algorithms and (iii) problems that output a value for each vertex. However, this model is too restrictive to support the rich set of heterogeneous operations for graph analytics and machine learning that many real applications demand. In recent years, two new trends have emerged in graph-parallel systems research: (1) a novel think-like-a-task computing model that can efficiently support various computationally expensive subgraph search problems; and (2) scalable systems for training graph neural networks. These systems complement earlier graph-parallel tools, so that the full toolbox can flexibly work together in a comprehensive graph processing pipeline for real applications, with the capability of capturing structural features. This tutorial provides a categorization of recent systems in these two directions based on their computing models and adopted techniques, and reviews the key design ideas of these systems.
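As background for the think-like-a-vertex model mentioned above, here is a minimal sketch of a Pregel-style superstep loop, illustrated with single-source shortest paths; `Vertex`, `compute`, and `run_supersteps` are illustrative names, not any system's actual API:

    import math

    class Vertex:
        def __init__(self, vid, edges):
            self.id = vid
            self.edges = edges            # {neighbor_id: edge_weight}
            self.value = math.inf         # current shortest distance
            self.active = True

    def compute(vertex, messages, source_id):
        """One superstep: a vertex reacts only to messages sent to it."""
        candidate = 0.0 if vertex.id == source_id else min(messages, default=math.inf)
        outbox = []
        if candidate < vertex.value:
            vertex.value = candidate
            # Propagate the improved distance along outgoing edges.
            outbox = [(nbr, vertex.value + w) for nbr, w in vertex.edges.items()]
        vertex.active = bool(outbox)
        return outbox

    def run_supersteps(vertices, source_id):
        inbox = {vid: [] for vid in vertices}
        while any(v.active for v in vertices.values()) or any(inbox.values()):
            outbox = {vid: [] for vid in vertices}
            for vid, v in vertices.items():
                for nbr, msg in compute(v, inbox[vid], source_id):
                    outbox[nbr].append(msg)
            inbox = outbox
        return {vid: v.value for vid, v in vertices.items()}

    # Example: shortest distances from vertex 0 on a toy graph.
    g = {0: Vertex(0, {1: 1.0, 2: 4.0}), 1: Vertex(1, {2: 1.0}), 2: Vertex(2, {})}
    print(run_supersteps(g, source_id=0))   # {0: 0.0, 1: 1.0, 2: 2.0}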
=============== Fairness in Large Language Models: Recent Advances and Future =============== Thang Viet Doan (Florida International University), Zichong Wang (Florida International University), Minh Nhat Nguyen (Florida International University) and Wenbin Zhang (Florida International University)
Large Language Models (LLMs) have demonstrated remarkable success across various domains but often lack fairness considerations, potentially leading to discriminatory outcomes against marginalized populations. Moreover, fairness in LLMs, in contrast to fairness in traditional machine learning, involves distinct backgrounds, taxonomies, and techniques for achieving it. In this tutorial, we give a systematic overview of recent advances in the literature on fair LLMs. Specifically, a series of real-world case studies serves as a brief introduction to LLMs, followed by an analysis of the causes of bias rooted in their training process. We then categorize notions of fairness in LLMs, summarizing metrics for evaluating bias and existing algorithms for promoting fairness. Furthermore, we survey resources for evaluating bias in LLMs, including toolkits and datasets. Finally, we discuss current research challenges and open questions.
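To make the notion of a bias metric concrete, the sketch below probes a model with counterfactual prompt templates and compares scores across groups; `generate` (an LLM call) and `sentiment_of` (a sentiment scorer) are hypothetical stand-ins, and the templates and groups are toy examples:

    # A minimal sketch of template-based counterfactual bias probing, one
    # common way to quantify bias in LLM outputs.
    TEMPLATES = [
        "The {group} engineer explained the design.",
        "Everyone agreed that the {group} nurse was",
    ]
    GROUPS = ("male", "female")

    def counterfactual_gap(generate, sentiment_of):
        """Mean absolute sentiment gap between paired group substitutions."""
        gaps = []
        for template in TEMPLATES:
            scores = {g: sentiment_of(generate(template.format(group=g)))
                      for g in GROUPS}
            gaps.append(abs(scores[GROUPS[0]] - scores[GROUPS[1]]))
        return sum(gaps) / len(gaps)   # 0.0 means no measured disparity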
=============== Unifying Graph Neural Networks across Spatial and Spectral Domains =============== Zhiqian Chen (Mississippi State University), Lei Zhang (Virginia Tech) and Liang Zhao (Emory University)
Over recent years, Graph Neural Networks (GNNs) have garnered significant attention. However, the proliferation of diverse GNN models, underpinned by various theoretical approaches, complicates model selection, as they are not readily comprehensible within a uniform framework. Early GNNs were implemented using spectral theory, while others were based on spatial theory. This divergence renders direct comparisons challenging. Moreover, the multitude of models within each domain further complicates evaluation. In this half-day tutorial, we examine state-of-the-art GNNs and introduce a comprehensive framework bridging spatial and spectral domains, elucidating their interrelationship. This framework enhances our understanding of GNN operations. The tutorial explores key paradigms, such as spatial and spectral methods, through a synthesis of spectral graph theory and approximation theory. We provide an in-depth analysis of recent research developments, including emerging issues like over-smoothing, using well-established GNN models to illustrate our framework's universality.
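For orientation, the bridge between the spectral and spatial views is commonly written as a polynomial spectral filter (standard notation from spectral graph theory, not specific to this tutorial's framework):

    g_\theta(L)\,x \;=\; U\, g_\theta(\Lambda)\, U^\top x
    \;\approx\; \sum_{k=0}^{K} \theta_k\, T_k(\tilde{L})\, x,
    \qquad \tilde{L} = \frac{2}{\lambda_{\max}} L - I

where L = U \Lambda U^\top is the normalized graph Laplacian and T_k are Chebyshev polynomials. Truncating the filter at order K turns it into a K-hop neighborhood operation, which is precisely why spectral and spatial GNNs can be analyzed within one framework.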
=============== Tabular Data-centric AI: Challenges, Techniques and Future Perspectives =============== Yanjie Fu (Arizona State University), Dongjie Wang (University of Kansas), Hui Xiong (Hong Kong University of Science and Technology (Guangzhou)) and Kunpeng Liu (Portland State University)
Tabular data is ubiquitous across various application domains such as biology, ecology, and material science. Tabular data-centric AI aims to enhance the predictive power of AI through better utilization of tabular data, improving its readiness at structural, predictive, interaction, and expression levels. This tutorial targets professionals in AI, machine learning, and data mining, as well as researchers from specific application areas. We will cover the settings, challenges, existing methods, and future directions of tabular data-centric AI. The tutorial includes a hands-on session to develop, evaluate, and visualize techniques in this emerging field, equipping attendees with a thorough understanding of its key challenges and techniques for integration into their research.
=============== Frontiers of Large Language Model-Based Agentic Systems =============== Reshmi Ghosh (Microsoft), Jia He (Microsoft Corp.), Kabir Walia (Microsoft), Jieqiu Chen (Microsoft), Tushar Dhadiwal (Microsoft), April Hazel (Microsoft) and Chandra Inguva (Microsoft)
Large Language Models (LLMs) have recently demonstrated remarkable potential in achieving human-level intelligence, sparking a surge of interest in LLM-based autonomous agents. However, there is a noticeable absence of a thorough guide that methodically compiles the latest methods for building LLM-based agents, assessing them, and addressing the associated challenges. As a pioneering initiative, this tutorial delves into the intricacies of constructing LLM-based agents, providing a systematic exploration of key components and recent innovations. We dissect agent design using an established taxonomy, focusing on essential keywords prevalent in agent-related framework discussions. Key components include profiling, perception, memory, planning, and action. We unravel the intricacies of each element, emphasizing state-of-the-art techniques. Beyond individual agents, we explore the extension from single-agent paradigms to multi-agent frameworks. Participants will gain insights into orchestrating collaborative intelligence within complex environments. Additionally, we introduce and compare popular open-source frameworks for LLM-based agent development, enabling practitioners to choose the right tools for their projects. We discuss evaluation methodologies for assessing agent systems, addressing efficiency and safety concerns. Finally, we present a unified framework that consolidates existing work, making it a valuable resource for practitioners and researchers alike.
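To ground the components listed above (profiling, perception, memory, planning, action), here is a minimal single-agent loop; `llm` is a hypothetical text-in/text-out callable and the "tool: input" protocol is illustrative, not any particular framework's API:

    def agent_loop(llm, tools, task, max_steps=5):
        profile = "You are a careful research assistant."  # profiling
        memory = []                                        # memory of past steps
        for _ in range(max_steps):
            # Planning: ask the model for the next action given task + memory.
            prompt = "\n".join([profile, f"Task: {task}", *memory,
                                "Reply with 'tool: input' or 'finish: answer'."])
            decision = llm(prompt).strip()
            name, _, arg = decision.partition(":")
            if name.strip() == "finish":
                return arg.strip()
            tool = tools.get(name.strip())
            if tool is None:
                memory.append(f"Unknown tool requested: {decision}")
                continue
            # Action + perception: call the tool and record the observation.
            observation = tool(arg.strip())
            memory.append(f"{decision} -> {observation}")
        return "no answer within the step budget"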
=============== Hands-On Introduction to Quantum Machine Learning =============== Samuel Yen-Chi Chen (Wells Fargo) and Joongheon Kim (Korea University)
This tutorial offers a hands-on introduction to the captivating field of quantum machine learning (QML). Beginning with the bedrock of quantum information science (QIS), including essential elements like qubits, single- and multi-qubit gates, measurements, and entanglement, the session swiftly progresses to foundational QML concepts. Participants will explore parametrized (variational) circuits, data encoding or embedding techniques, and quantum circuit design principles. Delving deeper, attendees will examine various QML models, including the quantum support vector machine (QSVM), quantum feed-forward neural network (QNN), and quantum convolutional neural network (QCNN). Pushing boundaries, the tutorial covers cutting-edge QML models such as quantum recurrent neural networks (QRNN) and quantum reinforcement learning (QRL), alongside privacy-preserving techniques like quantum federated machine learning, bolstered by concrete programming examples. Throughout the tutorial, all topics and concepts are brought to life through practical demonstrations executed on a quantum computer simulator. Designed with novices in mind, the content caters to those eager to embark on their journey into QML. Attendees will also receive guidance on further reading materials, as well as software packages and frameworks to explore beyond the session.
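As a flavor of the hands-on material, the sketch below builds a small parametrized circuit with data encoding and a variational layer on a simulator, using the open-source PennyLane library as one possible choice (the tutorial itself may use different tools):

    import pennylane as qml
    from pennylane import numpy as np

    dev = qml.device("default.qubit", wires=2)   # quantum computer simulator

    @qml.qnode(dev)
    def circuit(x, weights):
        # Data encoding: embed classical inputs as rotation angles.
        qml.RY(x[0], wires=0)
        qml.RY(x[1], wires=1)
        # Variational layer: trainable rotations plus entanglement.
        qml.RX(weights[0], wires=0)
        qml.RX(weights[1], wires=1)
        qml.CNOT(wires=[0, 1])
        # Measurement: expectation of Pauli-Z on the last qubit.
        return qml.expval(qml.PauliZ(1))

    x = np.array([0.3, 0.7])
    weights = np.array([0.1, 0.2], requires_grad=True)
    print(circuit(x, weights))                   # value in [-1, 1]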
=============== On the Use of Large Language Models for Table Tasks =============== Yuyang Dong (NEC), Masafumi Oyamada (NEC), Chuan Xiao (Osaka University, Nagoya University) and Haochen Zhang (Osaka University)
The proliferation of LLMs has catalyzed a diverse array of applications. This tutorial delves into the application of LLMs for tabular data and targets a variety of table-related tasks, such as table understanding, text-to-SQL conversion, and tabular data preprocessing. It surveys LLM solutions to these tasks in five classes, categorized by their underpinning techniques: prompting, fine-tuning, RAG, agents, and multimodal methods. It discusses how LLMs offer innovative ways to interpret, augment, query, and cleanse tabular data, featuring academic contributions and their practical use in the industrial sector. It emphasizes the versatility and effectiveness of LLMs in handling complex table tasks, showcasing their ability to improve data quality, enhance analytical capabilities, and facilitate more intuitive data interactions. By surveying different approaches, this tutorial highlights the strengths of LLMs in enriching table tasks with more accuracy and usability, setting a foundation for future research and application in data science and AI-driven analytics.
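To illustrate the prompting class of solutions for one such task, text-to-SQL, here is a minimal sketch; `llm` stands in for any text-completion callable, the schema and question are toy examples, and real systems typically add few-shot examples and output validation:

    def text_to_sql(llm, schema, question):
        prompt = ("Given the table schema below, answer with a single SQL query.\n"
                  f"Schema: {schema}\n"
                  f"Question: {question}\n"
                  "SQL:")
        return llm(prompt).strip()

    # Example: text_to_sql(llm, "sales(region TEXT, amount REAL, year INT)",
    #                      "Total sales per region in 2023") would ideally yield
    # something like:
    # SELECT region, SUM(amount) FROM sales WHERE year = 2023 GROUP BY region;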
=============== Data Quality-aware Graph Machine Learning =============== Yu Wang (Vanderbilt University), Kaize Ding (Northwestern University), Xiaorui Liu (North Carolina State University), Jian Kang (University of Rochester), Ryan Rossi (Adobe Research) and Tyler Derr (Vanderbilt University)
Recent years have seen a significant shift in Artificial Intelligence from model-centric to data-centric approaches, highlighted by the success of large foundation models. Despite this trend and numerous innovations in graph machine learning model design, graph-structured data often suffers from data quality issues, which jeopardize the progress of data-centric AI in graph-structured applications. Our tutorial aims to address this gap by raising awareness of data quality issues within the graph machine learning community. We provide an overview of existing issues, including topology, imbalance, bias, limited data, and abnormalities in graph data. Additionally, we highlight previous studies and recent developments in foundational graph models that focus on identifying, investigating, mitigating, and resolving these issues.
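As a concrete taste of such data quality checks, the sketch below measures two of the issues named above, label imbalance and edge homophily, on a toy graph; the data are illustrative:

    from collections import Counter

    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    labels = {0: "A", 1: "A", 2: "B", 3: "B"}

    # Class imbalance: the distribution of node labels.
    print(Counter(labels.values()))              # Counter({'A': 2, 'B': 2})

    # Edge homophily: fraction of edges joining same-label endpoints. Low
    # values can hurt message-passing models that assume similar neighbors.
    same = sum(labels[u] == labels[v] for u, v in edges)
    print(same / len(edges))                     # 0.4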
=============== Towards Efficient Temporal Graph Learning: Algorithms, Frameworks, and Tools =============== Ruijie Wang (University of Illinois Urbana-Champaign), Wanyu Zhao (University of Illinois Urbana-Champaign), Dachun Sun (University of Illinois Urbana-Champaign), Charith Mendis (University of Illinois Urbana-Champaign) and Tarek Abdelzaher (University of Illinois Urbana-Champaign)
Temporal graphs capture dynamic node relations via temporal edges, finding extensive utility in a wide range of domains where time-varying patterns are crucial. Temporal Graph Neural Networks (TGNNs) have gained significant attention for their effectiveness in representing temporal graphs. However, TGNNs still face significant efficiency challenges in real-world low-resource settings. First, from a data-efficiency standpoint, training TGNNs requires sufficient temporal edges and data labels, which are problematic to obtain in practical scenarios with limited data collection and annotation budgets. Second, from a resource-efficiency perspective, TGNN training and inference are computationally demanding due to complex encoding operations, especially on large-scale temporal graphs. Minimizing resource consumption while preserving effectiveness is essential. Motivated by these efficiency challenges, this tutorial systematically introduces state-of-the-art data-efficient and resource-efficient TGNNs, focusing on algorithms, frameworks, and tools, and discusses promising yet under-explored research directions in efficient temporal graph learning. This tutorial aims to benefit researchers and practitioners in data mining, machine learning, and artificial intelligence.
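One representative building block behind the encoding operations mentioned above is a functional time encoding (e.g., TGAT-style), which maps a time gap to a feature vector; a minimal sketch with illustrative dimensions, where real models learn the frequencies and phases rather than fixing them:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 8                                        # encoding dimension
    omega = rng.normal(size=d)                   # frequencies (learnable in practice)
    phi = rng.uniform(0, 2 * np.pi, size=d)      # phases (learnable in practice)

    def time_encode(delta_t):
        """Map a scalar time difference to a d-dimensional feature."""
        return np.cos(delta_t * omega + phi)

    print(time_encode(3.5).shape)                # (8,)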
=============== Landing Generative AI in Industrial Social and E-commerce Recsys =============== Da Xu (LinkedIn), Danqing Zhang (Amazon), Lingling Zheng (Microsoft), Bo Yang (Amazon), Guangyu Yang (TikTok), Shuyuan Xu (TikTok) and Cindy Liang (LinkedIn)
Over the past two years, generative AI (GAI) has evolved rapidly, influencing various fields including social and e-commerce recommender systems (Recsys). Despite exciting advances, landing these innovations in real-world Recsys remains challenging due to the sophistication of modern industrial products and systems. Our tutorial begins with a brief overview of building industrial Recsys and GAI fundamentals, followed by the ongoing efforts and opportunities to enhance personalized recommendations with foundation models. We then explore the integration of curation capabilities into Recsys, such as repurposing raw content, incorporating external knowledge, and generating personalized insights/explanations to foster transparency and trust. Next, the tutorial illustrates how AI agents can transform Recsys through interactive reasoning and action loops, shifting away from traditional passive feedback models. Finally, we share insights into real-world solutions for human-AI alignment and responsible GAI practices. A critical component of the tutorial is detailing the AI, Infrastructure, LLMOps, and Product roadmap (including evaluation and responsible AI practices) derived from production solutions at LinkedIn, Amazon, TikTok, and Microsoft. While GAI in Recsys is still in its early stages, this tutorial provides valuable insights and practical solutions for the Recsys and GAI communities.
=============== Transforming Digital Forensics with Large Language Models =============== Eric Xu (University of Maryland, College Park), Wenbin Zhang (Florida International University) and Weifeng Xu (University of Baltimore)
In the pursuit of justice and accountability in the digital age, the integration of Large Language Models (LLMs) with digital forensics holds immense promise. This half-day tutorial provides a comprehensive exploration of the transformative potential of LLMs in automating digital investigations and uncovering hidden insights. Through a combination of real-world case studies, interactive exercises, and hands-on labs, participants will gain a deep understanding of how to harness LLMs for evidence analysis, entity identification, and knowledge graph reconstruction. By fostering a collaborative learning environment, this tutorial aims to empower professionals, researchers, and students with the skills and knowledge needed to drive innovation in digital forensics. As LLMs continue to revolutionize the field, this tutorial will have far-reaching implications for enhancing justice outcomes, promoting accountability, and shaping the future of digital investigations.
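To illustrate entity identification feeding knowledge graph reconstruction, here is a minimal sketch; `llm` is a hypothetical callable assumed to return one "subject|relation|object" triple per line, and real forensic pipelines would validate and deduplicate the output:

    import networkx as nx

    def build_evidence_graph(llm, evidence_text):
        prompt = ("Extract entity relations from the text below as lines of "
                  f"'subject|relation|object':\n{evidence_text}")
        graph = nx.DiGraph()
        for line in llm(prompt).splitlines():
            parts = [p.strip() for p in line.split("|")]
            if len(parts) == 3:                  # skip malformed lines
                subject, relation, obj = parts
                graph.add_edge(subject, obj, relation=relation)
        return graph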
=============== Collecting and Analyzing Public Data from Mastodon =============== Haris Bin Zia (Queen Mary University of London), Ignacio Castro and Gareth Tyson (Hong Kong University of Science and Technology)
Understanding online behaviors, communities, and trends through social media analytics is becoming increasingly important. Recent changes in the accessibility of platforms like Twitter have made Mastodon a valuable alternative for researchers. In this tutorial, we will explore methods for collecting and analyzing public data from Mastodon, a decentralized micro-blogging social network. Participants will learn about the architecture of Mastodon, techniques and best practices for data collection, and various analytical methods to derive insights from the collected data. This session aims to equip researchers with the skills necessary to harness the potential of Mastodon data in computational social science and social data science research.
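For a taste of the data collection covered, the sketch below pulls recent public posts from one instance via Mastodon's standard public-timeline REST endpoint; the instance and parameters are illustrative, and production crawlers should respect rate limits and each instance's terms:

    import requests

    INSTANCE = "https://mastodon.social"

    resp = requests.get(
        f"{INSTANCE}/api/v1/timelines/public",
        params={"limit": 40, "local": "true"},   # up to 40 local public posts
        timeout=30,
    )
    resp.raise_for_status()
    for status in resp.json():
        print(status["created_at"], status["account"]["acct"])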