****We apologize for multiple postings of this e-mail****
CALL FOR PARTICIPATION
FIRE 2024 Task - CoLI-Dravidian: Word-level Code-Mixed Language
Identification in Dravidian Languages
Held as a shared task in the 16th meeting of Forum for Information
Retrieval Evaluation (FIRE 2024 <http://fire.irsi.org.in/fire/2024/home>)
December 12-15, 2024. DAIICT, Gandhinagar, India
Website:
https://sites.google.com/view/coli-dravidian-2024/datasets?authuser=0
Codalab link: https://codalab.lisn.upsaclay.fr/competitions/19357
Dear All,
We are inviting researchers and students to participate in the shared task
CoLI-Dravidian: Word-level Code-Mixed Language Identification in Dravidian
Languages, which is held as a shared task in the 16th meeting of Forum for
Information Retrieval Evaluation (FIRE 2024
<http://fire.irsi.org.in/fire/2024/home>).
Language Identification (LI) involves detecting the language(s) used in a
given text, which is a preliminary step for many applications such as
sentiment analysis, machine translation, information retrieval, and natural
language understanding. In multilingual India, especially among the youth,
social media often features code-mixed text, blending local languages with
English at various levels. However, this poses significant challenges for
LI, particularly when languages are mixed within a single word. Dravidian
languages, extensively spoken in southern India, are under-resourced
despite their rich morphological structure. These languages face
technological challenges, especially in script representation on digital
platforms, leading users to prefer Roman or hybrid scripts for
communication. This prevalent code-mixing offers vast linguistic data for
research yet remains understudied.
To address word-level LI challenges in code-mixed Dravidian languages, we
are conducting a shared task by providing code-mixed datasets for four
languages - Kannada, Tamil, Malayalam, and Tulu, to encourage the
development of advanced LI models.
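For orientation only: word-level LI means assigning a language tag to every token of a code-mixed sentence. The sketch below is a hypothetical baseline (not the official starter code; the real data, labels and tag set come from the CodaLab page above, and scikit-learn is assumed here purely for illustration). Character n-gram features are a common choice for short Romanized words:

# Minimal word-level language identification baseline (illustrative sketch only,
# not the official CoLI-Dravidian starter code). Toy words and toy tags; the real
# training data and tag set come from the CodaLab competition page.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_words = ["nanna", "friend", "bandidane", "super", "maga"]   # hypothetical tokens
train_tags  = ["kn",    "en",     "kn",        "en",    "kn"]     # hypothetical tags

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 4)),  # character n-grams per word
    LogisticRegression(max_iter=1000),
)
model.fit(train_words, train_tags)
print(model.predict(["machaa", "movie"]))  # one tag per word

Participants are of course free to use any approach, including transformer-based taggers.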
There will be a real-time leaderboard, and the participants will be allowed
to make a maximum of 10 submissions in the training phase and 5 submissions
in the testing phase through CodaLab. Each team will have to select the
best submission for ranking.
To download the data and participate, go to:
https://codalab.lisn.upsaclay.fr/competitions/19357.
Best regards,
The CoLI-Dravidian 2024 Organizing Committee
Important dates
- 14th June 2024 – track websites open and training data release
- 1st July 2024 – test data release
- 25th July 2024 – run submission deadline
- 27th July 2024 – results declared
- 27th August 2024 – working notes due
- 10th September 2024 – reviews
- 30th October 2024 – camera-ready copies of working notes
NOTE: All dates mentioned here are in the AoE (Anywhere on Earth) zone.
Organizing Committee
- Shashirekha Hosahalli Lakshmaiah, Department of Computer Science, Mangalore University, India
- Ameeta Agrawal, Department of Computer Science, Portland State University, USA
- Fazlourrahman Balouchzahi, CIC, IPN, Mexico
- Asha Hegde, Department of Computer Science, Mangalore University, India
- Sabur Butt, IFE, Tecnologico de Monterrey, Mexico
- Sharal Coelho, Department of Computer Science, Mangalore University, India
- Kavya G, Department of Computer Science, Mangalore University, India
- Harshitha, Department of Computer Science, Mangalore University, India
- Sonith D, Department of Computer Science, Mangalore University, India
*Sabur Butt, Ph.D. *(He/Him)
Institute for the Future of Education (IFE)
*Tecnológico de Monterrey, Mexico*
Address: Av. Eugenio Garza Sada 2501 Sur Tecnológico, 64849 Monterrey, N.L.
LinkedIn <https://www.linkedin.com/in/saburb> - GitHub
<https://github.com/saburbutt> - Scholar
<https://scholar.google.com/citations?user=re7md-0AAAAJ&hl=en> - Website
<https://saburbutt.github.io/>
*GenBench: The second workshop on generalisation (benchmarking) in NLP*
*Workshop description*
The ability to generalise well is often mentioned as one of the primary desiderata for models of natural language processing (NLP).
Yet, there are still many open questions related to what it means for an
NLP model to generalise well, and how generalisation should be evaluated.
LLMs, trained on gigantic training corpora that are – at best – hard to
analyse or not publicly available at all, bring a new set of challenges to
the topic.
The second GenBench workshop aims to serve as a cornerstone to catalyse
research on generalisation in the NLP community.
The workshop aims to bring together different expert communities to discuss
challenging questions relating to generalisation in NLP, crowd-source
challenging generalisation benchmarks for LLMs, and make progress on open
questions related to generalisation.
Topics of interest include, but are not limited to:
- Opinion or position papers about generalisation and how it should be
evaluated;
- Analyses of how existing or new models generalise;
- Empirical studies that propose new paradigms to evaluate
generalisation;
- Meta-analyses that compare results from different generalisation studies;
- Meta-analyses that study how different types of generalisation are
related;
- Papers that discuss how generalisation of LLMs can be evaluated;
- Papers that discuss why generalisation is (not) important in the era
of LLMs;
- Studies on the relationship between generalisation and fairness or
robustness.
The second GenBench workshop on generalisation (benchmarking) in NLP will
be co-located with EMNLP 2024.
*Submission types*
We call for two types of submissions: regular workshop submissions and
collaborative benchmarking task submissions.
The latter will consist of a data/task artefact and a companion paper
motivating and evaluating the submission.
In both cases, we accept archival papers and extended abstracts.
*1. Regular workshop submissions*
Regular workshop submissions present papers on the topic of generalisation
(see examples listed above).
Regular workshop papers may be submitted as archival papers, when they report on completed, original and unpublished research, or otherwise as shorter extended abstracts.
More details on this category can be found below.
If you are unsure whether a specific topic is well-suited for submission,
feel free to reach out to the organisers of the workshop at
genbench(a)googlegroups.com.
*2. Collaborative Benchmarking Task (CBT) submissions*
The goal of this year's CBT is to generate versions of existing evaluation
datasets for LLMs which, given a particular training corpus, have a larger
distribution shift than the original test set, or – in other words –
evaluate generalisation to a stronger degree than the original dataset.
For this particular challenge, we focus on three training corpora: C4,
RedPajama-Data-1T, and Dolma.
All three corpora are publicly available, and they can be searched via the
What's in My Big Data API (https://github.com/allenai/wimbd).
We will focus on three popular evaluation datasets: MMLU, HumanEval, and
SiQA.
Submitters to the CBT are asked to design a way to assess distribution
shift for one or more of these evaluation datasets, given particular
features of the training corpus, and then generate one or more versions of
the dataset that have a larger distribution shift according to this method.
Newly generated sets do not have to have the same size as the original test
set, but should have at least 200 examples.
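Purely as an illustration of what such a protocol could look like (this is not a required or endorsed method; the corpus and candidate items below are toy placeholders, and a real submission would obtain corpus statistics, e.g. through the WIMBD tooling mentioned above):

# Illustrative sketch of one possible distribution-shift proxy for the CBT:
# rank candidate test items by n-gram overlap with a (toy, in-memory) pretraining
# corpus and keep the least-overlapping ones. A real submission would query actual
# corpus statistics instead of this stand-in.
from itertools import islice

def ngrams(text, n=3):
    toks = text.lower().split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

corpus_ngrams = ngrams("toy pretraining corpus text goes here for illustration only")  # placeholder

def overlap(example, n=3):
    ex = ngrams(example, n)
    return len(ex & corpus_ngrams) / max(len(ex), 1)

candidates = ["a toy candidate test question about corpus text",
              "another toy candidate item with different wording"]  # placeholders
shifted = sorted(candidates, key=overlap)   # lowest overlap first = largest assumed shift
subset = list(islice(shifted, 200))         # a real submission needs at least 200 items
print(subset)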
Practically speaking, CBT submissions consist of:
1. the data/task artefact, submitted through
https://github.com/GenBench/genbench_cbt
2. a paper describing the dataset and its method of construction,
submitted through
https://openreview.net/group?id=GenBench.org/2024/Workshop
We accept submissions that consider only one pretraining dataset and evaluation dataset, but encourage submitters to apply their suggested protocols to all three training corpora.
We also suggest that submitters include model results for models trained on
these datasets.
Suggestions are provided on the CBT website: https://genbench.org/cbt.
Given enough high-quality submissions, we aim to write a paper with the combined results, to which submitters can be co-authors if they wish.
More detailed guidelines will be given on https://genbench.org/cbt.
*Archival vs extended abstract*
Archival papers are up to 8 pages excluding references and report on
completed, original and unpublished research.
They follow the requirements of regular EMNLP 2024 submissions.
Accepted papers will be published in the workshop proceedings and are
expected to be presented at the workshop.
The papers will undergo double-blind peer review and should thus be
anonymised.
Extended abstracts can be up to 2 pages excluding references, and may
report on work in progress or be cross-submissions of work that has already
appeared in another venue.
Abstract titles will be posted on the workshop website, but will not be
included in the proceedings.
*Submission instructions*
For both archival papers and extended abstracts, we refer to the EMNLP 2024 website for paper templates and requirements.
Additional requirements for both regular workshop papers and collaborative
benchmarking task submissions can be found on our website.
All submissions can be submitted through OpenReview:
https://openreview.net/group?id=GenBench.org/2024/Workshop.
*Important dates*
- August 15, 2024: Paper submission deadline
- September 20, 2024: Notification deadline
- October 4, 2024: Camera-ready deadline
- November 15 or 16, 2024: Workshop
Note: all deadlines are 11:59 PM UTC-12:00. Check the website for final
updates to these deadlines (https://genbench.org/workshop).
*Preprints*
We do not have an anonymity deadline; preprints are allowed both before and after the submission deadline.
*Contact*
Email address: genbench(a)googlegroups.com
Website: https://genbench.org/workshop
*On behalf of the organisers*
Dieuwke Hupkes
Verna Dankers
Khuyagbaatar Batsuren
Amirhossein Kazemnejad
Christos Christodoulopoulos
Mario Giulianelli
Ryan Cotterell
*Apologies for crossposting*
LLMs Beyond the Cutoff: 1st International Workshop on Computational Methods
Beyond the Temporal Borders of Training Data
https://llmsbeyondthecutoff2024.wordpress.com
Collocated with CIKM 2024
October 25, 2024 — Boise (Idaho), USA
* July 29, 2024: Paper submission deadline
* August 30, 2024: Paper acceptance notification
* September 15, 2024: Camera ready versions submission
* October 25, 2024: Workshop date
=== NEWS ===
* LLMs Beyond the CutOff will be published as a volume of Springer Nature’s
post-proceedings
* Submission via EasyChair:
https://easychair.org/conferences/?conf=llmsbeyondthecut0ff
* Springer guidelines for authors:
https://www.springer.com/gp/computer-science/lncs/conference-proceedings-gu…
SUMMARY
LLMs are trained on large amounts of web data that extend temporally up to
a specific moment in time. For instance, ChatGPT's LLM "knows" the world
before May 2023, with no real-time access to information beyond this limit
other than a browsing tool, similar to a search engine, that enables simple
lookups. However, in many scenarios, being able to analyze and reason about
novel, emerging events and topics is crucial for facing the challenges of
rapidly evolving landscapes of information.
The workshop provides an interdisciplinary forum for discussing the
temporal limitations of LLMs and proposing technical solutions of how to
apply and develop LLMs beyond their cutoff dates. We explore two prominent
scenarios, where contexts tend to evolve faster than the LLMs that are used
to analyze them: (1) journalism and (2) industry. In terms of (1) the goal
is to propose methods of detecting, classifying and reasoning with emerging
topics that infuse public discourse on social or mainstream media. An
example of such a topic is COVID-19 at the onset of the pandemic.
Downstream tasks of interest are fake news detection and fact-checking on
novel topics, including claim analysis, opinion mining and narratives
extraction. With regard to (2), the goal is to shed light on the limits of
LLMs for companies in sectors such as international geopolitical monitoring
and corporate intelligence, finance and stock market trading or insurance,
where companies need to track their interests and products in real time.
This does not address the inclusion of corporate data into the LLMs, but
rather proposes solutions by using publicly available and constantly
growing data. An overarching problem that will be studied is that of the
cross-language and cross-country specificities of emerging data, where
novel information in underrepresented languages or contexts may be more
challenging to analyze. We welcome insights and parallels from the field of
knowledge representation, where the similar problem with cutoff dates of
knowledge graphs (dynamics and regular updates) is well understood.
The expected outcomes are: 1) insights on the temporal limitations of LLMs,
where the workshop will outline concrete challenges and bottlenecks in the
identified scenarios; 2) novel methodological and technical solutions in
terms of (incremental) machine learning models when dealing with
(reasoning, extracting and classifying) information beyond the cutoff dates
of current LLMs.
TOPICS OF INTEREST
* Analysis of emerging topics and events, including counterfactual/what-if
reasoning
* Methods for few-shot or zero-shot learning
* Large language models for online discourse
* Large language models for corporate near real-time data analysis
* Large language models for multimodal understanding and generation
* Multilingual and cross-country emerging information extraction
* Computational journalism, disinformation spread, fact-checking and fake
news detection
* Stance and viewpoint discovery for novel information
* Detection and classification of claims within emerging narratives
* Social, ethical and legal aspects of LLMs' up-to-dateness
* Interpretability / explainability of computational methods beyond the cutoff
* Linking and enrichment of data beyond the LLM cutoff
* Foundational models for knowledge graph building and entity alignment
* Recommender systems for novel information
* Quality, provenance, uncertainty and trust of emerging information and
data
* Use-cases, applications and cross-community interfaces
* Evaluation frameworks and benchmarks
SUBMISSION
We welcome the following types of contributions:
* Full papers (12-15 pages including references): contain original research.
* Short papers (up to 11 pages including references): contain original
research in progress.
* Demo papers (up to 11 pages including references): contain descriptions
of prototypes, demos or software systems.
* Data papers (up to 11 pages including references): contain descriptions
of resources related to the workshop topics, such as datasets, knowledge
graphs, corpora, annotation protocols, etc.
* Position papers (up to 11 pages including references): discuss vision
statements or research directions.
Workshop papers must be self-contained and in English. They should not have
been previously published and should not be under consideration or review
for another workshop, conference, or journal.
Manuscripts should be submitted via EasyChair (
https://easychair.org/conferences/?conf=llmsbeyondthecut0ff) in PDF format,
using the Springer LNCS format. For full authors instructions, please check
Springer’s website:
https://www.springer.com/gp/computer-science/lncs/conference-proceedings-gu….
The review of manuscripts will be double-blind. Papers will be evaluated
according to their significance, originality, technical content, style,
clarity, and relevance to the workshop. At least one author of each
accepted contribution must register for the workshop and present the paper.
Pre-prints of all contributions will be made available during the
conference. The accepted papers will appear as a volume of Springer
Nature’s LNCS post-proceedings.
Submission via EasyChair:
https://easychair.org/conferences/?conf=llmsbeyondthecut0ff
Springer guidelines for authors:
https://www.springer.com/gp/computer-science/lncs/conference-proceedings-gu…
For any enquiries, please contact the workshop organizers:
todorov(a)lirmm.fr, rettinger(a)uni-trier.de, jmgomez(a)expert.ai,
croitoru(a)lirmm.fr,
IMPORTANT DATES
* July 29, 2024: Paper submission deadline
* August 30, 2024: Paper acceptance notification
* September 15, 2024: Camera ready versions submission
* October 25, 2024: Workshop date
All submission deadlines are end-of-day in the Anywhere on Earth (AoE) time
zone.
KEYNOTES
* TBA
AWARD
* All contributions are eligible for the "Best Paper" award
ORGANIZING COMMITTEE
* Konstantin Todorov (University of Montpellier, CNRS, LIRMM, France)
* José Manuel Gómez Pérez (Expert.ai, Spain)
* Madalina Croitoru (University of Montpellier, CNRS, LIRMM, France)
* Achim Rettinger (University of Trier, Germany)
PROGRAM COMMITTEE
* Preslav Nakov, MBZUAI, United Arab Emirates
* Serena Villata, I3S, CNRS, France
* Ronald Denaux, Amazon, USA
* Filip Ilievski, Vrije Universiteit Amsterdam, The Netherlands
* Elena Montiel, Universidad Politécnica de Madrid, Spain
* Sandra Bringay, University Paul Valéry, France
* Carlos Badenes, Universidad Politécnica de Madrid, Spain
* Ioana Manolescu, Inria Saclay, France
* Dino Ienco, INRAE, France
* Colin Porlezza, Univ. della Svizzera Italiana, Switzerland
* Katarina Boland, Heinrich Heine Universität, Germany
* Gabriella Lapesa, GESIS, Germany
* Jonas Fegert, FZI, Germany
* Michael Färber, TU-Dresden, Germany
* Salim Hafid, University of Montpellier, France
* Pavlos Fafalios, FORTH, Greece
* Andrés García Silva, Expert.ai, Spain
* Sarah Labelle, University Paul Valéry, France
* Pablo Calleja, Universidad Politécnica de Madrid, Spain
*Patricia Martín Chozas*
*Assistant Professor *at the Applied Linguistics Department
*Postdoctoral Researcher *at the Ontology Engineering Group
(Artificial Intelligence Department)
ETSI Informáticos - Universidad Politécnica de Madrid
Phone: (+34) 910673091
===============
===============
* We apologize if you receive multiple copies of this Tutorial program *
* For the online version of this program, visit: https://cikm2024.org/tutorials/
===============
CIKM 2024: 33rd ACM International Conference on Information and Knowledge Management
Boise, Idaho, USA
October 21–25, 2024
===============
The tutorial program of CIKM 2024 has been published. Tutorials are planned to take place on 21 October 2024.
Here you can find a summary of each accepted tutorial.
===============
Systems for Scalable Graph Analytics and Machine Learning
===============
Da Yan (Indiana University Bloomington), Lyuheng Yuan (Indiana University Bloomington), Akhlaque Ahmad (Indiana University Bloomington) and Saugat Adhikari (Indiana University Bloomington)
Graph-theoretic algorithms and graph machine learning models are essential tools for addressing many real-life problems, such as social network analysis and bioinformatics. To support large-scale graph analytics, graph-parallel systems have been actively developed for over a decade, such as Google’s Pregel and Spark’s GraphX, which (i) promote a think-like-a-vertex computing model, (ii) target iterative algorithms, and (iii) address problems that output a value for each vertex. However, this model is too restricted to support the rich set of heterogeneous operations for graph analytics and machine learning that many real applications demand.
In recent years, two new trends have emerged in graph-parallel systems research: (1) a novel think-like-a-task computing model that can efficiently support the various computationally expensive problems of subgraph search; and (2) scalable systems for learning graph neural networks. These systems effectively complement the diverse needs for graph-parallel tools that can flexibly work together in a comprehensive graph processing pipeline for real applications, with the capability of capturing structural features. This tutorial will provide an effective categorization of the recent systems in these two directions based on their computing models and adopted techniques, and will review the key design ideas of these systems.
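For readers unfamiliar with the think-like-a-vertex model mentioned above, here is a rough single-process sketch (toy data and a simplified loop; real systems such as Pregel or GraphX distribute this computation and exchange explicit messages between workers):

# Toy sketch of a think-like-a-vertex (Pregel-style) computation: every vertex
# repeatedly adopts the smallest label seen among its neighbours, which converges
# to connected-component labels. Each sweep over the vertices loosely corresponds
# to a superstep; real systems run this in parallel across machines.
graph = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}   # adjacency lists (toy graph)
label = {v: v for v in graph}                         # each vertex outputs one value

changed = True
while changed:
    changed = False
    for v, neighbours in graph.items():
        incoming = [label[u] for u in neighbours]     # "messages" from neighbours
        best = min(incoming + [label[v]])
        if best < label[v]:
            label[v] = best
            changed = True

print(label)  # e.g. {0: 0, 1: 0, 2: 0, 3: 3, 4: 3}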
===============
Fairness in Large Language Models: Recent Advances and Future
===============
Thang Viet Doan (Florida International University), Zichong Wang (Florida International University), Minh Nhat Nguyen (Florida International University) and Wenbin Zhang (Florida International University)
Large Language Models (LLMs) have demonstrated remarkable success across various domains but often lack fairness considerations, potentially leading to discriminatory outcomes against marginalized populations. On the other hand, fairness in LLMs, in contrast to fairness in traditional machine learning, entails exclusive backgrounds, taxonomies, and fulfillment techniques. In this tutorial, we give a systematic overview of recent advances in the existing literature concerning fair LLMs. Specifically, a series of real-world case studies serve as a brief introduction to LLMs, and then an analysis of bias causes based on their training process follows. Additionally, the concept of fairness in LLMs is discussed categorically, summarizing metrics for evaluating bias in LLMs and existing algorithms for promoting fairness. Furthermore, resources for evaluating bias in LLMs, including toolkits and datasets, are summarized. Finally, current research challenges and open questions are discussed.
===============
Unifying Graph Neural Networks across Spatial and Spectral Domains
===============
Zhiqian Chen (Mississippi State University), Lei Zhang (Virginia Tech) and Liang Zhao (Emory University)
Over recent years, Graph Neural Networks (GNNs) have garnered significant attention. However, the proliferation of diverse GNN models, underpinned by various theoretical approaches, complicates model selection, as they are not readily comprehensible within a uniform framework. Early GNNs were implemented using spectral theory, while others were based on spatial theory. This divergence renders direct comparisons challenging. Moreover, the multitude of models within each domain further complicates evaluation.
In this half-day tutorial, we examine state-of-the-art GNNs and introduce a comprehensive framework bridging spatial and spectral domains, elucidating their interrelationship. This framework enhances our understanding of GNN operations. The tutorial explores key paradigms, such as spatial and spectral methods, through a synthesis of spectral graph theory and approximation theory. We provide an in-depth analysis of recent research developments, including emerging issues like over-smoothing, using well-established GNN models to illustrate our framework's universality.
===============
Tabular Data-centric AI: Challenges, Techniques and Future Perspectives
===============
Yanjie Fu (Arizona State University), Dongjie Wang (University of Kansas), Hui Xiong (Hong Kong University of Science and Technology (Guangzhou)) and Kunpeng Liu (Portland State University)
Tabular data is ubiquitous across various application domains such as biology, ecology, and material science. Tabular data-centric AI aims to enhance the predictive power of AI through better utilization of tabular data, improving its readiness at structural, predictive, interaction, and expression levels. This tutorial targets professionals in AI, machine learning, and data mining, as well as researchers from specific application areas. We will cover the settings, challenges, existing methods, and future directions of tabular data-centric AI. The tutorial includes a hands-on session to develop, evaluate, and visualize techniques in this emerging field, equipping attendees with a thorough understanding of its key challenges and techniques for integration into their research.
===============
Frontiers of Large Language Model-Based Agentic Systems
===============
Reshmi Ghosh (Microsoft), Jia He (Microsoft Corp.), Kabir Walia (Microsoft), Jieqiu Chen (Microsoft), Tushar Dhadiwal (Microsoft), April Hazel (Microsoft) and Chandra Inguva (Microsoft)
Large Language Models (LLMs) have recently demonstrated remarkable potential in achieving human-level intelligence, sparking a surge of interest in LLM-based autonomous agents. However, there
is a noticeable absence of a thorough guide that methodically compiles the latest methods for building LLM-agents, their assessment, and the associated challenges. As a pioneering initiative, this tutorial delves into the intricacies of constructing LLM-based agents, providing a systematic exploration of key components and recent innovations. We dissect agent design using an established taxonomy, focusing on essential keywords prevalent in agent-related framework discussions. Key components include profiling, perception, memory, planning, and action. We unravel the intricacies of each element, emphasizing state-of-the-art techniques. Beyond individual agents, we explore the extension from single-agent paradigms to multi-agent frameworks. Participants will gain insights into orchestrating collaborative intelligence within complex environments.
Additionally, we introduce and compare popular open-source frameworks for LLM-based agent development, enabling practitioners to choose the right tools for their projects. We discuss evaluation methodologies for assessing agent systems, addressing efficiency and safety concerns. We present a unified framework that consolidates existing work, making it a valuable resource for practitioners and researchers alike.
===============
Hands-On Introduction to Quantum Machine Learning
===============
Samuel Yen-Chi Chen (Wells Fargo) and Joongheon Kim (Korea University)
This tutorial offers a hands-on introduction into the captivating field of quantum machine learning (QML). Beginning with the bedrock of quantum information science (QIS)—including essential elements like qubits, single and multiple qubit gates, measurements, and entanglement—the session swiftly progresses to foundational QML concepts. Participants will explore parametrized or variational circuits, data encoding or embedding techniques, and quantum circuit design principles.
Delving deeper, attendees will examine various QML models, including the quantum support vector machine (QSVM), quantum feed-forward neural network (QNN), and quantum convolutional neural network (QCNN). Pushing boundaries, the tutorial delves into cutting-edge QML models such as quantum recurrent neural networks (QRNN) and quantum reinforcement learning (QRL), alongside privacy-preserving techniques like quantum federated machine learning, bolstered by concrete programming examples.
Throughout the tutorial, all topics and concepts are brought to life through practical demonstrations executed on a quantum computer simulator. Designed with novices in mind, the content caters to those eager to embark on their journey into QML. Attendees will also receive guidance on further reading materials, as well as software packages and frameworks to explore beyond the session.
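As a concrete taste of the "parametrized or variational circuits" and "data encoding" topics listed above, here is a minimal sketch using PennyLane as an assumed example framework (the tutorial itself may rely on different software): input features are angle-encoded, a trainable entangling block follows, and a Pauli-Z expectation value serves as the model output.

# Illustrative variational quantum circuit on a simulator (PennyLane is assumed
# here only as an example framework; the tutorial may use other tools).
import pennylane as qml
import numpy as np

dev = qml.device("default.qubit", wires=2)               # 2-qubit simulator

@qml.qnode(dev)
def circuit(weights, x):
    qml.AngleEmbedding(x, wires=[0, 1])                  # data encoding / embedding
    qml.StronglyEntanglingLayers(weights, wires=[0, 1])  # trainable variational layers
    return qml.expval(qml.PauliZ(0))                     # measurement -> model output

shape = qml.StronglyEntanglingLayers.shape(n_layers=2, n_wires=2)
weights = np.random.random(shape)                        # parameters to be optimised
x = np.array([0.3, 0.7])                                 # toy input features
print(circuit(weights, x))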
===============
On the Use of Large Language Models for Table Tasks
===============
Yuyang Dong (NEC), Masafumi Oyamada (NEC), Chuan Xiao (Osaka University, Nagoya University) and Haochen Zhang (Osaka University)
The proliferation of LLMs has catalyzed a diverse array of applications. This tutorial delves into the application of LLMs for tabular data and targets a variety of table-related tasks, such as table understanding, text-to-SQL conversion, and tabular data preprocessing. It surveys LLM solutions to these tasks in five classes, categorized by their underpinning techniques: prompting, fine-tuning, RAG, agents, and multimodal methods. It discusses how LLMs offer innovative ways to interpret, augment, query, and cleanse tabular data, featuring academic contributions and their practical use in the industrial sector. It emphasizes the versatility and effectiveness of LLMs in handling complex table tasks, showcasing their ability to improve data quality, enhance analytical capabilities, and facilitate more intuitive data interactions. By surveying different approaches, this tutorial highlights the strengths of LLMs in enriching table tasks with more accuracy and usability, setting a foundation for future research and application in data science and AI-driven analytics.
===============
Data Quality-aware Graph Machine Learning
===============
Yu Wang (Vanderbilt University), Kaize Ding (Northwestern University), Xiaorui Liu (North Carolina State University), Jian Kang (University of Rochester), Ryan Rossi (Adobe Research) and Tyler Derr (Vanderbilt University)
Recent years have seen a significant shift in Artificial Intelligence from model-centric to data-centric approaches, highlighted by the success of large foundational models. Following this trend, despite numerous innovations in graph machine learning model design, graph-structured data often suffers from data quality issues, which jeopardizes the progress of Data-centric AI in graph-structured applications. Our proposed tutorial aims to address this gap by raising awareness about data quality issues within the graph machine-learning community. We provide an overview of existing issues, including topology, imbalance, bias, limited data, and abnormalities in graph data. Additionally, we highlight previous studies and recent developments in foundational graph models that focus on identifying, investigating, mitigating, and resolving these issues.
===============
Towards Efficient Temporal Graph Learning: Algorithms, Frameworks, and Tools
===============
Ruijie Wang (University of Illinois Urbana-Champaign), Wanyu Zhao (University of Illinois Urbana-Champaign), Dachun Sun (University of Illinois Urbana-Champaign), Charith Mendis (University of Illinois Urbana-Champaign) and Tarek Abdelzaher (University of Illinois Urbana-Champaign)
Temporal graphs capture dynamic node relations via temporal edges, finding extensive utility in wide domains where time-varying patterns are crucial. Temporal Graph Neural Networks (TGNNs) have gained significant attention for their effectiveness in representing temporal graphs. However, TGNNs still face significant efficiency challenges in real-world low-resource settings. First, from a data-efficiency standpoint, training TGNNs requires sufficient temporal edges and data labels, which is problematic in practical scenarios with limited data collection and annotation. Second, from a resource-efficiency perspective, TGNN training and inference are computationally demanding due to complex encoding operations, especially on large-scale temporal graphs. Minimizing resource consumption while preserving effectiveness is essential. Inspired by these efficiency challenges, this tutorial systematically introduces state-of-the-art data-efficient and resource-efficient TGNNs, focusing on algorithms, frameworks, and tools, and discusses promising yet under-explored research directions in efficient temporal graph learning. This tutorial aims to benefit researchers and practitioners in data mining, machine learning, and artificial intelligence.
===============
Landing Generative AI in Industrial Social and E-commerce Recsys
===============
Da Xu (LinkedIn), Danqing Zhang (Amazon), Lingling Zheng (Microsoft), Bo Yang (Amazon), Guangyu Yang (TikTok), Shuyuan Xu (TikTok) and Cindy Liang (LinkedIn)
Over the past two years, GAI has evolved rapidly, influencing various fields including social and e-commerce Recsys. Despite exciting advances, landing these innovations in real-world Recsys remains challenging due to the sophistication of modern industrial products and systems. Our tutorial begins with a brief overview of building industrial Recsys and GAI fundamentals, followed by the ongoing efforts and opportunities to enhance personalized recommendations with foundation models.
We then explore the integration of curation capabilities into Recsys, such as repurposing raw content, incorporating external knowledge, and generating personalized insights/explanations to foster transparency and trust. Next, the tutorial illustrates how AI agents can transform Recsys through interactive reasoning and action loops, shifting away from traditional passive feedback models. Finally, we shed insights on real-world solutions for human-AI alignment and responsible GAI practices.
A critical component of the tutorial is detailing the AI, Infrastructure, LLMOps, and Product roadmap (including the evaluation and responsible AI practices) derived from the production solutions in LinkedIn, Amazon, TikTok, and Microsoft. While GAI in Recsys is still in its early stages, this tutorial provides valuable insights and practical solutions for the Recsys and GAI communities.
===============
Transforming Digital Forensics with Large Language Models
===============
Eric Xu (University of Maryland, College Park), Wenbin Zhang (Florida International University) and Weifeng Xu (University of Baltimore)
In the pursuit of justice and accountability in the digital age, the integration of Large Language Models (LLMs) with digital forensics holds immense promise. This half-day tutorial provides a comprehensive exploration of the transformative potential of LLMs in automating digital investigations and uncovering hidden insights. Through a combination of real-world case studies, interactive exercises, and hands-on labs, participants will gain a deep understanding of how to harness LLMs for evidence analysis, entity identification, and knowledge graph reconstruction. By fostering a collaborative learning environment, this tutorial aims to empower professionals, researchers, and students with the skills and knowledge needed to drive innovation in digital forensics. As LLMs continue to revolutionize the field, this tutorial will have far-reaching implications for enhancing justice outcomes, promoting accountability, and shaping the future of digital investigations.
===============
Collecting and Analyzing Public Data from Mastodon
===============
Haris Bin Zia (Queen Mary University of London), Ignacio Castro (none) and Gareth Tyson (Hong Kong University of Science and Technology)
Understanding online behaviors, communities, and trends through social media analytics is becoming increasingly important. Recent changes in the accessibility of platforms like Twitter have made Mastodon a valuable alternative for researchers. In this tutorial, we will explore methods for collecting and analyzing public data from Mastodon, a decentralized micro-blogging social network. Participants will learn about the architecture of Mastodon, techniques and best practices for data collection, and various analytical methods to derive insights from the collected data. This session aims to equip researchers with the skills necessary to harness the potential of Mastodon data in computational social science and social data science research.
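As a flavour of the kind of collection the tutorial covers, here is a minimal sketch assuming an instance that exposes Mastodon's standard, unauthenticated public-timeline endpoint (the instance URL is only an example; rate limits and instance policies still apply):

# Minimal sketch of collecting public posts from a Mastodon instance via its
# standard REST API (unauthenticated public timeline; respect rate limits and
# the instance's terms of use).
import requests

INSTANCE = "https://mastodon.social"   # example instance
resp = requests.get(
    f"{INSTANCE}/api/v1/timelines/public",
    params={"limit": 40, "local": "true"},   # up to 40 posts local to this instance
    timeout=30,
)
resp.raise_for_status()
for status in resp.json():
    print(status["created_at"], status["account"]["acct"], status["content"][:80])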
Registration for ECAI-2024, the 27th European Conference on Artificial Intelligence, is now open. The early registration period will end on Monday, 19 August 2024.
https://www.ecai2024.eu/registration
Please join us during 19-24 October 2024 in Santiago de Compostela to mark the 50th anniversary of the first AI conference held in Europe, back in 1974.
We are looking forward to an exciting programme with some 600 accepted papers across all areas of AI, as well as lots of special events, including invited talks, panel sessions, satellite workshops, tutorials, and more.
--
Luis Magdalena
Publicity Chair of the European Conference on Artificial Intelligence (ECAI-2024)
Neural language models have revolutionised natural language processing (NLP) and have provided state-of-the-art results for many tasks. However, their effectiveness is largely dependent on the pre-training resources. Therefore, language models (LMs) often struggle with low-resource languages in both training and evaluation. Recently, there has been a growing trend in developing and adopting LMs for low-resource languages. LoResLM aims to provide a forum for researchers to share and discuss their ongoing work on LMs for low-resource languages.
>> Topics
LoResLM 2025 invites submissions on a broad range of topics related to the development and evaluation of neural language models for low-resource languages, including but not limited to the following.
* Building language models for low-resource languages.
* Adapting/extending existing language models/large language models for low-resource languages.
* Corpora creation and curation technologies for training language models/large language models for low-resource languages.
* Benchmarks to evaluate language models/large language models in low-resource languages.
* Prompting/in-context learning strategies for low-resource languages with large language models.
* Review of available corpora to train/fine-tune language models/large language models for low-resource languages.
* Multilingual/cross-lingual language models/large language models for low-resource languages.
* Applications of language models/large language models for low-resource languages (e.g. machine translation, chatbots, content moderation, etc.).
>> Important Dates
* Paper submission due – 5th November 2024
* Notification of acceptance – 25th November 2024
* Camera-ready due – 13th December 2024
* LoResLM 2025 workshop – 19th/20th January 2025, co-located with COLING 2025
>> Submission Guidelines
We follow the COLING 2025 standards for submission format and guidelines. LoResLM 2025 invites the submission of long papers of up to eight pages and short papers of up to four pages. These page limits only apply to the main body of the paper. At the end of the paper (after the conclusions but before the references), papers need to include a mandatory section discussing the limitations of the work and, optionally, a section discussing ethical considerations. Papers can include unlimited pages of references and an unlimited appendix.
To prepare your submission, please make sure to use the COLING 2025 style files available here:
* LaTeX - https://coling2025.org/downloads/coling-2025.zip
* Word - https://coling2025.org/downloads/coling-2025.docx
* Overleaf - https://www.overleaf.com/latex/templates/instructions-for-coling-2025-proce…
Papers should be submitted through Softconf/START using the following link: https://softconf.com/coling2025/LoResLM25/
>> Organising Committee
* Hansi Hettiarachchi, Lancaster University, UK
* Tharindu Ranasinghe, Lancaster University, UK
* Paul Rayson, Lancaster University, UK
* Ruslan Mitkov, Lancaster University, UK
* Mohamed Gaber, Birmingham City University, UK
* Damith Premasiri, Lancaster University, UK
* Fiona Anting Tan, National University of Singapore, Singapore
* Lasitha Uyangodage, University of Münster, Germany
URL - https://loreslm.github.io/
Twitter - https://x.com/LoResLM2025
Best Regards
Tharindu Ranasinghe
FIRST CALL FOR PARTICIPATION
Advanced Language Processing School (ALPS) 2025
March 30th - April 4th 2025
Aussois (French Alps)
We are pleased to announce the 5th edition of ALPS - the Advanced NLP
School to be held in the French Alps from March 30th to April 4th 2025.
This school targets advanced research students in Natural Language
Processing and related fields and brings together world leading experts and
motivated students. The programme comprises lectures, poster presentations,
practical lab sessions and nature activities - the venue is located near a
National Park.
Important Dates
- Oct 15th 2024: Application deadline
- Nov 15th 2024: Acceptance notification
- Jan 15th 2025: Registration deadline
- March 30th 2025: Start of school
Confirmed speakers so far:
- Kyunghyun Cho (New York University & Prescient Design)
- Titouan Parcollet (Cambridge University & Samsung AI Center)
- Barbara Plank (LMU Munich)
- François Yvon (ISIR CNRS)
Website and online application: https://alps.imag.fr/
Questions: alps(a)univ-grenoble-alpes.fr
The registration fees for the event encompass accommodation and full board
at the conference venue, the Centre Paul Langevin
<https://lig-alps.imag.fr/index.php/venue/>. We will announce the fee
amounts later, and they will vary depending on the participant's
background: students, academia, and industry. Student fees will be set at
or below €600, including twin room accommodation. We will have a limited
number of scholarships for the registration: if you are interested, please
mark this in the application form. The rates for academia and industry will
be higher, as is customary, and will include accommodation in a single room.
Call For Papers: The International Conference on Intelligent Multilingual
Information Processing 2024 (IMLIP 2024)
The International Conference on Intelligent Multilingual Information
Processing 2024 (IMLIP 2024) will take place in Beijing, China, on 16-17
November 2024, hosted by Beijing Institute of Technology (
https://english.bit.edu.cn/).
As a professional committee of the Chinese Association for Artificial
Intelligence, the Institute of Multilingual Intelligent Information
Processing (IMLIP) focuses on multilingual intelligent information
processing and its applications. The aim of the IMLIP 2024 conference is to
bring together experts from industry, academia, and research in the
community, to provide a platform for academic exchange and collaborative
research for scholars from around the world, and also to promote linguistic
research and natural language processing studies related to China's ethnic
minorities and countries.
Conference Website: http://www.imlip.org/
Topics
IMLIP 2024 welcomes original research and applications related to
multilingual intelligent information processing. We encourage
interdisciplinary studies and the integration of humanities and sciences.
Topics of interest include, but are not limited to, the following:
Linguistics
Cross-lingual processing
Large language models
Computational linguistics theory
Resource and corpus construction
Evaluation
Multilingual language understanding
Machine translation
Multimodal intelligent information processing, including multilingual
speech recognition and text processing
Intelligent processing in international Chinese education
Applications of multilingual intelligent information processing
Keynote speakers
Academician Nima Tashi, Professor, Tibet University, Tibetan multilingual
processing
Professor Kim Gerdes, University of Saclay, France
Important Dates
Paper Submission System Open: June 30, 2024
Paper submission Deadline: August 30, 2024
Notification of Acceptance: September 30, 2024
Conference Dates: November 16-17, 2024
Submissions
Papers submitted to IMLIP 2024 can be in Chinese or English. An accepted
paper will be presented either as an oral talk or as a poster, as
determined by the Program Committee. Accepted Chinese papers will be
recommended to "Corpus Linguistics" and "AppliedTechnology" based on
circumstances, with further review required to determine final acceptance.
Accepted English papers will be published in the Springer conference
proceedings (EIindexed). The authors of accepted papers must revise the
papers according to the review before publication. At least one author of
the accepted paper must attend the conference.
Format
Please use the Word or LaTeX templates provided. Papers may consist of up
to 8 pages of content, plus unlimited references. Papers will be
double-blindly reviewed without the authors’ names and affiliations
included. Furthermore, self-references that reveal the author’s identity,
e.g., “We previously showed (Smith, 1991) …”, must be avoided. Instead, use
citations such as “Smith (1991) previously showed …”. Papers that do not
conform to these requirements will be rejected without review. For Chinese
submission, please download the template at
http://jcip.cipsc.org.cn/CN/item/downloadFile.do?id=79.
For English submission, please download the template at
https://github.com/acl-org/acl-style-files.
Submission Website: Submission will be electronic in PDF format through
https://openreview.net/group?id=IMLIP.org/2024/Conference.
Multiple-Submission Policy
IMLIP 2024 allows authors to submit manuscripts to leading NLP
international conferences simultaneously only if the conferences have
established similar multiple-submission policies. Papers that have been or
will be submitted to other conferences must indicate this at the submission
time. Authors of papers accepted for presentation in IMLIP 2024 must notify
the program chairs by the camera-ready deadline as to whether the paper
will be presented at IMLIP 2024. Once confirmed, the paper must be
withdrawn from other venues. We will not accept papers that are identical
or overlap significantly in content or results with papers that will be (or
have been) published elsewhere except the arXiv preprint version.
Awards and Funds
IMLIP 2024 will grant Best Paper Awards for Chinese and English papers, respectively.
Contact
For further information, visit the conference website at
http://www.imlip.org/
Venue: Cultural and Museum Center, Liangxiang Campus, Beijing Institute of Technology
ICLC-11
11TH INTERNATIONAL CONTRASTIVE LINGUISTICS CONFERENCE
First Call for Abstracts
September 17–19, 2025
Prague, Czech Republic
The Faculty of Arts at Charles University in Prague is pleased to announce the 11th International Contrastive Linguistics Conference. The ICLC conference series, running since 1998, aims to promote fine-grained cross-linguistic research comprising two or more languages from a broad range of theoretical and methodological perspectives. Following the success of ICLC-10 in Mannheim in 2023, ICLC-11 aims to bring together researchers from different linguistic subfields and neighbouring disciplines to continue the interdisciplinary dialogue on comparing languages, to foster the development of an international community, and to advance possible new areas of cross-linguistic research. See https://iclc11.ff.cuni.cz/ for more information and note the submission deadline of February 24, 2025.
We invite abstracts on a broad range of topics, including but not limited to:
(1) Comparison of phenomena in two or more languages focused on any area and level of linguistic analysis:
* lexicon
* phonetics and phonology
* morphology, syntax and morphosyntax, linguistic complexity
* semantics, pragmatics, register and socio-cultural context
(2) Methodological challenges and solutions in cross-linguistic research:
* language corpora (multilingual, learner, and multimodal) and issues of linguistic annotation (e.g., Universal Dependencies)
* comparability issues, tertia comparationis, language universals; experimental and naturalistic interaction data
* AI and new digital tools in linguistic analysis
* low-resourced languages
(3) Contrastive linguistics in touch with related disciplines:
* generative, model-theoretic, functional or cognitive (e.g., constructional) approaches
* historical, sociolinguistic and variationist perspectives; registers, multimodality, pragmatics, interculturality; language contact; language policy
* cognitive and psycholinguistic approaches to bilingualism and multilingualism; language acquisition, language teaching and learning
* translation studies
The abstracts should present empirical research, well-defined research questions or hypotheses, details of the research approach and methods, theoretical insights, and (preliminary or expected) results. For details see https://iclc11.ff.cuni.cz/calls-and-circulars/call-for-papers/.
PRELIMINARY PROGRAM
* Parallel Oral Sessions
* Poster Sessions
* Keynote Speakers:
Sabine De Knop (Université Saint-Louis, Bruxelles, Belgium)
Volker Gast (Friedrich-Schiller-University, Jena, Germany)
Dan Zeman (Charles University, Prague)
* Panel Discussion
IMPORTANT DATES
24.02.2025: Deadline for abstract submission
26.05.2025: Notification of acceptance
02.06.2025: Registration opens
16.06.2025: Deadline for revised abstract submission
30.06.2025: Last day for early bird registration
01.09.2025: Online registration closes
16.09.2025: Arrival, Registration, Get-together
17–19.09.2025: Conference
ORGANIZING COMMITTEE
* Mirjam Fried (chair) 1)
* Viktor Elšík 1)
* Jana Kocková 2)
* Michal Křen 1)
* Olga Nádvorníková 1)
* Alexandr Rosen 1)
1) Charles University, Faculty of Arts
2) Czech Academy of Sciences, Institute of Slavonic Studies
PROGRAM COMMITTEE: tba
CONTACT INFORMATION
Website: https://iclc11.ff.cuni.cz/
Email: iclc11(a)ff.cuni.cz
Non thematic issue of the TAL journal: 2025 Volume 66-1
http://tal-66-1.sciencesconf.org/
Editors: Maxime Amblard, Cécile Fabre, Benoit Favre and Sophie Rosset
The call for volume 66-1 is open until December 31, 2024.
NEW since 2023: Non-thematic issues of the TAL journal are handled "on the fly". Each paper in issue 66-1 will be evaluated as soon as it is submitted and, subject to acceptance, published within an indicative period of six months after submission.
THEMES
The TAL journal has an open call for papers. Submissions may concern contributions, both theoretical and experimental, on all aspects of written, spoken, and signed language processing and computational linguistics, for example:
Computational models of language
Linguistic resources
Statistical learning and modeling
Intermodality and multimodality
Language multiplicity and diversity
Semantics and comprehension
Information access and text mining
Language production and processing/generation/synthesis
Evaluation
Explicability and reproducibility
NLP in interaction with other disciplines, digital humanities
This list is indicative. On all topics, it is essential that the aspects related to natural language processing are emphasized.
We also welcome position papers and survey papers.
LANGUAGE
Manuscripts may be submitted in English or French.
THE TAL JOURNAL
TAL (Traitement Automatique des Langues / Natural Language Processing) is an international journal published by ATALA (French Association for Natural Language Processing, https://www.atala.org/revuetal) since 1959. TAL has an electronic mode of publication with immediate free access to published articles.
SCHEDULE
Submission deadline: on the fly until December 31, 2024
Notification to the authors after first review: two months after submission
Notification to the authors after second review: two months after the first review
Publication: two months after the second review
FORMAT SUBMISSION
Papers must be between 20 and 25 pages long, including references and appendices (no exceptions to this length limit).
TAL is a double-blind review journal: it is thus necessary to anonymise the manuscript and the name of the pdf file. Self-references that reveal the author's identity must be avoided.
Style sheets are available for download on the Web site of the journal.
More information on: http://tal-66-1.sciencesconf.org/
*Apologies for cross-posting*
Dear colleague,
We cordially invite you to participate in the 34th Meeting of Computational Linguistics in The Netherlands (CLIN34) which takes place in Leiden on Friday 30 August 2024.
Besides a large and diverse programme of posters and oral presentations, we are happy to report that CLIN34 will have two keynote talks by:
* Diana Maynard, Sheffield University
* Dominique Blok and Erik de Graaf, TNO
If you wish to participate, please register via the conference website: https://clin34.leidenuniv.nl/
The programme can also be found at: https://clin34.leidenuniv.nl/program/
We hope to see you in Leiden in August!
The CLIN34 organizers
Leiden University
18th WORKSHOP ON BUILDING AND USING COMPARABLE CORPORA
WITH SHARED TASK ON MULTILINGUAL TERMINOLOGY EXTRACTION
FROM COMPARABLE CORPORA
Co-located with COLING 2025 (Abu Dhabi)
Paper submission deadline: 30 November, 2024
Workshop website: https://comparable.lisn.upsaclay.fr/bucc2025/
COLING website: https://coling2025.org/
Keynote speaker: Preslav Nakov, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi
**************************************************************
* Motivation
In the language engineering and linguistics communities, research in
comparable corpora has been motivated by two main reasons. In language
engineering, on the one hand, it is chiefly motivated by the need to
use comparable corpora as training data for statistical NLP
applications such as statistical and neural machine translation or
cross-lingual retrieval. In linguistics, on the other hand, comparable
corpora are of interest because they enable cross-language discoveries
and comparisons. It is generally accepted in both communities that
comparable corpora consist of documents that are comparable in content
and form in various degrees and dimensions across several
languages. Parallel corpora are on the one end of this spectrum, and
unrelated corpora are on the other.
In recent years, the use of comparable corpora for pre-training Large
Language Models (LLMs) has led to their impressive multilingual and
cross-lingual abilities, which are relevant to a range of applications,
including Information Retrieval, Machine Translation, Cross-lingual text
classification, etc. The linguistic definitions and observations related
to comparable corpora can improve methods to mine such corpora or
to improve cross-lingual transfer of LLMs. Therefore, it is of great interest
to bring together builders and users of such corpora.
* Shared Task
This year we will run a shared task aimed at detecting translations of
terms via comparable corpora. Please see the website for details: https://comparable.limsi.fr/bucc2025/bucc2025-task.html
* Topics
We solicit contributions on all topics related to comparable (and parallel) corpora, including but not limited to the following:
Building Comparable Corpora:
- Automatic and semi-automatic methods
- Methods to mine parallel and non-parallel corpora from the web
- Tools and criteria to evaluate the comparability of corpora
- Parallel vs non-parallel corpora, monolingual corpora
- Rare and minority languages, across language families
- Multi-media/multi-modal comparable corpora
Applications of comparable corpora:
- Human translation
- Language learning
- Cross-language information retrieval & document categorization
- Bilingual and multilingual projections
- (Unsupervised) Machine translation
- Writing assistance
- Machine learning techniques using comparable corpora
Mining from Comparable Corpora:
- Cross-language distributional semantics, word embeddings and
pre-trained multilingual transformer models
- Extraction of parallel segments or paraphrases from comparable corpora
- Methods to derive parallel from non-parallel corpora (e.g. to provide
for low-resource languages in neural machine translation)
- Extraction of bilingual and multilingual translations of single words,
multi-word expressions, proper names, named entities, sentences, and
paraphrases from comparable corpora, etc.
- Induction of morphological, grammatical, and translation rules from
comparable corpora
- Induction of multilingual word classes from comparable corpora
Comparable Corpora in the Humanities:
- Comparing linguistic phenomena across languages in contrastive
linguistics
- Analyzing properties of translated language in translation studies
- Studying language change over time in diachronic linguistics
- Assigning texts to authors via authors' corpora in forensic
linguistics
- Comparing rhetorical features in discourse analysis
- Studying cultural differences in sociolinguistics
- Analyzing language universals in typological research
* Workshop Organizers
- Serge Sharoff (University of Leeds)
- Ayla Rigouts Terryn (Université de Montréal (UdeM), Mila)
- Pierre Zweigenbaum (Université Paris-Saclay, CNRS, LISN, Orsay)
- Reinhard Rapp (University of Mainz, Germany)
* Program Committee
- Ebrahim Ansari (Institute for Advanced Studies in Basic Sciences,
Iran)
- Eleftherios Avramidis (DFKI, Germany)
- Gabriel Bernier-Colborne (National Research Council, Canada)
- Thierry Etchegoyhen (Vicomtech, Spain)
- Alex Fraser (University of Munich, Germany)
- Natalia Grabar (University of Lille, France)
- Amal Haddad Haddad (Universidad de Granada, Spain)
- Amir Hazem (University of Tokyo, Japan)
- Kyo Kageura (University of Tokyo, Japan)
- Natalie Kübler (Université Paris Cité, France)
- Philippe Langlais (Université de Montréal, Canada)
- Yves Lepage (Waseda University, Japan).
- Shervin Malmasi (Amazon, USA)
- Michael Mohler (Language Computer Corporation, USA)
- Emmanuel Morin (Nantes Université, France)
- Dragos Stefan Munteanu (RWS, USA)
- Ted Pedersen (University of Minnesota, Duluth, USA)
- Nasredine Semmar (CEA LIST, Paris, France)
- Silvia Severini (Leonardo Labs, Italy)
- Pranaydeep Singh (Ghent University, Belgium)
- Richard Sproat (Google, USA)
- Marko Tadić (University of Zagreb, Croatia)
- François Yvon (Sorbonne Université, France)
We are recruiting PhD researchers for the UKRI/RAi UK Keystone project
AdSoLve on Addressing Sociotechnical Limitations of LLMs:
https://adsolve.github.io/
Up to four funded positions are available in a joint collaboration between
Queen Mary University of London (QMUL) and the Imperial College London CDT
in healthcare AI - a great opportunity to work with leading academics in
NLP, AI, healthcare, and responsible AI. AdSoLve offers collaborations
across 4 universities, a large consortium and a network of over 21
non-academic partners. QMUL has one of the UK's leading NLP research
groups, with 8 core faculty and a group of c.40 researchers.
APPLICATION DEADLINE 28th July 2024
Interviews 5th & 6th September 2024
For details see:
https://www.findaphd.com/phds/programme/phd-opportunities-in-addressing-soc…
https://adsolve.github.io/assets/other/phd_advert_QMUL.pdf
--
Matthew Purver - http://www.eecs.qmul.ac.uk/~mpurver/
Computational Linguistics Lab - http://compling.eecs.qmul.ac.uk/
Cognitive Science Research Group - http://cogsci.eecs.qmul.ac.uk/
School of Electronic Engineering and Computer Science
Queen Mary University of London, London E1 4NS, UK
My working days for QMUL are Tuesday-Thursday; responses to mail on
other days may be delayed.
Dear Corpora-list,
We are advertising a post-doctoral position in ML/XAI: 18 months at IMT
Mines Alès (south of France) or IMT Business School, Evry (near Paris).
Subject: Evaluation of the impact of XAI techniques on Human-Machine
collaboration
Context: ENFIELD project, Horizon-funded European AI Network of
Excellence on adaptive, sustainable, human-centered and trustworthy AI.
Objectives:
Evaluate the impact of XAI methods on Human-Machine collaboration
through the study of:
- the performance of the human operator in performing a task in different
contexts: alone, with the help of a predictive model whose decisions are
explained or not explained, or with the help of an XAI technique;
- the types of human-machine collaboration (e.g. delegation, substitution,
mediation);
- the potential biases induced by XAI techniques.
The work will focus on specific study contexts (e.g., image
classification or NLP tasks, and XAI techniques based on local
interpretability using attribution methods).
You will contribute to:
- defining the study contexts (e.g. games, image classification) and the
test protocols to be considered;
- selecting and implementing predictive models and XAI techniques;
- setting up the tools needed to carry out the experiments covered by the
study protocols (e.g. development of simple games, decision interfaces);
- implementing the above-mentioned protocols on cohorts of human operators;
- evaluating and promoting the results obtained.
Deadline for applications: 20/09/2024
Desired start date: 01/11/2024
Application and additional info:
https://institutminestelecom.recruitee.com/o/post-doctorant-post-doctorante…
Contacts:
Sébastien Harispe, Associate Professor
sebastien.harispe(a)mines-ales.fr
Nicolas Soulié, Associate Professor
nicolas.soulie(a)imt-bs.eu
Best regards,
--
Andon Tchechmedjiev, PhD. Associate Professor of Artificial Intelligence
and Computer Engineering at EuroMov Digital Health in Motion, IMT Mines
Alès. Taxonomy and Semantics of Movement (SemTaxM) co-lead, Learning and
Complexity group member. Research expertise: Deep Learning, Knowledge
Engineering, Computational Linguistics and Semantics, Biomedical
Informatics, Neuroengineering and Human Movement Processing
Postdoctoral Researcher – Defining Authentic Inclusive Communication
Insight SFI Research Centre for Data Analytics
Data Science Institute Ref. No. 010548
JOB ADVERTISEMENT
Applications are invited from suitably qualified candidates for a
full-time, fixed-term position as a Postdoctoral Researcher with the Data
Science Institute <https://www.universityofgalway.ie/dsi/> at the
University of Galway, Ireland.
This position is funded by Science Foundation Ireland and is available
from 1st October 2024 to a contract end date of 30th September 2025.
Salary: Postdoctoral salary scale €44,346 – €56,764 per annum
(subject to the project’s funding limitations), pro rata for shorter
and/or part-time contracts.
Closing date for receipt of applications is 17:00 (Irish Time) on 5th
August 2024. It will not be possible to consider applications received after
the closing date.
ELIGIBILITY REQUIREMENTS
Essential Requirements:
- PhD in Natural Language Processing (NLP) or Linguistics
- Published at top conferences in the NLP field or in high-impact-factor journals
- Excellent understanding of experimental design and scientific
methodologies
- Strong command of oral and written English
- Good programming skills
Desirable Requirements:
- Strong knowledge of equality, diversity, and inclusion in NLP
- Experience engaging in research collaborations with industry
- Experience in writing grant proposals
- Experience of working in national and/or EU research projects
To apply: Jobs – University of Galway
<https://www.universityofgalway.ie/about-us/jobs/>. Applications must be
submitted online.
How to apply guide
<https://www.universityofgalway.ie/human-resources/recruitment-and-selection…>
- For informal enquiries, please contact Bharathi Raja Chakravarthi
(bharathi.raja(a)universityofgalway.ie) and cc Dr Meghann L. Drury-Grogan
(Meghann.Drury-Grogan(a)atu.ie)
- University’s Strategic Plan
<https://www.universityofgalway.ie/strategy2025/>
- Working in Research at University of Galway
<https://www.universityofgalway.ie/our-research/>
- Moving to Ireland (Euraxess) <https://www.euraxess.ie/>
- Applicant Information
<https://www.universityofgalway.ie/human-resources/recruitment-and-selection…>
- We reserve the right to re-advertise or extend the closing date for
this position
- University of Galway is an equal opportunities employer
- All positions are recruited in line with Open, Transparent, Merit
(OTM) and Competency based recruitment
With regards,
Dr. Bharathi Raja Chakravarthi,
Assistant Professor / Lecturer-above-the-bar
School of Computer Science, University of Galway, Ireland
Insight SFI Research Centre for Data Analytics, Data Science Institute,
University of Galway, Ireland
E-mail: bharathiraja.akr(a)gmail.com, bharathi.raja(a)universityofgalway.ie
Google Scholar: https://scholar.google.com/citations?user=irCl028AAAAJ&hl=en
Website:
https://www.universityofgalway.ie/our-research/people/computer-science/bhar…
Dear all,
LIACS currently has a vacancy for two assistant professor positions, which might be of interest to some people on this list.
Here’s the beginning of the vacancy:
"The Faculty of Science, Leiden Institute of Advanced Computer Science (LIACS), is seeking candidates for two Assistant Professors (0.8-1.0 FTE), one in generative AI and another in Human-centered AI. We seek to appoint an expert in the research area of Generative AI with focus on software systems and engineering (code generation, bug detection and repair, refactoring, and optimization but also at a larger scale such as architecture reconstruction and impact analysis for changes), prompt engineering (for content creation and data analysis), or diffusion models (for transforming the creation of high-fidelity data, such as images and simulations). Additionally, we seek to appoint an expert in Human-centered AI with focus on the designing of AI systems that prioritize human needs, usability, and collaboration, and/or on the involvement of humans in the training and refining processes (interactive machine learning)."
Here’s the full vacancy: https://www.universiteitleiden.nl/en/vacancies/2024/q3/150312-assistant-pro…
Best,
dr. Gijs Wijnholds
Assistant Professor in Natural Language Processing
Text Mining and Retrieval Group<https://tmr.liacs.nl/>
Leiden Institute of Advanced Computer Science
https://gijswijnholds.github.io
Dear all,
the QE shared task 2024 is ON!
You can now submit and test your quality estimation system(s) on a set of different languages and tasks: to predict translation quality at sentence level, to detect error spans, or even to correct translations!
For information on how to access the test data and the submission platforms, visit the shared task's webpage:
https://www2.statmt.org/wmt24/qe-task.html
Deadline to participate is July 31 (AoE).
Looking forward to receiving your predictions!
--
Best wishes,
on behalf of the organisers.
Dear all,
we are happy to invite you to participate in the Shared Task on Quality Estimation at WMT'24.
The details of the task can be found at: https://www2.statmt.org/wmt24/qe-task.html
New this year:
* We introduce a new language pair (zero-shot): English-Spanish
* Continuing from the previous edition, we will also analyse the robustness of submitted QE systems to a set of phenomena ranging from hallucinations and biases to localized errors, all of which can significantly impact real-world applications.
* We also introduce a new task, seeking not only to detect but also to correct errors: Quality-aware Automatic Post-Editing! We invite participants to submit systems capable of automatically generating QE predictions for machine-translated text and the corresponding output corrections.
2024 QE Tasks:
Task 1 -- Sentence-level quality estimation
This task follows the same format as last year but with fresh test sets and a new language pair: English-Spanish. We will test the following language pairs:
* English to German (MQM)
* English to Spanish (MQM)
* English to Hindi (MQM & DA)
* English to Gujarati (DA)
* English to Telugu (DA)
* English to Tamil (DA)
More details: https://www2.statmt.org/wmt24/qe-subtask1.html
Task 2 -- Fine-grained error span detection
Sequence labelling task: predict the error spans in each translation and the associated error severity: Major or Minor.
We will test the following language pairs:
* English to German (MQM)
* English to Spanish (MQM)
* English to Hindi (MQM)
More details: https://www2.statmt.org/wmt24/qe-subtask2.html
Task 3 -- Quality-aware Automatic Post-editing
We expect submissions of post-edits correcting the detected error spans of the original translation. Although the task is focused on quality-informed APE, we also allow participants to submit APE output without QE predictions, to understand the impact of their QE system. Submissions without QE predictions will also be considered official.
We will test the following language pairs:
* English to Hindi
* English to Tamil
More details: https://www2.statmt.org/wmt24/qe-subtask3.html
Important dates:
1. Test sets will be released on July 15th.
2. Participants can submit their systems by July 23rd on CodaLab.
3. System paper submissions are due by 20th August [aligned with WMT deadlines].
Note: Like last year, we aligned with the General MT and Metrics shared tasks to facilitate cross-submission on the common language pairs: English-German, English-Spanish, and English-Hindi (MQM).
We look forward to your submissions and feel free to contact us if you have any more questions!
Best wishes,
on behalf of the organisers.
First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security (NLPAICS 2024)
Lancaster, UK, 29-30 July 2024
Call for Participation
We are pleased to share the NLPAICS 2024 conference programme, which you can view by clicking here - https://nlpaics.com/programme-2/.
To register, please visit https://nlpaics.com/registration/.
We very much hope to welcome you to NLPAICS 2024 at Lancaster!
The conference
Recent advances in Natural Language Processing (NLP), Deep Learning and Large Language Models (LLMs) have resulted in improved performance of applications. In particular, there has been a growing interest in employing AI methods in different Cyber Security applications.
In today's digital world, Cyber Security has emerged as a heightened priority for both individual users and organisations. As the volume of online information grows exponentially, traditional security approaches often struggle to identify and prevent evolving security threats. The inadequacy of conventional security frameworks highlights the need for innovative solutions that can effectively navigate the complex digital landscape for ensuring robust security. NLP and AI in Cyber Security have vast potential to significantly enhance threat detection and mitigation by fostering the development of advanced security systems for autonomous identification, assessment, and response to security threats in real-time. Recognising this challenge and the capabilities of NLP and AI approaches to fortify Cyber Security systems, the First International Conference on Natural Language Processing (NLP) and Artificial Intelligence (AI) for Cyber Security (NLPAICS’2024) serves as a gathering place for researchers in NLP and AI methods for Cyber Security. We invite contributions that present the latest NLP and AI solutions for mitigating risks in processing digital information.
Venue
The First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security (NLPAICS’2024) will take place at Lancaster University and is organised by the Lancaster University UCREL NLP research group.
Keynote speakers
We are delighted to announce the NLPAICS’2024 keynote speakers
- Iva Gumnishka (Humans in the Loop)
- Sevil Şen (Hacettepe University)
- Paolo Rosso (Universitat Politècnica de València)
- Jacques Klein (University of Luxembourg)
Sponsors
We are proud to announce the conference sponsors:
CodeAgent – Collaborative Agents for Software Engineering
Further information and contact details
The conference website is https://nlpaics.com/ and will be updated on a regular basis. The conference updates will also be available on social media (X - https://x.com/nlpaics, LinkedIn - https://linkedin.com/company/nlpaics/ )
Regards
Tharindu Ranasinghe
Second Call for Papers
NLP for Positive Impact Workshop
Miami, USA
November 15 or 16, 2024
(co-located with EMNLP 2024 <https://2024.emnlp.org/>)
https://sites.google.com/view/nlp4positiveimpact
*Submission*
Direct submission via ARR: link
<https://openreview.net/group?id=EMNLP/2024/Workshop/NLP4PI_Direct_Submission>
Deadline: August 15th
For papers submitted to the June (or earlier) ARR cycle, the commitment
deadline is August 20, 2024. Commit to the workshop via this link:
<https://openreview.net/group?id=EMNLP/2024/Workshop/NLP4PI_ARR_Commitment>
Notification of Acceptance: September 20, 2024
Camera-Ready Papers Due: October 3, 2024
Workshop Date: either November 15 or 16
All deadlines are 11:59 PM (Anywhere on Earth
<https://www.timeanddate.com/time/zones/aoe>)
*Submission Information*
We are using the EMNLP Submission Guidelines
<https://2024.emnlp.org/calls/main_conference_papers/#paper-submission-detai…>
for the workshop. Authors are invited to submit a full paper of up to 8
pages of content with unlimited pages for references. We also invite short
papers of up to 4 pages of content, also with unlimited pages for
references. Final camera-ready versions of accepted papers will be given an
additional page of content to address reviewer comments.
Summary
The widespread and indispensable use of language-oriented AI systems
presents new opportunities to have a positive social impact. NLP
technologies are starting to mature to the point where they could have an
even broader impact, supporting the UN sustainability goals
<https://sdgs.un.org/goals> by helping to address big problems such as
poverty, hunger, healthcare, education, inequality, COVID-19 and climate
change.
Our workshop aims to promote innovative NLP research that will positively
impact society, focusing on responsible methods and new applications. We
will encourage submissions from areas including (but not limited to):
-
Work that grounds the impact of NLP: Beyond developing a
better-performing NLP model, can we make a step further to connect the
model to actual social impact? Example directions include: case studies
of real-world deployments; or improving the deployment and maintenance of
NLP models in practice.
-
In addition to commonly recognized NLP for social good areas such as NLP
for healthcare, mental well-being, and many others, we also call for work
on neglected areas such as NLP for poverty, hunger, energy, climate change,
among others.
-
We also highly value work that builds on interdisciplinary expertise,
and encourages submissions of case studies or worked examples that seek to
expand the social impact of NLP through collaboration with other fields
(e.g., philanthropy, social science, political science, economics, HCI).
Special theme: This year, we would like to encourage submissions providing
solutions or concepts to address digital violence. Digital violence
encompasses various forms of violence that utilize digital tools and media,
such as cell phones, apps, internet applications, and emails, and occurs
within digital spaces like online portals and social platforms. We aim to
explore how modern NLP and AI technologies can contribute to enhancing
safety in digital environments. At the workshop, you will have an
opportunity to connect and share your results with NGO representatives from
this field!
Submission types:
We would appreciate seeing various types of work on this (but not only
this) topic, such as:
-
automatic identification of various social needs, their corresponding
sizes and demographics of people affected;
-
position papers to propose promising new tasks or directions that the
field should pursue;
-
literature review of a subfield;
-
philosophical discussions of how positive impact can be achieved
with NLP methods;
-
approaches to interdisciplinary collaboration;
-
user study designs, user surveys;
-
ethical considerations, and other related topics.
Note that we want submissions to our workshop to have distinctive
social-good implications, beyond those of a general NLP paper. We
will require each submission to discuss the ethical and societal
implications of the work, and we encourage a discussion of what "positive
impact" means in that work.
Organizers
Zhijing Jin (Max Planck Institute & ETH Zurich)
Daryna Dementieva (Technical University of Munich)
Giorgio Piatti (ETH Zürich)
Steven Wilson (Oakland University)
Oana Ignat (Santa Clara University)
Jieyu Zhao (University of Maryland, College Park)
Joel Tetreault (Dataminr, Inc.)
Rada Mihalcea (University of Michigan)
Contact Email
-
nlp4pi.workshop(a)gmail.com
[With apologies for cross-posting]
We are excited to announce the 22nd International Workshop on Treebanks and Linguistic Theories (TLT 2024), which will bring together developers and users of linguistically annotated natural language corpora. The workshop is endorsed by ACL SIGPARSE and will be hosted by Universität Hamburg in Germany on December 5th-6th, 2024.
-----------------------------
VENUE
-----------------------------
TLT 2024 will take place at the guest house of Universität Hamburg. In order to support rich discussions and networking, TLT 2024 will primarily be an in-person event; we will, however, accommodate a limited number of live / synchronous remote presentations, prioritizing those with circumstances that prevent travel.
Universität Hamburg and its guest house are conveniently located near the Dammtor train station / metro station Stephansplatz which are well-connected with many parts of the city and beyond, providing an easy commute for attendees.
Hamburg is a vibrant city known for its rich maritime history as one of the leading cities in the medieval Hanseatic League, as well as its modern cultural diversity, including events at the world-famous Elbphilharmonie Concert Hall. The city is easily accessible by train or plane (Hamburg Airport (HAM); Bremen Airport (BRE) and Hannover Airport (HAJ) are about a 1 to 1.5 hour train ride away).
-----------------------------
SUBMISSION INFORMATION
-----------------------------
TLT addresses all aspects of treebank design, development, and use. As ‘treebanks’ we consider any pairing of natural language data (spoken, signed, or written) with annotations of linguistic structure at various levels of analysis, including, e.g., morpho-phonology, syntax, semantics, and discourse. Annotations can take any form (including trees or general graphs), but they should be encoded in a way that enables computational processing. Reflections on the design of linguistic annotations, methodology studies, resource announcements or updates, annotation or conversion tool development, or reports on treebank usage including probing the leakage of treebanks into large language models are but some examples of the types of papers we anticipate for TLT.
Papers should describe original work; they should emphasize completed work rather than intended work, and should indicate clearly the state of completion of the reported results. Submissions will be judged on correctness, originality, technical strength, significance and relevance to the conference, and interest to the attendees.
We invite paper submissions in two distinct tracks:
* regular papers on substantial and original research, including empirical evaluation results, where appropriate;
* short papers on smaller, focused contributions, work in progress, negative results, surveys, or opinion pieces.
Submissions (in both tracks) may either be archival—in case of unpublished work—or non-archival, based on the wish of the authors. All archival papers accepted for presentation at the workshop will be included in the TLT 2024 proceedings volume, which will be part of the ACL Anthology. Non-archival papers must have been published or accepted for publication at another CL conference.
Long papers may consist of up to 8 pages of content (excluding references and appendices). Short papers may consist of up to 4 pages of content (excluding references and appendices). Accepted papers will be given an additional page to address reviewer comments.
All submissions should follow the two-column format and the ACL style guidelines. We strongly recommend the use of the LaTeX style files, OpenDocument, or Microsoft Word templates created for ACL: https://github.com/acl-org/acl-style-files
Submissions will be reviewed double-blind, and all full and short papers must be anonymous, i.e. they must not reveal the author(s) on the title page or through self-references. For example, “We previously showed (Smith, 2020) …” should be avoided; instead, use citations such as “Smith (2020) previously showed …”. Papers must be submitted digitally, in PDF, and uploaded through the on-line conference system (link forthcoming).
Submissions that violate these requirements will be rejected without review.
-----------------------------
IMPORTANT DATES
-----------------------------
* Long and short paper submission deadlines: August 15th, 2024
* Reviews Due: September 26th, 2024
* Notification of acceptance: October 6th, 2024
* Final version of papers due: November 6th, 2024
* TLT2024: December 5th-6th, 2024 in Hamburg
-----------------------------
TLT2024 WORKSHOP CHAIRS
-----------------------------
Daniel Dakota, Indiana University
Sandra Kübler, Indiana University
Heike Zinsmeister, Universität Hamburg
-----------------------------
TLT2024 COMMUNICATION CHAIR
-----------------------------
Sarah Jablotschkin, Universität Hamburg
Contact: tlt2024.gw(a)uni-hamburg.de
Website: https://www.korpuslab.uni-hamburg.de/en/tlt2024.html
---------------------------
Prof. Dr. Heike Zinsmeister (she/her)
German Linguistics / Corpus Linguistics
Universität Hamburg, Institut für Germanistik, Room C7012
Von-Melle-Park 6, Postfach #15, D-20146 Hamburg
Tel.: 040 42838-7119
heike.zinsmeister(a)uni-hamburg.de
http://www.slm.uni-hamburg.de/germanistik/personen/zinsmeister.html
Friday, November 8 - Saturday, November 9
Brown Computer Science Department, Providence, RI
https://cs.brown.edu/people/in-memorium/eugene_charniak/
Brown University invites you to attend an academic memorial event to
commemorate the research and legacy of Eugene Charniak. Eugene, an ACL
Lifetime Achievement Award winner and ACL fellow, passed away in June
2023. His colleagues and students have organized a two-day workshop of
invited presentations of cutting-edge research with an emphasis on the
themes which defined Eugene's career: the legacy of classic statistical
NLP/ML, the sometimes-surprising effectiveness of simple baselines,
clever tricks for dealing with data sparsity such as self-training or
distant supervision, and unsupervised learning.
A full program will be posted later this summer. Mark Johnson will give
a keynote presentation, along with research talks by Regina Barzilay,
Michael Collins, Jason Eisner, Lillian Lee, Ani Nenkova, Ellie Pavlick,
Brian Roark, Chris Tanner and Byron Wallace. There will also be
opportunities to remember Eugene in a social setting, and a panel
discussion of the workshop's research themes.
The event will take place at the Brown Computer Science Department in
Providence, RI; attendees are responsible for finding their own
accommodations. Instructions for travel to Providence are available
here: https://cs.brown.edu/about/directions/. The program will begin at
9am on Friday the 8th, and conclude at 1:30pm on Saturday the 9th. All
members of the ACL community are welcome, whether you knew Eugene well
or not. Please mark your calendars now!
To stay in the loop about the event, please fill out this form:
https://docs.google.com/forms/d/e/1FAIpQLSe_7LZBSjP3Ur2XCTtsDtwnL_Jbxgh5Wfi…
If you have questions about the event, contact the organizers, Micha
Elsner (melsner0(a)gmail.com) and David McClosky
(david.mcclosky(a)gmail.com).
Hi all,
I have a postdoc job to share, see below. Thank you for sharing the offer in potentially interested circles.
Best,
Peeter
----
Postdoc position(s) available in text and data mining and social history / sustainability transitions.
The project “The Crisis and Transformation of Industrial Modernity, 1900-2055”, is a five-year project at the University of Tartu. It is based on the Deep Transitions framework which theorizes industrialization as a long-term co-evolution of various socio-technical systems.
Website: https://www.deeptransitions.ut.ee/.
Job description: https://www.deeptransitions.ut.ee/jobs/
Job call PDF: http://tiny.cc/dt_postdoc_call_2024
1.5 FTE is available across the positions, with some flexibility in workload and work location depending on the exact focus. Contact laur.kanger(a)ut.ee for more details.
** Apologies for cross-posting **
Dear Colleagues,
This is the last call for tutorial proposals for COLING 2025.
Due: July 31, 2024
Call for Tutorials:
The 2025 International Conference on Computational Linguistics (COLING 2025) invites proposals for tutorials to be held in conjunction with the conference. We seek proposals in all areas of natural language processing and computation, language resources (LRs) and evaluation, including spoken language, sign language, and multimodal interaction.
We invite proposals for three types of tutorials, and we especially encourage submissions from early-career researchers:
Cutting-edge: tutorials that cover advances in newly emerging areas. The tutorials are expected to give a brief introduction to the topic, but participants are assumed to have some prior knowledge of the topic. The focus of the class will be on discussing the most recent developments in the field, and it will spend a considerable amount of time pointing out open research questions and important novel research directions.
Introductory to computational linguistics/NLP topics: tutorials that provide introductions to topics that are established in the COLING communities. The lecturers provide an overview of the development of the field from the beginning until now. Attendees are not expected to come with prior knowledge. They acquire sufficient understanding of the topic to understand the most recent research in the field.
Introduction to Key Concepts in Linguistics including Semantics, Syntax, Psycholinguistics, Neurolinguistics, and Sociolinguistics: tutorials that provide introductions to topics that are established or emerging in areas adjacent to CL/NLP. The lecturers provide an overview of the development of the field from the beginning until now. Attendees are not expected to come with prior knowledge. They acquire a sufficient understanding of the topic to understand the most recent research in the field and the relevance for the CL/NLP domains.
Each of these types of tutorials can either be half-day (4h long including a coffee break (30m long)) or full-day (8h long including two coffee breaks (1h long in total) but excluding a lunch break).
In all cases, the aim of a tutorial is primarily to help understand a scientific problem, its tractability, and its theoretical and practical implications. Presentations of particular technological solutions or systems are welcome, provided that they serve as illustrations of broader scientific considerations. None of the tutorial types are expected to be “self-invited” long talks – the content should be a good balance between research from multiple groups and perspectives, not only from the teachers of the tutorial.
The tutorials will be held at COLING 2025 in Abu Dhabi, UAE, on 19 and 20 January, 2025.
Important Dates
All deadlines are 11:59PM UTC-12:00 (“anywhere on Earth”).
Proposal submission due: July 31, 2024
Notification of acceptance: August 31, 2024
COLING 2025 tutorials: January 19-20, 2025
COLING 2025 conference: January 21-24, 2025
Diversity and Inclusion
We particularly encourage submissions from underrepresented groups in computational linguistics, including researchers from any demographic or geographic minority, researchers with disabilities, or others. In evaluating the proposals, we will take these aspects into account to create a varied and balanced set of tutorials.
This includes several aspects of diversity, namely (1) how the topic of the tutorial contributes to improved diversity and increased fairness in the field, (2) whether the topic is particularly relevant for a specific underrepresented group of potential participants, and (3) whether the presenters are from an underrepresented group.
Submission Details
Proposals should contain:
A title that helps the potential attendees to understand what the tutorial will be about.
An abstract that summarizes the topics, goals, target audience, and type (see above) of the tutorial (this abstract will also be on the COLING 2025 website).
A section called “Introduction” that explains the topic and summarizes the starting point and relevance for our community and in general.
A section called “Target Audience” that explains for whom the tutorial will be developed and what the expected prior knowledge is. Clearly specify what attendees should know and be able to practically do to get the most out of your tutorial. Examples of what to specify include prior mathematical knowledge, knowledge of specific modeling approaches and methods, programming skills, or adjacent areas like computer vision. Also specify the number of expected participants.
A section called “Outline” in which the various topics are explained. This can be a list of bullet points or a set of paragraphs explaining the content. Explain what you intend and how long the tutorial will be.
A section called “Diversity Considerations”, discussing each of the three aspects of diversity mentioned above or others.
A section called “Reading List”: What are introductory papers or books that potential attendees can read to get a first impression of the tutorial content? What do you expect them to have read before attending? What provides further information beyond the content of the tutorial?
A section called “Presenters” in which each tutorial presenter is briefly introduced in one paragraph, including their research interests, their areas of expertise for the tutorial topic, and their experience in teaching a diverse and international audience.
A section called “Other Information” which should include information on how many people are expected to participate and how you came to this estimate. You can also explain any other aspects that you find important, including special equipment that you would need.
A section called “Ethics Statement” which discusses ethical considerations related to the topics of the tutorial.
The proposals should be submitted no later than 31 July, 2024, 11:59 PM Samoa Standard Time (SST) (UTC/GMT-11, “anywhere on Earth”).
Submission is electronic. Please submit the proposals using the START system at this URL: https://softconf.com/coling2025/tutorialsCL25
Please note that tutorials should either be 100% in-person or 100% virtual; hybrid formats will not be allowed. For in-person tutorials, at least one tutorial organiser should be physically present to run the tutorial at COLING.
Evaluation Criteria
The tutorial proposals will be evaluated according to their originality and impact, the expected interest level of participants, as well as the quality of the organizing team and Program Committee and their contribution to the diversity of the conference.
Each tutorial will be evaluated regarding its clarity and preparedness, novelty or timely character of the topic, the instructor’s experience, the audience interest, and the potential to increase diversity in our community.
Instructor Responsibilities
Accepted tutorial presenters will be notified by the date mentioned above. They must then provide abstracts of their tutorials for inclusion in the conference registration material by the specific deadlines. The abstract needs to be provided in ASCII format. The summary will be submitted in PDF format and can be updated from the version submitted for review. The instructors will make their material available in an appropriate way, for instance, by setting up a website. They will be invited to submit their slides to the ACL Anthology.
Tutorial Chairs
Email: coling25tutorialchairs(a)gmail.com
The tutorial chairs are:
Djamé Seddah, Senior Researcher, INRIA, Paris, France (on leave from Sorbonne University)
Shaonan Wang, Associate Professor at the Institute of Automation, Chinese Academy of Sciences, Beijing, China