Dear colleagues,
We invite research contributions to our workshop, focusing on reproducible and reusable computational models for social data analysis. The event will also feature an interactive replicability session to enhance hands-on learning and collaboration.
Please feel free to share this call with interested colleagues in your network.
Keynote: Professor Carole Goble, University of Manchester (leading the eScience Research Lab)
Call for papers: R2CASS2025 – International workshop “Social Science Meets Web Data: Reproducible and Reusable Computational Approaches” at ICWSM 2025
Paper submission deadline: March 31st, 2025
Website: https://r2cass2025.wordpress.com/
Submission: https://easychair.org/my/conference?conf=r2ca
The first international workshop “Social Science Meets Web Data: Reproducible and Reusable Computational Approaches” (R2CASS2025) will be held in conjunction with the International AAAI Conference on Web and Social Media (ICWSM 2025) in Copenhagen, Denmark, on June 23rd, 2025. The workshop will be held in person, and at least one author of each accepted paper will be required to register and present the paper at the workshop. The papers presented will be published in the workshop proceedings of the conference.
Artificial Intelligence (AI) based models have a growing influence in social science research for analyzing behavioral patterns on social media and other digital platforms. Computational reproducibility remains a concern with these models, as they deal with living data that is likely to change over time, often contains personal information, and is therefore subject to ethical restrictions on its use. The workshop aims to bring together researchers and practitioners to discuss and exchange ideas towards potential interdisciplinary collaborations across computer science, social science, meta-science and other related disciplines. It provides a platform to present and critically analyze computational reproducibility guidelines and checklists for the social sciences. Participants will focus on open challenges and propose methodologies for computational reproducibility in social science, with the goal of improving research transparency, reproducibility, and reusability. Workshop participants are also encouraged to volunteer for and take part in the interactive replicability session, which will attempt to replicate existing methods on sample data.
(To participate in the interactive replicability session, please contact the workshop organizers at https://r2cass2025.wordpress.com/contact-2/.)
The workshop calls for theoretical and practical research contributions that employ qualitative, quantitative and analytical approaches, including full papers, short papers, resource papers, position papers and posters on topics including, but not limited to:
• Data and software management in machine learning and natural language processing-driven studies
• Computational methods on sentiment analysis, bias analysis, toxicity, and sexism detection
• Text categorization and topic analysis
• Digital behavior analysis on social media and other digital platforms
• Computational reproducibility checklists and workflows in social science
• FAIR principles in ML research for social science
• Open Science applications and reproducibility challenges
• Metadata standards for ML/NLP research
• Tools for replicating complex social science models using web-based data
• Integration of open-source ML/NLP tools for web data analysis
Submission guidelines
As per AAAI-ICWSM guidelines, all papers must be submitted as high-resolution PDF files formatted in AAAI two-column, camera-ready style, for US Letter (8.5” x 11”) paper. Full papers are recommended to be 8 pages long and must be at most 11 pages long, including only the main text and the references. The mandatory Ethics Checklist (and brief additional Ethics Statement, if desired, see below), optional appendices, etc. do not count toward the page limit and should be placed after the references. Appendices, if they exist, should be placed after the Ethics Checklist. Revision papers and final camera-ready full papers can be up to 12 pages. Dataset papers must be no longer than 10 pages, Poster papers must be no longer than 4 pages, and Demo descriptions must be no longer than 2 pages. No source files (Word or LaTeX) are required at the time of submission for review; only the PDF file is permitted. Finally, the copyright slug may be omitted in the initial submission phase, and no copyright form is required until a paper is accepted for publication. For more on paper formatting guidelines, please visit ICWSM guidelines.
Co-organizers
Fakhri Momeni, GESIS – Leibniz Institute for the Social Sciences, Germany
M. Taimoor Khan, GESIS – Leibniz Institute for the Social Sciences, Germany
Arnim Bleier, GESIS – Leibniz Institute for the Social Sciences, Germany
Tony Ross-Hellauer, Know-Center, Austria
Regards
Taimoor
*Release of training corpora and registration still open!*
****We apologize for multiple postings of this e-mail****
MentalRiskES 2025 is the third edition of a shared task on early risk
identification of mental disorders in Spanish comments from social media
sources. The first and second editions took place in the IberLEF evaluation
forum as part of SEPLN 2023 and SEPLN 2024. The task is set up as an online
problem: participants must detect a potential risk as early as possible in a
continuous stream of data. Performance therefore depends not only on the
accuracy of the systems but also on how quickly the problem is detected.
These dynamics are reflected in the design of the tasks and in the metrics
used to evaluate participants. For this third edition, we propose two new
subtasks: the first concerns the detection of gambling disorder, and the
second consists of detecting the type of addiction.
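To make the online setting concrete, below is a minimal Python sketch of an
early-detection loop: a system scores a user's messages one at a time and
commits to a decision as early as it can. The scorer, the threshold and the
message format are placeholders for illustration only; the actual task uses a
client/server protocol and latency-aware metrics described on the task website.

# Minimal sketch of an early-risk detection loop (illustrative only; the
# official task runs over a client/server protocol, see the task website).
from typing import Iterable, List, Tuple

def risk_score(history: List[str]) -> float:
    """Placeholder scorer; a real system would use a trained classifier."""
    cues = ("casino", "apuesta", "no puedo parar")  # illustrative Spanish cues
    text = " ".join(history).lower()
    return min(1.0, sum(text.count(c) for c in cues) / 3)

def detect_early(messages: Iterable[str], threshold: float = 0.66) -> Tuple[int, int]:
    """Return (decision, round): decision 1 = at risk, issued as early as possible."""
    history: List[str] = []
    for round_idx, msg in enumerate(messages, start=1):
        history.append(msg)
        if risk_score(history) >= threshold:
            return 1, round_idx   # early positive decisions improve latency-aware metrics
    return 0, len(history)        # threshold never reached: negative decision

stream = ["hoy fui al casino", "perdi otra vez la apuesta", "no puedo parar"]
print(detect_early(stream))       # -> (1, 2)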
We would like to invite you to participate in the following tasks:
1. Risk Detection of Gambling Disorders (Binary classification)
2. Type of Addiction Detection (Multiclass classification)
Find out more at https://sites.google.com/view/mentalriskes2025.
MentalRiskES 2025 is part of the IberLEF Workshop and will be held in
conjunction with the SEPLN 2025 conference in Zaragoza (Spain).
-------------------------------------------------------------------------------
Important Dates
-------------------------------------------------------------------------------
Feb 14th Registration open
Feb 25th Release of trial corpora (trial server available)
*Mar 19th Release of training corpora*
Mar 31st Registration closed
Apr 7th Release of test corpora and start of the evaluation
campaign (test server available and trial submissions closed)
Apr 14th End of evaluation campaign (deadline for submission
of runs)
Apr 18th Publication of official results and release of test
gold labels
May 12th Deadline for paper submission
May 30th Acceptance notification
Jun 16th Camera-ready submission deadline
Sep TBD Publication of proceedings
Note: All deadlines are 11:59PM UTC-12:00
Please reach out to the organizers at MentalRiskEs@IberLEF2025.
The MentalRiskES 2025 organizing committee.
-----------------------------------------------------------
More information about mailing lists at the University of Jaén
http://www.ujaen.es/sci/redes/listas/
-----------------------------------------------------------
SLM4Health: Improving Healthcare with Small Language Models
(Workshop held in conjunction with the AIME 2025 conference, June 26, 2025, 9am-5pm, Pavia, Italy)
https://slm4health2025.netlify.app/
SLM4Health focuses on exploring the role and potential of Small Language Models (SLMs) in healthcare-related natural language processing (NLP) tasks. As SLMs gain traction in clinical settings due to their adaptability, efficiency, and lower resource demands, they offer a promising alternative to larger models, especially in resource-constrained environments. However, challenges such as performance trade-offs and ethical concerns—bias, privacy, and interpretability—need to be addressed. The workshop will bring together researchers and practitioners to discuss SLM applications in clinical tasks, compare them with large language models, and explore methods to overcome these challenges, ultimately aiming to improve patient care and clinician support through more tailored NLP tools.
We invite researchers to present their latest research results on the following topics:
Applications of SLMs for information extraction, sentiment analysis, named entity recognition, relation extraction from medical documents;
Adaptation of SLMs to effectively handle diverse languages, especially those with limited resources;
Sustainability of SLMs compared to LLMs;
Ethical aspects, including safety, privacy concerns and bias mitigation, explainability;
Possibilities and challenges of SLMs in tasks of medical language processing;
Comparisons of SLMs and LLMs on specific use cases in healthcare;
Evaluation metrics, datasets, and benchmarks.
To enable reproducibility and some level of comparison among approaches, we encourage researchers to use the MIMIC-III or MIMIC-IV dataset.
Submission deadline: April 15, 2025
__________________________________
Prof. Douglas Teodoro, PhD
Department of Radiology and Medical Informatics
Faculty of Medicine | University of Geneva
Campus Biotech G6-N3 | Chemin des Mines 9, 1202 Genève
tel: 022 379 02 25 | douglas.teodoro(a)unige.ch
www.unige.ch/medecine
We are happy to announce the next round of [#SMM4H-HeaRD](https://healthlanguageprocessing.org/smm4h-2025/), which will be co-located with [AAAI ICWSM](https://www.icwsm.org/2025/index.html) 2025, the International AAAI Conference on Web and Social Media, on June 23-26, 2025, in Copenhagen, Denmark.
Our team is organizing **Shared Task 1: Detection of adverse drug events in multilingual and multi-platform social media posts**. We provide data in German, French, Russian, and English, from platforms such as X and patient forums.
We invite you to participate and attempt to beat our multilingual baseline! As the deadline is approaching, please [register](https://docs.google.com/forms/d/e/1FAIpQLScOdaY58DZQ_2aw_rISJut3G… as soon as possible.
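For teams starting from scratch, here is a minimal, hedged sketch of one generic way to approach this kind of multilingual binary classification: fine-tuning a multilingual encoder such as XLM-RoBERTa with the Hugging Face Trainer. This is purely illustrative and is **not** the organizers' baseline; the file names (`train.csv`, `dev.csv`) and column names (`text`, `label`) are placeholders for the task data distributed after registration.

```python
# Illustrative sketch only -- not the official baseline. Placeholder files
# "train.csv"/"dev.csv" with columns "text" (post) and "label" (0/1 for ADE).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

model_name = "xlm-roberta-base"  # one multilingual encoder covering de/fr/ru/en
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

raw = load_dataset("csv", data_files={"train": "train.csv", "validation": "dev.csv"})
encoded = raw.map(lambda b: tokenizer(b["text"], truncation=True, max_length=256),
                  batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ade_baseline", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
print(trainer.evaluate())  # validation loss; the official task metrics are computed separately
```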
Here is the schedule:
- Training and validation data available: February 14, 2025
- System predictions for validation data due: March 31, 2025 (23:59 CodaLab server time);
this is a simple check that teams have a system producing syntactically valid predictions
- Test data available: April 7, 2025
- System predictions for test data due: April 11, 2025 (23:59 CodaLab server time)
- Submission deadline for system description papers: May 2, 2025
- Notification of acceptance: May 23, 2025
- Camera-ready papers due: June 6, 2025
- Workshop: June 23, 2025
Please share this call with interested colleagues.
Organizers of Task 1: Lisa Raithel, Philippe Thomas, Roland Roller, Elena Tutubalina, Takeshi Onishi, Dongfang Xu, Pierre Zweigenbaum
BIFOLD, TU Berlin (XplaiNLP), DFKI SLT, AIRI, Cedars-Sinai Medical Center, LISN, CNRS, Université Paris Saclay
Conversational agents offer promising opportunities for education as they can fulfill various roles (e.g., intelligent tutors and service-oriented assistants) and pursue different objectives (e.g., improving student skills and increasing instructional efficiency), among which serving as an AI tutor is one of the most prevalent tasks. Recent advances in the development of Large Language Models (LLMs) provide our field with promising ways of building AI-based conversational tutors, which can generate human-sounding dialogues on the fly. The key question posed in previous research, however, remains: *How can we test whether state-of-the-art generative models are good AI teachers, capable of replying to a student in an educational dialogue?*
In this shared task, we focus on educational dialogues between a student and a tutor in the mathematical domain that are grounded in student mistakes or confusion, where the AI tutor aims to remediate those mistakes or confusions. The goal is to evaluate the quality of tutor responses along four key dimensions of the tutor’s ability to (1) identify the student’s mistake, (2) point to its location, (3) provide the student with relevant pedagogical guidance, and (4) make that guidance actionable. The dialogues used in this shared task include dialogue contexts from the MathDial (Macina et al., 2023) and Bridge (Wang et al., 2024) datasets, including the last student utterance containing a mistake, together with a set of responses to that utterance from a range of LLM-based tutors and, where available, human tutors, aimed at mistake remediation and annotated for their quality.
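To make the setup concrete, the sketch below models what one shared-task instance might look like once the data is released. The field names are our own illustrative guesses, not the official schema; please refer to the shared task website for the released format.

```python
# Hypothetical data model for one shared-task instance; field names are
# illustrative guesses, NOT the official release schema.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TutorResponse:
    tutor_id: str                                 # anonymized in the test set (cf. Track 5)
    text: str                                     # response to the erroneous student turn
    mistake_identification: Optional[str] = None  # Track 1 annotation
    mistake_location: Optional[str] = None        # Track 2 annotation
    pedagogical_guidance: Optional[str] = None    # Track 3 annotation
    actionability: Optional[str] = None           # Track 4 annotation

@dataclass
class DialogueInstance:
    context: List[str]                            # preceding turns (MathDial / Bridge)
    student_utterance: str                        # last student turn, containing a mistake
    responses: List[TutorResponse] = field(default_factory=list)
```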
**Tracks**
This shared task will include five tracks. Participating teams are welcome to take part in any number of tracks.
- Track 1 - Mistake Identification: Participants are invited to develop systems to detect whether tutors' responses recognize mistakes in students' solutions.
- Track 2 - Mistake Location: Participants are invited to develop systems to assess whether tutors' responses accurately point to genuine mistakes and their locations in the students' responses.
- Track 3 - Pedagogical Guidance: Participants are invited to develop systems to evaluate whether tutors' responses offer correct and relevant guidance, such as an explanation, elaboration, hint, examples, and so on.
- Track 4 - Actionability: Participants are invited to develop systems to assess whether tutors' feedback is actionable, i.e., it makes it clear what the student should do next.
- Track 5 - Guess the tutor identity: Participants are invited to develop systems to identify which tutors the anonymized responses in the test set originated from.
**Participant registration**
All participants should register using the following link: https://forms.gle/fKJcdvL2kCrPcu8X6
**Important dates**
All deadlines are 11:59pm UTC-12 (anywhere on Earth).
- March 12, 2025: Development data release
- April 9, 2025: Test data release
- April 23, 2025: System submissions from teams due
- April 30, 2025: Evaluation of the results by the organizers
- May 21, 2025: System papers due
- May 28, 2025: Paper reviews returned
- June 9, 2025: Final camera-ready submissions
- July 31 and August 1, 2025: BEA 2025 workshop at ACL
**Shared task website**: https://sig-edu.org/sharedtask/2025
**Organizers**
- Ekaterina Kochmar (MBZUAI)
- Kaushal Kumar Maurya (MBZUAI)
- Kseniia Petukhova (MBZUAI)
- KV Aditya Srivatsa (MBZUAI)
- Justin Vasselli (Nara Institute of Science and Technology)
- Anaïs Tack (KU Leuven)
**Contact**: bea.sharedtask.2025(a)gmail.com
* PhD position (salary group TV-L 13, working time 100 %, initially
limited to 3 years) *
*******************************************************************
We are seeking a highly motivated PhD candidate to join the Data &
Knowledge Engineering group
(https://www.cs.hhu.de/en/research-groups/data-knowledge-engineering) at
Heinrich-Heine-University Düsseldorf in collaboration with GESIS,
Cologne (http://www.gesis.org) in the context of the DFG-funded research
project "EmergentIR: Improving Informational Web Search for Emerging
Topics".
The project investigates web search behavior and algorithms in the
context of emerging topics, i.e. for novel and less well-understood
search queries. An example is the search for COVID-19-related terms and
topics during the early days of the pandemic, when available online
resources and information were evolving quickly and reliable,
high-quality information was sparse.
* Your responsibilities *
************************
- Research in fields such as information retrieval, information
extraction/NLP and/or human-computer interaction to investigate and
support web search on emerging topics, e.g. to predict search intents,
detect emerging topics or support ranking and retrieval of information.
- Work and collaborate with an interdisciplinary team of researchers to
develop and evaluate computational methods in the context of web search.
- Publish and present research results at major scientific events.
* Your profile *
****************
- Master’s degree in computer science or a related field.
- Background in the following areas: information retrieval, natural
language processing, or machine learning/deep learning.
- Experience in programming (e.g. Python) and with machine learning
frameworks such as PyTorch or TensorFlow.
- Fluency in English. German language skills are desirable but not
required.
* Contact & application process *
********************************
Please send your complete application documents (CV, certificates &
transcripts) as a single PDF file to Stefan Dietze
(stefan.dietze(a)hhu.de) by 15 April. For any informal enquiries about the
position, please don't hesitate to get in touch via the same email
address.
--
Prof. Dr. Stefan Dietze
Scientific Director Knowledge Technologies for the Social Sciences
GESIS - Leibniz Institute for the Social Sciences
Web: https://www.gesis.org/en/kts
Chair of Data & Knowledge Engineering
Heinrich-Heine-University Düsseldorf
Web: https://www.cs.hhu.de/en/research-groups/data-knowledge-engineering
Phone: +49 (0)221-47694-421
Web: http://stefandietze.net
Third International Workshop on Gender-Inclusive Translation Technologies (GITT) at MT Summit 2025
23 June 2025, Geneva, Switzerland
https://sites.google.com/tilburguniversity.edu/gitt2025
@gitt-workshop.bsky.social
PAPER SUBMISSION DEADLINE EXTENDED
We have extended the GITT submission deadline to March 31st, 2025.
This is the final submission deadline.
NEW Dates (Time zone: Anywhere on Earth)
Final submission deadline: 31 March, 2025
Notification of Acceptance: 7 April, 2025
Camera Ready Copy due: 21 April, 2025
Workshop: 23 June, 2025
**Aim and scope**
The Gender-Inclusive Translation Technologies Workshop (GITT) sets out to be the dedicated workshop focusing on gender-inclusive language in translation and cross-lingual scenarios. The workshop aims to bring together researchers from diverse areas, including industry partners, MT practitioners, and language professionals. GITT encourages multidisciplinary research that develops and interrogates both solutions and challenges for addressing bias and promoting gender inclusivity in MT and translation tools, including language model (LM) applications for the translation task.
**Topics**
GITT invites technical as well as non-technical submissions, consisting of experimental, theoretical or methodological contributions. We explicitly welcome interdisciplinary submissions and submissions that focus on innovative, non-binary linguistic strategies and/or adopt sociolinguistically informed perspectives. The topics of interest include, but are not limited to:
- Models or methods for assessing and mitigating gender bias
- New resources for inclusive language and gender translation (e.g., datasets, translation memories, dictionaries)
- Social, cross-lingual, and ethical implications of gender bias
- Qualitative and quantitative analyses on the potential limits of current approaches to gender bias in translation and MT, error taxonomies as well as best practices and guidelines
- User-centric case studies on the impact of biased language and/or mitigating approaches which can include translators, post-editors, or monolingual MT users
GITT is also open to other non-listed topics aligned with the scope of the workshop, as well as to work focusing on non-textual modalities (e.g., audiovisual translation).
**Submission**
We welcome four types of submissions, two archival and two non-archival.
ARCHIVAL
- Research papers: between 4 and 10 pages (excluding references)
- Extended Abstracts: up to 2 pages (including references)
Accepted papers and extended abstracts consisting of novel work will be published online as proceedings in the ACL Anthology.
NON-ARCHIVAL
- Research Communications: up to 2 pages (including references).
We include a parallel submission policy in the form of Research Communications for papers related to the topic of GITT that were accepted at other venues in 2024 or 2025.
- Potluck Communications: short abstract up to 500 words (including references).
Potluck Communications offer a space for anyone—especially students and early career researchers—to discuss bold new ideas for collaboration, brainstorm about ongoing work, and explore future research directions.
The communications will not be included in the proceedings, but will serve to promote the dissemination of research aligned with the scope of the workshop.
All submissions should adhere to the MT Summit 2025 guidelines and style templates (PDF, LaTeX, Word) and be uploaded to EasyChair (https://easychair.org/my/conference?conf=mtsummit2025).
**Workshop organizers**
Janiça Hackenbuchner, University of Ghent
Luisa Bentivogli, Fondazione Bruno Kessler
Joke Daems, University of Ghent
Chiara Manna, University of Tilburg
Beatrice Savoldi, Fondazione Bruno Kessler
Eva Vanmassenhove, University of Tilburg
The sixth talk of the Data in Historical Linguistics Seminar Series 2025 will take place remotely on Monday 31st March 2025 at 5pm BST. Zinaida Geylikman (Université Paris Cité) will present on ‘Quantitative methods on small corpora for historical sociolinguistics: a case study of Old French fabliaux.’
Registration for this talk will close at midnight on Friday 28th March and the link for this can be accessed here:
https://docs.google.com/forms/d/e/1FAIpQLSciGltVD7ft6dgyMu45DrYbEB0WyJ67RyU…
Participants will receive a Microsoft Teams link via email on the morning of the talk.
The abstract for this talk can be found here: https://datainhistoricallinguistics.wordpress.com/2024/12/31/geylikman/
The programme and registration links for all talks in the series can be found on our website:
https://datainhistoricallinguistics.wordpress.com/2025-programme/
This seminar series is run by Andrea Farina and Mathilde Bru (King’s College London) and is aimed at PhD students and early career researchers. The purpose of this seminar series is to bring together researchers working on historical linguistics with a quantitative approach, and to discuss current avenues of research in this topic. We hope that these seminars will nurture international collaboration and establish academic ties among researchers working on similar topics in this field.
Join our mailing list<https://datainhistoricallinguistics.wordpress.com/join-us/>!
Dear colleagues,
As announced a few weeks ago, the fourth iteration of the GEM workshop will
be held as part of ACL <https://2025.aclweb.org/>, July 27–August 1st,
2025. This year we’re planning a major upgrade to the GEM workshop, which
we dub GEM2, through the introduction of a large dataset of 1B model
predictions together with prompts and gold standard references, encouraging
researchers from all backgrounds to submit work on meaningful, efficient
and robust evaluation of LLMs.
In this second CfP, we are happy to announce that (i) the large datasets of
model predictions have been released (DOVE
<https://slab-nlp.github.io/DOVE/> and DataDecide
<https://huggingface.co/datasets/allenai/DataDecide-eval-instances>), and
(ii) GEM2 will host the ReproNLP <https://repronlp.github.io/> shared task
results.
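For those who want to start exploring the released predictions, below is a minimal sketch of how one might inspect the DataDecide data with the Hugging Face `datasets` library. The repository id comes from the link above; the available configurations, splits and column names are not specified in this CfP, so the sketch lists them rather than hard-coding any.

```python
# Minimal, hedged sketch: inspect the released DataDecide predictions.
# Configurations and column names are not documented in this CfP, so we
# list them instead of assuming a fixed schema.
from datasets import get_dataset_config_names, load_dataset

repo_id = "allenai/DataDecide-eval-instances"
configs = get_dataset_config_names(repo_id)
print("Available configurations:", configs)

# Peek at the first configuration (returns a DatasetDict of splits).
preview = load_dataset(repo_id, configs[0]) if configs else load_dataset(repo_id)
print(preview)
```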
Important Dates
- April 11: Direct paper submission deadline (ARR).
- May 5: Pre-reviewed (ARR) commitment deadline.
- May 19: Notification of acceptance.
- June 6: Camera-ready paper deadline.
- July 7: Pre-recorded videos due.
- July 31 - August 1: Workshop at ACL in Vienna.
Please check the GEM website <https://gem-benchmark.com/workshop> for
submission links, templates, and more details. For any questions, please
email gem-benchmark-chairs(a)googlegroups.com.
best,
simon
*ADAPT Research Centre / Ionaid Taighde ADAPT*
*School of Computing, Dublin City University, Glasnevin Campus
/ Scoil na Ríomhaireachta,
Campas Ghlas Naíon, Ollscoil Chathair Bhaile Átha Cliath*