- Corpora - ELRA lists

Call for bids – ECIR 2028
by Ingo Frommholz 10 Jun '26

[Apologies for cross-posting]

In 2028 ECIR will have its 50th anniversary – and you could be organising it! Where will the journey go after Southampton?

We are interested in hearing from bidders who wish to organise the European Conference on Information Retrieval 2028 (ECIR 2028) in a European country. ECIR is the major information retrieval conference in Europe (CORE-A ranked). ECIR 2027 will be in Southampton, UK. Recent ECIR events have taken place in Delft, The Netherlands, 2026; Lucca, Italy, 2025; Glasgow, Scotland, 2024; Dublin, Ireland, 2023; Stavanger, Norway, 2022; Lucca, Italy, 2021 (online); Lisbon, Portugal, 2020 (online); Cologne, Germany, 2019; Grenoble, France, 2018; Aberdeen, UK, 2017.

Interested? More info – https://www.bcs.org/membership-and-registrations/member-communities/informa…

--
Univ.-Prof. Dr. Ingo Frommholz (he/him), PhD, Dipl.-Inform., FBCS, FHEA
Professor and Head of School of Applied Data Science
Modul University Vienna, Austria
Adjunct Professor, Bern University of Applied Sciences, Switzerland
Chair, BCS Information Retrieval Specialist Group, UK
Web: http://www.frommholz.org/ | Email: ifrommholz(a)acm.org
Bluesky: @frommholz.org | Mastodon: @ingo@idf.social

CfP: JAIR Special Track on Challenges in Pretraining
by Marco Antonio Stranisci 10 Jun '26

--------------------------------------------------------------------------------------
Call for Papers for the Special Track on Challenges in Pretraining
--------------------------------------------------------------------------------------

Deadline: 31st October 2026
Website: https://jair.org/index.php/jair/specialtrack_pretrain
Contact: marco.stranisci(a)gmail.com

OVERVIEW
--------------------------------------------------------------------------------------

The invention of Large Language Models (LLMs) determined a paradigm shift in machine learning, which changed its focus from task-specific training of systems to their self-supervised pretraining on large data resources. This turn had a broad impact on Artificial Intelligence (AI) research, as it raises new challenges in the design and development of Machine Learning systems.

Whilst the growing interest in language modeling is driving rapid innovation, research on pretraining remains fragmented. Training LLMs is a highly intersectoral endeavor that raises research challenges spanning a wide range of topics, addressed by multiple communities involved in the design of these technologies. However, the resulting research outputs are scattered across many venues, limiting a comprehensive understanding of the general challenges related to pretraining.

The special track addresses this gap by seeking submissions on the state of the art and existing challenges in pretraining. Specifically, the track focuses on the following broad content areas:

● *Pretraining Datasets*. The quality of pretraining datasets is crucial for developing LLMs. Recent research focuses not just on larger datasets but on data selection strategies aimed at improving performance efficiently. Data quality is also tied to governance in pretraining pipelines, requiring strong stewardship and archival practices. Methods like active learning and data minimization help reduce the amount of data needed while maintaining high-quality pretraining.

● *Model Architectures*. A large body of research explores what and how LLMs learn during pretraining, including studies on uncertainty, multilingual and token-free learning, and Mixture of Experts methods. It also examines architectural innovations: although autoregressive models dominate, alternatives such as neuro-symbolic and spiking neural networks aim to create more cognitively inspired learning approaches.

● *Computational Resources*. The growing computational needs of training LLMs strongly influence pretraining strategies. Approaches like federated learning support distributed training while protecting data ownership and shaping model representations. Efficiently distributing computational loads across possibly heterogeneous platforms remains a complex optimization problem in itself, while alternative methods focus on training models under constrained computational budgets.

This special track seeks contributions on the following, non-exhaustive list of topics:

● Advanced data selection and filtering strategies for pretraining
● Data stewardship and archival practices of pretraining data
● Data minimization and Active Learning approaches
● Neurosymbolic and cognitively inspired learning architectures
● Quantification and analysis of uncertainty in pretraining
● Tokenizer-free and alternative input representation approaches
● Distributed and federated learning for large-scale pretraining
● Language modeling under low-computational settings
● Benchmarking and resource efficiency analyses in pretraining

The special track welcomes submissions that introduce innovative methodological contributions relevant to its topics. Papers that primarily rely on standard methods or established benchmarks fall outside the scope of this track.

KEY DATES
--------------------------------------------------------------------------------------

JAIR special tracks have a submission window. Papers can be submitted anytime during that window. They are reviewed as they arrive, and accepted papers will go to production and will be published asynchronously as soon as they are ready, first as part of the usual JAIR pipeline and later on a JAIR webpage dedicated to the special track.

*Target timeline*:

● Submission period: April 30, 2026 - October 31, 2026
● First round of review and authors' notification: December 2026
● Resubmissions: February 2027
● Second round of review and authors' notification: April 2027
● Final manuscripts: June 2027

Accepted submissions will be added to this page on publication.

TRACK EDITORS
--------------------------------------------------------------------------------------

Golnoosh Farnadi, McGill University
Ferdinando Fioretto, University of Virginia
Lucie Flek, University of Bonn
Raphael Fischer, TU Dortmund
Marco Antonio Stranisci, University of Turin

----
Marco personal website: https://marcostranisci.github.io/

CFP 4th TRR 318 Conference: Scaffolding Social XAI 16–17 March 2027 in Paderborn, Germany
by j.b.fisher＠uni-paderborn.de 10 Jun '26

Dear Colleagues,

we are the TRR 318 Constructing Explainability and are hosting the 4th TRR 318 Conference: Scaffolding Social XAI, which will be held 16–17 March 2027 in Paderborn, Germany.

Website: https://trr318.uni-paderborn.de/konferenzen/scaffolding-social-xai

We invite submissions of:
Extended Abstracts: 2 pages - Deadline: 1 November 2026.

________________________________________
CfP: 4th TRR 318 Conference: Scaffolding Social XAI
________________________________________

https://trr318.uni-paderborn.de/konferenzen/scaffolding-social-xai
16–17 March 2027 in Paderborn, Germany

We invite submissions to explore, extend, and consolidate the interdisciplinary boundaries of this exciting research field.

________________________________________
Important Deadlines

Extended Abstracts (2 pages including references) deadline: 1 November 2026.
Notification of Acceptance: 10 December 2026.

________________________________________
Speakers: tba

________________________________________
Topics
________________________________________

The interdisciplinary Collaborative Research Center TRR 318 Constructing Explainability explores the ways in which explanations are created, negotiated, and made meaningful across contexts. Previous editions have addressed dimensions of explanation, including understanding and contextualising. The 4th TRR 318 Conference turns to a perspective that has recently gained increasing attention in explainability research: Social XAI (sXAI), which conceptualises explainability as a social practice.

Accordingly, the topic of the upcoming conference is "Scaffolding Social XAI." By scaffolding, we refer to the social and communicative supports that enable critical and interactive engagement with ideas, assumptions, technologies, and tools. This encompasses the ability to critically interrogate AI-generated outputs in practice (e.g., in educational contexts such as assessment or diagnosis), including emerging forms of XAI literacy as a situated competence.

We invite contributions from a wide range of disciplines that address explainability as a social phenomenon, including work on: (a) explanatory practices in dyadic interaction; (b) explainability and interpretability in AI systems; or (c) explanation processes embedded in institutional and organisational settings.

Topics of interest include (but are not limited to):

• Explainability as a social and interactional practice;
• Social XAI and critical perspectives on AI explainability;
• Everyday explanations in dyadic or group interaction;
• Explainability, interpretability, and transparency in human–AI or human–robot interaction;
• The role of explanation in institutional, educational, clinical, or workplace settings;
• Critical engagement with explanatory tools, technologies, and design practices;
• XAI literacy and critical use of AI in professional practices (e.g., educational assessment, diagnosis, decision-making);
• Theoretical, empirical, and methodological approaches to studying explanation across disciplines.

________________________________________
Submission Instructions

Submissions are managed via EasyChair.

Format Requirements: Please submit an extended abstract containing the following:

• Length: 2 pages (including references).
• Content: Title, author name(s), affiliation(s), email address(es), and 4–5 keywords.
• Template: Please use the provided Word or LaTeX template.

Presentation Formats: Accepted contributions may be presented as:

• A 20-minute oral presentation
• A poster

Please indicate your preferred format within the submission.

________________________________________
Additional Information

• Publication: Accepted papers will be published in the Proceedings of the 4th TRR 318 Conference.
• Contact: Should you have any queries, please contact us at conference(a)trr318.uni-paderborn.de.

General chairs: Amit Singh, Josephine B. Fisher, Katharina J. Rohlfing
Program chairs: tbd

--
Conference Organization
SFB/TRR 318 CONSTRUCTING EXPLAINABILITY
Universität Paderborn
SFB/TRR 318
Zukunftsmeile 2
33102 Paderborn
Phone +49 5251 60-4493
Email conference(a)trr318.uni-paderborn.de
Web trr318.de

Final Reminder: Summer School on LLMs and NLP
by Amal Haddad 09 Jun '26

Final Reminder: Summer School on LLMs and NLP Alicante, Spain · 15-17 June 2026 *Limited places still available* We are pleased to invite students, researchers, and practitioners to participate in a three-day Summer School on Large Language Models and Natural Language Processing, taking place from 15 to 17 June in Alicante, Spain. The programme brings together leading researchers and experts to discuss the foundations, applications, and future directions of LLMs and NLP (https://summer-school.gplsi.es/programme/ [1]). The summer school will feature keynote talks by Roberto Navigli on lexical semantics in the LLM era and Preslav Nakov on open, safe, factual, and language-specific large language models, as well as a panel on the paradigm shift and the future of NLP featuring well-known NLP experts. Across three days, participants will explore a wide range of timely topics, including foundations of LLMs, LLMs for low-resource languages, datasets and bias, explainable AI in NLP, machine translation, eye-tracking and gaze data, digital humanities, legal NLP, quantum NLP, sentiment analysis, and model optimisation. Beyond the scientific programme, participants will also have the opportunity to enjoy Alicante's Mediterranean weather, welcoming atmosphere, and excellent local food. The summer school offers an excellent opportunity to learn from international experts, exchange ideas, and discuss current challenges in NLP and language technologies. There are limited places still available. See https://summer-school.gplsi.es/ for further details and registration. We warmly encourage interested participants to join us for this exciting event. 
--
Amal Haddad Haddad (She/her)
Facultad de Traducción e Interpretación
Universidad de Granada | https://www.ugr.es/personal/amal-haddad-haddad
Lexicon Research Group | http://lexicon.ugr.es/haddad
Co-Convenor, BAAL SIG 'Humans, Machines, Language' | https://r.jyu.fi/humala
Event Coordinator, BAAL SIG 'Language, Learning and Teaching'

Links:
------
[1] https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsummer-sc…

Extended Deadline: Oxford Test of English Learner Corpora Research Competition
by colinfinnerty＠yahoo.co.uk 08 Jun '26

We are delighted to announce the launch of the first annual Oxford Test of English Learner Corpora (OTELC) Research Competition, hosted by Oxford University Press. The application deadline has now been extended, and the eligibility criteria expanded.

This is an exciting opportunity for master’s and PhD students, as well as recent master’s graduates in linguistics, corpus linguistics, language assessment, English language teaching/TESOL, or a closely related field to conduct original research using authentic test taker responses.

Why participate?

• Free access to the Oxford Test of English Learner Corpora
• An opportunity to base a dissertation/thesis, coursework, or independent/pilot research projects on real learner data
• Winner’s prize: a 13-inch iPad Air
• Winning research published on the Oxford English Assessment Research webpage

Eligibility

Applicants must be either:

• Currently enrolled in a master’s or PhD programme, or
• A recent master’s graduate (within the past 12 months)

How to apply

Applicants should submit a research proposal using the official application form, outlining their aims, research questions, and how they plan to use the OTELC, along with a letter of reference from a tutor/professor supporting their application and confirming their academic suitability for the project.

We especially welcome proposals exploring the intersection of corpus linguistics and second language assessment, as well as work that develops pedagogical materials or other applications of research-based findings.

Full competition details are available on the competition page: https://elt.oup.com/feature/global/learner-corpora

Extended Deadline: Friday, 31 July 2026

For any queries, please contact: OTELC(a)oup.com

Please feel free to share this opportunity widely with colleagues and students.

Best wishes,
The Oxford Test of English Corpus team

Final call for participation: Convergence 2026: Human-AI Integration for Multilingual and Accessible Communication
by Constantin Orasan 08 Jun '26

Dear Colleagues,

We are pleased to invite you to participate in Convergence 2026: Human-AI Integration for Multilingual and Accessible Communication, which will take place between 17th and 19th June at the University of Surrey, Guildford, UK.

This year, in addition to onsite participation, we are delighted to offer an online participation option to make the event more accessible to a wider audience. Whilst onsite registration is now closed, it is still possible to attend the conference online.

Building on the success of the first Convergence conference <https://www.surrey.ac.uk/centre-translation-studies/convergence-2023> in 2023, which explored the responsible and intelligent integration of human and machine capabilities in translation and interpreting, the Centre for Translation Studies is proud to organise Convergence 2026: Human-AI Integration for Multilingual and Accessible Communication. The second edition of the Convergence conference will create an opportunity to bring together innovative research on the evolving landscape of AI in the context of multilingual and accessible communication, reflecting on the complexity and effects of using AI-driven technologies in these fields.

The conference will foster a multidisciplinary dialogue that will generate new theoretical perspectives and practical research, focusing on themes such as the ethical aspects of AI in translation and interpreting, AI-enabled digital accessibility and societal inclusion, and the impact of Generative AI on language mediation. We will also examine the evolving role of language professionals, the power of Large Language Models (LLMs) in supporting multilingual communication, and the crucial need for responsible use of language AI in the public sector.

The conference programme includes keynote talks by leading experts, paper presentations, panel discussions, and poster sessions, providing a rich environment for exchanging ideas and fostering collaboration. For more information about the conference, programme details, and registration link, please visit: https://www.surrey.ac.uk/centre-translation-studies/convergence-2026

Online participants will be able to view all scheduled paper presentations, keynote speeches, and panel sessions via our streaming platform. They will have the opportunity to ask questions using the online Q&A facilities provided. To register, please visit https://www.surrey.ac.uk/centre-translation-studies/convergence-2026/regist…

If you have any questions, please contact us at cts_inquiries(a)surrey.ac.uk.

Kind regards,
Constantin

--
Prof Constantin Orăsan
Professor of Language and Translation Technologies
Room 06LC03, Phone extension: 4115
Centre for Translation Studies <https://www.surrey.ac.uk/centre-translation-studies>, University of Surrey
Third Floor | Library | University of Surrey | Guildford | Surrey | GU2 7XH
Phone: +44 (0) 1483 684115
Personal page: https://www.surrey.ac.uk/people/constantin-orasan

Second CfP: 13th Web-as-Corpus (WaC-13) Workshop @EMNLP2026, Budapest, Hungary, 24-29 Oct, 2026
by Veronika Laippala 08 Jun '26

Second Call for Papers

13th Web-as-Corpus (WaC-13) Workshop @EMNLP2026, Budapest, Hungary, 29 Oct, 2026
https://wackyworkshop.org

The World Wide Web has evolved from a resource for building linguistic corpora into the central data infrastructure powering modern natural language processing and Large Language Models (LLMs). As web-scale data increasingly shapes AI systems’ knowledge and capabilities, understanding its quality, representativeness, and ethical implications has become critical. At the same time, the “more is better” paradigm is being challenged by issues such as machine-generated content, data toxicity, limited metadata, and the under-representation of many languages and domains. These challenges call for a shift toward Data-Centric AI, focusing on the curation, analysis, and responsible use of web-derived data.

The 13th Web-as-Corpus (WaC-13) workshop provides a multidisciplinary forum for research addressing the full lifecycle of web data. We invite submissions on methods, resources, and applications related to web corpora, with special emphasis on multilingual data and less-resourced languages.

Topics of interest include (but are not limited to):

* Creation and evaluation of high-quality datasets for foundation models (e.g., data collection, filtering, enrichment, language identification)
* Use of web data in empirical linguistic research
* Analysis of web-scale corpora for quality, representativeness, and societal insights
* Ethical and legal aspects of collecting, sharing, and using web data

By bringing together researchers from NLP, linguistics, and the social sciences, WaC aims to advance best practices for one of the field’s most influential data sources.

Important dates:

Direct paper submission deadline: 7 August, 2026
Pre-reviewed ARR commitment deadline: 1 September, 2026
Notification of acceptance: 5 September, 2026
Camera-ready paper due: 20 September, 2026
Workshop date: 29 Oct, 2026

Submissions:

Submit your papers through https://openreview.net/group?id=EMNLP/2026/Workshop/WaC-13 or through ARR commitment https://openreview.net/group?id=EMNLP/2026/Workshop/WaC-13_ARR_Commitment.

Workshop Organizers:

Nikola Ljubešić, Jožef Stefan Institute, Slovenia
Yves Scherrer, University of Oslo, Norway
Laurie Burchell, Common Crawl Foundation
Veronika Laippala, University of Turku, Finland
Pedro Ortiz Suarez, Common Crawl Foundation
Thom Vaughan, Common Crawl Foundation
Vuk Dinić, Jožef Stefan Institute, Slovenia

PhD Proposal GRAIL - NLP - Graph - France
by julien.romero＠telecom-sudparis.eu 08 Jun '26

Thesis Description

The GRAIL (Grounded Recommendation with Auditable Inference Links) project addresses the structural challenges of two-sided cold start in job recommendation. The research focuses on algorithmic decision support within the highly dynamic labor market, where interactions are sparse and profiles are frequently new.

The thesis centers on three primary scientific objectives:

Uncertainty-Aware Extraction: Design a pipeline to extract missing skill tokens from unstructured resumes and job descriptions while maintaining strict document-level provenance and confidence scores.

Hybrid Generative Recommendation: Develop a recommender component that infers missing candidate-job interaction edges to restore predictive power. Each generated edge must be accompanied by an auditable inference link, functioning as a compact evidence subgraph connecting job requirements directly to candidate profiles.

Trustworthiness and Fairness Auditing: Implement rigorous equal-opportunity auditing focused on the candidate selection stage, utilizing sensitive attributes strictly for offline reporting to ensure transparent utility-fairness trade-offs.

Timeline

The PhD program is structured as a three-year project with defined research milestones:

Year 1: Finalize the two-sided cold-start protocol, implement the uncertainty-aware skill extraction pipeline, and establish strong evaluation baselines to deliver the initial auditable evidence graph.

Year 2: Develop the grounded generative module to propose candidate recommendations using auditable inference links and train the explicit predictive scorer.

Year 3: Deploy equal-opportunity auditing mechanisms, quantify system robustness to extraction uncertainty, and consolidate the benchmark, pretrained models, and toolkit into a documented release.

Required Skills and Qualifications

Strong academic background in computer science, specifically in machine learning, natural language processing (NLP), or recommender systems.
Proficiency in programming, algorithm design, and large-scale experimentation.
Familiarity with heterogeneous graph-based learning or foundation models.
Prior research experience is strongly recommended.

Administrative Details

Host Institution: Télécom SudParis, Institut Polytechnique de Paris (IPParis).
Laboratory: SAMOVAR lab.
Funding: Financed by Hi!Paris.
Supervisor: Julien Romero.
Start Date: September 2026 (flexible).
Contract: Full-time.

How to Apply

To apply for this position, please submit the following materials:

A comprehensive resume.
A cover letter detailing your research interests and alignment with the GRAIL project.
A complete transcript of your academic grades.
Recommendation letters (highly appreciated).

MIAI–PRAIRIE Online Seminar: Adele Goldberg (Princeton): Compositionality, creativity in natural language and LLMs (15/6, 5pm)
by Thierry Poibeau 08 Jun '26

MIAI–PRAIRIE Online Seminar on LLMs and the Study of Language, Mind, and Society

Our next speaker will be Adele Goldberg, from Princeton, for a talk on "Compositionality, creativity in natural language and LLMs", on Monday 15 June, 5pm (French time). Online, free access, with no registration.

Organized by Caroline Rossi (Université Grenoble Alpes / MIAI) and Thierry Poibeau (ENS–PSL / PRAIRIE–PSAI).

Next year’s speakers will include Eloïse Boisseau (AMU, Marseille) and Dallas Card (U. Michigan), among others.

----

*** Compositionality, creativity in natural language and LLMs ***

Monday 15 June, 5pm (French time), online (free access, no registration)

Connection link: https://webinaire.numerique.gouv.fr/meeting/signin/invite/78275/creator/433…

Adele Goldberg, Princeton

Abstract: Today’s LLMs interpret and produce familiar and novel language without abstract symbolic rules. An appreciation of the complexity of natural languages indicates this is more a feature than a bug. New evidence demonstrates that LLMs are also at least as creative as the typical person. Parallels between LLMs and human language highlight the statistical and functional aspects of both systems. For cognitive scientists, LLMs promise a deeper understanding of compositionality and creativity.

Bio: Adele Goldberg is the M. Taylor Pyne Professor of Psychology at Princeton University. Her research explores the formal, semantic, social, statistical, and memory-based factors that shape how languages are learned, represented, and used. She is fascinated by what makes human language both creative and constrained, across adults and children, first and second language learners, and neurotypical and atypical populations. Her current work touches on word meaning, language change, island constraints, metaphor and emotion, good-enough language production, and the forms and functions of grammatical constructions. She is a Fellow of the Linguistic Society of America, the Association for Psychological Science, and the Cognitive Science Society, and an elected member of the American Academy of Arts and Sciences.

Lecturer jobs at Leeds University UK
by Eric Atwell 08 Jun '26

Leeds Uni is recruiting Lecturers in science and engineering, including AI and computer science - https://jobs.leeds.ac.uk/Vacancy.aspx?ref=EPSFD1003

Salary: £51,753 to £59,966 p.a.
Post Type: Full Time (permanent)
Closing Date: Tuesday 30 June 2026
