March 2025 - Corpora

2nd call for papers: GEM-Squared workshop @ ACL (data released)
by Simon Mille 18 Mar '25

Dear colleagues,

As announced a few weeks ago, the fourth iteration of the GEM workshop will be held as part of ACL <https://2025.aclweb.org/>, July 27–August 1st, 2025.

This year we’re planning a major upgrade to the GEM workshop, which we dub GEM2, through the introduction of a large dataset of 1B model predictions together with prompts and gold standard references, encouraging researchers from all backgrounds to submit work on meaningful, efficient and robust evaluation of LLMs.

In this second CfP, we are happy to announce that (i) the large datasets of model predictions have been released (DOVE <https://slab-nlp.github.io/DOVE/> and DataDecide <https://huggingface.co/datasets/allenai/DataDecide-eval-instances>), and (ii) GEM2 will host the ReproNLP <https://repronlp.github.io/> shared task results.

Important Dates
- April 11: Direct paper submission deadline (ARR).
- May 5: Pre-reviewed (ARR) commitment deadline.
- May 19: Notification of acceptance.
- June 6: Camera-ready paper deadline.
- July 7: Pre-recorded videos due.
- July 31 - August 1: Workshop at ACL in Vienna.

Please check the GEM website <https://gem-benchmark.com/workshop> for submission links, templates, and more details. For any questions, please email gem-benchmark-chairs(a)googlegroups.com.

best,
simon

*ADAPT Research Centre / Ionaid Taighde ADAPT*
*School of Computing, Dublin City University, Glasnevin Campus / Scoil na Ríomhaireachta, Campas Ghlas Naíon, Ollscoil Chathair Bhaile Átha Cliath*

PhD scholarship UPF (Barcelona) - Text Generation with applications to the Generation of Accessible Content
by Horacio Saggion 18 Mar '25

PhD scholarship (4 years) in Natural Language Processing: Text Generation with applications to the Generation of Accessible Content

Start date: 1/10/2025
Deadline for applications: 30/4/2025

The TALN (Natural Language Processing) Research Group in Barcelona (Spain) is looking for a PhD student in the area of text generation, with applications to text simplification and text style adaptation to accessible language, among others, adopting current and future developments in Deep Learning paradigms (e.g. Large Language Models) as backbones.

The student will join a dynamic, creative, and collaborative research team under the guidance of Prof. Horacio Saggion, head of the TALN research group. The student will be encouraged to collaborate with researchers involved in the Horizon Europe projects iDEM and IDEAL in the areas of Democracy and Inclusion.

The candidate will benefit from a scholarship (PIPF1) for four years, plus social benefits (national health insurance) and 4 weeks of annual leave. Current salaries for this position are as follows:

- First year compensation: €18180,49 gross salary
- Second year compensation: €18180,49 gross salary
- Third year compensation: €19479,14 gross salary
- Fourth year compensation: €24348,78 gross salary

The selected candidate for the scholarship must be admitted to the PhD program of the Department of Engineering at Universitat Pompeu Fabra, Barcelona. Under the scholarship, the selected candidate will teach 45 hours per year in subjects with which the candidate is comfortable, related to the Engineering degrees offered by the School of Engineering. The selected candidate will benefit from participation in dissemination events such as conferences, workshops, tutorials or summer schools, depending on the availability of funding and on the student's capability to produce publishable research output.

The candidate should have a Bachelor's degree in Computer Science (or related fields) and a Master's degree in Natural Language Processing (NLP) or Computational Linguistics. She/he should have demonstrated experience in research (experimentation, software coding, paper writing, presentation of research, etc.) with current methods in NLP such as Large Language Models and machine learning, in tasks such as summarization, simplification, text generation, etc. She/he should have excellent knowledge of English. Knowledge of Spanish, Catalan, Italian or other languages is a plus.

The PhD program has the following deadlines for application: https://www.upf.edu/web/doctorats/calendari-de-preinscripcio

Candidates are encouraged to contact Prof. Horacio Saggion (horacio.saggion(a)upf.edu) before 30/4/2025, well before applying for the PhD program.

The TALN group Web page: https://www.upf.edu/web/taln
The Department Web page: https://www.upf.edu/web/etic

The candidate is encouraged to visit the Web page of the PhD program to learn more about the requirements to apply for a PhD: https://www.upf.edu/en/web/doctorats/tecnologies-de-la-informacio-i-les-com…

--
Horacio Saggion
Full Professor / Chair in Computer Science and Artificial Intelligence
Head of the Natural Language Processing Group - TALN
Project Coordinator iDEM Project (HE)
Co-PI of the AI-BOOST project (HE)
Co-PI of the IDEAL project (HE)
Universitat Pompeu Fabra
https://twitter.com/h_saggion
https://www.linkedin.com/in/horacio-saggion-1749b916

Iberlef2025: PRESTA Task Call for Participation
by Eugenio Martínez Camara 18 Mar '25

[apologies if you receive multiple copies of this call]

CALL FOR PARTICIPATION - IberLEF 2025 - PRESTA: Questions and Answers about Tables in Spanish

We are pleased to announce the first IberLEF task on Question Answering on Tabular Data: PRESTA.

The PRESTA shared task consists of question answering over tabular data, making use of the DataBenchSPA benchmark. DataBenchSPA is a benchmark composed of real-world table datasets from different domains, with large numbers of rows and columns and a wide variety of data types, which makes it possible to assess distinct kinds of questions related to each data type.

We encourage participants to develop a system that answers questions of the kind present in DataBenchSPA over day-to-day datasets, where the answer is a number, a categorical value, a boolean value, or a list of several types. DataBenchSPA can be used as a training and validation set; we will release a separate test set compiled explicitly for the competition.

The systems developed by the participants will be given a series of (dataset, question) pairs and will need to provide an answer for each, which will then be compared with a gold standard.

The answer may be obtained through a variety of methods. In our paper [1] we illustrate two different approaches: In-Context Learning and Code Generation. You may use either of these or come up with your own approach.

There will be two subtasks:

Subtask I: DataBenchSPA QA
Participants will be provided with a dataset (of any size) and a question over it. The question should be answered using the data from the dataset only.

Subtask II: DataBenchSPA Lite QA
The task is essentially the same as the previous subtask, but involves using the sampled version of each dataset, with a maximum of 20 rows per dataset. The question should be answered using the data from the sampled dataset only. For the test set, we will similarly provide a reduced version of each dataset for this subtask. This subtask is especially relevant when testing models with a smaller context window.

Important Dates
- Release of training data: 18 March 2025
- Release of test data - competition starts: 30 April 2025
- Submission of the results - competition ends: 12 May 2025
- Submission of the description paper: 30 May 2025

Task Organizers
- Jorge Osés Grijalba - Graphext
- L. Alfonso Ureña-López - University of Jaén
- Eugenio Martínez Cámara - University of Jaén
- Jose Camacho-Collados - Cardiff University

Codabench: https://www.codabench.org/competitions/5538/
Google Group: to be created

--
Eugenio Martínez Cámara
Vice President of the SEPLN <http://www.sepln.org/en>
Associate Professor, Computer Science Department, Universidad de Jaén <https://www.ujaen.es/>
SINAI <http://sinai.ujaen.es/> Research Group
emcamara(a)ujaen.es
ORCID: 0000-0002-5279-8355 <http://orcid.org/0000-0002-5279-8355>
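
The PRESTA call above says that each system answer (a number, a categorical value, a boolean, or a list) is compared with a gold standard. The following is only a minimal sketch of what such a comparison could look like. The function names (`normalize`, `is_correct`, `accuracy`) and the normalization rules (rounding to two decimals, case-insensitive strings, order-insensitive lists) are assumptions made for illustration, not the organizers' official scorer.

```python
from typing import Any, Sequence


def normalize(answer: Any) -> tuple:
    """Map an answer to a tagged, comparable form (assumed rules, not official)."""
    # bool is a subclass of int in Python, so it must be checked first.
    if isinstance(answer, bool):
        return ("bool", answer)
    if isinstance(answer, (int, float)):
        # Assumption: numbers match if equal after rounding to 2 decimals.
        return ("num", round(float(answer), 2))
    if isinstance(answer, (list, tuple)):
        # Assumption: list answers are compared regardless of element order.
        return ("list", sorted((normalize(a) for a in answer), key=repr))
    # Categorical values: compare as trimmed, lower-cased strings.
    return ("str", str(answer).strip().lower())


def is_correct(predicted: Any, gold: Any) -> bool:
    """True when the predicted answer matches the gold answer after normalization."""
    return normalize(predicted) == normalize(gold)


def accuracy(predictions: Sequence[Any], golds: Sequence[Any]) -> float:
    """Fraction of (prediction, gold) pairs that match."""
    if len(predictions) != len(golds) or not golds:
        raise ValueError("predictions and golds must be non-empty and equally long")
    return sum(is_correct(p, g) for p, g in zip(predictions, golds)) / len(golds)


if __name__ == "__main__":
    preds = [3, " Madrid", ["b", "a"], True]
    golds = [3.0, "madrid", ["a", "b"], False]
    print(accuracy(preds, golds))
```

Tagging each normalized value with its type keeps a boolean `True` from silently matching the number `1`, which plain Python equality would otherwise allow.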

Call for participation: MT Marathon 2025 in Helsinki
by Tiedemann, Jörg 18 Mar '25

This is the first call for participation for the 18th MT Marathon, which will take place in Helsinki on August 25-29, 2025.

The eighteenth edition of the MT Marathon will be organized by the Language Technology Research group at the University of Helsinki, Finland, with sponsorship of EAMT.

Each Machine Translation Marathon is a week-long gathering of machine translation researchers, developers, students and users featuring:
- MT Lectures and Labs covering the basics and tutorials.
- Keynote Talks from experienced researchers and practitioners.
- Presentations of research and open source tools related to MT.
- Hacking Projects to advance tools or research in one week or start new collaborations.

Details can be found on the event page: https://blogs.helsinki.fi/language-technology/mt-marathon-2025/

** Registration **

Registration is free of charge for EAMT members. To register, use the following link: https://forms.gle/uvrZuWpeSbcmJozK7. The registration form will remain open until the start of the event; however, if you plan to attend, please register as soon as possible to help us with the planning.

** Programme **

The event will include a poster session, labs, and lessons from experts in the field, including:
- Ayodele Awokoya, McPherson University, University of Ibadan, Masakhane
- Wilker Aziz, University of Amsterdam
- Marta Costa-Jussa, Meta AI
- Barry Haddow, University of Edinburgh
- Amit Moryossef, University of Zürich
- Sara Papi, FBK Trento
- Jörg Tiedemann, University of Helsinki
- Marco Turchi, Zoom

The programme is still under construction. For up-to-date information about invited speakers and the topics that will be covered by talks and labs, have a look at the event page: https://blogs.helsinki.fi/language-technology/mt-marathon-2025/

In the poster session, participants will be invited to present their own work in machine translation.

** Call for project proposals **

As always, project topics will be finalized on the first day of the Marathon, but in the past it has proved useful to announce and refine project proposals earlier. If you have an idea for something you'd like to implement in a small team of fellow participants, or if you just want to peek at what is going to be proposed, have a look at or edit the live document linked here: https://docs.google.com/document/d/1A4Iy_iOVvYHKAwnSV2ZGIPru7t-jeMauCQd6i9G…

Second call for papers: BriGap-2, Bridges and Gaps between Formal and Computational Linguistics (an IWCS 2025 workshop)
by Timothée Bernard 18 Mar '25

Second call for papers: *BriGap-2, Bridges and Gaps between Formal and Computational Linguistics* (an IWCS 2025 workshop)

(with our apologies for cross-posting)

Venue: IWCS 2025 (https://iwcs2025.github.io/), Düsseldorf, Germany
Date: *September 24th, 2025* (main conference: 22nd-23rd)
Workshop website: https://brigap-workshop.github.io/

BriGap-2 is a venue for linguists and NLP scientists to meet: what fruitful interactions can we have? How do we build upon each other’s work?

* Description *

In recent years, the natural language processing (NLP) community has shifted its focus towards engineering questions. This state of affairs is in no small part due to the recent technical advances that have transformed NLP as a field. In the current large language model (LLM) era, much of what was deemed near impossible to achieve a few years prior is now taken for granted, and it stands to reason that mapping how far new computational models have advanced the field has become a central topic for the NLP community. Hence, the ongoing discourse in NLP focuses more on what can be achieved through language than on studying language for its own sake.

It thus seems that computational and formal linguistics are now separate domains, and that the former is no longer rooted in the latter. To what extent are these traditions truly divorced, and what fruitful bridges can be (re)built? To answer these questions, the second iteration of the workshop on Bridges and Gaps between Formal and Computational Linguistics (BriGap-2) intends to provide a space for formal linguists, computational linguists, and NLP scientists to exchange their perspectives on how their different domains of research can build upon one another.

* Workshop topics *

- investigation of the linguistic properties of machine learning models,
- linguistic representations, vector space semantics, and their relations with theoretical concepts such as compositionality,
- use of information-theoretical and computational methods for linguistic inquiry,
- formal distributional semantics and neural-symbolic integration for NLP,
- formal grammars, symbolic structures and their applications for computational linguistics and NLP,
- trends in the history of computational linguistics and NLP,
- …

* Invited speakers *

- Anna ROGERS, IT University of Copenhagen
- Kees VAN DEEMTER, Universiteit Utrecht

* Submission details *

The workshop accepts both archival (original and unpublished research) and non-archival (work-in-progress, dissemination of research published or accepted elsewhere, etc.) submissions in either short (up to 4 pages) or long (up to 8 pages) format. Camera-ready versions of papers will be given one additional page of content so that reviewers’ comments can be taken into account. Each submission should mention whether it targets archival or non-archival status. Archival papers accepted at BriGap-2 will be indexed in the ACL Anthology.

Please use the ACL style templates available here: https://github.com/acl-org/acl-style-files

Submissions must be made in PDF format via OpenReview, using the following link: https://openreview.net/group?id=IWCS/2025/Workshop/BriGap-2

* Important dates *

- Submission deadline: *Friday, June 6th 2025*
- Notification of acceptance: Friday, August 1st 2025
- Workshop: *September 24th, 2025* (main conference: 22nd-23rd)

* Contact *

For questions, please send an email to brigapworkshop(a)gmail.com or contact one of the workshop chairs:
- Timothée Bernard, Université Paris Cité, timothee.bernard(a)u-paris.fr
- Timothee Mickus, University of Helsinki, timothee.mickus(a)helsinki.fi
- Grégoire Winterstein, Université du Québec à Montréal, winterstein.gregoire(a)uqam.ca

HealTAC 2025 - keynotes announced and call for contributions (reminder)
by Bea Alex 18 Mar '25

----------------------------
HealTAC 2025
June 16-18th, 2025, Glasgow (UK)
https://healtac2025.github.io/
----------------------------

1) Call for contributions – deadline 28 March
2) Keynotes, panels and workshop
3) Registration fees
4) Key dates

----------------------------
Call for contributions - reminder
----------------------------

The 8th Healthcare Text Analytics Conference (HealTAC 2025) invites contributions that address any aspect of healthcare text analytics.

We invite submissions in the form of extended abstracts that describe either methodological or application work that has not been previously presented in a conference. Submissions (up to 2 pages) should be prepared based on a template that is available at the conference web site.

We also invite PhD and fellowship project submissions that describe ongoing PhD research (any stage) or a planned fellowship application. The conference will provide an opportunity to receive constructive feedback from a panel of experts.

Deadline for all submissions is March 28th, 2025.

As in previous years, there will be a post-conference call to submit a journal length paper for further peer review and publication in Frontiers in Digital Health.

----------------------------
Programme
----------------------------

We are delighted to announce keynotes by Dr Jason Fries from Stanford University and Dr Alison O'Neil from Canon Medical Research, and panels on "Opportunities and challenges in LLMs for health research: social inequalities, bias detection, and mitigation" and "Challenges in AI deployment within NHS" (industry forum).

A pre-conference workshop on June 16th will focus on "NLP in mental healthcare and research" (https://healtac2025.github.io/workshop/).

----------------------------
Registration fees
----------------------------

Due to generous support from Health Data Research UK, CogStack, Frontiers, University of Glasgow, Research Data Scotland and Healtex, we will keep the registration fee low as before: an early registration fee for students is expected to be £100 and for others £200, and will include the full 3-day programme, lunches and the conference dinner.

----------------------------
Key dates
----------------------------

Deadline for all contributions: March 28th 2025
Notification of acceptance: April 18th 2025
Early-bird registration: by May 16th 2025
Pre-conference workshop: June 16th 2025
Conference: June 17-18th 2025

Follow the conference announcements on social media at #HEALTAC2025

We are looking forward to welcoming you to HealTAC 2025.

CFP DepLing 2025, Ljubljana, August 26-29, deadline April 15
by Sylvain Kahane 18 Mar '25

DepLing 2025, Ljubljana, August 26-29
Deadline: April 15

We are pleased to announce the 8th International Workshop on Dependency Grammar (DepLing 2025), which will bring together researchers interested in dependency-based approaches in linguistics and natural language processing.

Dependencies, directed labeled graph structures representing hierarchical relations between morphemes, words or semantic units, have now become the standard representation of syntactic resources and NLP technologies. DepLing has become the central event for people discussing the linguistic significance of these structures, their theoretical and formal foundations, their processing, and their use in NLP tools.

The workshop is part of SyntaxFest 2025 and will be hosted by the University of Ljubljana in Slovenia on August 26-29, 2025.

Link to DepLing 2025: https://depling.org/depling2025/
Link to SyntaxFest 2025: https://syntaxfest.github.io/

-----------------------------
SELECTED TOPICS OF INTEREST
-----------------------------

Topics include but are not limited to:

The use of dependency structures in theoretical linguistics; among others:
* The use of syntactic trees to model syntactic relations;
* The use of semantic, valency-based or predicate-argument graph structures;
* The use of dependency-like structures to model semantic and pragmatic phenomena related to information structure;
* The use of dependency-like structures beyond the sentence (e.g., to model discourse phenomena);
* The elaboration of formal lexicons for dependency-based syntax and semantics, including descriptions of collocations and paradigmatic relations;
* The use of dependency in the field of linguistic universals and typology.

Historical and epistemological foundations of dependency grammar; among others:
* The definition of the very notion of dependency;
* The development and the use of dependency-based diagrams;
* Dependency grammar and its relation to other formalisms;
* The use of dependency-like concepts in the history of grammar and linguistics.

The use of dependency structures in corpus linguistics; among others:
* Corpus annotation and development of dependency-based treebanks and other linguistic resources of written and spoken texts;
* Recent advances in dependency-based parsing and text generation;
* Cross-lingual dependency parser evaluation, with particular emphasis on intrinsic evaluation metrics.

The relation between dependency-based grammar and other fields of science, e.g., the psycholinguistic relevance of dependency grammar.

-----------------------------
INVITED SPEAKER
-----------------------------

Daniel Zeman, Inst. of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Prague

-----------------------------
IMPORTANT DATES
-----------------------------

* Paper submission deadline: 15 April 2025
* Notification of acceptance: 2 June 2025
* Camera-ready papers: 16 June 2025
* Early bird registration: June 2025
* Conference dates: 26 to 29 August 2025

-----------------------------
DepLing 2025 WORKSHOP CHAIRS
-----------------------------

* Sylvain Kahane, Paris Nanterre University
* Eva Hajičová, Charles University, Prague

QUECHUA to SPANISH SPEECH TRANSLATION at ACL 2025
by John Ortega 17 Mar '25

We need your help to preserve indigenous languages!

Due to the overwhelming success of previous workshops like LoResMT, AmericasNLP, and IWSLT, we have decided to keep moving the needle on Quechua to Spanish translation for another year. We ask that you kindly participate in the 2025 edition of the QUE-SPA speech translation shared task being held at ACL 2025. This low-resource task will help advance language preservation for low-resource languages. We invite advanced research and approaches of all types, so bring your rule-based, statistical, neural, and other systems!

IMPORTANT LINKS
- Dialectal and Low-resource webpage: https://iwslt.org/2025/low-resource
- Data webpage: https://github.com/Llamacha/IWSLT2025_Quechua_data
- Google Group: https://groups.google.com/g/iwslt-evaluation-campaign
- IWSLT conference webpage: https://iwslt.org/2025

HOW TO PARTICIPATE
Please join the IWSLT Evaluation Campaign Google Group and access the registration using the following link: https://groups.google.com/g/iwslt-evaluation-campaign

The QUE-SPA data set can be downloaded here: https://github.com/Llamacha/IWSLT2025_Quechua_data

Task submissions can be uploaded to GitHub or emailed directly; please email the organizers below for more details.

IMPORTANT DATES
- Apr 21, 2025: System description paper submission deadline
- May 15, 2025: Notification of acceptance
- June 1, 2025: Camera ready deadline
- July 31-Aug 1, 2025: IWSLT conference

ORGANIZING COMMITTEE
- John E. Ortega (Northeastern University) j.ortega(a)northeastern.edu
- William Chen (Carnegie Mellon University) wc4(a)andrew.cmu.edu
- Rodolfo Zevallos (Universitat Pompeu Fabra) rodolfojoel.zevallos(a)upf.edu

1st Call for Papers: Second International Workshop on Construction Grammars and NLP (CxGs+NLP 2025)
by claire.n.bonial.civ＠army.mil 17 Mar '25

Second International Workshop on Construction Grammars and NLP (CxGs+NLP 2025)
Call for Papers

Please join the workshop’s Google Group for the latest updates and to post any questions you might have: https://groups.google.com/g/cxgsnlp-workshop

Overview

Constructionist approaches to language posit that all linguistic knowledge needed for language comprehension and production can be captured as a network of form-meaning mappings, called constructions. Construction Grammars (CxGs) do not distinguish between words and grammar rules, but allow for mappings between forms and meanings of arbitrary complexity and degree of abstraction. CxGs are thereby able to uniformly capture the compositional and non-compositional aspects of language use, making the theory particularly attractive to researchers in the field of Natural Language Processing (NLP). CxG theories, for example, can serve as a valuable ‘lens’ to assess and investigate the abilities of today’s large language models, which lack explicit, theoretically grounded linguistic insights. At the same time, techniques from the field of NLP are often employed for the further development and scaling of CxG theories and applications.

This workshop aims to bring together researchers across theory and practice from the two complementary perspectives of Construction Grammar and NLP to explore how CxG approaches can both inform and benefit from NLP methods, with an emphasis on LLMs. Therefore, we invite original research papers from a broad spectrum of topics, including but not limited to:

- Contributions to Construction Grammar theory
- Construction Grammar Formalisms
- Computational Construction Grammar Implementations
- Natural Language Understanding (NLU)
- Opinion pieces on the interplay between Construction Grammar and NLP
- Constructions and Language Models (Mechanistic interpretability, probing (e.g., BERTology), and evaluation of LLMs)
- Resources: Constructicons and corpora annotated for Construction Grammar
- Construction Grammar learning and adaptation
- Applications at the intersection of Construction Grammar and NLP

Invited Speakers

- Adele Goldberg, Professor of Psychology, Princeton University
- Thomas Hoffmann, Professor of English Language and Linguistics, Catholic University of Eichstätt-Ingolstadt
- Laura Michaelis, Professor of Linguistics, University of Colorado Boulder

Venue

The 2nd CxGs+NLP workshop will be co-located with the 16th International Conference on Computational Semantics (IWCS), organized by the Heinrich Heine University (HHU) in Düsseldorf, Germany. The workshop will be held on 24 September 2025. We are expecting the workshop to be in-person only, but are awaiting details on the possibility of a hybrid presentation option.

Important Dates

- Jun 06: submission deadline
- Aug 01: notification of acceptance, registration opens
- Aug 22: camera-ready papers due
- Sep 22-23: IWCS main conference
- Sep 24: workshop

Submission information

Two types of submission are solicited: long papers and short papers. Long papers should describe original research and must not exceed 8 pages. Short papers (typically system or project descriptions, or ongoing research) must not exceed 4 pages. Acknowledgments, references, a limitations section (optional), an ethics statement (optional), and a technical appendix (optional, not subject to reviewing) do not count towards the page limit. Accepted papers get an extra page in the camera-ready version and will be published in the conference proceedings in the ACL Anthology. Additionally, non-archival publications will be considered for acceptance into the workshop as in-person poster presentations only.

CxGs+NLP 2 papers should be formatted following the common two-column structure as used by IWCS 2021 (borrowed from ACL 2021). Please use these specific style files or the Overleaf template.

Style files: https://iwcs2021.github.io/download/iwcs2021-templates.zip
Overleaf template: https://www.overleaf.com/latex/templates/instructions-for-iwcs-2021-proceed…

Double submission policy: We will accept submissions that have been submitted elsewhere, but require that the authors notify us, including information on where else they are submitting, and let us know if the work is accepted for publication elsewhere.

Submission site TBA.

Instructions for Double-Blind Review

As reviewing will be double blind, papers must not include authors’ names and affiliations. Furthermore, self-references or links (such as github) that reveal the author’s identity, e.g., “We previously showed (Smith, 1991) …” must be avoided. Instead, use citations such as “Smith previously showed (Smith, 1991) …” Papers that do not conform to these requirements will be rejected without review.

Papers should not refer, for further detail, to documents that are not available to the reviewers. For example, do not omit or redact important citation information to preserve anonymity. Instead, use third person or named reference to this work, as described above (“Smith showed” rather than “we showed”). If important citations are not available to reviewers (e.g., awaiting publication), these papers should be anonymised and included in the appendix. They can then be referenced from the submission without compromising anonymity. Papers may be accompanied by a resource (software and/or data) described in the paper, but these resources should also be anonymized.

Workshop Chairs

- Claire Bonial (U.S. Army Research Lab)
- Harish Tayyar Madabushi (The University of Bath)

Workshop Organizing Committee

- Melissa Torgbi (The University of Bath)
- Leonie Weissweiler (University of Texas at Austin)
- Austin Blodgett (U.S. Army Research Lab)
- Katrien Beuls (University of Namur, Belgium)
- Paul Van Eecke (Vrije Universiteit Brussel, Belgium)

Contact: Please join the workshop’s Google Group for the latest updates and to post any questions you might have: https://groups.google.com/g/cxgsnlp-workshop
