March 2024 - Corpora

3-year postdoctoral position in NLP (Evaluation of LLMs) - University of Oslo, Norway
by Yves Scherrer 14 Mar '24

We offer a 3-year postdoctoral position in NLP at the University of Oslo, Norway, on the topic "Evaluating large language models - model architectures, training regimes and data selection". The application deadline is April 14, 2024. This position is funded by the DSTrain program (https://www.uio.no/dscience/english/dstrain/).

In recent years, (generative) large language models have become the core foundation models for a wide range of traditional NLP tasks, and they have also seen widespread adoption by the general public. At the same time, little is known about the specific training setups of commercial models, and some design decisions (in terms of model architecture, training regimes, and data selection) are based on tradition rather than empirical or theoretical considerations. Moreover, most current LLMs rely heavily on English training and evaluation data, and their performance on non-English languages remains difficult to assess.

Potential candidates are expected to formulate their research project within the broad area of LLM evaluation. Examples of research topics are given below:

- Compare fine-tuning external pre-trained LLMs with training language-specific LLMs from scratch.
- Compare encoder-decoder LLMs with decoder-only LLMs.
- Evaluate generative LLMs on various text generation tasks, such as summarization, simplification, text normalization.
- Assess the multilingual (e.g. machine translation) and cross-lingual capabilities (cross-lingual transfer) of LLMs.
- Investigate how closely related low-resource languages are best accommodated in LLMs.
- Implement benchmarking datasets for LLM evaluation.

Applicants are expected to submit a research project that fits the proposed research theme (Evaluating large language models). Prospective applicants are encouraged to discuss their application with the contact person (me) to explore scientific focus and cooperation possibilities.

The application process for the DSTrain call is described here: https://www.uio.no/dscience/english/dstrain/guide-for-applicants/applicatio…

This is the relevant research theme description: https://www.uio.no/dscience/english/dstrain/research-areas/informatics/eval…

Please apply here: https://www.jobbnorge.no/en/available-jobs/job/255679/dstrain-msca-postdoct…

Contact:
Yves Scherrer, LTG, University of Oslo
yves.scherrer(a)ifi.uio.no

[CFP] 2nd Call for Participation - CLEF 2024 SimpleText Task4: SOTA? "Tracking the State-of-the-Art in Scholarly Publications"
by Jennifer D'Souza 14 Mar '24

[apologies if you receive multiple copies of this call]

Dear colleagues and friends,

*We are pleased to release the 2nd Call for Participation - CLEF 2024 SimpleText Task4: SOTA?*

*Overview:* SOTA? is introduced as Task 4 in the SimpleText track of CLEF 2024. The goal of the SOTA? shared task is to develop systems that, given the full text of an AI paper, can recognize whether the paper reports model scores on benchmark datasets and, if so, extract all pertinent (Task, Dataset, Metric, Score) quadruples presented within the paper. More info on the task website: https://sites.google.com/view/simpletext-sota/home

SOTA? will be divided into two evaluation phases:
- Evaluation Phase 1: Few-shot Testing
- Evaluation Phase 2: Zero-shot Testing

*To participate in SOTA?, i.e. SimpleText Task 4 @ CLEF 2024, please register your team*:
1. CLEF 2024 official registration page: https://clef2024.imag.fr/index.php?page=Pages/registration.html
2. Codalab competition site: https://codalab.lisn.upsaclay.fr/competitions/16616

Note that SOTA? is organized as a new task this year under the "SimpleText - Improving Access to Scientific Texts for Everyone" initiative https://simpletext-project.com/. Please take a look at the other 3 tasks offered by SimpleText, i.e. Tasks 1, 2, and 3, and select one or more of them too if you are interested. Note that there is no interdependence between the datasets of "Task 4 - SOTA?" and those of the other three SimpleText tasks.

*Dates*
- Training and validation datasets available: Feb 1, 2024 March 13, 2024
- Test data available/Evaluation starts: April 23, 2024
- Evaluation ends: May 3, 2024
- Participant paper submissions due: May 31, 2024
- Notification to authors: June 24, 2024
- Camera ready due: July 8, 2024
- CLEF 2024 Workshop, Grenoble, France: 9-12 September 2024

*Task Organizers*
- Jennifer D’Souza (TIB Leibniz Information Centre for Science and Technology - Germany)
- Salomon Kabongo (L3S Research Center, Germany)
- Hamed Babaei Giglou (TIB Leibniz Information Centre for Science and Technology - Germany)
- Yue Zhang (Berlin Technical University, Germany)
- Sören Auer (TIB Leibniz Information Centre for Science and Technology - Germany)

*We look forward to having you on board!*

*Contact:* sota.task [at] gmail.com

[CfP] EXTENDED deadline! 8th International Conference 'Discourse Markers in Romance Languages'
by Amália Mendes 14 Mar '24

Extended deadline for abstract submission: 24 March 2024

The 8th International Conference 'Discourse Markers in Romance Languages'
https://sites.google.com/view/disrom2024
Lisbon, Portugal, 19-21 June 2024

*Important Dates*
- 24 March 2024: New deadline for abstract submission!
- 30 April 2024: Notification of acceptance
- 19-21 June 2024: Conference dates

*Meeting Description*
The Conference is one of a series of conferences on discourse markers in Romance languages (Madrid, 2010; Buenos Aires, 2011; Campinas, 2012; Heidelberg, 2015; Louvain-la-Neuve, 2017; Bergamo, 2019; Craiova, 2022) and aims to build on the previous events, serving as a platform for internationally renowned linguists and young researchers alike to exchange views and ideas and to broaden their research perspectives.

This Conference’s theme will deal specifically with:
1. interactions between DMs and their explicit/implicit context, overcoming the traditional divide between their textual and interpersonal functions;
2. the subjective adjustment function of DMs.

Researchers on discourse markers in Romance languages are invited to submit contributions on these topics, as well as on related subjects including (but not restricted to):
- definition of the discourse marker category;
- lexicons of discourse markers;
- discourse markers and their relation to other pragmatic categories;
- syntax-prosody-discourse interface;
- sociolinguistic approaches to discourse markers;
- variation of discourse markers across registers, languages and language varieties;
- translation studies;
- L1 and L2 acquisition of discourse markers;
- diachronic studies;
- experimental studies;
- corpus-based and computational studies;
- applied studies (business language, legal discourse, educational settings, etc.).

*Submissions*
The Conference will be on-site. Two presentation modalities will be possible: oral presentation and poster presentation.
Abstracts should not exceed one page (single spacing, 12-point Times New Roman font), not including figures and references, and must be uploaded as PDF. Abstracts can be written in any Romance language or in English. They should be anonymous. They should be submitted via EasyChair (https://easychair.org/conferences/?conf=disrom2024). Authors must select the option oral presentation or poster presentation during the submission process on EasyChair.

*Keynote Speakers (provisional list)*
- Denis Paillard (CNRS and Université Paris Diderot)
- Isabel Margarida Duarte (Universidade de Porto)

*Workshop organizers (University of Lisbon)*
- Pierre Lejeune
- Marco Favaro
- Fabrizio Macagno
- Amália Mendes

*Scientific Committee*
- Joanna Blochowiak (Université de Genève)
- Margarita Borreguero Zuloaga (Universidad Complutense de Madrid)
- Chloé Braud (University of Copenhague)
- Sorina Ciobanu (University of Iasi)
- Maria Antónia Diniz Caetano Coutinho (Universidade Nova de Lisboa)
- Maria Josep Cuenca (Universitat de València)
- Antonio Briz Gómez (Universitat de València)
- Conceição Carapinha (Universidade de Coimbra)
- Anna-Maria De Cesare (Universität Dresden)
- Iria da Cunha (Universidad Nacional de Educación a Distancia)
- Gaétane Dostie (Université de Sherbrooke)
- Oana Adriana Duta (University of Craiova)
- Chiara Fedriani (Università di Genova)
- Mar Garachana Camarero (Universitat de Barcelona)
- Chiara Ghezzi (Università di Bergamo)
- Sonia Gómez-Jordana (Universidad Complutense de Madrid)
- Pedro Gras (Université d’Anvers)
- Martin Hummel (Universität Graz)
- Julia Lavid Lopez (Universidad Complutense de Madrid)
- Diana Lewis (Université Aix-Marseille)
- Araceli López Serena (Universidad de Sevilla)
- José Pinto de Lima (Centro de Linguística da Universidade Nova de Lisboa)
- Maria Aldina Marques (Universidade do Minho)
- Piera Molinelli (Università di Bergamo)
- Silvia Murillo Ornat (Universidad de Zaragoza)
- Cornelia Plag (Universidade de Coimbra)
- Salvador Pons Bordería (Universitat de València)
- Cecilia Popescu (University of Craiova)
- Laurent Prévot (Université Aix-Marseille)
- Augusto Soares da Silva (Universidade Catolica Portuguesa)
- Laure Vieu (IRIT – Université de Toulouse III – Paul Sabatier)
- Jacqueline Visconti (Università di Genova)
- Sandrine Zufferey (Universität zu Bern)

[Call for Papers] 1st Workshop on Reliable Evaluation of LLMs for Factual Information (REAL-Info)
by Bjorn Ross 13 Mar '24

Call for Papers 1st Workshop on Reliable Evaluation of LLMs for Factual Information (REAL-Info) Co-located with ICWSM 2024, June 3, 2024, Buffalo, NY https://sites.google.com/view/real-info-2024 LLMs have achieved state-of-the-art performance in several textual inference tasks and are gaining popularity. There is a significant focus on their integration with web and online applications, including web search, thus allowing them to reach millions of users. LLMs can influence various information tasks in our everyday lives, ranging from personal content creation to education, financial advice, and mental health support (Augenstein, 2023). However, with their vast linguistic capabilities and opaque nature, LLMs can inadvertently generate or amplify false information. There is growing concern about the factuality of LLM-generated content and its potential adverse impact on our information ecosystem (Chen, 2023; Peskoff, 2023). Thus the need for reliable methods to assess the factuality of information is more critical than ever. This is where the synergy of AI, Natural Language Processing (NLP), and Human-Computer Interaction (HCI) becomes essential. AI and NLP techniques can be employed to analyze and identify the factuality of information through various tasks (Augenstein, 2023), such as fact-checking, stance detection, claim verification, and misinformation detection. These techniques can sift through the vast amounts of data to spot inconsistencies, biases, or inaccuracies that could indicate misinformation. Still, these approaches often use language models themselves, and epistemological questions arise when one LLM is fact-checked using another (or itself). Meanwhile, HCI plays a vital role in designing interactions and tools that enable humans to effectively oversee, interpret, and correct the outputs of LLMs. 
This human-in-the-loop approach ensures a critical evaluation and context-sensitive understanding of the factuality of information, which pure algorithmic methods might overlook. The combination of NLP's analytical capabilities and HCI's focus on human-centric design is instrumental in creating a digital ecosystem where LLMs can be utilized safely and responsibly, minimizing the risks of false information while maximizing their potential for user-centric applications.

The goals of the 1st ICWSM workshop Reliable Evaluation of LLMs for Factual Information (REAL-Info) are to facilitate discussion around such new LLM evaluation approaches, metrics, and benchmarks for factuality assessment tasks within the community, and to inform the understanding of the scope, biases, and blind spots of LLMs. It will spark interdisciplinary conversations among academic and industry researchers in computational social sciences (CSS), natural language processing (NLP), human-computer interaction (HCI), data science, and social computing.

The workshop will solicit research and position papers with novel ideas, including but not limited to:
- New evaluation methods and metrics for evaluating LLMs’ factuality considering diverse social context, e.g., source and domain of data, language, temporal generalization of information, or hallucination in generated/summarized content.
- Human-centered design approaches to aid LLMs in detecting and mitigating false information, e.g., human experts in the loop, and variation in prompting.
- New LLM-powered tools, methods, and applications for improving factuality assessment in social computing and computational social science.
- Biases and blind spots of LLMs in factuality assessment, including approaches for error analysis and model diagnostics.
- Limitations of existing benchmarks for tasks relevant to factuality assessment, e.g., claim verification, fact-checking, stance detection, and misinformation detection.
- Improving dataset and evaluation quality, e.g., avoidance of selection bias, addressing subjective judgments and biases in crowd-sourced annotation.
- Comparative evaluation and implications of open source and commercial LLMs for tasks relevant to factuality assessment.
- How the reliability and factuality of LLMs impact users (e.g., journalists, software engineers, artists) and communities.

Submission instructions can be found on the workshop website. The workshop will take place as a half-day meeting in June. Authors of accepted papers will have the opportunity to publish their papers through workshop proceedings by the AAAI Press.

Timeline
- Workshop Papers Submission deadline: March 24, 2024
- Notifications: April 14, 2024
- Final Camera-Ready Paper Due: May 5, 2024
- ICWSM-2024 Workshops Day: June 3, 2024

Invitation to the debate "Artificial Intelligence and the future of the Portuguese Language" on March 15 at PROPOR 2024
by Antonio Branco 13 Mar '24

_*INVITATION*_

We kindly invite you to the debate on *Artificial Intelligence and the Future of the Portuguese Language*, which will take place on March 15, 2024, from 10h30 to 12h00 (Lisbon time) as part of PROPOR 2024 - 16th International Conference on the Computational Processing of the Portuguese Language.

As a plenary session of this conference, it will draw on contributions from the expert researchers in this field who are gathering here. It will also include contributions from guests who are experts in the field of public policies for language promotion and who will help launch the debate:

*Ana Paula Laborinho*
Former President of the Camões Institute for Cooperation and Language, current Director in Portugal of the OEI Organization of Ibero-American States, and professor at the University of Lisbon, Faculty of Letters

*Claudio Pinhanez*
Deputy Director of the C4AI Artificial Intelligence Center in São Paulo, and principal investigator at IBM Research, Brazil

*Ismael Gómez García*
Director of the OEI's Global Digital Strategy

*Valentín García*
Secretary General for Language Policy, Galicia Regional Government

*António Branco* *(moderator)*
Honorary President of the ELRA Language Resources Association, Director General of PORTULAN CLARIN Research Infrastructure for the Science and Technology of Language, and Professor at the University of Lisbon, Faculty of Sciences

Information about this debate can be found at https://propor2024.citius.gal/index.php/discussion-panel/ where details on how to participate will be made available in due course.

++++++++++++++++++++++++++++++++++++++++

_*BACKGROUND*_

For about a year now, it's been a rare day when we don't come across news, comments, opinions, interviews, debates, podcasts, prognoses, plans, panics, condemnations, glorifications, warnings, regulations, fears and hopes about Artificial Intelligence.

We have the privilege, rare in human history, of facing the unprecedented promises and challenges of a civilizational transformation induced by a technological shock of a scope never before experienced.

This scientific and social tsunami has its origins in what for decades has been considered the subarea of AI with the most difficult and challenging interdisciplinarity. Also known as natural language processing, computational linguistics, computational language processing, etc., language technology deals with the most distinctively human cognitive capacity.

No area of human activity will be immune to this technological shock. Even less so will the very object of its scientific inquiry, natural languages.

It is opportune for the scientists themselves to hold a debate on AI and the Portuguese language, shifting the perspective from passive analysis to building an active contribution:

- What is the impact on the future of the Portuguese language and on citizenship and sovereignty in the age of artificial intelligence?
- What is the impact on public policies promoting language and how should they be rethought and reconfigured?
- What is the impact on public policies promoting science and technology and how should their priorities be rethought and reconfigured?
- What is the role of international cooperation, given that Portuguese is a multicentric language with global projection?
- What should we learn from the responses that are being advanced in other geographies and for other languages?
- etc.

The scientific community dedicated to research into Portuguese language technology has been meeting every other year for 30 years, alternately in Portugal and Brazil, at the PROPOR international conference, which will be held again soon, between March 13 and 15, 2024, for the first time in another geography: https://propor2024.citius.gal

With the help of guest speakers who are experts in the fields of language promotion and international cooperation, scientific researchers in this field will try to open up this reflection and contribute to finding answers to these questions in a debate that will take place on March 15, 2024 between 10h30 and 12h00 (Lisbon time).

Information on this debate can be found at https://propor2024.citius.gal/index.php/discussion-panel/ where details on how to participate remotely will be made available in due course, as technical conditions allow.

The debate will take place in Portuguese.

job opening: professorship (W2) in ML (possibly NLP) at HHU Düsseldorf, Germany
by Laura Kallmeyer 13 Mar '24

The Faculty of Mathematics and Natural Sciences at Heinrich Heine University Düsseldorf is inviting applications for a full professorship (W2) for Machine Learning at the Department of Computer Science, to be filled as soon as possible.

Ideally, candidates should have outstanding expertise in the field of Machine Learning, particularly in modern machine learning techniques (e.g., large language models and deep learning architectures such as transformers and related sequence models), and be willing to contribute to collaborative projects, especially in the field of Natural Language Processing (NLP).

Application deadline: 17 April 2024.

For more information see https://berufungsportal.hhu.de/VAADIN/dynamic/resource/2/96c63060-a332-4561…

--
Prof. Dr. Laura Kallmeyer
Institut für Linguistik
Heinrich-Heine Universität Duesseldorf
Universitaetsstr. 1
D-40225 Duesseldorf, Germany
https://user.phil.hhu.de/kallmeyer/
Phone +49 (0)211 8113899

Call for PC Members for ECAI-2024
by Ulle Endriss 13 Mar '24

We are reaching out to the research community to ask for volunteers to join the programme committee of ECAI-2024, the 27th European Conference on Artificial Intelligence. We are looking for volunteers who have completed their PhD, who have published at good AI conferences in the past, and who have prior experience with reviewing. If you're interested, please sign up here: https://forms.gle/24AxSkGdKv57cYqSA Thanks a lot for your support! Ulle Endriss and Francisco Melo ECAI-2024 PC Chairs

CfP: Humor and Artificial Intelligence at ISHS/HRC 2024
by Tristan Miller 13 Mar '24

Humor and Artificial Intelligence Track
=======================================

34th International Society for Humor Studies Conference (ISHS 2024)
and 14th Humor Research Conference (HRC)
Hosted online by Texas A&M University-Commerce, April 19 to 21, 2024
https://tamuc.edu/humor

ABSTRACT SUBMISSION DEADLINE: MARCH 25, 2024

Call for papers
---------------

As in previous years, the Humor and AI Special Interest Group <https://humorstudies.org/Forum/forumdisplay.php?fid=9> of the International Society for Humor Studies will hold a panel at the 34th International Society for Humor Studies Conference (ISHS 2024). This year's conference is being held concurrently with the 14th Humor Research Conference (HRC), hosted online by Texas A&M University-Commerce from April 19 to 21, 2024.

We invite 20-minute presentations on AI-based technology for generating, processing, or analyzing humor, for our dedicated panel that kicks off ISHS's 2024 webinar series: <https://humorstudies.org/WebinarCenter2024.htm>

Application areas include, but are not limited to:

* human–computer interaction
* computer-mediated communication
* intelligent writing assistants
* conversational agents
* machine and computer-assisted translation
* digital humanities
* natural language processing
* computer vision

Abstracts of 250 words, excluding references, should be submitted by e-mail to the conveners by March 25, 2024.

Conveners
---------

Kiki Hempelmann, Texas A&M University-Commerce <kiki(a)tamuc.edu>
Tristan Miller, University of Manitoba <Tristan.Miller(a)umanitoba.ca>
Julia M. Rayz, Purdue University <jtaylor1(a)purdue.edu>

--
Dr. Tristan Miller, Assistant Professor
Department of Computer Science, University of Manitoba
https://logological.org/ | Tel. +1 204 474 6792

[CfP] EXTENDED Deadline! TermTrends @ MDTT24: Models and Best Practices for Terminology Representation in the Semantic Web
by Patricia Martín Chozas 12 Mar '24

*Apologies for crossposting*

TermTrends24: Models and Best Practices for Terminology Representation in the Semantic Web
Workshop co-located with MDTT 2024 <https://mdtt2024.dei.unipd.it/en/>

Date: 26th June, 2024
Venue: Granada, Spain
More info: https://termtrends.linkeddata.es/

*7 April 2024 (extended from 15 March 2024): Deadline for paper submission*

*About TermTrends*
TermTrends 2024, co-located with MDTT 2024, aims to provide a discussion forum on the theoretical and methodological approaches for the representation of terminological data, both at a conceptual and a linguistic level. In particular, we would like to focus on their connection to the Linguistic Linked (Open) Data (LLOD) paradigm through the representation of these data according to Semantic Web formats. By adopting models or vocabularies proposed for the representation of linguistic data, we would contribute to the creation of interoperable and reusable terminological resources.

With this objective, the workshop intends to explore the advantages and challenges underlying various Terminology-related standardisation approaches, ranging from the initially proposed standards to represent terminology within the International Standardisation Organisation (ISO), such as the TermBase eXchange (TBX) format, to models that represent linguistic descriptions associated with ontologies in the Semantic Web, such as SKOS and Ontolex-lemon. Being multidisciplinary in scope, it focuses on identifying terminological representation needs, as well as limitations of current models in addressing such needs, with the aim of also exploring the development of an extension of the Ontolex-lemon vocabulary and how that may contribute to overcoming such challenges.

*Call for Papers*
The topics of interest for this workshop include, but are not limited to, the following:
- Terminology Representation Standards
- Terminology as Linguistic Linked (Open) Data
- Interoperability of Terminological Resources
- Reusability of Terminological Resources
- Challenges in Terminology Representation
- Analysis of the structure of Terminological Resources

*Submissions*
Paper proposals should follow the CEUR template. Short and long papers will be accepted. Following CEUR guidelines, short papers should be 5-6 pages long and long papers 8-10 pages long. Authors must submit their papers through the EasyChair platform.

*Important Dates*
- *7 April 2024 (extended from 15 March 2024)* - Deadline for paper submission
- *20 April 2024* - Notification deadline for paper submissions
- *15 May 2024* - Deadline for camera-ready paper submission
- *26 June 2024* - TermTrends Workshop

*Workshop Organisers*
- Rute Costa, NOVA FCSH / NOVA CLUNL (Portugal)
- Elena Montiel-Ponsoda, Universidad Politécnica de Madrid (Spain)
- Sara Carvalho, Univ. de Aveiro / NOVA CLUNL (Portugal)
- Patricia Martín-Chozas, Universidad Politécnica de Madrid (Spain)
- Federica Vezzani, University of Padova (Italy)

*Patricia Martín Chozas - Postdoctoral Researcher*
*Ontology Engineering Group*
Artificial Intelligence Department
ETSI Informáticos - Universidad Politécnica de Madrid
Phone: (+34) 910673091

A postdoc position in NLP at the University of Tartu, Estonia
by Kairit Sirts 12 Mar '24

Dear all,

Applications are invited for a postdoctoral fellowship with the TartuNLP lab in the Institute of Computer Science at the University of Tartu. The funding for the position is provided by the recently established Estonian Centre of Excellence in AI (EXAI), which gathers various research teams from several Estonian research institutions.

The successful candidate will work with Kairit Sirts on AI-related projects at the interface between natural language processing (mainly using large language model technology) and psychology with the goal of developing chatbots for supporting mental health. The candidate will also participate with their NLP expertise in collaborative projects with other teams that are part of the EXAI.

The suitable candidate has a PhD in natural language processing, artificial intelligence, computer science, or another relevant discipline. They should have a good research and publication track record in NLP. Interest in psychology or mental health related topics is a bonus.

Employer: Institute of Computer Science, University of Tartu
Title: Researcher in natural language processing
Speciality: Natural language processing
Location: Tartu, Estonia
Deadline: 1 April, 2024

More info about the job offer, application process and the requirements: https://ut.ee/en/job-offer/research-fellow-natural-language-processing-0

All questions related to the position should be sent to me (kairit.sirts(a)ut.ee).

Best regards
Kairit Sirts
