December 2023 - Corpora

7th workshop CASE 2024 @ EACL: Deadline extension & shared tasks (with a new one with awards)
by ali hürriyetoglu 20 Dec '23

Dear all,

We are extending the regular paper submission deadline of the workshop CASE 2024 @ EACL to Jan 4, 2024 (AoE) [1]. Please also note the shared tasks organized in the scope of CASE 2024. The new ones are:

*1- Climate Activism Stance and Hate Event Detection Shared Task at CASE 2024*

Hate speech detection and stance detection are some of the most important aspects of event identification during climate change activism events. In the case of hate speech detection, the event is the occurrence of hate speech, the entity is the target of the hate speech, and the relationship is the connection between the two. The hate speech event has targets to which hate is directed. Identification of targets is an important task within hate speech event detection. Additionally, stance event detection is an important part of assessing the dynamics of protests and activism for climate change. It helps to understand whether the activist movements and protests are being supported or opposed.

This task will have three subtasks: (i) Hate Speech Identification, (ii) Targets of Hate Speech Identification, (iii) Stance Detection.

Codalab Link: https://codalab.lisn.upsaclay.fr/competitions/16206
Registration: To register for the shared task, please send a request on Codalab. The organizers will approve requests on a daily basis.
GitHub Page: https://github.com/therealthapa/case2024-climate

*2- Hate Speech Detection in Turkish and Arabic Tweets (HSD-2Lang) (7k Euros total award)*

Hate speech, which targets groups based on characteristics such as ethnicity, nationality, religion, colour, gender, and sexual orientation, is a significant problem on social media platforms. Automated detection of such content is crucial for effective content moderation and minimising societal harm, and can also be used in socio-political event analysis.

Following the SIU2023-NST competition organized to benchmark progress in Turkish hate speech detection and classification, we are organizing a new shared task in conjunction with CASE @ EACL 2024. This shared task focuses on identifying hate speech in Turkish and Arabic tweets. The task is divided into two subtasks:

A) Hate Speech Detection in Turkish across Various Contexts
B) Hate Speech Detection with Limited Data in Arabic

More details can be found at: https://github.com/boun-tabi/case-2024-hsd-2lang/

Please check the website for detailed information and contact us with anything you think we can help with.

Best wishes,
Ali

[1] https://emw.ku.edu.tr/case-2024/, The 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text

The 4th Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP): First Call for Papers
by Luis Chiruzzo - Inco 20 Dec '23

The 4th Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP 2024)
First Call for Papers

The 4th Workshop on NLP for Indigenous Languages of the Americas (AmericasNLP) will be co-located with the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024 <https://2024.naacl.org/>), which is scheduled to be held in Mexico City, Mexico, on June 16-21, 2024.

The goal of the workshop is to encourage and increase the visibility of work on the Indigenous languages of the Americas. It aims to encourage research on NLP, computational linguistics, corpus linguistics and speech for Indigenous languages, to connect researchers and professionals from underrepresented communities and native speakers of endangered languages with the ACL community, and, more generally, to promote machine learning approaches suitable for low-resource languages.

We invite the submission of
- Long papers (8 pages) and short papers (4 pages) on substantial, original, and unpublished research
- Non-archival extended abstracts (2 pages), technical reports (8 pages), and work which has been presented at other venues (in the format of the original publication)

Submissions do not need to describe work on native languages directly, as long as it is clear why those can benefit from the described approaches.

Areas of interest include but are not limited to:
- Creation of datasets for NLP applications
- Incorporation of external knowledge into neural systems
- Linguistic typology and the use of typological features for NLP
- Transfer learning, meta-learning, and active learning
- Weakly supervised, semi-supervised, and unsupervised learning
- Machine translation of low-resource languages
- Morphology and phonology of low-resource languages
- NLP applications for Indigenous languages of the Americas

Important dates:
- Start of the anonymity period: February 10, 2024
- Submission deadline: March 10, 2024
- Notification of acceptance: April 14, 2024
- Camera ready papers due: April 24, 2024
- Workshop: June 20 or 21, 2024

All deadlines are 11:59 pm UTC-12 (anywhere on earth).

Link to submission portal: https://softconf.com/naacl2024/americasnlp

The workshop also includes:
- A machine translation shared task on truly low-resource languages
- A shared task on morphological adaptation to generate educational examples

We also have a diverse set of invited speakers, focused on bridging the gap between linguists, NLP, and machine learning research!
- Graham Neubig (multilingual NLP and ML research)
- Jaime Pérez González (linguistics research on critically endangered South American languages; field linguistics)

Organizing Committee
- Manuel Mager, AWS AI Labs
- Abteen Ebrahimi, University of Colorado Boulder
- Shruti Rijhwani, Google DeepMind
- Arturo Oncevay, JP Morgan AI Research
- Luis Chiruzzo, Universidad de la República, Uruguay
- Robert Pugh, Indiana University, Bloomington
- Katharina von der Wense, University of Colorado Boulder and Johannes Gutenberg University Mainz

More information and contact information can be found at http://turing.iimas.unam.mx/americasnlp/.
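
For readers unsure what an "anywhere on earth" deadline means in their own time zone, here is a minimal Python sketch that converts an 11:59 pm UTC-12 deadline to UTC. The helper name `aoe_deadline_in_utc` is hypothetical and used only for illustration. The date is the submission deadline listed above.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" (AoE) is a fixed offset of UTC-12.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_in_utc(year: int, month: int, day: int) -> datetime:
    """Return 11:59 pm AoE on the given date, expressed in UTC.

    Hypothetical helper for illustration; not part of any workshop tooling.
    """
    local_deadline = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local_deadline.astimezone(timezone.utc)


# Submission deadline from the call: March 10, 2024 (AoE).
deadline_utc = aoe_deadline_in_utc(2024, 3, 10)
print(deadline_utc.isoformat())  # 2024-03-11T11:59:00+00:00
```

In other words, a March 10 AoE deadline closes at 11:59 UTC on March 11. Calling `deadline_utc.astimezone()` with no argument would show the same instant in the local time zone of the machine running the script.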

18 month postdoc position in Natural Language Processing at the University of Copenhagen
by Desmond Elliott 20 Dec '23

The Natural Language Processing Section at the Department of Computer Science at the University of Copenhagen is advertising an 18-month position for a Postdoctoral Researcher in Natural Language Processing.

The position is funded by the European Union Horizon Europe project to Democratize Trustworthy and Efficient Large Language Model Technology for Europe. The overall goal of the project is to develop European large language models (LLMs) on an unprecedented scale, trained on the largest amount of text so far in European AI, covering a range of underrepresented languages, and pushing the limits of European exascale computing. The successful candidate will join a team developing hybrid token-pixel language models and retrieval-augmented language models. The project team includes a consortium of researchers across Europe, and, locally, the co-investigator, a postdoctoral researcher, and one PhD student. Further information about the project is available at https://cordis.europa.eu/project/id/101135671.

The successful candidate will join the Language and Multimodal Processing group, which is part of a section with a strong, international, and diverse environment for research within core as well as emerging topics in natural language processing, natural language understanding, computational linguistics and multi-modal language processing. It is housed within the main Science Campus, which is centrally located in Copenhagen. Further information about the group is available here: https://lampgroup.github.io/ and further information about research at the Department is available here: https://di.ku.dk/english/research/.

The application deadline is 31 January 2024, with a start date of 1 April 2024, or as soon as possible thereafter.

Further information about the position can be found here: https://employment.ku.dk/faculty/?show=160726

Informal enquiries about the position can be made to the co-investigator, Desmond Elliott, Department of Computer Science, University of Copenhagen, e-mail: de(a)di.ku.dk.

CALL FOR PAPERS: 9th Symposium on Corpus Approaches to Lexicogrammar (LxGr2024)
by Costas Gabrielatos 20 Dec '23

9th Symposium on Corpus Approaches to Lexicogrammar (LxGr2024)
CALL FOR PAPERS

Deadline for abstract submission: Friday 15 March 2024

The symposium will take place online on Friday 5 and Saturday 6 July 2024.

LxGr primarily welcomes papers reporting on corpus-based research on any aspect of the interaction of lexis and grammar - particularly studies that interrogate the system lexicogrammatically to get lexicogrammatical answers. However, position papers discussing theoretical or methodological issues are also welcome, as long as they are relevant to both lexicogrammar and corpus linguistics.

If you would like to present, send an abstract of 500 words (excluding references) to lxgr(a)edgehill.ac.uk. Make sure that the abstract clearly specifies the research focus (research questions or hypotheses), the corpus, the methodology (techniques and metrics), the theoretical orientation, and the main findings. Abstracts will be double-blind reviewed, and decisions will be communicated within four weeks.

Full papers will be allocated 35 minutes (including 10 minutes for discussion). Work-in-progress reports will be allocated 20 minutes (including 5 minutes for discussion). There will be no parallel sessions. Participation is free.

For details, visit the LxGr website: https://sites.edgehill.ac.uk/lxgr/lxgr2024

If you have any questions, contact lxgr(a)edgehill.ac.uk.

iCASE EPSRC funded PhD- multimodal NLP - University of Manchester & BAE
by Sophia Ananiadou 20 Dec '23

We are seeking an enthusiastic PhD candidate to work in multimodal NLP for model-based systems engineering.

Details

This project is funded by the EPSRC iCASE (sponsored by BAE Systems) to conduct research in the area of multimodal natural language processing (NLP) for model-based systems engineering based on Large Language Models (LLMs). LLMs have demonstrated a remarkable ability to generate text when presented with images, text, audio and video as input. They are able to achieve higher performance than traditional neural methods and pre-trained language models, without the need for supervised training. The project will examine different approaches to multimodal LLM-based NLP to address complex and fine-grained tasks such as reasoning in model-based systems engineering. The PhD will delve into LLM architectures, data augmentation methods, multi-task and domain-specific LLMs, prompt engineering and interpretability. The candidate will have the opportunity to work with experts at BAE to gain experience in the practical application of model-based systems engineering. The candidate will join the world-class teams of Prof. S. Ananiadou (Computer Science and National Centre for Text Mining, Natural Language Processing, LLMs) and Prof. H. Yin (Electrical and Electronic Engineering, Deep Learning, Computer Vision).

Requirements

You will have a very good undergraduate degree in Computer Science (minimum 2:1 UK or equivalent for EU students). Experience and knowledge of NLP, multimodal LLMs, Ontologies, Semantic Web, Computer Aided Engineering (CAE) and Model-Based Systems Engineering (MBSE) tools and technology will be considered an advantage. The successful candidate must be capable of obtaining UK security clearance to fulfil any onsite industrial placement at the location of the host site.

Research Environment in Host Institution

The Department of Computer Science at the University of Manchester (UoM) is in the unique position of hosting the National Centre for Text Mining (NaCTeM), the first publicly funded centre for text mining in the world, focusing on fundamental research in Natural Language Processing (LLMs, interpretability, information extraction) in a variety of domains. Besides NaCTeM, academic expertise in AI is spread across a number of other institutes including the Institute for Data Science and AI (IDSAI), the Centre for AI Fundamentals and partnerships with the Alan Turing Institute and the European Laboratory for Learning and Intelligent Systems (ELLIS).

BAE Systems

BAE Systems provides some of the world's most advanced, technology-led defence, aerospace and security solutions. They employ a skilled workforce of more than 93,000 people in around 40 countries. Working with customers and local partners, they develop, engineer, manufacture, and support products and systems to deliver military capability, protect national security and people, and keep critical information and infrastructure secure.

Before you apply

We strongly recommend that you contact the supervisors of this project prior to application.

How to apply

To be considered for this project, you will need to complete a formal application through our online application portal <https://www.findaphd.com/common/clickCount.aspx?theid=167305&type=199&DID=1…> by the 26th of January 2024. When applying, you will need to specify the full name of this project, the name of your supervisor, how you are planning to fund your research, details of your previous studies, and the names and contact details of two referees.

Please also send the following to Prof. Sophia Ananiadou (Sophia.ananiadou(a)manchester.ac.uk) and Prof. Hujun Yin (hujun.yin(a)manchester.ac.uk):
* Cover letter and full CV
* Full degree transcripts and relevant certificates

Candidates will be shortlisted by a panel comprising members of UoM and BAE Systems. Selected candidates will be invited to give a presentation followed by a formal interview. The interviews will be held during the week of 29th January 2024.

Your application will not be processed unless all of the required documents are submitted at the time of application, and we cannot accept responsibility for late or missed deadlines. Incomplete applications will not be considered. If you have any questions about making an application, please contact our admissions team by emailing FSE.doctoralacademy.admissions(a)manchester.ac.uk.

Equality, diversity and inclusion <https://www.findaphd.com/common/clickCount.aspx?theid=167305&type=199&DID=1…> is fundamental to the success of The University of Manchester, and is at the heart of all of our activities. We know that diversity strengthens our research community, leading to enhanced research creativity, productivity and quality, and societal and economic impact. We actively encourage applicants from diverse career paths and backgrounds and from all sections of the community, regardless of age, disability, ethnicity, gender, gender expression, sexual orientation and transgender status. We also support applications from those returning from a career break or other roles. We will consider offering flexible study arrangements (including part-time: 50%, 60% or 80%, depending on the project/funder).

Funding Notes

This project is funded through EPSRC iCASE (with BAE Systems). The project will pay the tuition fees and provide a tax-free stipend set at the UKRI rate (£18,622). We are able to offer a limited number of studentships to applicants outside the UK. Therefore, full studentships will only be awarded to candidates of exceptional quality, due to the competitive nature of this scheme. Additional research funds will be available.

----------
Professor Sophia Ananiadou
Department of Computer Science
Director, National Centre for Text Mining
Deputy Director, Institute for Data Science and Artificial Intelligence
Turing Fellow
The University of Manchester

CfP: The 9th Workshop on Linked Data in Linguistics (LDL) - LREC-COLING, May 25th 2024
by Max Ionov 20 Dec '23

(apologies for cross-posting) The 9th Workshop on Linked Data in Linguistics: Resources, Applications, Best Practices Workshop colocated with *LREC-COLING 2024*, *Date*: May 25, 2024 *Venue*: Torino, Italy and online For up to date info, check: https://ldl2024.linguistic-lod.org/ Call for Papers The Linked Data in Linguistics (LDL) workshop series has established itself as the premier venue for discussing the application of Semantic Web technologies to the fields of linguistics, digital lexicography, and digital humanities (DH). While recent years have witnessed a steady growth in adoption of the technology in these areas, its uptake in other relevant domains, most notably in the case of natural language processing (NLP), continues to lag behind. This year, aside from embracing the full bandwidth of applications of LLOD technologies and the closely related area of knowledge graphs in linguistics, we welcome contributions addressing the application of LLOD technologies to NLP applications, as well as those dealing with emerging hot topics of future bridges between structured (linguistic) knowledge and neural methods. In addition, this year’s edition of the workshop will be a venue for in-depth discussions on community standards and best practices, and, above all, those related to the work of the W3C community groups OntoLex <https://www.w3.org/community/ontolex/> [1], LD4LT <https://www.w3.org/community/ld4lt/> [2] and BPMLOD <https://www.w3.org/community/bpmlod/> [3]. To this end, it will include featured talks on the latest achievements, developments, and perspectives of these W3C Community Groups. 

[1] Ontology-Lexica Community Group
[2] Linked Data in Language Technology Community Group
[3] Best Practices in Multilingual Linked Open Data

* Topics of interest *

We invite presentations of algorithms, methodologies, experiments, tools, use cases, descriptions of ongoing or planned research projects as well as position papers that describe the creation, publication or application of linked linguistic data collections and their linking with other resources. Descriptions of such data, and in particular, its uses in research (linguistics, lexicology, digital humanities) and technology (NLP, e-lexicography, localization) are also welcome. The following is a non-exhaustive list of relevant topics:

1. Building, managing and linking language resources
 - Lexicons and Lexical Data, including Dictionaries and Lexicographic Resources
 - Annotations and Annotated Corpora
 - Entity Linking
2. Technologies, challenges and best practices for language technology and language resources on the web:
 - Interoperability
 - Sustainability
 - FAIRness
3. Structured data in language technology:
 - Knowledge Graphs
 - Machine Learning
 - Multilingual Technologies
 - Language Knowledge Injection in LLMs
4. Show cases, case studies and applications by different communities of practice:
 - Multimodality
 - Corpus Linguistics
 - Lexicography
 - Digital Humanities
5. Current directions and critical reflection. Position papers on:
 - Ethical, legal, technological aspects of structured data in the age of LLMs
 - The role of LLOD in promoting low-resource languages
 - Extensions of RDF and graph formalisms

We invite both long (8 pages and 2 pages of references) and short papers (4 pages and 2 pages of references) representing original research, innovative approaches and resource descriptions. Short papers may also represent project descriptions. These projects do not have to be implemented, but the paper should discuss to what extent and for which purposes Linguistic Linked Open Data is reused or created.
Projects that are still in their early stages and seek advice from the broader Linguistic Linked Data community are welcome, especially if they include underrepresented fields of study.

Papers should be formatted according to the LREC-COLING guidelines; please see https://lrec-coling-2024.org/authors-kit/. Please note that the review process will be *single-blind*.

* Identify, Describe and Share your LRs! *

When submitting a paper from the START page, authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or are a new result of their research. Moreover, ELRA encourages all LREC-COLING authors to share the described LRs (data, tools, services, etc.) to enable their reuse and replicability of experiments (including evaluation ones).

* Important Dates *

- Submission Date: February 23, 2024
- Notification of Acceptance: March 22, 2024
- Camera-Ready: April 5, 2024
- Workshop: May 25, 2024

* Workshop Organizers *

- Christian Chiarcos (University of Augsburg, Germany)
- Katerina Gkirtzou (Athena Research Center, Greece)
- Maxim Ionov (University of Cologne, Germany)
- Fahad Khan (Consiglio Nazionale delle Ricerche, Italy)
- John P. McCrae (University of Galway, Ireland)
- Elena Montiel Ponsoda (Universidad Politécnica de Madrid, Spain)
- Patricia Martín Chozas (Universidad Politécnica de Madrid, Spain)

Please get in contact via ldl2024(a)linguistic-lod.org.
* Program Committee * - Sina Ahmadi (George Mason University, USA) - Verginica Barbu Mititelu (Research Institute for Artificial Intelligence of the Romanian Academy, Romania) - Paul Buitelaar (Insight, Ireland) - Sara Carvalho (University of Aveiro, Portugal) - Rute Costa (NOVA FCSH/NOVA CLUNL, Portugal) - Milan Dojchinovski (Czech Technical University, Czech Republic) - Agata Filipowska (Uniwersytet Ekonomiczny w Poznaniu, Poland) - Francesca Frontini (CNR-ILC, Italy) - Frances Gillis Webber (University of Cape Town, South Africa) - Voula Giouli (Athena Research Center, Greece) - Dagmar Gromann (University of Vienna, Austria) - Yoshihiko Hayashi (Waseda University, Japan) - Alik Kirillovich (Higher School of Economics, Russia) - Penny Labropoulou (Athena Research Center, Greece) - Chaya Liebeskind (Jerusalem College of Technology, Israel) - David Lindemann (University of the Basque Country, Spain) - Francesco Mambrini (Università Cattolica del Sacro Cuore, Italy) - Monica Monachini (CNR-ILC, Italy) - Diego Moussallem (Paderborn University, Germany) - Roberto Navigli (“La Sapienza” Università di Roma, Italy) - Petya Osenova (IICT-BAS, Bulgaria) - Ana Ostroški Anić (Institute of Croatian Language and Linguistics, Croatia) - Giulia Pedonese (CNR-ILC, Italy) - Sigita Rackevičienė (Mykolas Romeris University, Lithuania) - Felix Sasaki (SAP, Germany) - Andrea Schalley (Karlstad University, Sweden) - Gilles Sérasset (University Grenoble Alpes, France) - Milena Slavcheva (IICT-BAS, Bulgaria) - Blerina Spahiu (Bicocca University, Italy) - Ranka Stanković (University of Belgrade, Serbia) - Armando Stellato (University of Rome, Italy) - Federica Vezzani (University of Padua, Italy)

Fully funded PhD Opportunities in AI (NLP) for Science at the Bredesen Center, University of Tennessee Knoxville and ORNL
by Tirthankar Ghosal 20 Dec '23

*Fully funded PhD Opportunities in AI (NLP) for Science at the Bredesen Center, University of Tennessee Knoxville and ORNL*

The Bredesen Center brings together the extensive scientific resources of the University of Tennessee (UT) and Oak Ridge National Laboratory (ORNL). We are looking for candidates interested in working on #nlp for #scientificdiscovery #aiforscience #llmforscience applications. Successful candidates will have the opportunity to use world-class research facilities at #ORNL, including the Frontier #Exascale #supercomputer. Students at the Bredesen Center, UTK, will benefit from a unique experience, working jointly with ORNL, UTK, and our industry partners on research projects that significantly contribute to advancements in trustworthy AI for scientific discovery.

*Application Deadline: January 15th, 2024.*

Note: The GRE is optional. Please reach out to me at ghosalt(a)ornl.gov if you have questions.

We are also hiring #interns and #postdocs on the above topic!

https://bredesencenter.utk.edu/data-science/

#sdp #llms #sciNLP

Relevant Links:
https://utorii.com
https://bredesencenter.utk.edu
https://www.ornl.gov/staff-profile/tirthankar-ghosal
https://www.ornl.gov/ai-initiative

--
*Tirthankar Ghosal*
Scientist
National Center for Computational Sciences (NCCS)
Oak Ridge National Laboratory, United States

*SEM 2024 Call for Papers
by jodiechou001＠gmail.com 20 Dec '23

*SEM brings together researchers interested in the semantics of (many and diverse!) natural languages and its computational modeling. The conference embraces data-driven, neural, and probabilistic approaches, as well as symbolic approaches and everything in between; practical applications as well as theoretical contributions are welcome. The long-term goal of *SEM is to provide a stable forum for the growing number of NLP researchers working on all aspects of semantics of (many and diverse!) natural languages.

Topics of interest:
- Lexical semantics and word representations
- Compositional semantics and sentence representations
- Statistical, machine learning, and deep learning methods in semantic tasks
- Multilingual and cross-lingual semantics
- Word sense disambiguation and induction
- Semantic parsing, and syntax-semantics interface
- Frame semantics and semantic role labeling
- Textual inference, textual entailment, and question answering
- Formal approaches to semantics
- Extraction of events and of causal and temporal relations
- Entity linking, pronouns and coreference
- Discourse, pragmatics, and dialogue
- Machine reading
- Extra-propositional aspects of meaning
- Multiword and idiomatic expressions
- Metaphor, irony, and humor
- Knowledge mining and acquisition
- Common sense reasoning
- Language generation
- Semantics in NLP applications: sentiment analysis, abusive language detection, summarization, fact-checking, etc.
- Multidisciplinary research on semantics
- Grounding and multimodal semantics
- Psycholinguistics
- Interpretability and Explainability
- Human semantic processing
- Semantic annotation, evaluation, and resources
- Ethical aspects and bias in semantic representations

We encourage authors to think about the ethical aspects of their work, and to address and discuss all ethical questions and implications relevant to their research. STARSEM values reproducibility and particularly welcomes submissions that adhere to the reproducibility guidelines as specified here.

Submission Instructions

Submissions must describe unpublished work and be written in English. We solicit both long and short papers. Please note that double submission of papers must be declared at submission time.

Long papers describe original research and may consist of up to eight (8) pages of content, plus unlimited pages for references. Appendices are allowed after the references, but the paper should be self-contained and reviewers will not be required to check the appendices, if any. Final versions of long papers will be given one additional page of content (up to 9 pages) so that reviewers' comments can be taken into account.

Short papers describe original focused research and may consist of up to four (4) pages, plus unlimited pages for references. Upon acceptance, short papers will be given five (5) content pages in the proceedings. Authors are encouraged to use this additional page to address reviewers' comments in their final versions.

Submissions should follow the ARR formatting requirements. The deadline for direct submissions is Feb 22, 2024, and these submissions will be reviewed by the *SEM-2024 program committee. ACL Rolling Review (ARR) submissions can be committed to *SEM up to March 22, 2024 (authors of ARR-reviewed papers need to include their OpenReview link with reviews in the submission form). Both types of submissions are through OpenReview.

Limitations and Ethics Statement sections are allowed and encouraged, but they are not mandatory. They should be placed after the conclusion and they will not count towards the overall page limit.

In *SEM there is no special policy against multiple submissions, but the Program Chairs should be notified of them.

Submission link: https://openreview.net/group?id=aclweb.org/StarSEM/2024/Conference

Important Dates
- Anonymity period for direct submissions begins: Jan 22, 2024
- Direct submission deadline: Feb 22, 2024
- ARR-reviewed paper submission deadline: Mar 22, 2024
- Notification of acceptance: Apr 22, 2024
- Camera-ready deadline: May 5, 2024
- Conference date: Jun 16, 2024

Anonymity period

To protect the integrity of double-blind review and ensure that submissions are reviewed fairly, we adopt the rules and guidelines for ACL conferences. The following rules and guidelines make reference to the anonymity period, which runs from 1 month before the submission deadline (starting January 22, 2024 11:59PM UTC-12:00) up to the date when your paper is either accepted, rejected (Apr 22, 2024), or withdrawn.

You may not make a non-anonymized version of your paper available online to the general community (for example, via a preprint server) during the anonymity period. By a version of a paper we understand another paper having essentially the same scientific content but possibly differing in minor details (including title and structure) and/or in length (e.g., an abstract is a version of the paper that it summarizes).

If you have posted a non-anonymized version of your paper online before the start of the anonymity period, you may submit an anonymized version to the conference. The submitted version must not refer to the non-anonymized version, and you must inform the program chair(s) that a non-anonymized version exists. You may not update the non-anonymized version during the anonymity period, and we ask you not to advertise it on social media or take other actions that would further compromise double-blind reviewing during the anonymity period.
Note that, while you are not prohibited from making a non-anonymous version available online before the start of the anonymity period, this does make double-blind reviewing more difficult to maintain, and we therefore encourage you to wait until the end of the anonymity period if possible. Alternatively, you may consider submitting your work to the Computational Linguistics journal, which does not require anonymization and has a track for “short” (i.e., conference-length) papers.

2nd Call for Participation - SHROOM: Shared-task on Hallucinations and Related Observable Overgeneration Mistakes at SemEval-2024
by Mickus, Timothee 20 Dec '23

Welcome to SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes!

Task description: SHROOM participants will need to detect grammatically sound output that contains incorrect semantic information (i.e. unsupported by or inconsistent with the source input), with or without having access to the model that produced the output.

Overview of the task: The modern NLG landscape is plagued by two interlinked problems: on the one hand, our current neural models have a propensity to produce inaccurate but fluent outputs; on the other hand, our metrics are better at capturing fluency than correctness. This leads neural networks to “hallucinate”, i.e., produce fluent but incorrect outputs that we currently struggle to detect automatically. For many NLG applications, however, the correctness of an output is mission-critical. For instance, producing a plausible-sounding translation that is inconsistent with the source text jeopardizes the usefulness of a machine translation pipeline. With our shared task, we hope to foster the growing interest in this topic in the community.

With SHROOM we adopt a post hoc setting, where models have already been trained and outputs already produced: participants will be asked to perform binary classification to identify cases of fluent overgeneration hallucinations in two different tracks: a model-aware and a model-agnostic track. In the former, participants have access to the model that produced the output; in the latter, they do not. To ensure a low barrier to entry, we format the task as a binary classification problem. We now also provide a baseline kit, containing a baseline system, a format checker and the scoring program. All systems will be rated on accuracy (i.e., the proportion of test examples correctly labeled) and calibration (i.e., the correlation between the probability assigned by a system and the proportion of annotators marking a production as hallucinatory).
We provide participants with a collection of checkpoints, inputs, references and outputs of systems covering three NLG tasks: definition modeling (DM), machine translation (MT), and paraphrase generation (PG), trained with varying degrees of accuracy. The development set provides binary annotations from five different annotators and a majority vote gold label.

Anyone wishing to participate in the task is welcome! Participants will have to:
* Submit at least once during the evaluation phase in January;
* Write a system description paper before February 19;
* Review other system description papers (max. 2).

Trial, dev and train data are now available on the task website: https://helsinki-nlp.github.io/shroom/
Codalab competition: https://codalab.lisn.upsaclay.fr/competitions/15726
Join the mailing group: https://groups.google.com/u/1/g/semeval-2024-task-6-shroom
Updates on Twitter: @shroom2024 <https://twitter.com/shroom2024>

Important dates:
* Sample data ready: July 15th, 2023
* Validation data ready: September 11th, 2023
* Unlabeled train data ready: September 22nd, 2023
* Evaluation period starts (test set released): January 10th, 2024
* Evaluation period ends: January 31st, 2024
* Workshop paper submission deadline: February 19th, 2024
* Notification to authors: March 18th, 2024
* SemEval workshop: 16–21 June, Mexico (co-located with NAACL 2024)

Task organizers:
* Elaine Zosa, Silo AI, Finland
* Raúl Vázquez, University of Helsinki, Finland
* Jörg Tiedemann, University of Helsinki, Finland
* Vincent Segonne, Southern Brittany University, France
* Teemu Vahtola, University of Helsinki, Finland
* Alessandro Raganato, University of Milano-Bicocca, Italy
* Timothee Mickus, University of Helsinki, Finland
* Marianna Apidianaki, University of Pennsylvania, USA

CfP: HTRes 2024 – Holocaust Testimonies as Language Resources at LREC-COLING 2024
by Martin Wynne 20 Dec '23

Call for Papers: *HTRes 2024 – Holocaust Testimonies as Language Resources*
Pre-conference workshop at LREC-COLING 2024 (https://lrec-coling-2024.org/)
Tuesday, 21st May, 2024 in Torino, Italy
Workshop webpage: https://www.clarin.eu/HTRes2024

** Final date for paper submission: 21 February 2024 **

Holocaust testimonies serve as a bridge between survivors and history’s darkest chapters, providing a connection to the profound experiences of the past. Testimonies stand as the primary source of information that describes the Holocaust, offering first-hand accounts and personal narratives of those who experienced it. The majority of testimonies are captured in an oral format, as survivors vividly explain and share their personal experiences and observations from that time period. Transforming Holocaust testimonies into a machine-processable digital format can be a difficult task owing to the unstructured nature of the text.

The creation of accessible, comprehensive, and well-annotated Holocaust testimony collections is of paramount importance to our society. These collections empower researchers and historians to validate the accuracy of socially and historically significant information, enabling them to share critical insights and trends derived from these data. This workshop will investigate a number of ways in which techniques and tools from natural language processing and corpus linguistics can contribute to the exploration, analysis, dissemination and preservation of Holocaust testimonies.
Topics of interest: We expect contributions related to the following topics:

Creation of datasets and development of tools for the study of Holocaust testimonies:
* Creation of language corpora of Holocaust testimonies
* Digitisation and enhancement of oral and written testimonies (including automatic speech recognition, alignment of text and speech, format conversion, OCR, handwriting recognition, machine translation)
* Named entity recognition for identifying people, places, and events in testimonies
* Standards, representation formats, and guidelines for annotations and vocabularies relevant to the Holocaust testimonies
* Creation, adaptation and tuning of software applications for the creation, annotation, enhancement and use of Holocaust testimonies as language resources

Research using NLP and Holocaust testimonies:
* Applications of NLP in analysing Holocaust survivor testimonies
* Sentiment analysis and emotional content extraction from survivor narratives

Data Visualisation, Knowledge Representation and Information Extraction:
* Visualising complex data structures from Holocaust testimonies
* Building knowledge graphs and networks to represent historical relationships
* Interactive data visualisations for education and research
* Extracting biographical and temporal information relevant to the Holocaust
* Deep learning and large language models

Digital Archiving and Long-Term Preservation:
* Methods and tools for digitising and preserving Holocaust testimonies
* Best practices for metadata standards and cataloguing
* Ensuring long-term accessibility and data integrity

Ethical Considerations and Privacy:
* Ethical challenges in digitising and sharing sensitive testimonies
* Anonymisation and privacy protection in Holocaust data
* Community engagement and consent in digital projects

User and application aspects:
* Development of tools and interfaces for the search, analysis and exploration of Holocaust testimonies
* Other relevant use cases and application scenarios

All papers must clearly state and explain their relevance to the topic of 'Holocaust Testimonies as Language Resources'. All papers must represent original and unpublished work that is not currently under review. Papers will be evaluated according to their significance, originality, technical content, style, clarity, and relevance to the workshop.

We welcome the following types of contributions:
* Standard research papers (up to 8 pages, plus more pages for references if needed);
* Short research papers (from 4 to 6 pages, plus more pages for references if needed).

Submissions should strictly follow the LREC2024 stylesheet formatting guidelines. All papers should be submitted electronically in PDF format via START, the main conference submission platform (https://softconf.com/lrec-coling2024/htres2024/).

Important Dates:
* Final date for paper submission: 21 February 2024
* Notification of Acceptance: 20 March 2024
* Camera-ready version submission: 15 April 2024
* Workshop date: 21 May 2024

Programme: Please refer to the website for the details of the programme, plus the organizing and programme committees: https://www.clarin.eu/HTRes2024

--
Senior Researcher in Corpus Linguistics
Faculty of Linguistics, Philology and Phonetics, University of Oxford
National Co-ordinator, CLARIN-UK
martin.wynne(a)ling-phil.ox.ac.uk
https://orcid.org/0000-0002-4155-0530
