April 2024 - Corpora

CfP: JOKER @ CLEF 2024: Automatic Humour Analysis
by Tristan Miller 15 Apr '24

Call for Participation - JOKER @ CLEF 2024: Automatic Humour Analysis
=====================================================================

https://joker-project.com/

We invite individuals and teams to participate in JOKER 2024, a set of shared tasks on automatic humour analysis. JOKER 2024 will be held at the Conference and Labs of the Evaluation Forum (CLEF 2024) from 9 to 12 September 2024 in Grenoble, France.

Topics and tasks
----------------

The goal of the JOKER workshop is to bring together linguists and computer scientists to work on an evaluation framework for humour, including data and metric development, and to foster work on automatic methods for humour analysis and translation.

We invite submissions of automatic or manual runs for any or all of the following tasks, for which we have prepared annotated data:

Task 1: Humour-aware information retrieval
Task 2: Humour classification according to genre and technique
Task 3: Translation of puns from English to French
Unshared task: We welcome submissions that use our data for other tasks.

How to participate
------------------

At least one team member should register at the CLEF website, and all team members should join the JOKER mailing list (see URLs below). Task data will be made available to all registered participants. Runs and system description papers should be submitted according to the schedule and instructions posted on the JOKER and CLEF websites.

Deadlines
---------

2024-04-22: CLEF 2024 registration
2024-05-06: Submission of runs for JOKER
2024-05-24: Evaluation results posted
2024-05-31: Submission of system description papers
2024-06-24: Reviews of system description papers
2024-07-08: Submission of camera-ready papers
2024-09-09: CLEF 2024 conference begins

Contacts
--------

JOKER website: https://joker-project.com/
CLEF website: https://clef2024.clef-iniative.eu/
Email: contact(a)joker-project.com
Twitter: https://twitter.com/joker_research
Mailing list (Google Groups): https://groups.google.com/g/joker-project

Chairs
------

Liana Ermakova, Université de Bretagne Occidentale
Tristan Miller, University of Manitoba
Adam Jatowt, University of Innsbruck
Anne-Gwenn Bosser, ENIB
Victor Manuel Palma Preciado, Instituto Politécnico Nacional
Grigori Sidorov, Instituto Politécnico Nacional

--
Dr. Tristan Miller, Assistant Professor
Department of Computer Science, University of Manitoba
https://logological.org/ | Tel. +1 204 474 6792

Funded PhD in text mining, deadline May 15
by Peeter Tinits 15 Apr '24

Dear all,

A fully funded PhD position is available with a text mining / digital humanities / computational social science specialization.

An interdisciplinary team of researchers is seeking to recruit a PhD student to a research project on industrial modernity and Deep Transitions at the Institute of Social Studies, University of Tartu, Estonia, led by Laur Kanger. The PhD study (4 years) will focus on the identification of long-term trends in industrial modernity in a comparative-historical perspective, combining the text mining of digitized newspapers with existing databases.

The application deadline is 15.05.2024. Please do not hesitate to forward this to anyone who might be interested in taking up the challenge.

General information can be found here: https://ut.ee/en/content/phd-open-calls (navigate to “1-15 May and 1-15 June 2024” > “Faculty of Social Sciences” > “Media and Communication, Sociology” tab). A detailed description of the PhD position can be found here (https://ut.ee/sites/default/files/2024-04/Sociology_PhD%20call_Deep%20Trans…).

Long story short: if you have experience in text mining and an interest in societal change, you're likely to be a good match. :)

Don't hesitate to contact me for more information about the research!

Best,
Peeter Tinits

Research Fellow position, University of Essex, UK- deadline April 21
by Ravi Shekhar 15 Apr '24

Dear All,

We are seeking a passionate and motivated Research Fellow at the University of Essex, UK. You will be working on the EU/UKRI funded ELOQUENCE project <https://eloquenceai.eu/>. The funding is available until 31 December 2026.

We are interested in attracting a researcher working on Natural Language Processing, Speech Processing, and Machine Learning/AI. Areas of interest include (but are not limited to):

- Multimodal foundation models combining text, speech and vision
- Multilingual text and speech processing
- Factual retrieval-augmented generation
- Human feedback incorporation using HCI techniques
- Ethical dimensions of AI in real-world use cases

You will be responsible for developing, validating, and deploying a deep-learning model. You will need experience building deep learning models using established tools (e.g., PyTorch) and proficiency in Python. You should also have, or be close to completing, a Ph.D. in computer science or a related field.

You will have opportunities to develop your research interests, collaborate with other project partners, and present work at international conferences and in journals.

*Application Deadline: 21 April 2024 (Application details <https://www.jobs.ac.uk/job/DGX705/senior-research-officer>)*

Informal inquiries about the position can be made to Prof. Haris Mouratidis and Dr. Ravi Shekhar (e-mail: {h.mouratidis and r.shekhar}(a)essex.ac.uk)

--
Ravi

Ravi Shekhar
Lecturer, University of Essex, UK
http://shekharravi.github.io/

SIGDIAL 2024: Second Call for Papers
by Ultes, Stefan 15 Apr '24

The 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL) will be held in Kyoto, Japan on September 18-20, 2024. SIGDIAL will be co-located with INLG, which will take place after SIGDIAL in Tokyo, Japan.

The SIGDIAL venue provides a regular forum for the presentation of cutting-edge research in dialogue and discourse to both academic and industry researchers, continuing a series of 24 successful previous meetings. The conference is sponsored by the SIGDIAL organization - the Special Interest Group in discourse and dialogue for ACL and ISCA.

* Topics of Interest *

We welcome formal, corpus-based, implementation, experimental, or analytical work on discourse and dialogue including, but not restricted to, the following themes:

- Discourse Processing: Rhetorical and coherence relations, discourse parsing and discourse connectives. Reference resolution. Event representation and causality in narrative. Argument mining. Quality and style in text. Cross-lingual discourse analysis. Discourse issues in applications such as machine translation, text summarization, essay grading, question answering and information retrieval. Discourse issues in text generated by large language models.
- Dialogue Systems: Task-oriented and open-domain spoken, multimodal, embedded, situated, and text-based dialogue systems, their components, evaluation and applications. Knowledge representation and extraction for dialogue. State representation, tracking and policy learning. Social and emotional intelligence. Dialogue issues in virtual reality and human-robot interaction. Entrainment, alignment and priming. Generation for dialogue. Style, voice, and personality. Safety and ethics issues in dialogue.
- Corpora, Tools and Methodology: Corpus-based and experimental work on discourse and dialogue, including supporting topics such as annotation tools and schemes, crowdsourcing, evaluation methodology and corpora.
- Pragmatic and Semantic Modeling: Pragmatics and semantics of conversations (i.e., beyond a single sentence), e.g., rational speech act, conversation acts, intentions, conversational implicature, presuppositions.
- Applications of Dialogue and Discourse Processing Technology.

* Special Session *

SIGDIAL 2024 invites work on the special session “GEMINI - Graph-based knowledge for Modelling Intelligent Natural Interaction” that focuses on knowledge and knowledge modeling for dialogue systems, in particular on the opportunities and challenges for enhancing and stabilizing dialogue capabilities of chatbots, robots, and virtual agents with the use of LLMs.

* Submissions *

The program committee welcomes the submission of long papers, short papers, and demo descriptions. Submitted long papers may be accepted for oral or for poster presentation. Accepted short papers will be presented as posters.

- Long paper submissions must describe substantial, original, completed and unpublished work. Wherever appropriate, concrete evaluation and analysis should be included. Long papers must be no longer than 8 pages, including title, text, figures and tables. An unlimited number of pages is allowed for references and appendices, and an extra page is allowed in the final version to address reviewers’ comments.
- Short paper submissions must describe original and unpublished work. Please note that a short paper is not a shortened long paper. Instead, short papers should have a point that can be made in a few pages, such as a small, focused contribution; a negative result; or an interesting application nugget. Short papers should be no longer than 4 pages including title, text, figures and tables. An unlimited number of pages is allowed for references and appendices, and an extra page is allowed in the final version to address reviewers’ comments.
- Demo descriptions should be no longer than 4 pages including title, text, examples, figures, tables and references.
A separate one-page document should be provided to the program co-chairs for demo descriptions, specifying furniture and equipment needed for the demo.

Note that content that is an important part of the contribution or that is important for the reviewers to assess the technical correctness of the work should be a part of the main paper, and not appear in appendices. Reviewers are not required to consider material in appendices. Authors are encouraged to also submit additional accompanying materials, such as corpora (or corpus examples), demo code, videos and sound files.

* Multiple Submissions *

SIGDIAL 2024 cannot accept work for publication or presentation that will be (or has been) published elsewhere, or that has been or will be submitted to other meetings or publications whose review periods overlap with that of SIGDIAL. Any questions regarding submissions can be sent to program-chairs [at] sigdial.org.

* Blind Review *

Building on previous years’ move to anonymous long and short paper submissions, SIGDIAL 2024 will follow the ACL policies for preserving the integrity of double-blind review (see author guidelines: https://www.aclweb.org/adminwiki/index.php?title=ACL_Author_Guidelines). Unlike long and short papers, demo descriptions will not be anonymous. Demo descriptions should include the authors’ names and affiliations, and self-references are allowed.

* Submission Format *

All long, short, and demonstration submissions must follow the two-column ACL format, which is available as an Overleaf template (https://www.overleaf.com/read/crtcwgxzjskr) and also downloadable directly (LaTeX and Word) (https://github.com/acl-org/acl-style-files). Submissions must conform to the official ACL style guidelines, which are contained in these templates. Submissions must be electronic, in PDF format.
* Submission Deadline *

SIGDIAL will accept regular submissions through the Softconf/START system, as well as commitment of already reviewed papers through the ACL Rolling Review (ARR) system.

* Regular submission *

Authors have to fill in the submission form in the Softconf/START system and upload an initial PDF of their papers before May 17, 2024 (23:59 GMT-11). Details and the submission link will be posted on the conference website (https://2024.sigdial.org/).

* Submission via ACL Rolling Review (ARR, https://aclrollingreview.org/) *

Please refer to the ARR Call for Papers (https://aclrollingreview.org/cfp) for detailed information about submission guidelines to ARR. The commitment deadline for authors to submit their reviewed papers, reviews, and meta-review to SIGDIAL 2024 is June 19, 2024. Note that the paper needs to be fully reviewed by ARR in order to make a commitment, thus the latest date for ARR submission will be April 15, 2024.

* Mentoring *

Acceptable submissions that require language (English) or organizational assistance will be flagged for mentoring, and accepted with a recommendation to revise with the help of a mentor. An experienced mentor who has previously published in the SIGDIAL venue will then help the authors of these flagged papers prepare their submissions for publication.

* Best Paper Awards *

In order to recognize significant advancements in dialogue/discourse science and technology, SIGDIAL 2024 will include best paper awards. All papers at the conference are eligible for the best paper awards. A selection committee consisting of prominent researchers in the fields of interest will select the recipients of the awards.

SIGDIAL 2024 Program Committee
Vera Demberg and Stefan Ultes

Conference Website: https://2024.sigdial.org/

The 12th IEEE International Conference on Mobile Cloud Computing, Services and Engineering (IEEE Mobile Cloud 2024): Last Call for Papers
by Announce 15 Apr '24

*** Last Call for Papers ***

IEEE Mobile Cloud 2024
The 12th IEEE International Conference on Mobile Cloud Computing, Services and Engineering
July 15-18, 2024 | Shanghai, China
https://ieeemobilecloud.com

(*** Submission Deadline: April 30th, 2024 AoE (extended) ***)

IEEE Mobile Cloud is a pioneering IEEE-sponsored international conference devoted to research in mobile, edge, and cloud computing. It covers all aspects of mobile, edge, and cloud computing, from architectures, techniques, tools and methodologies to applications. This year's conference is scheduled to take place in Shanghai, China, from 15-18 July 2024.

IEEE Mobile Cloud 2024 is part of the IEEE International Congress On Intelligent And Service-Oriented Systems Engineering, offering a broad spectrum of international events, sharing renowned keynotes and fostering exchange among researchers and practitioners (see the common homepage for all colocated events, https://ieee-cisose-congress.org).

The fusion of mobile communications, computing, and intelligence is catalysing the emergence of innovative systems and applications that facilitate intelligent resource provisioning, process extensive data from mobile sensors and interconnected hardware platforms, and bolster the Internet of Things (IoT) through robust edge and cloud-based backend infrastructure. The pivotal role of current and forthcoming communication technologies, machine learning implementation, and mobile cloud infrastructures as facilitators for this convergence cannot be overstated. These mobile intelligent applications are poised to revolutionise various facets of daily life, encompassing domains such as transportation, e-commerce, healthcare, smart homes, smart cities, social interaction, and more.
Mobile intelligence serves as an inclusive platform for both academic and industrial researchers to share their latest research insights, experimental findings, and the latest advancements in industry technologies related to mobile systems, machine learning, edge and cloud computing, services, and engineering. Leveraging the synergy of mobile communications, machine intelligence, edge computing, and edge/cloud infrastructures, the future of Mobile Intelligence Systems is envisioned to provide a multitude of critical and personalised services across diverse application domains, ranging from education and transportation to public health, safety, and security.

Submissions will be evaluated on the criteria of originality, significance, clarity, relevance, and accuracy.

TOPICS OF INTEREST

Topics include, but are not limited to:

Theory, Modelling, and Methodologies
• Mobile cloud computing models, architectures, infrastructures, and platforms
• Mobile intelligence theories, concepts, algorithms, and methodologies
• Mobile cloud data management
• Mobile cloud tools, middleware, and data centres
• Mobile intelligence as a service
• Mobile networking, protocols, and technologies
• Quality of service (QoS)
• Mobile intelligence security and privacy

Applications and Industry Practice
• Mobile intelligence for autonomous driving systems, V2X, intelligent transportation systems (ITS), telematics
• Mobile intelligence for robotics, unmanned aerial vehicles (UAVs), and unmanned ground vehicles (UGVs)
• Mobile intelligence for sensor networks, Industrial IoT, Industry 4.0, and Industry 5.0
• Mobile intelligence for future wireless technologies, 5G/6G, WiFi, Satellite, etc.
• Mobile intelligence for aviation, airports, and railway
• Mobile intelligence for Augmented Reality/Virtual Reality (AR/VR)
• Mobile intelligence for computer vision and video analytics
• Mobile intelligence for surveillance and disaster management
• Mobile intelligence for healthcare
• Mobile intelligence for the metaverse
• Mobile intelligence for smart city
• Mobile intelligence for satellite
• Mobile intelligence for mission-critical systems
• Mobile intelligence for community services and social networking
• Mobile intelligence computing for sustainable development

PAPER SUBMISSION GUIDELINES

Papers must be written in English. All papers must be prepared in the IEEE double-column proceedings format. Please see the following link for details: https://www.ieee.org/conferences/publishing/templates.html

All accepted conference papers will be published by the IEEE Computer Society and included in the IEEE Xplore digital library with EI indexing. Selected papers will be recommended to SCI-indexed journals as special issue papers.

The paper length should be up to 8 pages for regular conference papers and 6 pages for work-in-progress papers. Submitted papers should contain original work and must not be under submission elsewhere.

Each paper must be presented by an author at the conference. Presentations via teleconference are not permitted. Permission to have the paper presented by a qualified substitute presenter may be granted by the TPC Chairs under extraordinary circumstances, upon written request.

Submissions should be made via EasyChair using the following link: https://easychair.org/my/conference?conf=imc24
IMPORTANT DATES

• Abstract submission: March 31st, 2024 (AoE)
• Paper submission: April 30th, 2024 (AoE) (extended)
• Notification of acceptance: May 15th, 2024
• Final manuscript submission: May 22nd, 2024
• Author registration: May 22nd, 2024
• Conference: July 15th-18th, 2024

COMMITTEES

General Chairs
• Hiroyuki Sato, University of Tokyo, Japan
• Yan Bai, University of Washington Tacoma, USA

Program Chairs
• Lan Zhang, Clemson University, USA
• Sun Yao, The University of Glasgow, Scotland, UK
• Tomoki Watanabe, Kanagawa Institute of Technology, Japan
• Fan Wu, Shanghai Jiao Tong University, China

Publicity Chair
• George Angelos Papadopoulos, University of Cyprus, Cyprus

Program Committee
• Ouri Wolfson, University of Illinois
• Felix Beierle, University of Würzburg
• Thomas Richter, Rhein-Waal University of Applied Sciences
• Dan Grigoras, University College Cork
• Sergio Ilarri, University of Zaragoza
• Iulian Sandu Popa, University of Versailles Saint-Quentin & INRIA Saclay-Ile-de-France
• Haiping Xu, University of Massachusetts Dartmouth
• Prasad Calyam, University of Missouri
• Dana Petcu, West University of Timisoara
• Fabio Costa, Federal University of Goias
• Cristian Borcea, New Jersey Institute of Technology
• Lei Huang, Prairie View A&M University
• Chunsheng Zhu, Southern University of Science and Technology
• Xuyun Zhang, Macquarie University
• Jia Zhao, Changchun Institute of Technology
• Richard Han, University of Colorado Boulder

CISOSE General Chairs
• Jerry Gao, San Jose State University, USA
• Iraklis Varlamis, Harokopio University of Athens, Greece

CISOSE Steering Committee
• Jerry Gao, San Jose State University, USA
• Guido Wirtz, University of Bamberg, Germany
• Huaimin Wang, NUDT, China
• Jie Xu, University of Leeds, UK
• Wei-Tek Tsai, Arizona State University, USA
• Axel Kupper, TU Berlin, Germany
• Hong Zhu, Oxford Brookes University, UK
• Longbin Cao, University of Technology Sydney, Australia
• Cristian Borcea, New Jersey Institute of Technology, USA
• Sato Hiroyuki, University of Tokyo, Japan

2-year postdoc in quantitative text analysis/ NLP
by Stephanie Dornschneider-Elkink 15 Apr '24

*Two-year fully funded postdoctoral position in quantitative text analysis/NLP*

*Location:* University College Dublin, School of Politics and International Relations
*Start date:* 1st September, 2024
*Deadline:* 12th May, 2024, noon

University College Dublin is currently recruiting a post-doctoral researcher to implement natural language processing (NLP) tools to analyse interview data. The main objective of this position is to develop tools to identify and analyse so-called cognitive maps (Axelrod 1976) from interview data. Dornschneider and Henderson (2016, 2023) and Dornschneider (2019) have developed tools for the computational analysis of cognitive maps. What is needed is a set of tools to infer cognitive maps from natural language.

This Irish Research Council funded project investigates the role of women in Muslim resistance movements, based on Arabic interviews conducted by the Principal Investigator. The cognitive mapping analysis has several main objectives:

1- to show typical behavioral decisions (e.g. to join a resistance movement) described by the interviewees;
2- to identify common reasoning processes related to these decisions; and
3- to trace the role of religious beliefs in these reasoning processes.

You will work with the PI, Dr. Stephanie Dornschneider-Elkink, to deliver the research objectives of the project. You will support the development and subsequent publication of new tools to convert text into cognitive maps. Tasks will include but are not limited to POS tagging, sequence analysis, word embeddings, and visualization. You will have the chance to give substantial input to the analysis and to co-author papers with the PI.

Full ad: https://my.corehr.com/pls/coreportal_ucdp/apply?id=017201

*References*

Axelrod, R. (ed.). 1976. *Structure of decision: The cognitive maps of political elites*. Princeton: Princeton University Press.

Dornschneider-Elkink, S. and Henderson, N., 2023. Repression and Dissent: How Tit-for-Tat Leads to Violent and Nonviolent Resistance. *Journal of Conflict Resolution*, p.00220027231179102. https://doi.org/10.1177/0022002714540473

Dornschneider, S., 2019. High-Stakes Decision-Making Within Complex Social Environments: A Computational Model of Belief Systems in the Arab Spring. *Cognitive Science*, *43*(7), p.e12762. https://doi.org/10.1111/cogs.12762

Dornschneider, S. and Henderson, N., 2016. A computational model of cognitive maps: Analyzing violent and nonviolent activity in Egypt and Germany. *Journal of Conflict Resolution*, *60*(2), pp.368-399.

--
Dr Stephanie Dornschneider-Elkink
Assistant Professor, School of Politics & International Relations (SPIRe)
University College Dublin
Newman Building, F316, Belfield, Dublin 4, Ireland
http://www.dornschneider.net/

Deadline extension (May 15th) SEPLN BEST 2023 THESIS AWARD
by aitziber.atucha＠ehu.eus 15 Apr '24

[Apologies for cross-posting]

23rd EDITION OF THE SEPLN AWARD TO THE BEST DOCTORAL THESIS IN NATURAL LANGUAGE PROCESSING
[EXTENSION: May 15th, 2024]

The Spanish Society for Natural Language Processing announces the 23rd Edition of the SEPLN Award for the Best Doctoral Thesis in Natural Language Processing, which will be governed by the following bases:

1.- The purpose of this award is the promotion and dissemination of research in the field of natural language processing.

2.- The thesis will be awarded with a compact laptop (tablet) and €300 for attendance at the congress. The award will be presented at the 40th International Congress of the Spanish Society of Natural Language Processing (SEPLN 2024), after a brief presentation of the award-winning work by the author.

3.- In order to compete, the author of the doctoral thesis must be a member of the SEPLN at the time of submitting the work. No contestant may participate as author in more than one work.

4.- Doctoral theses read during the year 2023, written in a language of the Spanish State or in English, may be submitted to the competition. In addition to the complete thesis, it is essential to send:
a) a 4-page summary of the thesis, clearly describing the topic and the relevance of the research, the objectives, methods, results achieved and contributions.
b) a brief description of the scientific career of the author of the thesis, detailing the participation in scientific activities such as organization of competitive tasks, congresses, generation of open access resources such as datasets, language models, etc., and participation in projects, contracts, and/or patents.

The quality of the presentation, the technical and methodological correctness, the relevance, originality, the generation, evaluation and publication of resources, as well as the research trajectory during the pre-doctoral period will be the criteria used for the award of the prize by the jury.
The works will be submitted through the website of the Society's journal (http://journal.sepln.org) in PDF format before May 15th, 2024. The final decision will be communicated during the 40th International Congress of the Spanish Society for Natural Language Processing (SEPLN 2024).

Submission instructions: (http://www.sepln.org/sites/default/files/noticia/documentos_relacionados/20…)

For more information, contact aitziber.atucha(a)ehu.eus

11th CMC-Corpora Conference, Nice, France, 5-7 September 2024: Extension of submission deadline to 26 April
by Steven Coats 15 Apr '24

Dear colleagues,

We have received many requests to extend the submission deadline for CMC-Corpora 2024 and are therefore pleased to announce an extension of the paper and abstract submission deadline to 23:59 CEST (GMT+2), April 26th, 2024.

We are also very happy to inform you that Susan Herring (Indiana University) will be our keynote speaker!

For submission details, please see the conference website: https://cmc-corpora-nice.sciencesconf.org/

Looking forward to receiving your submission!

On behalf of the organizing and steering committees,
Céline Poudat and Steven Coats

University Lecturer, Docent
English, Faculty of Humanities
University of Oulu
P.O. Box 8000, FI-90014 University of Oulu
Finland
https://cc.oulu.fi/~scoats

Subject: Call for Shared Task Participation: Data Contamination Evidence Collection - CONDA workshop @ ACL 2024
by Eneko Agirre 15 Apr '24

We invite the community to participate in a shared task organized in the context of the CONDA workshop <https://conda-workshop.github.io/>.

Data contamination, where evaluation data is inadvertently included in pre-training corpora of large scale models, and language models (LMs) in particular, has become a concern in recent times (Sainz et al. 2023 <https://aclanthology.org/2023.findings-emnlp.722/>; Jacovi et al. 2023 <https://aclanthology.org/2023.emnlp-main.308/>). The growing scale of both models and data, coupled with massive web crawling, has led to the inclusion of segments from evaluation benchmarks in the pre-training data of LMs (Dodge et al., 2021 <https://aclanthology.org/2021.emnlp-main.98/>; OpenAI, 2023 <https://arxiv.org/abs/2303.08774>; Google, 2023 <https://arxiv.org/abs/2305.10403>; Elazar et al., 2023 <https://arxiv.org/abs/2310.20707>). The scale of internet data makes it difficult to prevent this contamination from happening, or even detect when it has happened (Bommasani et al., 2022 <https://arxiv.org/abs/2108.07258>; Mitchell et al., 2023 <https://arxiv.org/abs/2212.05129>).

Crucially, when evaluation data becomes part of pre-training data, it introduces biases and can artificially inflate the performance of LMs on specific tasks or benchmarks (Magar and Schwartz, 2022 <https://aclanthology.org/2022.acl-short.18/>). This poses a challenge for fair and unbiased evaluation of models, as their performance may not accurately reflect their generalization capabilities.

The shared task is a community effort on centralized data contamination evidence collection. While the problem of data contamination is prevalent and serious, the breadth and depth of this contamination are still largely unknown. The concrete evidence of contamination is scattered across papers, blog posts, and social media, and it is suspected that the true scope of data contamination in NLP is significantly larger than reported.
With this shared task we aim to provide a structured, centralized platform for contamination evidence collection to help the community understand the extent of the problem and to help researchers avoid repeating the same mistakes. The shared task also gathers evidence of clean, non-contaminated instances. The platform is already available for perusal at https://huggingface.co/spaces/CONDA-Workshop/Data-Contamination-Database <https://huggingface.co/spaces/CONDA-Workshop/Data-Contamination-Report>.

Participants in the shared task need to submit their contamination evidence (see instructions below). The CONDA 2024 workshop organizers will review the evidence through pull requests.

*/Compilation Paper/*

As a companion to the contamination evidence platform, we will produce a paper that will provide a summary and overview of the evidence collected in the shared task. The participants who contribute to the shared task will be listed as co-authors in the paper.

*/Instructions for Evidence Submission/*

Each submission should report a case of contamination or a lack thereof. The submission can be either about (1) contamination in the corpus used to pre-train language models, where the pre-training corpus contains a specific evaluation dataset, or about (2) contamination in a model that shows evidence of having seen a specific evaluation dataset while being trained. Each submission needs to mention the corpus (or model) and the evaluation dataset, in addition to some evidence of contamination. Alternatively, we also welcome evidence of a lack of contamination.

Reports must be submitted through a Pull Request in the Data Contamination Report space at HuggingFace. The reports must follow the Contribution Guidelines provided in the space and will be reviewed by the organizers. If you have any questions, please contact us at conda-workshop(a)googlegroups.com or open a discussion in the space itself.
URL with contribution guidelines: https://huggingface.co/spaces/CONDA-Workshop/Data-Contamination-Database <https://huggingface.co/spaces/CONDA-Workshop/Data-Contamination-Report> (“Contribution Guidelines” tab)

*/Important dates/*

* Deadline for evidence submission: July 1, 2024
* Workshop day: August 16, 2024

*/Sponsors/*

* AWS AI and Amazon Bedrock
* HuggingFace
* Google

*/Contact/*

* Website: https://conda-workshop.github.io/
* Email: conda-workshop(a)googlegroups.com

*/Organizers/*

Oscar Sainz, University of the Basque Country (UPV/EHU)
Iker García Ferrero, University of the Basque Country (UPV/EHU)
Eneko Agirre, University of the Basque Country (UPV/EHU)
Jon Ander Campos, Cohere
Alon Jacovi, Bar Ilan University
Yanai Elazar, Allen Institute for Artificial Intelligence and University of Washington
Yoav Goldberg, Bar Ilan University and Allen Institute for Artificial Intelligence

ArabicNLP 2024 - Third Call for Papers
by Salam Khalifa 14 Apr '24

Dear all,

(Apologies for cross-posting)

This is the third CFP for the second Arabic Natural Language Processing Conference (ArabicNLP 2024), co-located with ACL 2024 in Bangkok, Thailand, on August 16, 2024 (hybrid mode).

Conference URL: https://arabicnlp2024.sigarab.org/

Upcoming deadline: May 3, 2024: Abstract of direct conference paper submissions due date (Open Review)

ArabicNLP 2024 builds on eight previous conference and workshop editions, which have been very successful, drawing large and active participation in various capacities (see Scholar Page <https://scholar.google.com/citations?user=LGzh8jYAAAAJ>). This conference is timely given the continued rise in research projects focusing on Arabic NLP. The conference is organized by SIGARAB <https://www.sigarab.org/>, the Association for Computational Linguistics Special Interest Group on Arabic NLP.

Call for Papers

We invite long (up to 8 pages), short (up to 4 pages), and demo paper (up to 4 pages) submissions. Long and short papers will be presented orally or as posters as determined by the program committee; presentation mode does not reflect the quality of the work.

Submissions are invited on topics that include, but are not limited to, the following:

- Enabling technologies: (any size) language models, diacritization, lemmatization, morphological analysis, disambiguation, tokenization, POS tagging, named entity detection, chunking, parsing, semantic role labeling, sentiment analysis, Arabic dialect modeling, etc.
- Applications: dialog modeling, machine translation, speech recognition, speech synthesis, optical character recognition, pedagogy, assistive technologies, social media analytics, etc.
- Resources: dictionaries, annotated data, corpora, etc.

Submissions may include work in progress as well as finished work. Submissions must have a clear focus on specific issues pertaining to the Arabic language, whether it is standard Arabic, dialectal, classical, or mixed.
Papers on other languages that share problems faced by Arabic NLP researchers, such as Semitic languages or languages using Arabic script, are welcome provided that they propose techniques or approaches that would be of interest to Arabic NLP, and they explain why this is the case. Additionally, papers on efforts using Arabic resources but targeting other languages are also welcome. Descriptions of commercial systems are welcome, but authors should be willing to discuss the details of their work. We also welcome position papers and surveys about any of the above topics.

Conference Paper Submission URL: https://openreview.net/group?id=SIGARAB.org/ArabicNLP/2024/Conference

Important Dates for Conference Papers

- May 3, 2024: Abstract of direct conference paper submissions due date (Open Review)
- May 10, 2024: Full direct conference paper submissions due date (Open Review)
- May 17, 2024: ARR commitment date <https://aclrollingreview.org/dates>
- May 31, 2024: Reviews submission deadline
- June 17, 2024: Notification of acceptance
- July 1, 2024: Camera-ready papers due
- August 16, 2024: ArabicNLP conference

All deadlines are 11:59 pm UTC -12h <https://www.timeanddate.com/time/zone/timezone/utc-12> (“Anywhere on Earth”).
There are eight exciting shared tasks: https://arabicnlp2024.sigarab.org/shared-tasks

- Task 1: AraFinNLP: Arabic Financial NLP
- Task 2: FIGNEWS 2024: Shared Task on News Media Narratives of the Israel War on Gaza
- Task 3: ArAIEval: Propagandistic Techniques Detection in Unimodal and Multimodal Arabic Content
- Task 4: StanceEval2024: Arabic Stance Evaluation Shared Task
- Task 5: WojoodNER 2024: The 2nd Arabic Named Entity Recognition Shared Task
- Task 6: ArabicNLU Shared-Task: Arabic Natural Language Understanding
- Task 7: NADI 2024: Nuanced Arabic Dialect Identification
- Task 8: KSAA-CAD Shared Task: Contemporary Arabic Reverse Dictionary and Word Sense Disambiguation

If you have any questions, please contact us at arabicnlp-pc-chairs(a)sigarab.org

The ArabicNLP 2024 Organizing Committee

--
Salam Khalifa
PhD Student at Stony Brook Linguistics <https://www.linguistics.stonybrook.edu/>
