- Corpora - ELRA lists

Second Call for Participation - IWSLT 2024
by Atul K. Ojha 16 Jan '24

Apologies for cross-posting.

----------------------------------------

*The International Conference on Spoken Language Translation*
*21st IWSLT 2024 – **Second** Call for Participation*
*August 15-16, 2024 – Bangkok, Thailand*
*http://iwslt.org <http://iwslt.org/>*

The International Conference on Spoken Language Translation (IWSLT) is the premier annual conference for all aspects of Spoken Language Translation. Every year, the conference organizes and sponsors open evaluation campaigns around key challenges in simultaneous and consecutive translation, under real-time/low latency or offline conditions and under low-resource or multilingual constraints. System descriptions and results from participants’ systems and scientific papers related to key algorithmic advances and best practices are presented. IWSLT is the venue of SIGSLT, the Special Interest Group on Spoken Language Translation of ACL, ISCA and ELRA.

With a track record of 20 years, IWSLT benchmarks and proceedings serve as a reference for all researchers and practitioners working on speech translation and related fields.

The 21st edition of IWSLT <https://iwslt.org/2024/> will be run as an *ELRA/ACL* hybrid event, co-located with ACL 2024 <https://2024.aclweb.org/> on August 15-16, 2024.

Important Dates

- January 15, 2024: Release of shared task training and dev data
- April 01-15, 2024: Evaluation period
- April 29, 2024: Paper submission due (all papers)
- June 4, 2024: Notification of acceptance
- June 24, 2024: Camera-ready paper due
- July 22, 2024: Pre-recorded video due
- August 15-16, 2024: Conference

Evaluation

IWSLT 2024 features shared tasks <https://iwslt.org/2024/#shared-tasks> that address the following focus areas:

- Speech-to-speech track
- Simultaneous track
- Subtitling track
- Offline track
- Dubbing track
- Low-resource track
- Indic track

Training, development and test data for each shared task will be prepared and released by the respective organizers (for further information on this initiative, please refer to the website <https://iwslt.org/2024/>). Participants will receive instructions about how to submit their runs. In addition, participants have the opportunity to present their work through a system paper that will be published in the ACL Proceedings.

Conference

IWSLT also invites submissions of scientific papers to be published in the ACL Proceedings and presented either in oral or poster format. The conference selects high-quality, original contributions on theoretical and practical issues of spoken language translation research, technologies and applications. For further information on this initiative, please refer to the website <https://iwslt.org/2024/#paper-submission>

Contact

Please send an email to iwslt-evaluation-campaign(a)googlegroups.com if you have any questions related to the shared tasks.

Thanks,
Marine, Marcello, Alex, Jan, Sebastian, Elizabeth, Atul (IWSLT organisers)

CfP: Corpus Linguistics at the 21st International Congress of Linguists (ICL), 8–14 September 2024, Poznan
by maciej.ogrodniczuk＠gmail.com 16 Jan '24

The International Congress of Linguists (ICL) is organized once every five years as the meeting place for international linguistics, where all areas and sub-disciplines of linguistics as well as interdisciplinary topics can be discussed. Its 21st edition (https://icl2024poznan.pl/) will be held from 8 to 14 September 2024 in Poznań and now invites abstracts for Sections, Focus streams, and Workshops.

Call for Abstracts: Corpus Linguistics

Focus stream 8 invites abstracts of papers that examine the methods and applications of corpus linguistics. Topics may include the design and construction of corpora, the analysis and interpretation of corpus data, the use of corpus tools and software, and the implications of corpus findings for various linguistic domains and disciplines. The focus stream also explores the challenges and opportunities of corpus linguistics in the era of big data, artificial intelligence, and natural language processing.

Abstracts should clearly state the research question(s), approach, method, data, and (expected) results. They should not display the names of the presenters, nor their affiliations or addresses, or any other information that could reveal their authorship. They should contain the title, five keywords, and a text between 300 and 400 words (including examples, excluding references). Each abstract will be reviewed anonymously by two reviewers (section/focus stream/workshop convenor + external reviewer).

Important dates

Feb 1, 2024: (Extended) submission deadline (12.00 PM CET). Submission link: https://easychair.org/conferences/?conf=icl2024poznan
Apr 15, 2024: Notification of acceptance.
Sep 11, 2024: Focus stream date

Presentations and posters

Authors may apply, upon abstract submission, for a presentation or a poster. Presentations will be organized in 30 minute slots (20 min. presentation, 7 min. discussion, 3 min. room change). Posters are always displayed during one full day. Separate time slots will be included in the program in which participants can discuss with the poster presenters.

Best regards –
Maciej Ogrodniczuk
Convenor of FS8: Corpus Linguistics at ICL 2024

2nd CfP 5th workshop on Resources for African Indigenous Languages (RAIL) @ LREC-COLING
by Menno Van Zaanen 16 Jan '24

The fifth workshop on Resources for African Indigenous Languages (RAIL)
Colocated with LREC-COLING 2024
https://bit.ly/rail2024

Conference dates: 20-25 May 2024
Workshop date: 25 May 2024
Venue: Lingotto Conference Centre, Torino (Italy)

The fifth RAIL workshop website: https://bit.ly/rail2024
LREC-COLING 2024 website: https://lrec-coling-2024.org/
Submission website: https://softconf.com/lrec-coling2024/rail2024/

The fifth Resources for African Indigenous Languages (RAIL) workshop will be co-located with LREC-COLING 2024 in Lingotto Conference Centre, Torino, Italy on 25 May 2024. The RAIL workshop is an interdisciplinary platform for researchers working on resources (data collections, tools, etc.) specifically targeted towards African indigenous languages. In particular, it aims to create the conditions for the emergence of a scientific community of practice that focuses on data, as well as computational linguistic tools specifically designed for or applied to indigenous languages found in Africa.

Many African languages are under-resourced while only a few of them are somewhat better resourced. These languages often share interesting properties such as writing systems, or tone, making them different from most high-resourced languages. From a computational perspective, these languages lack enough corpora to undertake high level development of Human Language Technologies (HLT) and Natural Language Processing (NLP) tools, which in turn impedes the development of African languages in these areas. During previous workshops, it has become clear that the problems and solutions presented are not only applicable to African languages but are also relevant to many other low-resource languages. Because these languages share similar challenges, this workshop provides researchers with opportunities to work collaboratively on issues of language resource development and learn from each other.

The RAIL workshop has several aims. First, the workshop brings together researchers who work on African indigenous languages, forming a community of practice for people working on indigenous languages. Second, the workshop aims to reveal currently unknown or unpublished existing resources (corpora, NLP tools, and applications), resulting in a better overview of the current state-of-the-art, and also allows for discussions on novel, desired resources for future research in this area. Third, it enhances sharing of knowledge on the development of low-resource languages. Finally, it enables discussions on how to improve the quality as well as availability of the resources.

The workshop has “Creating resources for less-resourced languages” as its theme, but submissions on any topic related to properties of African indigenous languages (including non-African languages) may be accepted. Suggested topics include (but are not limited to) the following:

* Digital representations of linguistic structures
* Descriptions of corpora or other data sets of African indigenous languages
* Building resources for (under resourced) African indigenous languages
* Developing and using African indigenous languages in the digital age
* Effectiveness of digital technologies for the development of African indigenous languages
* Revealing unknown or unpublished existing resources for African indigenous languages
* Developing desired resources for African indigenous languages
* Improving quality, availability and accessibility of African indigenous language resources

Submission requirements:

We invite papers on original, unpublished work related to the topics of the workshop. Submissions, presenting completed work, may consist of up to eight (8) pages of content plus additional pages of references. The final camera-ready version of accepted long papers is allowed one additional page of content (up to 9 pages) so that reviewers’ feedback can be incorporated. Papers should be formatted according to the LREC-COLING style sheet (https://lrec-coling-2024.org/authors-kit/), which is provided on the LREC-COLING 2024 website (https://lrec-coling-2024.org/). Reviewing is double-blind, so make sure to anonymise your submission (e.g., do not provide author names, affiliations, project names, etc.). Limit the amount of self citations (anonymised citations should not be used). The RAIL workshop follows the LREC-COLING submission requirements.

Please submit papers in PDF format to the START account (https://softconf.com/lrec-coling2024/rail2024/). Accepted papers will be published in proceedings linked to the LREC-COLING conference.

Important dates:

Submission deadline: 16 February 2024
Date of notification: 15 March 2024
Camera ready deadline: 29 March 2024
RAIL workshop: 25 May 2024

Organising Committee

Rooweither Mabuya, South African Centre for Digital Language Resources (SADiLaR), South Africa
Muzi Matfunjwa, South African Centre for Digital Language Resources (SADiLaR), South Africa
Mmasibidi Setaka, South African Centre for Digital Language Resources (SADiLaR), South Africa
Menno van Zaanen, South African Centre for Digital Language Resources (SADiLaR), South Africa

--
Prof Menno van Zaanen
menno.vanzaanen(a)nwu.ac.za
Professor in Digital Humanities
South African Centre for Digital Language Resources
https://www.sadilar.org

CfP Second International Workshop Towards Digital Language Equality (TDLE): Focusing on Sustainability @ LREC-COLING, Turin (Italy), Saturday 25th May 2024.
by fgaspari＠sslmit.unibo.it 16 Jan '24

[Apologies for cross-postings]

CALL FOR PAPERS FOR THE SECOND INTERNATIONAL WORKSHOP TOWARDS DIGITAL LANGUAGE EQUALITY (TDLE): FOCUSING ON SUSTAINABILITY

co-located with LREC-COLING 2024, Saturday 25th May 2024, Turin (Italy)

https://european-language-equality.eu/tdle-2024/

1 DESCRIPTION AND AIMS OF THE WORKSHOP

The key aim of this half-day workshop co-located with LREC-COLING 2024 (https://lrec-coling-2024.org/), to be held in Turin (Italy) on Saturday 25th May 2024, is to discuss and promote the importance of sustainability in the design, development, creation, use, distribution and sharing of language data, resources, platforms, infrastructures, tools and technologies, with the intention of achieving Digital Language Equality (DLE). While some important work has recently addressed these crucial areas (e.g. Fort and Couillault, 2016; Hessenthaler et al., 2022; Ramesh et al., 2023; Castilho et al., forthcoming), the relevant contributions seem to be as yet unsystematic and relatively isolated. The workshop intends to provide an inclusive forum to encourage in-depth debate and facilitate collaborations to promote the sustainability of resources and technologies in any (combination of) languages, in support of multilingualism and of the overarching goal of DLE.

_The sustainability of language resources and technologies is key to enabling multilingualism and digital language equality in the age of Artificial Intelligence._

2 TOPICS OF INTEREST

The _Second International Workshop Towards Digital Language Equality (TDLE)_ focuses on sustainability in relation to the design, development, creation, use, distribution and sharing of language data, resources, platforms, infrastructures, tools and technologies, with a view to promoting the broader goal of Digital Language Equality (DLE). The concept of DLE has been firmly established in relation to all languages of Europe (Rehm and Way, 2023), and has the potential to also benefit other languages throughout the world, to support the prosperity of the respective communities at a time of impressive - but as yet very unevenly distributed and severely imbalanced - progress in language-centric Artificial Intelligence (AI), e.g. through large language models (LLMs).

The workshop places particular emphasis on multilingualism and on leveling up digital support for languages, domains and applications that have so far been underserved, and wishes to explore ways to develop policies and funding streams to work towards sustainability in connection with DLE, especially in support of regional, minority and territorial languages. To this end, recognizing that the sustainability of Language Resources and Technologies (LRTs) is key to enabling multilingualism and DLE in the age of AI, topics of particular interest for the workshop on which we invite original contributions covering any (combination of) languages include, but are not limited to, the following:

* research on the factors affecting DLE and the sustainability of LRTs;
* best practices, case studies and validated guidelines related to the design, implementation and improvement of sustainability of written, oral/spoken, signed and/or multimodal LRTs (including LLMs), particularly in support of DLE;
* how multilingual LLM technology can support DLE;
* retrospectively assessing the sustainability of legacy LRTs, and future-proofing new LRTs in the interest of DLE;
* analyzing the costs and benefits of foregrounding sustainability for LRTs;
* the role of metadata, accompanying documentation and licenses in showing and improving the sustainability of LRTs;
* sustainability, fairness and accessibility (e.g. for users with physical or cognitive disabilities, limited computing resources and connectivity) of platforms and infrastructures hosting, distributing and sharing LRTs in the interest of DLE;
* how current data and computing access inequality is affecting DLE (in particular regarding LLMs);
* ecological sustainability and environmental fairness of developing and deploying state-of-the-art LRTs, e.g. LLMs with regard to energy consumption, global warming and climate change;
* developing data and parameter efficient methods to train or adapt language models to new languages;
* how to evaluate, measure, compare and improve the sustainability of LRTs;
* establishing benchmarks and protocols to ensure the sustainability of LRTs;
* how to avoid the potential dangers of developing and using _un_fair and _un_sustainable LRTs, e.g. for malicious, ill-intentioned or harmful purposes;
* ethical, legal, cultural and/or socio-economic implications of (ignoring) fairness and sustainability of LRTs;
* developing and implementing forward-looking policies to promote fairness and long-term sustainability of LRTs to achieve DLE;
* education and training needs and experiences in relation to promoting fairness and sustainability of LRTs and ways to raise broad awareness of DLE and related topics, e.g. among the general public, policy- and decision-makers.

Given this wide-ranging and inclusive remit, the workshop intends to bring together developers, creators, vendors, distributors, brokers, users, evaluators and researchers of written, oral/spoken, signed and/or multimodal LRTs in any (combination of) languages.

3 BACKGROUND AND FIRST TDLE WORKSHOP HELD IN 2022

The second 2024 edition of the workshop builds on the success of the first _Towards Digital Language Equality (TDLE) workshop_,[1] that was held at LREC 2022 in Marseille (France) on 20 June 2022, and whose accepted papers were published in a dedicated volume of proceedings, Aldabe et al. (2022).[2] Following this well-received inaugural workshop held in June 2022, the second event in the series will be co-located with LREC-COLING 2024 in Turin (Italy) on Saturday 25th May 2024, and will focus specifically on the highly relevant topic of the sustainability of LRTs in connection with multilingualism and DLE.

4 SUBMISSIONS

Up-to-date information on the workshop, including materials for authors, guidelines, templates, stylesheet and key dates can be found at the dedicated website https://european-language-equality.eu/tdle-2024/. To contact the organizing committee of the workshop directly, you can email tdle2024.hitz(a)ehu.eus.

Papers submitted to the workshop should be completely anonymous for double-blind peer review, written in English, and prepared using the official LREC-COLING 2024 author's kit and submission stylesheet/template available at https://lrec-coling-2024.org/authors-kit/. The submissions to the workshop should not exceed 8 pages, excluding references, and be saved in unprotected PDF format. Papers should be submitted no later than 23 February 2024 through the START submission management system available at https://softconf.com/lrec-coling2024/tdle2024/.

The workshop seeks original papers, i.e. it does not accept submissions that have been, or will be, published elsewhere. The workshop allows simultaneous submissions, and in these cases the authors should clearly indicate in the manuscript to which other conference, workshop or venue they have submitted the paper for review. Each paper submitted to the workshop will receive three double-blind peer reviews. Papers accepted for presentation will be included in the proceedings of the workshop.

In light of the LREC-COLING 2024 Map and the "Share your LRs!" initiative, when submitting their papers through the START system authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or are a new result of their research. Moreover, ELRA encourages all LREC-COLING authors to share the described LRs (data, tools, services, etc.) to enable their reuse and replicability of experiments (including evaluation ones).

5 KEY DATES

Paper submission deadline: 23 February 2024
Notification of acceptance: 19 March 2024
Camera-ready papers due: 8 April 2024
Half-day workshop date: Saturday, 25th May 2024

6 WORKSHOP ORGANIZERS

* Itziar Aldabe (HiTZ Basque Center for Language Technology - Ixa, University of the Basque Country, Spain)
* Begoña Altuna (HiTZ Basque Center for Language Technology - Ixa, University of the Basque Country, Spain)
* Aritz Farwell (HiTZ Basque Center for Language Technology - Ixa, University of the Basque Country, Spain)
* Federico Gaspari (University of Naples "Federico II", Italy & ADAPT Centre, Dublin City University, Ireland - co-chair)
* Joss Moorkens (School of Applied Language & Intercultural Studies/ADAPT Centre, Dublin City University, Ireland - co-chair)
* Stelios Piperidis (Institute of Language and Speech Processing, Athena Research and Innovation Center in Information, Communication and Knowledge Technologies, Greece)
* Georg Rehm (Speech and Language Technology Lab, Deutsches Forschungszentrum für Künstliche Intelligenz, Germany)
* German Rigau (HiTZ Basque Center for Language Technology - Ixa, University of the Basque Country, Spain)

7 PROGRAM COMMITTEE

* Antonios Anastasopoulos (GMU, USA)
* Anya Belz (ADAPT, DCU, Ireland)
* Steven Bird (CDU, Australia)
* Fred Blain (Uni. Tilburg, Netherlands)
* Franco Cutugno (Uni. Naples "Federico II", Italy)
* Bessie Dendrinos (NKUA, Greece & ECSPM, Denmark)
* Félix do Carmo (Uni. Surrey, UK)
* Annika Grützner-Zahn (DFKI, Germany)
* Ana Guerberof-Arenas (Uni. Groningen, Netherlands)
* Davyth Hicks (ELEN, Belgium)
* Monja Jannet (ADAPT, DCU, Ireland)
* John Judge (ADAPT, DCU, Ireland)
* Dorothy Kenny (SALIS/CTTS/ADAPT, DCU, Ireland)
* Sabine Kirchmeier (EFNIL, Luxembourg)
* Teresa Lynn (MBZUAI, United Arab Emirates)
* Maite Melero (BSC, Spain)
* Helena Moniz (Uni. Lisbon, Portugal & EAMT)
* Johanna Monti (UniOR, Italy)
* Rachele Raus (UniBO, Italy)
* Wessel Reijers (Uni. Paderborn, Germany)
* Celia Rico Pérez (Universidad Complutense de Madrid, Spain)
* Dimitar Shterionov (TU, Netherlands)
* Carlos S. C. Teixeira (IOTA Localisation Services & Uni. Rovira i Virgili, Spain)
* Antonio Toral (Groningen, Netherlands)
* Vincent Vandeghinste (Instituut voor de Nederlandse Taal, Netherlands & KU Leuven, Belgium)

REFERENCES

Itziar Aldabe, Begoña Altuna, Aritz Farwell and German Rigau, editors. 2022. _Proceedings of the Workshop Towards Digital Language Equality (TDLE)_ [1]. European Language Resources Association, Marseille, France.

Sheila Castilho, Federico Gaspari, Joss Moorkens, Maja Popović and Antonio Toral, editors. Forthcoming. _Journal of Specialised Translation_ [2]. Special Issue n. 41 on "Translation Automation and Sustainability".

Karën Fort and Alain Couillault, 2016. "Yes, We Care! Results of the Ethics and Natural Language Processing Surveys [3]". _Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)_ [4]. European Language Resources Association, Portorož, Slovenia. 1593-1600.

Marius Hessenthaler, Emma Strubell, Dirk Hovy and Anne Lauscher, 2022. "Bridging Fairness and Environmental Sustainability in Natural Language Processing [5]". _Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing_ [6], Abu Dhabi, United Arab Emirates. 7817-7836.

András Kornai, 2013. "Digital Language Death [7]". _PLoS ONE_, 8(10):e77056.

Krithika Ramesh, Sunayana Sitaram and Monojit Choudhury, 2023. "Fairness in Language Models Beyond English: Gaps and Challenges [8]". _Findings of the Association for Computational Linguistics: EACL 2023_ [9]. Association for Computational Linguistics, Dubrovnik, Croatia. 2106-2119.

Georg Rehm and Andy Way, editors. 2023. _European Language Equality: A Strategic Agenda for Digital Language Equality_ [10]. Berlin: Springer.

Footnotes:
[1] https://european-language-equality.eu/tdle-2022/
[2] www.lrec-conf.org/proceedings/lrec2022/workshops/TDLE/2022.tdle-1.0.pdf [11]

Links:
------
[1] https://aclanthology.org/2022.tdle-1.pdf
[2] https://www.jostrans.org/
[3] https://aclanthology.org/L16-1252.pdf
[4] https://aclanthology.org/volumes/L16-1/
[5] https://aclanthology.org/2022.emnlp-main.533.pdf
[6] https://aclanthology.org/volumes/2022.emnlp-main/
[7] https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0077056
[8] https://aclanthology.org/2023.findings-eacl.157.pdf
[9] https://aclanthology.org/2023.findings-eacl.pdf
[10] https://link.springer.com/book/10.1007/978-3-031-28819-7
[11] http://www.lrec-conf.org/proceedings/lrec2022/workshops/TDLE/2022.tdle-1.0.…

[PhD-Position] Multimodal Fake-News and Disinformation Detection incl. xAI in DFKI Berlin
by Tim Polzehl 16 Jan '24

Job offer: Researcher for Multimodal Fake-News and Disinformation Detection at DFKI Berlin

The German Research Center for Artificial Intelligence (DFKI) has operated as a non-profit, Public-Private-Partnership (PPP) since 1988. DFKI combines scientific excellence and commercially-oriented value creation with social awareness and is recognized as a major "Center of Excellence" by the international scientific community. In the field of artificial intelligence, DFKI, as Germany’s biggest public and independent organisation dedicated to AI research and development, has focused on the goal of human-centric AI for more than 30 years. Research is committed to essential, future-oriented areas of application and socially relevant topics.

We are looking for a highly motivated research assistant to join our existing team and work on a project focused on fake-news and disinformation detection from speech and multimedia data. Content authenticity verification of speech combined with other modalities like text, visuals or meta-data will be a central part. In any case, xAI and bias analysis are aspects of high relevance to the position as well. The successful candidate will work closely with high-impact partners in this field, e.g. Technical University of Berlin, RBB (Berlin TV and news broadcaster), Deutsche Welle (Germany's broadcaster abroad), and 5 other partners.

Responsibilities will include developing and testing different AI/NLP models and techniques, analyzing the performance of machine learning models in the context of applicable fake-news and disinformation fighting for journalists, and communicating project progress and results to relevant stakeholders. The position offers opportunities for pursuing a doctorate and publishing research results in scientific journals and conferences.

Qualified candidates will have a completed university degree in (technical) computer science or computational linguistics, excellent programming skills in Python, and a strong background in machine learning/AI and signal processing or NLP. Previous experience in the field of fake-news or spoofing / authenticity detection of multimedia data is an advantage.

DFKI offers an agile and lively international and interdisciplinary environment for working in a self-determined manner. If you are interested in contributing to cutting-edge research and working with a dynamic team, please apply!

More details and link: https://jobs.dfki.de/en/vacancy/researcher-m-f-d-547585.html

Application deadline: Jan 23, 2024.

If you have any questions, please don’t hesitate to contact tim.polzehl(a)dfki.de

--
Dr.-Ing. Tim Polzehl
Senior Researcher
Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI)
German Research Center for Artificial Intelligence
Speech & Language Technology

Associate Senior Researcher
Technische Universität Berlin
Quality and Usability Lab

DFKI Labor Berlin
Alt-Moabit 91c, D-10559 Berlin, Germany
Tel.: +49.30.238951863
Fax: +49 30 23895 1810
E-Mail tim.polzehl(a)dfki.de

-------------------------------------------------------------
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH
Trippstadter Straße 122, 67663 Kaiserslautern, Germany
Geschäftsführung: Prof. Dr. Antonio Krüger (Vorsitzender), Helmut Ditzer
Vorsitzender des Aufsichtsrats: Dr. Ferri Abolhassan
Amtsgericht Kaiserslautern, HRB 2313
-------------------------------------------------------------

1st CfP: Special Session on Emergent Phenomena in Deep Representations and Large Language Models @IJCNN 2024 & IEEE WCCI 2024
by Ozge Alacam 16 Jan '24

Apologies for cross-posting

------------------------------------------------------

Dear colleagues,

We invite you to submit to the special session on “Emergent Phenomena in Deep Representations and Large Language Models” as a part of IJCNN 2024 and IEEE WCCI 2024, which will be held in Yokohama, Japan. We are looking forward to your contributions. Please find the CfP below.

Best wishes,
On behalf of the Organising Committee
Özge Alacam

------------------------------------------------------

First Call for Papers: Special Session on Emergent Phenomena in Deep Representations and Large Language Models @IJCNN 2024 & IEEE WCCI 2024

Deep learning models trained on large datasets have shown spectacular performance in a wide range of tasks, as demonstrated by current applications of Large Language Models. However, recent works have shown that the abilities large machine learning models acquire often emerge unpredictably with increasing model complexity or training dataset size. These emergent phenomena include the unexpected appearance of abilities for which the model was not explicitly trained, but they might also be related to unexpected performance boosts due to the increased model complexity. Emergent phenomena are not always beneficial: larger models may pick up new biases from the training data or start hallucinating.

To move towards increasingly sustainable, reliable, and explainable applications of AI systems, it is necessary to increase the understanding of the mechanisms surrounding emergent phenomena. Moreover, this effort provides increased insight into the learning process behind the acquisition of abilities of large models to perform specific tasks. Important research questions relate to the definition of emergent phenomena, their causes (what controls which abilities are acquired and when?), training efficiency, and training data quality (e.g., acquiring desired abilities with less computational effort), prompting strategies to get or test for desired model behaviour (e.g., a chain of thought), and further verification methods of model abilities and properties.

The primary goal of this special session is (i) to discuss the emergent abilities and risks in deep neural networks and representations from very different angles and (ii) to facilitate networking and encourage collaboration between various research fields that approach this issue from different perspectives, like computational linguistics, ethics in AI, computer science, physics, etc.

Topics of interest include, but are not limited to:

• The definition of emergence in the context of NLP and ML
• Prompting strategies
• Physics-based/inspired analyses (e.g. phase transitions in ML models)
• Explainability and interpretability (XAI)
• Evaluation measures for model ability, monitoring strategies, assessment of model abilities (e.g. technical or psychology-based)
• Knowledge distillation, model pruning, energy-efficient models
• Mitigation strategies for emergent risks and model deterioration
• Fine-tuning and Retrieval-augmented generation (RAG)
• Papers focusing on specific emergent phenomena (reasoning, creativity, double descent phenomena etc.)

The website for the call for papers is accessible at https://sites.google.com/view/emergenn/call-for-papers

Organising Committee:
------------------------------
• Dr. Özge Alacam (Ludwig-Maximilian University & Uni Bielefeld, Germany)
• Dr. Michiel Straat (Uni Bielefeld, Germany)
• Prof. Dr. Hinrich Schütze (Ludwig-Maximilian University, Germany)
• Prof. Dr. Alessandro Sperduti (University of Padova, Italy)

Important Dates:
------------------------------
• January 15, 2024 - Paper Submission Deadline
• March 15, 2024 - Notification of Acceptance
• May 1, 2024 - Camera-ready Deadline & Early Registration Deadline
• June 30 - July 5, 2024 - Main Conference (IEEE WCCI 2024, Yokohama, Japan)
* All deadlines are 11:59 PM UTC-12:00 ("anywhere on Earth")

Submission Format and Platform:
------------------------------
• Submissions will be through the IEEE WCCI 2024 Submission page <https://edas.info/login.php?rurl=aHR0cHM6Ly9lZGFzLmluZm8vTjMxNjE0P2M9MzE2MT…>.
• Each paper is limited to 8 pages, including figures, tables, and references. Please refer to the author guidelines provided by IEEE WCCI 2024.
• Please specify during the submission that your paper is intended for the Special Session: Emergent Phenomena in Deep Representations and Large Language Models.
• Special session webpage: https://sites.google.com/view/emergenn/call-for-papers
• IEEE WCCI 2024 webpage: https://2024.ieeewcci.org/

Contact information:
------------------------------
• Özge Alacam : oezge.alacam(a)uni-bielefeld.de
• Michiel Straat : mstraat(a)techfak.uni-bielefeld.de

CODI workshop: Call for direct submissions
by Chloé Braud 16 Jan '24

CODI, 5th Workshop on Computational Approaches to Discourse
2024-03-21 or 22 - EACL 2024 - Malta

** Direct Submission deadline: January 17th, 2024 **

Direct submission: We are now open to submissions of papers rejected at another main conference.

Website link: https://sites.google.com/view/codi2024

CODI considers for publication papers rejected at one of the main conferences. Authors must submit both the paper and the reviews, the latter as a supplementary PDF file. If modifications have been made since the original submission, please submit an additional file briefly describing them. The organizers will decide on acceptance based on the quality of the paper and its fit with the workshop.

As a reminder, CODI also invites presentations of papers accepted at another main conference. They will be included in the workshop program and handbook, but will not appear in the workshop proceedings.

Please submit your workshop papers (category: "direct submission") at https://softconf.com/eacl2024/CODI-2024/

Job advertisement, UK Civil Service
by Paul Thompson 16 Jan '24

DSTL (Defence Science and Technology Laboratory, part of the UK Civil Service) is advertising for a computational/corpus linguist to work in their 'Behavioural and Social Science Group'. They are looking for a linguist who understands the potential for (and limitations of) computational approaches to discourse and who would be comfortable interacting with computer / data scientists.

Details can be found at: https://www.civilservicejobs.service.gov.uk/csr/index.cgi?SID=b3duZXJ0eXBlP…

Regards
Paul Thompson

= = = = = = = = = = = = =
Dr Paul Thompson
Reader in Applied Corpus Linguistics
Co-Director, Centre for Corpus Research
Head of Department, English Language and Linguistics
University of Birmingham
Birmingham B15 2TT, UK
Editor-in-Chief, Applied Corpus Linguistics journal
= = = = = = = = = = = = =

CfP: 6th International Workshop on Geospatial Linked Data at ESWC2024
by Beyza Yaman 16 Jan '24

Apologies for cross-posting!

GeoLD2024: 6th International Workshop on Geospatial Linked Data
Hersonissos, Greece, May 26-27, 2024

Conference website: https://i3mainz.github.io/GeoLD2024/
Submission link: https://easychair.org/conferences/?conf=geold2024
Submission deadline: March 10, 2024

GeoLD2024
*6th International Workshop on Geospatial Linked Data* at ESWC 2024 <https://2024.eswc-conferences.org/>

Geospatial data is vital for both traditional applications like navigation, logistics, and tourism and emerging areas like autonomous vehicles, smart buildings and GIS on demand. Spatial linked data has recently transitioned from experimental prototypes to national infrastructure. However, the next generation of spatial knowledge graphs will integrate multiple spatial datasets with the large number of general datasets that contain some geospatial references (e.g., DBpedia, Wikidata). This integration, either on the public Web or within organizations, has immense socio-economic as well as academic benefits. The upsurge in Linked Data related presentations at the recent Eurogeographics data quality workshop shows the deep interest in Geospatial Linked Data (GLD) among national mapping agencies. GLD enables a web-based, interoperable geospatial infrastructure. This is especially relevant for delivering the INSPIRE directive in Europe. Moreover, geospatial information systems benefit from Linked Data principles in building the next generation of spatial data applications, e.g., federated smart buildings, self-piloted vehicles, delivery drones or automated local authority services.

This workshop invites papers covering the challenges and solutions for handling GLD, especially for building high-quality, adaptable geospatial infrastructures and next-generation spatial applications. We aim to demonstrate the latest approaches and implementations and to discuss the solutions to challenges and issues arising from research and industrial organizations.

The following topics of interest are covered by GeoLD2024.

*Interoperability and Integration*
- Geospatial Linked Data vocabularies and standards (GeoSPARQL, INSPIRE, W3C, OGC)
- Extraction/transformation of Geospatial Linked Data from native geospatial data sources
- Integration (schema mapping, interlinking, fusion) techniques for Geospatial RDF Data
- Enrichment, quality and evolution of Linked Data with Geospatial information
- Machine Learning improving Geospatial Linked Data processing
- Natural Language Processing, especially Large Language Models for improving GLD processing

*Big Geospatial Data Management*
- Distributed solutions for Geospatial Linked Data management (storing, querying, mapping)
- Algorithms and tools for large scale, scalable Geospatial Linked Data management
- Efficient Indexing and Querying of Geospatial Linked Data
- Geospatial-specific Reasoning on RDF Data
- Ranking techniques on querying Geospatial RDF Data
- Advanced querying capabilities on Geospatial RDF Data

*Utilization of Geospatial Linked Data*
- Benchmarking of Geospatial Linked Data applications
- Geospatial Linked Data in social web platforms and applications
- Geospatial linked data applications for indoor navigation
- Visualization models/interfaces for browsing/authoring/querying Geospatial Linked Data
- Real-world applications/use cases/paradigms using Geospatial Linked Data
- Evaluation/comparison of tools/libraries/frameworks for Geospatial Linked Data
- Data governance models for Geospatial Linked Data

Submission Guidelines

All papers must be original and not simultaneously submitted to another journal or conference. The following paper categories are welcome:
- *Long papers (up to 12 pages)*: Presenting novel scientific research pertaining to geospatial Linked Data.
- *Short papers (up to 6 pages)*: Position papers, System, Library, API and Dataset descriptions, relevant to the topics of interest.
- *Demo/Tutorial papers (up to 4 pages)*: Describe a demo or hands-on tutorial of a tool on the workshop topics.

Organizing committee
- Timo Homburg (i3mainz -- Institute for Spatial Information Surveying Technology, Mainz University Of Applied Sciences, Germany)
- Dr. Beyza Yaman (ADAPT Centre, Trinity College Dublin, Ireland)
- Dr. Mohamed Ahmed Sherif (University of Paderborn, Germany)
- Prof. Dr. Axel-Cyrille Ngonga Ngomo (University Of Paderborn, Germany)

Contact

All questions about submissions should be emailed to Timo.Homburg(a)hs-mainz.de

Call for Abstracts: NLP STANDardization workshop in support of the EU AI Act, 29/01, Paris, France
by Timothée Bernard 15 Jan '24

STAND Workshop on Standardizing Tasks, meAsures and NLP Datasets
https://stand4nlp.github.io/

Full-day workshop in Paris, France, January 29th 2024 (+ partial hybrid)
Abstract submission deadline: January 24th 2024, but earlier submissions are welcome

Scientific context:

The current lack of standardized practices and definitions in NLP systems hinders the progress of the field. Indeed, there is not always consensus on which evaluation methods are meaningful and fruitful, or which of their implementations are to be used with which parameters (e.g. SacreBLEU, Post 2018). In some cases, there is no general agreement on the very definition of a task.

This situation calls for work on *standardizing* NLP practices. The International Organization for Standardization (ISO) has just created *a dedicated working group on NLP* (as a joint effort of the AI and Language committees), and *2 standards* are already under way. Topics under consideration by the ISO standardization committees include NLP terminology, evaluation metrics, interoperability, annotation guidelines, good practices in NLP development/evaluation/corpora, and documentation. These topics are already heavily discussed in academia, and a number of informal guidelines have already been proposed.

We believe that the creation of NLP standards can significantly benefit from the input of both NLP academics and industry NLP practitioners. Reciprocally, NLP researchers would benefit from getting involved in the standardization effort, thus ensuring that academia's views are listened to, in particular in the context of the *AI Act* (the European regulation on AI that was finalized in December), whose enforcement will strongly rely on those standards.

The STAND workshop is a research initiative whose goals are:
- to foster discussion on existing standards, their creation and use
- to assess the current needs of the community for standardization
- to share experience of how a lack of good practices affects research activities
- to collect existing good practices (and propose new ones)

We invite contributions from NLP practitioners from both industry and academia, as well as standardization experts.

We invite two types of submission:
* short abstract: 1 page
* long abstract: 3 pages

Accepted submissions will be presented as posters. Authors accepted in the long-abstract track will be invited to submit a full paper (5-10 pages) after the workshop.

Topics for submissions include, but are not limited to:
- Comparability and reproducibility of evaluation setup
- Annotation guidelines
- Evaluation metrics
- Good practices for building, annotating and maintaining corpora
- Good practices for system evaluation
- Interoperability
- Ethical guidelines
- Guidelines for documenting corpora and models

Submission instructions:
- Submissions are expected in PDF form by email to stand4nlp(a)inria.fr
- All submissions should be formatted using the ACL 2023 style files https://2023.aclweb.org/calls/style_and_formatting/.

============

PROGRAM AT A GLANCE:

[09:00-10:00] Welcome, introduction to standardization, ongoing activities in NLP standardization, and the AI Act context
[10:15-11:50] Academic keynote (*Joakim Nivre*) and invited talks (*Matt Post*, other speaker TBC)
[11:50-13:30] Poster session (with boosters) & lunch
[13:30-14:40] Industry keynote (speaker TBC) and invited talk (*Dirk Hovy*)
[15:00-16:30] Moderator-led breakout discussions. Potential topics that will be discussed include:
- [sharing / drafting] Standardizing good practices for evaluation
- [sharing / drafting] Standardizing good practices for corpus management (collection, annotation, versioning)
- [sharing / drafting] Standardizing evaluation metrics (definitions, implementation, sharing scripts)
- [sharing / drafting] Standardizing annotation schemes (formats and guidelines)
- [debate] Explainability and ethics in NLP: what are the needs for standards?
- [debate] Comparing standardization needs with limitations of the state-of-the-art: how to bridge the gap?
- [debate] Towards standardizing translations of technical terminology in NLP: how to organize i18n?
[16:30-17:30] Reports from breakouts, definition of community-level actions & wrap-up. Example outcomes that are envisioned include:
- Collection and drafting of existing good practices
- Preparation of a joint submission for a position paper
- Creation of common repositories for evaluation scripts, corpus documentation

Workshop participants will be offered the opportunity to attend a standardization committee's meeting, which has been scheduled for the day after the workshop (January 30th). The outputs of that meeting will be used in direct support of the AI Act.

Remote access will be offered for part of the workshop only. In-person participation is recommended if possible. Posters will be in-person only.

IMPORTANT DATES:
Abstract submission: Anytime by January 24
Notification of acceptance: Within a few days of submission
Workshop: January 29
Standardization committee meeting: January 30

ORGANISING COMMITTEE:
Lauriane Aufrant, Timothée Bernard, Maximin Coavoux, Yoann Dupont, Arnaud Ferré, Taras Holoyad, Rania Wazir

MORE INFORMATION
For the latest information see the workshop page at https://stand4nlp.github.io/; for any questions contact stand4nlp(a)inria.fr.
