- Corpora - ELRA lists

Invitation to EACL-SRW 2023 Career Panel Discussion
by Elisa Bassignana 28 Apr '23


Dear all, The Student Research Workshop at EACL 2023 is organizing a career panel discussion on May 4th at 11:15-12:45 CEST. We gathered a great set of panelists at different levels of seniority and from different affiliation types (see below). We invite students (but not limited to) attending EACL to take part in the panel where there will be room for questions from the audience. The SRW EACL 2023 panelists will include: Saif M. Mohammad Dr. Saif M. Mohammad is a Senior Research Scientist at the National Research Council Canada (NRC). He received his Ph.D. in Computer Science from the University of Toronto. Before joining NRC, he was a Research Associate at the Institute of Advanced Computer Studies at the University of Maryland, College Park. His research interests are in Natural Language Processing (NLP), especially Lexical Semantics, Emotions and Language, Computational Creativity, AI Ethics, NLP for psychology, and Computational Social Science. He is currently an associate editor for Computational Linguistics, JAIR, and TACL, and Senior Area Chair for ACL Rolling Review. Joakim Nivre Joakim Nivre is Professor of Computational Linguistics at Uppsala University and Senior Researcher at RISE (Research Institutes of Sweden). He holds a Ph.D. in General Linguistics from the University of Gothenburg and a Ph.D. in Computer Science from Växjö University. His research focuses on data-driven methods for natural language processing, in particular for morphosyntactic and semantic analysis. He is one of the main developers of the transition-based approach to syntactic dependency parsing, described in his 2006 book Inductive Dependency Parsing and implemented in the widely used MaltParser system, and one of the founders of the Universal Dependencies project, which aims to develop cross-linguistically consistent treebank annotation for many languages and currently involves over 130 languages and over 500 researchers around the world. 
He has produced nearly 300 scientific publications and has over 22,000 citations according to Google Scholar (April, 2023). He is a fellow of the Association for Computational Linguistics and was the president of the association in 2017. Ana Marasović Ana Marasović is an Assistant Professor in the Kahlert School of Computing at the University of Utah. Her primary research interests are at the confluence of NLP, explainable AI, and multimodality. She aims to rigorously validate AI technologies and make human interaction with AI more intuitive. She was a Young Investigator at the Allen Institute for AI from 2019–2022. During that time, she also had a courtesy appointment in the Paul G. Allen School of Computer Science & Engineering at the University of Washington. She obtained her PhD in 2019 from Heidelberg University. She received Best Paper Honorable Mention at ACL 2020 and Best Paper Award at SoCal 2022 NLP Symposium. Christos Christodoulopoulos Christos Christodoulopoulos is a Senior Applied Scientist at Amazon Research Cambridge, working on knowledge extraction and verification. He got his PhD at the University of Edinburgh, where he studied the underlying structure of syntactic categories across languages. Before joining Amazon, he was a postdoctoral researcher at the University of Illinois working on semantic role labeling and psycholinguistic models of language acquisition. He has been a co-organiser of the FEVER workshops, an area chair for various ACL conferences, and the general chair for the 2021 Truth and Trust Online conference. André Martins André Martins (PhD 2012, Carnegie Mellon University and University of Lisbon) is an Associate Professor at Instituto Superior Técnico, University of Lisbon, researcher at Instituto de Telecomunicações, and the VP of AI Research at Unbabel. 
His research, funded by an ERC Starting Grant (DeepSPIN) and other grants (P2020 project Unbabel4EU and CMU-Portugal project MAIA), includes machine translation, quality estimation, and structure and interpretability in deep learning systems for NLP. His work has received best paper awards at ACL 2009 (long paper) and ACL 2019 (system demonstration paper). He co-founded and co-organizes the Lisbon Machine Learning School (LxMLS), and he is a Fellow of the ELLIS society.

Student Research Workshop Co-Chairs
Elisa Bassignana, IT University of Copenhagen
Matthias Lindemann, University of Edinburgh
Alban Petit, University of Paris-Saclay

Student Research Workshop Faculty Advisors
Valerio Basile, University of Turin
Natalie Schluter, Apple and IT University of Copenhagen

Contact
The organizers of the workshop can be contacted by email at eacl.srw23(a)gmail.com
More details can be found at https://sites.google.com/view/eacl2023srw


[Touché@CLEF 2023 Deadline Extension] Intra-Multilingual Stance Classification in Online Debates (apologies for cross-posting)
by valbarrierepro＠gmail.com 27 Apr '23


News: The submission deadline has been extended to the 12th of May.

We invite you to participate in our multilingual stance classification shared task, as part of the Touché Lab, which will be held in conjunction with the CLEF'23 conference in Thessaloniki, Greece [1].

Context: Participatory Democracy at the scale of a continent like Europe brings many difficulties due to the high diversity of languages and cultures. At the same time, Machine Learning is an interesting tool for stance recognition in a large-scale context, in terms of data size, but also regarding the topics and themes addressed or the languages employed by the participants. Public consultations of citizens using Online Participatory Democracy platforms offer this kind of setting and are good use cases for automatic stance recognition systems. In the context of the Touché Lab at CLEF 2023 [2], we are proposing a shared task on data coming from the platform used during the Conference for the Future of Europe [3], which was inaugurated in 2021, where users can submit proposals and comment on them in any of the 24 official EU languages. A particularity of this platform is the use of a Machine Translation system that lets users interact with each other in their native languages, leading to what we call Intra-Multilingual data: pairs of proposal and comment in different languages.

[1] https://clef2023.clef-initiative.eu/
[2] https://touche.webis.de/
[3] https://futureu.europa.eu/

Tasks: Given a proposal on a socially important issue, the task is to classify whether a comment is in favor, against, or neutral towards the proposal.
Subtask1: Cross-debate Stance Classification.
Subtask2: All-data-available Classification

Learn more about this and other argumentation- and causality-related tasks at https://touche.webis.de/
Data available at https://touche.webis.de/clef23/touche23-web/multilingual-stance-classificat…
Register via the CLEF website: https://clef2023-labs-registration.dei.unipd.it/

-------------------------------------------------------------------------------
Important Dates
-------------------------------------------------------------------------------
Now open: Registration
Jan. 15, 2023: Development data available
May 10, 2023: Test data available
May 12, 2023: Approaches submission on the test data
June 5, 2023: Participant paper submission
July 7, 2023: Camera-ready participant papers submission
Sep. 18-21, 2023: Conference
One of the conference days: Touché Workshop on Argument and Causal Retrieval

-------------------------------------------------------------------------------
Special Announcements
-------------------------------------------------------------------------------
Touché Open Source Proceedings
Touché will host a collection of software developed by participants at GitHub. The Touché team invite you to publish your software too and invite software submissions using TIRA [ https://www.tira.io/ ].

In case of questions / suggestions / etc., please reach us at touche(a)webis.de.

Best regards,
CoFE Team @ Touché


PAN 2023 Shared Tasks on Authorship Verification, Author Profiling, Multi-Author Analysis, and Trigger Detection.
by Matti Wiegmann 27 Apr '23


-------------------------------------------------------------------------------
PAN 2023: 2nd Call for Participation
-------------------------------------------------------------------------------
PAN is a series of shared tasks on authorship analysis, computational ethics, and originality. PAN 2023 will be held in conjunction with the CLEF conference in Thessaloniki. We'd like to invite you to participate in the following shared tasks:

1. Cross-Discourse Type Authorship Verification
   Given two texts from written and oral Discourse Types, determine if they are written by the same author.
2. Profiling Cryptocurrency Influencers with Few-shot Learning
   Given a small set of tweets, determine the interest and intent of an influencer.
3. Multi-Author Writing Style Analysis
   Given a document, determine at which positions the author changes.
4. Trigger Detection
   Given a document, assign all appropriate trigger warning labels.

Find out more at: http://pan.webis.de/clef23/pan23-web

-------------------------------------------------------------------------------
Important Dates (tentative)
-------------------------------------------------------------------------------
Registration is open now.
May 10, 2023 - Early bird software submission (optional)
May 29, 2023 - Software submission
June 05, 2023 - Participant paper submission
June 23, 2023 - Peer review notification
July 07, 2023 - Camera-ready participant papers submission
Sep 18-21, 2023 - Conference


CfP "Life Narrative and the Digital", 26-27 September 2023, ACDH-CH, Vienna
by Dimitra Grigoriou 27 Apr '23


Dear colleagues, We would like to draw your attention to the following event and would be grateful if you could help us spread the word: Life Narrative and the Digital: An Interdisciplinary Conference and Workshop 26-27 September 2023 Austrian Centre for Digital Humanities and Cultural Heritage (ACDH-CH) Austrian Academy of Sciences, Vienna CfP open until: 26 May 2023 https://digital-bio-2023.acdh.oeaw.ac.at<https://digital-bio-2023.acdh.oeaw.ac.at/> This two-day conference-plus-workshop brings together scholars and practitioners from different disciplines, communities, and career stages to explore the possibilities, uses, and challenges of digital methods and technologies for auto/biographical research and practice. We are particularly interested in the following questions: * In what ways can digital methods and technologies aid the study and analysis of biographical data? * How can the digital help us devise innovative pathways to the representation of historical individuals’ lives? (e.g. digital platforms) * To what extent do digital formats of life narration tie in with new trends in auto/biographical scholarship and practice? (e.g. metabiography, relational biography, persona studies, group biography, object biography, etc.) * How do we deal with uncertainty and the issue of data quality in the digital representation of biographical data? The event will feature both a workshop and a conference track. The workshop (26 September) will be dedicated to short presentations of work-in-progress, with a strong focus on tools, technologies, software, and methods, and with an emphasis on feedback and exchange. The conference (27 September) follows a conventional format, with a mix of research papers and panel discussions, and will be open to the public. Participation in both formats is free of charge. We invite proposals of max. 500 words via OpenReview (https://bit.ly/digital-bio-2023) for 15-minute (workshop) OR 20-minute (conference) contributions by 26 May 2023. 
For more information, please consult our conference website, or contact us at amp(a)oeaw.ac.at. Timo Frühwirth, Dimitra Grigoriou, Sandra Mayer (conference organisers) MMag.a Dimitra Grigoriou FWF Project 'A Digital Edition of W. H. Auden's Letters to Stella Musulin' (FWF P 33754) Austrian Centre for Digital Humanities and Cultural Heritage Österreichische Akademie der Wissenschaften | Austrian Academy of Sciences Bäckerstraße 13, 1010 Wien, Österreich | Vienna, Austria T: +43 1 51581-2231 dimitra.grigoriou(a)oeaw.ac.at | www.oeaw.ac.at/acdh/projects/auden-musulin-papers<http://www.oeaw.ac.at/acdh/projects/auden-musulin-papers> Follow us on Twitter.com/AMP_OEaW<https://twitter.com/oeaw> Like us on Facebook.com/oeaw.at<https://www.facebook.com/oeaw.at/> Find us on Instagram.com/oeaw.at<https://www.instagram.com/oeaw.at/> Visit us on Youtube.com/c/oeawvideo<https://www.youtube.com/channel/UCY3rUdfN-VCjvfUWojWkayQ>


Open RA/PhD Funded Positions in Conversational AI ( U. Trento Italy )
by Giuseppe Riccardi 27 Apr '23


Dear Candidate,

At the Signals and Interactive Systems Lab (University of Trento, Italy) we are looking for highly motivated and talented graduate students to join our research team and work on Conversational Artificial Intelligence. Conversational Artificial Intelligence includes the following research areas:
- Natural Language Processing
- Dialogue Modeling and Systems
- Affective Computing

The SIS Lab has been training intelligent machines and evaluating AI-based systems over the last three decades in many industry sectors, from fintech to health. The lab research team is interdisciplinary and attracts researchers from computational linguistics, psychology, applied math, biomedical and electrical engineering, and computer science. Research projects and demos can be found at the lab website: http://sisl.disi.unitn.it

Candidates should have a strong background and a record of past achievement in the areas of Conversational Research and/or Engineering. The official language (research and graduate teaching) of the department is English.

AVAILABLE POSITIONS
• Six-month funded research fellowships: approximately 1600 Euro/month gross amount.
• Three-year funded PhD fellowships: approximately 1600 Euro/month gross amount.
For more information about the cost of living in the area, please visit the website: https://iecs.unitn.it/prospective-student

DEADLINES
Openings with start date as early as June 2023. Positions open until filled.

REQUIREMENTS
MANDATORY (for both positions)
- Master's degree in Computer Science, Electrical Engineering, Computational Linguistics, Machine Learning, or similar or related disciplines.
- Proficiency in Machine Learning
- Excellent academic records
- Excellent programming skills
- Excellent command of oral and written English
- Good knowledge of most of the following: experimental design methodology and statistics, natural language processing, machine learning methods
- Excellent team-work skills

HOW TO APPLY
Interested applicants should mention the position they are applying for and send their CV to:
Email: sisl-jobs(a)disi.unitn.it

For more info:
The Signals and Interactive Systems Lab: http://sisl.disi.unitn.it
The PhD School: https://iecs.unitn.it
The Information Engineering and Computer Science Department @ University of Trento: www.disi.unitn.it


Webinar by Martin Cooke (Ikerbasque – Basque Foundation for Science)
by HiTZ zentroa 27 Apr '23


**** We apologize for the multiple copies of this email. In case you are already registered to the next webinar, you do not need to register again. **** Dear colleague, We are happy to announce the next webinar in the Language Technology webinar series organized by the HiTZ research center (Basque Center for Language Technology, http://hitz.eus). We are organizing one seminar every month. You can check the videos of previous webinars and the schedule for upcoming webinars here: http://www.hitz.eus/webinars Next webinar: * *Speaker*: Martin Cooke (Ikerbasque – Basque Foundation for Science) * *Title*: Who needs big data? Listeners' adaptation to extreme forms of variability in speech * *Date*: May 4, 2023, 15:00 CET * *Summary*: No theory of speech perception can be considered complete without an explanation of how listeners are able to extract meaning from severely degraded forms of speech. Starting with a brief overview of a century of research which has seen the development of many types of distorted speech, followed by some anecdotal evidence that automatic speech recognisers still have some way to go to match listeners' performance in this area, I will describe the outcome of one recent [1] and several ongoing studies into the detailed time course of a listener's response to distorted speech. These studies variously consider the rapidity of adaptation, whether adaptation can only proceed if words are recognised, the degree to which the response to one form of distortion is conditioned on prior experience with other forms, and the nature of adaptation in a language other than one's own native tongue. Taken together, findings from these experiments suggest that listeners are capable of continuous and extremely rapid adaptation to novel forms of speech that differ greatly from the type of input that makes up the vast bulk of their listening experience. 
It is an open question as to whether big-data-based automatic speech recognition can offer a similar degree of flexibility. [1] Cooke, M, Scharenborg, O and Meyer, B (2022). The time course of adaptation to distorted speech. J. Acoust. Soc. Am. 151, 2636-2646. 10.1121/10.0010235 *Bio:* Martin Cooke is Ikerbasque Research Professor. After starting his career in the UK National Physical Laboratory, he worked at the University of Sheffield for 26 years before taking up his current position. His research has focused on analysing the computational auditory scene, devising algorithms for robust automatic speech recognition and investigating human speech perception. His interests also include the effects of noise on talkers as well as listeners, and second language listening in noise. # *Upcoming webinars*: * Pascale Fung (June 1) Check past and upcoming webinars at the following url: http://www.hitz.eus/webinars If you are interested in participating, please complete this registration form: http://www.hitz.eus/webinar_izenematea If you cannot attend this seminar, but you want to be informed of the following HiTZ webinars, please complete this registration form instead: http://www.hitz.eus/webinar_info Best wishes, HiTZ Zentroa


Research Assistant for Medical Data Anonymization at TU Berlin
by Roland Roller 26 Apr '23


We are looking for a highly motivated research assistant to work on a BMBF-funded project focused on anonymization techniques for semi-structured, longitudinal patient data. The successful candidate will work closely with partners at the German Research Center for Artificial Intelligence (DFKI) and the Charité - Universitätsmedizin Berlin. Responsibilities will include developing and testing different anonymization techniques, analyzing the performance of machine learning models in the context of anonymization, and communicating project progress and results to relevant stakeholders. The position offers opportunities for pursuing a doctorate and publishing research results in scientific journals and conferences. Qualified candidates will have a completed university degree in computer science or computational linguistics, excellent programming skills in Python, and a strong background in machine learning/NLP. Previous experience in the field of anonymization and/or synthesization of data is an advantage. The Quality and Usability Lab offers an agile and lively international and interdisciplinary environment for working in a self-determined manner. If you are interested in contributing to cutting-edge research and working with a dynamic team, please apply! More information can be found here: https://www.jobs.tu-berlin.de/stellenausschreibungen/164717?language=en If you have got questions, do not hesitate to contact me. ------ Dr. Roland Roller Senior Researcher DFKI Lab Berlin, Alt Moabit 91c, D-10559 Berlin, Germany Phone +49 30 23895 1847 Email: roland.roller(a)dfki.de


CMC-Corpora 2023: Final CfP and extended submission deadline 21 May 2023
by cotgrove＠ids-mannheim.de 26 Apr '23


3rd and Final Call for Papers *International Conference on CMC and Social Media Corpora for the Humanities* 14–15th September 2023, University of Mannheim, Germany The 10th International Conference on Computer-mediated Communication and Social Media Corpora for the Humanities (CMC-Corpora) will be held at the University of Mannheim, Germany in collaboration with the Leibniz Institute for the German Language (IDS). Specialized corpora of the language of CMC and social media are increasingly vital for the analysis of the “unparalleled and rapidly evolving diversity in terms of speakers and settings” in digital contexts, as well as of “language evolution seen through the lens of user-generated content, which gives access to a number of variants, socio- and idiolects” (Barbaresi 2019: 29-30). The conference brings together language-centered research on CMC and social media in linguistics, philologies, communication sciences, media, and social sciences with research questions from the fields of corpus and computational linguistics, language technology, text technology, and machine learning. It features research in which computational methods and tools are used for language-centered empirical analysis of CMC and social media phenomena as well as research on building, processing, annotating, representing, and exploiting CMC and social media corpora, including their integration in digital research infrastructures. We adhere to a wide definition of CMC and Social Media, covering various media of digital communication, including email, newsgroups, forums, chat and messenger applications (e.g. WhatsApp), social networks (Facebook, Instagram), gaming platforms, as well as interactions in the communication areas of video portals (YouTube), learning platforms, gaming apps, online games and virtual worlds. 
We invite submissions on CMC-related topics, including but not limited to: * Development of CMC corpora / social media corpora * Building CMC corpora: from data collection to publication * Open access data for CMC research: ethical and GDPR issues * Annotating CMC data: genres, linguistic aspects, metadata * Multimodal corpora * Big data corpora * Legal issues concerning the sampling, distribution and (long-term) archiving of social media data * Analysis of CMC corpora / social media corpora * Sociolinguistic studies of CMC * Discourse analysis of CMC * Linguistic characteristics of CMC * Multimodal (incl. visual) aspects of CMC * Multilingualism and code-switching in CMC * CMC in language education * Natural language processing (NLP) of CMC data / social media data * Normalization * PoS tagging * Anonymisation and Pseudonymisation * Lemmatization * Syntactic parsing * Semantic Annotation ============================ *Confirmed keynote speakers* ============================ Unn Røyneland, University of Oslo Tatjana Scheffler, Ruhr-Universität Bochum ================= *Important Dates* ================= * Abstract submission: Extended deadline 21 May, 23:59 CEST * Notification of acceptance: Friday, 30 June 2023, 23:59 CEST * Deadline revised abstract submission: Sunday, 6 August 2023, 23:59 CEST * Deadline registration for participation: Sunday, 20 August 2023, 23:59 CEST * Arrival, Get-together: Wednesday, 13 September 2023 * Conference: Thursday 14 - Friday 15 September 2023 ============ *Submission* ============ We invite submissions for talks and for posters or software/corpus demonstrations on any topic relevant to the list of themes mentioned above. We invite two types of submissions: * short papers (2-4 pages including references, following the existing template) for oral presentations * abstracts (max. 300 words) for poster presentations Each paper and abstract will be double blind peer reviewed by two or three members of the scientific committee. 
Authors of accepted papers can present their work at the conference (30-minute time slots: 20-minute talks, followed by 10 minutes of discussion). Authors of accepted abstracts can present their work in progress, early-stage research, or software/corpus demonstrations during the poster session. At the start of the conference, all accepted papers will be made available in online proceedings. After the conference, speakers with the best short papers will be invited to submit extended papers for a journal special issue or a volume publication.

*Instructions for authors*
All submissions have to be written in English and have to be anonymised. The short papers for oral presentations should not exceed 4 pages, and the paper format should adhere to the template which you can download from the links below. The abstracts for poster presentations should not exceed 300 words, bibliographical references not included. All contributions will be collected through the online platform EasyChair at https://easychair.org/conferences/?conf=cmc2023 (if you do not have an EasyChair account, you need to create one first).

Template for MSWord (40 kB): https://www.uni-mannheim.de/media/Lehrstuehle/phil/deutsche_philologie/LS_G…
Template for LaTeX (260 kB): https://www.uni-mannheim.de/media/Lehrstuehle/phil/deutsche_philologie/LS_G…

For all enquiries, please contact the organizers at cmc-corpora2023(a)uni-mannheim.de

We look forward to seeing you there!
The organizing committee: Jutta Bopp, Louis Cotgrove, Laura Herzberg, Harald Lüngen, Andreas Witt Conference website: https://www.uni-mannheim.de/cmc-corpora2023/ ====================== *Scientific Committee* ====================== * Paul Baker (Lancaster University) * Adrien Barbaresi (Berlin-Brandenburgische Akademie der Wissenschaften) * Michael Beißwenger (University of Duisburg-Essen) * Mario Cal-Varela (Universidade de Santiago de Compostela) * Steven Coats (University of Oulu) * Luna DeBruyne (Ghent University) * Orphée DeClercq (Ghent University) * Francisco-Javier Fernández-Polo (University of Santiago de Compostela) * Jenny Frey (European Academy of Bozen) * Alexandra Georgakopoulou-Nunes (King's College London) * Klaus Geyer (University of Southern Denmark) * Aivars Glaznieks (Eurac Research Bolzano) * Claire Hardaker (Lancaster University) * Iris Hendrickx (Radboud University Nijmegen) * Axel Herold (Berlin-Brandenburgische Akademie der Wissenschaften) * Lisa Hilte (University of Antwerp) * Mai Hodac (Université Toulouse) * Wolfgang Imo (University of Hamburg) * Pawel Kamocki (IDS Mannheim) * Erik-Tjong Kim-Sang (Netherlands eScience Center) * Alexander Koenig (CLARIN ERIC) * Florian Kunneman (Vrije Universiteit Amsterdam) * Marc Kupietz (IDS Mannheim) * Els Lefever (Ghent University) * Julien Longhi (Cergy Paris Université) * Maja Miličević-Petrović (University of Bologna) * Nelleke Oostdijk (Radboud University) * Celine Poudat (Université Côte d'Azur) * Thomas Proisl (Friedrich-Alexander-Universität Erlangen-Nürnberg) * Ines Rehbein (University of Mannheim) * Sebastian Reimann (Ruhr-Universität Bochum) * Unn Røyneland (University of Oslo) * Müge Satar (Newcastle University) * Tatjana Scheffler (Ruhr-Universität Bochum) * Stefania Spina (Università per Stranieri di Perugia) * Egon Stemle (Eurac Research) * Caroline Tagg (The Open University) * Simone Ueberwasser (University of Zurich) * Lieke Verheijen (Radboud University)


3rd CFP and deadline extension: CMLC-11 at Corpus Linguistics 2023
by Piotr Banski 26 Apr '23


11th Workshop on the Challenges in the Management of Large Corpora (CMLC)

The next meeting of CMLC will be held as part of Corpus Linguistics 2023 <https://wp.lancs.ac.uk/cl2023/> in Lancaster, UK, on the 2nd of July, 2023. See https://corpora.ids-mannheim.de/cmlc-2023.html for up-to-date information.

Important dates
* Deadline for abstract submission: the 3rd of May 2023 (Wednesday, 23:59 UTC)
* Notification of acceptance: the 19th of May 2023 (Thursday)
* Deadline for the submission of camera-ready papers: the 4th of June 2023 (Sunday)
* Meeting: Sunday, the 2nd of July 2023, 9.30-12.30 in George Fox LT2 (Lancaster University Campus)

Abstract submission
* We invite anonymised extended abstracts for /oral presentations/ on the topics listed below (/ideally/ using the ACL-2023 templates <https://2023.aclweb.org/calls/style_and_formatting/>, or PDF, 750-1000 words excluding references, font preferably 11 pt, line spacing 1.5).
* CMLC has always reserved a track for national corpus project reports, and to this end, we invite /poster proposals/ of 500-750 words. National project reports need not be anonymised.

Submissions are accepted through the EasyChair submission system, at https://easychair.org/conferences/?conf=cmlc11. Please note that each CMLC event produces a volume of proceedings (published in Open Access before the meeting), where both oral and poster contributions have equal status. /All/ final submissions to the 2023 proceedings volume will be expected to be formatted according to the ACLPUB guidelines <https://acl-org.github.io/ACLPUB/formatting.html> and to pass the aclpubcheck <https://github.com/acl-org/aclpubcheck>.

Workshop description
The upcoming CMLC meeting continues the successful series of “Challenges in the management of large corpora” events, previously hosted at LREC (since 2012) and CL (since 2015) conferences. 
As in the previous meetings, we wish to explore common areas of interest across a range of issues in language resource management, corpus linguistics, natural language processing, and data science. Large textual datasets require careful design, collection, cleaning, encoding, annotation, storage, retrieval, and curation to be of use for a wide range of research questions and to users across a number of disciplines. A growing number of national and other very large corpora are being made available, many historical archives are being digitised, numerous publishing houses are opening their textual assets for text mining, and many billions of words can be quickly sourced from the web and online social media. A number of key themes and questions emerge of interest to the contributing research communities: (a) what can be done to deal with IPR and data protection issues? (b) what sampling techniques can we apply? (c) what quality issues should we be aware of? (d) what infrastructures and frameworks are being developed for the efficient storage, annotation, analysis and retrieval of large datasets? (e) what affordances do visualisation techniques offer for the exploratory analysis approaches of corpora? (f) what kinds of APIs or other means of access would make the corpus data as widely usable as possible without interfering with legal restrictions? (g) how to guarantee that corpus data remain available and usable in a sustainable way? Motivation and topics of interest This year’s event will cover the entire range of the standard CMLC themes, with some new additions: * New and hot topics o Language Models + What linguistic insights can we gain by post-hoc language model analysis in the age of ChatGPT? + How can we avoid the proliferation of stereotypes in terms of both linguistic surface form and content when using language models for linguistic analysis? 
o Societal and legal issues relevant for corpora and studies + political and sociological balance + social media bubbles, hate speech and fake news + proliferation of stereotypes via corpora and language models + corpora as archives of the past: evolution in mentalities or laws, personality rights o How to make corpora as accessible as possible despite big data issues, application heterogeneity, and IPR issues + What are the most interesting APIs and libraries to build, analyse and access very large corpora? + How can we get us researchers to use existing research tools, infrastructures, libraries and APIs in research and teaching? * Linguistic content challenges o Dealing with the variety of language resources: multilinguality, historical texts, noisy OCR texts, user-generated content, etc. o Integration of human computation (crowdsourcing) and automatic annotation o Quality management of annotations * Technical challenges o Storage and retrieval solutions for big textual data corpora: primary data, metadata, and annotation data o Scalable and efficient NLP tooling for annotating and analysing large datasets: distributed and GPGPU computing; using big data analysis frameworks for language processing o Dealing with streaming (e.g. Social Media) and rapidly changing underlying data * Exploitation challenges o Legal and privacy issues o Query languages, data models, and standardisation o Licensing models of open and closed data, coping with intellectual property restrictions o Innovative approaches for aggregation and visualisation of text analytics In the tradition of CMLC, we invite reports on national corpus initiatives; submitters of these reports should be prepared to present a poster along with a short presentation. Programme Committee Names are being added as Programme Committee members confirm their participation. 
* Laurence Anthony (Waseda University, Japan)
* Vladimír Benko (Slovak Academy of Sciences)
* Tomaž Erjavec (Jožef Stefan Institute, Ljubljana)
* Stephanie Evert (Friedrich-Alexander-Universität Erlangen-Nürnberg)
* Johannes Graën (University of Zurich, Switzerland)
* Andrew Hardie (Lancaster University, UK)
* Serge Heiden (ENS de Lyon)
* Dawn Knight (Cardiff University)
* Paweł Kamocki (IDS Mannheim)
* Natalia Kotsyba (Samsung Poland)
* Michal Křen (Charles University, Prague)
* Paul Rayson (Lancaster University)
* Martin Reynaert (Tilburg University)
* Kevin Scannell (Saint-Louis University)
* Marko Tadić (University of Zagreb, Faculty of Humanities and Social Sciences)

Organising Committee

Institut für Deutsche Sprache, Mannheim: Piotr Bański, Marc Kupietz, Harald Lüngen
Berlin-Brandenburg Academy of Sciences: Adrien Barbaresi
Institute of Computational Linguistics, University of Zurich: Simon Clematide

Homepage

The CMLC series homepage is located at http://corpora.ids-mannheim.de/cmlc.html

(WANLP2023) Call for Task Proposals - The 1st Arabic Natural Language Processing Conference
by Salam Khalifa 26 Apr '23

Hello All,

*** Apologies for Cross-Posting ***

The First Arabic Natural Language Processing Conference (WANLP 2023)
Co-located with EMNLP 2023 in Singapore.
Conference URL: https://wanlp2023.sigarab.org/

We invite you to submit proposals for shared tasks to be run as part of WANLP 2023. WANLP 2023 will run as a conference for the first time. It builds on seven previous workshop editions, which have been extremely successful, drawing large and active participation in various capacities. With the move to a conference format, we aim to attract even wider participation from the Arabic NLP community. The conference is organized by the Special Interest Group on Arabic NLP (SIGARAB), an Association for Computational Linguistics Special Interest Group on Arabic Natural Language Processing.

Submission Details

The proposals should provide an overview of the proposed task, motivation, data/resources (how the data will be collected), task description (what are the tasks to be included), evaluation (proposed evaluation method for each task), pilot run (if available), tentative timeline that matches the submission dates below, and task organizers (name, affiliation).

Proposals (up to 4 pages) should be sent to: wanlp-shared-task-chair(a)sigarab.org

Please use the ACL template files: https://2023.emnlp.org/calls/style-and-formatting/

Selection Process

The proposals will be reviewed by the organizing committee and will be selected based on multiple factors such as the novelty of the task, the expected interest from the community, how convincing the data collection plans are, the soundness of the evaluation method, and the expected impact of the task.

Task Organization

Upon acceptance, the task organizers are expected to verify that the task organization and data delivery to participants are happening in a timely manner, provide the participants with all needed resources related to the task, create a mailing list and maintain communication and support to participants, create and manage a CodaLab or similar competition website, manage submissions to CodaLab, write a task description paper, manage participants' submissions of system description papers, and review and maintain the quality of submitted system description papers.

Important Dates

- May 7, 2023: submission of shared task proposals
- May 14, 2023: notification of acceptance of shared tasks
- September 5, 2023: conference paper & shared task papers due date
- October 12, 2023: notification of acceptance
- October 20, 2023: camera-ready papers due
- Conference Date (one day): TBD (timeframe: December 6-10)

All deadlines are 11:59 pm UTC -12h <https://www.timeanddate.com/time/zone/timezone/utc-12> (“Anywhere on Earth”).

If you have any questions, please contact us at: wanlp-shared-task-chair(a)sigarab.org

The WANLP 2023 Organizing Committee

Best regards,
WANLP publicity chairs: Salam Khalifa and Amr Keleg
