May 2023 - Corpora - ELRA lists

CfP - MISDOOM 2023 5th Multidisciplinary International Symposium on Disinformation in Open Online Media
by Tommaso Caselli 15 May '23

*** Apologies for multiple posting ***

MISDOOM 2023 - 5th Multidisciplinary International Symposium on Disinformation in Open Online Media

The Multidisciplinary International Symposium on Disinformation in Open Online Media (MISDOOM) is returning for its 5th edition on 21 and 22 November 2023, hosted by the National Research Center for Mathematics and Computer Science (CWI), Amsterdam, Netherlands.

MISDOOM values multidisciplinary research and is designed to be inclusive of different academic disciplines and practices. The symposium provides a platform for researchers, industry professionals, and practitioners from various disciplines such as communication science, computer science, computational social science, political science, psychology, journalism, and media studies to come together and share their knowledge and insights on online disinformation.

Symposium Topics

Participants can discuss and contribute to the following list of topics:

- Cross-platform campaigns and their impact (e.g., diffusion of disinformation and manipulation, observations of campaigns and strategies, communication strategies, hate speech)
- Approaches to studying misinformation (e.g., qualitative approaches, case studies, quantitative approaches, experiments)
- User involvement with misinformation on various platforms (e.g., engagement, viewership)
- Counter-measures for mis- and disinformation and manipulation (e.g., censorship policies, behavioral changes, education, trainings, professional codices, legal actions)
- Factors contributing to misinformation beliefs or hampering corrections of false beliefs (e.g., political polarization, motivated reasoning, confirmation bias)
- Trending topics in mis- and disinformation research
- Automated fact-checking and misinformation detection
- Models for misinformation diffusion
- Human computation approaches for misinformation detection (crowdsourcing, human-machine interaction)
- Information quality (information quality dimensions, metrics, ethics of information quality)
- Generative AI tools and disinformation (e.g., ChatGPT, Midjourney, DALL-E)

Industry

Industries are also invited to participate in the conference by submitting a contribution describing their approach to countering or detecting misinformation.

Submission Instructions

Given that we welcome both social scientists and computer scientists, and that the publication strategies of these fields differ, we solicit two types of contributions that, upon acceptance, result in the same opportunity to present at MISDOOM:

Full papers

Full papers to be published with Springer LNCS proceedings. Up to 15 pages (including references) in Springer Lecture Notes in Computer Science (LNCS) format describing original unpublished and new research. The work should be structured like a research paper, and cover the context of the problem studied, the research question, approach/methodology, and results in 6 to 15 pages. It should be formatted according to the LNCS Word or LaTeX template. Such submissions will be judged based on scientific quality and relevance for the MISDOOM symposium.

Extended Abstracts

Authors can also choose to submit an Extended Abstract. The extended abstract should not exceed 500 words, excluding references, and can pertain to previously published work, ongoing projects, or new research ideas. There is no particular format for the extended abstract, but it must include the title, authors, their affiliation, the text of the abstract, and references, particularly if it involves previously published work. Submissions are not archival and are not formally published. Additionally, authors must submit a conference program abstract of no more than 150 words. Authors should add the suffix "(Extended Abstract)" to the title of their extended abstract submission.

Important Note about Submissions

Both contribution types (full papers and extended abstracts) must specify the discipline they are contributing to as keyword(s) in Easychair at the time of submission (they should enter at least one of the two keywords “computer science” or “social science” in the keyword box).

Submission Link: https://easychair.org/conferences/?conf=misdoom2023

Important Dates

Submission Deadline: 30 June 2023
Notification: 28 August 2023
Camera ready: 11 September 2023
Symposium: 21-22 November 2023

--
Tommaso Caselli, Ph.D.
Senior Assistant Professor in Computational Semantics
Faculty of Arts, Rijksuniversiteit Groningen
The Netherlands
----------------------------
https://xs4all.academia.edu/TommasoCaselli
https://www.researchgate.net/profile/Tommaso_Caselli
Twitter: @tommaso_caselli

Job opportunity: Corpus/Computational Linguist at BCU/Renato Software Ltd.
by e.franklin＠senso.cloud 15 May '23

Dear all,

(Apologies for cross-posting)

I'm very pleased to announce that we (Renato Software Ltd. and Birmingham City University) are looking to hire a Corpus/Computational Linguist for an Innovate UK-funded Knowledge Transfer Partnership around online safeguarding for children in schools.

Senso.cloud is a cloud-based classroom management tool that allows schools to monitor children's online behaviour and be aware of any potential safeguarding risks their activity might indicate. The role of the KTP Associate is to work with our Safeguarding and Development teams as well as academics at Birmingham City University to enhance this offering and help deliver even better protection to vulnerable students in school. Digital safeguarding saves lives.

If you have completed a Master's or a PhD in Corpus/Computational Linguistics and you're looking for a job in industry with real meaning and impact, we'd love to hear from you! The closing date is Sunday, 11th June and interviews will take place on Monday, 19th June.

If you have any questions, feel free to email Emma Franklin at e.franklin(a)senso.cloud. Please share with anyone who might be interested!

https://jobs.bcu.ac.uk/vacancy.aspx?ref=052023-254

Final CFP: 1st Symposium on Challenges for Natural Language Processing (CNLPS'23)
by Marek Kubis 15 May '23

CALL FOR PAPERS

1st Symposium on Challenges for Natural Language Processing (CNLPS'23)
Warsaw, Poland, 17-20 September, 2023
https://fedcsis.org/sessions/aaia/cnlps

Organized within FedCSIS 2023 (IEEE: #57573)

Strict submission deadline: May 23, 2023, 23:59:59 AoE (no extensions)

KEY FACTS: Proceedings: submitted to IEEE Digital Library; indexing: DBLP, Scopus and Web of Science; 70 MEiN points

Please feel free to forward this announcement to your colleagues and associates who could be interested in it.

********************* Statement concerning LLMs *********************
Recognizing a developing issue that affects all academic disciplines, we would like to state that, in principle, papers that include text generated from a large-scale language model (LLM) are prohibited, unless the produced text is used within the experimental part of the work.
*********************************************************************

The Challenges for Natural Language Processing Symposium is a series of competitions oriented towards advancing human language technologies. The goal of the symposium is to evaluate natural language processing tools in demanding, non-obvious tasks that address multimodal problems, cross-lingual learning and processing of natural languages that are not widely represented in other evaluation campaigns.

This year we invite all interested teams and individuals to participate in the following events:

+ PolEval Competition
  https://fedcsis.org/sessions/aaia/cnlps/poleval
+ Center for Artificial Intelligence Challenge on Conversational AI Correctness
  https://fedcsis.org/sessions/aaia/cnlps/caiccaic
+ Temporal Image Caption Retrieval Competition
  https://fedcsis.org/sessions/aaia/cnlps/ticrc

More details about the competitions can be found in the linked subpages.

Apart from the competitions, we also welcome submissions to the General Session that includes the topics listed below:

* Corpora and Language Resources
* Machine Learning in NLP
* Speech Processing
* Language Modeling
* Language Generation
* Conversational AI
* Question Answering
* Sentiment and Emotion Detection
* Information Extraction

Papers submitted for the General Session must comply with all standard FedCSIS requirements. For this session we only accept regular papers that describe new research contributions, present experiences encountered in practice or report on research topics worthy of immediate communication as explained on the page on paper categories.

Submission rules:

- Authors should submit their papers as Postscript, PDF or MSWord files.
- The total length of a paper should not exceed 10 pages IEEE style (including tables, figures and references). IEEE style templates are available here.
- Papers will be refereed and accepted on the basis of their scientific merit and relevance to the workshop.
- Preprints containing accepted papers will be published on a USB memory stick provided to the FedCSIS participants.
- Only papers presented at the conference will be published in Conference Proceedings and submitted for inclusion in the IEEE Xplore® database.
- Conference proceedings will be published in a volume with ISBN, ISSN and DOI numbers and posted at the conference WWW site.
- Conference proceedings will be submitted for indexation.
- Organizers reserve the right to move accepted papers between FedCSIS technical sessions.

Important dates:

+ Paper submission (strict deadline): May 23, 2023, 23:59:59 (AoE; there will be no extension)
+ Position paper submission: June 7, 2023
+ Author notification: July 11, 2023
+ Final paper submission and registration: July 31, 2023
+ Payment (early fee deadline): July 26, 2023

CNLPS is organized in collaboration with (within the framework of) Multi-task, Multilingual, Multi-modal Language Generation COST Action CA18231; https://multi3generation.eu/

4 year PhD position VU Amsterdam Specializing NLP through reinforcement learning
by Fokkens, A.S. (Antske) 15 May '23

4 YEAR PHD POSITION: Specializing NLP through reinforcement learning (Hybrid intelligence)
VU AMSTERDAM & Utrecht University
Deadline: 31 May 2023 23:59 CET

Do you have a Master's degree in Computational Linguistics or a related area? Are you interested in reinforcement learning and do you want to dive deeper into the behavior of current models and the kind of errors they make? Do you want to be part of a large consortium that addresses a variety of topics around hybrid intelligence? Do you want to work in a group that cares about core questions in NLP research and aims to provide a positive, inspiring environment to young researchers?

Then please consider applying for our fully funded 4-year PhD position at the Vrije Universiteit in Amsterdam.

Please find out more & apply at: https://workingat.vu.nl/ad/phd-hybrid-intelligence-specializing-nlp-models-… (applications sent via email will not be processed)

--
prof. dr. Antske Fokkens
Computational Linguistics & Text Mining Lab, Vrije Universiteit Amsterdam
Algorithms, Geometry & Applications, Eindhoven University of Technology

HealTAC 2023
by Bea Alex 15 May '23

We are delighted to announce the programme of HealTAC 2023, the sixth annual meeting of the Healthcare text analytics community. In addition, registration is now open!

Programme page: http://healtex.org/healtac-2023/programme/

The registration site is here<https://estore.manchester.ac.uk/conferences-and-events/faculty-of-science-e…>.

Dates:

* Early bird registration ends on: 1st June
* Pre-conference tutorials/workshops: 14 June
* Conference: 15 & 16 June

Keynotes:

The keynotes this year naturally focus on the impact and promises of large healthcare language models. We will hear from two experts who are involved in large centres that work with clinical free-text data in the UK and the US.

Dr Angus Roberts, King's College London<https://www.kcl.ac.uk/people/angus-roberts>
From regular expressions to pre-trained language models – 14 years of applying NLP at the Maudsley Biomedical Research Centre
Abstract of the talk<http://healtex.org/healtac-2023/programme/>
Bio of the speaker<http://healtex.org/healtac-2023/programme/>

Dr Yonghui Wu, University of Florida<https://hobi.med.ufl.edu/profile/wu-yonghui/>
Opportunities and Challenges of Conversational Artificial Intelligence and Large Language Models in Healthcare
Abstract of the talk<http://healtex.org/healtac-2023/programme/>
Bio of the speaker<http://healtex.org/healtac-2023/programme/>

We are looking forward to seeing you all at HealTAC 2023.

The HealTAC 2023 organising committee

--------------------------------------------------------------
Dr. Beatrice Alex
Senior Lecturer and Chancellor’s Fellow
University of Edinburgh

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. Is e buidheann carthannais a th’ ann an Oilthigh Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.

Large Language Model for Portuguese released --- Albertina PT-*
by Antonio Branco 13 May '23

[trying to get around formatting issues for the digest version of corpora list]
[sending again: many apologies for repetition]

Good morning,

We are pleased to announce the release of Albertina PT-*

This is the first large language model specifically for Portuguese, covering both variants PT-PT and PT-BR, publicly available and open source.

With its 900 million parameters in this first version, it sets a new state of the art for models specifically for Portuguese that are publicly available and open.

It was developed at the University of Lisbon together with colleagues from the University of Porto, and can be obtained here: https://huggingface.co/models?other=albertina-pt*

Its development is documented here: https://arxiv.org/abs/2305.06721

Best regards,
On behalf of Albertina's team

Call for presentation of ACL Findings papers at IWSLT 2023
by Atul K. Ojha 13 May '23

Dear all,

IWSLT 2023 <https://iwslt.org/2023/> is happy to inform you that Findings paper(s) <https://2023.aclweb.org/> accepted at ACL 2023 <https://2023.aclweb.org/> can be presented as a poster at the IWSLT conference <https://iwslt.org/>.

If you are an author of an ACL Findings paper <https://2023.aclweb.org/> pertaining to speech translation and would like to present your work at IWSLT 2023 <https://iwslt.org/2023/>, please fill out this form <https://docs.google.com/forms/d/e/1FAIpQLSeb--NTGBKMp3vk8m8RLKJljzMEoOy2yx2…>.

Many thanks,
Marine, Marcello, Alex, Jan, Sebastian, Elizabeth, Atul (IWSLT organisers)

Fwd: The First Bangla Language Processing Workshop (BLP 2023) Co-located with EMNLP 2023 in Singapore
by Firoj Alam 12 May '23

***Apologies for Cross-Posting***

Call for Papers:

The first BLP workshop aims to provide a forum for the NLP, speech and multimodal communities to share and discuss their ongoing work with the international community. We particularly focus on Bangla, which is a low-resource language, and aim to assess its current state of the art and discuss strategies to make further progress in NLP, speech and multimodal research. Through this workshop, we plan to bring researchers together to come up with frameworks and strategies that can later support other low-resource languages.

We encourage researchers to submit their papers focusing on novel methodologies and resources that help towards the progress of Bangla and other low-resource languages. Novel methodologies include, but are not limited to, zero-shot learning, unsupervised learning, and simple yet effective methods applicable to low-computation scenarios.

We invite original research papers from a wide range of topics, including but not limited to:

Natural Language Processing: Corpus and Resource Development, Language Modeling, Stemmer, POS Tagger, Named Entity Recognition, Relation Extraction, Spell and Grammar Checker, Question Answering, Semantics, Text Summarization, Machine Translation, Sentiment Analysis.

Speech Processing: Speech Synthesis and Spoken Language Generation, Speech Recognition, Phonetics, Phonology, and Prosody, Spoken Dialog and Conversational System, Speaker and Language Detection.

Multimodality: OCR - Handwriting, Printed Document, Sign Language Detection.

Human Computer Interaction: Software for Disabled People, Multimodal HCI for Bangla.

Important dates:

Workshop paper due: 1 September 2023
Notification of acceptance: 6 October 2023
Camera-ready papers due: 18 October 2023
Workshop dates: 6-7 December 2023

All deadlines are 11:59pm anywhere on Earth (AoE).

Submission Details:

Papers must describe original, completed or in-progress, and unpublished work. All papers will be refereed through a double-blind peer review process by multiple reviewers with final acceptance decisions made by the workshop organizers. Accepted papers will be given up to 9 pages (for full papers), 5 pages (for short papers and posters) in the workshop proceedings, and will be presented as an oral paper or a poster.

We are seeking submissions under the following categories:

- Full papers (8 pages)
- Short papers (work in progress, innovative ideas/proposals: 4 pages)
- Shared task paper (4 pages)

Both long and short papers must follow the EMNLP 2023 two-column format, using the supplied official templates [1]. The templates can be downloaded from the "Style Files and Formatting" page. Please do not modify these style files, nor should you use templates designed for other conferences. Submissions that do not conform to the required styles, including paper size, margin width, and font size restrictions, will be rejected without review.

Verification: to guarantee conformance to publication standards, we will be using the ACL pubcheck tool [2]. The PDFs of camera-ready papers must be run through this tool prior to their final submission, and we recommend its use also at submission time.

Submissions are open to all, and are to be submitted anonymously. For the anonymity, double-blind submission and reproducibility criteria please follow the EMNLP 2023 instructions [3].

If you have published in the field previously, and are interested in helping out in the program committee to review papers, please fill out this form <https://forms.gle/1WUYQjWT9UuqioX48> [4].

Submission portal: TBA

Workshop Organizers:

Firoj Alam, Qatar Computing Research Institute, HBKU, Qatar
Sudipta Kar, Amazon Alexa AI, USA
Shammur Absar Chowdhury, Qatar Computing Research Institute, HBKU, Qatar
Farig Sadeque, BRAC University, Bangladesh
Ruhul Amin, Fordham University, USA
Asif Shahriyar Sushmit, Rensselaer Polytechnic Institute, USA

[1] https://2023.emnlp.org/calls/style-and-formatting/
[2] https://github.com/acl-org/aclpubcheck
[3] https://2023.emnlp.org/calls/main_conference_papers/
[4] https://forms.gle/1WUYQjWT9UuqioX48

The Organizers

MULTI-Fake-DetectiVE @ EVALITA 2023
by Lucia Passaro 12 May '23

[Apologies for cross posting]

==================================================
MULTI-Fake-DetectiVE @ EVALITA 2023
Final call for Participation
Task website, news and updates: https://sites.google.com/unipi.it/multi-fake-detective
Contacts: multifakedetective [at] gmail [dot] com
==================================================

We invite you to participate in the FIRST shared task on Multimodal Fake News Detection in Italian (MULTI-Fake-DetectiVE). The shared task is aimed at broadening the horizon of research on disinformation in Italian by addressing both the textual and visual modalities.

MULTI-Fake-DetectiVE proposes two subtasks. Participants are allowed to participate in either task or both of them.

The evaluation window for both subtasks is open until May 19! Test data is available on the website.

---------
Tasks
---------

Task 1. Multimodal Fake News Detection
The task is structured as a multi-class classification problem in a multimodal setting. Given a piece of content c = ⟨ t, v ⟩ including a textual component t and a visual component v (i.e., an image), classify it as being one of the following labels on a scale: Certainly Fake, Probably Fake, Probably Real, Certainly Real.

Task 2. Cross-modal relations in Fake and Real News
The task is aimed at assessing how the two modalities (i.e., textual and visual) relate to each other in the context of fake and real news. The task is formulated as a three-class classification problem. Given a piece of content c = ⟨ t, v ⟩ which includes a textual component t and a visual component v, decide whether their combination is misleading, not misleading in the interpretation of the information provided by either component, or the two are unrelated.
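For readers who want to prototype against the task definitions above, here is a minimal Python sketch of how the inputs and label sets of the two subtasks might be represented. It is an illustration only, not the organizers' data format: the names `Content`, `Veracity`, `CrossModalRelation` and `majority_baseline`, the `image_path` field, and the integer and string values assigned to the labels are all assumptions introduced here.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Veracity(Enum):
    """Task 1 labels, in the order listed in the call (values are assumed)."""
    CERTAINLY_FAKE = 0
    PROBABLY_FAKE = 1
    PROBABLY_REAL = 2
    CERTAINLY_REAL = 3


class CrossModalRelation(Enum):
    """Task 2 labels (string values are assumed)."""
    MISLEADING = "misleading"
    NOT_MISLEADING = "not misleading"
    UNRELATED = "unrelated"


@dataclass(frozen=True)
class Content:
    """A piece of content c = <t, v>."""
    text: str        # textual component t
    image_path: str  # visual component v; a file path is a hypothetical choice


def majority_baseline(train_labels):
    """Return the most frequent training label.

    A trivial reference point for either subtask, not a method
    proposed by the task organizers.
    """
    return Counter(train_labels).most_common(1)[0][0]
```

A system for Task 1 would then map a `Content` instance to a `Veracity` value, and a system for Task 2 would map it to a `CrossModalRelation` value; the majority-class baseline simply ignores the input.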
-------
Data
-------

The training and test datasets are available here: https://sites.google.com/unipi.it/multi-fake-detective/data

-----------------------
Important Dates
-----------------------

12th - 19th May 2023: evaluation window (started)
30th May 2023: assessment returned to participants
14th June 2023: final reports due to task organizers
10th July 2023: review deadline
25th July 2023: camera ready deadline
7th - 8th September 2023: final workshop in Parma

-----------------------
Task Organizers
-----------------------

Alessandro Bondielli, Department of Computer Science, University of Pisa
Pietro Dell'Oglio, Department of Information Engineering, University of Florence
Alessandro Lenci, Department of Philology, Literature, and Linguistics, University of Pisa
Francesco Marcelloni, Department of Information Engineering, University of Pisa
Lucia Passaro, Department of Computer Science, University of Pisa
Marco Sabbatini, Department of Philology, Literature, and Linguistics, University of Pisa

PACLIC 2023 - Second Call for Papers
by emmanuelechersoni＠gmail.com 12 May '23

* The 37th Pacific Asia Conference on Language, Information and Computation (PACLIC 37) *
* December 2-4, 2023 (deadline paper submission: July 16, 2023 AoE) *
* The Hong Kong Polytechnic University, Hong Kong (China) *

Conference website: https://paclic2023.github.io/

CONFERENCE OBJECTIVES

Following the long tradition of PACLIC conferences, PACLIC 37 emphasizes the synergy of theoretical frameworks and processing of natural language, providing a forum for researchers from different fields to share and discuss progress in scientific studies, development and application of the topics related to the study of languages.

TOPICS

Topics include but are not limited to:

- Language Studies
  o Clinical linguistics and language disorders
  o Corpus linguistics
  o Discourse Analysis
  o Language Acquisition
  o Language and Social Media
  o Language Learning
  o Language, Mind and Culture
  o Linguistic Theories
  o Morphology
  o Multilingualism
  o Phonology
  o Pragmatics
  o Semantics
  o Sociolinguistics
  o Spoken language processing
  o Syntax
  o Typology

- Information Processing and Computational Applications
  o Cognitive modeling and psycholinguistics
  o Dialogue systems
  o Digital Humanities
  o Ethics in Natural Language Processing
  o Information retrieval/extraction
  o Interpretability of Natural Language Processing systems
  o Language models
  o Language resources
  o Linguistic diversity
  o Machine Learning and Natural Language Processing
  o Machine Translation
  o Multimodality
  o Natural Language Generation
  o Natural Language Processing applications
  o Sentiment Analysis
  o Summarization
  o Word segmentation

PAPER SUBMISSION

Papers may consist of up to eight (8) pages of content, plus references and appendices. Submissions will be judged based on relevance, technical strength, significance and opportunities, and interest to the attendees. As the reviewing will be double-blind, authors must not indicate their names and affiliations while submitting their papers. Papers must be submitted through the Easy Chair Conference System: https://easychair.org/cfp/PACLIC37.

Accepted papers will be presented orally or as posters as determined by the PACLIC 37 program committee. Papers in the proceedings of PACLIC have been indexed in Scopus since PACLIC 19 (2005). They are also listed in the ACL Anthology. Double submissions with other conferences/workshops are allowed, but the authors are asked to declare it at submission time.

SUBMISSION FORMAT

The conference will only accept papers formatted according to the standard ACL templates (downloadable at: https://2023.aclweb.org/calls/style_and_formatting/).

IMPORTANT DATES

* Deadline of paper submission: July 16, 2023 (AoE)
* Notification: September 10, 2023
* Camera-ready: October 1, 2023
* Early bird registration deadline: October 1, 2023
* Conference: December 2-4, 2023

CONFERENCE ORGANIZERS (To be completed)

Chu-Ren Huang (The Hong Kong Polytechnic University)
Yasunari Harada (Waseda University)
Jong-Bok Kim (Kyung Hee University)
Emmanuele Chersoni (The Hong Kong Polytechnic University)
Sophia Yat Mei Lee (The Hong Kong Polytechnic University)
Sarah Chen (The Hong Kong Polytechnic University)
Yu-Yin Hsu (The Hong Kong Polytechnic University)
Pranav A (Dayta AI)
Winnie Zeng (The Hong Kong Polytechnic University)
Bo Peng (The Hong Kong Polytechnic University)
Yuxi Li (The Hong Kong Polytechnic University)
Junlin Li (The Hong Kong Polytechnic University)

CONTACT

paclic37(a)gmail.com

