October 2024 - Corpora

Second Call for papers: The 1st International Workshop on Nakba Narratives as Language Resources
by Amal Haddad 17 Oct '24


NAKBA-NLP 2025
The 1st International Workshop on Nakba Narratives as Language Resources
Part of the COLING-2025 [1] Conference
Abu Dhabi, UAE (Fully Virtual)
January 20, 2025
https://sina.birzeit.edu/nakba-nlp/

OVERVIEW

The narratives of the (ongoing) Palestinian Nakba possess significant historical, cultural, literary, and academic value. Preserving this content and empowering it with AI tools is crucial for ensuring its accessibility and usability for present and future generations. Nakba narratives and testimonies exist in diverse formats such as manuscripts, books, audio recordings, novels, and films. Converting this content into a machine-understandable format presents a notable challenge. Establishing accessible archives and well-annotated collections is essential for researchers and historians to verify and share meaningful information.

This workshop aims to explore how artificial intelligence, natural language processing, and corpus linguistics can assist in understanding, disseminating, and preserving Nakba narratives and testimonies. The goal is to create accessible, comprehensive, and well-annotated collections that empower researchers and historians to validate and share critical insights derived from these data. The workshop targets datasets and narratives in Arabic, English, and other languages; however, submitted articles should be written in English.

CALL FOR PAPERS

We invite submissions for Nakba-NLP 2025, a workshop dedicated to the exploration and preservation of Nakba narratives through the application of artificial intelligence, natural language processing, and corpus linguistics. All submitted papers should explain their relevance to the topic of 'Nakba Narratives as Language Resources'. The organisers reserve the right to reject any papers that incite hatred, refute established facts, or undermine the suffering of individuals.

We seek contributions on the following issues of interest:

* Digitisation of oral and written narratives
* Creation and labeling of language corpora and datasets
* Digital archives, metadata, and semantic/content mark-up
* Annotation tools and annotation guidelines
* Document classification, topic modeling, and information retrieval
* Named entity recognition for identifying people, places, organizations, and events
* Entity linking and relationship extraction
* Event detection and event argument extraction
* Knowledge Graphs and Linked Data
* Vocabularies, dictionaries, and ontologies
* Data visualisation
* Knowledge representation
* Machine translation, summarisation, and paraphrasing
* Natural Language Generation
* Large Language Models
* Sentiment analysis and emotional content extraction
* Discourse analysis (e.g., bias, offensive language, and misinformation) related to Nakba narratives
* Voice & dialogue-based systems; ASR
* Palestinian dialects (written and spoken)

Participants are invited to use the following archives: Institute for Palestine Studies [2], The Palestinian Museum [3], Nakba-Archive [4], POHA [5], Alhaq [6], ICHR [7], as well as Wikipedia and the Wikidata Knowledge Graph.

SUBMISSION DETAILS

All submitted papers must clearly state and explain their relevance to the topic of 'Nakba Narratives as Language Resources'. The organisers reserve the right to reject any papers that incite hatred, refute established facts, or undermine the suffering of individuals.

Submissions may be of two types:

* Long papers - up to eight (8) pages, presenting substantial, original, completed, and unpublished work.
* Short papers - up to four (4) pages, describing a small focused contribution, negative results, system demonstrations, etc.

The workshop supports the COLING anti-harassment policy [8].

IMPORTANT DATES

* Submission Deadline: 25 November 2024
* Notifications of Acceptance: 5 December 2024
* Camera Ready Deadline: 13 December 2024 (cannot be changed).

Links:
------
[1] https://coling2025.org/
[2] https://www.palestine-studies.org/
[3] https://palmuseum.org/en
[4] https://www.nakba-archive.org/
[5] https://libraries.aub.edu.lb/poha/
[6] https://www.alhaq.org/
[7] https://www.ichr.ps/en
[8] https://coling2022.org/policy


Second Call for Papers: The 4th Workshop on Arabic Corpus Linguistics (WACL-4)
by Amal Haddad 17 Oct '24


WACL4 AT COLING’2025 WITH FOCUS ON ARABIC DIALECTS
https://wp.lancs.ac.uk/wacl4/
-------------------------

The workshop will be held online on January 20th, 2025 in conjunction with the 31st edition of COLING in 2025 in Abu Dhabi (UAE).

CALL FOR PAPERS

THE WORKSHOP TOPICS INCLUDE BUT ARE NOT LIMITED TO:

* Development and Utilisation of Arabic Dialectal Corpora
* Advancements in Natural Language Processing Techniques for Arabic Dialects
* Applications and Challenges of Large Language Models in Understanding and Generating Arabic Dialects
* Morphological and Syntactical Challenges in Arabic Dialects
* Dialect Identification and Classification
* Speech Recognition and Synthesis for Arabic Dialects
* Machine Translation involving Arabic Dialects
* Sentiment Analysis and Opinion Mining in Arabic Dialects
* Named Entity Recognition and Information Extraction for Arabic Dialects
* Development of Open Access Resources for Arabic Dialects
* Text Processing and Transliteration Challenges for Arabic Dialects
* Cultural and Sociolinguistic Considerations in NLP Applications for Arabic Dialects
* Resources and Tools for Computational Analysis of Arabic Dialects
* Applications of Arabic Dialects NLP in Real-World Scenarios

Summary of the Call:

We welcome submissions of papers centred around Arabic Dialects NLP and resources, focusing on supporting and advancing language technologies tailored to the diverse range of Arabic dialects. We encourage submissions that span a spectrum from theoretical investigations to practical applications, aiming to address the unique challenges, solutions, and insights that Arabic dialects introduce to the field of NLP.

Submissions should adhere to the COLING 2025 standards. Authors are strongly encouraged to review and follow the COLING 2025 submission guidelines and author kit, available at https://coling2025.org/.
If authors are describing dialectal variations, we request that they include relevant linguistic details and sociolinguistic contexts to enrich the understanding of the presented work.

Submissions may be of two types:

* Long papers - up to eight (8) pages excluding references, presenting substantial, original, completed, and unpublished work.
* Short papers - up to four (4) pages excluding references, describing a small focused contribution, negative results, system demonstrations, etc.

Important dates:

* 1st Call for Papers Announcement: 13 August 2024
* 2nd Call for Papers Announcement: 01 October 2024
* Paper Submission Deadline: 8 November 2024
* Notification of Paper Acceptance: 6 December 2024
* Camera-ready Paper Deadline: 13 December 2024
* Workshop Date: 20th January 2025

Format:

The workshop will consist of a mix of invited talks, contributed talks, and panel discussions. The workshop will be held 100% virtually, allowing for greater accessibility and participation from scholars and researchers around the world. We anticipate 30 attendees and 2 invited speakers to the workshop. Scheduled for 20 January 2025, the workshop will be held in conjunction with the 31st edition of COLING 2025 in Abu Dhabi, UAE.
Anti-Harassment policy:

The workshop supports the COLING anti-harassment policy https://coling2022.org/policy

Keynote Speakers:

* Speaker 1: Imed Zitouni, Google, USA (Confirmed)
* Speaker 2: Hend Alkhalifa, King Saud University, Saudi Arabia (Awaiting confirmation)

ORGANIZATION

ORGANISING COMMITTEE:

* Saad Ezzini, Lancaster University, UK (General Chair)
* Hamza Alami, Sidi Mohamed Ben Abdellah University, Morocco (Programme Co-Chair)
* Ismail Berrada, Mohammed VI Polytechnic University, Morocco (Programme Co-Chair)
* Abdessamad Benlahbib, Sidi Mohamed Ben Abdellah University, Morocco (Programme Co-Chair)
* Abdelkader El Mahdaouy, Mohammed VI Polytechnic University, Morocco (Review Chair)
* Salima Lamsiyah, University of Luxembourg, Luxembourg (Publication Chair)
* Nouran Khallaf, Leeds University, UK (Publicity Co-Chair)
* Hatim Derrouz, Ibn Tofail University, Morocco (Publicity Co-Chair)
* Amal Haddad, University of Granada, Spain (Publicity Co-Chair)
* Mustafa Jarrar, Birzeit University, Palestine (Advisory Committee)
* Mo El-Haj, Lancaster University, UK (Advisory Committee)
* Ruslan Mitkov, Lancaster University, UK (Advisory Committee)
* Paul Rayson, Lancaster University, UK (Advisory Committee)

PROGRAMME COMMITTEE:

* Ahmed Ali, Qatar Computing Research Institute (QCRI), Qatar
* Ahmed Abdelali, Qatar Computing Research Institute (QCRI), Qatar
* Almoataz B. Al-Said, Cairo University, Egypt
* Eric Atwell, Leeds University, UK
* Haithem Afli, Dublin City University, Ireland
* Hazem Hajj, American University of Beirut, Lebanon
* Ignatius Ezeani, Lancaster University, UK
* Imed Zitouni, Microsoft Research, USA
* Karim Bouzoubaa, Mohamed Vth University, Morocco
* Khaled Shaban, Qatar University, Qatar
* Abdessamad Benlahbib, Sidi Mohamed Ben Abdellah University, Morocco
* Lama Alsudias, Lancaster University, UK
* Mo El-Haj, Lancaster University, UK
* Mariam Aboelezz, Birkbeck, University of London, UK
* Nadi Tomeh, University of Paris 13, France
* Nizar Habash, New York University Abu Dhabi, UAE
* Nora Al-Twairesh, King Saud University, Saudi Arabia
* Abdelkader El Mahdaouy, Mohammed VI Polytechnic University, Morocco
* Paul Rayson, Lancaster University, UK
* Scott Piao, Lancaster University, UK
* Taha Zerrouki, Ecole Nationale Supérieure d'Informatique, Algeria
* Tamer Elsayed, Qatar University, Qatar
* Violetta Cavalli-Sforza, Al Akhawayn University, Morocco
* Wajdi Zaghouani, Hamad Bin Khalifa University, Qatar
* Hanane El Faik, Chouaïb Doukkali University, Morocco
* Wassim El-Hajj, American University of Beirut, Lebanon
* Ashraf Boumhidi, Sidi Mohamed Ben Abdellah University, Morocco
* Khadidja Merakchi, Heriot-Watt University
* Ed-Drissiya El-Allaly, University of Moulay Ismail, Morocco
* Driss Aboulhoucine, EMRO, WHO
* El Habib Nfaoui, Sidi Mohamed Ben Abdellah University, Morocco
* Salima Lamsiyah, University of Luxembourg, Luxembourg
* Khaled Shaalan, The British University in Dubai, UAE
* Ismail Berrada, Mohammed VI Polytechnic University, Morocco
* Maram Alharbi, Lancaster University, UK
* Hatim Derrouz, Ibn Tofail University, Morocco
* Nouran Khallaf, Leeds University, UK
* Hamza Alami, Sidi Mohamed Ben Abdellah University, Morocco
* Mustafa Jarrar, Birzeit University, Palestine
* Hanane Grissette, Cadi Ayyad University, Morocco

-------------------------


12 PhD positions on Neuroexplicit models, Saarland University, Germany
by Vera Demberg 17 Oct '24


The Research Training Group 2853 “Neuroexplicit Models of Language, Vision, and Action” is looking for

Twelve PhD Students - Fall 2025

Neuroexplicit models combine neural and human-interpretable (“explicit”) models in order to overcome the limitations that each model class has separately. They include neurosymbolic models, which combine neural and symbolic models, but also e.g. combinations of neural and physics-based models. In the RTG, we will improve the state of the art in natural language processing (“Language”), computer vision (“Vision”), and planning and reinforcement learning (“Action”). We also develop novel machine learning techniques for neuroexplicit models (“Foundations”). Our overarching aim is to contribute to a better understanding of the cross-cutting design principles of effective neuroexplicit models through interdisciplinary collaboration.

The RTG is scheduled to grow to a total of 24 PhD students by 2025. An excellent and international group of twelve PhD students and one postdoc have already joined the RTG. Through the inclusion of ~20 associated PhD students and postdocs funded from other sources, it will be one of the largest research centers on neuroexplicit or neurosymbolic models in the world. The RTG brings together researchers at Saarland University, the Max Planck Institute for Informatics, the Max Planck Institute for Software Systems, the CISPA Helmholtz Center for Information Security, and the German Research Center for Artificial Intelligence (DFKI). All of these institutions are co-located on the same campus in Saarbrücken, Germany.

The positions will be funded for four years at the TV-L E13 100% pay scale. They are intended to start in September 2025, but could start a little earlier or later depending on the student’s availability. You should have or be about to complete an MSc degree in computer science or a related field and have demonstrated expertise in one of the research areas of the RTG, e.g. through an excellent Master’s thesis or relevant publications.

The RTG is part of the Saarland Informatics Campus, one of the leading centers for research in computer science, artificial intelligence, and natural language processing in Europe. The Saarland Informatics Campus brings together 900 researchers and 2500 students from 81 countries. The CISPA Helmholtz Center, located on the same campus, is home to an additional 350 researchers and on track to grow to 800 by 2026. Researchers at SIC and CISPA are part of the ELLIS network and have been awarded more than 40 ERC grants.

Each PhD student in the RTG will be jointly supervised by two PhD advisors from the list of Principal Investigators below. Each student will freely define their own research topic; we encourage the choice of topics that cross the traditional boundaries of research fields. Students may be affiliated with Saarland University or with one of the participating institutes.

* Vera Demberg, Saarland University - Computational Linguistics
* Jörg Hoffmann, Saarland University - AI Planning
* Dietrich Klakow, Saarland University - Natural Language Processing
* Alexander Koller, Saarland University - Computational Linguistics
* Bernt Schiele, MPI for Informatics - Computer Vision, Machine Learning
* Philipp Slusallek, DFKI and Saarland University - Computer Graphics, Artificial Intelligence
* Christian Theobalt, MPI for Informatics - Visual Computing, Machine Learning
* Mariya Toneva, MPI for Software Systems - Computational Neuroscience, Machine Learning
* Isabel Valera, Saarland University - Machine Learning
* Jilles Vreeken, CISPA - Machine Learning, Causality
* Joachim Weickert, Saarland University - Mathematical Data Analysis
* Verena Wolf, DFKI and Saarland University - Modeling and Simulation, Reinforcement Learning

Ellie Pavlick, Brown University and Google AI, will join us regularly as a Mercator Fellow.
Please send your application by 26 November 2024 to apply@neuroexplicit.org and include the reference number W2543. We aim to conduct job interviews in January 2025.

For more details on the position, including what materials to submit with your application, please see our website: https://www.neuroexplicit.org/jobs/


Post-Doc or Chair Assistant Position (Akademischer Rat) for 3+3 Years with the Chair for Multilingual Computational Linguistics in Passau
by Johann-Mattis List 17 Oct '24


Dear all,

Our Chair of Multilingual Computational Linguistics is offering a position for an Akademischer Rat (research assistant) for 3 years with the possibility of extension by 3 more years. We are looking for a candidate who can teach topics in Multilingual Computational Linguistics with an open focus on any topic related to multilingual computational approaches to linguistic typology, historical linguistics, or psycholinguistics.

The deadline for applications is November 20. More information can be found here:

https://www.uni-passau.de/fileadmin/dokumente/beschaeftigte/Stellenangebote…

Sincerely,

Mattis List

--
Prof. Dr. Johann-Mattis List
Chair of Multilingual Computational Linguistics
University of Passau
Dr.-Hans-Kapfinger-Str. 16
94032 Passau
Germany
Chair Website: https://phil.uni-passau.de/multilinguale-computerlinguistik/
Personal Website: https://lingulist.de
Telephone: +49(0)851/509-3480


First Call for Papers - Fifth Conference on Language, Data and Knowledge (LDK 2025)
by Andon Tchechmedjiev 17 Oct '24


** Apologies for cross-postings **

==============
Call for Papers @ Fifth Conference on Language, Data and Knowledge (LDK 2025)
Dates: 9-12 September 2025
Location: Naples, Italy
Website: http://2025.ldk-conf.org
Twitter/X: https://x.com/LDKconference
Submission Deadline: 6 March 2025
Submission page: https://openreview.net/group?id=LDK/2025/Conference
==============

We invite submissions to the fifth biennial conference on Language, Data and Knowledge (LDK 2025) to be held in Naples, Italy in September 2025. This conference aims to bring together researchers from across different disciplines concerned with the acquisition, treatment, curation and use of language data in the context of data science and knowledge-based applications. This edition builds upon the success of the inaugural event held in Galway, Ireland in 2017, the second LDK in Leipzig, Germany in 2019, the third LDK in Zaragoza, Spain in 2021, and the fourth LDK in Vienna, Austria in 2023.

Paper Submission

We welcome submissions of relevance to the topics listed below. Submissions can be in the form of:

* Long papers: 9–12 pages
* Short papers: 4–6 pages

All submission lengths are given including references.

Accepted submissions will be published in an open-access conference proceedings volume and indexed in the ACL Anthology and DBLP, free of charge for authors. The ACL templates should therefore be used for all conference submissions. As the reviewing process is single-blind, submissions should not be anonymised. Papers should be submitted via OpenReview at the following address: https://openreview.net/group?id=LDK/2025/Conference

All papers must represent original work. When submitted, the submission must not have been previously published*, and the material in it must not have been submitted, or be under review, at another journal or conference while under review at LDK 2025.

*This excludes papers on preprint archives, such as arXiv, which we do not consider to have been previously published.
The conference will be hybrid (face-to-face and remote). Note that at least one author of each accepted paper must register to present the paper at the conference (either remotely or on-site).

Topics

Relevant topics for the conference include, but are not limited to, the following fields:

Language Data

* Language data construction and acquisition
* Language data annotation
* FAIR data practices for language data
* Language data portals and metadata about language data
* Organisational and infrastructural management of language data
* Multilingual, multimedia and multimodal language data
* Evaluation, provenance and quality of language data
* Visualisation of language data
* Standards and interoperability of language data
* Legal aspects of publishing language data
* Under-resourced languages
* e-Lexicography
* Semantic processing

Knowledge Graphs

* Linguistic Linked Data and the multilingual Semantic Web
* Ontologies, terminologies, wordnets, framenets and related resources
* Information and knowledge extraction (taxonomy extraction, ontology learning)
* Data, information and knowledge integration across languages
* (Cross-lingual) ontology alignment
* Entity linking and relatedness
* Linked data profiling
* Knowledge representation and reasoning
* Knowledge graphs for corpora processing and analysis
* Neuro Symbolic Artificial Intelligence

Methods and Applications for Language, Data and Knowledge

* Question answering and semantic search
* Text analytics on big data
* NLP for language documentation and preservation
* Speech recognition and synthesis
* Spoken language processing
* Semantic content management
* Computer-aided language learning
* Natural language interfaces to big data
* Knowledge-based NLP
* Deep learning and machine learning for and on LLOD
* Language Models and Foundation Models (Language and Multimodal Models)
* Generative Artificial Intelligence and Language, Data, Knowledge Graphs

Use Cases in Language, Data and Knowledge

Contributions are welcome where the topics above - and others within the scope of Language, Data and Knowledge - are applied to domain-specific use cases, including but not limited to: social sciences and humanities, legal, life sciences, FinTech, cybersecurity.

Organising Committee

Conference Chairs:
* Jorge Gracia, University of Zaragoza, Spain
* Dagmar Gromann, University of Vienna, Austria

Program Chairs:
* Mehwish Alam, Telecom Paris, Institut Polytechnique de Paris, France
* Andon Tchechmedjiev, Institut Mines Telecom | EuroMov Digital Health in Motion

Workshop and Tutorial Chairs:
* Katerina Gkirtzou, ILSP/Athena Research Center, Greece
* Slavko Zitnik, University of Ljubljana, Slovenia

Local Organisers:
* Maria Pia Buono - University of Naples “L’Orientale”, Italy
* Johanna Monti - University of Naples “L’Orientale”, Italy

Important Dates:

* Paper submission deadline: 6th March, 2025
* Acceptance/Rejection Notification: 8th May, 2025
* Pre/Post Conference events: 9 to 12 September, 2025
* Main conference: 10-11 September, 2025

All deadlines are 23:59 AoE (anywhere on Earth)


Special Issue Update: Extended Deadline & Exciting News
by Rui Sousa Silva 16 Oct '24


Dear All,

We're pleased to announce an extension for the special issue of Language and Law / Linguagem e Direito on "Language, Law and Rights: Balancing AI Driven Technology and Equity." Due to demand, the new deadline for submissions is November 15th, 2024.

For this special issue, we particularly welcome both theoretical and empirical contributions that challenge prominent understandings, from a language, law and rights perspective, on:

* relationships and tensions between language, law, rights, and technology;
* linguistic imperialism via technology;
* emerging digital divides and other social issues;
* co-designing technology with diverse communities;
* assistive technology for language access;
* accessibility considerations for language rights;
* best practice to balance innovation and equity by maintaining a dialogue with technology developers, communities, researchers and policymakers;
* best practice for promoting linguistic equality and equity through regulations and policy.

Keywords: Language, technology, human-machine interaction, minorities, human-centricity, law, rights, justice, equity

Themes:

* Co-creation between humans; co-creation with AI-driven technology (co-AI)
* Non-converging goals (e.g. efficiency vs. customization, bias vs. fairness, short-term gains vs. long-term sustainability, commercial interests vs. social good, power dynamics and control vs. individual choices)
* Quality of life, law, regulation and ethics
* Linguistic justice
* Cultural issues
* Sociological issues of language rights in relation to technology, its development, and deployment
* Accessibility - access to services (public services and otherwise)
* Language rights and language policy and planning
* Glottopolitics and computer-assisted communication
* Language rights and multilingual administrations
* Minoritised languages and human geography
* Technology-mediated communication in multilingual democracies
* Participation of linguistic minority groups through remote interpreting
* Minoritised/indigenous language media and social media
* AI and data mining in under-resourced languages
* Agency issues in human-machine interaction and language rights
* Language rights and international law (normative frameworks)
* Linguistic Human Rights as individual and collective rights to choose the language/s for communication
* Language Rights of vulnerable witnesses
* Psychosocial factors in using linguistic varieties in public services
* Human-Centred augmented translation
* Machine translation, post-editing tools and minority languages

Length: ≃ 7000-8000 words [Guidelines available here <https://drive.google.com/file/d/1_9rd9r4XmgSY2twXSVab-ZQWFDjmevy8/view?usp=…>]

Book reviews: Suggest books published recently to be reviewed for the special issue.

Important Dates for Vol. 12(1), 2025 (June, 2025):

Full article submission: November 15, 2024

For more information please follow the link: https://ojs.letras.up.pt/index.php/LLLD/announcement/view/158

For further information please contact: Angela Soltan <angela@soltan.md>, or one of the other guest editors: Rebekah Rousi <rebekah.rousi@uwasa.fi>, Lucia Ruiz Rosendo <Lucia.Ruiz@unige.ch>

On behalf of the editorial team,
Angela Soltan

Rui Sousa Silva
Faculdade de Letras, Universidade do Porto | Faculty of Arts and Humanities, University of Porto
www.linguisticaforense.pt | https://s.up.pt/qjur | http://tinyurl.com/37w2ec6x
Latest publication: Cyber Hate Speech Detection and Analysis: An Evidence-Based Forensic Linguistics Approach <https://doi.org/10.1007/978-3-031-51248-3_8>


Open position as Group Leader in Natural Language Processing, Bielefeld University
by cimiano＠cit-ec.uni-bielefeld.de 16 Oct '24


Bielefeld University has an open position as group leader in Natural Language Processing at the Faculty of Technology. We are looking for a postdoctoral researcher who can independently set up and lead a group on Natural Language Processing. The group leader is expected to build up an independent research profile in the field of Natural Language Processing (NLP) and to publish the research work at international conferences (e.g. ACL, EMNLP, COLING, AAAI, …).

The group leader will be affiliated with the "Semantic Computing Lab" [1] led by Prof. Philipp Cimiano. The research of the group leader should be compatible with the research topics of the "Semantic Computing" lab. Possible research topics include:

• Robustness and safety of large language models for NLP
• Ethical aspects and FAIRness of NLP systems
• Fine-tuning and transfer learning for the adaptation of LLMs
• LLMs for information extraction
• Auto-ML for LLMs
• Argument Mining
• Common Sense Knowledge for NLP
• Temporal Reasoning for NLP

The successful candidate has an excellent track record in the field of NLP with demonstrated ability to publish at top tier conferences. Ideally, the candidate will have teaching experience and experience in third party funding acquisition. The position involves teaching duties corresponding to 4 hours per week.

The position is available for 3 years and can be extended for up to 3 further years.

Inquiries and applications should be sent to cimiano@techfak.uni-bielefeld.de. The application deadline is October 25th. Applications should include a CV, research statement and teaching concept.

[1] https://www.uni-bielefeld.de/fakultaeten/technische-fakultaet/arbeitsgruppe…

Prof. Dr. Philipp Cimiano
AG Semantic Computing
Coordinator of the Cognitive Interaction Technology Center (CITEC)
Co-Director of the Joint Artificial Intelligence Institute (JAII)
Universität Bielefeld
Tel: +49 521 106 12249
Fax: +49 521 106 6560
Mail: cimiano@cit-ec.uni-bielefeld.de
Personal Zoom Room: https://uni-bielefeld.zoom-x.de/my/pcimiano
Office CITEC-2.307
Universitätsstr. 21-25
33615 Bielefeld, NRW
Germany


SEPLN: CFP issue 74 of the journal Procesamiento del Lenguaje Natural Deadline ***22nd November***
by aitziber.atucha＠ehu.eus 15 Oct '24


[Spanish version below]

Please consider contributing and/or forwarding to appropriate colleagues and groups.

******* We apologize for the multiple copies of this e-mail ******

Call for papers for issue 74 of the journal Procesamiento del Lenguaje Natural

http://www.sepln.org/en/journal
http://www.sepln.org/en/journal/author-guidelines

Introduction

The aim of the journal Procesamiento del Lenguaje Natural is to provide a forum for the publication of scientific-technical articles in the field of Natural Language Processing (NLP), for both the national and international scientific community. The articles must be unpublished and cannot be simultaneously submitted for publication in other journals or conference proceedings. The journal also aims to promote the development of areas related to NLP, disseminate research carried out, identify future guidelines for basic research, and present software applications in this field.

Every year the Sociedad Española de Procesamiento del Lenguaje Natural (SEPLN) (Spanish Society for Natural Language Processing) publishes two issues of the journal, including original articles, presentations of R&D projects, book reviews, and summaries of PhD theses.

The scientific quality of the Journal is supported by the 2023 JCR index (JIF: 1.2, JCI: 0.39, Q2-Linguistics - Q4-Computer Sciences, Artificial Intelligence ESCI), the SCImago Journal Ranking (2023 SJR: 0.677, Q2-Computer Science Applications, Q1-Linguistics and Language), the Scopus Index (2023 CiteScore: 5.4) and the SNIP index (Source Normalized Impact per Paper) with 2.07 points. More information at: http://www.sepln.org/en/journal/quality.
Topics

* NLP for low-resource languages
* Efficient and sustainable NLP methods
* Ethics, Bias and Fairness in NLP
* Trustworthiness and explainability in NLP
* Security and privacy in NLP
* Text and Multimodal Generation
* Multimodality and Language Grounding to Vision
* Knowledge and common sense
* Computational lexicography and terminology
* Linguistic theories, Cognitive Modeling and Psycholinguistics
* Morphological and Syntactic analysis
* Corpus linguistics
* Development of linguistic resources and tools
* Semantics, pragmatics, and discourse
* Machine translation
* Speech synthesis and recognition
* Audio indexing and retrieval
* Dialogue systems and interactive systems / Conversational assistants
* Monolingual and multilingual information extraction and retrieval
* Question answering systems
* Automatic textual content analysis
* Sentiment analysis, opinion mining and argument mining
* Plagiarism detection
* Negation and speculation processing
* Text summarization
* Text simplification
* Image retrieval
* NLP in specific domains (Medicine, Law, Education)

Submission Information

The proposal must be submitted by November 22nd, 2024 and must meet certain format and style requirements. All submissions must be in PDF format and submitted electronically using the OpenReview system. Submitted papers will be subjected to a blind review by at least three members of the program committee.

Categories of papers

* Regular papers with original contributions.
* Summary of PhD thesis.

Information for Authors

The proposals can be written in Spanish or English and should be at most 10 A4-size pages of content, plus unlimited pages for references, and 4 pages maximum for summaries of PhD theses. The papers must include the following sections:

* The title of the communication (in English and Spanish).
* An abstract in English and Spanish (maximum 150 words).
* A list of keywords or related topics (in English and Spanish).

The documents must not include headers or footers.
As reviewing will be blind, the paper should not include the authors’ names and affiliations. Furthermore, self-references that reveal the authors’ identity should be avoided. The articles should only include the title, the abstract, the keywords and the proposal. We recommend using the LaTeX and Word templates that can be downloaded from the SEPLN website (the author guidelines have been updated): http://www.sepln.org/index.php/en/journal/author-guidelines

Note on camera ready

The final version of the paper (camera ready) should be submitted together with a cover letter explaining how the reviewers’ suggestions were implemented in the final version. This cover letter will be taken into account when deciding whether to accept or finally reject the selected paper.

Preprint policy

The Journal allows the publication of preprints (non-refereed papers posted online, such as on ArXiv) at any time, but during the review period the preprint must indicate that the paper is “under review” in the journal Procesamiento del Lenguaje Natural. Likewise, if the paper is accepted, the preprint must be updated with the DOI, the name of the Journal and the bibliographic information of the paper.

Important dates

Submission deadline: November 22nd, 2024
Notification of acceptance: January 27th, 2025
Camera ready: February 7th, 2025
Publication: March 2025

Contact person: Aitziber Atutxa (aitziber.atucha(a)ehu.eus)

Editorial Committee of the journal Procesamiento del Lenguaje Natural

--------------------------------------------------------------------

*********** Apologies if you receive multiple copies of this message ************

Please distribute this call among your colleagues if you see fit.

Call for papers for issue 74 of the journal Procesamiento del Lenguaje Natural.
http://www.sepln.org/la-revista
http://www.sepln.org/la-revista/informacion-para-autores

Aims of the journal

The journal Procesamiento del Lenguaje Natural is a forum for the publication of scientific-technical articles in the field of Natural Language Processing (NLP), for both the national and international scientific community. Articles must be unpublished and must not have been submitted simultaneously for publication in other journals or conference proceedings. The journal seeks to foster the development of the different areas related to NLP, improve the dissemination of ongoing research, identify future directions for basic research, and show the real possibilities for application in this field.

Every year the SEPLN (Sociedad Española para el Procesamiento del Lenguaje Natural) publishes two issues of the journal, which include original articles, project presentations, book reviews, and summaries of PhD theses.

The scientific quality of the Journal is supported by the 2023 JCR index (JIF: 1.2, JCI: 0.39, Q2-Linguistics - Q4-Computer Sciences, Artificial Intelligence ESCI), the SCImago Journal Ranking (2023 SJR: 0.677, Q2-Computer Science Applications, Q1-Linguistics and Language), the Scopus index (2023 CiteScore: 5.4) and the SNIP index (Source Normalized Impact per Paper) with 2.07 points. More information at http://www.sepln.org/la-revista/calidad.
Topics

* NLP for languages with limited resources
* Diversity and NLP for low-resource languages
* Efficient and sustainable NLP methods
* LLMs: design, creation, evaluation
* Ethics, bias and fairness in NLP
* Trustworthy and explainable NLP
* Security and privacy in NLP
* Text and multimodal generation
* Multimodality and language grounding to vision
* Knowledge and common sense
* Linguistic theories, cognitive modeling and psycholinguistics
* Morphological and syntactic analysis
* Corpus linguistics
* Development of linguistic resources and tools
* Semantics, pragmatics and discourse
* Machine translation
* Speech recognition and synthesis
* Audio indexing and retrieval
* Dialogue systems and interactive systems / Conversational assistants
* Monolingual and multilingual information retrieval and extraction
* Question answering systems
* Automatic textual content analysis
* Opinion and emotion analysis and argument mining
* Plagiarism detection
* Negation and speculation processing
* Automatic text summarization
* Text simplification
* Image retrieval
* Domain-specific NLP (medical, legal-administrative, education, etc.)

Submission of papers

Proposals (articles and thesis summaries) may be submitted until the deadline of November 22nd, 2024. Proposals will be submitted and reviewed exclusively in PDF format and managed through the OpenReview system. Papers will undergo a blind review process carried out by at least three members of the SEPLN advisory board.

Types of papers

* Articles on original contributions.
* Summaries of PhD theses.

Instructions for Authors

Papers may be written in Spanish or English. Scientific articles are limited to 10 pages of content plus an unlimited number of pages for references; thesis summaries are limited to 4 pages.
Proposals must contain the following sections:

* The title of the article (in Spanish and English).
* An abstract in Spanish and an abstract in English, each of at most 150 words.
* A list of related topics or keywords (in Spanish and English).

The documents must not include headers or footers. As the review phase is blind, submitted articles must not include any reference to the authors, nor self-references that reveal their identity. All contributions must contain only the title, the abstract, the keywords and the proposal. For thesis summaries, anonymity is not required.

Papers must follow the SEPLN journal format available at the following address: http://www.sepln.org/la-revista/informacion-para-autores The guidelines have been updated; please use the ones available on the journal's website.

Note on the final version

The final version of the paper (camera ready) must be submitted together with a document explaining how the reviewers' suggestions have been implemented. This document will be taken into account when deciding whether to accept or reject the paper.

Preprint policy

The journal allows an unreviewed version of the articles to be published on preprint platforms (platforms for non-refereed articles, such as ArXiv). However, during the review period the preprint must indicate that the article is “under review” in the journal Procesamiento del Lenguaje Natural. If the article is accepted, the preprint must be updated with the DOI, the name of the journal and the bibliographic information of the article.
Important dates

Submission deadline: November 22nd, 2024
Notification of acceptance/rejection: January 27th, 2025
Final version: February 7th, 2025
Publication: March 2025

Contact person: Aitziber Atutxa (aitziber.atucha(a)ehu.eus)

Editorial Board of the journal Procesamiento del Lenguaje Natural.


October 2024 Newsletter - LDC
by Penn LDC 15 Oct '24


In this newsletter:

* LDC/Penn receives US Dept of Education research grant
* Membership year 2025 publication preview
* Fall 2024 data scholarship recipients

New publications:

* RST Continuity Corpus<https://catalog.ldc.upenn.edu/LDC2024T08>
* MultiTACRED<https://catalog.ldc.upenn.edu/LDC2024T09>

________________________________

LDC/Penn receives US Dept of Education research grant

LDC and Penn's Graduate School of Education and Department of Computer and Information Science are part of a team that was recently awarded a $10 million grant from the US Department of Education<https://ies.ed.gov/funding/grantsearch/details.asp?ID=6066> to develop the Using Generative Artificial Intelligence for Reading R&D Center (U-GAIN Reading) which will explore using generative AI to improve elementary school reading instruction for English learners. Led by the education nonprofit Digital Promise, U-GAIN Reading will build on an existing research-based tutoring platform, Amira Learning, that is used by more than 1 million students each year. The LDC/Penn team will contribute expertise in computational linguistics, computer science, and learning analytics. An evaluation team at MDRC will measure learner outcomes both to improve the R&D and to benchmark its eventual impacts. Additional experts in the science of reading, ethics, and strategies for national impact will support the project's work. Data developed in the project will be shared with the community through the LDC Catalog.

Membership year 2025 publication preview

The 2025 membership year is approaching and plans for next year's publications are in progress.
Among the expected releases are:

* Iraqi Arabic - English Lexical Database: a set of six interrelated tables (roots, lemmas, wordforms, multi-word expressions, English definitions, example phrases) presenting each Iraqi Arabic word in Arabic script and IPA format, a result of LDC's collaboration with Georgetown University Press to enhance and update three dialectal Arabic dictionaries
* AIDA topic source data and annotations: multimodal source data and annotations in multiple languages (Russian, English, Spanish) for information and entity extraction
* 2015 NIST Language Recognition Evaluation Test Set: 164,000+ segments of conversational telephone speech and broadcast narrow band speech in six linguistic varieties (Arabic, Spanish, English, Chinese, Slavic, French) representing 20 languages, used in NIST's 2015 language recognition evaluation
* BOLT CALLFRIEND CALLHOME CTS Audio, Transcripts and Translations: previously unpublished Chinese and Egyptian Arabic telephone conversations from the CALLFRIEND and CALLHOME collections, with transcripts and translations developed by LDC for the DARPA BOLT program
* Chinese Sentence Pattern Structure Treebank: 5,000+ sentences from ancient and modern Chinese texts with syntactic annotation based on sentence constituent analysis, developed by Beijing Normal University and Peking University
* IARPA MATERIAL language packs: conversational telephone speech, transcripts, English translations, annotations, and queries in multiple languages (e.g., Georgian, Kazakh, Lithuanian)
* LORELEI: representative and incident language packs containing monolingual text, bi-text, translations, annotations, supplemental resources, and related tools in various languages (e.g., Hungarian, Hindi, Amharic, Somali)

Check your inbox for more information about membership renewal.
Fall 2024 data scholarship recipients

Congratulations to the recipients of LDC's Fall 2024 data scholarships:

* Yomma Gamaleldin: Alexandria University (Egypt): Master's student, Computer and Systems Engineering Department. Yomma is awarded a copy of Qatari Corpus of Argumentative Writing (LDC2022T04) for her work in Arabic automated essay scoring.
* Arhane Mahaganapathy: Jaffna University (Sri Lanka): Master's student, Department of Computer Science. Ahrane is awarded copies of IARPA Babel Tamil Language Pack (LDC2017S13) and Multi-Language Telephone Speech 2011 - South Asian (LDC2017S14) for her work in Tamil speech-to-text systems.
* Sivashanth Suthakar: Jaffna University (Sri Lanka): Master's student, Department of Computer Science. Sivashanth is awarded copies of CAMIO Transcription Languages (LDC2022T07) and LORELEI Tamil Representative Language Pack (LDC2023T03) for his work in Tamil OCR systems.
* Oshan Yalegama: University of Moratuwa (Sri Lanka): BSc, Electronic and Telecommunication Engineering. Oshan is awarded copies of CSR-I (WSJ0) Complete (LDC93S6A) and TIMIT Acoustic-Phonetic Continuous Speech Corpus (LDC93S1) for his work in audio signal processing.
* Samer Mohammed Yaseen: Sana'a University (Yemen): PhD candidate, Faculty of Computer and Information Technology. Samer is awarded a copy of Arabic Newswire Part 1 (LDC2001T55) for his work in Arabic information retrieval.

________________________________

New publications:

RST Continuity Corpus<https://catalog.ldc.upenn.edu/LDC2024T08> was developed at Åbo Akademi University and Humboldt-Universität zu Berlin and contains annotations for continuity dimensions added to RST Discourse Treebank (LDC2002T07)<https://catalog.ldc.upenn.edu/LDC2002T07>. RST Discourse Treebank is a collection of English news texts from the Penn Treebank<https://catalog.ldc.upenn.edu/LDC99T42> annotated for rhetorical relations under the RST (Rhetorical Structure Theory) framework.
In RST Continuity Corpus, the relations are annotated for the seven continuity dimensions: time, space, reference, action, perspective, modality, and speech act. The relations are also annotated for polarity, order of segments, nuclearity, and context.

2024 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.

* MultiTACRED<https://catalog.ldc.upenn.edu/LDC2024T09> was developed by the German Research Center for Artificial Intelligence (DFKI) Speech and Language Technology Lab<https://www.dfki.de/en/web/research/research-departments/speech-and-languag…> and is a machine translation of TAC Relation Extraction Dataset (LDC2018T24)<https://catalog.ldc.upenn.edu/LDC2018T24> (TACRED) into twelve languages with projected entity annotations. TACRED is a large-scale relation extraction dataset containing 106,264 examples built over English newswire and web text used in the NIST TAC KBP English slot filling evaluations during the period 2009-2014. The training and evaluation data for the TAC KBP slot filling tasks was developed by the Linguistic Data Consortium.

TACRED training, development, and test splits were translated into Arabic, Chinese, Finnish, French, German, Hindi, Hungarian, Japanese, Polish, Russian, Spanish, and Turkish using DeepL<https://www.deepl.com/> or Google Translate<https://translate.google.com>. The test split was back-translated into English to generate machine-translated English test data. TACRED annotations are specified by token offsets. For translation, tokens were concatenated with white space, and the entity offsets were converted into XML-style markers to denote argument.

2024 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.

To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator Linguistic Data Consortium<ldc.upenn.edu> University of Pennsylvania T: +1-215-573-1275 E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu> M: 3600 Market St. Suite 810 Philadelphia, PA 19104
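
The MultiTACRED description above mentions that tokens were joined with white space and entity offsets were turned into XML-style markers before translation. A minimal sketch of that idea follows. It is an illustration only and not the DFKI pipeline: the tag names `subj` and `obj`, the inclusive (start, end) span convention, and both function names are assumptions made for this example.

```python
import re
from typing import List, Optional, Tuple

# Inclusive (start, end) token offsets. This convention is an assumption
# made for the sketch, not something stated in the newsletter.
Span = Tuple[int, int]


def mark_arguments(tokens: List[str], subj: Span, obj: Span) -> str:
    """Join tokens with single spaces and wrap the two argument spans
    in XML-style markers (hypothetical tag names: subj, obj)."""
    pieces = []
    for i, tok in enumerate(tokens):
        prefix = ("<subj>" if i == subj[0] else "") + ("<obj>" if i == obj[0] else "")
        suffix = ("</subj>" if i == subj[1] else "") + ("</obj>" if i == obj[1] else "")
        pieces.append(prefix + tok + suffix)
    return " ".join(pieces)


def recover_argument(text: str, tag: str) -> Optional[str]:
    """Return the text found between <tag> and </tag>, or None if the
    markers are missing (for example, if translation dropped them)."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text)
    return match.group(1) if match else None


tokens = ["Bill", "Gates", "founded", "Microsoft", "."]
marked = mark_arguments(tokens, subj=(0, 1), obj=(3, 3))
print(marked)
print(recover_argument(marked, "subj"), "|", recover_argument(marked, "obj"))
```

The point of carrying markers through the text, rather than offsets alongside it, is that word order and token counts change under translation, so offsets computed on the source sentence would no longer line up with the target sentence.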


Release: SinaTools - Open Source Toolkit for Arabic NLP and NLU
by Mustafa Jarrar 15 Oct '24


We are excited to release SinaTools - Open Source Toolkit for Arabic NLP and NLU, which consists of Python APIs, command lines, online demos, and many datasets - free for both commercial and non-commercial purposes. It outperforms all related tools in all tasks in speed and accuracy. It includes the following modules:

▸ Morphology Tagger: Lemmatizer, POS tagger, root tagger.
▸ WSD Tagger: Pipeline of semantic taggers: single-word WSD, multi-word WSD, and NER.
▸ Synonyms Generator: Extends a set of synonyms with more synonyms.
▸ Semantic Relatedness: Association between two sentences across various dimensions: meaning, underlying concepts, domain specificity, etc.
▸ Named Entity Recognition: Nested and flat NER, 21 entity types.
▸ Relation Extraction: Extracts events and their arguments (agents, locations, and dates).
▸ Diacritic-Based Matching: Decides whether two Arabic words are the same, taking into account diacritization compatibility.
▸ Utilities: A set of useful NLP methods for sentence splitting, duplicate word removal, Arabic Jaccard similarity metrics, transliteration, and others.

Try and Download: https://sina.birzeit.edu/sinatools

Article: Tymaa Hammouda, Mustafa Jarrar, Mohammed Khalilia: SinaTools: Open Source Toolkit for Arabic Natural Language Understanding. In Proceedings of the 2024 AI in Computational Linguistics (ACLING 2024), Procedia Computer Science, Dubai. ELSEVIER. https://www.jarrar.info/publications/HJK24.pdf

--Mustafa
__________________________
Mustafa Jarrar, PhD
Professor of Artificial Intelligence
Chair, PhD Program in Computer Science
Birzeit University, Palestine
Page: http://www.jarrar.info
SinaLab: https://sina.birzeit.edu
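
To give a rough idea of what a Jaccard similarity over Arabic text involves, here is a small standard-library sketch. It does not use or reproduce the SinaTools API: the function names are hypothetical, and it simply strips all diacritics before comparing, which is a much cruder step than the diacritization-compatibility check described in the module list above.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop Unicode nonspacing marks (category Mn), which include the
    Arabic short-vowel signs such as fatha, damma and kasra."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the whitespace-token sets of two strings,
    computed after diacritics are removed."""
    set_a = set(strip_diacritics(a).split())
    set_b = set(strip_diacritics(b).split())
    if not set_a and not set_b:
        return 1.0  # two empty inputs are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


# "kataba al-walad" with and without short vowels on the first word,
# written with escapes so the example does not depend on editor fonts.
vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E \u0627\u0644\u0648\u0644\u062F"
plain = "\u0643\u062A\u0628 \u0627\u0644\u0648\u0644\u062F"
print(jaccard_similarity(vocalized, plain))
```

Stripping marks makes the two spellings compare as equal, but it also merges words that differ only in their vowels, which is exactly the case a compatibility-aware matcher is meant to handle more carefully.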
