
SCI-CHAT 2024: Third call for Participation & Papers for Workshop on Simulation of Conversational Intelligence in Chat
by Gerasimos Lampouras 15 Jan '24

tl;dr:
- Submission deadline for research track papers via Softconf: December 18th 2023
- Submission deadline for research track submissions already reviewed via ARR: January 17th 2024
  https://openreview.net/group?id=eacl.org/EACL/2024/Workshop/SCI-CHAT_ARR_Co…
- Submission deadline for shared task systems: January 20th 2024
  https://forms.gle/r7HgxZKgqdencRrHA
- Submission deadline for shared task system descriptions via SoftConf: January 26th 2024
  https://sites.google.com/view/dialogue-evaluation/

Call for Papers

The aim of this workshop is to bring together experts working in the area of open-domain dialogue. In this rapidly advancing research area many challenges still exist, such as learning information from conversations, engaging in realistic and convincing simulation of human intelligence, reasoning, and so on.

SCI-CHAT follows previous workshops on open-domain dialogue, but with a focus on the simulation of intelligent conversation, including the ability to follow a challenging topic over a multi-turn conversation and the ability to pose questions, refute and reason. Live human evaluation is employed as the primary mechanism for evaluating models.

The workshop will include a research track and a shared task.

SCI-CHAT's research track aims to explore recent advances and challenges in open-domain dialogue research. Researchers working on all aspects of open-domain dialogue are invited to submit papers on recent advances, resources, tools, analysis, evaluation, and challenges on the broad theme of open-domain dialogue. The topics of the workshop include but are not limited to the following:
- Intelligent conversation, chit-chat, open-domain dialogue;
- Automatic and human evaluation of open-domain dialogue;
- Limitations, risks and safety in open-domain dialogue;
- Instruction-tuned and instruction-enabled models;
- Any other topic of interest to the dialogue community.
SCI-CHAT's shared task will focus on simulating intelligent conversations; participants will be asked to submit (access to the APIs of) automated dialogue agents with the aim of carrying out nuanced conversations over multiple dialogue turns. Participating systems will be interactively evaluated in a live human evaluation. All data acquired within the context of the shared task will be made public, providing an important resource for improving metrics and systems in this research area.

Submission guidelines:

Authors are invited to submit their unpublished work that represents novel research through either direct submission or ARR commitment. Papers should consist of up to 8 pages of content, plus unlimited pages for references and appendix. Authors should make use of the EACL LaTeX Template <https://2023.eacl.org/calls/styles/> alongside supplementary materials, including technical appendices, links to source code, datasets, and multimedia appendices.

Papers can also be submitted as non-archival, so that their content can be reused for other venues, by adding "(NON-ARCHIVAL)" to the title of the submission. Previously published work can also be submitted as non-archival in the same way, with the additional requirement to state this on the first page.

- Direct paper submissions must be made through the SoftConf submission link: https://softconf.com/eacl2024/SCI-CHAT-2024/

Submitting the same paper to more than one EACL workshop is forbidden. All papers will be double-blind peer-reviewed by at least 2 program committee members. As such, all submissions, including the main paper and its supplementary materials, should be fully anonymized. For more information on formatting and anonymity guidelines, please refer to the EACL guidelines <https://eacl.org/index.html>.
Organizers
- Yvette Graham (Trinity College Dublin, Ireland)
- Qun Liu (Huawei Noah's Ark Lab, China)
- Gerasimos Lampouras (Huawei Noah's Ark Lab, UK)
- Ignacio Iacobacci (Huawei Noah's Ark Lab, UK)
- Sinead Madden (Trinity College Dublin, Ireland)
- Haider Khalid (Trinity College Dublin, Ireland)
- Rameez Qureshi (Trinity College Dublin, Ireland)

Important Dates

Regarding Research Track:
- Research paper via Softconf: December 18th 2023
- Pre-reviewed ARR commitment deadline: January 17th 2024
- Notification of research paper acceptance: January 20th 2024
- Camera-ready papers due: January 30th 2024

Regarding Shared Task:
- Release of training and development data: November 9th 2023
- Release of baseline systems: November 9th 2023
- Preliminary system submission deadline: January 13th 2024 (optional; if you want help testing your API, please submit early)
- System submission (API) deadline: January 20th 2024
- System description paper via SoftConf: January 26th 2024
- Camera-ready papers due: January 30th 2024

Overview of results at one-day workshop: March 21 or 22, 2024

CONTACT: sci-chat(a)adaptcentre.ie

Final Call for Participation: SemRel SemEval Shared Task 1 (Evaluation starts on January 20th, Q/A Zoom session on January 16th)
by Nedjma OUSIDHOUM 15 Jan '24

Dear corpora-list members,

We are announcing the first SemEval shared task on Semantic Textual Relatedness (STR): a shared task on automatically detecting the degree of semantic relatedness (closeness in meaning) between pairs of sentences.

The semantic relatedness of two language units has long been considered fundamental to understanding meaning (Halliday and Hasan, 1976; Miller and Charles, 1991), and automatically determining relatedness has many applications such as evaluating sentence representation methods, question answering, and summarization.

Two sentences are considered semantically similar when they have a paraphrasal or entailment relation. On the other hand, relatedness is a much broader concept that accounts for all the commonalities between two sentences: whether they are on the same topic, express the same view, originate from the same time period, one elaborates on (or follows from) the other, etc. For instance, for the following sentence pairs:

- Pair 1:
  a. There was a lemon tree next to the house.
  b. The boy enjoyed reading under the lemon tree.
- Pair 2:
  a. There was a lemon tree next to the house.
  b. The boy was an excellent football player.

Most people will agree that the sentences in pair 1 are more related than the sentences in pair 2.
In this task, new textual datasets will be provided for Afrikaans <https://en.wikipedia.org/wiki/Afrikaans>, Algerian Arabic <https://en.wikipedia.org/wiki/Algerian_Arabic>, Amharic <https://en.wikipedia.org/wiki/Amharic>, English, Hausa <https://en.wikipedia.org/wiki/Hausa_language>, Hindi <https://en.wikipedia.org/wiki/Hindi>, Indonesian <https://en.wikipedia.org/wiki/Indonesian_language>, Kinyarwanda <https://en.wikipedia.org/wiki/Kinyarwanda>, Marathi <https://en.wikipedia.org/wiki/Marathi_language>, Moroccan Arabic <https://en.wikipedia.org/wiki/Moroccan_Arabic>, Modern Standard Arabic <https://en.wikipedia.org/wiki/Modern_Standard_Arabic>, Punjabi <https://en.wikipedia.org/wiki/Punjabi_language>, Spanish <https://en.wikipedia.org/wiki/Spanish_language>, and Telugu <https://en.wikipedia.org/wiki/Telugu_language>.

Data

Each instance in the training, development, and test sets is a sentence pair. The instance is labeled with a score representing the degree of semantic textual relatedness between the two sentences. The scores can range from 0 (maximally unrelated) to 1 (maximally related). These gold label scores have been determined through manual annotation. Specifically, a comparative annotation approach was used to avoid known limitations, including several biases, of traditional rating scale annotation methods. This process led to a high reliability of the final relatedness rankings.

Further details about the task, the method of data annotation, how STR is different from semantic textual similarity, applications of semantic textual relatedness, etc. can be found in this paper: https://aclanthology.org/2023.eacl-main.55.pdf

Tracks

Each team can provide submissions for one, two or all of the tracks shown below:

Track A: Supervised
Participants are to submit systems that have been trained using the labeled training datasets provided.
Participating teams are allowed to use any publicly available datasets (e.g., other relatedness and similarity datasets or datasets in any other languages). However, they must report additional data they used, and ideally report how impactful each resource was on the final results.

Track B: Unsupervised
Participants are to submit systems that have been developed without the use of any labeled datasets pertaining to semantic relatedness or semantic similarity between units of text more than two words long in any language. The use of unigram or bigram relatedness datasets (from any language) is permitted.

Track C: Cross-lingual
Participants are to submit systems that have been developed without the use of any labeled semantic similarity or semantic relatedness datasets in the target language and with the use of labeled dataset(s) from at least one other language. Note: Using labeled data from another track is mandatory for submission to this track.

Deciding which track a submission should go to:
- If a submission uses labeled data in the target language: submit to Track A
- If a submission does not use labeled data in the target language but uses labeled data from another language: submit to Track C
- If a submission does not use labeled data in any language: submit to Track B

** Here ‘labeled data’ refers to labeled datasets pertaining to semantic relatedness or semantic similarity between units of text more than two words long.

Evaluation

The official evaluation metric for this task is the Spearman rank correlation coefficient, which captures how well the system-predicted rankings of test instances align with human judgments. You can find the evaluation script for this shared task on our GitHub page <https://github.com/semantic-textual-relatedness/Semantic_Relatedness_SemEva…>.
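
For readers who want to sanity-check predictions locally before the evaluation phase, the following is a minimal sketch of the Spearman rank correlation described above. It is not the official evaluation script, which may differ in input format and tie handling. The function names `average_ranks` and `spearman` and the example scores are illustrative assumptions. The sketch ranks both score lists (tied values share the mean of their rank positions) and then computes the Pearson correlation of the ranks.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(gold, predicted):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(gold) != len(predicted) or len(gold) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    rg, rp = average_ranks(gold), average_ranks(predicted)
    n = len(rg)
    mean_g, mean_p = sum(rg) / n, sum(rp) / n
    cov = sum((a - mean_g) * (b - mean_p) for a, b in zip(rg, rp))
    var_g = sum((a - mean_g) ** 2 for a in rg)
    var_p = sum((b - mean_p) ** 2 for b in rp)
    if var_g == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_g * var_p)


# Hypothetical gold and system scores in [0, 1] for four sentence pairs.
gold_scores = [0.0, 0.25, 0.5, 1.0]
system_scores = [0.1, 0.4, 0.3, 0.9]
print(spearman(gold_scores, system_scores))  # 0.8
```

Because only the ranking matters, a system does not need to reproduce the gold scores themselves: any monotonic rescaling of its outputs gives the same coefficient. If SciPy is available, `scipy.stats.spearmanr` computes the same statistic.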
Helpful Links
- Competition Website: https://codalab.lisn.upsaclay.fr/competitions/15704
- Task Website: https://semantic-textual-relatedness.github.io
- Twitter/X: https://twitter.com/SemRel2024
- Contact organisers: semrel-semeval-organisers(a)googlegroups.com
- Google group for participants: semrel-semeval-participants(a)googlegroups.com

Important Dates
- Training data ready: 11 September 2023
- Evaluation Starts: *20 January 2024*
- Evaluation End: 31 January 2024
- System Description Paper Due: 19 February 2024
- Notification of acceptance: 01 April 2024
- Camera-ready Due: 22 April 2024
- SemEval workshop: 16-21 June (co-located with NAACL 2024)

NB. We will organise a Q&A mentorship session tomorrow (January 16th 2024, from 4 to 5 pm GMT) and a system description writing tutorial in February for all participants, especially students and junior researchers. The Zoom links will be shared by email and on Slack.

References
- Shima Asaadi, Saif Mohammad, and Svetlana Kiritchenko. 2019. Big BiRD: A Large, Fine-Grained, Bigram Relatedness Dataset for Examining Semantic Composition. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
- M. A. K. Halliday and R. Hasan. 1976. Cohesion in English. London: Longman.
- George A. Miller and Walter G. Charles. 1991. Contextual Correlates of Semantic Similarity. Language and Cognitive Processes, 6(1):1–28.
- Mohamed Abdalla, Krishnapriya Vishnubhotla, and Saif Mohammad. 2023. What Makes Sentences Semantically Related? A Textual Relatedness Dataset and Empirical Study. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 782–796, Dubrovnik, Croatia. Association for Computational Linguistics.
Task Organizers
Nedjma Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, Krishnapriya Vishnubhotla, Vladimir Araujo, Meriem Beloucif, Idris Abdulmumin, Seid Muhie Yimam, Nirmal Surange, Christine De Kock, Sanchit Ahuja, Oumaima Hourrane, Manish Shrivastava, Alham Fikri Aji, Thamar Solorio, Saif M. Mohammad

CfP: BUCC, 17th Workshop on Building and Using Comparable Corpora
by Pierre Zweigenbaum 15 Jan '24

17th Workshop on Building and Using Comparable Corpora

Call for Papers

Co-located with LREC-COLING 2024
Torino, Italia, 20 May 2024

Workshop website: https://comparable.limsi.fr/bucc2024/
LREC-COLING website: https://lrec-coling-2024.org/
Workshop proceedings to be published in the ACL Anthology

MOTIVATION

In the language engineering and linguistics communities, research in comparable corpora has been motivated by two main reasons. In language engineering, on the one hand, it is chiefly motivated by the need to use comparable corpora as training data for statistical NLP applications such as statistical and neural machine translation or cross-lingual retrieval. In linguistics, on the other hand, comparable corpora are of interest because they enable cross-language discoveries and comparisons. It is generally accepted in both communities that comparable corpora consist of documents that are comparable in content and form in various degrees and dimensions across several languages. Parallel corpora are on the one end of this spectrum, unrelated corpora on the other.

Comparable corpora have been used in a range of applications, including Information Retrieval, Machine Translation, Cross-lingual text classification, etc. The linguistic definitions and observations related to comparable corpora can improve methods to mine such corpora for applications of neural NLP, for example, to extract parallel corpora from comparable corpora for neural machine translation. As such, it is of great interest to bring together builders and users of such corpora.
TOPICS

We solicit contributions on all topics related to comparable (and parallel) corpora, including but not limited to the following:

Building Comparable Corpora:
- Automatic and semi-automatic methods
- Methods to mine parallel and non-parallel corpora from the web
- Tools and criteria to evaluate the comparability of corpora
- Parallel vs non-parallel corpora, monolingual corpora
- Rare and minority languages, across language families
- Multi-media/multi-modal comparable corpora

Applications of comparable corpora:
- Human translation
- Language learning
- Cross-language information retrieval & document categorization
- Bilingual and multilingual projections
- (Unsupervised) Machine translation
- Writing assistance
- Machine learning techniques using comparable corpora

Mining from Comparable Corpora:
- Cross-language distributional semantics, word embeddings and pre-trained multilingual transformer models
- Extraction of parallel segments or paraphrases from comparable corpora
- Methods to derive parallel from non-parallel corpora (e.g. to provide for low-resource languages in neural machine translation)
- Extraction of bilingual and multilingual translations of single words, multi-word expressions, proper names, named entities, sentences, paraphrases etc. from comparable corpora
- Induction of morphological, grammatical, and translation rules from comparable corpora
- Induction of multilingual word classes from comparable corpora

Comparable Corpora in the Humanities:
- Comparing linguistic phenomena across languages in contrastive linguistics
- Analyzing properties of translated language in translation studies
- Studying language change over time in diachronic linguistics
- Assigning texts to authors via authors' corpora in forensic linguistics
- Comparing rhetorical features in discourse analysis
- Studying cultural differences in sociolinguistics
- Analyzing language universals in typological research

IMPORTANT DATES

21 Feb 2024: Paper submission deadline
24 Mar 2024: Notification of acceptance
7 Apr 2024: Camera-ready final papers
20 May 2024: Workshop date

For updates, please see the workshop website at https://comparable.limsi.fr/bucc2024/

PRACTICAL INFORMATION

The workshop is an in-person event. Workshop registration is via the main conference registration site, see https://lrec-coling-2024.org/

The workshop proceedings will be published in the ACL Anthology.

SUBMISSION GUIDELINES

Please follow the style sheet and templates (for LaTeX, Overleaf and MS-Word) provided for the main conference at https://lrec-coling-2024.org/authors-kit/

Papers should be submitted as a PDF file using the START conference manager at https://secure-web.cisco.com/1UUoVNXimK0Jzna4dQKSutgJlLRB94SkbvGnq5AUpyqLNT…

Submissions must describe original and unpublished work and range from 4 to 8 pages plus unlimited references.

Reviewing will be double blind, so the papers should not reveal the authors' identity. Accepted papers will be published in the workshop proceedings, which will be included in the ACL Anthology.

Double submission policy: Parallel submission to other meetings or publications is possible but must be immediately (i.e. as soon as known to the authors) notified to the workshop organizers by e-mail.
For further information and updates, please see the BUCC 2024 website: https://comparable.limsi.fr/bucc2024/

WORKSHOP ORGANIZERS
- Pierre Zweigenbaum (Université Paris-Saclay, CNRS, LISN, Orsay, France)
- Reinhard Rapp (University of Mainz and Magdeburg-Stendal University of Applied Sciences, Germany)
- Serge Sharoff (University of Leeds, United Kingdom)

Contact: pz (at) lisn (dot) fr

PROGRAMME COMMITTEE
- Ebrahim Ansari (Institute for Advanced Studies in Basic Sciences, Iran)
- Thierry Etchegoyhen (Vicomtech, Spain)
- Kyo Kageura (University of Tokyo, Japan)
- Natalie Kübler (Université Paris Cité, France)
- Philippe Langlais (Université de Montréal, Canada)
- Yves Lepage (Waseda University, Japan)
- Shervin Malmasi (Amazon, USA)
- Michael Mohler (Language Computer Corporation, USA)
- Emmanuel Morin (Nantes Université, France)
- Dragos Stefan Munteanu (Language Weaver, Inc., USA)
- Ted Pedersen (University of Minnesota, Duluth, USA)
- Ayla Rigouts Terryn (KU Leuven, Belgium)
- Reinhard Rapp (University of Mainz and Magdeburg-Stendal University of Applied Sciences, Germany)
- Nasredine Semmar (CEA LIST, Paris, France)
- Silvia Severini (Leonardo Labs, Italy)
- Serge Sharoff (University of Leeds, UK)
- Richard Sproat (OGI School of Science & Technology, USA)
- Tim Van de Cruys (KU Leuven, Belgium)
- Pierre Zweigenbaum (Université Paris-Saclay, CNRS, LISN, Orsay, France)

CAiSE'24: First Call for Journal First Submissions
by Announce 15 Jan '24

*** First Call for Journal First Submissions ***

36th International Conference on Advanced Information Systems Engineering (CAiSE'24)
June 3-7, 2024, 5* St. Raphael Resort and Marina, Limassol, Cyprus
https://cyprusconferences.org/caise2024/

(*** Submission Deadline: 31st March, 2024 AoE ***)

CAiSE 2024 is organising journal-first sessions as part of the scientific program. The aim of these sessions is to disseminate recent important research contributions and spark discussions between authors and researchers in the CAiSE community. Authors of selected journal articles on CAiSE-related topics will be invited to present their work at the conference.

SCOPE

For the journal-first sessions, we solicit submissions related to articles that have been accepted for publication by a reputable journal and that meet the following criteria:
• The article relates to the topics of the CAiSE conference and the recent call for papers.
• The article is an original submission to the journal and not an extension of an earlier conference or workshop paper.
• The article is an original research article; review articles or commentaries will not be considered.
• The article was accepted for publication by a journal on or after 1 January 2023, the acceptance must have been publicly announced, the article must be available at the publisher’s website (e.g., as "articles in advance" or published on a journal’s website), and the article must be written in English.
• The article has not been presented at, and is not under consideration for, journal-first tracks of other conferences.

FORMAT

Accepted submissions will be presented as part of the CAiSE 2024 scientific programme.

SUBMISSION

Submissions must be made electronically via Easychair (https://easychair.org/my/conference?conf=caise2024) and include:
• Title and author information of the article.
• The original abstract and keywords.
• DOI of the original publication or, alternatively, a link to the publication at the journal’s website.
EVALUATION

All submissions will be reviewed by the track chairs with the aim to accept all qualifying submissions subject to ability to accommodate them in the program. If needed, priority will be given to submissions according to their topical fit with the scope of the conference, the importance of the contribution, as well as the standing of the respective journal (including, but not limited to, the journal's impact factor and ranking results).

ATTENDANCE AND PRESENTATION

At least one author of each submission accepted for the journal-first track must register and attend the conference to present the work. The author needs a full registration to present the journal article. As the articles of the journal-first track have been published already, they will not be part of the CAiSE 2024 proceedings. The articles will be listed in the conference program and CAiSE 2024 participants will have access to the respective abstracts and a pointer to the original journal article.

IMPORTANT DATES
• Submission: 31st March, 2024 (AoE)
• Notification of Acceptance: 14th April, 2024
• Author Registration: 17th May, 2024
• Conference Dates: 3rd-7th June, 2024

JOURNAL FIRST CHAIRS
• Paolo Giorgini, University of Trento, Italy
• Jeffrey Parsons, Memorial University of Newfoundland, Canada

Postdoc Research Fellows in the field of AI/Natural Language Processing, UKP Lab Darmstadt, Germany
by Niemann, Elisabeth 15 Jan '24

The UKP Lab at the Department of Computer Science, Technical University Darmstadt, Germany, is hiring several *** Postdoc Research Fellows in the field of AI/Natural Language Processing. *** Areas of work include Conversational AI, Multimodal fact-checking, Interactive Code Generation, NLP for mental health and privacy-aware NLP. It is also possible to propose a topic bottom-up. https://www.informatik.tu-darmstadt.de/ukp/ukp_home/jobs_ukp/2023_postdoc_u… Join our internationally recognized team at TU Darmstadt, enjoy diverse opportunities for professional development, and conduct cutting-edge research! Application deadline: January 30th, 2024. Please submit your application via the following form: https://careers.ukp.informatik.tu-darmstadt.de/ukprecruitment -------------------------------------------------------------------- Prof. Dr. Iryna Gurevych UKP Lab Technical University Darmstadt, Germany http://www.ukp.tu-darmstadt.de/

EvaLatin 2024: dependency parsing and emotion polarity detection shared tasks
by Passarotti Marco Carlo (marco.passarotti) 15 Jan '24

EvaLatin, now in its third edition, is the campaign devoted to the evaluation of NLP tools for Latin. This year we invite all those interested in parsing and sentiment analysis to take up the challenge of working on Latin by participating in the following tasks:
- dependency parsing;
- emotion polarity detection.

Test sets for both tasks will be released on the EvaLatin 2024 web page in the first half of February. Check all the important dates here: https://circse.github.io/LT4HALA/2024/EvaLatin

EvaLatin 2024 is organized as part of the "Workshop on Language Technologies for Historical and Ancient Languages" (LT4HALA), which will be held in Turin on May 25, 2024 in the context of the LREC-COLING 2024 conference.

Prof. Marco C. Passarotti
Computational Linguistics
Index Thomisticus Treebank https://itreebank.marginalia.it/
ERC Grantee, P.I. LiLa https://lila-erc.eu/ (Grant Agreement No. 769994)
CIRCSE Research Centre https://centridiricerca.unicatt.it/circse_index.html
Università Cattolica del Sacro Cuore
Largo Gemelli, 1
20123 Milan, Italy
marco.passarotti(a)unicatt.it
tel. +39-02-72342380

Primary text for NAIST Coref / Kyoto Corpora
by Christian Chiarcos 15 Jan '24

Dear list members (esp. those in the Japanese community),

for a cross-linguistic evaluation of co-reference annotations, I was interested in looking into the NAIST Coreference Corpus, which is based on the Kyoto Corpus. Luckily, both annotations are available, but not the primary text. According to the documentation of both corpora, it is necessary to acquire the Mainichi Shimbun CD-ROM (1995) first. I really tried my best, and I followed several catalogues (incl. https://www.jaist.ac.jp/project/NLP_Portal/doc/LR/lr-cat-e.html#jp:mainichi…), but the URL it points to (https://www.nichigai.co.jp/sales/mainichi/mainichi-data.html) isn't operational any more.

Does anyone know where and how to buy that CD-ROM? Is there another way to get access to that data?

Thanks a lot,
Christian

First CFP: CNLP4DH Call for Papers
by Nicolas Nicolas 15 Jan '24

The Journal of Data Mining and Digital Humanities (JDMDH) is issuing a call for papers on the topic Chinese Natural Language Processing for Digital Humanities (CNLP4DH).

As a reminder, JDMDH is an international journal managed by French national research institutions, with green open access (no charge for readers or authors).

This special issue is dedicated to natural language processing for digital humanities involving documents written in Chinese, including Modern, Ancient and dialectal Chinese. Work on Mandarin, the official national language and main common language, is accepted, and research on texts written in other languages, such as those of Tibet and Inner Mongolia, is also welcome.

Suitable topics include, but are not limited to:
- Text analysis and processing related to humanities using computational methods
- Dataset creation and curation for NLP (e.g. digitization, datafication, and data preservation)
- Research on cultural heritage collections such as national archives and libraries using NLP
- NLP for error detection, correction, normalization and denoising data
- Generation and analysis of literary works such as poetry and novels
- Analysis and detection of text genres
- Word segmentation, part-of-speech tagging of Ancient Chinese
- Large Language Models (LLM) for Chinese in Digital Humanities
- Cross modal Models (text-speech-video-image) for Chinese in Digital Humanities
- Visualization of text analytics
- Ontology models for natural language text
- Applications in Chinese Literature, Traditional Chinese medicine, Learning Chinese language as second language, Sentiment Analysis in Chinese Social Media, China Cultural Heritage, Chinese History, Ancient Chinese language

Website and more details: https://jdmdh.episciences.org/page/chinese-natural-language-processing-for-…
Submission guidelines: https://jdmdh.episciences.org/page/submissions
Paper submission: https://jdmdh.episciences.org/submit

Guest Editors:
Dr. Wenhe FENG (Guangdong University of Foreign Studies, Laboratory of Language Engineering and Computing)
Dr. Bin LI (Nanjing Normal University, School of Chinese Language and Literature, Center of Linguistic Big Data and Computational Humanities)
Dr. Nicolas TURENNE (Guangdong University of Foreign Studies, School of Information Science and Technology)
Dr. Tong WEI (Beijing University, Digital Humanities Center)

[2nd CfP]: Computational Approaches to Language Data Pseudonymization @ EACL 2024
by Elena Volodina 14 Jan '24

Second Call for papers: CALD-pseudo workshop on Computational Approaches to Language Data Pseudonymization @ EACL 2024, March 21 or 22, 2024

Website: https://mormor-karl.github.io/events/CALD-pseudo/
Submission website: https://softconf.com/eacl2024/CALD-pseudo-2024/
Submission Deadline: Monday, 18 December 2023

We invite submissions to the first edition of the CALD-pseudo workshop on Computational Approaches to Language Data Pseudonymization, to be held at EACL 2024 on March 21 or 22, 2024.

[Important Dates]
* December 18, 2023: paper submission deadline
* January 17, 2024: resubmission of already pre-reviewed ARR papers
* January 20, 2024: notification of acceptance
* January 30, 2024: camera-ready papers due
* March 21 or 22, 2024: workshop date (the date to be confirmed by the EACL)

[Introduction]
Accessibility of research data is critical for advances in many research fields, but textual data often cannot be shared due to the personal and sensitive information which it contains, e.g. names, political opinions, sensitive personal information and medical data. The General Data Protection Regulation (GDPR; EU Commission, 2016) suggests pseudonymization as a solution to secure open access to research data, but we need to learn more about pseudonymization as an approach before adopting it for manipulation of research data (Volodina et al., 2023). The main challenge is how to effectively pseudonymize data so that individuals cannot be identified, while at the same time keeping the data usable for the research for which it was collected, in fields including computational linguistics, linguistics and natural language processing.
[Topics of Interest]
CALD-pseudo workshop invites a broad community of researchers in all concerned cross-disciplinary fields to jointly discuss challenges within pseudonymization, such as
* automatic approaches to detection and labelling of personal information in unstructured language data, including events and other context-dependent cues revealing a person;
* developing context-sensitive algorithms for replacement of personal information in unstructured data;
* studies into the effects of pseudonymization on unstructured data, e.g. applicability of pseudonymised data for the intended research questions, readability of pseudonymised data or addition of unwelcome biases through pseudonymization;
* effectiveness of pseudonymization as a way of protecting writer identity;
* reidentification studies; e.g. adversarial learning techniques that attempt to breach the privacy protections of pseudonymized data;
* constructing datasets for automatic pseudonymization, including methodological and ethical aspects of those;
* approaches to the evaluation of automatic pseudonymization both in concealing the private information and preserving the semantics of the non-personal data;
* pseudonymization tools and software: evaluating the available tools and software for pseudonymization in different languages, and their ease of use, scalability, and performance;
* and numerous other open questions.
[Submission Guidelines]
Authors are invited to submit by December 18, 2023 original and unpublished research papers in the following categories:
* Full papers (up to 8 pages) for substantial contributions
* Short papers (up to 4 pages) for ongoing or preliminary work

All submissions must be in PDF format, must follow the EACL 2024 guidelines described in the ARR CfP (https://aclrollingreview.org/cfp), and use the official ACL style templates available here: https://github.com/acl-org/acl-style-files

Direct submission deadline: December 18, 2023 at https://softconf.com/eacl2024/CALD-pseudo-2024/
Deadline for registration of ARR reviewed papers: January 17, 2024. (Further instructions will follow.)

We also invite authors of papers on the topics of the workshop accepted to Findings to reach out to the organizing committee of CALD-pseudo to present them at the workshop.

[Invited speakers]
We are happy to announce that the workshop will host two invited speakers:
* Anders Søgaard, University of Copenhagen, Denmark
* Ildikó Pilán, the Norwegian Computing Center, Norway

[Workshop Organizers]
* Elena Volodina, University of Gothenburg, Sweden
* Therese Lindström Tiedemann, University of Helsinki, Finland
* Simon Dobnik, University of Gothenburg, Sweden
* Xuan-Son Vu, Umeå university, Sweden

[Program Committee]
A list of program committee members is available on the workshop website.

[Contact]
For inquiries, please contact mormor.karl(a)svenska.gu.se

ACL link to the call: https://www.aclweb.org/portal/content/computational-approaches-language-dat…

___________________
Elena Volodina, PhD, Docent
https://spraakbanken.gu.se/en/about/staff/elena

Life is like a mirror. Smile at it and it smiles back at you.
Peace Pilgrim


1st CFP: Bridging Neurons and Symbols for NLP & Knowledge Graphs Reasoning @ LREC-COLING 2024
by Erhard Hinrichs 14 Jan '24


[apologies for potential cross-posting]

==================================================
Bridging Neurons and Symbols for Natural Language Processing and Knowledge Graphs Reasoning @ LREC-COLING 2024
==================================================
Co-located with LREC-COLING in Turin, Italy
21st May 2024
Workshop webpage: https://neusymbridge.github.io/

Call for Papers
--------------------
The 1st Workshop on Bridging Neurons and Symbols for Natural Language Processing and Knowledge Graphs Reasoning, to be held at LREC-COLING 2024, will promote two directions for exploring neural reasoning: starting from existing neural networks to enhance the reasoning performance with the target of symbolic-level reasoning, and starting from symbolic reasoning to explore its novel neural implementation. These two directions will ideally meet somewhere in the middle and will lead to representations that can act as a bridge for novel neural computing, which qualitatively differs from traditional neural networks, and for novel symbolic computing, which inherits the good features of neural computing. Hence the name of our workshop, with a focus on Natural Language Processing and Knowledge Graph reasoning.

Topics (include, but are not limited to)
--------------------------------------------------
• Proposing novel knowledge representations that are derived from transdisciplinary research
• Using knowledge graphs or other types of symbolic knowledge to improve the quality of LLMs
• Exploring the reasoning mechanism of LLMs
• Distilling symbolic knowledge from LLMs
• Proposing benchmark datasets and evaluation metrics for neuro-symbolic approaches to NLP tasks
• Proposing novel NLP tasks for neuro-symbolic approaches
• NLP applications in classification, sense-disambiguation, sentiment analysis, question-answering, knowledge graph reasoning
• Critical analysis of traditional deep learning or LLMs
• Analysing spatial reasoning of LLMs
• Proposing novel neural computing that may reach symbolic-level reasoning
• Proposing benchmark datasets and metrics to evaluate the gap between neural reasoning and symbolic reasoning
• Addressing efficiency issues in neuro-symbolic systems
• Identifying challenges and opportunities of neuro-symbolic systems
• Developing retrieval augmented models for combining KG and LLMs
• Applying neuro-symbolic approaches to humor generation and other real-life applications

Submissions
------------------
• The papers should be submitted as a PDF document, conforming to the formatting guidelines provided in the call for papers of the LREC-COLING conference (https://lrec-coling-2024.org/authors-kit/)
• Submissions via Softconf/START Conference Manager at https://softconf.com/lrec-coling2024/neusymbridge2024/

Important Dates
---------------------
• Submission Deadline: Mar 3rd
• Notification of Acceptance: April 10th
• Camera Ready Deadline: Apr 21st
• Workshop: May 21st

Keynotes
--------------------------------
• Pascale Fung - The Hong Kong University of Science and Technology
• Alessandro Lenci - Università di Pisa
• Juanzi Li - Tsinghua University
• Volker Tresp - Ludwig Maximilian University of Munich

Organisation Committee
--------------------------------
• Tiansi Dong - Fraunhofer IAIS
• Erhard Hinrichs - University of Tübingen
• Zhen Han - Amazon Inc.
• Kang Liu - Chinese Academy of Sciences
• Yangqiu Song - The Hong Kong University of Science and Technology
• Yixin Cao - Singapore Management University
• Christian F. Hempelmann - Texas A&M-Commerce
• Rafet Sifa - University of Bonn

Programme Committee
-------------------------------
• Claire Bonial - U.S. Army DEVCOM Army Research Laboratory
• Meiqi Chen - Peking University
• Shuo Chen - Ludwig Maximilian University of Munich
• Hejie Cui - Emory University
• Xinyu Dai - Nanjing University
• Zifeng Ding - Ludwig Maximilian University of Munich
• Kathrin Erk - The University of Texas at Austin
• Irlan G Gonzalez - Bosch Center for Artificial Intelligence
• Shizhu He - Institute of Automation, Chinese Academy of Sciences
• Bailan He - Ludwig Maximilian University of Munich
• Jens U. Kreber - Saarland University
• Sandra Kübler - Indiana University
• Hang Li - Ludwig Maximilian University of Munich
• Honglei Li - Northumbria University
• Yong Liu - Plunk
• Xinze Liu - Nanyang Technological University
• Xin Liu - Amazon Inc.
• Tong Liu - Ludwig Maximilian University of Munich
• Yunfei Long - Essex University
• Yubo Ma - Nanyang Technological University
• Emanuele Marconato - University of Trento
• Petra Osenova - University of Sofia
• Parth Padalkar - University of Texas at Dallas
• Martha Palmer - University of Colorado
• Barbara Plank - Ludwig Maximilian University of Munich
• Julia Rayz - Purdue University
• Ryan Riegel - IBM Research
• Timo Schick - Meta AI
• Christoph Schommer - University of Luxembourg
• Wangtao Sun - Institute of Automation, Chinese Academy of Sciences
• Xun Wang - Microsoft Corporation
• Jingpei Wu - Ludwig Maximilian University of Munich
• Kai Xiong - Harare Institute of Technology
• Yuan Yang - Georgia Institute of Technology
• Michihiro Yasunaga - Stanford University
• Jiahao Ying - Singapore Management University
• Ziqian Zeng - South China University of Technology
• Hongming Zhang - Tencent AI Lab, Seattle
• Gengyuan Zhang - Ludwig Maximilian University of Munich
==================================================

