August 2022 - Corpora

August 2022 Newsletter - LDC
by Penn LDC 18 Aug '22

In this newsletter:
Fall 2022 LDC Data Scholarship Program
30th Anniversary Highlight: The LDC Gigawords
________________________________
New publication:
HAVIC MED Novel 2 Test - Videos, Metadata and Annotation<https://catalog.ldc.upenn.edu/LDC2022V02>

Fall 2022 LDC Data Scholarship Program

Student applications for the Fall 2022 LDC Data Scholarship program are being accepted now through September 15, 2022. This program provides eligible students with no-cost access to LDC data. Students must complete an application consisting of a data use proposal and letter of support from their advisor. For application requirements and program rules, visit the LDC Data Scholarships page<https://www.ldc.upenn.edu/language-resources/data/data-scholarships>.

30th Anniversary Highlight: The LDC Gigawords

Giga: a combining form meaning "billion," used in the formation of compound words (Source: https://www.dictionary.com/browse/giga-)

LDC's Gigaword corpora are a natural outgrowth of its vast decades-long multi-language newswire collection. Newswire data was originally collected, annotated, and distributed for use in many sponsored projects and was also released through the LDC catalog in tailored data sets. Then came the idea of making LDC's entire newswire collection available by language with a simple, minimal markup to support a broad range of NLP/HLT tasks.

The first Arabic<https://catalog.ldc.upenn.edu/LDC2011T11>, Chinese<https://catalog.ldc.upenn.edu/LDC2011T13>, and English<https://catalog.ldc.upenn.edu/LDC2011T07> Gigaword editions were released in 2003; subsequent cumulative releases through fifth editions in 2011 represent LDC's newswire collection spanning 1994-2010 in those languages. French<https://catalog.ldc.upenn.edu/LDC2011T10> and Spanish<https://catalog.ldc.upenn.edu/LDC2011T12> Gigawords were first published in 2006, culminating in the release of third editions in 2011, likewise covering newswire collected by LDC through 2010.

The community has used, and continues to use, these data sets in numerous ways. Automatic text summarization is a favorite, and current work in this area applies deep learning principles (see, e.g., Gao et al. 2020<https://link.springer.com/article/10.1007/s00521-018-3946-7>, English). Gigawords are also useful for text source classification (Huang et al. 2003<https://aclanthology.org/Y08-1042.pdf>, Chinese), information extraction (Lan et al. 2020<https://arxiv.org/pdf/2004.14519.pdf>, Arabic), knowledge extraction and distributional semantics (Napoles et al. 2012<https://aclanthology.org/W12-3018.pdf>, English), and natural language understanding (Ganitkevitch 2013<https://www.cs.jhu.edu/~juri/pdf/proposal-naacl-2013-srw.pdf>, English), among other fields. Recent variations like the annotated<https://catalog.ldc.upenn.edu/LDC2012T21> and concretely annotated<https://catalog.ldc.upenn.edu/LDC2018T20> English Gigawords add syntactic, semantic, and coreference annotations to this billion word text collection.

All Gigaword corpora are available for licensing by Consortium members and non-members. Visit Obtaining Data <https://www.ldc.upenn.edu/language-resources/data/obtaining> for more information.
________________________________
New publication:

HAVIC MED Novel 2 Test - Videos, Metadata and Annotation<https://catalog.ldc.upenn.edu/LDC2022V02> is comprised of 6,200 hours of user-generated videos with annotation and metadata developed by LDC for the 2015 NIST Multimedia Event Detection tasks.

The data consists of videos of various events (event videos) and videos completely unrelated to events (background videos). Each event video was manually annotated with judgments describing its event properties and other salient features. Background videos were labeled with topic and genre categories.

HAVIC MED Novel 2 Test -- Videos, Metadata and Annotation is distributed via web download.

2022 Subscription Members will automatically receive copies of this corpus. 2022 Standard Members may request a copy as part of their 16 free membership corpora. This corpus is a members-only release and is not available for non-member licensing. Contact ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu> for information about membership.

Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104

Re: Corpora Digest, Vol 219, Issue 1
by Rocco Tripodi 18 Aug '22

On Fri, 12 Aug 2022 at 14:00, <corpora-request(a)list.elra.info> wrote:

> Send Corpora mailing list submissions to
> corpora(a)list.elra.info
>
> To subscribe or unsubscribe via email, send a message with subject or
> body 'help' to
> corpora-request(a)list.elra.info
>
> You can reach the person managing the list at
> corpora-owner(a)list.elra.info
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Corpora digest..."
>
> Today's Topics:
>
> 1. [CfP] TREC Health Misinformation Track 2022 (Maria Maistro)
> 2. [CfP] ACM TOIS Efficiency in Neural IR (Maria Maistro)
> 3. Call for Badges - ACM SIGIR Artifact Badges Continuous Submission
> (Nicola Ferro)
> 4. Call for proposals: Natural Language Processing (John Benjamin’s)
> (Caro)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 12 Aug 2022 08:03:18 +0000
> From: Maria Maistro <mm(a)di.ku.dk>
> Subject: [Corpora-List] [CfP] TREC Health Misinformation Track 2022
> To: "corpora(a)list.elra.info" <corpora(a)list.elra.info>
> Message-ID: <86B5F708-9063-456A-B790-888B9639E00F(a)ku.dk>
> Content-Type: multipart/alternative;
> boundary="_000_86B5F7089063456AB790888B9639E00Fkudk_"
>
> Call for Participation - TREC Health Misinformation Track 2022
> https://trec-health-misinfo.github.io
>
> Overview 🧐
> --------------------------
> Web search engines are frequently used to help people make decisions about
> health-related issues. Unfortunately, the web is filled with misinformation
> regarding the efficacy of treatments for health issues. Search users may
> not be able to discern correct from incorrect information, nor credible
> from non-credible sources. As a result of finding misinformation deemed by
> the user to be useful to their decision making task, they can make
> incorrect decisions that waste money and put their health at risk.
>
> The TREC Health Misinformation track fosters research on retrieval methods
> that promote reliable and correct information over misinformation for
> health-related decision making tasks.
>
> Tasks 💼
> --------------------------
> * Ad-hoc Retrieval Task: design a ranking model that promotes credible and
> correct information over incorrect information;
> * Answer Prediction Task: predict the answer to the topic’s stance.
>
> Guidelines 📋
> --------------------------
> * Corpus: noclean version of the C4 dataset (
> https://huggingface.co/datasets/allenai/c4);
> * Topics: about consumer health search (people seeking health advice
> online);
> * Runs: runs may be either automatic or manual with the standard TREC run
> format.
>
> Detailed guidelines: https://trec-health-misinfo.github.io
>
> Important Dates 🔥
> --------------------------
> * Runs due from participants: August 28, 2022
> * Evaluation results returned: End of September 2022
> * Notebook paper due: October 2022
> * TREC 2022 Conference: November 14-18, 2022
> * Final paper due: February 2023
>
> Organization 👔
> --------------------------
> * Charles Clarke, University of Waterloo
> * Maria Maistro, University of Copenhagen
> * Mark Smucker, University of Waterloo
>
>
> ---
>
> Maria Maistro, PhD
> Tenure-track Assistant Professor
> Department of Computer Science
> University of Copenhagen
> Universitetsparken 5, 2100 Copenhagen, Denmark
>

Final CFP: The 7th Arabic Natural Language Processing Workshop, WANLP-7 2022, / Co-located with EMNLP 2022
by Wajdi Zaghouani 18 Aug '22

*** Apologies for Cross-Posting ***

The 7th Arabic Natural Language Processing Workshop (WANLP2022) will be a full-day event taking place on December 8, 2022 (in a hybrid mode). This year’s WANLP is co-located with EMNLP 2022 in Abu Dhabi, United Arab Emirates.

Workshop URL: http://wanlp2022.arabic-nlp.net/
Submission URL: https://softconf.com/emnlp2022/WANLP2022

Important Dates

- September 5: Workshop Paper Due Date
- October 10: Notification of Acceptance
- October 21: Camera-ready papers due (strict!)
- December 7-8: Workshop Dates

We invite submissions on topics that include, but are not limited to, the following:

- Enabling core technologies: morphological analysis, disambiguation, tokenization, POS tagging, named entity detection, chunking, parsing, semantic role labeling, sentiment analysis, Arabic dialect modeling, etc.
- Applications: machine translation, speech recognition, speech synthesis, optical character recognition, pedagogy, assistive technologies, social media, etc.
- Resources: dictionaries, annotated data, corpus, etc.

Submissions may include work in progress as well as finished work. Submissions must have a clear focus on specific issues pertaining to the Arabic language whether it is standard Arabic, dialectal, classical, or mixed. Papers on other languages sharing problems faced by Arabic NLP researchers, such as Semitic languages or languages using Arabic script, are welcome provided that they propose techniques or approaches that would be of interest to Arabic NLP, and they explain why this is the case. Additionally, papers on efforts using Arabic resources but targeting other languages are also welcome. Descriptions of commercial systems are welcome, but authors should be willing to discuss the details of their work.

We have several submission tracks including long, short, and demo tracks.

If you have any questions, please contact us at: wanlp2022(a)gmail.com

The WANLP 2022 Organizing Committee
http://wanlp2022.arabic-nlp.net/

----
*Wajdi Zaghouani, Ph.D.*
*Assistant Professor*
College of Humanities and Social Sciences
P.O. Box 34110 | Education City | Doha, Qatar
tel: +974 4454 5601 | mob: +974 33454992
wzaghouani(a)hbku.edu.qa | Office A141, LAS Building

Call for applications: 1 year MRes in Translation and Interpreting Studies at University of Surrey
by Constantin Orasan 18 Aug '22

The Centre for Translation Studies (CTS) at University of Surrey invites applications for a place in our MRes in Translation and Interpreting Studies course. Students attending this course get in-depth, systematic research training in translation and interpreting, and customised preparation for a PhD and an academic career.

This unique and innovative course is the first of its kind in the UK and draws on the research areas CTS is well known for: translation and interpreting technologies, translation process research, translation as intercultural mediation, corpus-based translation, audiovisual translation and multimodality studies. CTS has more recently embarked on exciting, fast-developing areas, including machine translation, Natural Language Processing for translation/interpreting and hybrid workflows in translation/interpreting. The research we carry out at CTS is in touch with recent technological and social developments, as we maintain a strong focus on the responsible integration of technologies in workflows where multilingual and multimodal mediation is key.

By studying with us, you'll join our internationally recognised Centre for Translation Studies, thus benefiting from a combination of leading research expertise and professional relevance and honing skills you will need in order to thrive in academia or in the industry.

As an MRes student, you will take two compulsory taught modules and select two optional modules (60 credits). You will then complete your degree with an MRes in Translation and Interpreting Studies Dissertation (120 credits). The dissertation, which is longer than a typical MA dissertation, will enable you to research a topic in greater depth than is the case in a conventional MA project format. This year, we invite in particular students interested in pursuing dissertation topics related to machine translation, corpora in translation and interpreting, and the use of NLP for translation and interpreting.

For further inspiration, take a look at what our current students say about the course and their MA projects: https://www.surrey.ac.uk/student-life/what-our-students-say/zeynep-polat-po…

And for more details about the programme or how to apply visit: https://www.surrey.ac.uk/postgraduate/translation-and-interpreting-studies-…

If you feel that an MRes is not for you, you can check our other postgraduate courses on topics related to translation and interpreting at: https://www.surrey.ac.uk/centre-translation-studies/study/postgraduate-cour…

---
Prof Constantin Orăsan
Professor of Language and Translation Technologies
Centre for Translation Studies | School of Literature and Languages
Personal page: https://www.surrey.ac.uk/people/constantin-orasan
Office: 06LC03, Phone: +44 (0) 1483 68 4115
Library and Learning Centre, University of Surrey, Guildford, Surrey, GU2 7XH, UK

Europhras’2022 / extended early bird registration until 9 September 2022
by Amal EL FARHMAT 18 Aug '22

*Europhras’2022*
International Conference ‘*Computational and Corpus-based Phraseology’*
Malaga, 28-30 September 2022

The forthcoming international conference ‘Computational and Corpus-based Phraseology’ (Europhras 2022) will take place in Malaga on 28, 29 and 30 September 2022.

We are delighted to announce the new website of the conference: https://europhras.com/2022/

*Conference topics*

The conference will focus on interdisciplinary approaches to phraseology and invites submissions on a wide range of topics, covering, but not limited to: computational, corpus-based, psycholinguistic and cognitive approaches to the study of phraseology, and practical applications in computational linguistics, translation, lexicography and language learning, teaching and assessment.

These topics include the following:

*Computational approaches to the study of multiword expressions*, e.g. automatic detection, classification and extraction of multiword expressions; automatic translation of multiword expressions; computational treatment of proper names; multiword expressions in NLP tasks and applications such as parsing, machine translation, text summarisation, term extraction, web search;

*Corpus-based approaches to phraseology*, e.g. corpus-based empirical studies of phraseology, task-orientated typologies of phraseological units (e.g. for annotation, lexicographic representation, etc.), annotation schemes, applications in applied linguistics and more specifically translation, interpreting, lexicography, terminology, language learning, teaching and assessment (see also below);

*Phraseology in mono- and bilingual lexicography and terminography*, e.g. new forms of presenting phraseological units in dictionaries and other lexical resources based on corpus-based and corpus-driven approaches; domain-specific terminology;

*Phraseology in translation and cross-linguistic studies*, e.g. use of parallel and comparable corpora for translating phraseological units; phraseological units in computer-aided translation; study of phraseology across languages;

*Phraseology in specialised languages and language dialects*, e.g. phraseology of specialised languages, study of phraseological use in different dialects or varieties of a specific language;

*Phraseology in language learning, teaching and assessment*, e.g. second language/bilingual processing of phraseological units and formulaic language; phraseological units in learner language;

*Theoretical and descriptive approaches to phraseology*, e.g. phraseological units and the lexis-grammar interface, the relevance of phraseology for theoretical models of grammar, the representation of phraseological units in constituency and dependency theories, phraseology and its interaction with semantics;

*Cognitive and psycholinguistic approaches*, e.g. cognitive models of phraseological unit comprehension and production; on-line measures of phraseological unit processing (e.g. eye tracking, event-related potentials, self-paced reading); phraseology and language disorders; phraseology and text readability.

The above list is indicative and not exhaustive. Any submission presenting a study related to the alternative terms of phraseological units, multiword expressions, multiword units, formulaic language or polylexical expressions will be considered.

The Springer volume and the e-proceedings will both be available at the conference. In addition, a call for follow-up papers will be announced after the conference, and the accepted papers reporting these new studies will be published as a peer-reviewed and/or indexed volume (in English). A collection of papers in Spanish will be published in an indexed journal (2023).
*Schedule*

28-30 September 2022 - conference takes place in Malaga

*Keynote Speakers*

Jean-Pierre Colson, Université Catholique de Louvain
Miloš Jakubíček, Lexical Computing
María del Carmen Mellado Blanco, University of Santiago de Compostela
Aline Villavicencio, Federal University of Rio Grande do Sul and University of Essex

Conference and Programme Committee Co-Chairs

Gloria Corpas Pastor, University of Malaga
Ruslan Mitkov, University of Wolverhampton

Programme committee

Margarita María Alonso Ramos, University of A Coruña
M. Belén Alvarado Ortega, University of Alicante
Verginica Barbu Mititelu, Romanian Academy
Ignacio Bosque, Complutense University of Madrid
María Luisa Carrió-Pastor, Polytechnic University of Valencia
Anna Čermáková, University of Cambridge
Parthena Charalampidou, Aristotle University of Thessaloniki
Ken Church, Baidu
Jean-Pierre Colson, Université Catholique de Louvain
Dmitrij Dobrovolskij, Russian Language Institute
Peter Ďurčo, University of St. Cyril and Methodius
Natalia Filatkina, University of Hamburg
Elizaveta Goncharova, National Research University, Artificial Intelligence Research Institute (AIRI)
María Isabel González Rey, University of Santiago de Compostela
Stefan Gries, University of California
Enrique Gutiérrez Rubio, Palacký University Olomouc
Kleanthes K. Grohmann, University of Cyprus
Amal Haddad Haddad, University of Granada
Miloš Jakubíček, Sketch Engine
Eva Lucía Jiménez-Navarro, University of Cordoba
Cvetana Krstev, University of Belgrade
Natalie Kübler, Université Paris Cité
Maria Kunilovskaya, University of Wolverhampton
Ljubica Leone, Lancaster University
Óscar Loureda Lamas, Heidelberg University
Elvira Manero Richard, University of Murcia
Ramón Martí Solano, University of Limoges
María del Carmen Mellado Blanco, University of Santiago de Compostela
Flor Mena Martínez, University of Murcia
Pedro Mogorrón Huerta, University of Alicante
Johanna Monti, “L’Orientale” University of Naples
Esteban Tomás Montoro del Arco, University of Granada
Inés Olza Moreno, University of Navarra
Adriane Orenha Ottaiano, São Paulo State University
Antonio Pamies Bertrán, University of Granada
Rozane Rebechi, Federal University of Rio Grande do Sul
Mª Ángeles Recio Ariza, University of Salamanca
Ute Römer, Georgia State University
Leonor Ruiz Gurillo, University of Alicante
Kathrin Steyer, University of Mannheim
Joanna Szerszunowicz, University of Bialystok
Yukio Tono, Tokyo University of Foreign Studies
Agnès Tutin, University of Grenoble Alpes
Tom Wasow, Stanford University
Eric Wehrli, University of Geneva
Stefanie Wulff, University of Florida
Aline Villavicencio, Federal University of Rio Grande do Sul and University of Sheffield
Michael Zock, Laboratoire d’Informatique Fondamentale de Marseille

*Organisation and sponsors*

The forthcoming international conference ‘Computational and Corpus-based Phraseology’ is jointly organised by the University of Malaga (Research Group in Lexicography and Translation), the University of Wolverhampton (Research Group in Computational Linguistics) and the Association for Computational Linguistics - Bulgaria. The Sketch Engine is the official sponsor of the conference.
*Accompanying events*

The 5th edition of the Workshop on Multiword Units in Machine Translation and Translation Technology (MUMTTT 2022) will take place as part of Europhras 2022. In addition, as part of Europhras 2022 a Sketch Engine tutorial will be given by Miloš Jakubíček, CEO, Lexical Computing.

*Further information and contact details*

Registration for EUROPHRAS 2022 is now open. To register, please complete the *registration form* <https://url6b.mailanyone.net/v1/?m=1nkNMi-0001pg-3h&i=57e1b682&c=wgKlKznP1z…>.

*** The early bird registration has been extended until 9 September 2022 ***

The conference website (https://europhras.com/2022/) will be updated on a regular basis. For further information, please email europhras2022(a)gmail.com

Best regards,
EUROPHRAS 2022 Organising Committee

Invitation of paper submissions to special issue “Mathematical and Computational Modeling of Language and Social Behaviors” in Mathematics (IF=2.592, Q1)
by WAN, Mingyu [CBS] 18 Aug '22

Dear All,

We are the guest editors of the special issue “Mathematical and Computational Modeling of Language and Social Behaviors” in Mathematics<https://www.mdpi.com/journal/mathematics> (IF=2.592, Q1). We would like to invite papers for this special issue from researchers whose interests include computational linguistics and related areas.

Deadline for manuscript submissions: 30 June 2023.

The aim of the special issue is to highlight the contributions of quantitative modeling and NLP technology to understanding collective human behaviors and to help resolve some of the greatest challenges of our time. We welcome new or improved methods to model linked data from heterogeneous sources and their computational application to solve real-world problems relating to languages and social behaviors. Topics of interest include sentiment and/or emotion analysis, fake news detection, FinNLP and medical informatics.

Details about the special issue are available at: https://www.mdpi.com/si/mathematics/Mathe_Compu_NLP

We look forward to your submissions and contribution to this special issue. Thank you very much!

Best,
Clara (on behalf of the Guest Editors)

Deadline extension: #SMM4H'22, Social Media Mining for Health Applications - Workshop at COLING 2022
by Davy Weissenbacher 18 Aug '22

Due to multiple requests, we are *extending* the deadline to August 20, 2022

=============================
Last call for papers: submission deadline August 20, 2022
=============================

*Apologies if you received multiple copies of this CFP*

Location: Gyeongju, Republic of Korea
Workshop Date: October 16-17, 2022
Workshop link: https://healthlanguageprocessing.org/smm4h-2022/
Submission link: https://www.softconf.com/coling2022/7thSMM4H/

The workshop will include two components: a standard workshop and a shared task.

Workshop

The Social Media Mining for Health Applications (#SMM4H) workshop serves as a venue for bringing together researchers interested in automatic methods for the collection, extraction, representation, analysis, and validation of social media data (e.g., Twitter, Reddit, Facebook) for health informatics. The 7th #SMM4H Workshop, co-located with COLING 2022 (https://coling2022.org/index), invites 4-page paper submissions (unlimited references, in standard COLING format) on original, unpublished research in all aspects at the intersection of social media mining and health.

Topics of interest include, but are not limited to:

Methods for the automatic detection and extraction of health-related concept mentions in social media
Mapping of health-related mentions in social media to standardized vocabularies
Deriving health-related trends from social media
Information retrieval methods for obtaining relevant social media data
Geographic or demographic data inference from social media discourse
Virus spread monitoring using social media
Mining health-related discussions in social media
Drug abuse and alcoholism incidence monitoring through social media
Disease incidence studies using social media
Sentinel event detection using social media
Semantic methods in social media analysis
Classifying health-related messages in social media
Automatic analysis of social media messages for disease surveillance and patient education
Methods for validation of social media-derived hypotheses and datasets

Shared task

This year the workshop organizers are hosting 10 shared tasks, i.e. NLP challenges, as part of the workshop. Participating teams will be provided with a set of annotated posts for developing systems, followed by a three-day window during which they will run their systems on unlabeled test data and upload the output to Codalab for evaluation. For additional details about the tasks and information about registration, data access, paper submissions, and presentations, go to https://healthlanguageprocessing.org/smm4h-2022/

Task 1 – Classification, detection, and normalization of Adverse Events (AE) mentions in tweets (in English)
Task 2 – Classification of stance and premise in tweets about health mandates related to COVID-19 (in English)
Task 3 – Classification of changes in medication treatments in tweets and WebMD reviews (in English)
Task 4 – Classification of tweets self-reporting exact age (in English)
Task 5 – Classification of tweets containing self-reported COVID-19 symptoms (in Spanish)
Task 6 – Classification of tweets which indicate self-reported COVID-19 vaccination status (in English)
Task 7 – Classification of self-reported intimate partner violence on Twitter (in English)
Task 8 – Classification of self-reported chronic stress on Twitter (in English)
Task 9 – Classification of Reddit posts self-reporting exact age (in English)
Task 10 – Detection of disease mentions in tweets – SocialDisNER (in Spanish)

Organizing Committee

Graciela Gonzalez-Hernandez, Cedars-Sinai Medical Center, USA
Davy Weissenbacher, Cedars-Sinai Medical Center, USA
Arjun Magge, University of Pennsylvania, USA
Ari Z. Klein, University of Pennsylvania, USA
Ivan Flores, Cedars-Sinai Medical Center, USA
Karen O’Connor, University of Pennsylvania, USA
Raul Rodriguez-Esteban, Roche Pharmaceuticals, Switzerland
Lucia Schmidt, Roche Pharmaceuticals, Switzerland
Juan M. Banda, Georgia State University, USA
Abeed Sarker, Emory University, USA
Yuting Guo, Emory University, USA
Yao Ge, Emory University, USA
Elena Tutubalina, Insilico Medicine, Hong Kong
Luis Gasco, Barcelona Supercomputing Center, Spain
Darryl Estrada, Barcelona Supercomputing Center, Spain
Martin Krallinger, Barcelona Supercomputing Center, Spain

Program Committee

Cecilia Arighi, University of Delaware, USA
Natalia Grabar, French National Center for Scientific Research, France
Thierry Hamon, Paris-Nord University, France
Antonio Jimeno Yepes, Royal Melbourne Institute of Technology, Australia
Jin-Dong Kim, Database Center for Life Science, Japan
Corrado Lanera, University of Padova, Italy
Robert Leaman, US National Library of Medicine, USA
Kirk Roberts, University of Texas Health Science Center at Houston, USA
Yutaka Sasaki, Toyota Technological Institute, Japan
Pierre Zweigenbaum, French National Center for Scientific Research, France

Contact

All questions should be emailed to Davy Weissenbacher (davy.weissenbacher(a)cshs.org)

North African in ML affinity group workshop at NeurIPS 2022 (First CfP)
by Nedjma OUSIDHOUM 17 Aug '22

Dear colleagues,

We are pleased to invite you to the North Africans in ML affinity group workshop <https://sites.google.com/view/northafricansinml/cfp>, which will take place at NeurIPS 2022. The workshop will include talks, poster sessions, as well as a shared task relating to ML in North Africa.

We will have both archival and non-archival tracks and invited talks. Junior researchers and students interested in NLP from North African institutions and beyond (academia and industry) are welcome to present their new work as well as completed or ongoing research projects or ideas. All nationalities are welcome! Authors of non-archival papers can choose to have their abstracts, bios, and posters posted on our website.

NeurIPS D&I will provide some travel grants and registration fee waivers to the participants. Please note that all participants are encouraged to apply for NeurIPS registration fee waivers.

We welcome submissions related to any topic of Machine Learning, including (but not limited to):

- Machine Learning Applications for North Africa
- Theoretical Machine Learning
- Natural Language Processing and Information Retrieval
- Computer Vision and Computer Graphics
- Reinforcement Learning
- Applications of Machine Learning for the Environment and Climate
- Geometric Deep Learning

You can visit our website: https://sites.google.com/view/northafricansinml/
Twitter: https://twitter.com/NorthAfricansML

Best regards,
The organisers.

Final Call for Papers and Shared Task Participation (CASE @ EMNLP 2022): Challenges and Applications of Automated Extraction of Socio-political Events from Text
by ali hürriyetoglu 17 Aug '22

Apologies for cross-posting!

************************************************************************************
URL: https://emw.ku.edu.tr/case-2022/

Sep 7, 2022: Submission deadline on Softconf
Jul 15, 2022: Latest ARR submission deadline
Oct 2, 2022: Latest ARR commitment deadline
Oct 9, 2022: Notification of Acceptance
Oct 16, 2022: Camera-ready papers due
Workshop dates: Dec 7-8, 2022
Location: Hybrid -> Abu Dhabi & Online

Please see below for the important dates of the shared tasks.

There are two submission options: i) the Softconf page of the workshop: https://softconf.com/emnlp2022/case2022 and ii) ACL Rolling Review (ARR): https://aclrollingreview.org/dates.
************************************************************************************

Nowadays, the unprecedented quantity of easily accessible data on social, political, and economic processes offers ground-breaking potential in guiding data-driven analysis in social and human sciences and in driving informed policy-making processes. Governments, multilateral organizations, and local and global NGOs present an increasing demand for high-quality information about a wide variety of events ranging from political violence, environmental catastrophes, and conflict, to international economic and health crises (Coleman et al. 2014; Porta and Diani, 2015) to prevent or resolve conflicts, provide relief for those that are afflicted, or improve the lives of and protect citizens in a variety of ways. Black Lives Matter protests (http://protestmap.raceandpolicing.com) and conflicts in Syria (https://www.cartercenter.org/peace/conflict_resolution/syria-conflict-resol…) are only two examples where we must understand, analyze, and improve real-life situations using such data. Finally, these efforts respond to “growing public interest in up-to-date information on crowds” as well (https://sites.google.com/view/crowdcountingconsortium/faqs).
Event extraction has long been a challenge for the natural language processing (NLP) community, as it requires sophisticated methods for defining event ontologies, creating language resources and domain-specific grammars, and developing machine learning models and other algorithmic approaches for various event-detection-specific tasks, such as entity detection, semantic labeling, and event classification and clustering (Pustejovsky et al. 2003; Boroş, 2018; Chen et al. 2021). Social and political scientists have been working for decades to create socio-political event (SPE) databases such as ACLED, EMBERS, GDELT, ICEWS, MMAD, PHOENIX, POLDEM, SPEED, TERRIER, and UCDP following similar steps. These projects and newer ones increasingly rely on machine learning (ML), deep learning (DL), and NLP methods to deal better with the vast amount and variety of data in this domain (Hürriyetoğlu et al. 2020).

Unfortunately, automated approaches suffer from major issues like bias, limited generalizability, class imbalance, training data limitations, and ethical issues that have the potential to affect the results and their use drastically (Lau and Baldwin 2020; Bhatia et al. 2020; Chang et al. 2019). Moreover, the results of the automated systems for SPE information collection have neither been comparable to each other nor been of sufficient quality (Wang et al. 2016; Schrodt 2020). SPEs are varied and nuanced. Both the political context and the local language used may affect whether and how they are reported.

We invite contributions from researchers in computer science, NLP, ML, DL, AI, socio-political sciences, conflict analysis and forecasting, and peace studies, as well as computational social science scholars involved in the collection and utilization of SPE data.
Academic workshops specific to tackling event information in general or for analyzing text in specific domains such as health, law, finance, and biomedical sciences have significantly accelerated progress in these topics and fields, respectively. However, there has not been a comparable effort for handling SPEs. We fill this gap. We invite work on all aspects of automated coding and analysis of SPEs and events in general from mono- or multi-lingual text sources. This includes (but is not limited to) the following topics:

1) Extracting events in and beyond a sentence, event coreference resolution
2) New datasets, training data collection, and annotation for event information
3) Event-event relations, e.g., subevents, main events, causal relations
4) Event dataset evaluation in light of reliability and validity metrics
5) Defining, populating, and facilitating event schemas and ontologies
6) Automated tools and pipelines for event collection related tasks
7) Lexical, syntactic, discursive, and pragmatic aspects of event manifestation
8) Methodologies for development, evaluation, and analysis of event datasets
9) Applications of event databases, e.g., early warning, conflict prediction, policymaking
10) Estimating what is missing in event datasets using internal and external information
11) Detection of new SPE types, e.g., creative protests, cyberactivism, COVID-19 related
12) Release of new event datasets
13) Bias and fairness of the sources and event datasets
14) Ethics, misinformation, privacy, and fairness concerns pertaining to event datasets
15) Copyright issues on event dataset creation, dissemination, and sharing
16) We encourage submissions of new system description papers on our available benchmarks (ProtestNews @ CLEF 2019, AESPEN @ LREC 2020, and CASE @ ACL-IJCNLP 2021). Please contact the organizers if you would like to access the data.
The proceedings of the previous editions should be indicative of what we cover: ProtestNews @ CLEF 2019 (http://ceur-ws.org/Vol-2380/), AESPEN @ LREC 2020 (https://aclanthology.org/volumes/2020.aespen-1/), CASE @ ACL-IJCNLP 2021 (https://aclanthology.org/volumes/2021.case-1/).

**** Shared tasks ****

Task 1- Multilingual protest news detection: This is the same shared task organized at CASE 2021 (for more info: https://aclanthology.org/2021.case-1.11/), but this time there will be additional data and languages at the evaluation stage.
Contact person: Ali Hürriyetoğlu (ali.hurriyetoglu(a)gmail.com).
Github: https://github.com/emerging-welfare/case-2022-multilingual-event

Task 2- Automatically replicating manually created event datasets: The participants of Task 1 will be invited to run the systems they develop for Task 1 on a news archive (for more info: https://aclanthology.org/2021.case-1.27/).
Contact person: Hristo Tanev (htanev(a)gmail.com).
Github: https://github.com/emerging-welfare/case-2022-multilingual-event

Task 3- Event causality identification: Causality is a core cognitive concept and appears in many natural language processing (NLP) works that aim to tackle inference and understanding. We are interested in studying event causality in news and therefore introduce the Causal News Corpus. The Causal News Corpus consists of 3,559 event sentences, extracted from protest event news, that have been annotated with sequence labels indicating whether they contain causal relations or not. Subsequently, causal sentences are also annotated with Cause, Effect, and Signal spans. Our two subtasks (Sequence Classification and Span Detection) work on the Causal News Corpus, and we hope that accurate, automated solutions may be proposed for the detection and extraction of causal events in news.
Contact person: Fiona Anting Tan (tan.f(a)u.nus.edu).
Github: https://github.com/tanfiona/CausalNewsCorpus

**** Deadlines for the Shared tasks ****

** Task 1 & 2:
Training data available: The training data from CASE 2021 is used.
New test data available: Sep 15, 2022
Test end: Sep 25, 2022
System Description Paper submissions due: Oct 2, 2022
Notification to authors after review: Oct 09, 2022
Camera-ready: Oct 16, 2022

** Task 3:
Training data available: Apr 15, 2022
Validation data available: Apr 15, 2022
Validation labels available: Aug 01, 2022
Test data available: Aug 01, 2022
Test start: Aug 01, 2022
Test end: extended from Aug 15 to Aug 31, 2022
System Description Paper submissions due: Sep 07, 2022
Notification to authors after review: Oct 09, 2022
Camera-ready: Oct 16, 2022

**** Keynotes ****

Three prominent scholars have accepted our invitation as keynote speakers:

i) J. Craig Jenkins (https://sociology.osu.edu/people/jenkins.12) is Academy Professor Emeritus of Sociology at The Ohio State University. He directed the Mershon Center for International Security Studies from 2011 to 2015 and is now senior research scientist.

ii) Scott Althaus (https://pol.illinois.edu/directory/profile/salthaus) is Merriam Professor of Political Science, Professor of Communication, and Director of the Cline Center for Advanced Social Research at the University of Illinois Urbana-Champaign.

iii) Thien Huu Nguyen (https://ix.cs.uoregon.edu/~thien/) is an assistant professor in the Department of Computer and Information Science at the University of Oregon. Thien is the director of the NSF IUCRC Center for Big Learning (CBL) at the University of Oregon.

**** Submissions ****

This call solicits short and long papers reporting original and unpublished research on the topics listed above. The papers should emphasize obtained results rather than intended work and should indicate clearly the state of completion of the reported results.
The page limits and content structure announced on the ACL ARR page (https://aclrollingreview.org/cfp) should be followed for both short and long papers. Papers should be submitted in PDF format on the START page of the workshop (http://softconf.com/emnlp2022/case2022) or on the ARR page (TBA on the workshop website), in compliance with the ACL author guidelines: https://acl-org.github.io/ACLPUB/formatting.html

The reviewing process will be double-blind, so papers should not include the authors' names and affiliations. Each submission will be reviewed by at least three members of the program committee. The workshop proceedings will be published on ACL Anthology.

Postdoc or PhD Position in Fairness-aware Learning to Rank at the University of Amsterdam
by Andrew Yates 16 Aug '22

The IRLab at the University of Amsterdam (https://irlab.science.uva.nl/) seeks a postdoc or PhD student to work on fairness-aware learning to rank.

Algorithmic hiring is on the rise and rapidly becoming necessary in some sectors, but these systems run the risk of reproducing and amplifying discriminatory biases. In the context of the interdisciplinary FINDHR EU project on Fairness and Intersectional Non-Discrimination in Human Recommendation, the successful postdoc or PhD student will design and evaluate fairness-aware ranking algorithms. In contrast with fairness-aware ranking in contexts where click feedback is immediate, the algorithmic hiring use case raises new challenges of learning from delayed rewards, leveraging complex feedback, and supporting optional positive actions.

Interested candidates are invited to apply by 25 August, 2022. For more details and to apply, see https://vacatures.uva.nl/UvA/job/Postdoctoral-Researcher-or-PhD-Position-in…

Our team has a strong collaborative and collegial atmosphere. We strongly encourage applications coming from a unique perspective. Tell us how your background fits with the focus of this position, even if your profile is slightly different from the profile / requirements written in the official vacancy text linked to above. In August 2022, our team will move into a brand new, sustainable, energy-neutral, and circular building in Amsterdam Science Park. Come and join us!
