December 2022 - Corpora

Azimut Research Award - Deadline approaching: December 23rd, 2022
by Fabio Massimo Zanzotto 19 Dec '22

Don't miss the deadline (December 23rd, 2022) to submit your proposal for the Research Award! Propose YOUR solution for interpreting old databases!

Dinosaur databases are running the world! As relics of the early steps of the information era, these databases are still the basis of many economic transactions. Despite their age, it seems extremely difficult to replace them with novel and faster solutions. These databases were written in a wonderful era in which memory was a problem. Hence, variable, table, and column names were short and cryptic. Moreover, documents describing these names are buried in forgotten places, if they still exist. The challenge, then, is making sense of these dinosaur databases to help software engineers produce the new version.

Ready to apply? Azimut is looking forward to your submission!

Deadline: December 23rd, 2022
More info at: https://www.azimut.it/it/az-venture-tech-challenge

Azimut (https://www.azimut.it), a leading wealth management company in Europe, offers a Research Award to whoever can propose a solution to interpret old databases.

Text2Story@ECIR’23 Call for Participation
by Hugo Oliveira Sousa 19 Dec '22

*** Apologies for cross-posting ***

++ CALL FOR PAPERS ++

****************************************************************************
Sixth International Workshop on Narrative Extraction from Texts (Text2Story'23)
Held in conjunction with the 45th European Conference on Information Retrieval (ECIR'23)
April 2nd, 2023 - Dublin, Ireland
Website: https://text2story23.inesctec.pt
****************************************************************************

++ Important Dates ++
- Submission deadline: January 23rd, 2023
- Acceptance Notification Date: March 3rd, 2023
- Camera-ready copies: March 17th, 2023
- Workshop: April 2nd, 2023

++ Overview ++
Recent years have shown a stream of continuously evolving information, making it unmanageable and time-consuming for an interested reader to track, process, and keep up with all the essential information and the various aspects of a story. Automated narrative extraction from text offers a compelling approach to this problem. It involves identifying the sub-set of interconnected raw documents, extracting the critical narrative story elements, and representing them in an adequate final form (e.g., timelines) that conveys the key points of the story in an easy-to-understand format. Although information extraction and natural language processing have made significant progress towards an automatic interpretation of texts, the problem of automated identification and analysis of the different elements of a narrative present in a document (set) still presents significant unsolved challenges.

++ List of Topics ++
In the sixth edition of the Text2Story workshop, we aim to bring to the forefront the challenges involved in understanding the structure of narratives and in incorporating their representation in well-established models, as well as in modern architectures (e.g., transformers) which are now common and form the backbone of almost every IR and NLP application.
It is hoped that the workshop will provide a common forum to consolidate the multi-disciplinary efforts and foster discussions to identify the wide-ranging issues related to the narrative extraction task. In this regard, we encourage high-quality and original submissions covering the following topics:
* Narrative Representation Models
* Story Evolution and Shift Detection
* Temporal Relation Identification
* Temporal Reasoning and Ordering of Events
* Causal Relation Extraction and Arrangement
* Narrative Summarization
* Multi-modal Summarization
* Automatic Timeline Generation
* Storyline Visualization
* Comprehension of Generated Narratives and Timelines
* Big Data Applied to Narrative Extraction
* Personalization and Recommendation of Narratives
* User Profiling and User Behavior Modeling
* Sentiment and Opinion Detection in Texts
* Argumentation Analysis
* Bias Detection and Removal in Generated Stories
* Ethical and Fair Narrative Generation
* Misinformation and Fact Checking
* Bots Influence
* Narrative-focused Search in Text Collections
* Event and Entity Importance Estimation in Narratives
* Multilinguality: Multilingual and Cross-lingual Narrative Analysis
* Evaluation Methodologies for Narrative Extraction
* Resources and Dataset Showcase
* Dataset Annotation for Narrative Generation/Analysis
* Applications in Social Media (e.g. narrative generation during a natural disaster)
* Language Models and Transfer Learning in Narrative Analysis
* Narrative Analysis in Low-resource Languages

++ Dataset ++
We challenge interested researchers to consider submitting a paper that makes use of the tls-covid19 dataset (published at ECIR'21) under the scope and purposes of the Text2Story workshop. tls-covid19 consists of a number of curated topics related to the Covid-19 outbreak, with associated news articles from Portuguese and English news outlets and their respective reference timelines as gold-standard.
While it was designed to support timeline summarization research tasks, it can also be used for other tasks, including the study of news coverage about the COVID-19 pandemic. A script to reconstruct and expand the dataset is available at https://github.com/LIAAD/tls-covid19. The article itself is available at this link: https://link.springer.com/chapter/10.1007/978-3-030-72113-8_33

++ Submission Guidelines ++
We invite two kinds of submissions:
* Full papers (up to 7 pages + references): Original and high-quality unpublished contributions on the theory and practical aspects of the narrative extraction task. Full papers should introduce existing approaches and describe the methodology and the experiments conducted in detail. Negative-result papers highlighting tested hypotheses that did not yield the expected outcome are also welcome.
* Work in progress, demos and dissemination papers (up to 4 pages + references): unpublished short papers describing work in progress; demo and resource papers presenting research/industrial prototypes, datasets or software packages; position papers introducing a new point of view, a research vision or a reasoned opinion on the workshop topics; and dissemination papers describing project ideas, ongoing research lines, case studies or summarized versions of papers previously published in high-quality conferences/journals that are worth sharing with the Text2Story community, but where novelty is not a fundamental issue.

Submissions will be peer-reviewed by at least two members of the programme committee. The accepted papers will appear in the proceedings published at CEUR workshop proceedings (indexed in Scopus and DBLP) as long as they don't conflict with previous publication rights.

++ Workshop Format ++
Participants of accepted papers will be given 15 minutes for oral presentations.

++ Organizing committee ++
Ricardo Campos (INESC TEC; Ci2 - Smart Cities Research Center, Polytechnic Institute of Tomar, Tomar, Portugal)
Alípio M. Jorge (INESC TEC; University of Porto, Portugal)
Adam Jatowt (University of Innsbruck, Austria)
Sumit Bhatia (Media and Data Science Research Lab, Adobe)
Marina Litvak (Shamoon Academic College of Engineering, Israel)

++ Proceedings Chair ++
João Paulo Cordeiro (INESC TEC & Universidade da Beira do Interior)
Conceição Rocha (INESC TEC)

++ Web and Dissemination Chair ++
Hugo Sousa (INESC TEC & University of Porto)
Behrooz Mansouri (Rochester Institute of Technology)

++ Program Committee ++
Álvaro Figueira (INESC TEC & University of Porto)
Andreas Spitz (University of Konstanz)
Antoine Doucet (Université de La Rochelle)
António Horta Branco (University of Lisbon)
Arian Pasquali (CitizenLab)
Bart Gajderowicz (University of Toronto)
Begoña Altuna (Universidad del País Vasco)
Brenda Santana (Federal University of Rio Grande do Sul)
Bruno Martins (IST & INESC-ID, University of Lisbon)
Daniel Loureiro (Cardiff University)
Dennis Aumiller (Heidelberg University)
Dhruv Gupta (Norwegian University of Science and Technology)
Dyaa Albakour (Signal UK)
Evelin Amorim (INESC TEC)
Henrique Cardoso (INESC TEC & University of Porto)
Ismail Altingovde (Middle East Technical University)
João Paulo Cordeiro (INESC TEC & University of Beira Interior)
Kiran Bandeli (Walmart Inc.)
Luca Cagliero (Politecnico di Torino)
Ludovic Moncla (INSA Lyon)
Marc Finlayson (Florida International University)
Marc Spaniol (Université de Caen Normandie)
Moreno La Quatra (Politecnico di Torino)
Nuno Guimarães (INESC TEC & University of Porto)
Pablo Gamallo (University of Santiago de Compostela)
Pablo Gervás (Universidad Complutense de Madrid)
Paulo Quaresma (Universidade de Évora)
Paul Rayson (Lancaster University)
Raghav Jain (Indian Institute of Technology, Patna)
Ross Purves (University of Zurich)
Satya Almasian (Heidelberg University)
Sérgio Nunes (INESC TEC & University of Porto)
Simra Shahid (Adobe's Media and Data Science Research Lab)
Sriharsh Bhyravajjula (University of Washington)
Udo Kruschwitz (University of Regensburg)
Veysel Kocaman (John Snow Labs & Leiden University)

++ Contacts ++
Website: https://text2story23.inesctec.pt
For general inquiries regarding the workshop, reach the organizers at: text2story2023@easychair.org

Fully funded PhD studentships in NLP, Queen Mary University of London
by Matthew Purver 19 Dec '22

A number of funded PhD studentships in NLP/computational linguistics are available in the Computational Linguistics Lab in the School of Electronic Engineering and Computer Science at Queen Mary University of London, UK, for September 2023 entry.

Studentships are available on a range of specified topics with particular supervisors. Funding conditions, length of studentship and eligibility vary with topic - please see the links in the list below for details.

- Unsupervised text anonymisation using pre-trained language models: Dr Julia Ive <https://julia-ive.github.io/>, funded via CSC (see here) <http://eecs.qmul.ac.uk/phd/phd-studentships/csc-phd-studentships-in-electro…>.
- Understanding word embeddings via their algebraic-topological structures: Dr. Haim Dubossarsky <https://scholar.google.com/citations?user=2EDsENQAAAAJ>, funded via CSC (see here) <http://eecs.qmul.ac.uk/phd/phd-studentships/csc-phd-studentships-in-electro…>.
- Prompt-based learning for adaptive context-dependent NLP: Prof. Matthew Purver <http://www.eecs.qmul.ac.uk/~mpurver/>, funded via CSC (see here) <http://eecs.qmul.ac.uk/phd/phd-studentships/csc-phd-studentships-in-electro…>.
- Evaluating and Learning with Disagreements: Prof. Massimo Poesio <http://www.massimopoesio.org/>, funded via CSC (see here) <http://eecs.qmul.ac.uk/phd/phd-studentships/csc-phd-studentships-in-electro…>.
- Next-generation NLP methods for meaning change: Dr. Haim Dubossarsky <https://scholar.google.com/citations?user=2EDsENQAAAAJ>, funded via QMUL's Principal's fund (see here) <http://eecs.qmul.ac.uk/phd/phd-studentships/principal-and-epsrc-dtp-phd-stu…>.
- Meaning coordination in dialogue: Prof. Matthew Purver <http://www.eecs.qmul.ac.uk/~mpurver/>, funded via EPSRC (see here) <http://eecs.qmul.ac.uk/phd/phd-studentships/principal-and-epsrc-dtp-phd-stu…>.

More details on the topics and funding conditions are available from this page: http://eecs.qmul.ac.uk/phd/phd-studentships/

If you are interested, please get in touch with the relevant supervisor directly. Final applications are due by 31st January 2023.

--
Matthew Purver - http://www.eecs.qmul.ac.uk/~mpurver/
Computational Linguistics Lab - http://compling.eecs.qmul.ac.uk/
Cognitive Science Research Group - http://cogsci.eecs.qmul.ac.uk/
School of Electronic Engineering and Computer Science
Queen Mary University of London, London E1 4NS, UK
*My working days for QMUL are Tuesday-Thursday; responses to mail on other days may be delayed.*

Postdoctoral Research Associate LATIF project University of Liverpool
by Musi, Elena 18 Dec '22

At the University of Liverpool (Department of Communication and Media), we are hiring a Postdoctoral Research Associate as part of the project "Leveraging Argument Technology for impartial factchecking", funded by the Calouste Gulbenkian Foundation <https://www.linkedin.com/company/fcgulbenkian/> (European Media and Information Fund).

For info about the project: https://gulbenkian.pt/emifund/projects/leveraging-argument-technology-for-i…
Job spec available at: https://my.corehr.com/pls/ulivrecruit/erq_jobspec_version_4.display_form

Please share with your communities!

Thanks,
All best
Elena Musi
Senior Lecturer (Associate Professor)
University of Liverpool

Release: six dialect corpora (Palestinian, Lebanese, Yemeni, Iraqi, Libyan, and Sudanese)
by Mustafa Jarrar 18 Dec '22

Dear all,

We are happy to release six corpora (1.3 million tokens) with full morphological annotations for the Palestinian, Lebanese, Yemeni, Iraqi, Libyan, and Sudanese dialects. All are annotated using the LDC’s SAMA tagsets.

Search: https://portal.sina.birzeit.edu/curras
Download: https://portal.sina.birzeit.edu/curras/about-en.html

This video demonstrates how to search the corpora in Arabic/English: https://twitter.com/mjarrar/status/1604078695068598273

#arabic_language_day We are very happy to release 6 Arabic dialects corpora (1.3 million tokens, morphologically annotated): Curras (Palestinian), Baladi (Lebanese), Lisani (Yemeni, Iraqi, Libyan, Sudanese) by @UN, @BirzeitU and @AUB_Lebanon. https://t.co/ZP3hqVSRWc

Best
--Mustafa
__________________________
Mustafa Jarrar, PhD
Professor of Artificial Intelligence
Chair, PhD Program in Computer Science
Birzeit University, Palestine
Whatsapp:+972599662258 | mjarrar(a)birzeit.edu
http://www.jarrar.info

Maknuune: The Open Palestinian Arabic Lexicon (v1.0)
by Nizar Habash 18 Dec '22

Dear all --

In celebration of Arabic Language Day (Dec 18), we are happy to announce the first release of Maknuune, the Open Source Palestinian Arabic Lexicon.

www.palestine-lexicon.org

Maknuune has over 36K entries from 17K lemmas and 3.7K roots. All entries include diacritized Arabic orthography, phonological transcription and English glosses. Some entries are enriched with additional information such as broken plurals and templatic feminine forms, associated phrases and collocations, Standard Arabic glosses, and examples or notes on grammar, usage, or the location where the entry was collected.

We are honored to have received comments of endorsement from Profs. Noam Chomsky, Hamid Dabashi, Abdelkader Fassi Fehri, Clive Holes, Ilan Pappe, and Dr. Walid Saif.

https://sites.google.com/nyu.edu/palestine-lexicon/endorsements

--
Nizar Habash
Professor of Computer Science
New York University Abu Dhabi

Fully funded PhD in NLP, London (QMUL)
by Haim Dubossarsky 16 Dec '22

A fully funded 4-year PhD position at the intersection of NLP and Topology is offered at Queen Mary University of London (QMUL), School of Electronic Engineering and Computer Science. It is part of the collaboration scheme between QMUL and the China Scholarship Council (CSC), and is therefore available for Chinese candidates only. The CSC scheme provides a full tuition fee waiver and a living stipend for 4 years, and requires (among other things) an English language test (IELTS) taken within the last 2 years. You can read more about the scheme's requirements here <https://www.qmul.ac.uk/scholarships/items/china-scholarship-council-scholar…>.

I am looking for brilliant candidates who hold (or are about to hold) an MSc in Computer Science with a strong NLP research background. Prospective students can learn more about the project here <http://eecs.qmul.ac.uk/phd/phd-studentships/csc-phd-studentships-in-electro…>, under the section: *Understanding neural representations via their algebraic-topological structures*. The PhD student will work in an interdisciplinary environment, and will be at the forefront of NLP research.

If you are interested, please get in touch with me at: h.dubossarsky(a)qmul.ac.uk.

Bests,
Haim

CALL FOR CONTRIBUTION - LongEval @ CLEF 2023
by Lorraine Goeuriot 16 Dec '22

We kindly invite you to participate in LongEval 2023, a shared task on longitudinal evaluation of NLP models at CLEF 2023.

CALL FOR CONTRIBUTION
LongEval @ CLEF 2023
Longitudinal Evaluation of Models Performance
https://clef-longeval.github.io

Lab description:
LongEval aims at identifying the types of models that offer better temporal persistence for NLP tasks on data that evolves across time, over both shorter and longer time periods. LongEval is built on a common framework for its Information Retrieval and Text Classification tasks: for one system, we evaluate its performance when operating on test data acquired at the same time as the training data (time t), when operating on data acquired shortly after time t (sub-task 1), and when operating on data acquired at time t" (longer after time t, sub-task 2). For each sub-task of each task, two evaluation measures are proposed: an absolute quality measure, and a relative drop compared to an initial time t test result for each system.

LongEval 2023 proposes two tasks:
• Task 1: Information Retrieval. For this task, the data is a sequence of Web document collections and queries, each containing a few million documents and hundreds of queries, provided by Qwant. Relevance assessments are to be computed using a Click Model acquired from real users of the Qwant search engine. As the initial corpus contains only French documents, an automatic translation into English will be provided.
• Task 2: Text Classification. For this task, the training data is the TM-Senti sentiment analysis dataset extended with a development set and three human-annotated novel test sets for submission evaluation. TM-Senti is a general large-scale Twitter sentiment dataset in the English language, spanning a 9-year period from 2013 to 2021. Tweets are labeled for sentiment as either “positive” or “negative”. The annotation is performed using distant supervision based on a manually curated list of emojis and emoticons.

You can register for the task at: https://clef2023-labs-registration.dei.unipd.it/

Lab Organizers:
Rabab Alkhalifa, Iman Bilal, Hsuvas Borkakoty, Jose Camacho-Collados, Romain Deveaud, Alaa El-Ebshihy, Luis Espinosa-Anke, Gabriela Gonzalez-Saez, Petra Galuscakova, Lorraine Goeuriot, Elena Kochkina, Maria Liakata, Daniel Loureiro, Harish Tayyar Madabushi, Philippe Mulhem, Florina Piroi, Martin Popel, Christophe Servan, and Arkaitz Zubiaga.

Important dates:
Release of training data: 03/01/2023
Release of test data: 30/04/2023
Runs submissions date: 30/06/2023
LongEval Workshop: during CLEF 2023, Thessaloniki, 18-21 September 2023.
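
As a rough illustration of the second evaluation measure mentioned in the lab description (the relative drop compared to the initial time t result), here is a minimal Python sketch. The announcement does not give the formula, so the definition used here, (initial score - later score) / initial score, is an assumption; the function name and the scores are hypothetical, not LongEval data or the organizers' code.

```python
def relative_drop(initial_score: float, later_score: float) -> float:
    """Fraction of the time-t score that is lost at a later time point.

    Assumed definition: (initial - later) / initial. The official
    LongEval measure may be defined differently.
    """
    if initial_score == 0:
        raise ValueError("initial_score must be non-zero")
    return (initial_score - later_score) / initial_score


# Hypothetical absolute quality scores for one system (illustrative only).
scores = {"t": 0.80, "t_short": 0.76, "t_long": 0.60}

# Relative drop of each later test set with respect to the time-t result.
drops = {
    name: relative_drop(scores["t"], value)
    for name, value in scores.items()
    if name != "t"
}

for name, drop in drops.items():
    print(f"{name}: absolute={scores[name]:.2f}, relative drop={drop:.1%}")
```

Under this assumed definition, a system with a lower absolute score but a smaller relative drop would count as more temporally persistent, which is the property the lab sets out to measure.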

2nd Call for Papers - Workshop on Language-Based AI Agent Interaction with Children (deadline extended)
by Maike Paetzel-Prüsmann 16 Dec '22

Workshop on Language-Based AI Agent Interaction with Children
https://aichildinteraction.github.io/
February 21st, 2023, in Los Angeles, USA & Virtual (Hybrid Format)
Paper Submission Deadline: January 13th, 2023 (extended)
Easychair: https://easychair.org/conferences/?conf=aiaic23
Contact: https://groups.google.com/g/ai-child-interactions or aichildinteraction(a)gmail.com
===================================================

In this workshop, we aim to bring together researchers looking into multimodal interactions between children and artificial agents to discuss research problems that center around interactivity and go beyond just processing child speech. We are interested in discussing approaches to collecting and annotating datasets involving child speech, intent classification in child speech, designing dialogue flow with artificial agents that primarily interact with children, as well as repair strategies, active listening behavior, and other aspects of dialogue modeling. Moreover, multiparty conversations involving several children, children and their adult caregivers, or several artificial agents are of particular interest to this workshop.

Acknowledging the early-stage nature of research in this area, the workshop will invite short position papers as contributions. In addition to selected talks that will be invited based on the submitted papers, we will host roundtable discussions allowing attendees to discuss ideas, share challenges they have faced, and highlight ideas for future research.

## Topics of Interest
The workshop welcomes contributions across a wide range of topics including, but not limited to:
- Natural Language Understanding of Child Speech
- Dialogue Modeling of Child-Agent, Child-Robot, Child-Child, and Child-Adult Speech
- Conversational Flow and Repair in Dialogue with Children
- Multiparty-Interaction Involving Children
- Multimodal Processing of Child Interactions
- Automatic Speech Recognition of Child Speech
- Evaluating Child Interactions with Artificial Agents/Robots
- Challenges in Designing Interactions for Children
- Datasets of Child-Child, Child-Adult, or Child-Agent/Robot Interaction
- Ethics and Responsible AI for Child-Agent/Robot Interaction
- Related Topics

## Important Dates
- Paper submission deadline: January 13th, 2023 (extended)
- Author Notification deadline: February 1st, 2023
- Workshop: February 21st, 2023 (morning session)

## Submission Guidelines
We invite short position papers of 3-4 pages (plus additional pages for references and appendices without page limitation), including work in progress containing preliminary results, technical reports, case studies, surveys, and state-of-the-art research in language-based AI agent interactions with children. Recently submitted or published papers are welcome to be submitted to this workshop if they are highly relevant to the topic of the workshop. Please select the appropriate track during the EasyChair submission to mark the submission accordingly.

Papers will be reviewed for their relevance, novelty, and scientific and technical soundness. Submissions do not need to be anonymized for review. All manuscripts must be written in English and submitted electronically in PDF format via EasyChair: https://easychair.org/conferences/?conf=aiaic23

Accepted papers will be published on the workshop website. However, papers are still considered non-archival and can be submitted to other conferences.
Authors of accepted papers are expected to present their paper during the workshop in the form of a short talk, which can either be given in person in Los Angeles, USA, or virtually via Zoom.

Authors should use the official IWSDS template:
- Latex Style and Template: https://drive.google.com/open?id=1mnzjvTlIVEsdPb2IZXbzxU8WRJj3mLiJ
- Overleaf: https://www.overleaf.com/read/djcrwzgrdjvj
- Word Template: https://drive.google.com/open?id=1WmO9iLvJtO0cH1E0VSC1bPsC0vRDpzbd

## Contact
If you have questions, please get in touch via our public Google Group https://groups.google.com/g/ai-child-interactions or by sending an e-mail to aichildinteraction(a)gmail.com

Permanent CNRS Research Fellow Position in NLP for Under-resourced Languages
by Thierry Poibeau 16 Dec '22

Dear All,

CNRS offers one permanent research position (research fellow) in computer science for under-resourced languages.

https://gestionoffres.dsi.cnrs.fr/fo/offres/detail-en.php?&offre_id=18

Details about the selection process and the kinds of positions offered by CNRS are presented here: https://www.cnrs.fr/en/competitive-entrance-examinations-researchers-womenm….

Note that these are permanent (non-tenured) positions, after a probationary period of one year.

Lattice is a lab in Paris (https://lattice.cnrs.fr/, funded by CNRS, Ecole normale supérieure-PSL and Université Sorbonne nouvelle) with a strong team both in linguistics and in natural language processing. Lattice would be happy to host the above position (candidates are free to mention in their application the lab that best suits their research projects among CNRS labs in linguistics with a strong NLP component).

Candidates with a strong CV who would like to apply at Lattice for this position can contact Sophie Prévost or Thierry Poibeau (both firstname.lastname(a)ens.psl.eu) with a CV and a research proposal. A record of international publications (including major computer science conferences) is mandatory.

Best regards,
Thierry Poibeau

