June 2022 - Corpora - ELRA lists

[Corpora-List]PhD scholarships at the University of Bozen-Bolzano
by Alberto Lavelli 27 Jun '22


The Free University of Bozen-Bolzano (https://www.unibz.it/) has opened a public competition for 21 fully funded PhD scholarships in Computer Science (*deadline July 1, 2022*). They cover a range of epistemologies, theories, methods and applications of computer science. Topics range from theoretical AI, data science and machine learning applications to the design of the most advanced user interfaces and critical user research.

In particular, the following two topics (in collaboration with Fondazione Bruno Kessler) may be of interest to the mailing list.

*Emotions in Multilingual Texts (Carlo Strapparava)*

The affective dimension of word meaning often forms part of our reservoir of common-sense knowledge, and it is reflected in the way we use words. This project aims at producing and evaluating new technologies for the recognition of emotional language and possibly other subtle pragmatic aspects of communication. Because emotional expressions differ in subtle ways across languages, the project will pay particular attention to approaching the problem from a multilingual point of view.

*Neural Models of Collaborative Behaviours in Conversational Agents (Bernardo Magnini)*

Human-human dialogues are characterized by collaborative behaviours, through which interlocutors achieve their communicative goals. As an example, proactivity (i.e., anticipating user needs during dialogue) and grounding (e.g., posing clarification questions) are two relevant cases that have been investigated from a linguistics perspective. However, such collaborative behaviours are still largely absent in current neural dialogue models. There are several open research challenges in this direction, including investigating how dialogue systems can learn when and how to be collaborative, depending on the dialogue context, and how to evaluate whether collaborative behaviours have improved the efficacy of dialogue.

This PhD project addresses collaborative behaviours in conversational agents from a computational perspective, exploiting the integration of machine learning approaches based on neural models, reinforcement learning, and knowledge-based techniques.

Key information on how to apply and gain admission to the PhD Programme can be found here: https://www.unibz.it/en/faculties/computer-science/phd-computer-science/

The scholarship includes:
- University fees
- 3-year personal grant (approx. € 17,000 NET per year)
- 50% pay increase to support international mobility for a period of between 6 months and one year, depending on the type of project
- Personal budget for research and travel expenses (Euro 2,500)
- State-of-the-art technical equipment

Further financial possibilities are available for top students in the form of teaching contracts and research consultancies during the years of study.


[Corpora-List]Natural Language Processing Postdoctoral Scholar at University of California, Santa Cruz
by Marilyn Walker 27 Jun '22


The Natural Language Processing Program (nlp.ucsc.edu) in the Computer Science and Engineering Department at the University of California, Santa Cruz (UCSC) invites applications for the position of Natural Language Processing Postdoctoral Researcher, under the direction of Professor Marilyn Walker. We seek outstanding applicants with research expertise in all areas of Natural Language Processing (NLP).

The NLP Postdoctoral Researcher will be expected to contribute to the research profile of the NLP group. We also expect the successful candidate to support graduate students and other Postdoctoral Scholars as a peer mentor.

Feel free to contact me at nlp(a)ucsc.edu with any questions. Applications are open now; full consideration will be given to applications submitted by July 15th, 2022 for a start date of September 1st.

For details, and to apply, go here: https://recruit.ucsc.edu/JPF01330

--
Professor Marilyn Walker
Fellow of the Association for Computational Linguistics
Program Director, NLP MS Program, https://nlp.ucsc.edu/
Natural Language and Dialogue Systems Lab
Department of Computer Science and Engineering
Baskin School of Engineering
University of California Santa Cruz
users.soe.ucsc.edu/~maw


[Corpora-List]REMINDER -- Late-breaking and non-archival round -- (Dis)embodiment
by Sharid Loáiciga 27 Jun '22


(Dis)embodiment
University of Gothenburg, Sweden, September 14-16, 2022

REMINDER: Late-breaking and non-archival round

https://sites.google.com/view/disembodiment/home

(Dis)embodiment will bring together researchers from various areas looking to answer the question of the role of grounding and embodiment in modelling human language tasks and behaviour -- or limits thereof. The conference is open to viewpoints from machine learning, computational linguistics, theoretical linguistics and philosophy, cognitive science and psycholinguistics, as well as artificial intelligence ethics and policy. We hope to see technical contributions and the full spectrum of reasoned debate.

Important dates

***** NEW! Late-breaking and archival submission deadline: 2022 July 11, anywhere on Earth *****
Submission deadline: 2022 May 16 2020 May 30, anywhere on Earth
Notification of acceptance: 2022 June 30, anywhere on Earth
Camera ready: 2022 August 19, anywhere on Earth
Conference: 2022 September 14-16, not anywhere on Earth, but in Gothenburg


[Corpora-List][CFP] The 8th Workshop on Noisy User-generated Text | COLING 2022 | Due Aug 19th
by Xu, Wei 27 Jun '22


The 8th Workshop on Noisy User-generated Text (WNUT @COLING 2022)

The WNUT Workshop will be collocated with COLING 2022 (Hybrid - Gyeongju, Republic of Korea). The website for the workshop is at: http://noisy-text.github.io/

The WNUT workshop focuses on Natural Language Processing applied to noisy user-generated text, such as that found in social media, online reviews, crowdsourced data, web forums, clinical records, and language learner essays. We seek submissions of long and short papers on original and unpublished work (same format and page limit as the COLING main conference). All accepted submissions will be presented as posters. Additionally, selected submissions will be presented orally. We have Best Paper Awards sponsored by Megagon Labs this year.

Topics of interest include but are not limited to:

* NLP Preprocessing of Noisy Text
  - Part of speech tagging
  - Named entity tagging, including a wide range of categories, e.g. product names
  - Chunking of user-generated text
  - Parsing
* Text Normalization and Error Correction
  - Normalizing noisy text for downstream tasks and for human readability
  - Error detection and correction
* Robustness to Noise, both Natural and Adversarial
* Multilingual NLP in noisy text
* Machine Translation of Noisy Text
* Sentiment analysis
* Crowdsourcing of text data
* User prediction, e.g. gender, age, etc.
* Stylistics, e.g. formality, politeness, etc.
* Colloquial language, e.g. code-switching, idiom detection
* Bilingual translation of noisy text
* Paraphrase identification and semantic similarity of short text or noisy text
* Information extraction from noisy text
* Domain adaptation to user-generated text
* Geolocation prediction
* Global and regional trend detection and event extraction
* Detecting rumors, contradictory information, sarcasm, and humor on social media
* Extracting user demographics, profiles, and major life events
* Temporal aspects of user-generated content (resolving time expressions, concept drift, diachronic analyses, etc.)

= IMPORTANT DATES =

* August 19, 2022: Submission Deadline (dual-submission w/ COLING main conference allowed)
* September 7, 2022: Acceptance Notification
* September 14, 2022: Camera-ready Deadline
* October 17, 2022: Workshop Day

= ORGANIZERS =

Tim Baldwin (University of Melbourne)
Afshin Rahimi (University of Queensland)
Wei Xu (Georgia Institute of Technology)
Alan Ritter (Georgia Institute of Technology)

= SUBMISSION =

Formatting should be according to COLING 2022 specifications. Dual submission is allowed but must be stated at the time of submission. Please submit through the START system at the following URL: https://www.softconf.com/coling2022/W-NUT_2022


[Corpora-List]PhD position in modeling children's communicative development
by Abdellah Fourtassi 26 Jun '22


Our team (cocodev.fr) at Aix-Marseille University offers a fully-funded Ph.D. research position (with no teaching duties) in the framework of the ANR grant MACoMiC (Mastering the Art of Conversation in Middle Childhood).

The broad goal of the PhD researcher is to lead the development of deep learning models of child-parent multimodal communication, across several cultures, using data from face-to-face conversations recorded with portable eye-tracking systems and Zoom calls. We are interested in studying the development of various conversational skills, including mechanisms of building shared understanding, multimodal synchrony/alignment, and discourse coherence/contingency. We are also interested in the application of this research both to help design more effective clinical interventions (for children with communicative difficulties) and to build child-oriented conversational AI. The selected candidate can focus on one or several of these dimensions, defining a personalized research program together with the main advisor.

The PhD researcher will be integrated into a supportive and highly interdisciplinary team of senior and early career researchers in computer science (with expertise in conversational AI), developmental psychology, and neuro-linguistics. They will be located at the Department of Computer Science of Aix-Marseille University and will be part of the Institute of Language Communication and the Brain (ILCB.fr) <https://www.ilcb.fr/>. Additionally, the PhD researcher will have the opportunity to interact/collaborate with CoCoDev’s internal network, especially researchers from the Dialog Modelling Group (University of Amsterdam), the Interacting Minds Center (The University of Aarhus), and the Multimodal Language and Cognition group (Max Planck Institute for Psycholinguistics).

Requirements

- The ideal candidate for this position should have a strong background/training in computer science and experience with deep-learning modeling.
- Interest in cognitive science (though no prior experience is required).
- Good mastery of English

Key dates

Open until filled. Please send (as soon as possible for full consideration):

1) A CV
2) A recent transcript (a university document with courses taken and grades)
3) Contact info of one reference (ideally a research supervisor)
4) (Optional) Evidence of prior experience with deep-learning modeling (a publication, dissertation, code on GitHub, etc.)

*Latest starting date:* October 1st, 2022

Inquiries

All kinds of inquiries (about the scientific project, the university, life in Marseille, etc.) as well as the application documents should be addressed to Abdellah Fourtassi (abdellah.fourtassi(a)univ-amu.fr)

--
Abdellah Fourtassi
Assistant Professor
Department of Computer Science
Institute of Language, Communication, and the Brain
Aix-Marseille University, France
https://sites.google.com/site/fourtassi/


[Corpora-List][CfP] 2nd SummDial: A SemDial 2022 Special Session on Summarization of Dialogues and Multi-Party Meetings
by Tirthankar Ghosal 26 Jun '22


***2nd SummDial: A SemDial 2022 <https://semdial2022.github.io/#> Special Session on Summarization of Dialogues and Multi-Party Meetings***

***Website: https://elitr.github.io/automatic-minuting/summdial-2022.html ***
***Submission Deadline: August 1, 2022 ***
***Event Date: August 24, 2022 ***

With a sizeable working population of the world going virtual, resulting in information overload from multiple online meetings, imagine how convenient it would be to just hover over past calendar invites and get concise summaries of the meeting proceedings. How about automatically minuting a multimodal multi-party meeting? Are minutes and multi-party dialogue summaries the same? We believe Automatic Minuting is challenging. There are possibly no agreed-upon guidelines for taking minutes, and people adopt different styles to record meeting minutes. The minutes also depend on the meeting's category, the intended audience, and the goal or objective of the meeting.

We hosted the First SummDial Special Session at SIGDial 2021. Several significant problems and challenges in multi-party dialogue and meeting summarization came out of the discussions in the first SummDial, which we documented in our event report <https://dl.acm.org/doi/10.1145/3527546.3527561>. Since we witnessed enthusiastic participation of the dialogue and summarization community in the first SummDial special session <https://elitr.github.io/automatic-minuting/summdial.html>, we are hosting the Second SummDial special session at SemDial 2022 <https://semdial2022.github.io/#>. This year, we intend to continue discussing these challenges and the lessons learned from the previous SummDial. Our goal for this special session is to stimulate intense discussions around this topic and set the tone for further interest, research, and collaboration in both the Speech and Natural Language Processing communities.

Our topics of interest are Dialogue Summarization, including but not limited to Meeting Summarization, Chat Summarization, Email Threads Summarization, Customer Service Summarization, Medical Dialogue Summarization, and Multi-modal Dialogue Summarization. Our shared task on Automatic Minuting (AutoMin) <https://elitr.github.io/automatic-minuting/> at Interspeech 2021 <https://www.interspeech2021.org/> was another community effort in this direction.

***Call for papers***

We invite regular and work-in-progress papers that report:

- Current research in multi-party dialogue summarization for summarizing meetings and spoken dialogue, using speech, text, or multi-modal data (audio, video),
- Challenges in dialogue summarization evaluation (manual + automatic),
- New methods and metrics for dialogue summarization evaluation,
- Relevant corpus collection, pre-processing, development, and ethical issues involved,
- Comparisons and contrasts between speech-specific systems and systems imported from text summarization,
- Tools for meeting transcript generation and automatic summarization,
- Topic detection and span identification in meeting transcripts for multi-topic summarization,
- Position papers that reflect on the current state of the art in this topic, taking stock of where we have been, where we are, where we are going and where we should go.

Researchers may choose to submit:

- ***Long papers*** Authors should submit an anonymous paper of at most 8 pages of content (up to 2 additional pages are allowed for references).
- ***Short papers*** Authors should submit a non-anonymized paper of at most 2 pages of content (up to 1 additional page allowed for references). Submissions to this track can be non-archival on request.
- ***Position Papers*** Including extended abstracts, work-in-progress, and late-breaking papers.

***Submission Link***

https://easychair.org/my/conference?conf=summdial2022

Submissions should follow the ACL format. Papers that have been or will be submitted to other meetings or publications must provide this information in a footnote on the title page of the submission. SummDial 2022 cannot accept for publication work that will be (or has been) published elsewhere.

***Special Session Program***

The special session will consist of a keynote, a panel, and oral and/or poster paper presentations.

***Organizers***

- Tirthankar Ghosal <https://elitr.eu/tirthankar-ghosal/>, Institute of Formal and Applied Linguistics, Charles University, Czech Republic
- Muskaan Singh, IDIAP, Switzerland
- Xinnou Xu, University of Edinburgh, UK
- Ondřej Bojar <https://ufal.mff.cuni.cz/ondrej-bojar>, Institute of Formal and Applied Linguistics, Charles University, Czech Republic

--
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Tirthankar Ghosal
Researcher at UFAL, Charles University, CZ
https://member.acm.org/~tghosal
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


[Corpora-List][CfP] Shared Task on Detecting Entities in the Astrophysics Literature (DEAL) at AACL-IJCNLP 2022
by Tirthankar Ghosal 26 Jun '22


***Shared Task: Detecting Entities in the Astrophysics Literature (DEAL)***

***Website: https://ui.adsabs.harvard.edu/WIESP/2022/SharedTasks ***
***Twitter: https://twitter.com/wiesp_nlp ***

A good amount of astrophysics research makes use of data coming from missions and facilities such as ground observatories in remote locations or space telescopes, as well as digital archives that hold large amounts of observed and simulated data. These missions and facilities are frequently named after historical figures or use some ingenious acronym which, unfortunately, can be easily confused when searching for them in the literature via simple string matching. For instance, Planck can refer to the person, the mission, the constant, or several institutions. Automatically recognizing entities such as missions or facilities would help tackle this word sense disambiguation problem.

The shared task consists of Named Entity Recognition (NER) on samples of text extracted from astrophysics publications. The labels were created by domain experts and designed to identify entities of interest to the astrophysics community. They range from simple to detect (e.g. URLs) to highly unstructured (e.g. Formula), and from useful to researchers (e.g. Telescope) to more useful to archivists and administrators (e.g. Grant). Overall 31 different labels are included, and their distribution is highly unbalanced (e.g. ~100x more Citations than Proposals).

Submissions will be scored using both the CoNLL-2000 shared task seqeval F1-Score at the entity level, and scikit-learn's Matthews correlation coefficient method at the token level. We also encourage authors to propose their own evaluation metrics. A sample dataset and more instructions can be found at: https://ui.adsabs.harvard.edu/WIESP/2022/SharedTasks

Participants (individuals or groups) will have the opportunity to present their findings during the workshop and write a short paper. The authors of the best-performing or most interesting approaches might be invited to collaborate further with the NASA Astrophysics Data System (https://ui.adsabs.harvard.edu/).

The DEAL shared task is a part of the *1st Workshop on Information Extraction from Scientific Publications (WIESP) at AACL-IJCNLP 2022:* https://ui.adsabs.harvard.edu/WIESP/2022/

***Please fill in this form to report your intention to participate in the shared task***

https://forms.office.com/r/KKpeKJBLy3

***Shared Task Submission***

Link to data and scoring scripts: https://huggingface.co/datasets/fgrezes/WIESP2022-NER
CodaLab link to the online competition: https://codalab.lisn.upsaclay.fr/competitions/5062

***Important Dates***

- Training+Validation Data Release: June 1, 2022
- Validation Phase: June 1 - July 31, 2022
- Test Data Release: August 1, 2022
- Final Scoring Period: August 1 - August 10, 2022
- System Report Submission: August 25, 2022
- Notification: September 25, 2022
- Camera-ready Submission Deadline: October 10, 2022
- Event Date: November 20, 2022 (online)

***All submission deadlines are 11.59 pm UTC -12h (“Anywhere on Earth”)***

***Organizers***

- Tirthankar Ghosal <https://elitr.eu/tirthankar-ghosal>, Charles University, CZ
- Sergi Blanco-Cuaresma <https://www.blancocuaresma.com/s/>, Center for Astrophysics | Harvard & Smithsonian, USA
- Alberto Accomazzi <https://ui.adsabs.harvard.edu/about/team/team/aaccomazzi.html>, Center for Astrophysics | Harvard & Smithsonian, USA
- Robert M. Patton <https://www.ornl.gov/staff-profile/robert-m-patton>, Oak Ridge National Laboratory, USA
- Felix Grezes <https://ui.adsabs.harvard.edu/about/team/team/fgrezes.html>, Center for Astrophysics | Harvard & Smithsonian, USA
- Thomas Allen <https://ui.adsabs.harvard.edu/about/team/team/tallen.html>, Center for Astrophysics | Harvard & Smithsonian, USA

--
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Tirthankar Ghosal
Researcher at UFAL, Charles University, CZ
https://member.acm.org/~tghosal
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


[Corpora-List][CfP] First Shared Task on Multi-Perspective Scientific Document Summarization (MuP) at 3rd SDP @ COLING 2022
by Tirthankar Ghosal 26 Jun '22


***Call for Participation***

***First Shared Task on Multi-Perspective Scientific Document Summarization (MuP)***

Website: https://github.com/allenai/mup

Generating summaries of scientific documents is known to be a challenging task. The majority of existing work in summarization assumes only one single best gold summary for each given document. Having only one gold summary negatively impacts our ability to evaluate the quality of summarization systems, as writing summaries is a subjective activity. At the same time, annotating multiple gold summaries for scientific documents can be extremely expensive as it requires domain experts to read and understand long scientific documents. This shared task will enable exploring methods for generating multi-perspective summaries. We introduce a novel summarization corpus, leveraging data from scientific peer reviews to capture diverse perspectives from the reader's point of view (each paper has multiple summaries reflecting multiple perspectives of the reader).

The MuP shared task is a part of the 3rd Scholarly Document Processing (SDP) workshop at COLING 2022. https://sdproc.org/2022/

More details on the shared task and the corresponding dataset can be found on: https://github.com/allenai/mup

***Please fill in this form to participate in the shared task***

https://forms.gle/K2UECKvmghzDHUpo7

The leaderboard for the shared task will be announced soon on the website.

Shared Task Timelines

Training Data Release: May 10, 2022
Test Data Release: June 30, 2022
Evaluation Period: July 1 - July 15, 2022
System Description Papers Due: August 1, 2022
Reviews Notification: August 15, 2022
Camera-Ready Papers Due: September 5, 2022
Event at SDP @ COLING 2022: October 16/17, 2022

MuP 2022 Organizers

1. Guy Feigenblat - Piiano, Israel
2. Arman Cohan - AI2, US
3. Tirthankar Ghosal - ÚFAL, Charles University, Czechia
4. Michal Shmueli-Scheuer - IBM Research AI, Israel

--
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Tirthankar Ghosal
Researcher at UFAL, Charles University, CZ
https://member.acm.org/~tghosal
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


[Corpora-List]Metadata on texts for BNC 2014 Written
by Mark Davies 24 Jun '22


Is anyone aware of metadata for the BNC 2014 *Written* corpus -- source, date, # words, (sub)genre, etc. for each of the ~88,000 texts? I've contacted the BNC people, but have had no response.

Thanks,

Mark Davies

============================================
Mark Davies
english-corpora.org
mark-davies.org
============================================


Jobs: 13 PhD positions + 1 postdoc at FAU Erlangen-Nürnberg (RTG Dimensions of Constructional Space)
by Stephanie Evert 24 Jun '22


In our newly established Research Training Group Dimensions of Constructional Space we're offering 13 PhD positions (65%, 3 years) on a wide range of topics connected to Construction Grammar as a common theoretical core, and 1 postdoc position (100%, 4.5 years) on developing a multilingual research constructicon to integrate results obtained in the PhD projects and create a new model for linguistic research documentation.

You can apply for one of the 13 PhD projects offered or for the postdoc position. Please include a motivation letter that explains why you're interested in, and qualified for, this particular position.

Application deadline: 10 July 2022

More information is available online:

Call for applications: https://www.linguistics.phil.fau.eu/fau-linguistics/research-training-group…
Project descriptions: https://www.linguistics.phil.fau.eu/fau-linguistics/research-training-group…
Homepage of the RTG: https://www.linguistics.phil.fau.eu/fau-linguistics/research-training-group…
Full details: https://www.linguistics.phil.fau.eu/files/2022/05/rtg-dimensions-of-constru…

Please share this call with anyone who might be interested!

Best wishes,
Stephanie

--
Prof. Stephanie Evert
Chair of Computational Corpus Linguistics
Friedrich-Alexander-Universität Erlangen-Nürnberg
Bismarckstr. 6, 91054 Erlangen, Germany
office: Bismarckstr. 6, room 4.000
phone: +49 9131 8522426
e-mail: stephanie.evert(a)fau.de
web: www.linguistik.fau.de

