Dear colleagues,
The second edition of the Workshop on Deep Learning and Neural Approaches
for Linguistic (Linked) Data (DL4LD) will be held in Vilnius, Lithuania &
online on September 22, 2022, collocated with the conference on “LLOD
approaches for language data research and management” (
http://llodapproaches2022.mruni.eu/).
As a follow-up to the First Workshop on Deep Learning and Neural Approaches
for Linguistic Data in 2021, this workshop aims at bringing together deep
learning and neural approaches with Linguistic Linked Data. We invite
research papers, application descriptions, system demonstrations, and
position papers that discuss the interconnection of both areas.
Relevant topics for the workshop include, but are not limited to, the
following areas:
* Deep Learning for Linguistic Linked (Open) Data, modeling, resources &
interlinking
* Deep Learning and LLOD in NLP
* LLOD and Deep Learning for Digital Humanities
* Enhancement of language models with structured linguistic data
* Use cases combining language models and structured linguistic data
Submissions in the form of extended abstracts will be published in an
online book of abstracts.
Important Dates:
Submission deadline: 14 August 2022
Notification deadline: 28 August 2022
Camera ready version due: 4 September 2022
Workshop date: 22 September 2022
Submission guidelines:
Submission type: extended abstracts
Length: max. 2 pages including references
Format: Springer Lecture Notes in Computer Science (LNCS); LaTeX, Overleaf,
OpenDocument or MS Word
Registration and participation: free of charge
Submission link:
https://easychair.org/my/conference?conf=nexuslinguarumdl2022
Optionally, extended versions of the abstracts can be submitted as full
papers describing original research to the special issue of the Rasprave
journal (subject to a separate peer review process; more information:
http://llodapproaches2022.mruni.eu/?page_id=16, deadline: 1 December 2022)
Scientific committee:
Giedrė Valūnaitė-Oleškevičienė, Mykolas Romeris University
Radovan Garabik, Ľ. Štúr Institute of Linguistics, Slovak Academy of
Sciences
Dagmar Gromann, University of Vienna
Jorge Gracia, University of Zaragoza
Hugo Gonçalo Oliveira, University of Coimbra
Chaya Liebeskind, Jerusalem College of Technology
Purificação Silvano, University of Porto
Workshop webpage & more information:
http://llodapproaches2022.mruni.eu/?page_id=239
Contact e-mail: dl4ld(a)juls.savba.sk
Apologies for cross posting.
FIRST CALL FOR PARTICIPATION
CASE-2022 Shared Task: Multilingual Protest Event Detection
================================================
We invite you to participate in the CASE-2022 Shared Task 1: Multilingual
Protest Event Detection. The task is being held as part of the 5th Workshop
on Challenges and Applications of Automated Extraction of Socio-political
Events from Text (CASE 2022). This is a continuation of the CASE 2021
shared task (Hürriyetoğlu et al., 2021) [2]. The training set is the same
as in CASE 2021, but the evaluation phase will include data from additional
languages. Please see the workshop website for further details:
https://emw.ku.edu.tr/case-2022/
Contact address: ali.hurriyetoglu(a)gmail.com
Important Dates
================================================
Training data available: please follow instructions on the repository of
the task: https://github.com/emerging-welfare/case-2022-multilingual-event.
You will obtain the test data for CASE 2021 as well, with a Codalab page (
https://competitions.codalab.org/competitions/31639) available to obtain a
score for your predictions.
New test data available: September 15, 2022
Test end: September 25, 2022
System Description Paper submissions due: October 2, 2022
Notification to authors after review: October 9, 2022
Camera-ready: October 16, 2022
Workshop period @ EMNLP: December 7-8, 2022
Motivation
================================================
Event extraction has recently attracted a lot of attention in the NLP
community, as well as among political and social scientists: It has emerged
as a robust technology for identifying the most important information
inside media streams. At the same time, it provides a basis for
quantitative assessment of the political situation in the world. Event
extraction has long been a challenge for the natural language processing
(NLP) community as it requires sophisticated methods for detection and
classification of events: machine learning, syntactic and semantic parsing,
event ontologies, event co-reference resolution, acquisition of language
resources, grammar learning, terminology learning, temporal and spatial
reasoning, and other algorithmic approaches (Pustejovsky et al. 2003;
Boroş, 2018; Chen et al. 2021). Social and political scientists have been
working to create socio-political event (SPE) databases such as ACLED,
EMBERS, GDELT, ICEWS, MMAD, PHOENIX, POLDEM, SPEED, TERRIER, and UCDP
following similar steps for decades. These projects and the new ones
increasingly rely on machine learning (ML), deep learning (DL), and NLP
methods to deal better with the vast amount and variety of data in this
domain (Hürriyetoğlu et al. 2020). Automation offers scholars not only the
opportunity to improve existing practices, but also to vastly expand the
scope of data that can be collected and studied, thus potentially opening
up new research frontiers within the field of SPEs, such as
politically-motivated violence and social movements. Automated approaches,
however, suffer from major issues like bias, generalizability, class
imbalance, training data limitations, and ethical issues that have the
potential to affect the results and their use drastically (Lau and Baldwin
2020; Bhatia et al. 2020; Chang et al. 2019).
SPEs are varied and nuanced. Both the political context and the local
language used may affect whether and how they are reported. Therefore, all
steps of information collection (event definition, language resources, and
manual or algorithmic steps) may need to be constantly updated, leading to
a series of challenging questions: Are events related to minority groups
represented well? Are new types of events covered? Are the event
definitions and their operationalization comparable across systems? This
workshop aims to find answers to these questions as well, and to inspire
innovative technological and scientific solutions for tackling these issues
and quantifying the quality of the results.
Task Overview
================================================
The task consists of four subtasks relevant to protest event detection:
Subtask 1: Document classification ⇒ Does a news article contain
information about a past or ongoing event?
Subtask 2: Sentence classification ⇒ Does a sentence contain information
about a past or ongoing event?
Subtask 3: Event sentence coreference identification ⇒ Which event
sentences (subtask 2) are about the same event?
Subtask 4: Event extraction ⇒ What is the event trigger and what are its
arguments?
Participants may design mono- or multilingual solutions that work on a
single, multiple, or all subtasks concurrently. Participants are also
allowed to combine annotations for either task. Additional datasets can be
utilized for training or validation purposes.
The systems developed for one or more of these subtasks will be invited to
process a news archive to measure the correlation between automatically and
manually created event datasets. This task will be referred to as Task 2.
Please see Giorgi et al. (2021) [1] for the similar task we performed last
year.
You can find the task repository at
https://github.com/emerging-welfare/case-2022-multilingual-event, which
contains sample data and scripts.
Data
================================================
Training data: The training data we use for Task 1 is the training data
for CASE-2021 and consists of English, Portuguese, and Spanish news
articles. Please find the detailed description of the data in Hürriyetoğlu
et al. (2021) [2], https://aclanthology.org/2021.case-1.11.pdf.
Test data: There will be two test sets for Subtask 1. These are i) test
data from CASE 2021, which is already available, and ii) new test data for
CASE 2022 including new data both in existing and in new languages, e.g.
Japanese, Urdu, Mandarin, and Turkish. The test data for subtasks 2, 3, and
4 will be the same as CASE 2021 test data for subtasks 2, 3, and 4
respectively.
Evaluation
================================================
Subtasks 1 and 2 will be evaluated using the macro-averaged F1 score
(F1-macro), calculated separately on the predictions for the test data in
each language. Subtask 3 will be evaluated using scorch, a Python
implementation of the CoNLL-2012 average score
(https://github.com/LoicGrobol/scorch). Finally, we will use the CoNLL-03
evaluation script (https://github.com/sighsmile/conlleval) for subtask 4.
The new test data for subtask 1 may be utilized to improve performance on
these subtasks.
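As a rough illustration of the per-language F1-macro computation described
above, here is a minimal sketch in plain Python. It is not the official
scoring script: the function names and the (language, gold, predicted) row
layout are assumptions made for this example, and the Codalab scorer may
differ in details such as which labels are included in the average.

```python
from collections import defaultdict


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def macro_f1_by_language(rows):
    """rows: iterable of (language, gold_label, predicted_label) tuples
    (a hypothetical layout chosen for this sketch)."""
    by_lang = defaultdict(lambda: ([], []))
    for lang, gold_label, pred_label in rows:
        by_lang[lang][0].append(gold_label)
        by_lang[lang][1].append(pred_label)
    return {lang: macro_f1(g, p) for lang, (g, p) in by_lang.items()}


if __name__ == "__main__":
    rows = [
        ("en", 1, 1), ("en", 0, 0),
        ("es", 1, 0), ("es", 0, 0),
    ]
    for lang, score in sorted(macro_f1_by_language(rows).items()):
        print(f"{lang}: {score:.4f}")
```

Because each language is scored on its own, a system that does well on
English but misses the positive class in another language gets a low score
for that language instead of having it averaged away.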
The evaluation will be managed on a Codalab page. Participants will submit
their scores and the highest performing submission will be used for ranking
teams. There will be a limit on the number of submissions that can be
performed. After the test deadline, an additional Codalab page will be set
up for additional scoring.
Participation
================================================
Please send your team name and the participation form that is on
https://github.com/emerging-welfare/case-2022-multilingual-event/blob/main/…
to ali.hurriyetoglu(a)gmail.com. We will share the CASE-2021 data with you
right away and notify you when the CASE-2022 evaluation data is ready.
Publication
================================================
All participating teams will have the opportunity to submit their system
description papers to be considered for publication in the workshop
proceedings published by ACL Anthology. The papers should be submitted on
http://softconf.com/emnlp2022/case2022.
Organization
================================================
Ali Hürriyetoğlu, KNAW Humanities Cluster, DHLab, the Netherlands
Erdem Yörük, Koc University, Turkey
Hristo Tanev, European Commission, Joint Research Centre (EU JRC), Italy
Osman Mutlu, Koc University, Turkey
Vanni Zavarella, Italy
Reyyan Yeniterzi, Sabanci University, Turkey
Fatih Beyhan, Sabanci University, Turkey
Francielle Vargas, University of São Paulo, Brazil
Fırat Duruşan, Koc University, Turkey
Yaoyao Dai, UNC Charlotte, United States
Aaqib Javid, Koc University, Turkey
Benjamin Radford, UNC Charlotte, United States
Kalliopi Zervanou, Leiden University, the Netherlands
Milena Slavcheva, Bulgarian Academy of Sciences, Bulgaria
Niklas Stoehr, ETH Zurich, Switzerland
Guillem Ramirez, ETH Zurich, Switzerland
Shaina Raza, Public Health Ontario and University of Toronto, Canada
Farhana Ferdousi Liza, University of East Anglia, United Kingdom
Tadashi Nomoto, National Institute of Japanese Literature, Japan
Alaeddin Selçuk Gürel, Huawei, Turkey
YiJyun Lin, University of Arizona, U.S.A & National Taiwan University,
Taiwan
Tiancheng Hu, ETH Zürich, Switzerland
Onur Uca, Mersin University, Turkey
Fiona Anting Tan, Institute of Data Science, National University of
Singapore, Singapore
Hansi Hettiarachchi, Birmingham City University, United Kingdom
References
================================================
[1] Giorgi, S., Zavarella, V., Tanev, H., Stefanovitch, N., Hwang, S.,
Hettiarachchi, H., ... & Hurriyetoglu, A. (2021, January). Discovering
Black Lives Matter events in the United States: Shared task 3. In
Proceedings of the 4th Workshop on Challenges and Applications of Automated
Extraction of Socio-political Events from Text (CASE 2021) (pp. 218-227).
Association for Computational Linguistics. URL:
https://aclanthology.org/2021.case-1.27/
[2] Hürriyetoğlu, A., Mutlu, O., Yörük, E., Liza, F. F., Kumar, R., &
Ratan, S. (2021, August). Multilingual Protest News Detection - Shared Task
1, CASE 2021. In Proceedings of the 4th Workshop on Challenges and
Applications of Automated Extraction of Socio-political Events from Text
(CASE 2021) (pp. 79-91). URL: https://aclanthology.org/2021.case-1.11/
*LaTeCH-CLfL 2022:*
*The 6th Joint SIGHUM Workshop on Computational Linguistics for
Cultural Heritage, Social Sciences, Humanities and Literature*
to be held in a /hybrid/ manner on October 16, 2022 in conjunction with
COLING 2022 in Gyeongju, Republic of Korea.
https://sighum.wordpress.com/events/latech-clfl-2022/
Third Call for Papers (with apologies for cross-posting)
Organisers: Stefania Degaetano-Ortlieb, Anna Kazantseva, Nils Reiter,
Stan Szpakowicz
LaTeCH-CLfL 2022 is the sixth in a series of meetings for NLP
researchers who work with data from the broadly understood arts,
humanities and social sciences, and for specialists in those disciplines
who apply NLP techniques in their work. The workshop continues a long
tradition of annual meetings. The SIGHUM Workshops on Language
Technology for Cultural Heritage, Social Sciences, and Humanities
(LaTeCH) ran ten times in 2007-2016. The five Workshops on Computational
Linguistics for Literature (CLfL) took place in 2012-2016. The first
five joint workshops (LaTeCH-CLfL) were held in 2017-2021.
*Topics and Content*
In the Humanities, Social Sciences, Cultural Heritage and literary
communities, there is increasing interest in, and demand for, NLP
methods for semantic and structural annotation, intelligent linking,
discovery, querying, cleaning and visualization of both primary and
secondary data. This is even true of primarily non-textual collections,
given that text is also the pervasive medium for metadata. Such
applications pose new challenges for NLP research: noisy, non-standard
textual or multi-modal input, historical languages, vague research
concepts, multilingual parts within one document, and so on. Digital
resources often have insufficient coverage; resource-intensive methods
require (semi-)automatic processing tools and domain adaptation, or
intense manual effort (e.g., annotation).
Literary texts bring their own problems, because navigating this form of
creative expression requires more than the typical information-seeking
tools. Examples of advanced tasks include the study of literature of a
certain period, author or sub-genre, recognition of certain literary
devices, or quantitative analysis of poetry.
NLP methods applied in this context not only need to achieve high
performance, but are often applied as a first step in research or
scholarly workflow. That is why it is crucial to interpret model results
properly; model interpretability might be more important than raw
performance scores, depending on the context.
More generally, there is a growing interest in computational models
whose results can be used or interpreted in meaningful ways. It is,
therefore, of mutual benefit that NLP experts, data specialists and
Digital Humanities researchers who work in and across their domains get
involved in the Computational Linguistics community and present their
fundamental or applied research results. It has already been
demonstrated how cross-disciplinary exchange not only supports work in
the Humanities, Social Sciences, and Cultural Heritage communities but
also promotes work in the Computational Linguistics community to build
richer and more effective tools and models.
Topics of interest include, but are not limited to, the following:
• adaptation of NLP tools to Cultural Heritage, Social Sciences,
Humanities and literature;
• automatic error detection and cleaning of textual data;
• complex annotation schemas, tools and interfaces;
• creation (fully- or semi-automatic) of semantic resources;
• creation and analysis of social networks of literary characters;
• discourse and narrative analysis/modelling, notably in literature;
• emotion analysis for the humanities and for literature;
• generation of literary narrative, dialogue or poetry;
• identification and analysis of literary genres;
• linking and retrieving information from different sources,
media, and domains;
• modelling dialogue literary style for generation;
• modelling of information and knowledge in the Humanities,
Social Sciences, and Cultural Heritage;
• profiling and authorship attribution;
• search for scientific and/or scholarly literature;
• work with linguistic variation and non-standard or historical
use of language.
*Information for Authors*
We invite papers on original, unpublished work in the topic areas of the
workshop. In addition to long papers, we will consider short papers and
system descriptions (demos). We also welcome position papers.
• Long papers, presenting completed work, may consist of up to
eight (8) pages of content plus additional pages of references (just two
if possible -:). The final camera-ready versions of accepted long papers
will be given one additional page of content (up to 9 pages) so that
reviewers’ comments can be taken into account.
• A short paper / demo, presenting work in progress or the
description of a system, may consist of up to four (4) pages of
content plus additional pages of references (one if you can). Upon
acceptance, short papers will be given five (5) content pages in the
proceedings.
• A position paper — clearly marked as such — should not exceed
eight (8) pages including references.
All submissions are to use the ACL stylesheets adopted by COLING 2022
(see https://coling2022.org/Submission, styles for LaTeX
<https://github.com/acl-org/acl-style-files/blob/master/latex/acl_latex.tex>,
MS Word
<https://github.com/acl-org/acl-style-files/blob/master/word/acl.docx>
or Overleaf <https://www.overleaf.com/read/crtcwgxzjskr>). Papers should
be submitted electronically, in PDF, at
https://www.softconf.com/coling2022/LaTeCH-CLfL_2022/.
Reviewing will be double-blind. Please do not include the authors’ names
and affiliations, or any references to Web sites, project names,
acknowledgements and so on — anything that immediately reveals the
authors’ identity. Self-references should be kept to a reasonable
minimum, and anonymous citations cannot be used.
Accepted papers will be published in the workshop proceedings available
as usual in the ACL Anthology.
*Important Dates*
Papers due: July 11, 2022
Notification of acceptance: August 22, 2022
Camera-ready papers due: September 5, 2022
Workshop: October 16, 2022
The two due dates are Anywhere On Earth.
*More on the organisers*
Stefania Degaetano-Ortlieb, Language Science and Technology, Saarland
University
Anna Kazantseva, National Research Council Canada
Nils Reiter, Department for Digital Humanities, University of Cologne
Stan Szpakowicz, School of Electrical Engineering and Computer Science,
University of Ottawa
*Contact*
latech-clfl(a)googlegroups.com
Apologies for cross-posting
*******************************
*** 2nd CFP- SocialDisNER track: Detection of Disease Mentions in Social
Media ***
(SMM4H Shared Task at COLING2022)
https://temu.bsc.es/socialdisner/
Development set, large-scale silver standard, and disease-comorbidity
network are now available
Despite the high impact & practical relevance of detecting diseases
automatically from social media for a diversity of applications, few
manually annotated corpora generated by healthcare practitioners to
train/evaluate advanced entity recognition tools are currently available.
Developing disease recognition tools for social media is critical for:
- Real-time disease outbreak surveillance/monitoring
- Characterization of patient-reported symptoms
- Post-market drug safety
- Epidemiology and population health
- Public opinion mining & sentiment analysis of diseases
- Detection of hate speech/exclusion of sick people
- Prevalence of work-associated diseases
SocialDisNER is the first track focusing on the detection of disease
mentions in tweets written in Spanish, with clear adaptation potential not
only to English but also to other Romance languages such as Portuguese,
French or Italian, spoken by over 900 million people worldwide.
For this track the SocialDisNER corpus was generated: a manual collection
of tweets enriched for first-hand experiences by patients and their
relatives, as well as content generated by patient associations (national,
regional, local) and healthcare institutions, covering all main disease
types including cancer, mental health, chronic and rare diseases, among
others.
As a novelty, we have published a large-scale additional corpus of over 85k
tweets annotated with diseases, in addition to a disease gazetteer
extracted from medical terminologies and a disease-comorbidity network
extracted from the large-scale additional corpus.
Info:
- Web: https://temu.bsc.es/socialdisner/
- Data: https://doi.org/10.5281/zenodo.6359365
- Additional large-scale data: https://zenodo.org/record/6773099
- Registration: https://temu.bsc.es/socialdisner/registration
Schedule
- Development Set Release: June 14th
- Additional large-scale corpus with disease annotations: June 28th
- Test Set Release: July 11th
- Participant prediction Due: July 15th
- Test set evaluation release: July 25th
- Proceedings paper submission: August 1st
- Camera ready papers: September 1st
- SMM4H workshop @ COLING 2022: October 12-17
Publications and SMM4H (COLING 2022) workshop
Participating teams have the opportunity to submit a short system
description paper for the SMM4H proceedings (7th SMM4H Workshop, co-located
with COLING 2022). More details are available at
https://healthlanguageprocessing.org/smm4h-2022/
SocialDisNER Organizers
- Luis Gascó, Barcelona Supercomputing Center, Spain
- Darryl Estrada, Barcelona Supercomputing Center, Spain
- Eulàlia Farré-Maduell, Barcelona Supercomputing Center, Spain
- Salvador Lima, Barcelona Supercomputing Center, Spain
- Martin Krallinger, Barcelona Supercomputing Center, Spain
Scientific Committee & SMM4H Organizers
- Graciela Gonzalez-Hernandez, Cedars-Sinai Medical Center, USA
- Davy Weissenbacher, University of Pennsylvania, USA
- Arjun Magge, University of Pennsylvania, USA
- Ari Z. Klein, University of Pennsylvania, USA
- Ivan Flores, University of Pennsylvania, USA
- Karen O’Connor, University of Pennsylvania, USA
- Raul Rodriguez-Esteban, Roche Pharmaceuticals, Switzerland
- Lucia Schmidt, Roche Pharmaceuticals, Switzerland
- Juan M. Banda, Georgia State University, USA
- Abeed Sarker, Emory University, USA
- Yuting Guo, Emory University, USA
- Yao Ge, Emory University, USA
- Elena Tutubalina, Insilico Medicine, Hong Kong
- Jey Han Lau, The University of Melbourne (Australia)
- Luca Maria Aiello, IT University of Copenhagen (Denmark)
- David Camacho, Applied Intelligence and Data Analysis Research Group,
  Universidad Politécnica de Madrid (Spain)
- Torsten Zesch, Fernuniversitat in Hagen (Germany)
- Eiji ARAMAKI, Nara Institute of Science and Technology (Japan)
- Rafael Valencia-Garcia, Universidad de Murcia (Spain)
- Antonio Jimeno Yepes, RMIT University (Australia)
- Carlos Gómez-Rodríguez, Universidad da Coruña (Spain)
- Anália Lourenço, Universidade de Vigo (Spain)
- Paloma Martínez, Universidad Carlos III de Madrid (Spain)
- Eugenio Martinez Cámara, Universidad de Granada (Spain)
- Gema Bello Orgaz, Applied Intelligence and Data Analysis Research
  Group, Universidad Politécnica de Madrid (Spain)
- Juan Antonio Lossio-Ventura, National Institutes of Health (USA)
- Héctor D. Menendez, King’s College London (UK)
- Manuel Montes y Gómez, National Institute of Astrophysics, Optics and
  Electronics (Mexico)
- Helena Gómez Adorno, Universidad Nacional Autónoma de México (Mexico)
- Rodrigo Agerri, IXA Group (HiTZ Centre), University of Basque Country
  EHU (Spain)
- Miguel A. Alonso, Universidad da Coruña (Spain)
- Ferran Pla, Universidad Politécnica de Valencia (Spain)
- Jose Alberto Benitez-Andrades, Universidad de Leon (Spain)
Darryl Estrada
Full Stack - Web Developer
* Text Mining Unit | Barcelona Supercomputing Center*
Multiple CSIRO Early Research Career (CERC) Postdoctoral Fellowships are available in Natural Language Processing.
Australia's CSIRO Data61 (https://data61.csiro.au/) is looking for multiple CERC Fellows to join an NLP team of researchers and engineers to work on a number of public-good projects.
NLP research areas of interest: information extraction, text summarization, question answering, semantic parsing, semantic role labelling, paraphrase detection and generation, and NLP for Information Retrieval.
Domains of research: (1) Working with scientific literature, and (2) Computational Social Science
About the CSIRO Postdoctoral Fellowship program:
CSIRO Early Research Career (CERC) Postdoctoral Fellowships provide opportunities to scientists and engineers who have completed their doctorate and have less than three years of relevant postdoctoral work experience. These fellowships aim to develop the next generation of future leaders of the innovation system. Relocation costs are supported for successful candidates.
Location: Sydney, NSW
Salary: AU$89k - AU$98k plus up to 15.4% superannuation
Tenure: Specified term of 3 years
Reference: 77986
Applications close: 7 July 2022
To be considered you will need:
* A doctorate (or will shortly satisfy the requirements of a PhD) in a relevant discipline area, such as Computer Science (Natural Language Processing/Computational Linguistics or Machine Learning with text data).
* Experience using deep learning and other machine learning techniques in NLP.
* High-level written and oral communication skills with the ability to represent the research team effectively internally and externally, including the presentation of research outcomes at national and international conferences.
* A sound history of publication in peer-reviewed journals and/or conferences.
For more information or to apply, please visit: https://jobs.csiro.au/job-invite/77986/