***Shared Task: Detecting Entities in the Astrophysics Literature (DEAL)***
***Website: https://ui.adsabs.harvard.edu/WIESP/2022/SharedTasks ***
***Twitter: https://twitter.com/wiesp_nlp ***
Much astrophysics research makes use of data coming from
missions and facilities such as ground observatories in remote locations or
space telescopes, as well as digital archives that hold large amounts of
observed and simulated data. These missions and facilities are frequently
named after historical figures or use some ingenious acronym which,
unfortunately, can be easily confused when searching for them in the
literature via simple string matching. For instance, Planck can refer to
the person, the mission, the constant, or several institutions.
Automatically recognizing entities such as missions or facilities would
help tackle this word sense disambiguation problem.
The shared task consists of Named Entity Recognition (NER) on samples of
text extracted from astrophysics publications. The labels were created by
domain experts and designed to identify entities of interest to the
astrophysics community. They range from simple to detect (e.g., URLs) to
highly unstructured (e.g., Formula), and from useful to researchers (e.g.,
Telescope) to more useful to archivists and administrators (e.g., Grant).
Overall, 31 different labels are included, and their distribution is highly
imbalanced (e.g., ~100x more Citations than Proposals). Submissions will be
scored using both the CoNLL-2000 shared task seqeval F1-score at the entity
level and scikit-learn's Matthews correlation coefficient at the token
level. We also encourage authors to propose their own evaluation
metrics. A sample dataset and more instructions can be found at:
https://ui.adsabs.harvard.edu/WIESP/2022/SharedTasks
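
To illustrate what the two scores measure, here is a minimal,
dependency-free sketch. It is not the official scoring script (which
relies on seqeval and scikit-learn); the function names, the BIO-style
tag strings, and the toy sentences are assumptions made for illustration.
The entity-level score counts a prediction as correct only when label and
span boundaries both match, and the token-level score uses the standard
multi-class form of the Matthews correlation coefficient.

```python
import math
from collections import Counter


def extract_entities(tags):
    """Collect (label, start, end) spans from one BIO-tagged sentence."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing span
        if tag.startswith("I-") and tag[2:] == label:
            continue  # token continues the currently open span
        if label is not None:  # the open span ends just before token i
            spans.add((label, start, i))
            start, label = None, None
        if tag != "O":  # "B-X" (or a stray "I-X") opens a new span
            start, label = i, tag[2:]
    return spans


def entity_f1(gold, pred):
    """Micro-averaged F1 over exact (label, span) matches."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = extract_entities(g_tags), extract_entities(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def token_mcc(gold, pred):
    """Multi-class Matthews correlation coefficient over all tokens."""
    g = [t for sent in gold for t in sent]
    p = [t for sent in pred for t in sent]
    s = len(g)                                   # number of tokens
    c = sum(a == b for a, b in zip(g, p))        # correctly tagged tokens
    t_k, p_k = Counter(g), Counter(p)            # per-tag gold / predicted counts
    cov_tp = c * s - sum(t_k[k] * p_k[k] for k in t_k)
    cov_tt = s * s - sum(v * v for v in t_k.values())
    cov_pp = s * s - sum(v * v for v in p_k.values())
    denom = math.sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0


# Toy data (hypothetical): the prediction truncates the Telescope span.
gold = [["O", "B-Telescope", "I-Telescope", "O"], ["B-Grant", "O"]]
pred = [["O", "B-Telescope", "O", "O"], ["B-Grant", "O"]]

print(f"entity-level F1: {entity_f1(gold, pred):.3f}")
print(f"token-level MCC: {token_mcc(gold, pred):.3f}")
```

On this toy input a single truncated span halves the entity-level F1
(0.500) while the token-level MCC stays at about 0.770, which suggests why
reporting both scores can be informative.
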
Participants (individuals or groups) will have the opportunity to present
their findings during the workshop and write a short paper. The
best-performing or most interesting approaches may be invited to
collaborate further with the NASA Astrophysics Data System
(https://ui.adsabs.harvard.edu/).
The DEAL shared task is a part of the *1st Workshop on Information
Extraction from Scientific Publications (WIESP) at AACL-IJCNLP 2022: *
https://ui.adsabs.harvard.edu/WIESP/2022/
***Please fill in this form to report your intention to participate in the
shared task***
https://forms.office.com/r/KKpeKJBLy3
***Shared Task Submission***
Link to data and scoring scripts:
https://huggingface.co/datasets/fgrezes/WIESP2022-NER
CodaLab link to the online competition:
https://codalab.lisn.upsaclay.fr/competitions/5062
***Important Dates***
- Training+Validation Data Release: June 1, 2022
- Validation Phase: June 1 - July 31, 2022
- Test Data Release: August 1, 2022
- Final Scoring Period: August 1 - August 10, 2022
- System Report Submission: August 25, 2022
- Notification: September 25, 2022
- Camera-ready Submission Deadline: October 10, 2022
- Event Date: November 20, 2022 (online)
***All submission deadlines are 11:59 pm UTC-12 (“Anywhere on Earth”)***
***Organizers***
- Tirthankar Ghosal <https://elitr.eu/tirthankar-ghosal>, Charles
  University, CZ
- Sergi Blanco-Cuaresma <https://www.blancocuaresma.com/s/>, Center for
  Astrophysics | Harvard & Smithsonian, USA
- Alberto Accomazzi
  <https://ui.adsabs.harvard.edu/about/team/team/aaccomazzi.html>, Center
  for Astrophysics | Harvard & Smithsonian, USA
- Robert M. Patton <https://www.ornl.gov/staff-profile/robert-m-patton>,
  Oak Ridge National Laboratory, USA
- Felix Grezes <https://ui.adsabs.harvard.edu/about/team/team/fgrezes.html>,
  Center for Astrophysics | Harvard & Smithsonian, USA
- Thomas Allen <https://ui.adsabs.harvard.edu/about/team/team/tallen.html>,
  Center for Astrophysics | Harvard & Smithsonian, USA
--
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Tirthankar Ghosal
Researcher at UFAL, Charles University, CZ
https://member.acm.org/~tghosal
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
***Call for Participation***
***First Shared Task on Multi-Perspective Scientific Document Summarization
(MuP)***
Website: https://github.com/allenai/mup
Generating summaries of scientific documents is known to be a challenging
task. Most existing work in summarization assumes a single best gold
summary for each given document. Having only one gold
summary negatively impacts our ability to evaluate the quality of
summarization systems, as writing summaries is a subjective activity. At
the same time, annotating multiple gold summaries for scientific documents
can be extremely expensive as it requires domain experts to read and
understand long scientific documents. This shared task will enable
exploring methods for generating multi-perspective summaries. We introduce
a novel summarization corpus, leveraging data from scientific peer reviews
to capture diverse perspectives from the reader's point of view (each paper
has multiple summaries reflecting multiple perspectives of the reader).
The MuP shared task is a part of the 3rd Scholarly Document Processing
(SDP) workshop at COLING 2022. https://sdproc.org/2022/
More details on the shared task and the corresponding dataset can be found
at: https://github.com/allenai/mup
***Please fill in this form to participate in the shared task***
https://forms.gle/K2UECKvmghzDHUpo7
The leaderboard for the shared task will be announced soon on the website.
Shared Task Timelines
Training Data Release: May 10, 2022
Test Data Release: June 30, 2022
Evaluation Period: July 1 - July 15, 2022
System Description Papers Due: August 1, 2022
Reviews Notification: August 15, 2022
Camera-Ready Papers Due: September 5, 2022
Event at SDP @ COLING 2022: October 16/17, 2022
MuP 2022 Organizers
1. Guy Feigenblat - Piiano, Israel
2. Arman Cohan - AI2, US
3. Tirthankar Ghosal - ÚFAL, Charles University, Czechia
4. Michal Shmueli-Scheuer - IBM Research AI, Israel
--
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Tirthankar Ghosal
Researcher at UFAL, Charles University, CZ
https://member.acm.org/~tghosal
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Is anyone aware of metadata for the BNC 2014 *Written* corpus -- source,
date, # words, (sub)genre, etc for each of the ~88,000 texts?
I've contacted the BNC people, but no response.
Thanks,
Mark Davies
============================================
Mark Davies
english-corpora.org
mark-davies.org
============================================
In our newly established Research Training Group
Dimensions of Constructional Space
we're offering
13 PhD positions (65%, 3 years)
on a wide range of topics connected to Construction Grammar as a common theoretical core, and
1 postdoc position (100%, 4.5 years) on developing a multilingual research constructicon
to integrate results obtained in the PhD projects and create a new model for linguistic research documentation.
You can apply for one of the 13 PhD projects offered or for the postdoc position. Please include a motivation letter that explains why you are interested in, and qualified for, this particular position.
Application deadline: 10 July 2022
More information is available online:
Call for applications – https://www.linguistics.phil.fau.eu/fau-linguistics/research-training-group…
Project descriptions – https://www.linguistics.phil.fau.eu/fau-linguistics/research-training-group…
Homepage of the RTG – https://www.linguistics.phil.fau.eu/fau-linguistics/research-training-group…
Full details – https://www.linguistics.phil.fau.eu/files/2022/05/rtg-dimensions-of-constru…
Please share this call with anyone who might be interested!
Best wishes,
Stephanie
--
Prof. Stephanie Evert
Chair of Computational Corpus Linguistics
Friedrich-Alexander-Universität Erlangen-Nürnberg
Bismarckstr. 6, 91054 Erlangen, Germany
office: Bismarckstr. 6, room 4.000
phone: +49 9131 8522426
e-mail: stephanie.evert(a)fau.de
web: www.linguistik.fau.de
*For this iteration of the shared task, we especially encourage those who
participated or have trained models on TRAC - 2018 and /or TRAC - 2020
Shared Task datasets to submit the predictions of their earlier models on
our current test set. They are, of course, free to submit predictions on
new models / current datasets as well.*
*3rd Workshop on Threat, Aggression and Cyberbullying (TRAC - 2022)*
>
> &
> *Shared Tasks on Bias, Threat and Aggression Identification in Context*
> Co-located with COLING 2022, October 12 - 17, 2022
> Gyeongju, the Republic of Korea
>
>
> *Second Call for Papers and Shared Task Participation*
>
> *Workshop Website*: https://sites.google.com/view/trac2022/home
> *Paper Submission*: https://www.softconf.com/coling2022/TRAC-2022/
> *Shared Task Website:* https://codalab.lisn.upsaclay.fr/competitions/4753
>
> *Submission Deadline*: July 11, 2022 (Regular) / July 31, 2022 (ACL ARR)
>
> As in the earlier editions of the workshop, TRAC-2022 will focus on the
> applications of NLP, ML and pragmatic studies on aggression and
> impoliteness to tackle these issues. We invite *long (8 pages)* and *short
> papers (4 pages)* as well as *position papers* and opinion pieces (5 - 20
> pages), *demo proposals* and *non-archival extended abstracts* (2 pages)
> based on, but not limited to, any of the following themes from academic
> researchers, industry and any other group / team working in the area.
>
> - Theories and models of aggression and conflict in language.
> - Cyberbullying, threatening, hateful, aggressive and abusive language
> on the web.
> - Multilingualism and aggression.
> - Resource Development - Corpora, Annotation Guidelines and Best
> Practices for threat and aggression detection.
> - Computational Models and Methods for aggression, hate speech and
> offensive language detection in text and speech.
> - Detection of threats and bullying on the web.
> - Automatic censorship and moderation: ethical, legal and
> technological issues and challenges.
>
>
> *Shared Tasks*
> TRAC-2022 will include two novel shared tasks:
>
> *Task 1: Bias, Threat and Aggression Identification in Context*
> The first shared task will be a structured prediction task for recognising
> (a) Aggression, Gender Bias, Racial Bias, Religious Intolerance and Bias
> and Casteist Bias on social media and (b) the "discursive role" of a given
> comment in the context of the previous comment(s). The participants will be
> given a "thread" of comments with information about the presence of
> different kinds of biases and threats (viz. gender bias, gendered threat
> and none, etc) and its discursive relationship to the previous comment as
> well as the original post (viz. attack, abet, defend, counter-speech and
> gaslighting). In a series / thread of comments, participants will be
> required to predict the presence of aggression and bias of each comment,
> possibly making use of the context.
>
> *Task 2: Generalising across domains - COVID-19*
> For this sub-task, the test set will be sampled from the COVID-19 related
> conversation, annotated with levels of aggression, offensiveness and hate
> speech. Across the globe, during the pandemic, we have seen various kinds
> of novel aggressive and biased conversation on social media - in fact, in
> some cases there was massive escalation of religious and other kinds of
> intolerance and polarisation. The participants of TRAC-1 and TRAC-2 shared
> tasks are especially encouraged to submit the predictions of their
> earlier models on this test set. They may also train new models jointly on
> both the datasets. Those who didn't participate in earlier tasks are also
> invited to submit the predictions for this task by training models on the
> two datasets and are encouraged to submit the predictions on the respective
> test sets of the earlier tasks along with the predictions on the current
> dataset (to enable comparison). New participants may also use TRAC-1 or
> TRAC-2 dataset or a combination of the two for building the models. The aim
> of the task is to evaluate the generalisability of our systems in
> unexpected and novel situations.
>
> For participation, visit the Codalab website -
> https://codalab.lisn.upsaclay.fr/competitions/4753
>
> For any clarifications, contact coling.aggression(a)gmail.com.
>
> Looking forward to your participation!
>
>
Multiple CSIRO Early Research Career Postdoctoral Fellowships are available in Natural Language Processing.
CSIRO Data61 is looking for multiple CERC Fellows to join an NLP team of researchers and engineers. Relevant NLP research areas: information extraction, text summarization, question answering, semantic parsing, semantic role labelling, paraphrase detection and generation, and NLP for Information Retrieval.
About the CSIRO Postdoctoral Fellowship program:
CSIRO Early Research Career (CERC) Postdoctoral Fellowships provide opportunities to scientists and engineers who have completed their doctorate and have less than three years of relevant postdoctoral work experience. These fellowships aim to develop the next generation of future leaders of the innovation system.
Location: Sydney, NSW
Salary: AU$89k - AU$98k plus up to 15.4% superannuation
Tenure: Specified term of 3 years
Reference: 77986
Applications close: 7 July 2022
To be considered you will need:
* A doctorate (or the ability to shortly satisfy the requirements of a PhD) in a relevant discipline area, such as Computer Science (Natural Language Processing/Computational Linguistics or Machine Learning with text data).
* Experience using deep learning and other machine learning techniques in NLP.
* High-level written and oral communication skills with the ability to represent the research team effectively internally and externally, including the presentation of research outcomes at national and international conferences.
* A sound history of publication in peer-reviewed journals and/or conferences.
For more information or to apply, please visit: https://jobs.csiro.au/job-invite/77986/
Hello,
I'm looking for any open source or cloud-hosted solution for complex word identification or word difficulty rating in French for a reading application.
As a backup plan we can use measures like corpus frequency, length, number of senses, but we're hoping someone has already made a tool available.
We found this but that's it: https://github.com/sheffieldnlp/cwi
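
For reference, a minimal sketch of the fallback heuristic mentioned above, combining corpus frequency and word length. The frequency values, the weights, and the function name are made up for illustration; a real system would load an actual French frequency list and could add a sense count from a lexical resource.

```python
import math

# Hypothetical per-million frequencies (illustrative values only,
# not taken from any real corpus).
FREQ_PER_MILLION = {
    "maison": 570.0,
    "chat": 120.0,
    "anticonstitutionnellement": 0.01,
}


def difficulty(word, freq=FREQ_PER_MILLION, floor=0.001, length_weight=0.1):
    """Heuristic difficulty: rarer and longer words score higher.

    Unknown words fall back to `floor`, so they are treated as very rare.
    """
    per_million = max(freq.get(word.lower(), 0.0), floor)
    rarity = -math.log10(per_million)
    return rarity + length_weight * len(word)


words = ["maison", "chat", "anticonstitutionnellement"]
for w in sorted(words, key=difficulty, reverse=True):
    print(f"{w}: {difficulty(w):.2f}")
```

The log scale keeps very frequent words from dominating the score, and the length weight is an arbitrary choice that would need tuning against graded reading data.
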
Would appreciate any tips!
Thanks,
Chris
Christopher Collins [he/him<https://medium.com/gender-inclusivit/why-i-put-pronouns-on-my-email-signatu…>]
Associate Professor - Faculty of Science
Canada Research Chair in Linguistic Information Visualization
Ontario Tech University
vialab.ca<http://vialab.ca/>
Dear colleagues,
We have developed a toolkit to interpret deep NLP models with a focus on
*neuron interpretation*.
The toolkit is available as:
pip install neurox
Project Website: https://neurox.qcri.org/
Documentation: https://neurox.qcri.org/docs/
Git: https://github.com/fdalvi/NeuroX
NeuroX implements a number of features that facilitate model interpretation
such as:
- Word-level activation extractions (contextualized embeddings)
- Integration with Huggingface
- Implementation of a number of neuron probing methods (identify neurons
learning a linguistic property such as noun)
- Visualize the behavior of a neuron across a set of examples
- and many more
If you have any questions or feedback, feel free to reach out to me or open
an issue on the project's GitHub.
Thank you,
--
Regards;
Hassan Sajjad
Dear all,
we are glad to present to you a new BERT-based model for Sentiment Analysis (for Italian), trained and benchmarked on multiple domains!
The model has been jointly optimized and fine-tuned on multiple domains such as product reviews, social media comments and financial news.
The jointly trained model outperforms the same model fine-tuned in isolation on each individual dataset, reaching state-of-the-art results on the majority of the datasets that we used.
To get and use the model, please follow the instructions available here: https://sisl.disi.unitn.it/itfn-corpus/ or
you can go directly to the official GitLab repo: https://gitlab.com/sislab/multi-source-multi-domain-sentiment-analysis-with…
The related paper will be presented at the LREC 2022 conference.
The paper is titled "Multi-source Multi-domain Sentiment Analysis with BERT-based Models" and is available here <http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.62.pdf>.
Best Regards
----
Prof. Dr.-Ing. Giuseppe Riccardi
Founder and Director of the Signals and Interactive Systems Lab
Department of Computer Science and Engineering
University of Trento