The 8th Workshop on Noisy User-generated Text (WNUT @COLING 2022)
The WNUT Workshop will be collocated with COLING 2022 (Hybrid - Gyeongju, Republic of Korea). The website for the workshop is at:
http://noisy-text.github.io/
The WNUT workshop focuses on Natural Language Processing applied to noisy user-generated text, such as that found in social media, online reviews, crowdsourced data, web forums, clinical records, and language learner essays.
We seek submissions of long and short papers on original and unpublished work (same format and page limit as COLING main conference). All accepted submissions will be presented as posters. Additionally, selected submissions will be presented orally. We have Best Paper Awards sponsored by Megagon Labs this year.
Topics of interest include but are not limited to:
* NLP Preprocessing of Noisy Text
- Part of speech tagging
- Named entity tagging, including a wide range of categories, e.g. product names
- Chunking of user-generated text
- Parsing
* Text Normalization and Error Correction
- Normalizing noisy text for downstream tasks and for human readability
- Error detection and correction
* Robustness to Noise, both Natural and Adversarial
* Multilingual NLP in noisy text
* Machine Translation of Noisy Text
* Sentiment analysis
* Crowdsourcing of text data
* User prediction, e.g. gender, age, etc
* Stylistics, e.g. formality, politeness, etc
* Colloquial language, e.g. code-switching, idiom detection
* Bilingual translation of noisy text
* Paraphrase identification and semantic similarity of short text or noisy text
* Information extraction from noisy text
* Domain adaptation to user-generated text
* Geolocation prediction
* Global and regional trend detection and event extraction
* Detecting rumors, contradictory information, sarcasm, and humor on social media
* Extracting user demographics, profiles, and major life events
* Temporal aspects of user-generated content (resolving time expressions, concept drift, diachronic analyses, etc.)
= IMPORTANT DATES =
* August 19, 2022: Submission Deadline (dual-submission w/ COLING main conference allowed)
* September 7, 2022: Acceptance Notification
* September 14, 2022: Camera-ready Deadline
* October 17, 2022: Workshop Day
= ORGANIZERS =
Tim Baldwin (University of Melbourne)
Afshin Rahimi (University of Queensland)
Wei Xu (Georgia Institute of Technology)
Alan Ritter (Georgia Institute of Technology)
= SUBMISSION =
Formatting should be according to COLING 2022 specifications.
Dual submission is allowed but must be declared at the time of submission.
Please submit through the START system at the following URL:
https://www.softconf.com/coling2022/W-NUT_2022
Our team (cocodev.fr) at Aix-Marseille University offers a fully-funded
Ph.D. research position (with no teaching duties) in the framework of the
ANR grant MACoMiC (Mastering the Art of Conversation in Middle Childhood).
The broad goal of the PhD researcher is to lead the development of deep
learning models of child-parent multimodal communication, across several
cultures, using data of face-to-face conversations recorded using portable
eye-tracking systems and zoom calls.
We are interested in studying the development of various conversational
skills including mechanisms of building shared understanding, multimodal
synchrony/alignment, and discourse coherence/contingency.
We are also interested in the application of this research both to help
design more effective clinical interventions (for children with
communicative difficulties) and to build child-oriented conversational AI.
The selected candidate can focus on one or several of these dimensions,
defining a personalized research program together with the main advisor.
The PhD researcher will be integrated into a supportive and highly
interdisciplinary team of senior and early career researchers in computer
science (with expertise in conversational AI), developmental psychology,
and neuro-linguistics. They will be located at the Department of Computer
Science of Aix-Marseille University and will be part of the Institute of
Language, Communication and the Brain (ILCB.fr) <https://www.ilcb.fr/>.
Additionally, the PhD researcher will have the opportunity to
interact/collaborate with CoCoDev’s internal network, especially researchers
from the Dialog Modelling Group (University of Amsterdam), the Interacting
Minds Center (The University of Aarhus), and the Multimodal Language and
Cognition group (Max Planck Institute for Psycholinguistics).
Requirements
- The ideal candidate for this position should have a strong
background/training in computer science and experience with deep-learning
modeling.
- Interest in cognitive science (though no prior experience is required).
- Good mastery of English.
Key dates
Open until filled.
Please send (as soon as possible for full consideration):
1) A CV
2) A recent transcript (a university document with courses taken and grades)
3) Contact info of one reference (ideally a research supervisor)
4) (Optional) Evidence of prior experience with deep-learning modeling (a
publication, dissertation, code on GitHub, etc.)
*Latest starting date:* October 1st, 2022
Inquiries
All kinds of inquiries (about the scientific project, the university, life
in Marseille, etc) as well as the application documents should be addressed
to Abdellah Fourtassi (abdellah.fourtassi(a)univ-amu.fr)
--
Abdellah Fourtassi
Assistant Professor
Department of Computer Science
Institute of Language, Communication, and the Brain
Aix-Marseille University, France
https://sites.google.com/site/fourtassi/
***2nd SummDial: A SemDial 2022 <https://semdial2022.github.io/#> Special
Session on Summarization of Dialogues and Multi-Party Meetings***
***Website: https://elitr.github.io/automatic-minuting/summdial-2022.html ***
***Submission Deadline: August 1, 2022 ***
***Event Date: August 24, 2022 ***
With a sizeable working population of the world going virtual, resulting in
information overload from multiple online meetings, imagine how convenient
it would be to just hover over past calendar invites and get concise
summaries of the meeting proceedings? How about automatically minuting a
multimodal multi-party meeting? Are minutes and multi-party dialogue
summaries the same? We believe Automatic Minuting is challenging. There are
possibly no agreed-upon guidelines for taking minutes, and people adopt
different styles to record meeting minutes. The minutes also depend on the
meeting's category, the intended audience, and the goal or objective of the
meeting. We hosted the First SummDial Special Session at SIGDial 2021.
Several significant problems and challenges in multi-party dialogue and
meeting summarization came from the discussions in the first SummDial,
which we documented in our event report
<https://dl.acm.org/doi/10.1145/3527546.3527561>.
Since we witnessed enthusiastic participation of the dialogue and
summarization community in the first SummDial special session
<https://elitr.github.io/automatic-minuting/summdial.html>, we are hosting
the Second SummDial special session at SemDial 2022
<https://semdial2022.github.io/#>. This
year, we intend to continue discussing these challenges and lessons learned
from the previous SummDial. Our goal for this special session is to
stimulate intense discussions around this topic and set the tone for
further interest, research, and collaboration in both the Speech and Natural
Language Processing communities. Our topics of interest are Dialogue
Summarization, including but not limited to Meeting Summarization, Chat
Summarization, Email Thread Summarization, Customer Service Summarization,
Medical Dialogue Summarization, and Multi-modal Dialogue Summarization. Our
shared task on Automatic Minuting (AutoMin)
<https://elitr.github.io/automatic-minuting/> at Interspeech 2021
<https://www.interspeech2021.org/> was another community effort in this
direction.
***Call for papers***
We invite regular and work-in-progress papers that report:
- Current research in multi-party dialogue summarization for summarizing
meetings and spoken dialogue, using speech, text, or multi-modal data (audio,
video),
- Challenges in dialogue summarization evaluation (manual + automatic),
- New methods and metrics for dialogue summarization evaluation,
- Relevant corpus collection, pre-processing, development, and ethical
issues involved,
- Comparisons of speech-specific systems with systems imported from
text summarization,
- Tools for meeting transcript generation and automatic summarization,
- Topic detection and span identification in meeting transcripts for
multi-topic summarization,
- Position papers to reflect on the current state of the art in this
topic, to take stock of where we have been, where we are, where we are
going and where we should go.
Researchers may choose to submit:
- ***Long papers*** Authors should submit an anonymous paper of at most 8
pages of content (up to 2 additional pages are allowed for references).
- ***Short papers*** Authors should submit a non-anonymized paper of at
most 2 pages of content (up to 1 additional page allowed for references).
Submissions to this track can be non-archival on request.
- ***Position Papers*** Including extended abstracts, work-in-progress,
and late-breaking papers.
***Submission Link***
https://easychair.org/my/conference?conf=summdial2022
Submissions should follow the ACL format. Papers that have been or will be
submitted to other meetings or publications must provide this information
in a footnote on the title page of the submission. SummDial 2022 cannot
accept for publication work that will be (or has been) published
elsewhere.
***Special Session Program***
The special session will consist of a keynote, a panel, and oral and/or
poster paper presentations.
***Organizers***
- Tirthankar Ghosal <https://elitr.eu/tirthankar-ghosal/>, Institute of
Formal and Applied Linguistics, Charles University, Czech Republic
- Muskaan Singh, IDIAP, Switzerland
- Xinnou Xu, University of Edinburgh, UK
- Ondřej Bojar <https://ufal.mff.cuni.cz/ondrej-bojar>, Institute of
Formal and Applied Linguistics, Charles University, Czech Republic
--
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Tirthankar Ghosal
Researcher at UFAL, Charles University, CZ
https://member.acm.org/~tghosal
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
***Shared Task: Detecting Entities in the Astrophysics Literature (DEAL)***
***Website: https://ui.adsabs.harvard.edu/WIESP/2022/SharedTasks ***
***Twitter: https://twitter.com/wiesp_nlp ***
A good amount of astrophysics research makes use of data coming from
missions and facilities such as ground observatories in remote locations or
space telescopes, as well as digital archives that hold large amounts of
observed and simulated data. These missions and facilities are frequently
named after historical figures or use some ingenious acronym which,
unfortunately, can be easily confused when searching for them in the
literature via simple string matching. For instance, Planck can refer to
the person, the mission, the constant, or several institutions.
Automatically recognizing entities such as missions or facilities would
help tackle this word sense disambiguation problem.
The shared task consists of Named Entity Recognition (NER) on samples of
text extracted from astrophysics publications. The labels were created by
domain experts and designed to identify entities of interest to the
astrophysics community. They range from simple to detect (e.g. URLs) to
highly unstructured (e.g. Formula), and from useful to researchers (e.g.
Telescope) to more useful to archivists and administrators (e.g. Grant).
Overall 31 different labels are included, and their distribution is highly
unbalanced (e.g. ~100x more Citations than Proposals). Submissions will be
scored using both the CoNLL-2000 shared task seqeval F1-Score at the entity
level, and scikit-learn's Matthews correlation coefficient method at the
token level. We also encourage authors to propose their own evaluation
metrics. A sample dataset and more instructions can be found at:
https://ui.adsabs.harvard.edu/WIESP/2022/SharedTasks
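As a rough illustration of the two scoring levels described above, the following dependency-free Python sketch computes a micro-averaged entity-level F1 and a token-level multiclass Matthews correlation coefficient. It is not the official scorer: the BIO tag prefixes (B-, I-, O), the function names, and the toy sequences are assumptions made for illustration, and real submissions should rely on the scoring scripts distributed with the data.

```python
from collections import Counter
from math import sqrt


def extract_entities(tags):
    """Return a set of (label, start, end) spans from one BIO-tagged sequence."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        prefix, _, kind = tag.partition("-")
        # Close the open span on O, on a new B-, or on a change of label.
        if start is not None and (prefix != "I" or kind != label):
            spans.add((label, start, i - 1))
            start, label = None, None
        # Open a span on B-, or on an I- that has no open span to continue.
        if prefix in ("B", "I") and start is None:
            start, label = i, kind
    return spans


def entity_f1(true_seqs, pred_seqs):
    """Micro-averaged F1 over exactly matching entity spans."""
    tp = n_true = n_pred = 0
    for gold_tags, pred_tags in zip(true_seqs, pred_seqs):
        gold, guess = extract_entities(gold_tags), extract_entities(pred_tags)
        tp += len(gold & guess)
        n_true += len(gold)
        n_pred += len(guess)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_true
    return 2 * precision * recall / (precision + recall)


def token_mcc(true_seqs, pred_seqs):
    """Multiclass Matthews correlation coefficient over flattened token tags."""
    y_true = [t for seq in true_seqs for t in seq]
    y_pred = [t for seq in pred_seqs for t in seq]
    s = len(y_true)                                  # number of tokens
    c = sum(t == p for t, p in zip(y_true, y_pred))  # correctly tagged tokens
    t_k, p_k = Counter(y_true), Counter(y_pred)      # per-class counts
    cov_tp = c * s - sum(t_k[k] * p_k[k] for k in t_k)
    cov_tt = s * s - sum(v * v for v in t_k.values())
    cov_pp = s * s - sum(v * v for v in p_k.values())
    denom = sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0


# Toy example: the prediction misses the single-token Grant entity.
gold = [["B-Telescope", "I-Telescope", "O", "B-Grant"]]
pred = [["B-Telescope", "I-Telescope", "O", "O"]]
print(entity_f1(gold, pred), token_mcc(gold, pred))
```

Entity-level F1 only rewards spans whose label and boundaries both match, whereas the token-level coefficient gives partial credit for each correctly tagged token, which is one reason the two scores can diverge on a label set this unbalanced.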
Participants (individuals or groups) will have the opportunity to present
their findings during the workshop and write a short paper. The
best-performing or most interesting approaches may be invited to further
collaborate with the NASA Astrophysics Data System (
https://ui.adsabs.harvard.edu/).
The DEAL shared task is a part of the *1st Workshop on Information
Extraction from Scientific Publications (WIESP) at AACL-IJCNLP 2022*:
https://ui.adsabs.harvard.edu/WIESP/2022/
***Please fill in this form to report your intention to participate in the
shared task***
https://forms.office.com/r/KKpeKJBLy3
***Shared Task Submission***
Link to data and scoring scripts:
https://huggingface.co/datasets/fgrezes/WIESP2022-NER
CodaLab link to the online competition:
https://codalab.lisn.upsaclay.fr/competitions/5062
***Important Dates***
- Training+Validation Data Release: June 1, 2022
- Validation Phase: June 1 - July 31, 2022
- Test Data Release: August 1, 2022
- Final Scoring Period: August 1 - August 10, 2022
- System Report Submission: August 25, 2022
- Notification: September 25, 2022
- Camera-ready Submission Deadline: October 10, 2022
- Event Date: November 20, 2022 (online)
***All submission deadlines are 11:59 pm UTC-12 (“Anywhere on Earth”)***
***Organizers***
- Tirthankar Ghosal <https://elitr.eu/tirthankar-ghosal>, Charles
University, CZ
- Sergi Blanco-Cuaresma <https://www.blancocuaresma.com/s/>, Center for
Astrophysics | Harvard & Smithsonian, USA
- Alberto Accomazzi
<https://ui.adsabs.harvard.edu/about/team/team/aaccomazzi.html>, Center
for Astrophysics | Harvard & Smithsonian, USA
- Robert M. Patton <https://www.ornl.gov/staff-profile/robert-m-patton>,
Oak Ridge National Laboratory, USA
- Felix Grezes <https://ui.adsabs.harvard.edu/about/team/team/fgrezes.html>,
Center for Astrophysics | Harvard & Smithsonian, USA
- Thomas Allen <https://ui.adsabs.harvard.edu/about/team/team/tallen.html>,
Center for Astrophysics | Harvard & Smithsonian, USA
--
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Tirthankar Ghosal
Researcher at UFAL, Charles University, CZ
https://member.acm.org/~tghosal
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
***Call for Participation***
***First Shared Task on Multi-Perspective Scientific Document Summarization
(MuP)***
Website: https://github.com/allenai/mup
Generating summaries of scientific documents is known to be a challenging
task. The majority of existing work in summarization assumes only one
single best gold summary for each given document. Having only one gold
summary negatively impacts our ability to evaluate the quality of
summarization systems, as writing summaries is a subjective activity. At
the same time, annotating multiple gold summaries for scientific documents
can be extremely expensive as it requires domain experts to read and
understand long scientific documents. This shared task will enable
exploring methods for generating multi-perspective summaries. We introduce
a novel summarization corpus, leveraging data from scientific peer reviews
to capture diverse perspectives from the reader's point of view (each paper
has multiple summaries reflecting multiple perspectives of the reader).
The MuP shared task is a part of the 3rd Scholarly Document Processing
(SDP) workshop at COLING 2022. https://sdproc.org/2022/
More details on the shared task and the corresponding dataset can be found
on: https://github.com/allenai/mup
***Please fill in this form to participate in the shared task***
https://forms.gle/K2UECKvmghzDHUpo7
The leaderboard for the shared task will be announced soon on the website.
Shared Task Timelines
Training Data Release: May 10, 2022
Test Data Release: June 30, 2022
Evaluation Period: July 1 - July 15, 2022
System Description Papers Due: August 1, 2022
Reviews Notification: August 15, 2022
Camera-Ready Papers Due: September 5, 2022
Event at SDP @ COLING 2022: October 16/17, 2022
MuP 2022 Organizers
1. Guy Feigenblat - Piiano, Israel
2. Arman Cohan - AI2, US
3. Tirthankar Ghosal - ÚFAL, Charles University, Czechia
4. Michal Shmueli-Scheuer - IBM Research AI, Israel
--
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Tirthankar Ghosal
Researcher at UFAL, Charles University, CZ
https://member.acm.org/~tghosal
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Is anyone aware of metadata for the BNC 2014 *Written* corpus -- source,
date, # words, (sub)genre, etc for each of the ~88,000 texts?
I've contacted the BNC people, but no response.
Thanks,
Mark Davies
============================================
Mark Davies
english-corpora.org
mark-davies.org
============================================
In our newly established Research Training Group
Dimensions of Constructional Space
we're offering
13 PhD positions (65%, 3 years)
on a wide range of topics connected to Construction Grammar as a common theoretical core, and
1 postdoc position (100%, 4.5 years) on developing a multilingual research constructicon
to integrate results obtained in the PhD projects and create a new model for linguistic research documentation.
You can apply for one of the 13 PhD projects offered or for the postdoc position. Please include a motivation letter that explains why you're interested in, and qualified for, this particular position.
Application deadline: 10 July 2022
More information is available online:
Call for applications – https://www.linguistics.phil.fau.eu/fau-linguistics/research-training-group…
Project descriptions – https://www.linguistics.phil.fau.eu/fau-linguistics/research-training-group…
Homepage of the RTG – https://www.linguistics.phil.fau.eu/fau-linguistics/research-training-group…
Full details – https://www.linguistics.phil.fau.eu/files/2022/05/rtg-dimensions-of-constru…
Please share this call with anyone who might be interested!
Best wishes,
Stephanie
--
Prof. Stephanie Evert
Chair of Computational Corpus Linguistics
Friedrich-Alexander-Universität Erlangen-Nürnberg
Bismarckstr. 6, 91054 Erlangen, Germany
office: Bismarckstr. 6, room 4.000
phone: +49 9131 8522426
e-mail: stephanie.evert(a)fau.de
web: www.linguistik.fau.de
*For this iteration of the shared task, we especially encourage those who
participated or have trained models on TRAC-2018 and/or TRAC-2020
Shared Task datasets to submit the predictions of their earlier models on
our current test set. They are, of course, free to submit predictions of
new models / current datasets as well.*
*3rd Workshop on Threat, Aggression and Cyberbullying (TRAC - 2022)*
>
> &
> *Shared Tasks on Bias, Threat and Aggression Identification in Context*
> Co-located with COLING 2022, October 12 - 17, 2022
> Gyeongju, the Republic of Korea
>
>
> *Second Call for Papers and Shared Task Participation*
>
> *Workshop Website*: https://sites.google.com/view/trac2022/home
> *Paper Submission*: https://www.softconf.com/coling2022/TRAC-2022/
> *Shared Task Website:* https://codalab.lisn.upsaclay.fr/competitions/4753
>
> *Submission Deadline*: July 11, 2022 (Regular) / July 31, 2022 (ACL ARR)
>
> As in the earlier editions of the workshop, TRAC-2022 will focus on the
> applications of NLP, ML and pragmatic studies on aggression and
> impoliteness to tackle these issues. We invite *long (8 pages)* and *short
> papers (4 pages)* as well as *position papers* and opinion pieces (5 - 20
> pages), *demo proposals* and *non-archival extended abstracts* (2 pages)
> based on, but not limited to, any of the following themes from academic
> researchers, industry and any other group / team working in the area.
>
> - Theories and models of aggression and conflict in language.
> - Cyberbullying, threatening, hateful, aggressive and abusive language
> on the web.
> - Multilingualism and aggression.
> - Resource Development - Corpora, Annotation Guidelines and Best
> Practices for threat and aggression detection.
> - Computational Models and Methods for aggression, hate speech and
> offensive language detection in text and speech.
> - Detection of threats and bullying on the web.
> - Automatic censorship and moderation: ethical, legal and
> technological issues and challenges.
>
>
> *Shared Tasks*
> TRAC-2022 will include two novel shared tasks:
>
> *Task 1: Bias, Threat and Aggression Identification in Context*
> The first shared task will be a structured prediction task for recognising
> (a) Aggression, Gender Bias, Racial Bias, Religious Intolerance and Bias
> and Casteist Bias on social media and (b) the "discursive role" of a given
> comment in the context of the previous comment(s). The participants will be
> given a "thread" of comments with information about the presence of
> different kinds of biases and threats (viz. gender bias, gendered threat
> and none, etc) and its discursive relationship to the previous comment as
> well as the original post (viz. attack, abet, defend, counter-speech and
> gaslighting). In a series / thread of comments, participants will be
> required to predict the presence of aggression and bias of each comment,
> possibly making use of the context.
>
> *Task 2: Generalising across domains - COVID-19*
> For this sub-task, the test set will be sampled from the COVID-19 related
> conversation, annotated with levels of aggression, offensiveness and hate
> speech. Across the globe, during the pandemic, we have seen various kinds
> of novel aggressive and biased conversation on social media - in fact, in
> some cases there was massive escalation of religious and other kinds of
> intolerance and polarisation. The participants of TRAC-1 and TRAC-2 shared
> tasks are especially encouraged to submit the predictions of their
> earlier models on this test set. They may also train new models jointly on
> both the datasets. Those who didn't participate in earlier tasks are also
> invited to submit the predictions for this task by training models on the
> two datasets and are encouraged to submit the predictions on the respective
> test sets of the earlier tasks along with the predictions on the current
> dataset (to enable comparison). New participants may also use TRAC-1 or
> TRAC-2 dataset or a combination of the two for building the models. The aim
> of the task is to evaluate the generalisability of our systems in
> unexpected and novel situations.
>
> For participation, visit the Codalab website -
> https://codalab.lisn.upsaclay.fr/competitions/4753
>
> For any clarifications, contact coling.aggression(a)gmail.com.
>
> Looking forward to your participation!
>
>
Multiple CSIRO Early Research Career Postdoctoral Fellowships are available in Natural Language Processing.
CSIRO Data61 is looking for multiple CERC Fellows to join an NLP team of researchers and engineers. Relevant NLP research areas: information extraction, text summarization, question answering, semantic parsing, semantic role labelling, paraphrase detection and generation, and NLP for Information Retrieval.
About the CSIRO Postdoctoral Fellowship program:
CSIRO Early Research Career (CERC) Postdoctoral Fellowships provide opportunities to scientists and engineers who have completed their doctorate and have less than three years of relevant postdoctoral work experience. These fellowships aim to develop the next generation of future leaders of the innovation system.
Location: Sydney, NSW
Salary: AU$89k - AU$98k plus up to 15.4% superannuation
Tenure: Specified term of 3 years
Reference: 77986
Applications close: 7 July 2022
To be considered you will need:
* A doctorate (or will shortly satisfy the requirements of a PhD) in a relevant discipline area, such as Computer Science (Natural Language Processing/Computational Linguistics or Machine Learning with text data).
* Experience using deep learning and other machine learning techniques in NLP.
* High-level written and oral communication skills with the ability to represent the research team effectively internally and externally, including the presentation of research outcomes at national and international conferences.
* A sound history of publication in peer-reviewed journals and/or conferences.
For more information or to apply, please visit: https://jobs.csiro.au/job-invite/77986/
Hello,
I'm looking for any open source or cloud-hosted solution for complex word identification or word difficulty rating in French for a reading application.
As a backup plan we can use measures like corpus frequency, length, number of senses, but we're hoping someone has already made a tool available.
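A minimal sketch of that backup plan, assuming the three measures are simply combined in a weighted sum. The toy frequency and sense tables, the weights, and the function name are hypothetical placeholders, not values from any existing tool; a real system would load counts from a French corpus and sense counts from a lexical resource, and would tune the weights on rated data.

```python
import math

# Hypothetical toy counts; a real system would load them from a French corpus.
TOY_FREQ = {"le": 50000, "maison": 1200, "chat": 900, "anticonstitutionnellement": 1}
# Hypothetical sense counts; a real system would take them from a lexical resource.
TOY_SENSES = {"le": 3, "maison": 5, "chat": 2, "anticonstitutionnellement": 1}


def difficulty(word, freq=TOY_FREQ, senses=TOY_SENSES, weights=(1.0, 0.1, 0.05)):
    """Return a heuristic difficulty score; higher means harder.

    The weights are arbitrary placeholders, and treating extra senses as a
    sign of an easier word is an assumption, not an established rule.
    """
    w = word.lower()
    total = sum(freq.values())
    # Rarity: negative log relative frequency, add-one smoothed for unseen words.
    rarity = -math.log((freq.get(w, 0) + 1) / (total + len(freq)))
    length = len(w)
    n_senses = senses.get(w, 1)
    w_rarity, w_length, w_senses = weights
    return w_rarity * rarity + w_length * length - w_senses * n_senses


# Rank the toy vocabulary from easiest to hardest under this heuristic.
for word in sorted(TOY_FREQ, key=difficulty):
    print(word, round(difficulty(word), 2))
```

Keeping each measure as a separate term makes it straightforward to drop one (for example the sense count) or to replace the weighted sum with a trained classifier later.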
We found this but that's it: https://github.com/sheffieldnlp/cwi
Would appreciate any tips!
Thanks,
Chris
Christopher Collins [he/him<https://medium.com/gender-inclusivit/why-i-put-pronouns-on-my-email-signatu…>]
Associate Professor - Faculty of Science
Canada Research Chair in Linguistic Information Visualization
Ontario Tech University
vialab.ca