We are proud to announce the release of a new version of BabelNet
<https://babelnet.org/> and its APIs, *both Java and the brand-new Python
version*, developed jointly by the Sapienza NLP Group
<http://nlp.uniroma1.it> of the *Sapienza University of Rome* under the
supervision of prof. Roberto Navigli <https://www.diag.uniroma1.it/navigli/>
and Babelscape <http://babelscape.com/>, *a successful deep-tech
multilingual NLP Company* providing innovative solutions for multilingual
NLP.
BabelNet -- winner of the *prominent paper award 2017* from the Artificial
Intelligence Journal and the META prize 2015, and covered in media such as The
Guardian
<https://www.theguardian.com/news/2018/feb/23/oxford-english-dictionary-can-…>
and Time magazine
<http://wwwusers.di.uniroma1.it/~navigli/img/Redefining_the_modern_dictionar…>
-- is today’s *most far-reaching multilingual resource* which, according to
need, can be used as an *encyclopedic dictionary*, or a *semantic network*
or a huge *knowledge base/ontology*. It has been used by more than *1000
universities and research institutions*, enabling multilinguality in
several fields of AI and NLP, such as semantic search, Word Sense
Disambiguation, Semantic Role Labeling and image tagging.
BabelNet was created by means of the seamless integration and interlinking
of the largest multilingual Web encyclopedia - i.e., Wikipedia - with the
most popular computational lexicon of English - i.e., WordNet, and other
lexical resources such as Wiktionary, OmegaWiki, Wikidata, dozens of
wordnets, Wikiquote, GeoNames, and ImageNet. BabelNet provides *multilingual
synsets*, i.e., concepts and named entities lexicalized in many languages,
and connected with large amounts of semantic relations.
*Version 5.1* comes with the following features:
- *500 languages* and *22 million synsets* covered;
- *53 resources* linked and integrated;
- *Wikipedia* and *Wikidata* updated thanks to *BabelNet live*;
- *Open English WordNet* has been updated to version 2021;
- Added *Q-codes* identifiers (e.g.
https://www.hetop.eu/hetop/3CGP/?la=en&rr=CGP_QC_QD8);
- Added *string tags* from *Wikipedia labels*;
- *French wordnets cleaned up* by removing most potentially incorrect
translations;
- *Italian wordnet definitions* cleaned up;
- *General data cleanup* (glosses, senses, Named Entity vs. concept
labels);
- *Lemma casing corrected in 24 languages* (English, Italian, Spanish,
German, French, Dutch, Polish, Portuguese, Russian, Bulgarian, Czech,
Danish, Greek, Estonian, Finnish, Croatian, Hungarian, Lithuanian, Latvian,
Maltese, Romanian, Slovak, Slovenian, Swedish).
More statistics are available at: babelnet.org/statistics.
Kind regards,
The BabelNet team
--
==============================================
Roberto Navigli - Professor
Department of Computer, Control and Management Engineering
Sapienza University of Rome
Via Ariosto, 25
00185 Roma Italy
Phone: +39 06 77274109
Home Page: https://www.diag.uniroma1.it/navigli/
Sapienza NLP Group: http://nlp.uniroma1.it
Co-founder of Babelscape <https://babelscape.com>
==============================================
Dear corpora subscribers:
At NLPgo (https://www.nlpgo.com) we are organizing an online workshop for linguists titled "Learning to use the Terminal", which may be of interest to you.
> What is the Terminal?
> It is the usual black or white background application available in any operating system which allows the users to run commands. It is also called the command interpreter.
> What is it useful for?
> It is useful for many different things, among them making different kinds of transformations on files and, therefore, it allows us to make some interesting corpus calculations, which would otherwise be very difficult to make.
> Workshop content
> 1. Preparation: installing the necessary tools and setting up the working environment.
> 2. Basic concepts: file and folder structure, file types, character encodings (types, differences and compatibility problems).
> 3. Basic commands: show files available in a folder, change current folder, show file contents, copy and move files, column extraction and reordering, result sorting, etc.
> 4. Advanced commands and transformations: standard input/output/error, command chaining, finding file names, text finding and replacing, applying commands to several files at a time.
> 5. Regular expressions: advanced text search and replacement techniques
> 6. Corpus specific tasks: Working with data from spreadsheets, texts (orthographic words) and Part-Of-Speech tagged texts (grammatical elements).
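As a rough illustration of the kind of corpus calculation mentioned in points 3 and 6 (column extraction, counting and result sorting on Part-Of-Speech tagged text), here is a minimal Python sketch. The tab-separated word/tag layout, the sample data and the function name `tag_frequencies` are assumptions made for this example and are not workshop material. In the terminal, the same idea is typically expressed by chaining commands such as cut, sort and uniq.

```python
from collections import Counter

# Hypothetical sample: one token per line, word and POS tag separated by a tab.
SAMPLE = "The\tDET\ncat\tNOUN\nsat\tVERB\non\tADP\nthe\tDET\nmat\tNOUN\n"


def tag_frequencies(text, column=1):
    """Count the values found in one tab-separated column, most frequent first."""
    counts = Counter()
    for line in text.splitlines():
        if not line.strip():  # skip blank lines
            continue
        fields = line.split("\t")
        if len(fields) > column:  # skip lines that lack the requested column
            counts[fields[column]] += 1
    # Sort by descending frequency, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


for tag, freq in tag_frequencies(SAMPLE):
    print(f"{freq}\t{tag}")
```

Passing `column=0` counts the word forms instead of the tags.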
More details about the workshop are available on our website and, specifically, on the workshop page: https://www.nlpgo.com/teaching/terminal
As you can see there, it will be held from the 18th to the 22nd of July (in Spanish, Spain time zone).
If you want to register for the workshop, you can do so from the workshop web page linked above. In addition, if you are interested in receiving information about the workshops we organize and other useful information, you can subscribe to our newsletter here: https://www.palabrasbinarias.com/subscribe.
If you have any questions, don't hesitate to write to us through the contact form available on the workshop web page.
Thank you for your attention.
Best regards,
Mario Barcala
--
Mario Barcala
CEO at NLPgo
http://www.nlpgo.com
GPG key id: F1C15EB7
The University of Klagenfurt is pleased to announce the following open position at the Digital Age Research Center (D!ARC), employment to commence as soon as possible:
*PreDoc Scientist (in German: Universitätsassistent*in) (Doctoral Candidate) (all genders welcome)*
The Digital Age Research Center (D!ARC), founded in 2019 as an inter-faculty university centre, aims to shed light not only on the technological but also on the economic, legal, social, individual, behavioural and cultural aspects of the digital revolution. Over the next few years, it is set to develop a corresponding profile in research with a European claim to excellence as well as modules for the range of courses offered at the University of Klagenfurt.
Level of employment: 75 % (30 hours/week)
Minimum salary: € 32.116,-- per annum (gross); classification according to collective agreement: B1
Limited to: 4 years
Application deadline: July 27, 2022
Reference code: 191/22
Tasks and responsibilities:
• Contributing to research in a project in the area of computational linguistics and digital humanities
• Participation in research activities of D!ARC, which offers unique opportunities for interdisciplinary collaboration
• Teaching courses on computational linguistic topics
Prerequisites for the appointment:
• A Master’s degree completed at a domestic or foreign higher education institution in computational linguistics, linguistics, computer science or a similar field, graded with success, and corresponding knowledge in the field
• Good programming skills (preferably Python)
• Proven expertise in:
• Natural language processing
• Digital humanities
• Linguistics
• Machine learning (particularly deep learning)
• Fluent in English and German, both spoken and written
Additional desired qualifications:
• Experience with web crawling and processing large amounts of textual data
• Instructing and supervising linguistic annotation
• Publications at scientific conferences and in journals in the field relating to the position
• Profound knowledge of publicly available tools and resources for natural language processing
• Experience in teaching at a university
• Experience in working in interdisciplinary research projects
• Social and communication skills, ability to work independently
Our offer:
The employment contract is concluded with a starting salary of € 2.294,-- gross per month (14 times a year; previous experience deemed relevant to the job can be recognised in accordance with the https://jobs.aau.at/en/faq/). The University of Klagenfurt also offers:
• Personal and professional advanced training courses, management and career coaching
• Numerous attractive additional benefits, see also https://jobs.aau.at/en/the-university-as-employer/
• Diversity- and family-friendly university culture
• The opportunity to live and work in the attractive Alps-Adriatic region with a wide range of leisure activities in the spheres of culture, nature and sports
The application:
If you are interested in this position, please apply in German or English providing the usual documents:
• Letter of application
• Curriculum vitae
• Proof of all completed higher education programmes (certificates, supplements, if applicable)
• Other documentary evidence that may be relevant to this announcement (see prerequisites and desired qualifications)
The position is solely intended for the completion of a Doctorate. Applicants with a Doctorate or Ph.D. already completed in a related discipline are therefore ineligible for this position.
To apply, please select the position with the reference code 191/22 in the category “Scientific Staff” using the link “Apply for this position” in the job portal at http://jobs.aau.at/en/
Candidates must furnish proof that they meet the required qualifications by July 27, 2022 at the latest.
For further information on this specific vacancy, please contact Michael Wiegand (michael.wiegand@aau.at). General information about the university as an employer can be found at http://www.aau.at/jobs/en/information. At the University of Klagenfurt, recruitment and staff matters are accompanied not only by the authority responsible for the recruitment procedure but also by the Equal Opportunities Working Group (https://www.aau.at/en/university/organisation/representations-commissioners…) and, if necessary, by the Representative for Disabled Persons (https://www.aau.at/en/university/organisation/administration-and-management…).
!!!!!!!!!! The deadline for submission has been extended to July 20th
!!!!!!!!!! Upload Submissions Now
https://cmt3.research.microsoft.com/AMTA2022
The First Workshop on Corpus Generation and Corpus Augmentation for
Machine Translation (CoCo4MT)
https://sites.google.com/view/coco4mt
@ AMTA – 2022
The 15th biennial conference of the Association for Machine Translation
in the Americas
12-16 September 2022, Orlando, Florida, USA
INVITED TALKS
Jörg Tiedemann University of Helsinki
Julia Kreutzer Google Research
Maria Nadejde Amazon
SCOPE
It is well known that machine translation systems, especially those
that use deep learning, require massive amounts of data. Several
resources for languages are not available in their human-created format.
The types of resources available include monolingual and multilingual
corpora, translation memories, and lexicons. Parallel resources are
generally created for formal purposes, such as parliamentary
collections, while monolingual resources tend to come from more informal
situations. The quality and abundance of resources, including corpora,
created for formal purposes are generally higher than those of resources
created for informal purposes. Additionally, corpora for low-resource
languages (languages with fewer digital resources available) tend to be
less abundant and of lower quality.
CoCo4MT sets out to be the first workshop centered around research that
focuses on corpora creation, cleansing, and augmentation techniques
specifically for machine translation. We accept work that covers any
spoken language (including high-resource languages) but we are
specifically interested in those submissions that are on languages with
limited existing resources (low-resource languages) where resources are
not highly available.
The goal of this workshop is to begin to close the gap in corpora
available for low-resource translation systems. Promoting high-quality
data for online systems that can be used by native speakers of
low-resource languages is of particular interest. Therefore, it will be
beneficial if the techniques presented in research papers include their
impact on the quality of MT output and how they can be used in the real
world.
CoCo4MT aims to encourage research on new and undiscovered techniques.
We hope that submissions will provide high-quality corpora that are
publicly available for download and can be used to increase machine
translation performance, thus encouraging new dataset creation for
multiple languages and, in turn, establishing a general workshop to
consult for corpora needs in the future. The workshop’s success will be
measured by the following key performance indicators:
- Promotes the ongoing increase in quality of machine translation
systems when measured by standard measurements,
- Provides a meeting place for collaboration from several research areas
to increase the availability of commonly used corpora and new corpora,
- Drives innovation to address the need for higher quality and abundance
of low-resource language data.
TOPICS
We are highly interested in original research papers on the topics
below; however, we welcome all novel ideas that cover research on
corpora techniques.
- Difficulties with using existing corpora (e.g., political
considerations or domain limitations) and their effects on final MT
systems,
- Strategies for collecting new MT datasets (e.g., via crowdsourcing),
- Data augmentation techniques,
- Data cleansing and denoising techniques,
- Quality control strategies for MT data,
- Exploration of datasets for pretraining or auxiliary tasks for
training MT systems.
SUBMISSION INFORMATION
There is one type of submission in the workshop: research, review and
position papers. Each paper should be at least four (4) and at most
ten (10) pages long, plus unlimited pages for references.
Submissions should be formatted according to the official AMTA 2022
style templates (PDF, LaTeX, Word). Accepted papers will be published
on-line in the AMTA 2022 proceedings which includes the ACL Anthology
and will be presented at the conference either orally or as a poster.
Submissions must be anonymized and should be done using the official
conference management system
(https://cmt3.research.microsoft.com/AMTA2022). Scientific papers that
have been or will be submitted to other venues must be declared as such,
and must be withdrawn from the other venues if accepted and published at
CoCo4MT. The review will be double-blind.
We would like to encourage authors to cite papers written in ANY
language that are related to the topics, as long as both original
bibliographic items and their corresponding English translations are
provided.
Registration will be handled by the main conference. (To be announced)
IMPORTANT DATES
June 1, 2022 – Call for papers released
June 15, 2022 – Second call for papers
June 29, 2022 – Third and final call for papers
July 20, 2022 – Paper submissions due (updated extension!)
July 27, 2022 – Notification of acceptance
August 7, 2022 – Camera-ready due
August 31, 2022 – Video recordings due
September 16, 2022 – CoCo4MT workshop
CONTACT
CoCo4MT Workshop Organizers
coco4mt2022(a)googlegroups.com
ORGANIZING COMMITTEE (listed alphabetically)
Constantine Lignos Brandeis University
John E. Ortega New York University and University of Santiago de
Compostela (CITIUS)
Katharina Kann University of Colorado Boulder
Maja Popović ADAPT Centre at Dublin City University
Marine Carpuat University of Maryland
Shabnam Tafreshi University of Maryland
William Chen Carnegie Mellon University
PROGRAM COMMITTEE (tentative, listed alphabetically)
Abteen Ebrahimi University of Colorado Boulder
Adelani David Saarland University
Ananya Ganesh University of Colorado Boulder
Alberto Poncelas ADAPT Centre at Dublin City University
Amirhossein Tebbifakhr University of Trento
Anna Currey Amazon
Arturo Oncevay University of Edinburgh
Atul Kr. Ojha National University of Ireland Galway
Bharathi Raja Chakravarthi National University of Ireland Galway
Beatrice Savoldi University of Trento
Bogdan Babych Heidelberg University
Briakou Eleftheria University of Maryland
Dossou Bonaventure Mila Quebec AI Institute
Duygu Ataman New York University
Eleni Metheniti Université Toulouse - Paul Sabatier
Francis Tyers Indiana University
Jasper Kyle Catapang University of Birmingham
John E. Ortega New York University and USC - CITIUS
José Ramom Pichel Campos Universidade de Santiago de Compostela - CITIUS
Kalika Bali Microsoft
Koel Dutta Chowdhury Saarland University
Liangyou Li Huawei
Manuel Mager University of Stuttgart
Maria Art Antonette Clariño University of the Philippines Los Baños
Mathias Müller University of Zurich
Nathaniel Oco De La Salle University
Niu Xing Amazon
Pablo Gamallo Universidade de Santiago de Compostela - CITIUS
Rodolfo Joel Zevallos Salazar Universitat Pompeu Fabra
Rico Sennrich University of Zurich
Sangjee Dondrub Qinghai Normal University
Santanu Pal Saarland University
Sardana Ivanova University of Helsinki
Shantipriya Parida Silo AI
Surafel Melaku Lakew Amazon
Tommi A Pirinen University of Tromsø
Valentin Malykh Moscow Institute of Physics and Technology
--
*Shabnam Tafreshi, PhD*
*Assistant Research Scientist*
*Computational Linguistics, NLP*
*UMD: ARLIS @ College Park*
*"All the problems of the world could be settled easily, if people only
willing to think."*
*-Thomas J. Watson*
We are offering a fully funded, industry-sponsored PhD scholarship on the topic of Language Models.
The selected candidate will have the opportunity to conduct research at the junction of industry and academia.
She/he will also be part of an exciting team of data scientists & PhD researchers from the corporate and from the academic world.
For more details, please see
https://www.akadeus.com/announcement,a7165.html
https://www.digitallab.be/en/
Co-located with COLING 2022, at VarDial we anticipate discussion on computational methods and on language resources for closely related languages, language varieties and dialects. We plan to organize VarDial 2022 as a hybrid workshop with options for both on-site and remote participation. We accept paper submissions until July 22, 2022 (details below).
https://sites.google.com/view/vardial-2022
We welcome papers dealing with one or more of the following topics:
- Language resources and tools for similar languages, varieties and dialects;
- Adaptation of tools (taggers, parsers) for similar languages, varieties and dialects;
- Evaluation of language resources and tools when applied to language varieties;
- Reusability of language resources in NLP applications (e.g., for machine translation, POS tagging, syntactic parsing, etc.);
- Corpus-driven studies in dialectology and language variation;
- Computational approaches to the study of mutual intelligibility between dialects and similar languages;
- Automatic identification of lexical variation;
- Automatic classification of language varieties;
- Text similarity and adaptation between language varieties;
- Linguistic issues in the adaptation of language resources and tools (e.g., semantic discrepancies, lexical gaps, false friends);
- Machine translation between closely related languages, language varieties and dialects.
In addition to the topics listed above, we also welcome papers dealing with diachronic language variation (e.g. phylogenetic methods, historical dialects).
Instructions for Authors
Submissions should be formatted according to the COLING template and submitted in PDF format. The review process will be double-blind. More information on the website.
Important Dates
Submission deadline: EXTENDED TO JULY 22, 2022 (anywhere on earth)
Notification of acceptance: August 22, 2022
Camera-ready papers due: September 5, 2022
VarDial Workshop at COLING 2022: October 16, 2022
Organizers
Yves Scherrer - University of Helsinki (Finland)
Tommi Jauhiainen - University of Helsinki (Finland)
Nikola Ljubešić - Jožef Stefan Institute (Slovenia) and University of Zagreb (Croatia)
Preslav Nakov - Qatar Computing Research Institute, HBKU (Qatar)
Jörg Tiedemann - University of Helsinki (Finland)
Marcos Zampieri - Rochester Institute of Technology (USA)
Contact: yves.scherrer(a)helsinki.fi
--- apologies for cross-postings ---
Dear colleagues,
We have an open position for a postdoctoral researcher on natural
language processing / information retrieval / machine learning (SCAI/BnF
research program)
Starting period: autumn 2022
Duration: 12-month postdoctoral contract (renewable)
Location: Sorbonne University (ISIR lab in the MLIA team) / DataLab of
the BnF
Supervision:
Laure Soulier, MCF in computer science at Sorbonne University, MLIA
team, ISIR.
Emmanuelle Bermès, Scientific and Technical Assistant to the Director of
Services and Networks at BnF.
Jean-Philippe Moreux, Scientific expert of Gallica at the BnF.
More info:
https://scai.sorbonne-universite.fr/public/news/view/27d72d260c950c8d66c6/1
_*Context*_
Gallica, the digital library of the BnF, contains nearly 10 million
digitized documents that are freely accessible online (18.5 million
visits per year). However, most users do not know that Gallica contains
not only printed documents, but also photographs, sound recordings,
videos, and 3D objects. In satisfaction surveys, only a minority of
users consider the search engine's answers to be relevant and a majority
would like to be better guided in their searches. A recommendation
system should be able to help users find their way through the mass of
collections and improve the visibility of the least known. In this
project, BnF is committed to adopting a resolutely ethical approach. The
exploitation of user logs must respect their privacy and guarantee both
the relevance and transparency of the algorithms, avoiding the risk of
filter bubbles. The interface design is also at the heart of the
approach: a trustworthy system relies on a good user experience and on
the diversity and relevance of the proposed recommendations. Three lines
of thought emerge:
1) based on the available data, including both user logs and collection
descriptions, how to develop predictive algorithms?
2) how to integrate diversity in the recommendation algorithm while
leaving the choice to the user to moderate their serendipity threshold?
3) how to build user trust in algorithm design and audit?
_*Main missions*_
This project consists in working on information access in the Gallica
library, from the point of view of machine and deep learning techniques.
The research axes concern (1) the analysis and indexing of textual
documents as well as (2) the analysis of user traces and (3)
recommendation systems. We are particularly interested in multimodal
techniques that allow contextualizing a document or a query based on
user interactions.
The successful candidate will be responsible for:
● Implementing models to learn the semantics of textual data for the
purpose of indexing them.
● Developing algorithms based on representation learning methodologies
to effectively blend text and user traces.
● Reporting and presenting development work in a clear and effective
manner, both for discussion with BnF experts and writing machine
learning publications.
The printed book collection will be the primary focus of the program
described above, but an extension to other collections with textual
descriptors (in particular iconographic collections) may be considered.
--
-------------
Laure Soulier
Maître de conférences
Equipe MLIA - Laboratoire ISIR - Sorbonne Université
Tour 26, Couloir 26-00, Bureau 515
(+33) 1 44 27 74 91
https://pages.isir.upmc.fr/soulier/
[Apologies for multiple postings]
*** SECOND CALL FOR PAPERS ***
EMNLP 2022 Workshop - The 13th International Workshop on Health Text Mining
and Information Analysis (LOUHI 2022)
https://louhi2022.fbk.eu/
Colocated with EMNLP 2022, Abu Dhabi (7 December 2022)
but also accessible online
*Please note that this year we use both the ACL Rolling Review (ARR)
system and Softconf as paper submission platforms.*
ARR Submission deadline: July 15, 2022
Direct submission deadline: 7 September 2022
** Call for Papers **
The 13th International Workshop on Health Text Mining and Information
Analysis provides an interdisciplinary forum for researchers interested in
automated processing of health documents. Health documents encompass
textual content of electronic health records, clinical guidelines,
spontaneous reports for pharmacovigilance, biomedical literature, health
forums/blogs or any other type of health-related documents.
The LOUHI workshop series fosters interactions between the Computational
Linguistics, Medical Informatics, and Artificial Intelligence communities.
It started in 2008 in Turku, Finland and has been organized 12 times: LOUHI
2010 was co-located with NAACL in Los Angeles, CA; LOUHI 2011 was
co-located with Artificial Intelligence in Medicine (AIME) in Bled,
Slovenia; LOUHI 2013 was held in Sydney, Australia during NICTA Techfest;
LOUHI 2014 was co-located with EACL in Gothenburg, Sweden; LOUHI 2015 was
co-located with EMNLP in Lisbon, Portugal; LOUHI 2016 was co-located with
EMNLP in Austin, Texas; LOUHI 2017 was held in Sydney, Australia; LOUHI
2018 was co-located with EMNLP in Brussels, Belgium; LOUHI 2019 was
co-located with EMNLP-IJCNLP in Hong Kong; LOUHI 2020 was co-located with
EMNLP; and LOUHI 2021 was co-located with EACL.
LOUHI 2022 is soliciting papers describing original research. Papers must
describe substantial and completed work but could also focus on other
contributions, such as a negative result, a software package or work in
progress. The topics include, but are not limited to, the following
language processing techniques and related areas:
- Techniques supporting information extraction, e.g., named entity
recognition, negation and uncertainty detection
- Classification and text mining applications (e.g., diagnostic
classifications such as ICD-10 and nursing intensity scores) and problems
(e.g., handling of unbalanced data sets)
- Text representation, including dealing with issues of data sparsity and
dimensionality
- Domain adaptation, e.g., adaptation of standard NLP tools (incl.
tokenizers, PoS-taggers, etc) to the medical domain
- Information fusion, i.e. integrating data from various sources, e.g.
structured and narrative documentation
- Unsupervised methods, including distributional semantics
- Evaluation, gold/reference standard construction and annotation
- Syntactic, semantic and pragmatic analysis of health documents
- Anonymization / de-identification of health records and ethics
- Supporting the development of medical terminologies and ontologies
- Individualization of content, consumer health vocabularies, summarization
and simplification of text
- NLP for supporting documentation and decision making practices
- Predictive modeling of adverse events, e.g., adverse drug events and
hospital acquired infections
- Terminology and information model standards (SNOMED CT, FHIR) for health
text mining
- Bridging gaps between formal ontology and biomedical NLP
We welcome submissions on topics related to text mining of health
documents, particularly emphasizing multidisciplinary aspects of health
documentation and the interplay between nursing and medical sciences,
information systems, computational linguistics and computer science. We
also encourage submissions reporting work on low-resourced languages,
addressing the challenges of data sparsity and language characteristic
diversity.
** Important Dates **
ARR submission deadline: July 15, 2022 (via ARR)
Direct submission deadline: 7 September 2022
Notification to authors: October 9, 2022
Camera-ready papers due: October 16, 2022
Workshop: December 7, 2022
** Submission Instructions **
Submissions go through a double-blind review process, where each submission
is reviewed by three program committee members. Accepted papers will be
presented by the authors in a regular workshop session either as a talk or
a poster. All accepted papers will be published in the workshop proceedings.
The submissions should be in PDF format and anonymized for review. All
submissions must be written in English and follow the EMNLP 2022 style
guidelines: https://2022.emnlp.org/calls/style-and-formatting/
* Long paper submission: up to 8 pages of content, plus 2 pages for
references; final versions of long papers: one additional page (so that
reviewers' comments can be taken into account): up to 9 pages with
unlimited pages for references
* Short paper submission: up to 4 pages of content, plus 2 pages for
references; final version of short papers: up to 5 pages with unlimited
pages for references
LOUHI 2022 will accept electronic submission both via ARR and Softconf.
** Invited Speaker **
TO BE ANNOUNCED
Workshop Organizers:
Alberto Lavelli (FBK, Trento, Italy)
James Pustejovsky (Brandeis University, USA)
Eben Holderness (Brandeis University, USA)
Antonio Jimeno Yepes (RMIT University, Australia)
Anne-Lyse Minard (University of Orleans, LLL CNRS, France)
Fabio Rinaldi (Dalle Molle Institute for Artificial Intelligence Research -
IDSIA, Switzerland & FBK, Trento, Italy)
** Programme Committee **
TO BE ANNOUNCED
https://sites.google.com/view/figlang2022/shared-tasks?authuser=0
*Euphemism Detection Shared Task*
Euphemisms are mild or indirect expressions used in place of harsher or
more offensive ones. Euphemisms are often used to mask profanity or refer
to taboo topics such as death, disability, sex, religion or personal
relationships in a polite way. Euphemisms are often ambiguous: their
literal and non-literal interpretation is context-dependent:
Asked to choose *between jobs* and the environment, a majority -- at least
in our warped, first-past-the-post system -- will pick jobs.
[non-euphemistic]
vs.
This summer, the budding talent agent was *between jobs* and free to
babysit pretty much any time. [euphemistic]
State-of-the-art language models perform well on many major NLP
benchmarks; however, it is unclear how such models perform on euphemisms.
Thus, we propose a euphemism detection task: given an input sentence,
identify whether the sentence contains a euphemism.
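
To make the input and output of the task concrete, here is a minimal Python sketch of a naive baseline that only looks up a fixed phrase list. The phrase list and the function name `contains_candidate` are hypothetical and not part of the shared task materials; the two sentences are shortened versions of the examples above. Because the lookup ignores context, it flags both sentences, including the non-euphemistic one, which is exactly the ambiguity the task targets.

```python
import re

# Hypothetical list of potentially euphemistic phrases (illustration only).
CANDIDATE_PHRASES = ["between jobs", "passed away"]


def contains_candidate(sentence, phrases=CANDIDATE_PHRASES):
    """Return True if the sentence contains any listed phrase (case-insensitive)."""
    # Drop emphasis asterisks such as "*between jobs*" before matching.
    text = sentence.replace("*", "").lower()
    return any(re.search(r"\b" + re.escape(p) + r"\b", text) for p in phrases)


# Shortened versions of the two example sentences from the announcement.
literal = "Asked to choose *between jobs* and the environment, a majority will pick jobs."
euphemistic = "The budding talent agent was *between jobs* and free to babysit."

print(contains_candidate(literal))      # flagged, although labelled non-euphemistic
print(contains_candidate(euphemistic))  # flagged, labelled euphemistic
```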
For more information about the shared task and to participate, visit
https://codalab.lisn.upsaclay.fr/competitions/5726
*Important dates:*
- July 5, 2022: CodaLab competition is open; training data can be
downloaded
- Aug 5, 2022: Test data can be downloaded and results submitted;
performance will be tracked on CodaLab dashboard
- Aug 20, 2022: Last day for submitting predictions on test data
- Sept 7, 2022: Papers describing the systems are due
- Oct 9, 2022: Notification of acceptance
- TBD, 2022: Camera-ready papers due
- December 7 or 8, 2022: Workshop
Hello,
Could you please distribute the following job offer? Thanks.
Best,
Pascal
-------------------------------------------------------------------------------------
3-year PhD position in Computational Models of Semantic Memory and its Acquisition (Inria and University of Lille, France)
We invite applications for a 3-year PhD position at the University of
Lille in the context of the recently funded research project
"COMANCHE" (Computational Models of Lexical Meaning and Change). The
position is funded by Inria, the French national research institute in
Computer Science and Applied Mathematics.
COMANCHE proposes to transfer and adapt neural word embeddings
algorithms to model the acquisition and evolution of word meaning, by
comparing them with linguistic theories on language acquisition and
language evolution. At the intersection between Natural Language
Processing, psycholinguistics and historical linguistics, this project
intends to validate or revise some of these theories, while also
developing computational models that are less data hungry and
computationally intensive as they exploit new inductive biases
inspired by these disciplines.
The first strand of the project, on which the successful candidate
will work, focuses on the development of computational models of
semantic memory and its acquisition. Two main research directions will
be pursued. On the one hand, we will compare the structural properties
associated with different semantic spaces derived from word embedding
algorithms to those found in human semantic memory as reflected in
behavioral data (such as typicality norms) as well as brain imaging
data. The latter data will then be used as additional supervision to
inject more hierarchical structure into the learned semantic
spaces. On the other hand, we intend to experiment with training
regimes for word embedding algorithms that are closer to those of
humans when they acquire language, controlling the quantity as well as
the linguistic complexity of the inputs fed to the learning algorithms
through the use of longitudinal and child directed speech corpora
(e.g., CHILDES, Colaje). In both cases, both English and French data
will be considered.
The successful candidate holds a Master's degree in computational
linguistics, computer science or cognitive science and has prior
experience with word embedding models. Furthermore, the candidate has
strong programming skills and expertise in machine learning
approaches, and is eager to work across languages.
The position is affiliated with the MAGNET team at Inria, Lille [1] as
well as with the SCALAB group at University of Lille [2] in an effort
to strengthen collaborations between these two groups, and ultimately
foster cross-fertilizations between Natural Language Processing and
Psycholinguistics.
Applications will be considered until the position is filled. However,
you are encouraged to apply early as we shall start processing the
applications as and when they are received. Applications, written in
English or French, should include a brief cover letter with research
interests and vision, a CV (including your contact address, work
experience, publications), and contact information for at least 2
referees. Applications (and questions) should be sent to Angèle
Brunellière (angele.brunelliere(a)univ-lille.fr) and Pascal Denis
(pascal.denis(a)inria.fr).
The starting date of the position is 1 October 2022 or soon
thereafter, for a total of 3 full years.
Best regards,
Angèle Brunellière and Pascal Denis
[1] https://team.inria.fr/magnet/
[2] https://scalab.univ-lille.fr/
--
Pascal
----
For an independent, transparent and rigorous evaluation!
I support the Inria Evaluation Committee (Commission d'Évaluation de l'Inria).
----
+++++++++++++++++++++++++++++++++++++++++++++++
Pascal Denis
Equipe MAGNET, INRIA Lille Nord Europe
Bâtiment B, Avenue Heloïse
Parc scientifique de la Haute Borne
59650 Villeneuve d'Ascq
Tel: +33 3 59 35 87 24
Url: http://researchers.lille.inria.fr/~pdenis/
+++++++++++++++++++++++++++++++++++++++++++++++