https://sites.google.com/view/figlang2022/shared-tasks?authuser=0
Euphemism Detection Shared Task
Euphemisms are mild or indirect expressions used in place of harsher or
more offensive ones. Euphemisms are often used to mask profanity or refer
to taboo topics such as death, disability, sex, religion or personal
relationships in a polite way. Euphemisms are often ambiguous: their
literal and non-literal interpretation is context-dependent:
Asked to choose *between jobs* and the environment, a majority -- at least
in our warped, first-past-the-post system -- will pick jobs.
[non-euphemistic]
vs.
This summer, the budding talent agent was *between jobs* and free to
babysit pretty much any time. [euphemistic]
State-of-the-art language models perform well on many major NLP
benchmarks; however, it is unclear how such models perform on euphemisms.
Thus, we propose a euphemism detection task: given an input sentence,
identify whether the sentence contains a euphemism.
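To make the context dependence concrete, here is a minimal sketch of the task's input/output shape. It uses the two example sentences above and a deliberately naive phrase-lookup baseline. The phrase list, the 0/1 label encoding, and the baseline are illustrative assumptions, not the official data format or any participant's system.

```python
# Toy illustration only. The sentences are the two examples from this
# announcement; the label encoding (1 = euphemistic) is an assumption.
EXAMPLES = [
    ("Asked to choose between jobs and the environment, a majority -- at least "
     "in our warped, first-past-the-post system -- will pick jobs.", 0),
    ("This summer, the budding talent agent was between jobs and free to "
     "babysit pretty much any time.", 1),
]

# Hypothetical list of potentially euphemistic terms.
PHRASES = {"between jobs"}


def phrase_baseline(sentence: str) -> int:
    """Predict 1 whenever a listed phrase occurs, ignoring all context."""
    text = sentence.lower()
    return int(any(phrase in text for phrase in PHRASES))


predictions = [phrase_baseline(sentence) for sentence, _ in EXAMPLES]
accuracy = sum(p == gold for p, (_, gold) in zip(predictions, EXAMPLES)) / len(EXAMPLES)
print(predictions, accuracy)
```

Because both sentences contain the same surface string, the lookup labels them identically and gets only one of the two right, which is why the task calls for models that read the surrounding context.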
For more information about the shared task and to participate, visit
https://codalab.lisn.upsaclay.fr/competitions/5726
*Important dates:*
- July 5, 2022: CodaLab competition is open; training data can be
  downloaded
- Aug 5, 2022: Test data can be downloaded and results submitted;
  performance will be tracked on CodaLab dashboard
- Aug 20, 2022: Last day for submitting predictions on test data
- Sept 7, 2022: Papers describing the systems are due
- Oct 9, 2022: Notification of acceptance
- TBD, 2022: Camera-ready papers due
- December 7 or 8, 2022: Workshop
--
**********************************************
Anna Feldman, Ph.D.
Professor of Linguistics and Computer Science
Graduate Program Coordinator & Chair of Linguistics
Montclair State University
http://www.purl.org/net/fa <http://www.purl.org/NET/fa>
Shared Task on Understanding Figurative Language at FigLang2022
We are happy to announce a new shared task on Understanding Figurative
Language as part of the Figurative Language Workshop (FigLang 2022) at
EMNLP 2022. In recent years, there have been several benchmarks dedicated
to figurative language understanding, which generally frame "understanding"
as a recognizing textual entailment task -- deciding whether one sentence
(premise) entails/contradicts another (hypothesis) (Chakrabarty et al 2021,
Stowe et al 2022). We introduce a new shared task for figurative language
understanding around this textual entailment paradigm, where the hypothesis
is a sentence containing the figurative language expression (e.g.,
metaphor, sarcasm, idiom, simile) and the premise is a literal sentence
containing the literal meaning. There are two important aspects of this
task: 1) systems must not only predict the label (entail/contradict) but
also generate a plausible explanation for the prediction; 2) the
entail/contradict label and the explanation are related to the meaning
of the figurative language expression.
For more information about the shared task, including links to the
datasets, evaluation metrics and scripts, and important dates, please visit
the shared task website (https://figlang2022sharedtask.github.io/).
Participants can use the CodaLab page
(https://codalab.lisn.upsaclay.fr/competitions/5908) to participate in
the task and to submit predictions.
Important dates:
· July 10, 2022: CodaLab competition is open; training data can be
downloaded
· Aug 15, 2022: Test data (available only to registered participants)
can be downloaded and results submitted; performance will be tracked on
CodaLab dashboard
· Aug 20, 2022: Last day for submitting predictions on test data
· Sept 7, 2022: Papers describing the systems are due
· Oct 9, 2022: Notification of acceptance
· TBD, 2022: Camera-ready papers due
· December 8, 2022: Workshop at EMNLP 2022
Organizing Team
Tuhin Chakrabarty, Columbia University; tuhin.chakr(a)cs.columbia.edu
Arkadiy Saakyan, Columbia University; as5423(a)columbia.edu
Debanjan Ghosh, Educational Testing Service; dghosh(a)ets.org
Smaranda Muresan, Data Science Institute, Columbia University;
smara(a)columbia.edu
Apologies for cross-posting
It is our pleasure to announce the first call for submissions for the
next issue of the journal Dialogue and Discourse. Submissions are
invited on all topics in the formal, computational, or
psycholinguistic study of dialogue and discourse.
Submissions received by August 1, 2022 will be considered for the next
regular issue. Later submissions will be slated for the next available
issue.
http://www.dialogue-and-discourse.org/cfps-current.shtml
Dialogue and Discourse (D&D http://www.dialogue-and-discourse.org/) is
the first peer-reviewed free open access journal dedicated exclusively
to work that deals with language "beyond the sentence". The journal
adopts an interdisciplinary perspective, accepting work from
Linguistics, Computer Science, Psychology, Sociology, Philosophy, and
other associated fields with an interest in formally, technically,
empirically or experimentally rigorous approaches. Descriptive papers
should make a substantial theoretical contribution to be considered.
We are committed to ensuring the highest editorial standards and
rigorous peer-review of all submissions, while granting open access to
all interested readers. D&D has published regular issues every year
since 2010, and occasionally special issues on common topics.
As of June 2022, D&D has published 99 papers, and the journal's
h-index is 26. D&D is endorsed by ACL SIGdial, ACL SemDial, and AMLaP.
D&D is indexed by Scopus and the European Reference Index for the
Humanities and Social Sciences.
Submissions are made via the online submission system at
http://www.dialogue-and-discourse.org/submission.shtml. Authors are
required to indicate if a submission is an extended version of one or
more previously published conference papers (to which we would expect
substantial additions); simultaneous submission to another venue is
prohibited. Submissions will undergo rigorous peer-review. Once
accepted and finalized, papers will appear online immediately, as part
of the current issue. Selected papers will furthermore be offered the
opportunity to present a poster at the following SIGDIAL Conference.
Dialogue and Discourse Editors
Issue Editor:
Ryuichiro Higashinaka (Volume 13, Issue 2)
Junyi Jessy Li (Volume 13, Issue 1)
Editor In Chief:
Barbara Di Eugenio, University of Illinois at Chicago, United States
Associate Editors:
Vera Demberg, Saarland University, Germany
Kallirroi Georgila, University of Southern California, United States
Jonathan Ginzburg, Université Paris-Diderot (Paris 7), France
Pat Healey, Queen Mary University London, United Kingdom
Ryuichiro Higashinaka, Nagoya University, Japan
Junyi Jessy Li, University of Texas at Austin, United States
Massimo Poesio, Queen Mary University London, United Kingdom
Manfred Stede, University of Potsdam, Germany
David R. Traum, University of Southern California, United States
Amir Zeldes, Georgetown University, United States
Full editorial board at: http://www.dialogue-and-discourse.org/editors.shtml
We are proud to announce the release of a new version of BabelNet
<https://babelnet.org/> and its APIs, *both Java and the brand-new Python
version*, developed jointly by the Sapienza NLP Group
<http://nlp.uniroma1.it> of the *Sapienza University of Rome* under the
supervision of prof. Roberto Navigli <https://www.diag.uniroma1.it/navigli/>
and Babelscape <http://babelscape.com/>, *a successful deep-tech
multilingual NLP Company* providing innovative solutions for multilingual
NLP.
BabelNet -- winner of the *prominent paper award 2017* from the Artificial
Intelligence Journal and the META prize 2015, and covered in media such as The
Guardian
<https://www.theguardian.com/news/2018/feb/23/oxford-english-dictionary-can-…>
and Time magazine
<http://wwwusers.di.uniroma1.it/~navigli/img/Redefining_the_modern_dictionar…>
-- is today’s *most far-reaching multilingual resource* which, according to
need, can be used as an *encyclopedic dictionary*, or a *semantic network*
or a huge *knowledge base/ontology*. It has been used by more than *1000
universities and research institutions*, enabling multilinguality in
several fields of AI and NLP, such as semantic search, Word Sense
Disambiguation, Semantic Role Labeling and image tagging.
BabelNet was created by means of the seamless integration and interlinking
of the largest multilingual Web encyclopedia - i.e., Wikipedia - with the
most popular computational lexicon of English - i.e., WordNet, and other
lexical resources such as Wiktionary, OmegaWiki, Wikidata, dozens of
wordnets, Wikiquote, GeoNames, and ImageNet. BabelNet provides *multilingual
synsets*, i.e., concepts and named entities lexicalized in many languages,
and connected with large amounts of semantic relations.
*Version 5.1* comes with the following features:
- *500 languages* and *22 million synsets* covered;
- *53 resources *linked and integrated;
- *Wikipedia* and *Wikidata* updated thanks to *BabelNet live*;
- *Open English WordNet* has been updated to version 2021;
- Added *Q-codes* identifiers (e.g.
https://www.hetop.eu/hetop/3CGP/?la=en&rr=CGP_QC_QD8);
- Added *string tags *from *Wikipedia labels*;
- *French wordnets cleaned up* by removing most potentially incorrect
translations;
- *Italian wordnet definitions *cleaned up;
- *General data cleanup* (glosses, senses, Named Entity vs. concept
labels);
- *Lemma casing corrected in 24 languages* (English, Italian, Spanish,
German, French, Dutch, Polish, Portuguese, Russian, Bulgarian, Czech,
Danish, Greek, Estonian, Finnish, Croatian, Hungarian, Lithuanian, Latvian,
Maltese, Romanian, Slovak, Slovenian, Swedish).
More statistics are available at: babelnet.org/statistics.
Kind regards,
The BabelNet team
--
==============================================
Roberto Navigli* - Professor*
Department of Computer, Control and Management Engineering
Sapienza University of Rome
Via Ariosto, 25
00185 Roma Italy
Phone: +39 06 77274109
Home Page: https://www.diag.uniroma1.it/navigli/
Sapienza NLP Group: http://nlp.uniroma1.it
Co-founder of Babelscape <https://babelscape.com>
==============================================
Dear corpora subscribers:
At NLPgo (https://www.nlpgo.com) we are organizing an online workshop for linguists titled "Learning to use the Terminal", which may be of interest to you.
> What is the Terminal?
> It is the usual black or white background application available in any operating system which allows the users to run commands. It is also called the command interpreter.
> What is it useful for?
> It is useful for many things, among them transforming files in various ways, which makes it possible to carry out corpus calculations that would otherwise be very difficult.
> Workshop content
> 1. Preparation: installing the necessary tools and setting up the working environment.
> 2. Basic concepts: file and folder structure, file types, character encodings (types, differences and compatibility problems).
> 3. Basic commands: show files available in a folder, change current folder, show file contents, copy and move files, column extraction and reordering, result sorting, etc.
> 4. Advanced commands and transformations: standard input/output/error, command chaining, finding file names, text finding and replacing, applying commands to several files at a time.
> 5. Regular expressions: advanced text search and replacement techniques.
> 6. Corpus specific tasks: Working with data from spreadsheets, texts (orthographic words) and Part-Of-Speech tagged texts (grammatical elements).
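As an illustration of the kind of corpus calculation listed above, the sketch below counts word frequencies in a text. The workshop itself is about terminal commands; this Python version is not taken from the workshop materials, and the function name and sample sentence are made up for the example.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> list[tuple[str, int]]:
    """Lowercase the text, split it into orthographic words, and count them."""
    words = re.findall(r"\w+", text.lower())
    # Most frequent first; ties are broken alphabetically.
    return sorted(Counter(words).items(), key=lambda kv: (-kv[1], kv[0]))


sample = "The cat sat on the mat. The mat was flat."
for word, count in word_frequencies(sample)[:3]:
    print(count, word)
```

The same steps (split into words, normalise case, count, sort by frequency) are what one would typically chain together with standard commands in a terminal.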
More details about the workshop are available on our website and, specifically, on the workshop page: https://www.nlpgo.com/teaching/terminal
As you can see there, it will be held from the 18th to the 22nd of July (in Spanish, Spain time zone).
If you want to register for the workshop, you can do so from the workshop page linked above. In addition, if you are interested in receiving information about the workshops we organize and other useful information, you can subscribe to our newsletter here: https://www.palabrasbinarias.com/subscribe.
If you have any questions, don't hesitate to write to us through the contact form available on the workshop page.
Thank you for your attention.
Best regards,
Mario Barcala
--
Mario Barcala
CEO at NLPgo
http://www.nlpgo.com
GPG key id: F1C15EB7
The University of Klagenfurt is pleased to announce the following open position at the Digital Age Research Center (D!ARC), employment to commence as soon as possible:
*PreDoc Scientist (in German: Universitätsassistent*in) (Doctoral Candidate) (all genders welcome)*
The Digital Age Research Center (D!ARC), founded in 2019 as an inter-faculty university centre, aims to shed light not only on the technological but also on the economic, legal, social, individual, behavioural and cultural aspects of the digital revolution. Over the next few years, it is set to develop a corresponding profile in research with a European claim to excellence as well as modules for the range of courses offered at the University of Klagenfurt.
Level of employment: 75 % (30 hours/week)
Minimum salary: € 32.116,-- per annum (gross); classification according to collective agreement: B1
Limited to: 4 years
Application deadline: July 27, 2022
Reference code: 191/22
Tasks and responsibilities:
• Contributing to research in a project in the area of computational linguistics and digital humanities
• Participation in research activities of D!ARC, which offers unique opportunities for interdisciplinary collaboration
• Teaching courses on computational linguistic topics
Prerequisites for the appointment:
• A successfully completed Master’s degree from a domestic or foreign higher education institution in computational linguistics, linguistics, computer science or a related field, with corresponding knowledge of the field
• Good programming skills (preferably Python)
• Proven expertise in:
• Natural language processing
• Digital humanities
• Linguistics
• Machine learning (particularly deep learning)
• Fluent in English and German, both spoken and written
Additional desired qualifications:
• Experience with web crawling and processing large amounts of textual data
• Instructing and supervising linguistic annotation
• Publications at scientific conferences and in journals in the field relating to the position
• Profound knowledge of publicly available tools and resources for natural language processing
• Experience in teaching at a university
• Experience in working in interdisciplinary research projects
• Social and communication skills, ability to work independently
Our offer:
The employment contract is concluded with a starting salary of € 2.294,-- gross per month (14 times a year; previous experience deemed relevant to the job can be recognised, see https://jobs.aau.at/en/faq/). The University of Klagenfurt also offers:
• Personal and professional advanced training courses, management and career coaching
• Numerous attractive additional benefits, see also https://jobs.aau.at/en/the-university-as-employer/
• Diversity- and family-friendly university culture
• The opportunity to live and work in the attractive Alps-Adriatic region with a wide range of leisure activities in the spheres of culture, nature and sports
The application:
If you are interested in this position, please apply in German or English providing the usual documents:
• Letter of application
• Curriculum vitae
• Proof of all completed higher education programmes (certificates, supplements, if applicable)
• Other documentary evidence that may be relevant to this announcement (see prerequisites and desired qualifications)
The position is solely intended for the completion of a Doctorate. Applicants with a Doctorate or Ph.D. already completed in a related discipline are therefore ineligible for this position.
To apply, please select the position with the reference code 191/22 in the category “Scientific Staff” using the link “Apply for this position” in the job portal at http://jobs.aau.at/en/
Candidates must furnish proof that they meet the required qualifications by July 27, 2022 at the latest.
For further information on this specific vacancy, please contact Michael Wiegand (mailto:michael.wiegand@aau.at). General information about the university as an employer can be found at http://www.aau.at/jobs/en/information. At the University of Klagenfurt, recruitment and staff matters are accompanied not only by the authority responsible for the recruitment procedure but also by the Equal Opportunities Working Group (https://www.aau.at/en/university/organisation/representations-commissioners…) and, if necessary, by the Representative for Disabled Persons (https://www.aau.at/en/university/organisation/administration-and-management…).
!!!!!!!!!! The deadline for submission has been extended to July 20th
!!!!!!!!!! Upload Submissions Now
https://cmt3.research.microsoft.com/AMTA2022
The First Workshop on Corpus Generation and Corpus Augmentation for
Machine Translation (CoCo4MT)
https://sites.google.com/view/coco4mt
@ AMTA – 2022
The 15th biennial conference of the Association for Machine Translation
in the Americas
12-16 September 2022, Orlando, Florida, USA
INVITED TALKS
Jörg Tiedemann University of Helsinki
Julia Kreutzer Google Research
Maria Nadejde Amazon
SCOPE
It is well known that machine translation systems, especially those
that use deep learning, require massive amounts of data. For many
languages, resources are not available in human-created form. The types
of resources that are available include monolingual and multilingual
corpora, translation memories, and lexicons. Parallel resources are
generally created for formal purposes, such as parliamentary
collections, while monolingual resources tend to come from more informal
settings. The quality and abundance of resources, including corpora,
created for formal purposes is generally higher than that of resources
created for informal purposes. Additionally, corpora for low-resource
languages (languages with fewer digital resources available) tend to be
less abundant and of lower quality.
CoCo4MT sets out to be the first workshop centered around research that
focuses on corpora creation, cleansing, and augmentation techniques
specifically for machine translation. We accept work that covers any
spoken language (including high-resource languages), but we are
especially interested in submissions on languages with limited existing
resources (low-resource languages).
The goal of this workshop is to begin to close the gap in the corpora
available for low-resource translation systems. Promoting high-quality
data for online systems that can be used by native speakers of
low-resource languages is of particular interest. It will therefore be
beneficial if the techniques presented in research papers include their
impact on the quality of MT output and how they can be used in the real
world.
CoCo4MT aims to encourage research on new and undiscovered techniques.
We hope that submissions will provide high-quality corpora that are
publicly available for download and can be used to increase machine
translation performance, thus encouraging the creation of new datasets
for multiple languages. This will, in turn, make the workshop a general
venue to consult for corpora needs in the future. The workshop’s success
will be measured by the following key performance indicators:
- Promotes the ongoing increase in quality of machine translation
systems when measured by standard measurements,
- Provides a meeting place for collaboration from several research areas
to increase the availability of commonly used corpora and new corpora,
- Drives innovation to address the need for higher quality and abundance
of low-resource language data.
TOPICS
We are highly interested in original research papers on the topics
below; however, we welcome all novel ideas that cover research on
corpora techniques.
- Difficulties with using existing corpora (e.g., political
considerations or domain limitations) and their effects on final MT
systems,
- Strategies for collecting new MT datasets (e.g., via crowdsourcing),
- Data augmentation techniques,
- Data cleansing and denoising techniques,
- Quality control strategies for MT data,
- Exploration of datasets for pretraining or auxiliary tasks for
training MT systems.
SUBMISSION INFORMATION
There is one type of submission in the workshop: Research, review and
position paper. The length of each paper should be at least four (4) and
not exceed ten (10) pages, plus unlimited pages for references.
Submissions should be formatted according to the official AMTA 2022
style templates (PDF, LaTeX, Word). Accepted papers will be published
online in the AMTA 2022 proceedings, which are included in the ACL
Anthology, and will be presented at the conference either orally or as a
poster.
Submissions must be anonymized and should be done using the official
conference management system
(https://cmt3.research.microsoft.com/AMTA2022). Scientific papers that
have been or will be submitted to other venues must be declared as such,
and must be withdrawn from the other venues if accepted and published at
CoCo4MT. The review will be double-blind.
We would like to encourage authors to cite papers written in ANY
language that are related to the topics, as long as both original
bibliographic items and their corresponding English translations are
provided.
Registration will be handled by the main conference. (To be announced)
IMPORTANT DATES
June 1, 2022 – Call for papers released
June 15, 2022 – Second call for papers
June 29, 2022 – Third and final call for papers
July 20, 2022 – Paper submissions due (updated extension!)
July 27, 2022 – Notification of acceptance
August 7, 2022 – Camera-ready due
August 31, 2022 – Video recordings due
September 16, 2022 – CoCo4MT workshop
CONTACT
CoCo4MT Workshop Organizers
coco4mt2022(a)googlegroups.com
ORGANIZING COMMITTEE (listed alphabetically)
Constantine Lignos Brandeis University
John E. Ortega New York University and University of Santiago de
Compostela (CITIUS)
Katharina Kann University of Colorado Boulder
Maja Popović ADAPT Centre at Dublin City University
Marine Carpuat University of Maryland
Shabnam Tafreshi University of Maryland
William Chen Carnegie Mellon University
PROGRAM COMMITTEE (listed alphabetically, tentative)
Abteen Ebrahimi University of Colorado Boulder
Adelani David Saarland University
Ananya Ganesh University of Colorado Boulder
Alberto Poncelas ADAPT Centre at Dublin City University
Amirhossein Tebbifakhr University of Trento
Anna Currey Amazon
Arturo Oncevay University of Edinburgh
Atul Kr. Ojha National University of Ireland Galway
Bharathi Raja Chakravarthi National University of Ireland Galway
Beatrice Savoldi University of Trento
Bogdan Babych Heidelberg University
Briakou Eleftheria University of Maryland
Dossou Bonaventure Mila Quebec AI Institute
Duygu Ataman New York University
Eleni Metheniti Université Toulouse - Paul Sabatier
Francis Tyers Indiana University
Jasper Kyle Catapang University of Birmingham
John E. Ortega New York University and USC - CITIUS
José Ramom Pichel Campos Universidade de Santiago de Compostela - CITIUS
Kalika Bali Microsoft
Koel Dutta Chowdhury Saarland University
Liangyou Li Huawei
Manuel Mager University of Stuttgart
Maria Art Antonette Clariño University of the Philippines Los Baños
Mathias Müller University of Zurich
Nathaniel Oco De La Salle University
Niu Xing Amazon
Pablo Gamallo Universidade de Santiago de Compostela - CITIUS
Rodolfo Joel Zevallos Salazar Universitat Pompeu Fabra
Rico Sennrich University of Zurich
Sangjee Dondrub Qinghai Normal University
Santanu Pal Saarland University
Sardana Ivanova University of Helsinki
Shantipriya Parida Silo AI
Surafel Melaku Lakew Amazon
Tommi A Pirinen University of Tromsø
Valentin Malykh Moscow Institute of Physics and Technology
--
*Shabnam Tafreshi, PhD*
*Assistant Research Scientist*
*Computational Linguistics, NLP*
*UMD: ARLIS @ College Park*
*"All the problems of the world could be settled easily, if people only
willing to think."*
*-Thomas J. Watson*
We are offering a fully funded, industry-sponsored PhD scholarship on the topic of Language Models.
The selected candidate will have the opportunity to conduct research at the junction of industry and academia.
She/he will also be part of an exciting team of data scientists & PhD researchers from the corporate and from the academic world.
For more details, please see
https://www.akadeus.com/announcement,a7165.html
https://www.digitallab.be/en/
Co-located with COLING 2022, at VarDial we anticipate discussion on computational methods and on language resources for closely related languages, language varieties and dialects. We plan to organize VarDial 2022 as a hybrid workshop with options for both on-site and remote participation. We accept paper submissions until July 22, 2022 (details below).
https://sites.google.com/view/vardial-2022
We welcome papers dealing with one or more of the following topics:
- Language resources and tools for similar languages, varieties and dialects;
- Adaptation of tools (taggers, parsers) for similar languages, varieties and dialects;
- Evaluation of language resources and tools when applied to language varieties;
- Reusability of language resources in NLP applications (e.g., for machine translation, POS tagging, syntactic parsing, etc.);
- Corpus-driven studies in dialectology and language variation;
- Computational approaches to the study of mutual intelligibility between dialects and similar languages;
- Automatic identification of lexical variation;
- Automatic classification of language varieties;
- Text similarity and adaptation between language varieties;
- Linguistic issues in the adaptation of language resources and tools (e.g., semantic discrepancies, lexical gaps, false friends);
- Machine translation between closely related languages, language varieties and dialects.
In addition to the topics listed above, we also welcome papers dealing with diachronic language variation (e.g. phylogenetic methods, historical dialects).
Instructions for Authors
Submissions should be formatted according to the COLING template and submitted in PDF format. The review process will be double-blind. More information on the website.
Important Dates
Submission deadline: EXTENDED TO JULY 22, 2022 (anywhere on earth)
Notification of acceptance: August 22, 2022
Camera-ready papers due: September 5, 2022
VarDial Workshop at COLING 2022: October 16, 2022
Organizers
Yves Scherrer - University of Helsinki (Finland)
Tommi Jauhiainen - University of Helsinki (Finland)
Nikola Ljubešić - Jožef Stefan Institute (Slovenia) and University of Zagreb (Croatia)
Preslav Nakov - Qatar Computing Research Institute, HBKU (Qatar)
Jörg Tiedemann - University of Helsinki (Finland)
Marcos Zampieri - Rochester Institute of Technology (USA)
Contact: yves.scherrer(a)helsinki.fi
--- apologies for cross-postings ---
Dear colleagues,
We have an open position for a postdoctoral researcher on natural
language processing / information retrieval / machine learning (SCAI/BnF
research program)
Starting period: autumn 2022
Duration: 12-month postdoctoral contract (renewable)
Location: Sorbonne University (ISIR lab, MLIA team) / DataLab of
the BnF
Supervision:
Laure Soulier, MCF in computer science at Sorbonne University, MLIA
team, ISIR.
Emmanuelle Bermès, Scientific and Technical Assistant to the Director of
Services and Networks at BnF.
Jean-Philippe Moreux, Scientific expert of Gallica at the BnF.
More info:
https://scai.sorbonne-universite.fr/public/news/view/27d72d260c950c8d66c6/1
_*Context*_
Gallica, the digital library of the BnF, contains nearly 10 million
digitized documents that are freely accessible online (18.5 million
visits per year). However, most users do not know that Gallica contains
not only printed documents, but also photographs, sound recordings,
videos, and 3D objects. In satisfaction surveys, only a minority of
users consider the search engine's answers to be relevant and a majority
would like to be better guided in their searches. A recommendation
system should be able to help users find their way through the mass of
collections and improve the visibility of the least known. In this
project, BnF is committed to adopting a resolutely ethical approach. The
exploitation of user logs must respect their privacy and guarantee both
the relevance and transparency of the algorithms, avoiding the risk of
filter bubbles. The interface design is also at the heart of the
approach: a trustworthy system relies on a good user experience and on
the diversity and relevance of the proposed recommendations. Three lines
of thought emerge:
1) based on the available data, including both user logs and collection
descriptions, how can predictive algorithms be developed?
2) how can diversity be integrated into the recommendation algorithm while
letting users adjust their own serendipity threshold?
3) how can user trust in algorithm design and audit be built?
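As one possible reading of the second question, the sketch below uses greedy Maximal Marginal Relevance (MMR) re-ranking, a well-known technique in which a single weight trades relevance against redundancy. MMR is offered here only as an example; it is not stated to be the project's method, and the item names, scores, and similarity function are hypothetical.

```python
def mmr_rerank(relevance, similarity, k, diversity):
    """Greedy Maximal Marginal Relevance re-ranking.

    relevance:  dict mapping item -> relevance score
    similarity: function (item, item) -> similarity in [0, 1]
    k:          number of items to return
    diversity:  user-chosen weight in [0, 1]; 0 means pure relevance
    """
    selected = []
    candidates = set(relevance)
    while candidates and len(selected) < k:
        def score(item):
            # Penalise items that resemble something already recommended.
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return (1 - diversity) * relevance[item] - diversity * redundancy

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical catalogue: two books and one photograph.
relevance = {"book_a": 0.9, "book_b": 0.85, "photo_c": 0.6}
kind = {"book_a": "book", "book_b": "book", "photo_c": "photo"}


def same_kind(a, b):
    return 1.0 if kind[a] == kind[b] else 0.0


print(mmr_rerank(relevance, same_kind, k=2, diversity=0.0))
print(mmr_rerank(relevance, same_kind, k=2, diversity=0.5))
```

With `diversity=0.0` the two books are returned; raising it to `0.5` swaps the second book for the photograph. Exposing that single parameter is one way a user could set their own threshold.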
_*Main missions*_
This project consists in working on information access in the Gallica
library, from the point of view of machine and deep learning techniques.
The research axes concern (1) the analysis and indexing of textual
documents as well as (2) the analysis of user traces and (3)
recommendation systems. We are particularly interested in multimodal
techniques that allow contextualizing a document or a query based on
user interactions.
The successful candidate will be responsible for:
● Implementing models to learn the semantics of textual data for the
purpose of indexing them.
● Developing algorithms based on representation learning methodologies
to effectively blend text and user traces.
● Reporting and presenting development work in a clear and effective
manner, both for discussion with BnF experts and writing machine
learning publications.
The printed book collection will be the primary focus of the program
described above, but an extension to other collections with textual
descriptors (in particular iconographic collections) may be considered.
--
-------------
Laure Soulier
Maître de conférences
Equipe MLIA - Laboratoire ISIR - Sorbonne Université
Tour 26, Couloir 26-00, Bureau 515
(+33) 1 44 27 74 91
https://pages.isir.upmc.fr/soulier/