--- apologies for cross-postings ---
Dear colleagues,
We have an open position for a postdoctoral researcher in natural language processing / information retrieval / machine learning (SCAI/BnF research program).
Starting period: autumn 2022
Duration: 12-month postdoctoral contract (renewable)
Location: Sorbonne University (MLIA team, ISIR lab) /
DataLab of the BnF
Supervision:
Laure Soulier, Associate Professor (MCF) in computer science at Sorbonne University,
MLIA team, ISIR.
Emmanuelle Bermès, Scientific and Technical Assistant to the
Director of Services and Networks at BnF.
Jean-Philippe Moreux, Scientific expert of Gallica at the BnF.
More info:
https://scai.sorbonne-universite.fr/public/news/view/27d72d260c950c8d66c6/1
Context
Gallica, the digital library of the BnF, contains nearly 10
million digitized documents that are freely accessible online
(18.5 million visits per year). However, most users do not know
that Gallica contains not only printed documents, but also
photographs, sound recordings, videos, and 3D objects. In
satisfaction surveys, only a minority of users consider the search
engine's answers to be relevant and a majority would like to be
better guided in their searches. A recommendation system should be
able to help users find their way through the mass of collections
and improve the visibility of the least known. In this project,
BnF is committed to adopting a resolutely ethical approach. The
exploitation of user logs must respect their privacy and guarantee
both the relevance and transparency of the algorithms, avoiding
the risk of filter bubbles. The interface design is also at the
heart of the approach: a trustworthy system relies on a good user
experience and on the diversity and relevance of the proposed
recommendations. Three lines of thought emerge:
1) based on the available data, including both user logs and
collection descriptions, how to develop predictive algorithms?
2) how to integrate diversity into the recommendation algorithm
while letting users adjust their own serendipity
threshold?
3) how to build user trust in algorithm design and audit?
Main missions
This project focuses on information access in the
Gallica library using machine learning and deep
learning techniques. The research axes cover (1) the analysis
and indexing of textual documents, (2) the analysis of
user traces, and (3) recommendation systems. We are particularly
interested in multimodal techniques that contextualize a
document or a query based on user interactions.
The successful candidate will be responsible for:
● Implementing models that learn the semantics of textual data in
order to index them.
● Developing algorithms based on representation learning
methodologies to effectively blend text and user traces.
● Reporting and presenting development work in a clear and
effective manner, both for discussions with BnF experts and for writing
machine learning publications.
The printed book collection will be the primary focus of the
program described above, but an extension to other collections
with textual descriptors (in particular iconographic collections)
may be considered.
--
Laure Soulier
Maître de conférences
Equipe MLIA - Laboratoire ISIR - Sorbonne Université
Tour 26, Couloir 26-00, Bureau 515
(+33) 1 44 27 74 91
https://pages.isir.upmc.fr/soulier/