[Corpora-List] Postdoctoral Researcher position in Artificial Intelligence and Natural Language Processing (SCAI-Sorbonne/BnF research program)

6 Jul 2022


      --- apologies for cross-postings ---
Dear colleagues,
We have an open position for a postdoctoral researcher in natural 
language processing / information retrieval / machine learning (SCAI/BnF 
research program).
Starting period: autumn 2022
Duration: 12-month postdoctoral contract (renewable)
Location: Sorbonne University (ISIR lab, MLIA team) / DataLab of 
the BnF
Supervision:
Laure Soulier, Maître de conférences (MCF, associate professor) in 
computer science at Sorbonne University, MLIA team, ISIR.
Emmanuelle Bermès, Scientific and Technical Assistant to the Director of 
Services and Networks at BnF.
Jean-Philippe Moreux, Scientific expert for Gallica at the BnF.
More info: 
https://scai.sorbonne-universite.fr/public/news/view/27d72d260c950c8d66c6/1
_*Context*_
Gallica, the digital library of the BnF, contains nearly 10 million 
digitized documents that are freely accessible online (18.5 million 
visits per year). However, most users do not know that Gallica contains 
not only printed documents, but also photographs, sound recordings, 
videos, and 3D objects. In satisfaction surveys, only a minority of 
users consider the search engine's answers to be relevant and a majority 
would like to be better guided in their searches. A recommendation 
system should help users find their way through the mass of 
collections and improve the visibility of the least-known ones. In this 
project, the BnF is committed to a resolutely ethical approach: the 
exploitation of user logs must respect users' privacy and guarantee both 
the relevance and the transparency of the algorithms, avoiding the risk 
of filter bubbles. Interface design is also at the heart of the 
approach: a trustworthy system relies on a good user experience and on 
the diversity and relevance of the proposed recommendations. Three lines 
of inquiry emerge:
1) how can predictive algorithms be developed from the available data, 
including both user logs and collection descriptions?
2) how can diversity be integrated into the recommendation algorithm 
while letting users adjust their own serendipity threshold?
3) how can user trust be built into algorithm design and auditing?
_*Main missions*_
This project focuses on information access in the Gallica library from 
the point of view of machine learning and deep learning techniques. 
The research axes are (1) the analysis and indexing of textual 
documents, (2) the analysis of user traces, and (3) recommendation 
systems. We are particularly interested in multimodal techniques that 
allow a document or a query to be contextualized based on user 
interactions.
The successful candidate will be responsible for:
● Implementing models to learn the semantics of textual data for the 
purpose of indexing them.
● Developing algorithms based on representation learning methodologies 
to effectively blend text and user traces.
● Reporting and presenting development work clearly and effectively, 
both in discussions with BnF experts and in machine learning 
publications.
The printed book collection will be the primary focus of the program 
described above, but an extension to other collections with textual 
descriptors (in particular iconographic collections) may be considered.
-- 
-------------
Laure Soulier
Maître de conférences
Equipe MLIA - Laboratoire ISIR - Sorbonne Université
Tour 26, Couloir 26-00, Bureau 515
(+33) 1 44 27 74 91
https://pages.isir.upmc.fr/soulier/