Please find below an offer for a PhD student contract in Natural Language Processing at the University of Lorraine, Nancy, France.
Subject: Automatic generation of explanations for multiword expressions in the context of language learning
Thesis supervisors: Mathieu Constant (ATILF, Univ. Lorraine, France) and Patrick Watrin (CENTAL, Univ. of Louvain, Belgium)
Funding: three years, funded by the ANR STAR-FLE project
Start date: 1 October 2024
Salary: 2135,00 € gross monthly
Host laboratory: ATILF (Computer Processing and Analysis of the French Language)
Location: Nancy, France
Application deadline: 11 July 2024
Scientific background:
The successful candidate will join the ATILF, a research unit in language sciences, and in particular the research group on natural language processing (NLP). This research group works, among other things, on exploiting recent NLP models for linguistic modelling (e.g. lexical modelling) with applications in the medical field and language learning. In particular, its work is based on the integration of large (generative) language models and knowledge bases (e.g. scientific textual data, lexical resources).
More specifically, the thesis will be part of the STAR-FLE project (STrategic Adaptations for better Reading and Text Comprehension in FFL) funded by the Agence Nationale de la Recherche for 4 years (2024-2027). The project is in the field of computer-assisted language teaching. The aim of STAR-FLE is to gain a better understanding of the difficulties encountered by learners of French as a foreign language (FFL) when faced with the lexicon present in authentic texts. It will propose digital solutions based on natural language processing (NLP) to facilitate text comprehension and enable teachers to better manage heterogeneous levels in the classroom. Contextual aids and personalized vocabulary adaptations are envisaged, particularly for multiword expressions.
Objectives:
The thesis will focus on multiword expressions. These are combinations of several lexical units that combine irregularly on one or more linguistic levels (morphology, syntax, semantics, etc.). The term covers a wide variety of phenomena, such as idiomatic expressions (run around in circles, dry run), support verb constructions (take a walk), complex functional units (in spite of), etc. This non-compositionality, which can lead to a certain semantic opacity, can pose problems for learners when reading. In this thesis, the person recruited will develop methods based on recent NLP techniques to produce in-context explanatory cards that help learners better understand these expressions. Depending on the context in which the expression occurs, the production of these cards will be based on the prediction of linguistic properties (e.g. a dry run is not dry), on the generation of natural language explanations using large generative language models (e.g. paraphrases), or on semantic linking to different lexical resources (e.g. to retrieve definitions and lexical neighbors). One of the challenges will be to propose explanatory cards adapted to the learner's level.
Application requirements and procedures:
Candidates should have the following skills and profile:
- a Master's degree in computational linguistics, natural language processing, computer science or cognitive science
- very good programming skills
- very good knowledge of recent natural language processing models (e.g. large language models)
Applications should include a cover letter, CV and Master's grades, together with references or one or more letters of recommendation. They should be submitted at the following URL: https://emploi.cnrs.fr/Offres/Doctorant/UMR7118-SABMAR-020/Default.aspx?lang=EN
For more information, do not hesitate to contact Mathieu Constant (Mathieu.Constant@univ-lorraine.fr).