Dear all,
We have a PhD opportunity in NLP and computational linguistics on the automatic analysis of the human ability to collaborate in dyadic and group conversations, for educational applications: https://jobs.inria.fr/public/classic/en/offres/2024-07248 . Though the offer description in the link is in French, we strongly encourage non-French speakers to apply as well! The offer is translated into English below.
Prospective candidates are encouraged to get in touch with us as soon as possible.
Looking forward to hearing from you,
Maria Boritchev and Chloé Clavel
______________________________________
Automatic analysis of human capacity to collaborate during dyadic and group conversations, for educational applications.
Context and scientific objectives
Work on dialog that applies NLP and deep learning approaches to Dialog Act prediction or sentiment analysis integrates the conversational aspects by capturing contextual dependencies between utterances, using recurrent neural networks (RNN) or convolutional neural networks (CNN) for supervised learning (Bapna et al., 2017). Inter-speaker dynamics have also recently started to be integrated. For example, in (Hazarika et al., 2018), intra-speaker dynamics are modeled using a GRU (Gated Recurrent Unit). Structures more complex than flat sequences of utterances are also investigated, by leveraging hierarchical neural architectures (Chapuis et al., 2020) or by using graphs in the neural architectures (Ghosal et al., 2019). The conversational aspects and contextual dependencies between the labels are also modeled using sequential decoders and attention mechanisms for NLP-oriented Dialog Act classification (Colombo et al., 2020). Regarding neural architectures dedicated to generating an agent’s behavior, a few studies in affective computing attempt to integrate collaborative processes. These studies concern the generation of an agent’s non-verbal behaviors related to social stances (Dermouche & Pelachaud, 2016), where Long Short-Term Memory (LSTM) architectures are used as a black box to model inter-speaker dynamics. Other studies that do not rely on neural architectures address the question of selecting the agent’s utterance or best dialog policy (e.g., conversation strategies such as hedging or self-disclosure, or extroverted or introverted linguistic styles) according to the user’s social behaviors (multimodal behavior in (Ritschel et al., 2017) and verbal behavior in (Pecune & Marsella, 2020)). In both studies, a social reward is built for reinforcement learning.
A recent study investigates a neural architecture (a BERT-based model named CoBERT) trained on empathetic conversations for response selection, but it offers no way to select the level or kind of empathy that is most relevant (Zhong et al., 2020).
While these existing neural architectures (convolutional, recurrent and transformer) for tracking a speaker’s state in conversations are extremely promising because they model inter-speaker dynamics and the sequential structure of the conversation, the phenomena they detect are restricted to sentiment, emotions, or dialogue acts. What is still missing in the module dedicated to tracking the user’s state in modular conversational systems is the consideration of collaborative processes as a joint action of the user and the agent to understand each other, maintain the flow of the interaction and create a social relationship. The aforementioned neural approaches are very effective, but they are not very data-efficient. In many use cases, the amount of available data is not sufficient for these methods, particularly for deep learning. This is notably the case in educational contexts, where the data is confidential, especially when children are involved: it is considered personal data and is therefore subject to the GDPR (https://gdpr-info.eu/). Computational linguistics provides other approaches to the analysis of conversations, which are symbolic and logic-based. These approaches rely on small amounts of data and focus on specific phenomena, such as the management of implicit implications and information in dialogues (Breitholtz, 2020) and of various contexts (Rebuschi, 2017). Segmented Discourse Representation Theory (SDRT, Asher and Lascarides, 2003) is one of the most widely used frameworks for dialogue analysis, within both formal and neural approaches to dialogue. Another approach is to hybridize knowledge graphs that model social commonsense with large language models (Kim et al., 2023). The objective of the thesis is to investigate approaches that hybridize neural and symbolic models.
The approaches will be dedicated to analysing and controlling the level of collaboration between participants in conversations (e.g., misunderstanding analysis and management) through their verbal expressions. We will focus on educational applications such as the analysis of classroom dynamics and student engagement, and conversational systems for supporting students with difficulties or for learning social skills, following the ethical guidelines defined in (1).
(1) https://web-archive.oecd.org/2020-07-23/559610-trustworthy-artificial-intell...
(Breitholtz, 2020) Breitholtz, E. (2020). Enthymemes in Dialogue. Brill.
(Asher and Lascarides, 2003) Asher, N. and Lascarides, A. (2003). Logics of conversation. Cambridge University Press.
(Rebuschi, 2017) Rebuschi, M. (2017). Schizophrenic conversations and context shifting. In International and Interdisciplinary Conference on Modeling and Using Context, pages 708–721. Springer.
(Kim et al., 2023) Kim, H., Hessel, J., Jiang, L., West, P., Lu, X., Yu, Y., Zhou, P., Bras, R., Alikhani, M., Kim, G., Sap, M., and Choi, Y. (2023). SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 12930–12949, Singapore. Association for Computational Linguistics.
Supervision:
Thesis supervisor: Chloé Clavel, senior researcher, ALMAnaCH team, Inria Paris
Co-supervisor: Maria Boritchev, associate professor, S2a team, Telecom-Paris