Postdoc position at IRIT, Toulouse (France) – ANR AnDiAMO
Developing systems towards robust discourse parsing and its application
https://pagesperso.irit.fr/~Chloe.Braud/andiamo/

* Contract duration: 12 months (flexible)
* Starting date: December 2022 (flexible)
* Location: IRIT, Université P. Sabatier (Toulouse III)
* Remuneration: starting at 2,745 euros, gross salary, depending on experience
* Application deadline: the position will remain open until filled
* Send application by email to chloe.braud@irit.fr

Application procedure: please send a CV and a short letter motivating your application by detailing the following elements:
* Indicate your **skills in machine learning**, e.g. the type of tasks you have already worked on, the type of algorithms, the libraries used. Please specify your experience with neural architectures and pre-trained language models.
* Describe your **interest and/or experience in natural language processing**, i.e. the type of tasks you have already tried to solve, if any, or similar problems you have worked on, or why you now want to work in NLP and why you think your experience in another domain could be relevant.
* If you are interested but do not have a PhD, and instead hold a master's or engineering degree, and your CV fits the requirements, please send me an email with the same information as above.

Incomplete applications will not be considered.

The AnDiAMO project:
Natural Language Processing (NLP) is a domain at the frontier of AI, computer science and linguistics, aiming at developing systems able to automatically analyze textual documents. Within NLP, discourse parsing is a crucial but challenging task: its goal is to produce structures describing the relationships (e.g. explanation, contrast...) between spans of text in full documents, making it possible to draw inferences about their content.
Developing high-performing and robust discourse parsers could help to improve downstream applications such as automatic summarization or translation, question answering, and chatbots, e.g. [1,2,3]. However, current performance is still low, mainly due to the lack of annotated data (see e.g. [4] on monologues, [5] on dialogues, [6,7] for the multilingual setting). In order to develop robust discourse parsers within the AnDiAMO project, we want to explore multi-objective settings, where the goal is ultimately to perform a discourse analysis while relying on a related objective, such as performing well on another task (e.g. morphological, syntactic or temporal analysis) or an application (e.g. sentiment analysis or argument mining). We will also explore the issues of cross-language and cross-framework learning.

The position is funded by the ANR AnDiAMO project, for which an engineer has already been hired; master's interns will also be recruited. Collaborations are planned with researchers in Toulouse, Grenoble, Nancy and Munich. The hired person will be part of the MELODI team at IRIT, participating in team and project meetings, and co-authoring articles.

Research plan:
The recruited candidate will work on one or several of the following topics, depending on their interests:
- Data representation: Discourse processing requires information from various levels of linguistic analysis. Existing studies do not make clear what kind of information is important and needed, and some potentially relevant sources of information are ignored. We plan to explore this issue within a multi-task learning setting, where a system has to jointly learn different tasks. We will experiment on classification tasks (discourse relation, segmentation) and on full discourse parsing.
- Transferring to new languages, domains and modalities: Developing systems that perform well on domains or languages that are different from those used at training time is crucial, especially if the adaptation can be done in an unsupervised way. It is especially important for discourse, since annotation is very hard and time-consuming. We plan to experiment with cross-lingual embeddings and to explore multi-task learning, while trying to understand how to integrate additional linguistic information with only little annotated data for auxiliary tasks. We also want to investigate dialogues, for which only a few discourse parsers exist, and to better understand how they differ from monologues.
- Extrinsic evaluation: We will investigate a few downstream applications that could benefit from discourse information, as a way to provide an extrinsic evaluation. We will explore pipeline systems, varying the way we encode the discourse information as input to our end system. We will also explore transfer learning strategies, either via multi-task learning or representation learning. We plan to start with cognitive impairment detection (e.g. schizophrenia, Alzheimer's disease) and argument mining. More applications will be considered, depending on the interests of the recruited postdoc.

It will be possible to investigate other paths of research, such as few-shot or unsupervised learning, depending on the interests of the recruited candidate.

Profile
* PhD degree in computer science or computational linguistics
* Good knowledge of machine learning is required
* Interest in language technology / NLP
* Good programming skills: preferably with Python, knowledge of PyTorch is a plus

References
[1] Feng, X., Feng, X., Qin, B., and Geng, X. Dialogue Discourse-Aware Graph Model and Data Augmentation for Meeting Summarization. In Proceedings of IJCAI. 2019.
[2] Bawden, R., Sennrich, R., Birch, A., and Haddow, B. Evaluating Discourse Phenomena in Neural Machine Translation. In Proceedings of NAACL. 2018.
[3] Xu, J., Gan, Z., Cheng, Y., and Liu, J. Discourse-Aware Neural Extractive Text Summarization. In Proceedings of ACL. 2020.
[4] Koto, F., Lau, J. H., and Baldwin, T. Top-down Discourse Parsing via Sequence Labelling. In Proceedings of EACL. 2021.
[5] Liu, Z., and Chen, N. Improving Multi-Party Dialogue Discourse Parsing via Domain Integration. In Proceedings of the 2nd Workshop on Computational Approaches to Discourse. 2021.
[6] Braud, C., Coavoux, M., and Søgaard, A. Cross-lingual RST Discourse Parsing. In Proceedings of EACL. 2017.
[7] Liu, Z., Shi, K., and Chen, N. DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing. In Proceedings of the 2nd Workshop on Computational Approaches to Discourse. 2021.