PhD in ML/NLP – Efficient, fair, robust and knowledge-informed self-supervised learning for speech processing
Starting date: November 1st, 2022 (flexible)
Application deadline: September 5th, 2022
Interviews (tentative): September 19th, 2022
Salary: ~2000€ gross/month (social security included)
Mission: research oriented (teaching possible but not mandatory)
*Keywords:* speech processing, natural language processing, self-supervised learning, knowledge-informed learning, robustness, fairness
*CONTEXT*
The ANR project E-SSL (Efficient Self-Supervised Learning for Inclusive and Innovative Speech Technologies) will start on November 1st 2022. Self-supervised learning (SSL) has recently emerged as one of the most promising artificial intelligence (AI) methods, as it is now feasible to take advantage of the colossal amounts of existing unlabeled data to significantly improve the performance of various speech processing tasks.
*PROJECT OBJECTIVES*
Recent SSL models for speech such as HuBERT or wav2vec 2.0 have shown an impressive impact on downstream task performance. This is mainly due to their ability to benefit from a large amount of data, at the cost of a tremendous carbon footprint, rather than from improved learning efficiency. Another concern with SSL models is their unpredictable behaviour once applied to realistic scenarios, which exposes their lack of robustness. Furthermore, as for any pre-trained models applied in society, it is important to be able to measure the bias of such models, since they can amplify social unfairness.
The goals of this PhD position are threefold:
- to design new evaluation metrics for speech SSL models;
- to develop knowledge-driven SSL algorithms;
- to propose methods for learning robust and unbiased representations.
SSL models are evaluated with downstream task-dependent metrics, e.g., word error rate for speech recognition. This couples the evaluation of the universality of SSL representations to a potentially biased and costly fine-tuning that also hides the efficiency information related to the pre-training cost. In practice, we will seek to measure training efficiency as the ratio between the amount of data, computation and memory needed to observe a certain gain in performance on a metric of interest, i.e., downstream-dependent or not. The first step will be to document standard markers that can be used to assess these quantities robustly at training time. Potential candidates are, for instance, floating point operations for computational intensity, number of neural parameters coupled with precision for storage, online measurement of memory consumption for training, and cumulative input sequence length for data.
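As a purely illustrative sketch of how such markers could be combined into efficiency ratios, the snippet below logs hypothetical cost figures for one pre-training run and reports the gain on a metric of interest per unit of each resource; the field names and numbers are invented for the example and do not describe the project's actual protocol.

```python
# Illustrative only: hypothetical efficiency markers for one pre-training run.
from dataclasses import dataclass

@dataclass
class PretrainingCost:
    flops: float               # estimated floating point operations (compute)
    n_params: int              # number of neural parameters
    bytes_per_param: int       # numerical precision, for storage cost
    peak_memory_gb: float      # measured peak memory during training
    cum_input_seconds: float   # cumulative input sequence length (data)

def efficiency_ratios(metric_gain: float, cost: PretrainingCost) -> dict:
    """Gain on a metric of interest (downstream-dependent or not) per unit of cost."""
    return {
        "gain_per_exaflop": metric_gain / (cost.flops / 1e18),
        "gain_per_gb_storage": metric_gain / (cost.n_params * cost.bytes_per_param / 1e9),
        "gain_per_gb_peak_memory": metric_gain / cost.peak_memory_gb,
        "gain_per_khour_audio": metric_gain / (cost.cum_input_seconds / 3600 / 1000),
    }

# Example: +2.5 absolute points on some evaluation metric for an invented run.
cost = PretrainingCost(flops=3e19, n_params=95_000_000, bytes_per_param=4,
                       peak_memory_gb=320.0, cum_input_seconds=960 * 3600)
print(efficiency_ratios(2.5, cost))
```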
Most state-of-the-art SSL models for speech rely on masked prediction (e.g., HuBERT and WavLM) or contrastive losses (e.g., wav2vec 2.0). Such prevalence in the literature is mostly linked to the size, amount of data and computational resources injected by the companies producing these models. In fact, vanilla masking approaches and contrastive losses may be identified as uninformed solutions, as they do not benefit from in-domain expertise. For instance, it has been demonstrated that blindly masking frames in the input signal (as in HuBERT and WavLM) results in much worse downstream performance than using unsupervised phonetic boundaries [Yue2021] to generate informed masks. Recently, some studies have demonstrated the superiority of an informed multitask learning strategy, carefully selecting self-supervised pretext tasks with respect to a set of downstream tasks, over the vanilla wav2vec 2.0 contrastive learning loss [Zaiem2022]. In this PhD project, our objective is: 1. to continue developing knowledge-driven SSL algorithms that reach higher efficiency ratios and better results in terms of convergence, data consumption and downstream performance; and 2. to scale these novel approaches to a point enabling comparison with current state-of-the-art systems, thereby motivating a paradigm change in SSL for the wider speech community.
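To make the contrast between uninformed and informed masking concrete, here is a toy sketch that masks whole segments delimited by phone-like boundaries instead of independent frames; it assumes boundaries produced by some unsupervised phone segmenter (not shown) and is not the actual HuBERT/WavLM or [Yue2021] implementation.

```python
# Toy comparison of uninformed (random frame) vs. informed (segment-level) masking.
import numpy as np

def random_frame_mask(n_frames: int, mask_prob: float = 0.15, rng=None) -> np.ndarray:
    """Vanilla uninformed masking: each frame is masked independently."""
    rng = rng or np.random.default_rng()
    return rng.random(n_frames) < mask_prob

def phone_informed_mask(n_frames: int, boundaries: list, mask_prob: float = 0.15,
                        rng=None) -> np.ndarray:
    """Mask entire phone-like segments given by frame-index boundaries."""
    rng = rng or np.random.default_rng()
    mask = np.zeros(n_frames, dtype=bool)
    edges = [0] + sorted(boundaries) + [n_frames]
    for start, end in zip(edges[:-1], edges[1:]):
        if rng.random() < mask_prob:
            mask[start:end] = True   # mask the whole segment at once
    return mask

# Example: 100 frames with hypothetical boundaries from an unsupervised segmenter.
print(phone_informed_mask(100, boundaries=[12, 30, 47, 66, 88]).sum(), "frames masked")
```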
Despite remarkable performance on academic benchmarks, SSL-powered technologies (e.g., speech and speaker recognition, speech synthesis and many others) may exhibit highly unpredictable results once applied to realistic scenarios. This can translate into a global accuracy drop due to a lack of robustness to adverse acoustic conditions, or into biased and discriminatory behaviors with respect to different pools of end users. Documenting and facilitating the control of such aspects prior to the deployment of SSL models into real life is necessary for the industrial market. To address these aspects, within the project, we will create novel robustness regularization and debiasing techniques along two axes: 1. debiasing and regularizing speech representations at the SSL level; 2. debiasing and regularizing downstream-adapted models (e.g., using a pre-trained model).
To ensure the creation of fair and robust SSL pre-trained models, we propose to act both at the optimization and data levels, following some of our previous work on adversarial protected-attribute disentanglement and the NLP literature on data sampling and augmentation [Noé2021]. Here, we wish to extend this technique to more complex SSL architectures and more realistic conditions by increasing the disentanglement complexity (the sex attribute studied in [Noé2021] is particularly easy to discriminate). Then, to benefit from the expert knowledge induced by the scope of the task of interest, we will build on the recently introduced task-dependent counterfactual equal odds criteria [Sari2021] to minimize the downstream performance gap observed between individuals with different protected attributes and to maximize the overall accuracy. Following this multi-objective optimization scheme, we will then inject further identified constraints, as inspired by previous NLP work [Zhao2017]. Intuitively, constraints are injected so that the predictions are calibrated towards a desired, i.e. unbiased, distribution.
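As one simplified instance of adversarial protected-attribute disentanglement, the sketch below uses a gradient-reversal layer in PyTorch; the module and variable names are hypothetical, and this is an illustration in the spirit of [Noé2021], not its implementation.

```python
# Sketch: an adversary predicts the protected attribute from the representation,
# while the reversed gradient pushes the encoder to discard that information.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back into the encoder.
        return -ctx.lambd * grad_output, None

class AttributeAdversary(nn.Module):
    def __init__(self, dim: int, n_classes: int, lambd: float = 1.0):
        super().__init__()
        self.lambd = lambd
        self.head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, n_classes))

    def forward(self, representation: torch.Tensor) -> torch.Tensor:
        return self.head(GradReverse.apply(representation, self.lambd))

# Hypothetical usage during pre-training:
#   total_loss = ssl_loss + nn.functional.cross_entropy(adversary(pooled_repr), attribute_labels)
```

The scalar `lambd` controls the trade-off between the SSL objective and how aggressively attribute information is removed from the shared representation.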
*SKILLS*
* Master 2 in Natural Language Processing, Speech Processing, computer science or data science.
* Good command of Python programming and deep learning frameworks.
* Previous experience in self-supervised learning, acoustic modeling or ASR would be a plus.
* Very good communication skills in English.
* Good command of French would be a plus but is not mandatory.
*SCIENTIFIC ENVIRONMENT*
The thesis will be conducted within the GETALP team of the LIG laboratory (https://lig-getalp.imag.fr/) and the LIA laboratory (https://lia.univ-avignon.fr/). The GETALP team and the LIA have strong expertise and a solid track record in Natural Language Processing and speech processing. The recruited person will be welcomed within the teams, which offer a stimulating, multinational and pleasant working environment.
The means to carry out the PhD will be provided, both in terms of missions in France and abroad and in terms of equipment. The candidate will have access to the GPU clusters of both the LIG and LIA. Furthermore, access to the national supercomputer Jean-Zay will make it possible to run large-scale experiments.
The PhD position will be co-supervised by Mickael Rouvier (LIA, Avignon) and Benjamin Lecouteux and François Portet (Université Grenoble Alpes). Joint meetings are planned on a regular basis and the student is expected to spend time in both places. Moreover, the PhD student will collaborate with several team members involved in the project, in particular the two other PhD candidates who will be recruited and the partners from LIA, LIG and Dauphine Université PSL, Paris. Furthermore, the project will involve one of the founders of SpeechBrain, Titouan Parcollet, with whom the candidate will interact closely.
*INSTRUCTIONS FOR APPLYING*
Applications must contain: CV + letter/message of motivation + master's transcripts + be ready to provide letter(s) of recommendation; and be addressed to Mickael Rouvier (mickael.rouvier@univ-avignon.fr), Benjamin Lecouteux (benjamin.lecouteux@univ-grenoble-alpes.fr) and François Portet (francois.Portet@imag.fr). We celebrate diversity and are committed to creating an inclusive environment for all employees.
*REFERENCES:*
[Noé2021] Noé, P.-G., Mohammadamini, M., Matrouf, D., Parcollet, T., Nautsch, A. & Bonastre, J.-F. Adversarial Disentanglement of Speaker Representation for Attribute-Driven Privacy Preservation in Proc. Interspeech 2021 (2021), 1902–1906.
[Sari2021] Sarı, L., Hasegawa-Johnson, M. & Yoo, C. D. Counterfactually Fair Automatic Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing 29, 3515–3525 (2021).
[Yue2021] Yue, X. & Li, H. Phonetically Motivated Self-Supervised Speech Representation Learning in Proc. Interspeech 2021 (2021), 746–750.
[Zaiem2022] Zaiem, S., Parcollet, T. & Essid, S. Pretext Tasks Selection for Multitask Self-Supervised Speech Representation in AAAI, The 2nd Workshop on Self-supervised Learning for Audio and Speech Processing, 2023 (2022).
[Zhao2017] Zhao, J., Wang, T., Yatskar, M., Ordonez, V. & Chang, K.-W. Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (2017), 2979–2989.
PhD in ML/NLP – Fairness and self-supervised learning for speech processing
Starting date: November 1st, 2022 (flexible)
Application deadline: September 5th, 2022
Interviews (tentative): September 19th, 2022
Salary: ~2000€ gross/month (social security included)
Mission: research oriented (teaching possible but not mandatory)
*Keywords:* speech processing, fairness, bias, self-supervised learning, evaluation metrics
*CONTEXT*
The ANR project E-SSL (Efficient Self-Supervised Learning for Inclusive and Innovative Speech Technologies) will start on November 1st 2022. Self-supervised learning (SSL) has recently emerged as one of the most promising artificial intelligence (AI) methods, as it is now feasible to take advantage of the colossal amounts of existing unlabeled data to significantly improve the performance of various speech processing tasks.
*PROJECT OBJECTIVES*
Speech technologies are widely used in our daily life and are expanding the scope of our actions, with decision-making systems, including in critical areas such as health or legal matters. In these societal applications, the use of these tools raises the issue of possible discrimination against people according to criteria for which society requires equal treatment, such as gender, origin, religion or disability. Recently, the machine learning community has been confronted with the need to work on the possible biases of algorithms, and many works have shown that the search for the best performance is not the only goal to pursue [1]. For instance, recent evaluations of ASR systems have shown that performance can vary according to gender, but that these variations depend both on the data used for learning and on the models [2]. Such systems are therefore increasingly scrutinized for being biased, while trustworthy speech technologies definitely represent a crucial expectation.
Both the question of bias and the concept of fairness have now become important aspects of AI, and we now have to find the right balance between accuracy and fairness. Unfortunately, these notions of fairness and bias are challenging to define, and their meanings can differ greatly [3].
The goals of this PhD position are threefold:
- First, make a survey of the many definitions of robustness, fairness and bias, with the aim of coming up with definitions and metrics fit for speech SSL models;
- Then, gather speech datasets with a high amount of well-described metadata;
- Set up an evaluation protocol for SSL models and analyze the results (a toy sketch of such a per-group evaluation follows this list).
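As a toy illustration of the kind of per-group evaluation such a protocol could include, the sketch below computes WER separately for each value of a metadata field and reports the gap between groups; the data, group labels and helper functions are invented for the example.

```python
# Toy per-group evaluation: WER per metadata group and the gap between groups.
from collections import defaultdict

def wer(ref: str, hyp: str) -> float:
    """Word error rate via word-level edit distance."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(r)][len(h)] / max(len(r), 1)

def per_group_wer(samples):
    """samples: iterable of (group, reference, hypothesis) triples."""
    scores = defaultdict(list)
    for group, ref, hyp in samples:
        scores[group].append(wer(ref, hyp))
    return {g: sum(v) / len(v) for g, v in scores.items()}

# Invented outputs of an SSL-based ASR system on a metadata-rich test set.
samples = [("F", "turn the lights on", "turn the light on"),
           ("M", "turn the lights on", "turn the lights on")]
by_group = per_group_wer(samples)
print(by_group, "gap:", max(by_group.values()) - min(by_group.values()))
```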
*SKILLS*
* Master 2 in Natural Language Processing, Speech Processing, computer science or data science.
* Good command of Python programming and deep learning frameworks.
* Previous experience in bias in machine learning would be a plus.
* Very good communication skills in English.
* Good command of French would be a plus but is not mandatory.
*SCIENTIFIC ENVIRONMENT*
The PhD position will be co-supervised by Alexandre Allauzen (Dauphine Université PSL, Paris) and Solange Rossato and François Portet (Université Grenoble Alpes). Joint meetings are planned on a regular basis and the student is expected to spend time in both places. Moreover, two other PhD positions are open in this project, and the students, along with the partners, will collaborate closely. For instance, specific SSL models along with evaluation criteria will be developed by the other PhD students. The PhD student will also collaborate with several team members involved in the project, in particular the two other PhD candidates who will be recruited and the partners from LIA, LIG and Dauphine Université PSL, Paris. The means to carry out the PhD will be provided, both in terms of missions in France and abroad and in terms of equipment. The candidate will have access to the GPU clusters of both the LIG and Dauphine Université PSL. Furthermore, access to the national supercomputer Jean-Zay will make it possible to run large-scale experiments.
*INSTRUCTIONS FOR APPLYING*
Applications must contain: CV + letter/message of motivation + master's transcripts + be ready to provide letter(s) of recommendation; and be addressed to Alexandre Allauzen (alexandre.allauzen@espci.psl.eu), Solange Rossato (Solange.Rossato@imag.fr) and François Portet (francois.Portet@imag.fr). We celebrate diversity and are committed to creating an inclusive environment for all employees.
*REFERENCES:*
[1] Mengesha, Z., Heldreth, C., Lahav, M., Sublewski, J. & Tuennerman, E. “I don’t Think These Devices are Very Culturally Sensitive.”—Impact of Automated Speech Recognition Errors on African Americans. Frontiers in Artificial Intelligence 4 (2021). issn: 2624-8212. https://www.frontiersin.org/article/10.3389/frai.2021.725911
[2] Garnerin, M., Rossato, S. & Besacier, L. Investigating the Impact of Gender Representation in ASR Training Data: a Case Study on Librispeech in Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing (2021), 86–92.
[3] Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K. & Galstyan, A. A Survey on Bias and Fairness in Machine Learning. ACM Comput. Surv. 54 (July 2021). issn: 0360-0300. https://doi.org/10.1145/3457607
PhD in NLP - PATRIMALP Materials, Pigments, Lights: the colors of Heritage – Natural Language Processing for cultural heritage
Starting date: October 01, 2022 (flexible)
Application deadline: September 5th, 2022
Interviews (tentative): September 12th, 2022
Salary: 1 975€ gross/month (social security included)
Mission: research oriented (teaching possible but not mandatory)
Keywords: natural language processing, knowledge representation, cultural heritage, transfer learning, multilingualism
CONTEXT
The main challenge of the Patrimalp project is the development of an integrated and interdisciplinary Heritage Science, in order to ensure cultural heritage sustainability, promotion and dissemination in contemporary society. The ambition is to produce forms of intelligibility of a global and moving process which starts from the collection of the raw material, its transformation into a primitive object, its different lives as a material (alterations, degradations, transformations...) and as a symbol (relegation, disinterest, oblivion or rebirth, exaltation...) throughout history, and finally its election as an object of historical and heritage value and its “promotion” into a work of art. This research is applied to understanding how inks and pigments have been conceived over several centuries, how they have been used in artworks, and how the handcrafting methods have evolved and been disseminated across centuries and countries.
To make this study possible, the project will gather a large collection of textual material made up of alchemical works and collections of natural or artificial objects collected between the 16th and 18th centuries. To better understand the choice of colors for these "wonders", we want to reconstruct the recipes for making colored material in their context of thought, whether technical or symbolic. These recipes will constitute a new body of research for literary scholars and a new case study for building knowledge about color. This corpus indeed offers modes of representation inscribed in complex forms of writing and fiction whose modalities and frames of reference remain to be analyzed (accounts of technical, medical or physico-chemical experiments inscribed in fictional or mythological worlds, symbolic descriptions of artifacts, or materials collected in nature or in mines). On the linguistic level, the inventory of this lexicon in different European and Eastern languages will lead to the formalization of the knowledge of these various skills across time and cultures. This corpus will thus provide complex data on the material and symbolic origin of the ingredients of color, on its use, its names and its physical or symbolic perception: these data represent a challenge for computer science researchers, who will have to organize them into ontologies representing the state of knowledge from the point of view of scholars over the ages, for the benefit of curators, chemists or physicists. To systematically explore the corpus of these recipes, we will use NLP techniques to uncover the correlations between recipes, the physical and chemical composition of objects, and symbolic references. The final objective is to build a knowledge base (objects, components of objects, materials, colors, know-how, reference framework), each part of which can reference a specific ontology (ontology of pigments, materials, colors...), to make it possible for researchers to observe the trajectory from the writing of color to its technical and artisanal practice from this specific corpus.
PHD OBJECTIVES
The PhD project will focus on segmenting, extracting and representing recipes from a corpus of alchemical works from the 16th to the 18th century to make them accessible to researchers in the humanities. This entails the following tasks:
- identify which excerpts of the text belong to a recipe;
- supervise an annotation campaign to build an analysis and training corpus;
- build NLP tools to automatically extract the list of elements (raw material, tools, quantity, units) and actions (verb, adverb, adjective) that make up the recipes;
- analyze the dependencies between the elements of a recipe;
- represent these rules in a formal knowledge representation (a toy illustration of such a representation follows this list).
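As a toy illustration of what such a formal representation could look like, the sketch below encodes one extracted recipe as RDF triples with the rdflib library; the namespace, classes and properties are hypothetical placeholders, not the project's ontology.

```python
# Toy knowledge-representation sketch: one extracted recipe as RDF triples.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/patrimalp/")   # hypothetical namespace
g = Graph()
g.bind("ex", EX)

recipe = EX["recipe_042"]                          # hypothetical identifier
g.add((recipe, RDF.type, EX.Recipe))
g.add((recipe, EX.producesColor, EX.Vermilion))
g.add((recipe, EX.usesMaterial, EX.Cinnabar))
g.add((recipe, EX.usesAction, EX.Grinding))
g.add((EX.Cinnabar, RDFS.label, Literal("cinabre", lang="fr")))

print(g.serialize(format="turtle"))
```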
The results of this processing will support:
- the documentation of this unique set of texts, by inserting the extracted elements into the document metadata to ease retrieval;
- the building of a knowledge base of alchemical recipes.
This PhD will need to address several challenges. One of them is to be able to process text composed of multiple non-modern languages (French, German, English, Latin, Greek) [Coavoux2022, Grobol2022]. One approach will be to study how large multilingual pre-trained models [Delvin2019, Xue2020] can be leveraged and adapted for the task, and how disparate collections of corpora of ancient texts can be used to fine-tune them (a minimal sketch of such an adaptation is given below). Another challenge will be the paucity of data for the downstream tasks (segmentation, parsing, Natural Language Understanding [Desot2022]); to address this, we will need to identify other related corpora (e.g., cooking recipes) and tackle the problem through multitask learning (e.g., joint NER and NLU) and transfer learning.
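A minimal sketch of the multilingual-adaptation idea mentioned above, assuming the Hugging Face transformers library, the public xlm-roberta-base checkpoint and a hypothetical BIO tag set for recipe elements; fine-tuning on the annotated corpus is not shown, so the predictions of the randomly initialized head are meaningless here.

```python
# Sketch: recipe-element extraction cast as token classification with a
# multilingual pre-trained encoder (to be fine-tuned on the annotated corpus).
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-MATERIAL", "I-MATERIAL", "B-TOOL", "I-TOOL",
          "B-QUANTITY", "I-QUANTITY", "B-ACTION", "I-ACTION"]   # hypothetical tag set

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForTokenClassification.from_pretrained(
    "xlm-roberta-base",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={l: i for i, l in enumerate(labels)},
)

# A recipe-like sentence in modern French; the real corpus mixes French, German,
# English, Latin and Greek, hence the multilingual encoder.
text = "Broyez une once de cinabre avec de l'huile de lin dans un mortier."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits                 # (1, n_subwords, n_labels)
pred = [labels[i] for i in logits.argmax(-1)[0].tolist()]
print(list(zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()), pred)))
```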
SKILLS
- Master 2 in Natural Language Processing, computer science or data science.
- Good command of Python programming and deep learning frameworks.
- Previous experience in text classification, parsing, processing of several languages or text retrieval would be a plus.
- Very good communication skills in English and a good command of French.
SCIENTIFIC ENVIRONMENT
The thesis will be conducted within the STEAMER and GETALP teams of the LIG laboratory (http://steamer.imag.fr/ and https://lig-getalp.imag.fr/). The GETALP team has strong expertise and a solid track record in Natural Language Processing, and the STEAMER team has strong expertise in knowledge representation and reasoning. The recruited person will be welcomed within the teams, which offer a stimulating, multinational and pleasant working environment. The means to carry out the PhD will be provided both in terms of missions in France and abroad and in terms of equipment (personal computer, access to the LIG GPU servers). The PhD candidate will collaborate with the partners involved in the PATRIMALP project, in particular with Laurence Rivière from the LUHCIE lab (Laboratoire Universitaire Histoire Cultures Italie Europe) and Véronique Adam from the LITT&ARTS lab (Littératures et Arts).
INSTRUCTIONS FOR APPLYING
Applications must contain: CV + letter/message of motivation + master's transcripts + be ready to provide letter(s) of recommendation; and be addressed to Danielle Ziebelin (Danielle.Ziebelin@univ-grenoble-alpes.fr), François Portet (francois.Portet@imag.fr) and Maximin Coavoux (Maximin.Coavoux@univ-grenoble-alpes.fr).
REFERENCES
[Coavoux2022] Maximin Coavoux, Corinne Denoyelle, Olivier Kraif, Julie Sorba. Phraséologie du roman médiéval en prose. Diachro X – le français en diachronie, Sorbonne Université, May 2022, Paris, France.
[Delvin2019] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL.
[Desot2022] Desot, T., Portet, F., & Vacher, M. (2022). End-to-End Spoken Language Understanding: Performance analyses of a voice command task in a low resource setting. Computer Speech & Language, 75, 101369.
[Grobol2022] Loïc Grobol, Mathilde Regnault, Pedro Ortiz Suarez, Benoît Sagot, Laurent Romary and Benoit Crabbé. BERTrade: Using Contextual Embeddings to Parse Old French. 13th International Conference on Language Resources and Evaluation (LREC 2022), May 2022, Marseille, France.
[Xue2020] Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou, R., Siddhant, A., ... & Raffel, C. (2020). mT5: A massively multilingual pre-trained text-to-text transformer. arXiv preprint arXiv:2010.11934.
PhD in ML/NLP – Fairness and self-supervised learning for speech processing
Starting date: October 1st, 2023 (flexible)
Application deadline: June 9th, 2023
Interviews (tentative): June 14th, 2023
Salary: ~2000€ gross/month (social security included)
Mission: research oriented (teaching possible but not mandatory)
*Keywords:* speech processing, fairness, bias, self-supervised learning, evaluation metrics
*CONTEXT*
This thesis is in the context of the ANR project E-SSL (Efficient Self-Supervised Learning for Inclusive and Innovative Speech Technologies). Self-supervised learning (SSL) has recently emerged as one of the most promising artificial intelligence (AI) methods, as it is now feasible to take advantage of the colossal amounts of existing unlabeled data to significantly improve the performance of various speech processing tasks.
*PROJECT OBJECTIVES*
Speech technologies are widely used in our daily life and are expanding the scope of our actions, with decision-making systems, including in critical areas such as health or legal matters. In these societal applications, the use of these tools raises the issue of possible discrimination against people according to criteria for which society requires equal treatment, such as gender, origin, religion or disability. Recently, the machine learning community has been confronted with the need to work on the possible biases of algorithms, and many works have shown that the search for the best performance is not the only goal to pursue [1]. For instance, recent evaluations of ASR systems have shown that performance can vary according to gender, but that these variations depend both on the data used for learning and on the models [2]. Such systems are therefore increasingly scrutinized for being biased, while trustworthy speech technologies definitely represent a crucial expectation.
Both the question of bias and the concept of fairness have now become important aspects of AI, and we now have to find the right balance between accuracy and fairness. Unfortunately, these notions of fairness and bias are challenging to define, and their meanings can differ greatly [3].
The goals of this PhD position are threefold:
- First, make a survey of the many definitions of robustness, fairness and bias, with the aim of coming up with definitions and metrics fit for speech SSL models;
- Then, gather speech datasets with a high amount of well-described metadata;
- Set up an evaluation protocol for SSL models and analyze the results.
*SKILLS*
* Master 2 in Natural Language Processing, Speech Processing, computer science or data science.
* Good command of Python programming and deep learning frameworks.
* Previous experience in bias in machine learning would be a plus.
* Very good communication skills in English.
* Good command of French would be a plus but is not mandatory.
*SCIENTIFIC ENVIRONMENT*
The PhD position will be co-supervised by Alexandre Allauzen (Dauphine Université PSL, Paris) and Solange Rossato and François Portet (Université Grenoble Alpes). Joint meetings are planned on a regular basis and the student is expected to spend time in both places. Moreover, two other PhD positions are open in this project, and the students, along with the partners, will collaborate closely. For instance, specific SSL models along with evaluation criteria will be developed by the other PhD students. The PhD student will also collaborate with several team members involved in the project, in particular the two other PhD candidates who will be recruited and the partners from LIA, LIG and Dauphine Université PSL, Paris. The means to carry out the PhD will be provided, both in terms of missions in France and abroad and in terms of equipment. The candidate will have access to the GPU clusters of both the LIG and Dauphine Université PSL. Furthermore, access to the national supercomputer Jean-Zay will make it possible to run large-scale experiments.
*INSTRUCTIONS FOR APPLYING*
Applications must contain: CV + letter/message of motivation + master's transcripts + be ready to provide letter(s) of recommendation; and be addressed to Alexandre Allauzen (alexandre.allauzen@espci.psl.eu), Solange Rossato (Solange.Rossato@imag.fr) and François Portet (francois.Portet@imag.fr). We celebrate diversity and are committed to creating an inclusive environment for all employees.
*REFERENCES:*
[1] Mengesha, Z., Heldreth, C., Lahav, M., Sublewski, J. & Tuennerman, E. “I don’t Think These Devices are Very Culturally Sensitive.”—Impact of Automated Speech Recognition Errors on African Americans. Frontiers in Artificial Intelligence 4 (2021). issn: 2624-8212. https://www.frontiersin.org/article/10.3389/frai.2021.725911
[2] Garnerin, M., Rossato, S. & Besacier, L. Investigating the Impact of Gender Representation in ASR Training Data: a Case Study on Librispeech in Proceedings of the 3rd Workshop on Gender Bias in Natural Language Processing (2021), 86–92.
[3] Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K. & Galstyan, A. A Survey on Bias and Fairness in Machine Learning. ACM Comput. Surv. 54 (July 2021). issn: 0360-0300. https://doi.org/10.1145/3457607