The School of Computer Science and Digital Technologies, Aston University, UK, is offering two PhD positions in language and speech processing on the following two topics. The application deadline is 16th February 2024. Applications can be submitted via Aston's PGR webpage (https://www.aston.ac.uk/graduate-school/how-to-apply/studentships). Enquiries about the positions can be made to Dr Tharindu Ranasinghe, School of Computer Science and Digital Technologies, Aston University, UK - t.ranasinghe@aston.ac.uk.
Building Trustworthy Automatic Speech Recognition Systems
Dr Tharindu Ranasinghe (https://research.aston.ac.uk/en/persons/tharindu-ranasinghe) (School of Computer Science and Digital Technologies - Applied AI & Robotics Department)
Dr Phil Weber (https://research.aston.ac.uk/en/persons/phil-weber) (Aston Centre for Artificial Intelligence Research and Application – ACAIRA, School of Computer Science and Digital Technologies - Applied AI & Robotics Department)
Prof Aniko Ekart (https://research.aston.ac.uk/en/persons/aniko-ek%C3%A1rt) (Aston Centre for Artificial Intelligence Research and Application – ACAIRA, School of Computer Science and Digital Technologies - Applied AI & Robotics Department)
Dr Muhidin Mohamed (https://research.aston.ac.uk/en/persons/muhidin-mohamed) (College of Business and Social Sciences - Operations & Information Management)
Project Summary, Aim and Objectives:
Automatic Speech Recognition (ASR) has gained popularity in the last decade thanks to advancements in speech and natural language processing, along with the availability of powerful hardware for processing extensive data streams. ASR is crucial in transcription services for various sectors, including legal, healthcare, and entertainment. It also plays a vital role in e-learning platforms, customer support systems, and enhancing accessibility for individuals with disabilities. Additionally, ASR significantly contributes to language translation, making it widely adopted across diverse sectors.
Although ASR has come a long way in recent years, it still has limitations, and the produced output is far from perfect. However, most commercial ASR systems do not explicitly state this to the user, leaving the user to assume that the output is accurate. Most large-scale ASR systems perform better for widely spoken languages, while output quality for low-resource languages is lower. ASR systems also struggle to handle different accents and dialects, especially those of non-native speakers. Furthermore, most ASR systems are trained on general-domain data and do not perform optimally in specific domains such as healthcare. These limitations result in incorrect outputs, and the lack of transparency and accountability can lead to severe consequences, especially in critical domains such as healthcare or legal services. Therefore, a quality indicator for ASR systems has become essential, as it can play a significant role in informing the user about the output quality.
This PhD research aims to develop a comprehensive quality indicator system for ASR. The specific goals are to (1) investigate what makes ASR trustworthy; (2) evaluate ASR systems in challenging scenarios; (3) design quality indicator metrics for ASR (e.g. sentence-level scores, word-level error spans, critical errors); and (4) introduce public benchmarks and investigate novel approaches for predicting quality in ASR. The output of the PhD will contribute towards trustworthy ASR systems.
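To make goal (3) concrete, a common reference-based starting point for sentence-level quality is word error rate (WER), and the same word alignment used to compute it also yields the word-level operations from which error spans can be read off. The Python sketch below is a minimal, self-contained illustration of that idea only, not the project's method; the function name wer_and_ops and the example sentences are hypothetical.

    # Minimal sketch: sentence-level WER and word-level edit operations.
    # Illustrative baseline only; wer_and_ops is a hypothetical name.

    def wer_and_ops(reference, hypothesis):
        ref, hyp = reference.split(), hypothesis.split()
        # Dynamic-programming edit-distance table over words.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution / match
        # Backtrace to recover word-level operations (potential error spans).
        ops, i, j = [], len(ref), len(hyp)
        while i > 0 or j > 0:
            if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
                ops.append(("match" if ref[i - 1] == hyp[j - 1] else "sub", ref[i - 1], hyp[j - 1]))
                i, j = i - 1, j - 1
            elif i > 0 and d[i][j] == d[i - 1][j] + 1:
                ops.append(("del", ref[i - 1], None))
                i -= 1
            else:
                ops.append(("ins", None, hyp[j - 1]))
                j -= 1
        wer = d[len(ref)][len(hyp)] / max(len(ref), 1)
        return wer, list(reversed(ops))

    wer, ops = wer_and_ops("the patient was given aspirin", "the patients were given a spring")
    print(wer, [o for o in ops if o[0] != "match"])  # score plus the erroneous word spans

In deployment no reference transcript is available, which is precisely why the project targets predicted quality scores and error spans rather than reference-based WER.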
Knowledge and skills required in applicant:
Natural Language Processing, Speech Processing, Machine Learning and Deep Learning. The applicant should be familiar with Python and neural network framework(s) such as PyTorch or TensorFlow, and should have excellent programming skills.
Evidence-based detection of misuse of large language models
Dr Phil Weber (https://research.aston.ac.uk/en/persons/phil-weber) (Aston Centre for Artificial Intelligence Research and Application – ACAIRA, School of Computer Science and Digital Technologies - Applied AI & Robotics Department)
Dr Tharindu Ranasinghe (https://research.aston.ac.uk/en/persons/tharindu-ranasinghe) (School of Computer Science and Digital Technologies - Applied AI & Robotics Department)
Dr Muhidin Mohamed (https://research.aston.ac.uk/en/persons/muhidin-mohamed) (College of Business and Social Sciences - Operations & Information Management)
Dr Paul Grace (https://research.aston.ac.uk/en/persons/paul-grace) (Cyber Security Innovation Research Centre – CSI, School of Computer Science and Digital Technologies)
Project Summary, Aim and Objectives:
Large language models (LLMs) have become ubiquitous since the release of ChatGPT, bringing a paradigm shift in the processing and generation of text, images, speech and video. New methods for training very large neural models using massive unlabelled data created the opportunity for foundation models able to generate data with apparently human-like ability. Publicly available pre-trained models facilitate novel tools; Google Gemini, Microsoft Co-Pilot, Dall-E and many start-ups allow non-experts to conversationally instruct and use AI systems in everyday life, seamlessly employing complex technologies including automatic speech recognition, natural language processing, machine translation and image captioning.
New dangers accompany this rapid and unstructured step-change in technology. Beyond unease over energy use, environmental impact, and digital divides, many are concerned by the ease with which fake media, increasingly difficult to distinguish from real media, can be created. In education, plagiarism detection becomes more nuanced with the need to identify AI-generated text. In the justice domain, forensic determination of the source of a voice or face is obfuscated by the possibility that it was artificially generated. Politicians worry about the impact on democracy of undetectable deepfakes, and cybersecurity experts about identity theft. The problems are exacerbated by the potential for LLM-generated data to be reused for training downstream models.
Scientifically well-founded methods for detecting and quantifying the risk of LLM-generated media are therefore urgently needed.
This project builds on established methods in forensic data analysis to develop rigorous methods for detecting AI-generated media. Specifically, it will: 1) review existing approaches to detecting AI-generated and spoofed media, 2) build on methods for forensic voice comparison to develop and validate new approaches to forensic text comparison, 3) apply these to detecting plagiarism and deepfakes, 4) extend them to image data, and 5) propose principles to contribute to broader questions of safe, fair and transparent use of LLMs.
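As a concrete illustration of the kind of score that objective 2) would calibrate, one widely used baseline (not the project's method) rates a text by its mean per-token negative log-likelihood under a pre-trained language model: machine-generated text often looks less "surprising" to the model than human-written text, and forensic practice would report such evidence as a calibrated likelihood ratio rather than a hard threshold. The Python sketch below assumes the Hugging Face transformers library and the small GPT-2 model; the function name and the example sentence are illustrative assumptions.

    # Minimal baseline sketch (illustrative only): score a text by its mean
    # per-token negative log-likelihood (NLL) under GPT-2. A forensic system
    # would calibrate such scores into likelihood ratios, not fixed cut-offs.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def mean_token_nll(text: str) -> float:
        # Tokenise and compute the model's average cross-entropy over the text.
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
        with torch.no_grad():
            out = model(**enc, labels=enc["input_ids"])
        return out.loss.item()  # mean negative log-likelihood per token

    score = mean_token_nll("Politicians worry about the impact on democracy of undetectable deepfakes.")
    print(f"mean NLL: {score:.2f}")  # lower values suggest more model-like text

Such raw scores are known to be sensitive to genre, length and topic, which is exactly the gap the project's validated, forensic-style comparison methods are intended to address.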
Knowledge and skills required in applicant:
Strong programming skills, preferably in Python, including development of large language models. Knowledge of machine learning theory, applications, and related statistical and probability theory. Awareness of modern approaches to forensic data science.