We welcome you to the next Natural Language Processing and Vision (NLPV) seminar at the University of Exeter.
Zoom scheduled: Thursday 5 March 2026, 15:00 to 16:00 GMT
Location: https://Universityofexeter.zoom.us/j/98687933020?pwd=si6Sb2yasZU2s8zw0hMI4n2... (Meeting ID: 986 8793 3020, Password: 667296)
Title: Privacy-Preserving Generation of Synthetic Clinical Narratives
Abstract: Training data is fundamental to the success of modern machine learning models, yet in high-stakes domains such as healthcare, the use of real-world training data is severely constrained by concerns over privacy leakage. A potential solution to this challenge is the use of differentially private (DP) synthetic data, which offers formal privacy guarantees while maintaining data utility. However, striking the right balance between privacy protection and utility remains challenging in clinical note synthesis, given its domain specificity and the complexity of long-form text generation. I will discuss the key issues in synthetic clinical text generation and a methodology to synthesise full-length clinical notes under strong DP constraints. The method structurally separates content and form, and generates section-wise note content conditioned on patients' clinical profiles, with terms and notes privatised under separate DP constraints. To ensure quality, a DP quality maximiser enhances synthetic notes by selecting high-quality outputs. I will also introduce a validation framework and demonstrate how the corpus generated by the proposed method aligns with real clinical notes (MIMIC).
Speaker's bio: Goran Nenadic is a Professor of Computer Science at the University of Manchester. His research focuses on making sense of large-scale free-text data by combining rule-based and data-intensive approaches. He mainly works in the healthcare domain, exploring clinical coding, temporal clinical information extraction and anonymisation of clinical free-text data. He currently leads a DARE UK project on federated generation of synthetic textual healthcare records. The project explores how synthetic clinical text can be generated and validated for safe use. Combining differential privacy with strong public and regulatory engagement, the project will test whether synthetic free-text data can meaningfully support research and federated learning while reducing privacy risks. Goran also leads the UK healthcare text analytics network (Healtex) and a DARE UK working group (Safetext) to develop national protocols for the responsible use of healthcare free-text data in AI development.
Future talks will be posted on the website: https://sites.google.com/view/neurocognit-lang-viz-group/seminars
Join our *Google group* for future seminars and research information: https://groups.google.com/g/neurocognition-language-and-vision-processing-gr...