On behalf of Dr. Elliot Crowley from the School of Engineering at the University of Edinburgh (queries at elliot.crowley@ed.ac.uk):
Application link: Affordable Training of Large Language Models (https://www.eng.ed.ac.uk/studying/postgraduate/research/phd/affordable-training-large-language-models)
Recent developments in large language models (LLMs) have caught the attention of the public. LLMs such as OpenAI's GPT-4 and Google's Bard are able to generate remarkably realistic, coherent text based on a user's input and have the potential to be general-purpose tools used throughout society e.g. for customer service, summarising text, answering questions, writing contracts or translating between languages.
However, LLMs are prohibitively expensive to train. GPT-3 (which is significantly smaller than its successor, GPT-4) has an estimated training time of 355 GPU-years and an estimated training cost of $4.6M [1]. Only large, wealthy institutions can train these models and thereby control how they are trained and who gets access to them. This is undemocratic.
Very recent work provides hope, however. In [2], the authors explore the promising idea of “cramming”: training an LLM on a single GPU in a day. In [3], the authors use synthetic data to train “small” language models that can produce consistent stories at little cost. Even so, there is a huge discrepancy in quality between these models and their expensive counterparts.
In this PhD, the student will investigate affordable LLM training, i.e. training with limited compute and/or data, inspired by [2,3]. Avenues of research could include (i) generating training data that facilitates fast training, e.g. through dataset distillation [4]; (ii) exploring neural architecture search to develop models that are "aware" of being resource-constrained while being trained; (iii) developing novel cost-effective training algorithms; and (iv) leveraging and tuning open-source LLMs.
The successful student will have opportunities for collaboration within and outside Edinburgh’s School of Engineering, e.g. with colleagues in the Institute for Digital Communications (https://www.eng.ed.ac.uk/research/institutes/idcom/), the Bayesian and Neural Systems Group (https://www.bayeswatch.com/), and Edinburgh NLP (https://edinburghnlp.inf.ed.ac.uk/).
[1] https://lambdalabs.com/blog/demystifying-gpt-3
[2] https://arxiv.org/abs/2212.14034
[3] https://arxiv.org/abs/2305.07759
[4] https://arxiv.org/abs/1811.10959
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.