ANR Project SHERBET: Stemmatology for the HEbRew BiblE Transmission - Artificial Intelligence to Understand the Transmission of the Hebrew Bible
1st September 2024 to 31st August 2026
Description
Before the invention of the printing press, the only way to reproduce and disseminate a text in written form was manual copying. During this process, accidents, errors and intentional modifications occurred, progressively altering the text of each witness. The altered text, whether modified deliberately or accidentally, then served as the model for other copyists, who thereby propagated the changes. For philologists interested in reconstructing the history of a text and the genealogical relations between its witnesses (represented as a family tree called a stemma codicum), it has been essential to study these variants and to propose methods for building such trees objectively (known as stemmatology methods). Reconstructing the genealogy of the Hebrew manuscripts has been one of the major focuses of the Écritures laboratory and of the MSH Lorraine at the Université de Lorraine. In this project, we propose to improve on the manual work carried out for critical editions of the Hebrew Bible by applying the latest advances in applied mathematics and natural language processing to reconstruct the stemmata of the Hebrew manuscripts. The project is a partnership between the following research centers: MSH Lorraine (UL), Écritures (UL), LORIA (UL), LJK (UGA) and IECL (UL). In this context, we are seeking a fellow for a two-year postdoctoral position, whose objective will be to reconstruct the genealogy of the Hebrew Bible manuscripts using computational stemmatology algorithms.
Postdoc’s responsibilities
Over the course of the project, the fellow will be expected to take the lead and innovate in order to achieve the following objectives:
Automatic variant tagging for ancient languages
The candidate will design, train and test a Deep Learning model that automatically tags scribal variants between manuscripts. The model will be trained on variants annotated according to the classification defined by the project's philologists (orthographic, lexical, grammatical, etc.). Given two readings (strings), the model will then automatically suggest a classification for the variant between them. While the main focus of the project is Hebrew, an extension to Greek is a possible addition to the project.
Textual embedding of ancient languages
A major challenge of the project is computing semantic distances between Hebrew words, in order to measure how close two variants are in meaning. The candidate will work on neural-network-based textual embeddings and representations of Hebrew words.
Generation of Hebrew texts using adversarial Deep Learning models
Current approaches within the project rely on probabilistic models to generate artificial textual traditions that resemble the variation observed in real traditions and can serve as ground truth. Statistics describing scribal behavior are fed into the model, which relies on Markov chains to generate the corresponding tradition. One objective of the project is to use Deep Learning models, specifically generative adversarial networks (GANs), to generate these artificial traditions. The networks should be able to produce new traditions that are representative of scribal behavior.
Provide Open-Source results
To ensure the widest possible uptake of the project, and in keeping with the goal of making science open to all, the candidate is expected to release all software developed during the project as Open-Source software that meets the quality standards of modern software development. The generated datasets should also be made publicly available. Results are expected to be published in high-impact journals and conferences.

Required skills
Mathematical and computer science skills
The candidate must hold a PhD in computer science and/or applied mathematics (e.g. artificial intelligence, natural language processing). Experience in Deep Learning, especially applied to Natural Language Processing or to the modeling of complex systems, is required.
Technical skills
The candidate should be very familiar with the Python ecosystem for Deep Learning, data manipulation and analysis: pandas, scikit-learn, TensorFlow/Keras, PyTorch. The candidate should have previous experience in developing Open-Source software and a good knowledge of current development practices (CI/CD pipelines, containerization, automated deployments), so that the project can reach as many scholars as possible. They will also work daily with REST APIs and SQL databases. A good understanding of XML-TEI and collation tools would be a plus.
Humanities skills
Knowledge of Classical Greek and Ancient Hebrew. Knowledge of and interest in textual criticism, philology and biblical studies would be a plus.
The candidate is expected to have a good command of English. Knowledge of French would be a plus.
Terms and tenure
This two-year position will be based at LORIA, Campus Scientifique, BP 239, 54506 Vandoeuvre-lès-Nancy, and at MSH Lorraine, Ile du Saulcy, 57000 Metz. The duration cannot exceed 24 months. The target start date is 1st September 2024, with some flexibility on the exact date.
How to apply
Applicants are requested to submit the following materials:
• A cover letter explaining their motivation for the position.
• A full curriculum vitae and list of publications.
• Academic transcripts (unofficial versions are acceptable).
The application deadline is 17th June 2024. All documents must be sent to frederique.rey@univ-lorraine.fr
Job location
Nancy and Metz, Lorraine, France
----------------------
Maxime Amblard
Université de Lorraine
https://members.loria.fr/mamblard
http://espoir-ul.fr