Job opportunities: Two positions (research data engineer and postdoc researcher) in NLP for historical documents – 3.5 years – EPFL, Switzerland - Corpora

27 Feb 2023


***Apologies for cross-posting***
In the context of the upcoming interdisciplinary project 'impresso - Media Monitoring of the Past II' ('impresso doppio'), the EPFL Digital Humanities Laboratory is looking for one postdoctoral researcher and one research data engineer to work with us on the design, development and evaluation of large-scale text mining pipelines for multilingual historical newspaper and radio archives.
=> NLP Research Data Engineer: https://go.epfl.ch/impresso-nlp-job1
=> NLP Postdoctoral Researcher: https://go.epfl.ch/impresso-nlp-job2
FOR BOTH POSITIONS:
Application deadline: 21.04.2023.
Interviews: End of April.
Foreseen start of contract: 01.09.2023.
Employment duration: 3.5 years (1-year contract, renewable until the end of February 2027).
Employment rate: 100%.
Salary: according to EPFL salary scales and experience.
Place of work: EPFL DHLAB, Lausanne, Switzerland.
Contact: for any questions, feel free to contact Maud Ehrmann (maud.ehrmann [at] epfl [dot] ch).
How to apply: please upload your application (full CV and cover letter) via the EPFL portal, using the links above.
ABOUT THE PROJECT:
"impresso - Media Monitoring of the Past II" is an interdisciplinary research project that aims to pioneer new approaches to the joint exploration of newspaper and radio archive contents across time, languages, and national borders. Funded by the Swiss National Science Foundation and the Luxembourg National Research Fund (2023-2027), it is carried out by the EPFL DHLAB (http://dhlab.epfl.ch/), the Department of Computational Linguistics (http://www.cl.uzh.ch/de.html) of the University of Zurich, the Centre for Contemporary and Digital History (C2DH, http://c2dh.uni.lu/) and the History Department (https://www.unil.ch/hist/fr/home.html) of the University of Lausanne, with the additional support of 21 European partners. Computational linguists, computer scientists, digital humanists, historians, and designers will work closely together to enrich and connect newspaper and radio sources through multiple layers of cutting-edge semantic enrichments represented in a shared multilingual vector space, and to design adequate, meaningful and transparent exploration capabilities for (data-driven) historical research from a transnational and transmedia perspective. Impresso doppio (https://data.snf.ch/grants/grant/213585) follows on from the first impresso project (https://impresso-project.ch/), which developed a scalable architecture for processing Swiss and Luxembourgish newspaper collections and created an interface (https://impresso-project.ch/app) with powerful search, filter and discovery functionalities based on semantic enrichments. The present project puts forward the vision of a complete connection between media archives across languages and media types.
WE OFFER:
*   Opportunity to join an experienced and highly motivated interdisciplinary team conducting innovative, relevant research at the intersection of computer science and the humanities.
*   Applied research framework: what you develop will be deployed in production and used directly by a community of researchers.
*   Work in an interdisciplinary team at the intersection of computer science, NLP, history, journalism and digital libraries.
*   Flexible working hours and teleworking.
*   Located in Lausanne, Switzerland, EPFL has a highly international environment, state-of-the-art research facilities, and is consistently ranked among the world's leading institutions in scientific research. Lausanne is a vibrant and cosmopolitan city in a unique natural environment with great outdoor activities (Jura, Alps, Lake Leman). Salaries and benefits are internationally competitive.
POSITION 1: NLP Research Data Engineer:
Apply online: https://go.epfl.ch/impresso-nlp-job1
Your mission:
The impresso project will compile an unprecedented transmedia and transnational corpus (historical newspaper and radio collections from 8 Western European countries) and develop a technical framework for its annotation, integration and exploitation. In this endeavour, you will lead the activities related to the management and engineering of the project data and system architecture. In collaboration with other project team members, you will contribute to the design and implementation of the technical framework.
Key responsibilities:
*   Design and implement scalable data pipelines to convert, cleanse, integrate and consolidate media archives. This includes defining appropriate data structures, models and formats for source documents and enrichments, as well as developing large-scale ingest workflows.
*   Establish a sustainable system architecture and pipeline management, including unit and integration testing.
*   Manage, document, and release code modules and datasets.
*   Actively collaborate with the C2DH and UZH teams on data modelling, formats and APIs.
*   Engage in participative interface and API design with the project team and partners.
*   Contribute to the organisation of annotation and evaluation campaigns (e.g. in the vein of HIPE: https://hipe-eval.github.io/).
*   Contribute to the organisation of project workshops on the development and adoption of standards for the representation and exchange of historical data (raw material and annotations).
*   Contribute to the definition of a roadmap towards the long-term maintenance and expansion of a rich ecosystem of tools, resources and services around historical media.
*   Participate in other impresso work packages where your expertise is required and coordinate with project team members and partners.
*   Initiate and/or contribute to scientific publications on data releases, processing and standards (and other topics if interested).
The work will be carried out in collaboration with the project team (ca. 12 people).
Your profile:
*   An experienced research data engineer (2-4 years of experience) or NLP researcher/programmer with an interest in history, media and participatory design.
*   A degree in computer science, natural language processing or a related field (Master's or PhD), or equivalent professional experience.
*   Proficiency in: Python; Unix-based operating systems; database development and use (MySQL and NoSQL); cloud storage and cloud computing (S3 object storage, Kubernetes); automation and scripting.
*   Good understanding of machine learning.
*   Willingness to write good documentation.
*   Good communication skills.
*   Strong collaborative and team spirit.
*   Autonomous and accountable, with a proactive approach.
*   Efficient, committed to deadlines and concerned with production readiness.
*   Fluency in English.
*   Comfortable in an international and multicultural context.
Desirable:
*   Experience working in a scientific and academic context.
*   Knowledge of French or German is a plus.
*   Interest in getting involved in supervision activities (MSc students).
*   Interest in writing scientific papers (on data and infrastructure-related topics, or more if interested).
POSITION 2: NLP Postdoctoral Researcher:
Apply online: https://go.epfl.ch/impresso-nlp-job2
Your mission:
You will conduct research in natural language processing and text mining on historical texts, with the aim of developing powerful information extraction methods for heterogeneous, multilingual and challenging radio transcripts and newspaper archives.
Key responsibilities:
*   Develop approaches to advanced named entity processing, quote extraction, and the segmentation and classification of media content.
*   Contribute to the semantic indexing and integration of media archives.
*   Contribute to the co-design of the interface and to dedicated developments supporting the project's four historical use cases.
*   Contribute to the organisation of international evaluation shared tasks on historical document processing.
*   Contribute to the organisation of project workshops on media mining, semantic indexing and processing pipelines.
*   Participate in other impresso work packages where your expertise is required and coordinate with project team members and partners.
*   Present research results and participate in scientific and communication events.
*   Assist with project management and organisational tasks.
Your profile:
*   PhD (obtained or close to completion) in natural language processing, machine learning, computer science or related areas.
*   Strong background in machine learning foundations and willingness to apply approaches to real, large-scale data.
*   Experience in deep learning, language models and information extraction.
*   Strong programming skills (Python, deep learning frameworks) and knowledge of Unix-based operating systems.
*   Curious, creative and highly motivated by scientific research and the application of NLP to digitised cultural heritage collections.
*   Very good communication, presentation, and writing skills in English.
*   Comfortable in an international and multicultural context.
Desirable:
*   Understanding of image processing is a plus.
*   Experience working with historical documents and in an interdisciplinary environment.
*   Knowledge of French or German is a plus.
*   Willingness to (co-)supervise student projects, internships and master's theses.
Regards,
Maud Ehrmann