June 2023 - Corpora - ELRA lists

3-year PhD position in Computational Models of Semantic Memory and its Acquisition (Inria and University of Lille, France)
by Pascal Denis 13 May '25

Hello,

Could you please distribute the following job offer? Thanks.

Best,
Pascal

-------------------------------------------------------------------------------------

3-year PhD position in Computational Models of Semantic Memory and its Acquisition (Inria and University of Lille, France)

We invite applications for a 3-year PhD position at the University of Lille in the context of the recently funded research project "COMANCHE" (Computational Models of Lexical Meaning and Change). The position is funded by Inria, the French national research institute in Computer Science and Applied Mathematics.

COMANCHE proposes to transfer and adapt neural word embedding algorithms to model the acquisition and evolution of word meaning, by comparing them with linguistic theories on language acquisition and language evolution. At the intersection between Natural Language Processing, psycholinguistics and historical linguistics, this project intends to validate or revise some of these theories, while also developing computational models that are less data-hungry and less computationally intensive, as they exploit new inductive biases inspired by these disciplines.

The first strand of the project, on which the successful candidate will work, focuses on the development of computational models of semantic memory and its acquisition. Two main research directions will be pursued. On the one hand, we will compare the structural properties associated with different semantic spaces derived from word embedding algorithms to those found in human semantic memory as reflected in behavioral data (such as typicality norms) as well as brain imaging data. The latter data will then be used as additional supervision to inject more hierarchical structure into the learned semantic spaces. On the other hand, we intend to experiment with training regimes for word embedding algorithms that are closer to those of humans when they acquire language, controlling the quantity as well as the linguistic complexity of the inputs fed to the learning algorithms through the use of longitudinal and child-directed speech corpora (e.g., CHILDES, Colaje). In both cases, both English and French data will be considered.

The successful candidate holds a Master's degree in computational linguistics, computer science or cognitive science and has prior experience with word embedding models. Furthermore, the candidate has strong programming skills and expertise in machine learning approaches, and is eager to work across languages.

The position is affiliated with the MAGNET team at Inria, Lille [1] as well as with the SCALAB group at University of Lille [2] in an effort to strengthen collaborations between these two groups, and ultimately foster cross-fertilization between Natural Language Processing and Psycholinguistics.

Applications will be considered until the position is filled. However, you are encouraged to apply early as we shall start processing the applications as and when they are received. Applications, written in English or French, should include a brief cover letter with research interests and vision, a CV (including your contact address, work experience, publications), and contact information for at least 2 referees. Applications (and questions) should be sent to Angèle Brunellière (angele.brunelliere(a)univ-lille.fr) and Pascal Denis (pascal.denis(a)inria.fr).

The starting date of the position is 1 October 2022 or soon thereafter, for a total of 3 full years.

Best regards,
Angèle Brunellière and Pascal Denis

[1] https://team.inria.fr/magnet/
[2] https://scalab.univ-lille.fr/

--
Pascal
----
For an independent, transparent and rigorous evaluation! I support Inria's Commission d'Évaluation.
----
+++++++++++++++++++++++++++++++++++++++++++++++
Pascal Denis
Equipe MAGNET, INRIA Lille Nord Europe
Bâtiment B, Avenue Heloïse
Parc scientifique de la Haute Borne
59650 Villeneuve d'Ascq
Tel: ++33 3 59 35 87 24
Url: http://researchers.lille.inria.fr/~pdenis/
+++++++++++++++++++++++++++++++++++++++++++++++

NLP4CALL 2023 Final call for papers
by David Alfter 22 Aug '24

== 12th NLP4CALL, Tórshavn, Faroe Islands ==

The workshop series on Natural Language Processing (NLP) for Computer-Assisted Language Learning (NLP4CALL) is a meeting place for researchers working on the integration of Natural Language Processing and Speech Technologies in CALL systems and exploring the theoretical and methodological issues arising in this connection. The latter includes, among others, integrating insights from Second Language Acquisition (SLA) research, on the one hand, and promoting the development of “Computational SLA” through setting up Second Language research infrastructure(s), on the other.

The intersection of Natural Language Processing (or Language Technology / Computational Linguistics) and Speech Technology with Computer-Assisted Language Learning (CALL) brings “understanding” of language to CALL tools, thus making CALL intelligent. This fact has given the name for this area of research: Intelligent CALL, ICALL. As the definition suggests, apart from having excellent knowledge of Natural Language Processing and/or Speech Technology, ICALL researchers need good insights into second language acquisition theories and practices, as well as knowledge of second language pedagogy and didactics. This workshop therefore invites a wide range of ICALL-relevant research, including studies where NLP-enriched tools are used for testing SLA and pedagogical theories, and vice versa, where SLA theories, pedagogical practices or empirical data are modeled in ICALL tools. The NLP4CALL workshop series is aimed at bringing together competences from these areas for sharing experiences and brainstorming around the future of the field.

We welcome papers:
- that describe research directly aimed at ICALL;
- that demonstrate actual or discuss the potential use of existing Language and Speech Technologies or resources for language learning;
- that describe the ongoing development of resources and tools with potential usage in ICALL, either directly in interactive applications, or indirectly in materials, application or curriculum development, e.g. learning material generation, assessment of learner texts and responses, individualized learning solutions, provision of feedback;
- that discuss challenges and/or research agenda for ICALL;
- that describe empirical studies on language learner data.

This year a special focus is given to work done on error detection/correction and feedback generation. We encourage paper presentations and software demonstrations describing the above-mentioned themes primarily, but not exclusively, for the Nordic languages.

==Shared task==

NEW for this year is the MultiGED shared task on token-level error detection for L2 Czech, English, German, Italian and Swedish, organized by the Computational SLA working group. For more information, please see the Shared Task website: https://github.com/spraakbanken/multiged-2023

==Invited speakers==

This year, we have the pleasure to announce two invited talks. The first talk is given by Marije Michel from the University of Amsterdam. The second talk is given by Pierre Lison from the Norwegian Computing Center.

==Submission information==

Authors are invited to submit long papers (8-12 pages) or, alternatively, short papers (4-7 pages), page count not including references. We will be using the NLP4CALL template for the workshop this year.
The author kit can be accessed here, alternatively on Overleaf: <https://spraakbanken.gu.se/sites/default/files/2023/NLP4CALL%20workshop%20t…> <https://spraakbanken.gu.se/sites/default/files/2023/nlp4call%20template.doc> <https://www.overleaf.com/latex/templates/nlp4call-workshop-template/qqqzqqy…> Submissions will be managed through the electronic conference management system EasyChair <https://easychair.org/conferences/?conf=nlp4call2023>. Papers must be submitted digitally through the conference management system, in PDF format. Final camera-ready versions of accepted papers will be given an additional page to address reviewer comments. Papers should describe original unpublished work or work-in-progress. Papers will be peer reviewed by at least two members of the program committee in a double-blind fashion. All accepted papers will be collected into a proceedings volume to be submitted for publication in the NEALT Proceeding Series (Linköping Electronic Conference Proceedings) and, additionally, double-published through the ACL anthology, following experiences from the previous NLP4CALL editions (<https://www.aclweb.org/anthology/venues/nlp4call/>). 
==Important dates==

03 April 2023: paper submission deadline
21 April 2023: notification of acceptance
01 May 2023: camera-ready papers for publication
22 May 2023: workshop date

==Organizers==

David Alfter (1), Elena Volodina (2), Thomas François (3), Arne Jönsson (4), Evelina Rennes (4)

(1) Gothenburg Research Infrastructure for Digital Humanities, Department of Literature, History of Ideas, and Religion, University of Gothenburg, Sweden
(2) Språkbanken, Department of Swedish, Multilingualism, Language Technology, University of Gothenburg, Sweden
(3) CENTAL, Institute for Language and Communication, Université Catholique de Louvain, Belgium
(4) Department of Computer and Information Science, Linköping University, Sweden

==Contact==

For any questions, please contact David Alfter, david.alfter(a)gu.se

For further information, see the workshop website <https://spraakbanken.gu.se/en/research/themes/icall/nlp4call-workshop-serie…>

Follow us on Twitter @NLP4CALL <https://twitter.com/NLP4CALL/>

3-year PhD position in Automatic Argumentation Mining in French Legal Decisions (Inria Lille, University of Lille, and LexisNexis France)
by Pascal Denis 10 Nov '23

Hi there,

Could you please distribute the following job offer? Thanks.

Best,
Pascal

-------------------------------------------------------------------------------------

We invite applications for a 3-year PhD position co-funded by Inria, the French national research institute in Computer Science and Applied Mathematics, and LexisNexis France, the leader in legal information in France and a subsidiary of the RELX Group.

The overall objective of this project is to develop an automated system for detecting argumentation structures in French legal decisions, using recent machine learning-based approaches (i.e. deep learning approaches). In the general case, these structures take the form of a directed labeled graph, whose nodes are the elements of the text (propositions or groups of propositions, not necessarily contiguous) which serve as components of the argument, and whose edges are relations that signal the argumentative connection between them (e.g., support, offensive). By revealing the argumentation structure behind legal decisions, such a system will provide a crucial milestone towards their detailed understanding and their use by legal professionals, and above all will contribute to greater transparency of justice.

The main challenges and milestones of this project start with the creation and release of a large-scale dataset of French legal decisions annotated with argumentation structures. To minimize the manual annotation effort, we will resort to semi-supervised and transfer learning techniques to leverage existing argument mining corpora, such as the European Court of Human Rights (ECHR) corpus, as well as annotations already started by LexisNexis. Another promising research direction, which is likely to improve over state-of-the-art approaches, is to better model the dependencies between the different sub-tasks (argument span detection, argument typing, etc.) instead of learning these tasks independently. A third research avenue is to find innovative ways to inject domain knowledge (in particular the rich legal ontology developed by LexisNexis) to enrich the representations used in these models. Finally, we would like to take advantage of other discourse structures, such as coreference and rhetorical relations, conceived as auxiliary tasks in a multi-task architecture.

The successful candidate holds a Master's degree in computational linguistics, natural language processing or machine learning, ideally with prior experience in legal document processing and discourse processing. Furthermore, the candidate has strong programming skills and expertise in machine learning approaches, and is eager to work at the interplay between academia and industry.

The position is affiliated with MAGNET [1], a research group at Inria, Lille, which has expertise in Machine Learning and Natural Language Processing, in particular Discourse Processing. The PhD student will also work in close collaboration with the R&D team at LexisNexis France, who will provide their expertise in the legal domain and the data they have collected.

Applications will be considered until the position is filled. However, you are encouraged to apply early as we shall start processing the applications as and when they are received. Applications, written in English or French, should include a brief cover letter with research interests and vision, a CV (including your contact address, work experience, publications), and contact information for at least 2 referees. Applications (and questions) should be sent to Pascal Denis (pascal.denis(a)inria.fr).

The starting date of the position is 1 November 2022 or soon thereafter, for a total of 3 full years.

Best regards,
Pascal Denis

[1] https://team.inria.fr/magnet/
[2] https://www.lexisnexis.fr/

Core metadata scheme for learner corpora - feedback needed!
by Magali Paquot 30 Oct '23

Dear colleagues,

Last month, we shared the result of our collaborative work on a core metadata scheme for learner corpora with LCR2022 participants. Our proposal builds on Granger and Paquot's (2017) first attempt to design such a scheme. During our presentation, we explained the rationale for expanding on the initial proposal and discussed selected aspects of the revised scheme.

Our proposal is available at https://docs.google.com/spreadsheets/d/1-RbX5iUCUtCBkZU9Rfk-kv-Vzc--F-eUW2O…

We firmly believe that our efforts to develop a core metadata scheme for learner corpora will only be successful to the extent that (1) the LCR community is given the opportunity to engage with our work in various ways (provide feedback on the general structure of the scheme, the list of variables that we identified as core and their operationalization; test the metadata on other learner corpora; use the scheme to start a new corpus compilation, etc.) and (2) the core metadata scheme is the result of truly collaborative work.

As mentioned at LCR2022, we will be collecting feedback on the metadata scheme until the end of October. The online feedback form is available at: https://docs.google.com/document/d/1NeDUuxGJlPSJI9wHVA1xgGM-aV8jXTa8Qlb45K-…

We'd like to thank all the colleagues who already got back to us (at LCR2022, by email or via the online form). We also thank them for their appreciation and enthusiasm for our work!

We'd also like to encourage more colleagues (and particularly those of you who have experience in learner corpus compilation) to provide feedback! We need help in finalizing the core metadata scheme to make sure that it can be applied in all learner corpus compilation contexts. In short, we need you to make sure the scheme meets the needs of the LCR community at large.
With very best wishes, Magali Paquot (also on behalf of Alexander König, Jennifer-Carmen Frey, and Egon W. Stemle) Reference Granger, S. & M. Paquot (2017). Towards standardization of metadata for L2 corpora. Invited talk at the CLARIN workshop on Interoperability of Second Language Resources and Tools, 6-8 December 2017, University of Gothenburg, Sweden. Dr. Magali Paquot Centre for English Corpus Linguistics Institut Langage et Communication UCLouvain https://perso.uclouvain.be/magali.paquot/

2nd CFP: 1st Workshop on Readability for Low Resourced Languages (RLRL 2023)
by El-Haj, Mo 04 Sep '23

2nd Call for Abstracts: 1st Workshop on Readability for Low Resourced Languages (RLRL 2023) Free registration is now open https://bit.ly/3pwUwlG - a few tickets are still available. Please join us for an exciting online workshop where experts in natural language processing will come together to discuss the latest research and innovative approaches to assessing the readability of low-resource languages. The workshop will take place as a free online event on September 5, 2023, and is being hosted jointly by Lancaster University, Sheffield Hallam University and King Saud University. We welcome researchers and practitioners to submit presentation abstract proposals of up to 500 words for talks related to the development of a Readability Framework for low-resource languages. The ultimate goal of the workshop is to discuss best practices and state-of-the-art AI-based approaches to create mathematical representations of expected readability levels at different school grade or cognitive ability levels. The workshop will also focus on utilising classifiers that are intuitive for humans to understand and adjust, enabling the analysis and improvement of the decision-making criteria. We welcome abstracts on work that is still in progress or that does not yet have conclusive results. We encourage authors to share their work at various stages of development to facilitate discussions and collaboration during the workshop. Important Dates: - Due date for workshop abstract submission: August 1, 2023 (extended) - Notification of abstract acceptance to authors: August 10, 2023 - Workshop date: September 5, 2023 (online event<https://bit.ly/3pwUwlG>) Keynote speakers: - Professor Laurence Anthony - Faculty of Science and Engineering at Waseda University, Japan. - Dr Violetta Cavalli-Sforza - School of Science and Engineering at Al Akhawayn University, Morocco. 
- Professor Hend Al-Khalifa - College of Computer and Information Sciences at King Saud University, KSA - Dr Abdel-Karim Al Tamimi- Computer Science and Software Engineering at Sheffield Hallam University, UK - Dr Mo El-Haj - School of Computing and Communications at Lancaster University, UK For list of speakers, talks' titles and abstract please visit the workshop's website: https://wp.lancs.ac.uk/acc/rlrl2023/ The main objectives of the workshop are three-fold: 1- Increase awareness of the importance of readability in low-resource languages and its impact on language learning and literacy. 2- Discuss the challenges of readability in low-resource languages, such as limited resources and lack of standardization, and brainstorm strategies for addressing these challenges. 3- Foster a community of practice among participants, allowing them to share their experiences and best practices for addressing readability issues in low-resource languages. Abstract submission: Abstract submission page is now open, please submit abstracts of no more than 500 words https://easychair.org/conferences/?conf=rlrl2023 Alternatively, you can contact the organisers directly with presentation ideas on topics related to readability or low resourced languages. Topics of interest include, but are not limited to: - Machine learning for text readability - Applications of readability assessment - Readability in low-resource languages - Comprehensibility measures - Mathematical representations of readability levels - Text simplification for low-resource languages - Readability and comprehensibility in language learning - The effects of text simplification on readability - Readability frameworks for indigenous languages - Updating readability representations We look forward to your contributions and to a productive and enlightening workshop on September 5, 2023. 
RLRL 2023 Organisers: - Dr Mo El-Haj (SCC/DSI/UCREL, Lancaster University) - Dr Abdel-Karim Al Tamimi (CSSE, Sheffield Hallam University) - Prof. Hend Al Khalifa (iWAN, King Saud University) https://wp.lancs.ac.uk/acc/rlrl2023/ Best wishes, Mahmoud --------------------- Dr Mo El-Haj Senior Lecturer in NLP Co-Director of UCREL NLP Group Strategic Lead of Arabic and Financial NLP Research Advisory Board of the Natural Language Processing Journal https://benjamins.com/catalog/nlp School of Computing and Communications, Lancaster University https://www.lancaster.ac.uk/staff/elhaj @DocElhaj<https://twitter.com/DocElhaj>

Job : Postdoc (12 months), NLP and gender stereotypes in the French media, Universite Grenoble Alpes, France
by François Portet 25 Aug '23

Call for postdoc applications in Natural Language Processing for the automatic detection of gender stereotypes in the French media (Grenoble Alps University, France)

Starting date: flexible, November 30, 2023, at the latest
Duration: full-time position for 12 months
Salary: according to experience (up to 4142€/month)
Application deadline: open until filled
Location: the position will be based in Grenoble, France. This is not a remote position.
Keywords: natural language processing, gender stereotypes bias, corpus analysis, language models, transfer learning, deep learning

*Context*

The University of Grenoble Alps (UGA) has an open position for a highly motivated postdoc researcher to join the multidisciplinary GenderedNews project. Natural Language Processing models trained on large amounts of online content have quickly opened new perspectives for processing large amounts of online content to measure gender bias on a daily basis (see our project https://gendered-news.imag.fr/). Regarding research on stereotypes, most recent works have studied Language Models (LM) from a stereotype perspective by providing specific corpora such as StereoSet (Nadeem et al., 2020) or CrowS-Pairs (Nangia et al., 2020). However, these studies focus on quantifying bias in the LM predictions rather than bias in the original data (Choenni et al., 2021). Furthermore, most of these studies ignore named entities (Deshpande et al., 2022), which account for an important part of the referents and speakers in news. In this project, we intend to build corpora, methods and NLP tools to qualify the differences between the language used to describe groups of people in French news.

*Main Tasks*

The successful postdoc will be responsible for the day-to-day running of the research project, under the supervision of François Portet (Prof. UGA at LIG) and Gilles Bastin (Prof. UGA at PACTE). Regular meetings will take place every two weeks.

- Defining the dimensions of stereotypes to be investigated and the possible metrics that can be processed from a machine learning perspective.
- Exploring, managing and curating news corpora in French for stereotype investigation, with a view to making them widely available to the community to favor reproducible research and comparison.
- Studying and developing new computational models to process large numbers of texts to reveal stereotype bias in news, making use of pretrained models for the task.
- Evaluating the methods on a curated, focused corpus, applying them to the unseen real longitudinal corpus, and analyzing the results with the team.
- Preparing articles for submission to peer-reviewed conferences and journals.
- Organizing progress meetings and liaising between members of the team.

The hired person will interact with PhD students, interns and researchers who are part of the GenderedNews project. Depending on his/her background and interests, and in accordance with the project's objectives, the hired person will have the possibility to orient the research in different directions.

*Scientific Environment*

The recruited person will be hosted within the GETALP team of the LIG laboratory (https://lig-getalp.imag.fr/), which offers a dynamic, international, and stimulating environment for conducting high-level multidisciplinary research. The person will have access to large datasets of French news, GPU servers, support for missions, as well as the scientific activities of the labs. The team is housed in a modern building (IMAG) located in a 175-hectare landscaped campus that was ranked as the eighth most beautiful campus in Europe by the Times Higher Education magazine in 2018.

The person will also work closely with Gilles Bastin (PACTE, a Sociology lab in Grenoble) and Ange Richard (PhD at LIG and PACTE). The project also includes an informal collaboration with "Prenons la une" (https://prenonslaune.fr/), a journalists' association which promotes a fair representation of women in the media.

*Requirements*

The candidate must have a PhD degree in Natural Language Processing or computer science, or be in the process of obtaining one.

The successful candidate should have:
- Good knowledge of Natural Language Processing
- Experience in corpus collection/formatting and manipulation
- Good programming skills in Python
- Publication record in a closely related field of research
- Willingness to work in multidisciplinary and international teams
- Good communication skills
- Good command of French (required)

*Instructions for applying*

Applications will be considered on the fly and must be addressed to François Portet (Francois.Portet(a)imag.fr). It is therefore advisable to apply as soon as possible.

The application file should contain:
- Curriculum vitae
- References for potential letter(s) of recommendation
- One-page summary of research background and interests for the position
- Publications demonstrating expertise in the aforementioned areas
- Pre-defense reports and defense minutes; or summary of the thesis with the date of defense for those currently in doctoral studies

*References*

Deshpande et al. (2022). StereoKG: Data-Driven Knowledge Graph Construction for Cultural Knowledge and Stereotypes. arXiv preprint arXiv:2205.14036.
Choenni et al. (2021). Stepmothers are mean and academics are pretentious: What do pretrained language models learn about you? arXiv preprint arXiv:2109.10052.
Nadeem et al. (2020). StereoSet: Measuring stereotypical bias in pretrained language models. ArXiv.
Nangia et al. (2020). CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models. In EMNLP2020.
-- François PORTET Professeur - Univ Grenoble Alpes Laboratoire d'Informatique de Grenoble - Équipe GETALP Bâtiment IMAG - Office 333 700 avenue Centrale Domaine Universitaire - 38401 St Martin d'Hères FRANCE Phone: +33 (0)4 57 42 15 44 Email:francois.portet@imag.fr www:http://membres-liglab.imag.fr/portet/

Post doc Position in Corpus Linguistics or NLP at TurkuNLP
by Veronika Laippala 24 Jul '23

Dear colleagues,

Our research group TurkuNLP at the University of Turku, Finland, has an opening for *a post doc position in corpus linguistics or NLP*.

The position is part of the research project "Massively Multilingual Modeling of Registers in Web-Scale Data" (MMMReg), which is funded by the Academy of Finland. The project aims to explore language use in the digital world at a massively multilingual scale using neural networks. The specific focus of the project is on web registers, such as news, blogs, and how-to pages. The primary goals of the project are to analyze the linguistic characteristics of web registers across languages and to develop machine learning methods for modeling registers in large web datasets at a massively multilingual scale.

The position is for one year, starting on September 1, 2023. The closing date for applications is August 7, 2023 (UTC+3).

For more information on the position, please visit https://www.utu.fi/en/university/come-work-with-us/open-vacancies

Do not hesitate to get in touch if you have any questions!

Best regards,
Veronika Laippala
TurkuNLP, University of Turku, Finland

Counting multiple long (9+) n-grams in corpora: request for approaches
by beauchampd＠uni.coventry.ac.uk 21 Jul '23

Hi all,

I'm analysing a corpus of tweets from institutions. For n-gram analysis the corpus is quite unusual: it contains many exactly repeated or very similar tweets, which produce long "super strings" of often 9, 10 or more words. Naturally, this makes accurate counting and classification difficult, because every long repeated string also inflates the counts of all the shorter n-grams it contains.

Does anyone know of any approaches or software that can count and classify n-grams in such circumstances? I am aware of the approaches outlined by Buerki (2017) and O'Donnell (2011), but these do not seem practical given the excessive length of the n-grams in the corpus.

Does anyone know of any accessible methods or packages? Any input much appreciated.
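To make the overlap problem concrete, here is a minimal Python sketch of one possible filter. It is an illustration only and is not the method of Buerki (2017) or O'Donnell (2011). It counts all n-grams in a length range and then keeps only "closed" n-grams, i.e. those that are not always embedded in a one-word-longer n-gram with exactly the same frequency. The function names, the whitespace tokenisation and the `min_freq` threshold are assumptions made for the example.

```python
from collections import Counter


def count_ngrams(token_lists, min_n, max_n):
    """Count every n-gram (as a tuple of tokens) for min_n <= n <= max_n."""
    counts = Counter()
    for tokens in token_lists:
        for n in range(min_n, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def closed_ngrams(counts, min_freq=2):
    """Drop n-grams that only ever occur inside a longer counted n-gram.

    If an (n+1)-gram has the same frequency as its n-gram prefix or suffix,
    that shorter n-gram never occurs outside the longer one, so it adds no
    information and is treated as subsumed.
    """
    subsumed = set()
    for gram, freq in counts.items():
        for shorter in (gram[:-1], gram[1:]):
            if counts.get(shorter) == freq:
                subsumed.add(shorter)
    return {g: f for g, f in counts.items()
            if f >= min_freq and g not in subsumed}


# Hypothetical toy data: two identical tweets and one partial overlap.
tweets = [
    "we are open today from nine",
    "we are open today from nine",
    "we are closed",
]
# Naive tokenisation (lower-casing and whitespace split) is an assumption.
token_lists = [t.lower().split() for t in tweets]
result = closed_ngrams(count_ngrams(token_lists, min_n=2, max_n=6))
```

On this toy data only the full repeated tweet and the bigram ("we", "are") survive, because the bigram also occurs outside the repeated tweet. N-grams at the maximum length cannot be checked against anything longer, so `max_n` should be at least as long as the longest repeated string of interest. Near-duplicate tweets would still need separate handling, for example deduplication before counting.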

Call For Participation - PLABA @ TAC 2023
by Ondov, Brian (NIH/NLM/LHC) [F] 20 Jul '23

We are pleased to announce the inaugural offering of the Plain Language Adaptation of Biomedical Abstracts (PLABA) track, as part of the 2023 Text Analysis Conference (TAC) hosted by the U.S. National Institute of Standards and Technology (NIST). This track is an opportunity to showcase your cutting-edge research on an important topic, and to take advantage of large amounts of expert annotated data and manual evaluation. Background: Deficits of Health Literacy are linked to worse outcomes and drive health disparities. Though unprecedented amounts of biomedical knowledge are available online, patients and caregivers face a type of “language barrier” when confronted with jargon and academic writing. Advances in language modeling have improved plain language generation, but the task of automatically and accurately adapting biomedical text for a general audience has thus far lacked high-quality, standardized benchmarks. Task: Systems will adapt biomedical abstracts to plain language. This includes substituting medical jargon, providing explanations for necessary terms, simplifying sentences, and other modifications. The training set is the publicly available PLABA dataset<https://doi.org/10.1038%2Fs41597-022-01920-3>, which contains 750 abstracts with manual, sentence-aligned adaptations for each, totaling more than 7k sentence pairs with document context. Evaluation: Participating systems will be evaluated on 400 held out abstracts, manually adapted four-fold by different annotators for robust automatic metrics. Additionally, a subset of system output will be manually evaluated along several axes to ensure they are accurate and faithful to the original, which is crucial for the biomedical domain. URL: https://bionlp.nlm.nih.gov/plaba2023/ Mailing list: https://groups.google.com/g/plaba2023 Key dates: Jul 19 – Evaluation data released Aug 16 – Submissions due Oct 18 – Results posted We look forward to your submissions.

Six PhD students and one postdoc: Neurosymbolic Models of Language, Vision, and Action - Saarbrücken, Germany
by Alexander Koller 16 Jul '23

The Research Training Group 2853 “Neuroexplicit Models of Language, Vision, and Action” is looking for 6 PhD students and 1 postdoc, starting in October 2023 or later.

Neuroexplicit models combine neural and human-interpretable (“explicit”) models in order to overcome the limitations that each model class has separately. They include neurosymbolic models, which combine neural and symbolic models, but also e.g. combinations of neural and physics-based models. In the RTG, we will improve the state of the art in natural language processing (“Language”), computer vision (“Vision”), and planning and reinforcement learning (“Action”) through the use of neuroexplicit models and investigate the cross-cutting design principles of effective neuroexplicit models (“Foundations”).

The RTG is scheduled to grow to a total of 24 PhD students and one postdoc by 2025. Through the inclusion of ~20 further PhD students and postdocs funded from other sources, it will be one of the largest research centers on neuroexplicit or neurosymbolic models in the world. The RTG brings together researchers at Saarland University, the Max Planck Institute for Informatics, the Max Planck Institute for Software Systems, the CISPA Helmholtz Center for Information Security, and the German Research Center for Artificial Intelligence (DFKI). All of these institutions are colocated on the same campus in Saarbrücken, Germany.

The positions are funded as follows:

• PhD students will be funded for up to four years at the TV-L E13 100% pay scale. You should have or be about to complete an MSc degree in computer science or a related field and have demonstrated expertise in one of the research areas of the RTG, e.g. through an excellent Master’s thesis or relevant publications.

• The postdoc will initially be funded for three years, with the possibility of extension up to five years, at the TV-L E13 100% pay scale.
As the RTG postdoc, you will pursue your own research agenda in the field of neuroexplicit models and work with the PhD students to identify and pursue opportunities for collaborative research. You should have or be about to complete a PhD in computer science or a related field and have demonstrated your expertise in one or more of the RTG’s research areas through publications in top venues.

The RTG is part of the Saarland Informatics Campus, one of the leading centers for research in computer science, artificial intelligence, and natural language processing in Europe. The Saarland Informatics Campus brings together 900 researchers and 2500 students from 81 countries. The CISPA Helmholtz Center, located on the same campus, is home to an additional 350 researchers and on track to grow to 800 by 2026. Researchers at SIC and CISPA are part of the ELLIS network and have been awarded more than 35 ERC grants.

Each PhD student in the RTG will be jointly supervised by two PhD advisors from the list of Principal Investigators below. Each student will freely define their own research topic; we encourage the choice of topics that cross the traditional boundaries of research fields. Students may be affiliated with Saarland University or with one of the participating institutes.
Vera Demberg, Saarland University - Computational Linguistics
Jörg Hoffmann, Saarland University - AI Planning
Eddy Ilg, Saarland University - Computer Vision, Machine Learning
Dietrich Klakow, Saarland University - Natural Language Processing
Alexander Koller, Saarland University - Computational Linguistics
Bernt Schiele, MPI for Informatics - Computer Vision, Machine Learning
Philipp Slusallek, DFKI and Saarland University - Computer Graphics, Artificial Intelligence
Christian Theobalt, MPI for Informatics - Visual Computing, Machine Learning
Mariya Toneva, MPI for Software Systems - Computational Neuroscience, Machine Learning
Isabel Valera, Saarland University - Machine Learning
Jilles Vreeken, CISPA - Machine Learning, Causality
Joachim Weickert, Saarland University - Mathematical Data Analysis
Verena Wolf, DFKI and Saarland University - Modeling and Simulation, Reinforcement Learning

Ellie Pavlick, Brown University and Google AI, will join us regularly as a Mercator Fellow.

Please send your application by 31 May 2023 to bewerbung(a)uni-saarland.de. Include the reference number W2298 for the postdoc position and the reference number W2299 for the PhD positions. We aim to conduct job interviews in July (for a start in October) and September (for a later start).

The legally binding version of this job ad is at https://www.uni-saarland.de/fileadmin/upload/verwaltung/stellen/Wissenschaf… (postdoc) and https://www.uni-saarland.de/fileadmin/upload/verwaltung/stellen/Wissenschaf… (PhD), respectively.

For details on what materials to submit with your application and all other information about the RTG, please see our website: https://www.neuroexplicit.org/jobs/#phd-2023
