The Natural Language Processing Section at the Department of Computer Science at the University of Copenhagen is advertising a 17-month position for a Postdoctoral Researcher in Natural Language Processing.
The position involves interdisciplinary research on the development and evaluation of large language models for low-resourced languages, primarily focusing on Danish and other Nordic languages. The research will focus on post-training methods for these low-resourced languages, for example, by investigating the role of synthetic data, among other data augmentation techniques, and the role of in-context learning in modelling low-resource languages. Applicants are also expected to contribute to work on data pre-processing, tokenization, and model training. The candidate will be affiliated with the project group Danish Foundation Models (https://www.foundationmodels.dk/), which is a collaboration between several major universities in Denmark addressing the aforementioned initiative, including Aarhus University, Southern Danish University, the Alexandra Institute, and the University of Copenhagen. The candidate will be co-supervised by both the Centre for Language Technology and the Department of Computer Science and will be physically located in both places during their employment.
The successful candidate will join the Language and Multimodal Processing group, which is part of a section with a strong, international, and diverse environment for research within core as well as emerging topics in natural language processing, natural language understanding, computational linguistics, and multi-modal language processing. It is housed within the main Science Campus, which is centrally located in Copenhagen. Further information about the group is available here: https://lampgroup.github.io/, and further information about the Department is available here: https://di.ku.dk/english/.
The application deadline is August 15, 2025, with interviews on September 4, 2025, and a start date of October 15, 2025, or as soon as possible thereafter. Further information about the position can be found here: https://employment.ku.dk/faculty/?show=164490.
Inquiries about the position can be made at de(a)di.ku.dk. Interested candidates can also reach out to schedule an informal discussion during the ACL 2025 conference in Vienna.
Dear list members,
I'm delighted to announce an exciting new publication in the Cambridge Elements in Corpus Linguistics series. The title is "Automatic Image Tagging for Corpus Linguistics: a multimodal study of news representations of Islam", and the authors are Paul Baker, Hanna Schmük and Yufang Qian.
The Element offers a test of Vertex AI for automatically tagging images, using analyses of the written text, the images, and the interaction between them. It offers a critical evaluation of the software used in this context, and also acts as a practical guide for researchers in this area.
The Element is available as an Open Access publication here:
Automatic Image Tagging for Corpus Linguistics<https://www.cambridge.org/core/elements/automatic-image-tagging-for-corpus-…>
Susan Hunston
The Hansen Foundation is offering a doctoral scholarship in (Computational) History / Cultural Anthropology / Ecology / Geography at the University of Passau, starting as soon as possible.
Through your work, we aim to identify correlations and possible interactions between cultural factors on the one hand and landscape factors on the other across the entire Šumava (Bavarian Forest) region. The area under investigation is attractive not only because of its nature and proximity to Passau, but is also particularly interesting for the research question: the Šumava is a region that was and is characterized by (historical and current) political borders, yet geologically forms a uniform mountain landscape.
The scholarship is embedded in the project “Regional Collectives at the End of the Weimar Republic” of the Passau Chairs of German Linguistics and Computational Humanities. For the first time, we are systematically processing the materials of the Atlas of German Folklore (1930-1935) for Bavaria. The atlas, the largest humanities research project ever undertaken in Germany, used questionnaires to document people's everyday culture at the end of the Weimar Republic. For Bavaria alone, there are a total of 450,000 data records for 1,820 places, which are available to you for your doctoral project.
You will enrol as a doctoral student at the University of Passau and be supervised by Professor Dr Malte Rehbein (Computational Humanities), who will serve as your primary advisor. At the same time, you will receive non-personalised support from the Hansen Foundation and its subordinate Research Centre for the Study of Collectives at the University of Regensburg.
The project is embedded in a broader research context that includes collaboration with another scholarship holder, the research initiatives of the Chairs of Computational Humanities and Linguistics, the Computational Historical Ecology research programme, and the Passau Methodikum. It is designed to transcend traditional disciplinary boundaries and encourage a shift in perspective. Depending on your academic background, you will bring a focus from ecological, historical, cultural studies, or historical-geographical perspectives.
A central aspect of the project is the joint consideration of research subjects as machine-readable data. The dissertation will also contribute to the development of a “Big Data source criticism,” which adapts and extends traditional historical methodology to large-scale datasets.
Requirements include a successfully completed university degree with a focus in history, folklore/empirical cultural anthropology, geography, ecology, digital humanities, or a related field. You should have a strong interest in topics such as environmental, everyday, or social history, cultural or geoanthropology, or historical ecology. As our work is data-driven, you will need relevant skills in the analysis of research data — for example, through quantitative methods, databases, geographic information systems (GIS), or text and data mining. You will be closely integrated into the Passau-based research groups throughout your project and will have the opportunity to acquire and develop any necessary skills there. A good reading proficiency in German is required.
The financial support provided by the foundation amounts to €1,400 per month (tax-free). The foundation permits up to ten hours of secondary employment per week. If the necessary qualifications and funding are in place, this may take the form of work on another research project within the Chair of Computational Humanities. The scholarship is limited to two years. We will actively support the search for follow-up funding to complete the doctorate in good time. Residency in Passau is desirable and beneficial, but not a strict requirement.
Curious? Interested?
Then please submit an academic CV including transcripts, along with a motivation letter outlining your interest in the project and in pursuing further academic qualifications (in a single PDF). If your application sparks our curiosity and interest, we will invite you to an interview.
Please send your application directly to malte.rehbein(a)uni-passau.de by 25 July 2025. If you have any questions, feel free to contact Malte Rehbein directly.
Link: https://che.hypotheses.org/883
----
Dr. phil. Thomas Nikolaus Haider
Computational Humanities and Multilingual Computational Linguistics
University of Passau
Call for Papers: CASE 2025 @ RANLP (8th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Texts)
Dear Colleagues,
We are pleased to announce the 8th edition of the Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text, held in conjunction with RANLP 2025 (https://ranlp.org/ranlp2025/)!
CASE is a leading venue for research, resources, and practical advances in automated event extraction and analysis, focusing on social and political event data. It has been held regularly at top venues such as ACL, EMNLP, and EACL.
We invite submissions of research papers, resource papers, and position papers addressing (but not limited to) the following topics:
• Event extraction at the sentence, document, or cross-document level, including event coreference.
• Creation and annotation of datasets for event extraction.
• Modeling event-event relations such as subevents, causal, temporal, and spatial links.
• Evaluation of event datasets: reliability, validity, and coverage.
• Event schemas and ontologies: population, definition, and enrichment.
• Tools, pipelines, and infrastructure for event annotation and analysis.
• Linguistic aspects of event representation: lexical, syntactic, semantic, discursive, and pragmatic.
• Applications of event data in conflict prediction, early warning, and policy support.
• Detection of new event types, including protests, public health crises, and cyber activism.
• Bias, fairness, and misinformation in event extraction systems and datasets.
• Legal, ethical, and privacy considerations in dataset creation and dissemination.
• Cross-lingual, multilingual, and multimodal event extraction.
• Use of LLMs and generative AI for event extraction, analysis, and dataset generation.
• Release of new benchmarks, datasets, or annotation resources.
All accepted papers will be published in the ACL Anthology.
Website: https://emw.ku.edu.tr/case-2025/ (currently being updated; please contact ahurriyetoglu(a)ku.edu.tr with any questions)
Link for submission: https://softconf.com/ranlp25/CASE2025/user/
Important dates:
Submission deadline: July 25, 2025
Notification: August 17, 2025
Camera-ready deadline: August 30, 2025
Workshop date: September 11-13, 2025
Shared task
Multimodal detection of hate speech, humor, and stance in LGBTQ+ socio-political discourse
To know more and participate, please visit: https://github.com/therealthapa/case2025-multimodal/blob/main/README.md
All shared task papers will also be published in the ACL Anthology.
Organizers: Surendrabikram Thapa, Siddhant Bikram Shah, Shuvam Shiwakoti, Kritesh Rauniyar, Surabhi Adhikari, Kristy Johnson, Ali Hürriyetoğlu, Hristo Tanev, Usman Naseem
Organizing committee:
Ali Hürriyetoğlu
Hristo Tanev
Surendrabikram Thapa
Vanni Zavarella
Erdem Yörük
*LAW-XIX 2025 - Call for Participation*
The 19th edition of the Linguistic Annotation Workshop
(https://sigann.github.io/LAW-XIX-2025) will be held on the 31st of July
2025 in Vienna, Austria, co-located with ACL 2025.
*Workshop description*
The Linguistic Annotation Workshop (LAW) is the annual workshop of the
ACL and ELRA Special Interest Group on Annotation (SIGANN), and it
provides a forum for the presentation and discussion of innovative
research on all aspects of linguistic annotation, including the creation
and evaluation of annotation schemes, methods for automatic and manual
annotation, use and evaluation of annotation software and frameworks,
representation of linguistic data and annotations, semi-supervised
“human in the loop” methods of annotation, crowd-sourcing approaches,
and more. As in the past, the LAW will provide a forum for annotation
researchers to work towards standardization, best practices, and
interoperability of annotation information and software.
*Keynote speakers*
Rotem Dror, University of Haifa
Junyi Jessy Li, University of Texas at Austin
For more details, see: https://sigann.github.io/LAW-XIX-2025/invited.html
*Workshop program:* https://sigann.github.io/LAW-XIX-2025/program.html
*Registration details:* https://2025.aclweb.org/registration
The LAW-XIX organisers
Ines Rehbein & Logan Siyao Peng
--
Ines Rehbein
Data and Web Science Group
University of Mannheim, Germany
---------------------------------------------------------------------------------------------------------------------------
--- Doctoral and post-doctoral positions - AI4DH - University of
Ljubljana, Slovenia
---------------------------------------------------------------------------------------------------------------------------
The University of Ljubljana has *3 open positions (2 PhD and 1 Postdoc)*
in artificial intelligence for digital humanities (AI4DH) in the context
of the *European centre of excellence in AI4DH*, recently founded and
supported by Horizon Europe.
At the heart of the vibrant European capital city of Ljubljana, close to
both the Alps and the Mediterranean, you will be part of an AI research
group dedicated to AI research that can be applied to DH.
You will benefit from *competitive salaries* and *top-notch
infrastructure*, based in the faculty of computer and information
science, yet in an *interdisciplinary context*.
Full details are available here:
- *Postdoctoral position, for 2 years* with possible extension:
https://euraxess.ec.europa.eu/jobs/356445
- *Doctoral positions, for up to 4 years*:
https://euraxess.ec.europa.eu/jobs/356450
*Important dates:*
- NEW *information session* on Thursday 10 July, 10am CEST (see
registration details on Euraxess)
- Application *deadline: 24 July,* CEST (Ljubljana time)
🚀 Call for Participation: DISRPT 2025 Shared Task on Discourse Relation Parsing and Treebanking.
🛎️ Training data has been released and submission is now open!
https://softconf.com/emnlp2025/disrpt2025/
In conjunction with CODI-CRAC & EMNLP 2025 - Suzhou, China, Nov. 5-9.
This year, we are organizing the fourth edition of the DISRPT shared task on discourse processing across formalisms, for a variety of languages and genres, with three subtasks:
* Task 1: Discourse segmentation
* Task 2: Connective identification
* Task 3: Relation classification
We will provide training, development and test datasets from (almost) all available languages in RST / eRST, SDRT, PDTB, ISO 24617, and discourse dependencies, using a uniform format. Because different corpora, languages, and frameworks use different guidelines, the shared task will promote the design of flexible methods for dealing with various guidelines, and will help to push forward the discussion of converging standards for discourse units. We will evaluate segmentation and connective detection in two different scenarios: with and without gold syntax. An automatically parsed version is provided for all corpora without a gold parse.
This year, the shared task will feature:
* The inclusion of more frameworks, with datasets from: RST / eRST, SDRT, PDTB, ISO 24617, and discourse dependencies
* The inclusion of new corpora and new languages, some of them kept as a surprise!
* A unified set of labels for the discourse relations, to make evaluation across datasets easier
* A new constraint: only one multilingual model should be submitted per task, and it should be small (4B parameters max)! This will make our replication work easier, but more importantly, it will simplify using such a model and test the robustness of your solution.
We’re excited to announce the release of the training data for the DISRPT 2025 Shared Task! You can now access the data, format documentation, and tools on our GitHub 🔗 https://github.com/disrpt/sharedtask2025
The data covers five discourse frameworks (RST / eRST, PDTB, SDRT, and Discourse Dependencies) across 14 languages: Basque, Chinese, Czech, Dutch, English, Farsi, French, German, Italian, Portuguese, Russian, Spanish, Thai, and Turkish.
We invite researchers and teams interested in participating to register now. Registered participants will be added to our mailing list and receive all future updates.
📅 The full testing data will be released on July 14, 2025 — stay tuned!
To join the mailing list and stay informed, please email us at:
📧 disrpt_chairs(a)googlegroups.com
Let us know you're interested — we’d love to have you on board!
**Important dates**
* May 16 2025 – Sample data release
* June 17 2025 – Training data release [NOW]
* July 14 2025 – Test data release
* August 1 2025 – System + paper submissions due
* September 12 2025 – Notification of acceptance
* September 19 2025 – Camera ready papers
* November 8-9 2025 – CODI at EMNLP
All deadlines are 11.59 pm UTC -12h (AoE, "Anywhere on Earth").
**Information:**
Contact the organizers: disrpt_chairs(a)googlegroups.com
Official website: https://sites.google.com/view/disrpt2025/
Google group for participants, please join us on: disrpt2025_participants(a)googlegroups.com
**Organization:**
Chloé Braud (CNRS - IRIT, University of Toulouse, France)
Chuyuan Li (University of British Columbia, Canada)
Janet Yang Liu (LMU Munich, Germany)
Philippe Muller (CNRS - University of Toulouse, France)
Amir Zeldes (Georgetown University, Washington DC, USA)
CODI CRAC 2025 Workshop: joint call for papers
November 5-9 2025 - EMNLP 25 - Suzhou, China
We are pleased to announce that we are organizing in 2025 the first joint CODI-CRAC workshop that will be held during EMNLP! More information on: https://sites.google.com/view/codi-crac2025/
Deadline for CODI CRAC papers: July 30 2025
We will host 2 shared tasks, the CRAC and the DISRPT shared tasks. More information on:
- CRAC shared task: https://ufal.mff.cuni.cz/corefud/crac25
- DISRPT shared task: https://sites.google.com/view/disrpt2025/
Aims and scope
The last few years have seen a dramatic improvement in the ability of NLP systems and Large Language Models to understand and produce words, sentences and in some cases longer texts. This development has created a renewed interest in discourse problems as researchers move towards the processing of long-form documents and conversations. There is a surge of activity in discourse pretraining tasks, coherence models, summarization for long texts and conversations, corpora for discourse level reading comprehension and formal parsing, as well as discourse related/aided representation learning, to name a few.
Discourse, roughly the interactions of context, form and meaning above the sentence level, is at the intersection of many areas in Computational Linguistics and NLP, since it is concerned with all levels of linguistic representation, allowing the modeling of textual coherence and inference leveraging long-distance links within documents. It thus brings together researchers working on different areas but facing similar issues with coherence and cohesion, document-level structure, long text and long context.
In 2025, we organize the first joint CODI-CRAC workshop. The CODI workshop has been a forum for a broad range of work at the discourse level. The CRAC workshop has been a primary venue for researchers interested in the computational modeling of reference, anaphora, and coreference. Together, these workshops have catalyzed work to advance research on discourse level problems and have served as a forum for the discussion of suitable datasets and reliable evaluation methods.
This joint edition corresponds to the 6th CODI workshop and the 8th CRAC workshop. It will welcome contributions from all the areas below, including state of the art textual NLU and NLG work using LLMs, as well as classic structured work on automatic discourse analysis -- corresponding to challenging tasks such as coreference resolution or discourse parsing -- to encourage interaction between communities. The workshop is set to host the fourth edition of the DISRPT shared task on Discourse Relation Parsing and Treebanking and the fourth edition of the CRAC shared task on Multilingual Coreference Resolution.
The workshop is planned as a one-day event that brings together different subcommunities. It will feature invited talks and regular papers. We also accept papers accepted at other major conferences for non-archival presentation, including Findings papers.
Topics of interest
We welcome papers on symbolic and probabilistic approaches, corpus development and analysis, as well as machine and deep learning approaches to discourse. We appreciate theoretical contributions as well as practical applications, including demos of systems and tools. The goal of the workshop is to provide a forum for the community of NLP researchers working on all aspects of discourse.
Topics of interest include, but are not limited to:
- discourse structure
- discourse connectives
- discourse relations
- annotation tools and schemes for discourse phenomena
- corpora annotated with discourse phenomena
- discourse parsing
- cross-lingual discourse processing
- cross-domain discourse processing
- anaphora and coreference resolution
- event coreference
- argument mining
- coherence modeling
- discourse and semantics
- discourse in applications such as machine translation, summarization, etc.
- evaluation methodology for discourse processing
- discourse pretraining tasks
- long-text modeling and generation
Submissions
We solicit three categories of papers: regular (long and short) workshop papers, demos and extended abstracts. Only regular workshop papers and demos will be included in the proceedings as archival publications.
Double submission of papers is allowed, but this information will need to be disclosed at submission time.
Regular papers must describe original unpublished research. Long papers may consist of up to 8 pages of content, plus unlimited pages for references. Short papers can be up to 4 pages, plus unlimited pages for references. Demo submissions may describe systems, tools, visualizations, etc., and may consist of up to 4 pages, plus unlimited pages for references.
Each submission can contain unlimited pages for Appendices but the paper submissions need to remain fully self-contained, as these supplementary materials are completely optional, and reviewers are not even asked to review them.
Extended abstracts can describe work in progress. These may be two pages long (without references). Extended abstracts are non-archival. They will be included in the workshop program and handbook, but will not appear in the workshop proceedings.
Papers accepted or rejected at one of the main conferences
We also invite presentations of papers accepted at another main conference; a specific deadline and submission process will be communicated later. They will be included in the workshop program and handbook, but will not appear in the workshop proceedings.
We also fast-track ARR papers with reviews, with timeline TBA.
Submission website
All submissions must be anonymous and follow the EMNLP 2025 formatting instructions described here: https://aclrollingreview.org/cfp
Submission websites:
* CODI: https://softconf.com/emnlp2025/codi2025/
* DISRPT: https://softconf.com/emnlp2025/disrpt2025/
* CRAC: https://softconf.com/emnlp2025/crac2025/
Schedule
- July 30 2025: CODI CRAC papers due
- September 5 2025: Notification of acceptance
- September 19 2025: Camera ready deadline
- November 8-9 2025: CODI-CRAC workshop
All deadlines are 11.59 pm UTC -12h ("anywhere on Earth").
Invited Speakers
- Tanya Goyal, Cornell University
- Nancy F. Chen, Institute of Infocomm Research (I2R), A-STAR, Singapore
Organizers
- Chloé Braud, CNRS-IRIT
- Christian Hardmeier, IT University of Copenhagen
- Chuyuan (Lisa) Li, University of British Columbia
- Jessy Li, University of Texas, Austin
- Sharid Loáiciga, University of Gothenburg
- Vincent Ng, University of Texas at Dallas
- Michal Novák, Charles University, Prague
- Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences
- Massimo Poesio, Queen Mary University of London and University of Utrecht
- Sameer Pradhan, University of Pennsylvania and cemantix
- Michael Strube, Heidelberg Institute for Theoretical Studies
- Amir Zeldes, Georgetown University, Washington DC
To contact the organizers, please send an email to: codi-crac-workshop(a)googlegroups.com
We are seeking qualified applicants for a position as Language Data Scientist on the ERC Synergy grant ‘NILOMORPH: The evolution of suprasegmental morphology in West Nilotic’, led by Matthew Baerman. The successful candidate will perform a key role in managing, processing and analyzing language data generated across the multiple teams that make up the project. The position is based at the Surrey Morphology Group at the University of Surrey, in Guildford, UK, and provides the opportunity to work in the vibrant and highly collegial research environment for which the SMG is renowned.
The NILOMORPH project aims to reconstruct the morphological evolution of the West Nilotic languages, spoken primarily in South Sudan and neighboring countries. These languages have developed some of the most remarkable morphological systems on the planet, where simultaneous manipulation of multiple phonological features (vowel length, vowel height, tone, phonation type) results in enormous paradigms marked solely by the modulation of vowel properties. NILOMORPH combines fieldwork, experimental methods, and historical linguistics to account for the phonological, morphological and psycholinguistic pathways that led to this unique outcome. The project is spread across multiple teams, based in the UK, France, Germany and the USA, and will make use of a diverse range of language data taken from multiple sources: ongoing fieldwork, previous studies, text corpora and newly-generated reconstructions, as well as the results of computational simulations and artificial language learning experiments.
The successful candidate will develop and execute a set of data management and analysis tools that will enable the diverse international team of researchers to create, access and manipulate the language data that is central to the NILOMORPH project. As a core member of the Surrey-based team, the successful candidate will provide expert guidance on methods and tools for data analysis, including data design, data management protocols and statistical analysis. They will collaborate closely in the writing and dissemination of papers and presentations, and be expected to take initiative in the formulation of the research agenda. They will also participate in the broader activities both of the NILOMORPH group and of the Surrey Morphology Group (SMG). The candidate will have opportunities at SMG to develop their research leadership profile while interacting with world-class researchers.
Suitable candidates with a background in linguistics (or linguistics-adjacent fields) are strongly encouraged to apply. Please refer to the full job description available at the application website for essential and desirable criteria.
The following documents will be required:
CV
Contact details for 2 academic/industry referees.
Three relevant works of research (DOI/URL links if possible).
Research statement of one page, describing research interests and experience, and how you think this makes you suitable for the advertised post.
A completed application form
The website for applications is https://jobs.surrey.ac.uk/vacancy.aspx?ref=031325. Interviews are planned for 02 September 2025 and will be held online. Please contact Matthew Baerman (m.baerman(a)surrey.ac.uk) with any questions.
--
Dr. Sacha Beniamine (he/him)
It's pronounced [saʃa benjamin] (stress is irrelevant)
Leverhulme Early Career Fellow
Surrey Morphology Group, School of Literature and Languages
s.beniamine(a)surrey.ac.uk | http://smg.surrey.ac.uk/
Senate House, University of Surrey, Guildford, Surrey, GU2 7XH, UK