*Call for Papers*
Slav-NLP: The 10th Workshop on NLP for Slavic Languages
<http://bsnlp.cs.helsinki.fi/>
co-located with ACL 2025, Vienna, Austria
31 July or 1 August 2025
Submission Deadline: 27 April 2025
WORKSHOP DESCRIPTION
The 10th edition of the Slav-NLP Workshop at ACL 2025
Sponsored by SIGSLAV: The ACL Special Interest Group on Slavic NLP
Slavic languages play an important role due to their diverse cultural
heritage and wide use — over 400M speakers worldwide. Current political
and economic developments in Central/Eastern Europe thrust the
Slavic-speaking societies — and their languages — into sharp focus,
especially in light of rapid technological advancements and expanding
consumer markets.
Research on theoretical and applied topics in the context of Slavic
languages is still lagging in the community. Linguistic phenomena that
are common to the Slavic languages — rich morphology, free word order,
etc. — make NLP for these languages a challenging task. The Slav-NLP
Workshop gathers researchers from academia and industry. It aims to
stimulate research in Slavic NLP, and foster the creation of tools and
resources. The Workshop provides a forum for exchange of ideas and
experience, discussing current challenges, and making the available
resources widely known. The structural similarity, as well as the easily
recognizable core vocabulary and inflectional inventory spanning this
large language group, creates a special environment where researchers
can appreciate the shared problems and communicate naturally — despite
the lack of mutual intelligibility. We are glad to have an opportunity to
organize Slav-NLP again in Central Europe.
This Workshop addresses Natural Language Processing (NLP) for the Slavic
languages. The NLP tasks in urgent need of attention include:
* language modeling,
* morphological, syntactic and semantic analysis,
* lexical semantics,
* named-entity recognition,
* text normalization and processing non-standard language,
* coreference resolution,
* information extraction,
* question answering,
* text summarization,
* machine translation,
* development of linguistic resources,
* development and assessment of large language models,
* text classification,
* text generation,
* disinformation detection,
* fact verification,
* sentiment analysis.
This Workshop continues the proud tradition established by the 9
previous (B)SNLP Workshops.
IMPORTANT DATES
* Submission deadline: 27 April 2025
* Pre-reviewed ARR commitment: 20 May 2025
* Notification of acceptance: 27 May 2025
* Camera-ready papers due: 3 June 2025
* Workshop: 31 July or 1 August 2025
SHARED TASK
This year's Slav-NLP features a Shared Task on Detection and
Classification of Persuasion Techniques in Slavic languages in two types
of texts: (a) parliamentary debates on highly-contested topics, and (b)
social media posts related to the spread of disinformation.
Information about the Shared Task is available on the Workshop’s Web page.
SUBMISSION
At the Workshop’s Web page: bsnlp.cs.helsinki.fi
<http://bsnlp.cs.helsinki.fi/call-for-papers.html>
Workshop contact: bsnlp(a)cs.helsinki.fi
--
Roman Yangarber
Professor, University of Helsinki, Finland
Digital Humanities
INEQ: Helsinki Inequality Initiative
<https://helsinki.fi/en/ineq-helsinki-inequality-initiative> —
Linguistic Inequalities and Translation Technologies
------------------------------------------------------------------------
e-Learning & language learning
Language Learning Lab
Unioninkatu 40, Metsätalo A214
helsinki.fi/revita <https://www.helsinki.fi/revita>
helsinki.fi/language-learning-lab
<https://www.helsinki.fi/language-learning-lab>
mobile: +358 50 41 51 71 3
------------------------------------------------------------------------
Utrecht University, The Netherlands
In NLP, there is a growing recognition that data quality is key to better language models, yet we still know very little about the link between data and model behavior. In this project, we will develop methods to measure the diversity of NLP datasets, assess the impact of diversity on NLP models, and improve data collection and model training.
As a PhD student, you will develop innovative methods to measure the diversity of NLP datasets. A major focus will be on measuring dataset diversity from a sociolinguistic perspective, considering language variation (such as styles and dialects) and combining (socio)linguistic insights with neural language modeling. You will also draw from relevant disciplines, particularly the social sciences, that have developed measurement approaches for diversity. Furthermore, you will carry out experiments to assess the impact of data diversity on NLP models, with a focus on fairness and robustness, and investigate ways to leverage data diversity to improve NLP models.
You will join the NLP & Society Lab, headed by Dong Nguyen, where we work on a variety of topics, including computational sociolinguistics, analysis of online conversations, data-centered NLP, and evaluation of NLP models. We are part of the wider NLP group within the Department of Information and Computing Sciences of Utrecht University (UU), the Netherlands.
For more details and to apply, visit the link below:
https://www.uu.nl/en/organisation/working-at-utrecht-university/jobs/phd-po… (Deadline: Jan 5)
Contact: Dong Nguyen (d.p.nguyen(a)uu.nl)
ICLC-11
11TH INTERNATIONAL CONTRASTIVE LINGUISTICS CONFERENCE
Second Call for Abstracts
September 17–19, 2025
Prague, Czech Republic
The Faculty of Arts at Charles University in Prague is pleased to announce the 11th International Contrastive Linguistics Conference. The ICLC conference series, running since 1998, aims to promote fine-grained cross-linguistic research comprising two or more languages from a broad range of theoretical and methodological perspectives. Following the success of ICLC-10 in Mannheim 2023, ICLC-11 wants to bring together researchers from different linguistic subfields and neighbouring disciplines to continue the interdisciplinary dialog on comparing languages, to foster the development of an international community and to advance possible new areas of cross-linguistic research. See https://iclc11.ff.cuni.cz/ for more and note the submission deadline of February 24, 2025.
We invite abstracts on a broad range of topics, including but not limited to:
(1) Comparison of phenomena in two or more languages focused on any area and level of linguistic analysis:
* lexicon
* phonetics and phonology
* morphology, syntax and morphosyntax, linguistic complexity
* semantics, pragmatics, register and socio-cultural context
(2) Methodological challenges and solutions in cross-linguistic research:
* language corpora (multilingual, learner, and multimodal) and issues of linguistic annotation (e.g., Universal Dependencies)
* comparability issues, tertia comparationis, language universals; experimental and naturalistic interaction data
* AI and new digital tools in linguistic analysis
* low-resourced languages
(3) Contrastive linguistics in touch with related disciplines:
* generative, model-theoretic, functional or cognitive (e.g., constructional) approaches
* historical, sociolinguistic and variationist perspectives; registers, multimodality, pragmatics, interculturality; language contact; language policy
* cognitive and psycholinguistic approaches to bilingualism and multilingualism; language acquisition, language teaching and learning
* translation studies
The abstracts should present empirical research, well-defined research questions or hypotheses, details of the research approach and methods, theoretical insights, and (preliminary or expected) results. For details see https://iclc11.ff.cuni.cz/calls-and-circulars/call-for-papers/.
PRELIMINARY PROGRAM
* Parallel Oral Sessions
* Poster Sessions
* Keynote Speakers:
Sabine De Knop (Université Saint-Louis, Bruxelles, Belgium)
Volker Gast (Friedrich-Schiller-University, Jena, Germany)
Dan Zeman (Charles University, Prague)
* Panel Discussion
IMPORTANT DATES
24.02.2025: Deadline for abstract submission
26.05.2025: Notification of acceptance
02.06.2025: Registration opens
16.06.2025: Deadline for revised abstract submission
30.06.2025: Last day for early bird registration
01.09.2025: Online registration closes
16.09.2025: Arrival, Registration, Get-together
17–19.09.2025: Conference
ORGANIZING COMMITTEE
* Mirjam Fried (chair) 1)
* Viktor Elšík 1)
* Jana Kocková 2)
* Michal Křen 1)
* Olga Nádvorníková 1)
* Alexandr Rosen 1)
1) Charles University, Faculty of Arts
2) Czech Academy of Sciences, Institute of Slavonic Studies
PROGRAM COMMITTEE: tba
CONTACT INFORMATION
Website: https://iclc11.ff.cuni.cz/
Email: iclc11(a)ff.cuni.cz
MultiGEC-2025 shared task: test phase officially open
We invite you to participate in the shared task on Multilingual Grammatical Error Correction, MultiGEC-2025, covering 12 languages: Czech, English, Estonian, German, Greek, Icelandic, Italian, Latvian, Russian, Slovene, Swedish and Ukrainian.
The task has just entered its test phase, during which participants are invited to submit their system output to CodaLab: https://codalab.lisn.upsaclay.fr/competitions/20500 . The system submission deadline is November 20.
The results will be presented on March 5, 2025, at the NLP4CALL workshop, co-located with the NoDaLiDa conference to be held in Tallinn, Estonia, on 2–5 March 2025. https://spraakbanken.gu.se/en/research/themes/icall/nlp4call-workshop-serie…
The publication venue for system descriptions will be the proceedings of the NLP4CALL workshop, also co-published in the ACL anthology.
Official system evaluation will be carried out on CodaLab (link comes later).
To register for/express interest in the shared task, please fill in this form (https://forms.gle/nTPfARVqy1XmqT4t6).
Note that you will be prompted to sign Terms of Use for the data at https://forms.gle/VLJ18WbwsxitEBYi7. The data access is personal, please do not forget to fill in the form.
The GitHub page for the shared task is https://github.com/spraakbanken/multigec-2025/; task description and general information are also available on https://spraakbanken.gu.se/en/compsla/multigec-2025
* TASK DESCRIPTION
In this shared task, your goal is to rewrite learner-written texts to make them grammatically correct or both grammatically correct and idiomatic, that is, either adhering to the "minimal correction" principle or applying fluency edits.
For instance, the text
> My mother became very sad, no food. But my sister better five months later.
can be corrected minimally as
> My mother became very sad, and ate no food. But my sister felt better five months later.
or with fluency edits as
> My mother was very distressed and refused to eat. Luckily, my sister recovered five months later.
For fair evaluation of both approaches to the correction task, we will provide two evaluation metrics, one favoring minimal correction, one suited for fluency-edited output (read more under Evaluation).
We particularly encourage development of multilingual systems that can process all (or several) languages using a single model, but this is not a mandatory requirement to participate in the task.
* DATA
We provide training, development and test data for each of the languages. The training and development dataset splits are available through GitHub. Evaluation will be performed on a separate test set.
See website for more detailed information: https://github.com/spraakbanken/multigec-2025/
Note: The English data is expected a bit later.
* EVALUATION
During the shared task, evaluation will be based on cross-lingually applicable automatic metrics:
- reference-based:
  - GLEU score
  - Precision, Recall, F0.5 score
- reference-free:
  - Scribendi score
After the shared task, we also plan on carrying out a human evaluation experiment on a subset of the submitted results.
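As an illustration of the F0.5 score listed above, here is a minimal sketch of the standard F-beta formula, F = (1 + beta^2) * P * R / (beta^2 * P + R). With beta = 0.5, precision counts more than recall. The function name `f_beta` and the example values are hypothetical; the official scorer may derive precision and recall from system edits in its own way, so this shows only the final combination step.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Combine precision and recall into an F-beta score.

    beta < 1 weights precision more heavily than recall;
    beta = 0.5 gives the F0.5 score.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero: define the score as 0.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical example: a system with precision 0.6 and recall 0.3.
print(round(f_beta(0.6, 0.3), 4))
```

Swapping the two example values (precision 0.3, recall 0.6) yields a lower score, which is the intended effect of weighting precision more heavily.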
* TIMELINE
- June 18, 2024 - first call for participation ✓
- September 20, 2024 - second call for participation ✓
- October 20, 2024 - third call for participation. Training and validation data released ✓
- October 31, 2024 - reminder. CodaLab opens for team registrations, validation phase starts ✓
- November 13, 2024 - test phase starts ✓
- November 20, 2024 - system submission deadline (system output)
- November 29, 2024 - results announced
- December 16, 2024 - paper submission deadline with system descriptions
- January 20, 2025 - paper reviews sent to the authors
- February 3, 2025 - camera-ready deadline
- March 5, 2025 - presentations of the systems at the NLP4CALL workshop
* PUBLICATION
We encourage you to submit a paper with your system description to the NLP4CALL workshop special track. We follow the same requirements for paper submissions as the NLP4CALL workshop, i.e. we use the same template and apply the same page limit. All papers will be reviewed by the organizing committee. Upon paper publication, we encourage you to share models, code, fact sheets, extra data, etc. with the community through GitHub or other repositories.
* ORGANIZERS
- Arianna Masciolini, University of Gothenburg, Sweden
- Andrew Caines, University of Cambridge, UK
- Orphée De Clercq, Ghent University, Belgium
- Joni Kruijsbergen, Ghent University, Belgium
- Murathan Kurfali, Stockholm University, Sweden
- Ricardo Muñoz Sánchez, University of Gothenburg, Sweden
- Elena Volodina, University of Gothenburg, Sweden
- Robert Östling, Stockholm University, Sweden
* DATA PROVIDERS
- Czech:
-- Alexandr Rosen, Charles University, Prague
- English:
-- Andrew Caines, University of Cambridge
- Estonian:
-- Mark Fishel, University of Tartu, Estonia
-- Kais Allkivi-Metsoja, Tallinn University, Estonia
-- Kristjan Suluste, Eesti Keele Instituut, Estonia
- German:
-- Andrea Horbach, IPN / CAU Kiel, Germany
-- Josef Ruppenhofer, FernUniversität in Hagen, Germany
-- Katrin Wisniewski, Universität Leipzig
-- Torsten Zesch, FernUniversität in Hagen, Germany
- Greek:
-- Alex Tantos, Aristotle University of Thessaloniki
-- Konstantinos Tsiotskas, Aristotle University of Thessaloniki
-- Vassilis Varsamopoulos, Aristotle University of Thessaloniki
-- Pinelopi Kikilintza, Aristotle University of Thessaloniki
-- Elena Drakonaki, Aristotle University of Thessaloniki
-- Eleni Tsourilla, Aristotle University of Thessaloniki
-- Despoina-Ourania Touriki, Aristotle University of Thessaloniki
- Icelandic:
-- Isidora Glisič, University of Iceland
- Italian:
-- Jennifer-Carmen Frey, Eurac Research Bolzano, Italy
-- Lionel Nicolas, Eurac Research Bolzano, Italy
- Latvian:
-- Roberts Darģis, University of Latvia
-- Ilze Auzina, University of Latvia
- Russian:
-- Alla Rozovskaya, City University of New York (CUNY), USA
- Slovene:
-- Špela Arhar Holdt, University of Ljubljana, Slovenia
-- Aleš Žagar, University of Ljubljana, Slovenia
- Swedish:
-- Arianna Masciolini, University of Gothenburg, Sweden
- Ukrainian:
-- Oleksiy Syvokon, Microsoft
-- Mariana Romanyshyn, Grammarly
* CONTACT
Please join the MultiGEC-2025 Google group (https://groups.google.com/g/multigec-2025) in order to ask questions, hold discussions and browse for already answered questions.
Second Workshop on Patient-Oriented Language Processing (CL4Health) @ NAACL 2025
https://bionlp.nlm.nih.gov/cl4health2025/
Albuquerque, New Mexico, USA
SCOPE
CL4Health fills the gap among the different biomedical language processing workshops by providing a general venue for a broad spectrum of patient-oriented language processing research. The second workshop on patient-oriented language processing follows the successful inaugural CL4Health workshop (co-located with LREC-COLING 2024), which clearly demonstrated the need for a computational linguistics venue that focuses on language related to health of the public.
CL4Health is concerned with the resources, computational approaches, and behavioral and socio-economic aspects of the public interactions with digital resources in search of health-related information that satisfies their information needs and guides their actions. The workshop invites papers concerning all areas of language processing focused on patients' health and health-related issues concerning the public. The issues include, but are not limited to accessibility and trustworthiness of health information provided to the public; explainable and evidence-supported answers to consumer-health questions; accurate summarization of patients' health records at their health-literacy level; understanding patients' non-informational needs through their language, and accurate and accessible interpretations of biomedical research. The topics of interest for the workshop include but are not limited to the following:
* Health-related information needs and online behaviors of the public;
* Quality assurance and ethics considerations in language technologies and approaches applied to text and other modalities for public consumption;
* Summarization of data from electronic health records for patients;
* Detection of misinformation in consumer health-related resources and mitigation of potential harms;
* Consumer health question answering (Community Question Answering, CQA);
* Biomedical text simplification/adaptation;
* Dialogue systems to support patients' interactions with clinicians, healthcare systems, and online resources;
* Linguistic resources, data and tools for language technologies focusing on consumer health;
* Infrastructures and pre-trained language models for consumer health.
SHARED TASK
Perspective-aware Healthcare Answer Summarization (PerAnsSumm) will be co-located with the workshop. In community / consumer health question answering, several aspects, such as question understanding and answer generation, have been studied for over a decade. A new and important question posed by this task is the different perspectives provided in the answers to questions posted to online forums. The responses to the questions offer different answer perspectives, e.g., personal experiences, factual information, and suggestions. Traditionally, the CQA answer summarization task has focused on a single best-voted answer as a reference summary. A single answer does not capture all the perspectives. Moreover, a structured presentation of the information in the form of perspective-specific summaries may be more useful for the end-users. To address these gaps, this challenge introduces a novel perspective-specific answer summarization task within a CQA setup. The task will use the Perspective-aware healthcare Answer SuMmarizAtion (PUMA) dataset, a corpus of medical question-answer pairs created by the task organizers. The PUMA dataset consists of 3,167 CQA threads with approximately 10K answers filtered from the Yahoo! L6 corpus. Each answer in PUMA is annotated with five perspective spans: ‘cause’, ‘suggestion’, ‘experience’, ‘question’, and ‘information’.
IMPORTANT DATES
(Tentative)
January 30, 2025 - Workshop paper due date
March 1, 2025 - Notification of acceptance
March 10, 2025 - Camera-ready papers due
April 8, 2025 - Pre-recorded video due (hard deadline)
May 3 or 4, 2025 - Workshop
SUBMISSIONS
Two types of submissions are invited:
- Full papers: should not exceed eight (8) pages of text, plus unlimited references. These are intended to be reports of original research.
- Short papers: may consist of up to four (4) pages of content, plus unlimited references. Appropriate short paper topics include preliminary results, application notes, descriptions of work in progress, etc.
Electronic Submission: Submissions must be electronic and in PDF format, using the Softconf START conference management system. Submissions need to be anonymous. The submission site will be announced shortly.
Dual submission policy: papers may NOT be submitted to the workshop if they are or will be concurrently submitted to another meeting or publication.
MEETING
The workshop will be hybrid. Virtual attendees must be registered for the workshop to access the online environment.
Accepted papers will be presented as posters or oral presentations based on the reviewers’ recommendations.
ORGANIZERS
- Dina Demner-Fushman, US National Library of Medicine
- Sophia Ananiadou, National Centre for Text Mining and University of Manchester, UK
- Paul Thompson, National Centre for Text Mining and University of Manchester, UK
- Deepak Gupta, US National Library of Medicine
--
Paul Thompson
Research Fellow
Department of Computer Science
National Centre for Text Mining
Manchester Institute of Biotechnology
University of Manchester
131 Princess Street
Manchester
M1 7DN
UK
http://personalpages.manchester.ac.uk/staff/Paul.Thompson/
The next meeting of the Edge Hill Corpus Research Group will take place online (via MS Teams) on Friday 15 November 2024, 2-4 pm (GMT)
Topic: Discourse-Oriented Corpus Studies
2-3 pm
Katia Adimora (Edge Hill University)
Mexican immigration/immigrants in American and Mexican newspapers
3-4 pm
Dan Malone (Edge Hill University)
When is the extreme also typical? Using prototypicality to investigate representations of the lone-wolf terrorist
Attendance is free. The abstracts and registration link are here: https://sites.edgehill.ac.uk/crg/next
Registration closes tomorrow (Wednesday 13 November) at 11 am (GMT).
If you have problems registering, or have any questions, please contact me: gabrielc(a)edgehill.ac.uk
Location: Cardiff, UK
Deadline for applications: 25th November
Start date: available immediately
End date: 30th April 2027
Keywords: natural language processing, neuro-symbolic AI, graph neural networks, commonsense reasoning
Details about the post
Applications are invited for a Postdoctoral Research Associate in the Cardiff University School of Computer Science & Informatics, to work on the EPSRC Open Fellowship project ReStoRe (Reasoning about Structured Story Representations), which is focused on story-level language understanding. The overall aim of this project is to develop methods for learning graph-structured representations of stories. For this post, the specific focus will be on developing neuro-symbolic reasoning strategies to fill the gap between what is explicitly stated in a story and what a human reader would infer by “reading between the lines”. More details about the post and instructions on how to apply are available at https://www.jobs.ac.uk/job/DKK088/research-associate
We are happy to announce the next online seminar in the Neurocognition, Language and Visual Processing (NLVP) series organized by the NLVP group and IDSAI at the University of Exeter. You can check the slides, videos of previous talks and the schedule for upcoming talks here: https://sites.google.com/view/neurocognit-lang-viz-group/seminars
Zoom meeting link: https://Universityofexeter.zoom.us/j/93707609239?pwd=ErfOgIy30fwkAH7V5iFFVg…
(Meeting ID: 937 0760 9239 Password: 259613)
***Seminar 1: Thursday, 12 Dec 2024, 16:00 to 17:00, BST***
Speaker: Dr Vered Shwartz (University of British Columbia)
Title: Navigating Cultural Adaptation of LLMs: Knowledge, Context, and Consistency
Abstract: Despite their amazing success, large language models and vision and language models suffer from several limitations. This talk focuses on one of these limitations: the models’ narrow Western, North American, or even US-centric lens, as a result of training on web text and images primarily from US-based users. As a result, users from diverse cultures that are interacting with these tools may feel misunderstood and experience them as less useful. Worse still, when such models are used in applications that make decisions about people’s lives, lack of cultural awareness may lead to models perpetuating stereotypes and reinforcing societal inequalities. In this talk, I will present a line of work from our lab aimed at quantifying and mitigating this bias.
Speaker's short bio: Vered Shwartz is an Assistant Professor of Computer Science at the University of British Columbia, and a CIFAR AI Chair at the Vector Institute. Her research interests include commonsense reasoning, computational semantics and pragmatics, multimodal models, and cultural considerations in NLP. Previously, Vered was a postdoctoral researcher at the Allen Institute for AI (AI2) and the University of Washington, and received her PhD in Computer Science from Bar-Ilan University.
***Seminar 2: Thursday, 16 Jan 2025, 15:00 to 16:00, BST***
Speaker: Prof Roberto Navigli (Sapienza University of Rome)
Title: What's Behind Text? The Long, Challenging Path Towards a Unified Language-Independent Representation of Meaning
Abstract: In the era of Large Language Models (LLMs), the pursuit of a unified, language-independent representation of meaning remains both essential and complex. This talk revisits the rationale for advancing semantic understanding beyond the capabilities of LLMs and highlights the development of a large-scale multilingual inter-task resource like MOSAICo and the design of innovative methods that bridge word- and sentence-level meanings across languages. I will also explore how building a robust, multilingual framework for interpreting meaning with greater precision and depth enhances the quality and reliability of system outputs, including text generated by LLMs.
Speaker's short bio: Roberto Navigli is Professor of Natural Language Processing at the Sapienza University of Rome, where he leads the Sapienza NLP Group. He has received two ERC grants on lexical and sentence-level multilingual semantics, highlighted among the 15 projects through which the ERC transformed science. He received several prizes, including two Artificial Intelligence Journal prominent paper awards and several outstanding/best paper awards from ACL. He is the co-founder of Babelscape, a successful deep-tech company which enables NLU in dozens of languages. He served as Associate Editor of the Artificial Intelligence Journal (2013-2020) and Program Co-Chair of ACL-IJCNLP 2021. He is a Fellow of ACL, ELLIS and EurAI and currently serves as General Chair of ACL 2025.
Check past and upcoming seminars at the following url: https://sites.google.com/view/neurocognit-lang-viz-group/seminars.
If you want to follow future NLVP seminars, you are welcome to join our *Google group*: https://groups.google.com/g/neurocognition-language-and-vision-processing-g…
Best wishes,
Hang Dong (https://computerscience.exeter.ac.uk/staff/hd524)
on behalf of the NLVP group (https://sites.google.com/view/neurocognit-lang-viz-group/members)
*Apologies for cross-postings*
eRST – enhanced Rhetorical Structure Theory
We are delighted to introduce a new parsing framework and datasets for discourse relation recognition: eRST is an enhanced version of Rhetorical Structure Theory which allows multiple, concurrent and non-projective discourse relations in a formally constrained graph, aligned to a large inventory of discourse relation signals, based on the Signaling Corpus taxonomy. Signals are divided into 9 classes and 45 sub-classes, including traditional discourse markers such as PDTB-style connectives, but also lexical, syntactic and semantic signals, such as repetition, lexical chains and anaphoric relations.
eRST is described in depth in this paper:
Zeldes, Amir, Aoyama, Tatsuya, Liu, Yang Janet, Peng, Siyao, Das, Debopam and Gessler, Luke (2024) "eRST: A Signaled Graph Theory of Discourse Relations and Organization". Computational Linguistics, 1–47. https://direct.mit.edu/coli/article/doi/10.1162/coli_a_00538/124464/eRST-A-…
You can also find an overview at the following website, as well as analyses for nearly 250K words of English in 24 spoken and written text types, from the freely available UD English GUM and GENTLE corpora:
https://gucorpling.org/erst/
If you want to learn more and are participating in EMNLP in Miami this week, please check out our talk on Wednesday! And if you are interested in shallow discourse parsing, please check out our paper on Tuesday and the aligned PDTB3-style relations for the same data in this paper: https://aclanthology.org/2024.emnlp-main.684/
*<Lexicom/>*
a workshop in digital lexicography and lexical computing
*Registration open*
*Bari, Italy*, 15–19 September 2025
Your 5 days to get up-to-date with the latest developments in
*corpus-driven lexicography* and to practice your
*corpus building and corpus query skills* with some of the top experts in
the field.
For the programme, lecturers, invited speakers, fees and registration,
visit this website
*lexicom.courses <https://lexicom.courses/upcoming-lexicom/>*
I hope to meet you in Bari in September!
Ondřej
*Ondřej Matuška*
sketchengine.eu <http://www.sketchengine.eu/> | Facebook
<https://www.facebook.com/SketchEngine/> | LinkedIn
<https://www.linkedin.com/in/ondrejmatuska> | Twitter
<https://twitter.com/SketchEngine>