Hello all, here is our Call for Papers: VarDial 2025 - Twelfth Workshop on NLP for Similar Languages, Varieties and Dialects
VarDial 2025: https://sites.google.com/view/vardial-2025/home
Co-located with COLING 2025, VarDial deals with computational methods and language resources for closely related languages, language varieties, and dialects.
We welcome papers dealing with one or more of the following topics:
- Corpora, resources, and tools for similar languages, varieties and dialects;
- Adaptation of tools (taggers, parsers) for similar languages, varieties and dialects;
- Evaluation of language resources and tools when applied to language varieties;
- Reusability of language resources in NLP applications (e.g., for machine translation, POS tagging, syntactic parsing, etc.);
- Corpus-driven studies in dialectology and language variation;
- Computational approaches to mutual intelligibility between dialects and similar languages;
- Automatic identification of lexical variation;
- Automatic classification of language varieties;
- Text similarity and adaptation between language varieties;
- Linguistic issues in the adaptation of language resources and tools (e.g., semantic discrepancies, lexical gaps, false friends);
- Machine translation between closely related languages, language varieties and dialects.
In addition to the topics listed above, we also welcome papers dealing with diachronic language variation (e.g. phylogenetic methods, historical dialects).
Timeline
Publication of call for papers: Tuesday, August 6th, 2024
Paper submission deadline: Tuesday, November 5th, 2024
Notification of acceptance: Monday, November 25th, 2024
Commitment deadline for pre-reviewed papers: TBD
Camera-ready papers due: Friday, December 13th, 2024
Workshop date: Sunday, January 19th, 2025
Submission information
We invite submissions of up to 8 pages of content, plus up to one page for ethical considerations and/or limitations, plus unlimited pages of references. We also welcome shorter contributions, but we do not make an explicit distinction between long and short papers. For shared task system description papers, we recommend a length of 4-5 pages of content.
Detailed submission guidelines are available on the COLING 2025 website. All submissions must use the official COLING templates. Contributions must be submitted to Softconf: https://softconf.com/coling2025/VarDial25/
It will also be possible to submit rejected COLING main conference papers to VarDial 2025. Instructions on committing such papers to VarDial will be provided here at a later date.
Organizers
Yves Scherrer - University of Oslo (Norway)
Tommi Jauhiainen - University of Helsinki (Finland)
Nikola Ljubešić - Jožef Stefan Institute (Slovenia) and University of Ljubljana (Slovenia)
Preslav Nakov - Mohamed bin Zayed University of Artificial Intelligence (UAE)
Jörg Tiedemann - University of Helsinki (Finland)
Marcos Zampieri - George Mason University (USA)
Cheers,
Tommi
—
Tommi Jauhiainen (PhD, Language Technology)
Projektisuunnittelija / Project Planning Officer
FIN-CLARIN & Kielipankki – The Language Bank of Finland
Digitaalisten ihmistieteiden osasto / Department of Digital Humanities
Helsingin yliopisto / University of Helsinki
https://www.kielipankki.fi
Member
Centre of Excellence in Ancient Near Eastern Empires, University of Helsinki
(apologies for cross-posting)
Dear colleagues,
We are delighted to announce the first CfP for The 4th Workshop on
Arabic Corpus Linguistics (WACL-4) @ COLING 2025.
The 4th Workshop on Arabic Corpus Linguistics (WACL-4)
20 January 2025, in conjunction with COLING 2025 in Abu Dhabi (UAE).
URL: https://wp.lancs.ac.uk/wacl4/ [1]
Description
The 4th Workshop on Arabic Corpus Linguistics (WACL-4) will serve as
an outstanding international platform for exchanging knowledge and
outcomes in the theory, methodology, and applications of language
technologies tailored to the diverse range of Arabic dialects. We
encourage submissions that span a spectrum from theoretical
investigations to practical applications, aiming to address the unique
challenges, solutions, and insights that Arabic dialects introduce to
the field of NLP.
Topics of interest include, but are not limited to, the following:
* Development and Utilisation of Arabic Dialectal Corpora
* Advancements in Natural Language Processing Techniques for Arabic
Dialects
* Applications and Challenges of Large Language Models in
Understanding and Generating Arabic Dialects
* Morphological and Syntactical Challenges in Arabic Dialects
* Dialect Identification and Classification
* Speech Recognition and Synthesis for Arabic Dialects
* Machine Translation involving Arabic Dialects
* Sentiment Analysis and Opinion Mining in Arabic Dialects
* Named Entity Recognition and Information Extraction for Arabic
Dialects
* Development of Open Access Resources for Arabic Dialects
* Text Processing and Transliteration Challenges for Arabic Dialects
* Cultural and Sociolinguistic Considerations in NLP Applications for
Arabic Dialects
* Resources and Tools for Computational Analysis of Arabic Dialects
* Applications of Arabic Dialects NLP in Real-World Scenarios
Paper Submission
Submissions should adhere to the COLING 2025 standards. Authors are
strongly encouraged to review and follow the COLING 2025 submission
guidelines and author kit, available at https://coling2025.org/ [2].
Submission link: https://softconf.com/coling2025/WACL-4/ [3]
If authors are describing dialectal variations, we request that they
include relevant linguistic details and sociolinguistic contexts to
enrich the understanding of the presented work.
Submissions may be of two types:
* Long papers - up to eight (8) pages including references, presenting
substantial, original, completed, and unpublished work.
* Short papers - up to four (4) pages including references, describing
a small focused contribution, negative results, or system
demonstrations, etc.
Important dates
* Call for Papers Announcement: 07 August 2024
* Abstract Submission Deadline: 03 October 2024
* Paper Submission Deadline: 10 October 2024
* Notification of Paper Acceptance: 01 November 2024
* Camera-ready Paper Deadline: 15 November 2024
* Workshop Date: 20 January 2025
Organising Committee
* Saad Ezzini, Lancaster University, UK (General Chair)
* Hamza Alami, Sidi Mohamed Ben Abdellah University, Morocco
(Programme Co-Chair)
* Ismail Berrada, Mohammed VI Polytechnic University, Morocco
(Programme Co-Chair)
* Abdessamad Benlahbib, Sidi Mohamed Ben Abdellah University, Morocco
(Programme Co-Chair)
* Abdelkader El Mahdaouy, Mohammed VI Polytechnic University, Morocco
(Review Chair)
* Salima Lamsiyah, University of Luxembourg, Luxembourg (Publication
Chair)
* Nouran Khallaf, Leeds University, UK (Publicity Co-Chair)
* Hatim Derrouz, Ibn Tofail University, Morocco (Publicity Co-Chair)
* Amal Haddad, University of Granada, Spain (Publicity Co-Chair)
* Mustafa Jarrar, Birzeit University, Palestine (Advisory Committee)
* Mo El-Haj, Lancaster University, UK (Advisory Committee)
* Ruslan Mitkov, Lancaster University, UK (Advisory Committee)
* Paul Rayson, Lancaster University, UK (Advisory Committee)
Anti-Harassment policy
The workshop supports the COLING anti-harassment policy
https://coling2022.org/policy [4]
Links:
------
[1] https://wp.lancs.ac.uk/wacl4/
[2] https://coling2025.org/
[3] https://softconf.com/coling2025/WACL-4/
[4] https://coling2022.org/policy
Dear Community,
I am writing to share an exciting funding opportunity for projects focused
on creating or strengthening Natural Language datasets in English, French,
Portuguese, or Spanish.
In collaboration with Lacuna Fund, the National Center for Artificial
Intelligence <https://www.cenia.cl/> (CENIA) has opened a call for
proposals focused on Latin America and Africa. I kindly ask for your
assistance in spreading the word about this initiative.
This is a significant opportunity to contribute to the development of
language technologies that are inclusive and representative of diverse
linguistic communities. By supporting the creation of high-quality
datasets, we can ensure that innovations in AI and Natural Language
Processing are accessible and beneficial to a broader range of people,
particularly in regions that are often underrepresented in global research
efforts.
Lacuna Fund is an initiative that funds the creation of high-quality open
data to close critical knowledge gaps. Its goal is to empower institutions,
organizations, and academic groups by providing resources that enable the
development of innovative and sustainable solutions.
I invite you to learn more about the call for proposals here:
https://lacunafund.org/apply/.
Cheers,
Felipe
Dear fellow researchers,
As part of a bachelor's thesis on Text2SQL models, we are collecting
questions about the IR community.
In order to get realistic and diverse questions, we’d like to ask for
your help: What questions do you have about the IR research community
and the publications in the field?
All questions are welcome. Simply submit them via our form; no
additional information is required.
https://forms.gle/aTKXA2PFqWVK96rYA
Thank you in advance!
Tim Gollub
https://webis.de
My friends, it is time to look to the future. All the way to *August 15,
2024*. On that day you will happily submit one or more submissions to
the Workshop on the Future of Event Detection
<https://future-of-event-detection.github.io/> (FuturED). Your work will
help set the agenda of this long-standing subfield of NLP, CV, HCI and
other main areas of AI for years to come. The future can be yours!
Look forward to seeing you on Softconf
<https://softconf.com/emnlp2024/FuturED/> in 11 days and the workshop
itself on November 15 at EMNLP <https://2024.emnlp.org/>.
Yours truly, for the Future.
[With apologies for cross-posting]
-----------------------------
NEW DEADLINE: August 22nd, 2024
-----------------------------
We are excited to announce the 22nd International Workshop on Treebanks and Linguistic Theories (TLT 2024), which will bring together developers and users of linguistically annotated natural language corpora. The workshop is endorsed by ACL SIGPARSE and will be hosted by Universität Hamburg in Germany on December 5th-6th, 2024.
-----------------------------
VENUE
-----------------------------
TLT 2024 will take place at the guest house of Universität Hamburg. In order to support rich discussions and networking, TLT 2024 will primarily be an in-person event; we will, however, accommodate a limited number of live / synchronous remote presentations, prioritizing those with circumstances that prevent travel.
Universität Hamburg and its guest house are conveniently located near the Dammtor train station and the Stephansplatz metro station, which are well connected with many parts of the city and beyond, providing an easy commute for attendees.
Hamburg is a vibrant city known for its rich maritime history as one of the leading cities in the medieval Hanseatic League, as well as its modern cultural diversity, including events at the world-famous Elbphilharmonie Concert Hall. The city is easily accessible by train or plane, either via Hamburg Airport (HAM) or via Bremen Airport (BRE) and Hannover Airport (HAJ), each about a 1 to 1.5 hour train ride away.
-----------------------------
SUBMISSION INFORMATION
-----------------------------
TLT addresses all aspects of treebank design, development, and use. As ‘treebanks’ we consider any pairing of natural language data (spoken, signed, or written) with annotations of linguistic structure at various levels of analysis, including, e.g., morpho-phonology, syntax, semantics, and discourse. Annotations can take any form (including trees or general graphs), but they should be encoded in a way that enables computational processing. Reflections on the design of linguistic annotations, methodology studies, resource announcements or updates, annotation or conversion tool development, or reports on treebank usage including probing the leakage of treebanks into large language models are but some examples of the types of papers we anticipate for TLT.
Papers should describe original work; they should emphasize completed work rather than intended work, and should indicate clearly the state of completion of the reported results. Submissions will be judged on correctness, originality, technical strength, significance and relevance to the conference, and interest to the attendees.
We invite paper submissions in two distinct tracks:
* regular papers on substantial and original research, including empirical evaluation results, where appropriate;
* short papers on smaller, focused contributions, work in progress, negative results, surveys, or opinion pieces.
Submissions (in both tracks) may either be archival—in case of unpublished work—or non-archival, based on the wish of the authors. All archival papers accepted for presentation at the workshop will be included in the TLT 2024 proceedings volume, which will be part of the ACL Anthology. Non-archival papers must have been published or accepted for publication at another CL conference.
Long papers may consist of up to 8 pages of content (excluding references and appendices). Short papers may consist of up to 4 pages of content (excluding references and appendices). Accepted papers will be given an additional page to address reviewer comments.
All submissions should follow the two-column format and the ACL style guidelines. We strongly recommend the use of the LaTeX style files, OpenDocument, or Microsoft Word templates created for ACL: https://github.com/acl-org/acl-style-files
Submissions will be reviewed double-blind, and all full and short papers must be anonymous, i.e., they must not reveal the author(s) on the title page or through self-references. For example, “We previously showed (Smith, 2020) …” should be avoided; instead, use citations such as “Smith (2020) previously showed …”. Papers must be submitted digitally, in PDF, and uploaded through the on-line conference system (link forthcoming).
Submissions that violate these requirements will be rejected without review.
-----------------------------
IMPORTANT DATES
-----------------------------
* Long and short paper submission deadline: August 22nd, 2024
* Reviews Due: September 26th, 2024
* Notification of acceptance: October 6th, 2024
* Final version of papers due: November 6th, 2024
* TLT2024: December 5th-6th, 2024 in Hamburg
-----------------------------
TLT2024 WORKSHOP CHAIRS
-----------------------------
Daniel Dakota, Indiana University
Sandra Kübler, Indiana University
Heike Zinsmeister, Universität Hamburg
-----------------------------
TLT2024 COMMUNICATION CHAIR
-----------------------------
Sarah Jablotschkin, Universität Hamburg
Contact: tlt2024.gw(a)uni-hamburg.de
Website: https://www.korpuslab.uni-hamburg.de/en/tlt2024.html
In this newsletter:
Fall 2024 LDC Data Scholarship program
New publications:
LORELEI Uyghur Incident Language Pack<https://catalog.ldc.upenn.edu/LDC2024T07>
Ravnursson Faroese Speech and Transcripts<https://catalog.ldc.upenn.edu/LDC2024S09>
________________________________
Fall 2024 LDC Data Scholarship program
Student applications for the Fall 2024 LDC Data Scholarship program are being accepted now through September 15, 2024. This program provides eligible students with no-cost access to LDC data. Students must complete an application consisting of a data use proposal and letter of support from their advisor. For application requirements and program rules, visit the LDC Data Scholarships page<https://www.ldc.upenn.edu/language-resources/data/data-scholarships>.
________________________________
New publications:
LORELEI Uyghur Incident Language Pack<https://catalog.ldc.upenn.edu/LDC2024T07> was developed by LDC and is comprised of 28 million words of Uyghur monolingual text, 500,000 words of English monolingual text, 3.3 million words of parallel and comparable Uyghur-English text, and 200,000 words annotated for simple named entities and situation frames. It constitutes all of the text data, annotations, supplemental resources, and related software tools for the Uyghur language that were used in the DARPA LORELEI / LoReHLT 2016 Evaluation<https://www.nist.gov/itl/iad/mig/lorehlt-evaluations>.
The LORELEI (Low Resource Languages for Emergent Incidents) program was concerned with building human language technology for low resource languages in the context of emergent situations. In the evaluation scenario, an unforeseen event triggered a need for humanitarian and logistical support in a region where the incident language had received little or no attention in NLP research. Evaluation participants provided NLP solutions, including information extraction and machine translation, with limited resources and limited development time.
Data was collected from news, social network, weblog, newsgroup, discussion forum, and reference material. Named entity annotation identified entities to be detected by systems for scoring purposes. Situation frame analysis was designed to extract basic information about needs and relevant issues for planning a disaster response effort.
2024 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
Ravnursson Faroese Speech and Transcripts<https://catalog.ldc.upenn.edu/LDC2024S09> contains 109 hours of Faroese prompted speech from 433 speakers (249 female, 184 male), corresponding transcripts and speaker metadata. It is an extract from the Basic Language Resource Kit 1.0 (BLARK 1.0)<https://mtd.setur.fo/en/resource/ravnur-blark-1-0/> developed by the Faroe Islands' Ravnur Project<https://mtd.setur.fo/en/>.
Speech data was collected in 2022. Speakers from all major dialect areas in the Faroe Islands in three age groups -- 15-35, 36-60, and 61+ years -- read texts that included a word list, a phrase list, closed vocabulary readings, and short texts. Recordings also contain spontaneous speech. Orthographic transcripts are included.
2024 members can access this corpus through their LDC accounts provided they have submitted a completed copy of the special license agreement. Non-members may license this data at no cost.
To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
(Apologies for cross-posting)
NEW DEADLINE: August 29, 2024
****** Second Call for Papers ******
SICon 2024: 2nd Workshop on Social Influence in Conversations
Website: https://sites.google.com/view/sicon2024/home
Twitter/X: @SIConWorkshop
Paper Submission via Openreview: https://openreview.net/group?id=EMNLP/2024/Workshop/SiCon
Contact: sicon-chairs [at] googlegroups.com
Venue: Co-located with EMNLP 2024; November 16, 2024; Miami, Florida
*** Workshop description ***
Social influence (SI) is the change in an individual's thoughts, feelings, attitudes, or behaviors from interacting with another individual or a group. For example, a buyer uses SI skills to negotiate trade-offs and build rapport with the seller. SI is ubiquitous in everyday life, and hence, realistic human-machine conversations must reflect these dynamics, making it essential to model and understand SI in dialogue research systematically. This would improve SI systems' ability to understand users’ utterances, tailor communication strategies, personalize responses, and actively lead conversations. These challenges draw on perspectives not only from NLP and AI research but also from Game Theory, Affective Computing, Communication, and Social Psychology.
SICon 2024 will be the second edition of a venue that uniquely fosters a dedicated discussion on social influence within NLP while involving researchers from other disciplines such as affective computing and the social sciences. SICon 2024 features keynote talks, panel discussions, poster sessions, and lightning talks for accepted papers. We encourage researchers of all stages and backgrounds to share their exciting work!
SICon will promote discussion around several key questions:
* How should social influence systems model users and plan optimal responses systematically?
* How can social influence systems benefit from linguistic theories (e.g., successful persuasion or negotiation tactics) developed in the social sciences?
* What structurally differentiates and unites various social influence tasks?
* What are the ethical issues involved with AI that engages in social influence, and what guardrails must be implemented before using these systems in the wild?
*** Submission Guidelines ***
SICon welcomes two types of papers: regular workshop submissions and shared task submissions.
- Regular workshop submissions
Regular workshop submissions are archival short (4 pages) and long (8 pages) papers. There is also a non-archival track for extended abstracts (2 pages) covering ongoing work on social influence NLP. Topics include but are not limited to:
* Analysis-focused contributions (e.g., associations between linguistic behaviors or user attributes with SI task outcomes);
* System design contributions (e.g., dialogue systems for SI tasks such as strategic games, emotional support, etc.; SI systems effectively harnessing the capabilities of LLMs);
* Contributions advancing relevant subgoals in SI tasks (e.g., detecting SI strategies in text, partner/opponent modeling and emotion recognition in SI interactions);
* SI systems benefiting from linguistic theories (e.g., successful persuasion, negotiation tactics) developed in social science;
* Ethical issues and guardrails involved with AI that engages in SI;
* Unintentional aspects of SI for any human-facing NLP system;
* Datasets capturing forms of SI;
* Opinion or position papers on SI.
Submissions should follow the official EMNLP 2024 style guidelines and be submitted through OpenReview: https://openreview.net/group?id=EMNLP/2024/Workshop/SiCon
- Shared task submissions
The GenSICon shared task aims to investigate the generalization capability of a particular model across different social influence tasks or scenarios. Further details and a submission link can be found here: https://sites.google.com/view/sicon2024/shared-task
We have prizes for the winners!
*** Important Dates ***
Direct paper submission deadline: August 29, 2024
Shared task paper submission deadline: September 27, 2024
Notification of acceptance: October 2, 2024
Camera-ready paper due: October 10, 2024
Workshop: November 16, 2024
(All submission deadlines are 11:59 p.m. UTC-12:00 ‘anywhere on Earth’)
*** Contact ***
For any questions, please contact the organizers at: sicon-chairs [at] googlegroups.com
Consider joining our Slack community! It is a dedicated space to connect researchers working on various topics related to social influence in NLP and beyond, as well as a useful communication channel for live announcements during the workshop.
https://join.slack.com/t/acl2023sicon/shared_invite/zt-1y7cv1c2v-Lm21Sm6KX3…
*** Organizers ***
Muskan Garg (Mayo Clinic)
Kushal Chawla (Capital One)
Weiyan Shi (Stanford NLP)
Ritam Dutt (Carnegie Mellon University)
Deuksin Kwon (University of Southern California)
James Hale (University of Southern California)
Liang Qiu (Amazon)
Aina Garí Soler (Télécom-Paris)
Alexandros Papangelis (Amazon Alexa AI)
Gale Lucas (University of Southern California)
Zhou Yu (Columbia University)
Daniel Hershcovich (University of Copenhagen)
We are pleased to announce the release of a new annotated corpus, consisting of selected sections (i.e., Abstract, Methods and Results) of scientific research articles concerning occupational exposures to two different types of substance, i.e., diesel exhaust (51 articles) and respirable crystalline silica (50 articles). The article sections have been annotated by experts in the field with 6 categories of named entities relevant to the assessment of occupational substance exposures, particularly in the context of Job Exposure Matrices.
The corpus and associated annotation guidelines may be downloaded from: https://zenodo.org/records/11164271
NER models and associated code are available at: https://github.com/panagiotis-geo/Substance_Exposure_NER/
The development of the corpus and the associated NER models are described in more detail in the following article:
Thompson, P., Ananiadou, S., Basinas, I., Brinchmann, B. C., Cramer, C., Galea, K. S., Ge, C., Georgiadis, P., Kirkeleit, J., Kuijpers, E., Nguyen, N., Nuñez, R., Schlünssen, V., Stokholm, Z. A., Taher, E. A., Tinnerberg, H., Van Tongeren, M. and Xie, Q. (2024). Supporting the working life exposome: annotating occupational exposure for enhanced literature search. PLoS ONE 19(8): e0307844. https://doi.org/10.1371/journal.pone.0307844
Abstract
—————
An individual’s likelihood of developing non-communicable diseases is often influenced by the types, intensities and duration of exposures at work. Job exposure matrices provide exposure estimates associated with different occupations. However, due to their time-consuming expert curation process, job exposure matrices currently cover only a subset of possible workplace exposures and may not be regularly updated. Scientific literature articles describing exposure studies provide important supporting evidence for developing and updating job exposure matrices, since they report on exposures in a variety of occupational scenarios. However, the constant growth of scientific literature is increasing the challenges of efficiently identifying relevant articles and important content within them. Natural language processing methods emulate the human process of reading and understanding texts, but in a fraction of the time. Such methods can increase the efficiency of both finding relevant documents and pinpointing specific information within them, which could streamline the process of developing and updating job exposure matrices. Named entity recognition is a fundamental natural language processing method for language understanding, which automatically identifies mentions of domain-specific concepts (named entities) in documents, e.g., exposures, occupations and job tasks. State-of-the-art machine learning models typically use evidence from an annotated corpus, i.e., a set of documents in which named entities are manually marked up (annotated) by experts, to learn how to detect named entities automatically in new documents. We have developed a novel annotated corpus of scientific articles to support machine learning based named entity recognition relevant to occupational substance exposures. 
Through incremental refinements to the annotation process, we demonstrate that expert annotators can attain high levels of agreement, and that the corpus can be used to train high-performance named entity recognition models. The corpus thus constitutes an important foundation for the wider development of natural language processing tools to support the study of occupational exposures.
--
Paul Thompson
Research Fellow
Department of Computer Science
National Centre for Text Mining
Manchester Institute of Biotechnology
University of Manchester
131 Princess Street
Manchester
M1 7DN
UK
Tel: 0161 306 3091
http://personalpages.manchester.ac.uk/staff/Paul.Thompson/
Dear colleague,
The 34th Meeting of Computational Linguistics in The Netherlands (CLIN34) will take place soon, on Friday 30 August 2024! We cordially invite you to participate. Online registration<https://clin34.leidenuniv.nl/2024/07/05/registration-open/> ends on Wednesday (21st of August).
Besides a large and diverse programme of posters and oral presentations, we are happy to report that CLIN34 will have two keynote talks by:
* Diana Maynard, Sheffield University
* Dominique Blok and Erik de Graaf, TNO
The programme can also be found at: clin34.leidenuniv.nl/program/<https://clin34.leidenuniv.nl/program/>
We hope to see you in Leiden about two weeks from now!
The CLIN34 organizers
Leiden University