Dear all,
Registration is now open for the shared tasks at OSACT7, the 7th Workshop on Open-Source Arabic Corpora and Processing Tools, co-located with LREC 2026 in Palma, Mallorca, Spain (11–16 May 2026).
Participating teams will have the opportunity to submit a system paper. Accepted papers, following peer review, will be published in the OSACT 2026 workshop proceedings on the ACL Anthology.
Shared Tasks
1- QIAS 2026: Questions & Answers in Islamic Studies Assessment
Website: https://sites.google.com/view/qias2026/
2- AdabEval 2026: Arabic Politeness Detection
Website: https://sites.google.com/view/adabeval2026/home
3- AraSentEval 2026: A Shared Task on Sentiment Analysis and Swapping in Arabic
Website: https://ezzini.github.io/AraSentEval/
4- AraHAHA 2026: Arabic Humour Generation
Website: https://sites.google.com/view/arhaha2026/home
5- KSAA 2026: Arabic Speech Dictation with Automatic Diacritisation
Website: https://arai.ksaa.gov.sa/sharedTask2026/
We warmly invite researchers and practitioners to participate. Full details, including datasets, evaluation protocols, and timelines, are available on the respective shared task websites.
For general information about OSACT7 and workshop participation, please visit the official workshop page: https://osact-lrec.github.io/
Best regards,
OSACT7 Organizing Committee
The 7th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT7)
Palma, Mallorca, Spain, 11 May 2026
Co-located with LREC 2026.
OSACT7 invites submissions on open-source Arabic language resources and processing tools. The workshop brings together researchers, practitioners, and students working in computational linguistics, NLP, and IR, with a strong emphasis on accessibility, reproducibility, and support for underrepresented Arabic varieties.
The workshop will feature five shared tasks, alongside regular paper submissions, addressing timely challenges in Arabic NLP. We particularly welcome work on Arabic corpora, language models, and processing technologies, including Large Language Models and Generative AI, dialectal Arabic, and practical tools for real-world applications.
Topics include:
* Arabic language resources and annotated corpora
* Pre-trained & fine-tuned Arabic language models
* Dialect identification and translation
* Sentiment analysis, text classification, and fake news detection
* Core Arabic NLP tasks and processing tools
* Crowdsourcing, annotation, & resource creation
Submission types (excluding references):
* Long papers (up to 8 pages)
* Short papers (up to 4 pages)
* Shared task system papers (up to 4 pages)
Important dates:
* Submission deadline: 18 February 2026
* Notification: 12 March 2026
* Camera-ready: 30 March 2026
* Workshop: 11 May 2026
Submission link: https://softconf.com/lrec2026/OSACT7/
Best wishes,
OSACT7 Organizing Committee:
-Hend Al-Khalifa
-Mo El-Haj
-Saad Ezzini
————
Dr Mo El-Haj
Director of NLP @ VinUniversity
Reader (Associate Professor) in NLP
CECS, VinUniversity, Vietnam
SCC, Lancaster University, UK
https://elhaj.uk
https://arabicnlp.uk
https://vinnlp.com
The ELLIS Institute Finland invites applications for Principal Investigator (PI) positions.
Deadline: January 12, 2026
https://www.ellisinstitute.fi/PI-recruit-2026
Part of this call is a position at the University of Helsinki on
Reliable Communicative AI, connected to Helsinki-NLP.
Research topics of interest include:
* Development of reliable interactive AI and AI-mediated communication
* Robust generative AI across domains and languages
* Efficiency and trustworthiness
* Explainable AI and ethical/societal aspects of AI development
More details about this position are available from
https://www.ellisinstitute.fi/pi-positions-2026#12-university-of-helsinki--…
*****************************************************************
Jörg Tiedemann
Language Technology https://blogs.helsinki.fi/language-technology/
University of Helsinki
We apologize for the inadvertent multiple submissions
Are you motivated to conduct impactful research? Do you want to collaborate
with a diverse group of scientists to advance wildlife conservation
practices? If so, we invite you to apply to the “WildBotics - Autonomous
Sampling with Robotics in the Wild for Nature Conservation” MSCA Doctoral
Network (WildBotics - Recruitment <https://www.wildbotics.eu/recruitment>).
This innovative program brings together leading European institutions and
industry partners to develop autonomous robotic solutions for nature
conservation. In particular, you will have the opportunity to work closely
with an Italian research group renowned for its long tradition in audio and
speech analysis (https://speechtek.fbk.eu/), applying these methods to
ecological monitoring and biodiversity protection.
Aim: The research explores AI-driven analysis of soundscapes to detect
species and bioevents, focusing on ultrasonic recordings that are critical
for understanding biodiversity and ecosystem health. These sounds, collected
by microphones and specialised devices, are crucial for identifying animal
species and understanding ecosystem dynamics. By automating the analysis of
large volumes of audio data, AI enables faster and more efficient tracking
of animal populations and ecosystem changes. This approach offers essential
tools for conservation, aiding the preservation of species and habitats,
particularly those affected by human activity or climate change. The
analytics will also support the study of natural soundscape loss, a
reflection of declining biodiversity and disrupted ecosystems, and will be
based on data filtering, feature extraction, and segmentation with CNN,
RNN, and Transformer models. The AI models will be deployed for offline
(post-flight) or near-real-time data processing, exploiting on-board GPU
capabilities, for field work activities (DC11).
Host institution: Fondazione Bruno Kessler (FBK), Italy
PhD enrolment: University of Salamanca (USAL), Spain
Principal / Academic supervisor: A. Brutti (FBK) / D. Gonzalez-Aguilera
(USAL)
The 12 available PhD topics for the Doctoral Candidates are summarized here:
<https://www.wildbotics.eu/network/topics>.
The PhD call is centralized by FBK, the project coordinator, and more
information is available here:
<https://jobs.fbk.eu/Annunci/Offerte_di_lavoro_12_Doctoral_Candidates_to_joi…>.
Once you are ready to apply, please submit your candidature here:
<https://script.google.com/macros/s/AKfycbwMXmh7dVoIHtRXJqwuXaetmmMw16pc09fH…>.
The candidature phase will close on January 7th, 23:59 CET.
------------------------------------------------------------------------------------------------
Giuseppe Daniele Falavigna
Fondazione Bruno Kessler
Via Sommarive 18 - 38123 Povo - Trento, Italy
mail:falavi@fbk.eu - tel:+39(0)461314562 - fax:+39(0)461314591
HomePage: https://speechtek.fbk.eu/people/profile/falavi
-------------------------------------------------------------------------------------------------
CFP: EvaLatin 2026 - The Fourth Evaluation Campaign of NLP tools for Latin
* Website: https://circse.github.io/LT4HALA/2026/EvaLatin
* Date: Monday, May 11, 2026
* Place: co-located with LREC 2026, May 11-16, Palma, Mallorca (Spain)
* Submission runs: check task guidelines
* Submission technical reports: https://softconf.com/lrec2026/LT4HALA2026/
* DESCRIPTION
EvaLatin 2026 is the fourth evaluation campaign of NLP tools for Latin. The campaign aims to promote the development of resources and language technologies for the Latin language, to foster collaboration among scholars working on Latin, and to attract researchers from different disciplines.
The EvaLatin 2026 edition focuses on two tasks:
* Dependency Parsing;
* Named Entity Recognition (NER).
The dependency parsing task is based on the Universal Dependencies (UD) framework (https://universaldependencies.org/). No specific training set is released, but participants are free to use any kind of data or resource they consider useful for the task, including the Latin treebanks already available in the UD collection. In this regard, one of the challenges of this task is to determine which treebank (or combination of treebanks) is best suited to the new test data.
Test data will be distributed in the CoNLL-U format with gold tokenization, lemmatization, part-of-speech tagging and morphological annotation.
For more details, see the guidelines in the relevant section of the EvaLatin webpage (https://circse.github.io/LT4HALA/2026/EvaLatin).
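To make the CoNLL-U format concrete, here is a minimal Python sketch that reads the basic-tree attachments (the HEAD and DEPREL columns) from a CoNLL-U string and computes unlabeled and labeled attachment scores (UAS/LAS) under gold tokenization. The choice of UAS/LAS and the function names `read_conllu` and `attachment_scores` are assumptions for illustration; the official EvaLatin metrics and scorer are specified in the task guidelines.

```python
from typing import List, Tuple

# (HEAD, DEPREL) for one syntactic word in the basic dependency tree.
Attachment = Tuple[str, str]


def read_conllu(text: str) -> List[List[Attachment]]:
    """Return, per sentence, the (HEAD, DEPREL) pair of every basic token.

    CoNLL-U lines have 10 tab-separated columns:
    ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC.
    Lines starting with '#' are comments and a blank line ends a sentence.
    Multiword-token ranges (ID such as '3-4') and empty nodes (ID such as
    '5.1') have no basic-tree attachment, so they are skipped.
    """
    sentences: List[List[Attachment]] = []
    current: List[Attachment] = []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue
        current.append((cols[6], cols[7]))
    if current:  # the file may lack a final blank line
        sentences.append(current)
    return sentences


def attachment_scores(gold: str, pred: str) -> Tuple[float, float]:
    """Return (UAS, LAS), pairing tokens by position under gold tokenization."""
    gold_sents, pred_sents = read_conllu(gold), read_conllu(pred)
    if len(gold_sents) != len(pred_sents):
        raise ValueError("gold and predicted files have different sentence counts")
    total = head_ok = label_ok = 0
    for g_sent, p_sent in zip(gold_sents, pred_sents):
        if len(g_sent) != len(p_sent):
            raise ValueError("token counts differ; tokenization must match the gold data")
        for (g_head, g_rel), (p_head, p_rel) in zip(g_sent, p_sent):
            total += 1
            if g_head == p_head:
                head_ok += 1  # unlabeled: correct head
                if g_rel == p_rel:
                    label_ok += 1  # labeled: correct head and full relation label
    if total == 0:
        raise ValueError("no tokens to score")
    return head_ok / total, label_ok / total
```

Because the test data comes with gold tokenization, the sketch can pair gold and predicted tokens by position. It compares the full DEPREL label; some evaluation scripts compare only the universal part before any `:` subtype.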
In the NER task, participants are required to develop systems that automatically identify and classify proper names in the provided Classical Latin texts. The goal is to detect the span of each mention and assign it to a predefined category. A small sample set (in plain-text HIPE IOB format) will be made available in advance, together with the guidelines.
Test data will be distributed in the same HIPE IOB format with the values for the NER predictions obscured.
For more details, see the guidelines in the relevant section of the EvaLatin webpage (https://circse.github.io/LT4HALA/2026/EvaLatin).
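The span-detection goal can be illustrated with a short Python sketch that turns a sequence of per-token IOB tags into (start, end, type) spans and scores predictions by exact span-and-type match. The labels used here (PERS, LOC) are hypothetical, the lenient handling of a stray `I-` tag is an assumption, and the actual HIPE column layout and official scoring are defined in the EvaLatin guidelines.

```python
from typing import List, Optional, Tuple

# (start token index, end token index exclusive, entity type)
Span = Tuple[int, int, str]


def iob_to_spans(tags: List[str]) -> List[Span]:
    """Convert per-token IOB tags such as 'B-PERS', 'I-PERS', 'O' into spans.

    Assumption: an 'I-X' tag that does not continue an open X entity is
    treated as the start of a new entity rather than rejected.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        continues = prefix == "I" and start is not None and etype == label
        if start is not None and not continues:
            spans.append((start, i, label))  # close the open entity
            start, label = None, None
        if prefix in ("B", "I") and not continues:
            start, label = i, etype  # open a new entity
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


def span_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Strict precision/recall/F1: a prediction counts only if span and type match."""
    gold_spans, pred_spans = set(iob_to_spans(gold)), set(iob_to_spans(pred))
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For example, the tags `B-LOC, O, B-PERS, I-PERS` yield the spans `(0, 1, "LOC")` and `(2, 4, "PERS")`.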
* SUBMISSIONS
Participants are required to submit their runs using specific email addresses (see the guidelines for each task) and to provide a technical report that should include a brief description of their approach, focusing on the adopted algorithms, models and resources, a summary of their experiments, and an analysis of the obtained results.
Technical reports will be included in the proceedings as short papers: the maximum length is 4 pages (excluding references), and reports should follow the official LREC 2026 format. Reports will receive a light review (we will check the correctness of the format, the accuracy of results and ranking, and the overall exposition). Reports should be submitted using the START submission page of the workshop (https://softconf.com/lrec2026/LT4HALA2026/). Reports of the shared tasks are not anonymous. All participants will have the opportunity to present their results at the workshop, as an oral or poster presentation.
Participants are allowed to use any approach (e.g. from traditional machine learning algorithms to Large Language Models) and any resource (annotated and non-annotated data, embeddings): all approaches and resources are expected to be described in the systems’ reports.
Technical reports should follow the LREC stylesheet, which is available on the Author's Kit page of the LREC 2026 website (https://lrec2026.info/authors-kit/).
* WORKSHOP IMPORTANT DATES
* 22 December 2025: guidelines available
* Evaluation Window I - Task: Dependency Parsing
  * 3 February 2026: test data available
  * 10 February 2026: system results due to organizers
* Evaluation Window II - Task: Named Entity Recognition
  * 12 February 2026: test data available
  * 19 February 2026: system results due to organizers
* 10 March 2026: reports due to organizers
* 20 March 2026: short report review deadline
* 27 March 2026: camera ready version of reports due to organizers (strict deadline)
Best Regards,
Federica Iurescia (on behalf of the organizing committee)
Federica Iurescia
Postdoctoral Researcher
LiLa: Linking Latin https://lila-erc.eu/#page-top
CIRCSE Research Centre https://centridiricerca.unicatt.it/circse_index.html
Facoltà di Scienze Linguistiche e Letterature Straniere
Franciscanum Building, 2nd Floor, room 219
Università Cattolica del Sacro Cuore
Largo Gemelli 1,
20123 Milan, Italy
The Department of Software Science at Tallinn University of Technology is seeking to fill 2 positions at associate professor level, but strong candidates at assistant and full professor level are also encouraged to contact us.
Our department is currently welcoming new people in the software engineering and data science fields.
Details of the positions, and of what it is like to work in Estonia, can be found here:
Professor of Data Science: <https://candidate.recrur.com/public/jobad/en/d99ed3c8-5>
Professor of Software Engineering: <https://candidate.recrur.com/public/jobad/en/fedc5c5d-1>
Our department has strong research groups in programming languages, compositional systems, natural language processing, cyber security, and applied AI, among others, and we lead many study programs at both graduate and undergraduate level.
--
Marko Kääramees
Head of department of Software Science
Tallinn University of Technology
marko.kaaramees(a)taltech.ee
(Apologies for crossposting)
Dear Colleague,
We are pleased to announce that CSEE&T 2026 (The 38th International
Conference on Software Engineering Education and Training) will be held at
the University of Florence, Italy, from July 20 to 22.
The CFP is posted at https://cseet26.techconf.org/download/CFP-CSEE&T-2026.pdf
Please refer to the conference website for the most recent updates.
If you have any questions related to paper submission to the main conference,
please send emails to the Program Chairs:
Professor Matthew Barr
Professor Lin Liu, or
Professor Rafal Wlodarski.
For the Academy for Software Engineering Education & Training (ASEE&T) Workshop,
please visit its website or contact Professor Nancy Mead and Professor Hossein Saiedian.
For other inquiries, please get in touch with the CSEE&T 2026 Secretariat.
We look forward to working with you for a successful conference.
CSEE&T Secretariat
Dear editor,
I am Bin Li, one of the organizers of EvaHan 2026. Could you please circulate this CFP on the Corpora list? Thank you very much!
--
Best wishes!
Bin Li
Phone: (86)13813878144
Homepage: http://cognitivebase.com/lib/
School of Chinese Language and Literature,
Nanjing Normal University,China
CFP | EvaHan2026 Ancient Chinese OCR Shared Tasks
EvaHan 2026
https://github.com/GoThereGit/EvaHan
EvaHan 2026 is the Fifth International Evaluation of Ancient Chinese Information Processing, focusing on OCR of ancient Chinese texts with multimodal large language models.
Co-organized with LT4HALA 2026@LREC 2026, which will be held from May 11 to 16, 2026, in Mallorca, Spain.
EvaHan 2026 is organized by Dongbo Wang, Bin Li, Minxuan Feng, Chao Xu, Weiguang Qu, Liu Liu, and Si Shen.
Previous Tasks:
EvaHan 2022
The First Bake-off of Ancient Chinese Automatic Processing was successfully held in Marseille, France, in 2022, with a focus on automatic word segmentation and part-of-speech tagging of ancient Chinese.
EvaHan 2023
The Second Bake-off of Ancient Chinese Automatic Processing was successfully held in Macau, China, in 2023, with a focus on machine translation of ancient Chinese.
EvaHan 2024
The Third Bake-off of Ancient Chinese Automatic Processing was held in Turin, Italy, in 2024, with a focus on automatic sentence segmentation and punctuation of ancient Chinese.
EvaHan 2025
The Fourth Bake-off of Ancient Chinese Automatic Processing was held in New Mexico, USA, in 2025, with a focus on named entity recognition in ancient Chinese.
Important Dates for EvaHan 2026:
Registration deadline: January 30, 2026
Training data release: January 1, 2026
Test data release: February 1, 2026
Submission of run results: February 6, 2026
Technical report submission deadline: February 28, 2026
Notification of acceptance: March 1, 2026
Camera-ready papers due: March 10, 2026
Participation
To participate in EvaHan 2026, you must complete the following steps:
Registration:
Submit a registration form to officially register your team for the task. Registration is open from December 1, 2025, to January 30, 2026. Only registered participants will gain access to the training dataset.
Accessing the Training Data:
After completing the registration process, participants will receive instructions for downloading the training dataset, which includes image-text pairs from ancient Chinese texts for OCR.
Submitting Results and Reports:
Participants must use the provided test data to generate results and submit their system outputs and a technical report as per the shared task schedule.
For inquiries or to request the registration form, please contact us at evahan2026(a)gmail.com.
Data
The EvaHan 2026 data comprises three datasets of image-text pairs: printed text images, mixed image-text layouts, and handwritten text images. The data underwent initial automatic annotation, followed by meticulous correction and refinement by experts in classical Chinese language and history to ensure the highest quality of the training materials and gold-standard texts.
● Dataset A (Printed Texts) consists of data selected from the Siku Quanshu (Complete Library of the Four Treasuries), covering classics, history, philosophy, and literature, as well as various other ancient books.
● Dataset B (Mixed Layouts) contains mixed image-text data selected from the Siku Quanshu and other ancient books.
● Dataset C (Handwritten Texts) includes handwritten ancient books, primarily from the Chinese Buddhist canon, drawing on the TKH and MTH Chinese Buddhist canon datasets.
Training Data: The training set consists of designated portions of subsets A, B, and C. All training samples are provided as image-text pairs, with text in Traditional Chinese (UTF-8) and approximately 5,000-10,000 pairs per subset. Registered participants will receive the training data via email.
Test Data: The test set includes the remaining unseen portions of subsets A, B, and C to ensure comprehensive evaluation of all three challenge types. The data is also provided as image-text pairs, with approximately 200-500 pairs per subset. Detailed information and a download link for the test data will be provided to participants before the start of the formal evaluation period.
Task
This section offers a detailed description of the tasks encompassed in EvaHan 2026.
OCR
In many Chinese language processing systems, OCR is a critical task, often performed in parallel with other processing functions. The accuracy and speed of OCR directly determine the overall system's performance and user experience in downstream applications such as document digitization, information extraction, and intelligent retrieval.
Evaluation
Metrics
Initially, each team will have access only to the training data; unlabeled test data will be released later, and the labels for the test data will be released after the evaluation is complete. Tables 2, 3, and 4 provide examples of the scorer output. The evaluation will first align the system-generated text with the gold standard; the OCR output will then be scored with precision, recall, and F1. BLEU, ROUGE-1, ROUGE-2, and ROUGE-L will also be computed, so that the evaluation covers multiple metrics. This edition also adds the layout analysis metrics mAP and IoU. The final ranking of teams will be based on the combined overall score.
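As a rough illustration of alignment-based scoring, the following Python sketch computes character-level precision, recall, and F1 from the longest common subsequence (LCS) between system and gold text, together with the intersection-over-union (IoU) of two axis-aligned bounding boxes. This is an assumed formulation, not the official EvaHan scorer; the function names are hypothetical, and BLEU, ROUGE, and mAP are not covered.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 <= x2, y1 <= y2


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (O(len(a) * len(b)))."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def char_prf(gold: str, pred: str) -> Tuple[float, float, float]:
    """Character-level precision, recall and F1 over LCS-aligned characters."""
    matched = lcs_length(gold, pred)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0
```

Because it works on characters rather than words, this formulation needs no word segmentation, which suits unpunctuated classical Chinese text.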
Two Modalities
Each participant can submit results in both modes. In the closed mode, resources are restricted: each team may use only the training data and the specified pre-trained model, a word embedding model pre-trained on a large Traditional Chinese corpus. No other resources are allowed in the closed mode.
In the open mode, there are no restrictions on resources, data, or models. Annotated external data, such as processed images or text, may be used. However, each team must disclose all resources, data, and models used in each system in the final report.
How to Participate
The registration period is given above. Participants will be required to submit their runs and to provide a technical report for the task they participated in.
Submitting Runs
Each team can submit two runs. The first run must be produced according to the closed modality, and the second according to the open modality. The closed run is compulsory, while the open run is optional.
Once the system has produced the results for the task over the test set, participants have to follow these instructions to complete their submission:
The annotated results should be submitted as three plain text files encoded in UTF-8. The specific submission format will be released along with the training dataset.
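UTF-8 is a variable-width encoding: ASCII characters take one byte, most common CJK characters take three, and characters outside the Basic Multilingual Plane (such as rarer CJK extension characters that can appear in ancient texts) take four. A minimal Python sketch for checking result files before submission follows; the file names `results_A.txt` to `results_C.txt` are hypothetical, since the actual submission format is to be released by the organizers.

```python
from pathlib import Path
from typing import Iterable


def utf8_width(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8 (1 to 4)."""
    return len(ch.encode("utf-8"))


def is_valid_utf8(data: bytes) -> bool:
    """True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def check_result_files(paths: Iterable[Path]) -> None:
    """Raise ValueError naming the first file that is not valid UTF-8."""
    for path in paths:
        if not is_valid_utf8(path.read_bytes()):
            raise ValueError(f"{path} is not valid UTF-8")


if __name__ == "__main__":
    # Hypothetical file names; the official submission format is set by the organizers.
    files = [Path(f"results_{subset}.txt") for subset in ("A", "B", "C")]
    existing = [p for p in files if p.exists()]
    check_result_files(existing)
    print(f"checked {len(existing)} file(s)")
```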
Organizers
Dongbo Wang, College of Information Management, Nanjing Agricultural University, China
Bin Li, School of Chinese Language and Literature, Nanjing Normal University, China
Minxuan Feng, School of Chinese Language and Literature, Nanjing Normal University, China
Chao Xu, School of Chinese Language and Literature, Nanjing Normal University, China
Weiguang Qu, School of Computer and Electronic Information / School of Artificial Intelligence, Nanjing Normal University, China
Liu Liu, College of Information Management, Nanjing Agricultural University, China
Si Shen, School of Economics and Management, Nanjing University of Science and Technology, China
Student Members
Dongmei Zhu, College of Information Management, Nanjing Agricultural University, China
Jieqiong Li, College of Information Management, Nanjing Agricultural University, China
Ruifeng Wu, College of Information Management, Nanjing Agricultural University, China
Junyi Yang, College of Information Management, Nanjing Agricultural University, China
Zhixing Xu, School of Chinese Language and Literature, Nanjing Normal University, China
Junjie Li, School of Chinese Language and Literature, Nanjing Normal University, China
Yue Zhu, School of Chinese Language and Literature, Nanjing Normal University, China
Mengting Xu, School of Chinese Language and Literature, Nanjing Normal University, China
Call for Papers
4th Int. Workshop on AI and Semantic Technologies for Scientific, Technical, and Legal Web co-located with The Web Conference 2026
Workshop: April 13 or 14, 2026 - Dubai, UAE
https://semtech4stld.github.io/
-----------------------------------------------------
Important Dates
-----------------------------------------------------
Abstract Submissions: January 5th, 2026
Paper Submissions: January 12th, 2026
Notifications: January 25th, 2026
Camera-Ready Contributions: February 2nd, 2026
Workshop: April 13 or 14, 2026 - Dubai, UAE
All deadlines are 11:59 pm, AoE time (Anywhere on Earth).
------------------------------------------------------
Workshop Aims and Scope
------------------------------------------------------
The SemTech 2026 workshop focuses on methods that combine Semantic Web technologies, Natural Language Processing, Large Language Models (LLMs), and other AI technologies to model knowledge across scientific, technical, and legal domains. The workshop invites research on knowledge graph creation, semantic annotation, LLM–KG hybrid reasoning, and trustworthy AI pipelines that enhance the reliability, interpretability, and reuse of Web data. This is particularly timely as the Web community seeks robust approaches to integrate symbolic and sub-symbolic methods for managing and understanding the growing body of domain-specific knowledge on the Web.
-------------------------------------------------------
Workshop Topics
-------------------------------------------------------
We invite contributions on topics related to Semantic Web technologies and deep learning, particularly in the context of scientific, technical, and legal data. Areas of interest include, but are not limited to, the following:
Data Collection:
- Leveraging LLMs for generating scientific, technical, and legal data.
- New tools and systems for capturing scientific, technical, and legal data, such as scientific articles, patent publications, etc.
- Procedures and tools for storing, sharing, and preserving data on the Web.
- Collecting and sharing data sets such as benchmarks, etc.
- Pipelines and protocols to capture peculiarities from Web data.
- Employing Semantic Web Technologies to represent and preserve sensitive data in terms of ethics, privacy, security, and trust on the Web.
Novel Semantic Technologies for scientific, technical, and legal web:
- Ontologies and annotation schemas to model such data.
- Annotation, linking, and disambiguation of the data.
- Knowledge graph construction.
- LLMs to generate metadata, vocabularies, ontologies, and semantic models for specific data.
Applications for patents, scientific, technical, and legal web:
- Applications based on Generative AI and LLMs.
- Exploiting knowledge graphs for document similarity, question answering, search, etc.
- Semantic content-based retrieval.
- Natural language processing techniques for classification, summarization, etc.
- Exploratory search using semantic technologies on scientific, technical, and legal data.
- Key enabling tools (also based on LLMs) for accessing and using data on the Web.
- Lessons learned and use cases from both academia and industry around semantic models and LLMs for data in specific domains.
-------------------------------------------------------
Submission Details
-------------------------------------------------------
Formatting Requirements. Submissions must be written in English, in double-column format, and must adhere to the ACM template and format (also available in Overleaf). The review process will follow a single-blind protocol.
Key Participation Requirement: At least one author per paper must be registered for the workshop, attend in person, and present their work.
- Full Research Papers (6-8 pages maximum) should be clearly placed with respect to the state of the art and state the contribution of the proposal in the domain of application, even if presenting preliminary results. In particular, research papers should describe the methodology in detail, experiments should be repeatable, and a comparison with the existing approaches in the literature is encouraged.
- Replicability/Reproducibility papers (4 pages) should involve repeating prior experiments using the source code and datasets to analyze existing methods and their limitations. Alternatively, authors may assess the robustness of previous work by applying the original code in new contexts, such as different domains or datasets.
- Short Papers (4 pages) should describe significant novel work in progress. Compared to full papers, their contribution may be narrower in scope, be applied to a narrower set of application domains, or have weaker empirical support than that expected for a full paper. Submissions likely to generate discussions in new and emerging areas of legal data are encouraged.
Submissions should not exceed the indicated number of pages, including any diagrams and references.
PROCEEDINGS PUBLICATION: Due to conference policy, papers accepted by the workshop will be included in the Companion Proceedings of the Web Conference 2026, which are archived in the ACM Digital Library, subject to compliance with ACM's open-access and formatting guidelines and with the camera-ready timeline set by the ACM Web Conference. See the section "Important update on ACM's new open access publishing model for 2026 ACM Conferences!" on the conference website.
---------------------------------------------------------
Workshop Chairs
---------------------------------------------------------
Rima Dessì
Higher Colleges of Technology, United Arab Emirates (UAE)
Hidir Aras
FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Germany
Jeenu Joy
FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Germany
Danilo Dessì
Department of Computer Science, College of Computing and Informatics, University of Sharjah, Sharjah, UAE
Francesco Osborne
The Open University, Milton Keynes, United Kingdom
-----------------------------------------------------------
Contacts
-----------------------------------------------------------
For general inquiries on the workshop, please send an email to ddessi(a)sharjah.ac.ae
*To be held at EACL 2026 (March 24-29 in Rabat, Morocco)*
*Workshop description*
The 8th SIGTYP Workshop aims to provide a forum for bridging linguistic
typology, multilingual NLP, and adjacent areas to develop truly
multilingual NLP methods. The workshop raises awareness of linguistic
typology and its potential to broaden the global reach of multilingual NLP
and introduces computational approaches to typology. We welcome open
problems and discussion, inviting contributions from researchers in
multilingual/cross-lingual NLP and leading scholars in linguistic typology.
In 2026, we place a special emphasis on the utility of LLMs for typological
research.
*SIGTYP is the first dedicated venue for typology-related research and its
integration in multilingual NLP. Appropriate topics include (but are not
limited to):*
- *Integration of typological features in language transfer and joint
multilingual learning.* Beyond techniques such as “selective sharing,”
what other ways can we encode heterogeneous external knowledge in ML
algorithms?
- *Development of unified taxonomy and resources.* Building universal
databases/models to support the understanding and processing of diverse
languages.
- *Automatic inference of typological features.* Pros/cons of existing
techniques (e.g., heuristics from morphosyntactic annotation, propagation
from related languages, supervised Bayesian/neural models) and emerging
approaches.
- *Typology and interpretability.* Using typological knowledge to
interpret hidden representations of multilingual models, guide multilingual
data generation/selection, and annotate texts.
- *Improvement and completion of typological databases.* Combining
linguistic expertise with data-driven methods to advance knowledge of
cross-linguistic variation and universals.
- *Linguistic diversity and universals; cross-lingual annotation.* Which
phenomena/categories should be considered universal? How should they be
annotated?
- *Using LLMs for typological studies.* Can LLMs help formulate/test
typological hypotheses? Can they make valid cross-linguistic
generalisations?
- *Additional topics include* constructed language generation, universals
in diachronic language change, information-theoretic approaches to
typology, and automated approaches to etymology.
*Important Dates (23:59 AoE)*
- *Direct submission deadline: December 26, 2025*
- *Pre-reviewed (ARR) submission deadline: January 2, 2026*
- *Notification of acceptance: January 23, 2026*
- *Camera-ready deadline: February 3, 2026*
- *Workshop date: During EACL 2026 (March 24–29, 2026; exact day TBA)*
*Submissions*
We invite *extended abstract submissions (non-archival)* and *general paper
submissions (archival)*. The accepted submissions will be presented at the
workshop, providing new insights and ideas. Extended abstracts should
describe already published work or work in progress and should *not exceed
two (2) pages*. Because extended abstracts are non-archival, researchers
remain free to publish the full work in main conference proceedings, while
engaging and thought-provoking research is still presented at the workshop.
For general (archival) submissions, we accept both long and short papers.
*Short papers should not exceed four (4) pages; long papers should not
exceed eight (8) pages.* Unlimited additional pages are allowed for the
references section in all submission types.
*Submissions should be anonymous, without authors or an acknowledgement
section; self-citations should appear in third person.*
*Format:*
Submissions must follow the ACL 2025 stylesheet (
https://github.com/acl-org/acl-style-files), and both long and short paper
submissions must follow the two-column format of ACL proceedings. All
submissions must be in PDF format.
Submission Link:
https://openreview.net/group?id=eacl.org/EACL/2026/Workshop/SIGTYP
*SIGTYP 2026:* https://sigtyp.github.io/
*Organizing Committee*
Priya Rani, Michael Hahn, Andreas Shcherbakov, Oleg Serikov, Alexey
Sorokin, Ryan Cotterell and Kat Vylomova
*Anti-harassment policy*
The workshop follows the ACL anti-harassment policy:
https://www.aclweb.org/adminwiki/index.php?title=Anti-Harassment_Policy.
*Contact*
For any inquiries regarding the workshop, please send an email to the
Organising Committee at sigtyp(a)gmail.com
Regards,
Priya.