VarDial 2024, the eleventh workshop on NLP for similar languages, varieties and dialects, will be held in conjunction with NAACL in Mexico City, on June 20/21, 2024.
We welcome papers dealing with one or more of the following topics:
- Corpora, resources, and tools for similar languages, varieties and dialects;
- Adaptation of tools (taggers, parsers) for similar languages, varieties and dialects;
- Evaluation of language resources and tools when applied to language varieties;
- Reusability of language resources in NLP applications (e.g., for machine translation, POS tagging, syntactic parsing, etc.);
- Corpus-driven studies in dialectology and language variation;
- Computational approaches to mutual intelligibility between dialects and similar languages;
- Automatic identification of lexical variation;
- Automatic classification of language varieties;
- Text similarity and adaptation between language varieties;
- Linguistic issues in the adaptation of language resources and tools (e.g., semantic discrepancies, lexical gaps, false friends);
- Machine translation between closely related languages, language varieties and dialects.

In addition to the topics listed above, we also welcome papers dealing with diachronic language variation (e.g. phylogenetic methods, historical dialects).
Paper submission deadline: March 10, 2024 (AoE)
Details: https://sites.google.com/view/vardial-2024/call-for-papers
The VarDial workshop has a history of hosting well-attended shared tasks on various dialects and languages. In 2024, we are organizing the following two tasks:
1. The DIALECT-COPA shared task on dialectal causal commonsense reasoning
This shared task invites the community to propose, develop, and test approaches for adapting models for causal commonsense language understanding to three dialects of South-Slavic languages: the Slovenian Cerkno dialect, the Croatian Chakavian dialect, and the Serbian, Macedonian and Bulgarian Torlak dialect. Training and development data based on the COPA (Choice of Plausible Alternatives; Roemmele et al., 2011) dataset are available for four related standard languages (Slovenian, Croatian, Serbian, Macedonian) and for two of the three test dialects (Cerkno, Torlak), with the Chakavian dialect serving as a surprise dialect.
2. DSL-ML - Multi-label classification of similar languages
The DSL-ML task is a multi-label extension of the classic "Discriminating similar languages" task that has been popular at VarDial since the beginning of the workshop. The motivation behind this new task formulation is that some texts do not contain any linguistic markers that unambiguously determine their origin. It therefore makes sense to predict several possible labels for such texts. The 2024 DSL-ML task is based on multi-label conversions of existing datasets from five different macro-languages: English, Spanish, Portuguese, French and BCMS (Bosnian, Croatian, Montenegrin, Serbian).
Test results submission deadline: March 11, 2024 (AoE)
System description paper submission deadline: March 24, 2024 (AoE)
Registration: https://forms.gle/UcLYcPgDFJoiAVip7
Details: https://sites.google.com/view/vardial-2024/shared-tasks