Test data has been released, and the CodaLab competitions are up and running, so we encourage you to register if you haven't already! There is still a week left before the deadline. :)
Summary

In recent years, sets of downstream tasks known as benchmarks have become a popular, if not the default, method for evaluating general-purpose word and sentence embeddings. Starting with decaNLP (McCann et al., 2018) and SentEval (Conneau & Kiela, 2018), multitask benchmarks for NLU keep appearing and improving every year. However, even the largest multilingual benchmarks, such as XGLUE, XTREME, XTREME-R or XTREME-UP (Hu et al., 2020; Liang et al., 2020; Ruder et al., 2021, 2023), only include modern languages. When it comes to ancient and historical languages, scholars mostly adapt or translate intrinsic evaluation datasets from modern languages, or create their own diagnostic tests. We argue that there is a need for a universal evaluation benchmark for embeddings learned from ancient and historical language data, and we view this shared task as a proving ground for it.
The shared task involves solving the following problems for 12+ ancient and historical languages that belong to 4 language families and use 6 different scripts. Participants will be invited to describe their systems in papers for the SIGTYP workshop proceedings. The task organizers will write an overview paper that describes the task, summarizes the different approaches taken, and analyzes their results.
Subtasks
For subtask A, participants are not allowed to use any additional data; however, they may reduce and balance the provided training datasets if they see fit. For subtask B, participants are allowed to use any additional data in any language, including pre-trained embeddings and LLMs. A toy illustration of what the subtasks involve is given after the lists below.
A. Constrained
- POS-tagging
- Full morphological annotation
- Lemmatisation

B. Unconstrained
- POS-tagging
- Detailed morphological annotation
- Lemmatisation
- Filling the gaps
  - Word-level
  - Character-level
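
To make the subtasks more concrete, below is a minimal, purely illustrative Python sketch of the kind of token-level annotation and gap filling involved, using an invented Latin example. The field names (`upos`, `feats`, `lemma`), the gap markers, and the accuracy function are assumptions made for illustration only; the official task data and CodaLab competitions define the real data format and evaluation metrics.

```python
# Toy illustration of the subtasks, using a short Latin sentence as
# made-up example data. Field names and formats here are hypothetical;
# the official task data defines the real schema and evaluation.

# POS-tagging, morphological annotation and lemmatisation all map each
# surface token to a label (or a set of feature-value pairs).
sentence = ["puella", "rosam", "amat"]  # "the girl loves the rose"

gold = [
    {"upos": "NOUN", "feats": "Case=Nom|Gender=Fem|Number=Sing", "lemma": "puella"},
    {"upos": "NOUN", "feats": "Case=Acc|Gender=Fem|Number=Sing", "lemma": "rosa"},
    {"upos": "VERB", "feats": "Mood=Ind|Number=Sing|Person=3|Tense=Pres", "lemma": "amo"},
]

# "Filling the gaps" asks a system to restore masked material, either
# whole words (word-level) or characters (character-level).
word_level_gap = ["puella", "[_]", "amat"]      # target: "rosam"
char_level_gap = ["puella", "ros[_]m", "amat"]  # target: "a"


def token_accuracy(gold_labels, pred_labels):
    """Per-token accuracy, one straightforward way to score tagging output."""
    correct = sum(g == p for g, p in zip(gold_labels, pred_labels))
    return correct / len(gold_labels)


# Score a (deliberately imperfect) POS prediction against the gold tags.
pred_upos = ["NOUN", "VERB", "VERB"]
print(token_accuracy([t["upos"] for t in gold], pred_upos))  # -> 0.666...
```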