Dear colleagues,
[Apologies for cross-posting]
In 2024, SIGTYP is hosting a *Shared Task on Word Embedding Evaluation for
Ancient and Historical Languages*: https://sigtyp.github.io/st2024.html The
workshop will be co-located with EACL.
*Summary*
In recent years, sets of downstream tasks called benchmarks have become a
very popular, if not default, method to evaluate general-purpose word and
sentence embeddings. Starting with decaNLP (McCann et al., 2018) and
SentEval (Conneau & Kiela, 2018), multitask benchmarks for NLU keep
appearing and improving every year. However, even the largest multilingual
benchmarks, such as XGLUE, XTREME, XTREME-R or XTREME-UP (Hu et al., 2020;
Liang et al., 2020; Ruder et al., 2021, 2023), only include modern
languages. When it comes to ancient and historical languages, scholars
mostly adapt or translate intrinsic evaluation datasets from modern
languages or create their own diagnostic tests. We argue that there is a need for a
universal evaluation benchmark for embeddings learned from ancient and
historical language data and view this shared task as a proving ground for
it.
The shared task involves solving the following problems for 12+ ancient and
historical languages that belong to 4 language families and use 6 different
scripts. Participants will be invited to describe their system in a paper
for the SIGTYP workshop proceedings. The task organisers will write an
overview paper that describes the task, summarises the different
approaches taken, and analyses their results.
*Subtasks*
For subtask A, participants are not allowed to use any additional data;
however, they can reduce and balance the provided training datasets if they see
fit. For subtask B, participants are allowed to use any additional data in
any language, including pre-trained embeddings and LLMs.
A. Constrained
1. POS-tagging
2. Full morphological annotation
3. Lemmatisation
B. Unconstrained
1. POS-tagging
2. Full morphological annotation
3. Lemmatisation
4. Filling the gaps
- Word-level
- Character-level
*Data*
For tasks 1-3, we use Universal Dependencies v. 2.12 data (Zeman et al.,
2023) in 11 ancient and historical languages, complemented by 5 Old
Hungarian codices from the MGTSZ website (HAS Research Institute for
Linguistics, 2018) that are annotated to the same standard as the corpora
available through UD. For task 4, we add historical Irish data from CELT (Ó
Corráin et al., 1997), Corpas Stairiúil na Gaeilge (Acadamh Ríoga na
hÉireann, 2017), and digital editions of the St. Gall glosses (Bauer et
al., 2017) and the Würzburg glosses (Doyle, 2018) as a case study of how
performance may vary on different historical stages of the same language.
We set the upper temporal boundary to 1700 CE and do not include texts
created later than this date in our dataset. List of languages:
- Ancient Greek
- Ancient Hebrew
- Classical Chinese
- Coptic
- Gothic
- Classical, Late & Medieval Latin
- Medieval Icelandic
- Old Church Slavonic
- Old East Slavic
- Old French
- Old Hungarian
- Old, Middle & Early Modern Irish
- Vedic Sanskrit
*Important dates*
*05 Nov 2023*: Release of training and validation data
*02 Jan 2024*: Release of test data
*08 Jan 2024*: Submission of the systems
*13 Jan 2024*: Notification of results
*20 Jan 2024*: Submission of shared task papers
*27 Jan 2024*: Notification of acceptance to authors
*03 Feb 2024*: Camera-ready
*15 Mar 2024*: Video recordings due
*21/22 Mar 2024*: SIGTYP workshop
*Important links*
- *Registration form*
<https://docs.google.com/forms/d/e/1FAIpQLSdINgMfzzZGIZ-uBVQhvyndB6yeaaj-wT7…>
- Data + detailed description: https://github.com/sigtyp/ST2024
*Task organisers*
- Oksana Dereza, Insight SFI Research Centre for Data Analytics, Data
Science Institute, University of Galway
- Priya Rani, SFI Centre for Research and Training in AI, Data Science
Institute, University of Galway
- Atul Kr. Ojha, Insight SFI Research Centre for Data Analytics, Data
Science Institute, University of Galway
- Adrian Doyle, Insight SFI Research Centre for Data Analytics, Data
Science Institute, University of Galway
- Pádraic Moran, School of Languages, Literatures and Cultures, Moore
Institute, University of Galway
- John P. McCrae, Insight SFI Research Centre for Data Analytics, Data
Science Institute, University of Galway
*Contact details*
- Oksana: oksana.dereza(a)insight-centre.org
- Priya: priya.rani(a)insight-centre.org
Best wishes,
Oksana and the organisers
--
Oksana Dereza | PhD student on the Cardamom project
(http://cardamom.insight-centre.org/) | Unit for Linguistic Data |
Insight Centre for Data Analytics | Data Science Institute | University of
Galway
International Conference ‘New Trends in Translation and Technology’ (NeTTT’2024)
Varna, Bulgaria, 4-7 July 2024
First Call for Papers
The conference
The second edition of the forthcoming International Conference ‘New Trends in Translation and Technology’ (NeTTT’2024) will take place in Varna, Bulgaria, 4-7 July 2024.
The objective of the conference is (i) to bridge the gap between academia and industry in the field of translation and interpreting by bringing together academics in linguistics, translation studies, machine translation and natural language processing, developers, practitioners, language service providers and vendors who work on or are interested in different aspects of technology for translation and interpreting, and (ii) to be a distinctive event for discussing the latest developments and practices. NeTTT’2024 invites all professionals who would like to learn about the new trends, present their latest work and/or share their experience in the field, and who would like to establish business and research contacts, collaborations and new ventures.
The conference will take the form of presentations (peer-reviewed research and user presentations, keynote speeches), and posters; it will also feature panel discussions. The accepted papers will be published as open-access conference e-proceedings.
Conference topics
Contributions are invited on any topic related to the latest technology and practices in machine translation, translation, subtitling, localisation and interpreting.
NeTTT’2024 will feature a Special Theme Track "Future of Translation Technology in the Era of LLMs and Generative AI".
The conference topics include but are not limited to:
CAT tools
* Translation Memory (TM) systems
* NLP and MT for translation memory systems
* Terminology extraction tools
* Localisation tools
Machine Translation
* Latest developments in Neural Machine Translation
* MT for under-resourced languages
* MT with low computing resources
* Multimodal MT
* Integration of MT in TM systems
* Resources for MT
Technologies for MT deployment
* MT evaluation techniques, metrics and evaluation results
* Human evaluations of MT output
* Evaluating MT in a real-world setting
* Quality estimation for MT
* Domain adaptation
Translation Studies
* Corpus-based studies applied to translation
* Corpora and resources for translation
* Translationese
* Cognitive effort and eye-tracking experiments in translation
Interpreting studies
* Corpus-based studies applied to interpreting
* Corpora and resources for interpreting
* Interpretationese
* Resources for interpreting and interpreting technology applications
* Cognitive effort and eye-tracking experiments in interpreting
Interpreting technology
* Machine interpreting
* Computer-aided interpreting
* NLP for dialogue interpreting
* Development of NLP-based applications for communication in public service settings (healthcare, education, law, emergency services)
Emerging Areas in Translation and Interpreting
* MT and translation tools for literary and creative texts
* MT for social media and real-time conversations
* Sign language recognition and translation
Subtitling
* NLP and MT for subtitling
* Latest technology for subtitling
User needs
* Analysis of translators’ and interpreters’ needs in terms of translation and interpreting technology
* User requirements for interpreting and translation tools
* Incorporating human knowledge into translation and interpreting technology
* What existing translators’ (including subtitlers’) and interpreters’ tools do not offer
* User requirements for electronic resources for translators and interpreters
* Translation and interpreting workflows in larger organisations and the translation and interpreting tools they employ
The business of translation and interpreting
* Translation workflow and management
* Technology adoption by translators and industry
* Setting up a translation / interpreting / language service provider company
Teaching translation and interpreting
* Teaching Machine Translation
* Teaching translation technology
* Teaching interpreting technology
* Latest AI developments in the syllabi of translation and interpreting curricula
Ethical issues in translation and technology
* Bias and fairness in MT
* Privacy and security in cloud MT systems
* Transparency and explainability of MT systems
* Environmental impact of MT systems
Special Theme Track - Future of Translation Technology in the Era of LLMs and Generative AI
We are excited to share that NeTTT’2024 will have a special theme with the goal of stimulating discussion around Large Language Models, Generative AI and the Future of Translation and Interpreting Technology. While the new generation of Large Language Models such as ChatGPT and LLaMA showcases remarkable advances in language generation and understanding, we find ourselves in uncharted territory when it comes to their performance on various Translation and Interpreting Technology tasks with regard to fairness, interpretability, ethics and transparency.
The theme track invites studies on how LLMs perform on Translation and Interpreting Technology tasks and applications, and what this means for the future of the field. The possible topics of discussion include (but are not limited to) the following:
* Changes in the translation and interpreting professions in the new AI era, especially as a result of the latest developments in LLMs and Generative AI
* Generative AI and translation
* Generative AI and interpreting
* Augmenting machine translation systems with generative AI
* Domain and terminology adaptation with Large Language Models
* Literary translation with Large Language Models
* Improving Machine Translation Quality with Contextual Prompts in Large Language Models
* Prompt engineering for translation
* Generative AI for professional translation
* Generative AI for professional interpreting
We anticipate having a special session on this theme at the conference.
Submissions and publication
NeTTT’2024 invites the following types of submissions:
User papers – for industry and practitioners. References to related work are optional. Allowed paper length: between 1 and 4 pages.
Academic submissions, in three different categories (these must follow the formatting requirements, and references to related work are required):
* (academic) full papers – describing original completed research. Allowed paper length: maximum 8 pages + 2 for references.
* (academic) work-in-progress papers/posters – describing work in progress, late breaking research, papers at a more conceptual stage, and other types of papers that do not fit in the ‘full’ papers category. Allowed paper length: maximum 6 pages + 2 for references.
* (academic) demo papers – describing working systems. Allowed paper length: maximum 4 pages + 2 for references. In addition to the papers, the authors will be expected to demonstrate the systems at the conference.
Submission will be electronic, using the Softconf START conference management system, which will soon be available on the conference website. The follow-up calls will provide further submission details.
Each submission will be reviewed by three members of the Programme Committee.
The final versions of all accepted papers will be published in the conference e-proceedings, with an assigned ISBN and DOI, which will be available at the conference.
Schedule
Submission deadline: 31 March 2024
Notification: 5 June 2024
Final version due: 20 June 2024
Venue
The conference will take place at Conference Hotel Cherno More (https://www.chernomorebg.com/en/conference-centre.html), Varna, situated only 200 m away from the fine sandy Black Sea beach.
Further information and contact details
The second call for papers is expected in December 2023, and registration will open in January 2024. The follow-up calls will list keynote speakers, conference chairs and members of the programme committee once confirmed.
The conference website is https://nettt-conference.com and will be updated on a regular basis. For further information, please contact us at nettt2024(a)nettt-conference.com
Dear All,
CASE 2024 will be held at EACL 2024. Call for papers:
https://emw.ku.edu.tr/case-2024/
We are organizing two shared tasks. Please register and participate. We
also have a multimodal hate speech event detection task, which is a nice
opportunity for those who could not submit last time.
*Climate Activism Stance and Hate Event Detection Shared Task at CASE 2024*
*Task Description:* Hate speech detection and stance detection are among
the most important aspects of event identification during climate change
activism events. In the case of hate speech detection, the event is the
occurrence of hate speech, the entity is the target of the hate speech, and
the relationship is the connection between the two. Hate speech events
have targets at which the hate is directed, so identifying these targets is an
important task within hate speech event detection. Additionally, stance
event detection is an important part of assessing the dynamics of protests
and activism for climate change, since it helps to determine whether the
activist movements and protests are being supported or opposed. This task
will have three subtasks: (i) Hate speech identification, (ii) Targets of
Hate Speech Identification, (iii) Stance Detection.
*CodaLab Link:* https://codalab.lisn.upsaclay.fr/competitions/16206
*Registration:* To register for the shared task, please send a
request on CodaLab. The organizers will approve requests on a daily basis.
*GitHub Page:* https://github.com/therealthapa/case2024-climate
*Shared task on Multimodal Hate Speech Event Detection at CASE 2024*
*Task Description:* Hate speech detection is one of the most important
aspects of event identification during political events such as invasions. In
the case of hate speech detection, the event is the occurrence of hate
speech, the entity is the target of the hate speech, and the relationship
is the connection between the two. Since multimodal content is
prevalent across the internet, detecting hate speech in
text-embedded images is very important. Given a text-embedded image in the
context of the Russia-Ukraine crisis, this task aims to automatically
identify hate speech and its targets. This task will have two subtasks: (i)
Hate speech identification, (ii) Targets of Hate Speech Identification.
*CodaLab Link:* https://codalab.lisn.upsaclay.fr/competitions/16203
*CodaLab Link to the Multimodal Hate Speech Event Detection Shared Task at CASE
2023:* https://codalab.lisn.upsaclay.fr/competitions/13087
*Registration:* To register for the shared task, please send a
request on CodaLab. The organizers will approve requests on a daily basis.
*GitHub Page:* https://github.com/therealthapa/case2024-multimodal-hate
Best Regards,
Surendrabikram Thapa
Research Faculty
Virginia Tech, USA
Dear Members of the SIGUL list,
we are happy to announce that the proceedings of the SIGUL 2023 Workshop
in Dublin are now available in the ISCA Archive.
https://doi.org/10.21437/SIGUL.2023
All the best
Claudia Soria, Maite Melero, Sakriani Sakti
--
Researcher
"A. Zampolli" Institute for Computational Linguistics
National Research Council
Pisa, Italy