First CfP: Workshop on RESOURCEs and representations For Under-resourced Languages and domains (RESOURCEFUL-2023) - Corpora

25 Jan 2023


[Apologies for cross-posting]
Call for Papers and Extended Abstracts
Workshop on RESOURCEs and representations For Under-resourced Languages and domains (RESOURCEFUL-2023)
co-located with the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Norðurlandahúsið - The Nordic House in Tórshavn, Faroe Islands
22nd May 2023
https://resourceful-workshop.github.io/resourceful-2023/
Important dates:
- Submission deadline (both papers and abstracts): 28th March 2023
- Notification of acceptance: 25th April 2023
- Camera-ready version: 9th May 2023
- Workshop date: 22nd May 2023
All deadlines are 11:59PM UTC-12:00 ("anywhere on Earth").
Workshop description
The second workshop on resources and representations for under-resourced languages and domains (RESOURCEFUL-2023) explores the role of the kind and quality of the resources available to us, and the challenges and directions for constructing new resources in light of the latest trends in natural language processing.
Data-driven machine-learning techniques in natural language processing have achieved remarkable performance (e.g., BERT, GPT, ChatGPT), but to do so they require large quantities of quality data, most of which is text. Interpretability studies of large language models in both text-only and multi-modal setups have revealed that even where large text datasets are available, the models still do not cover all the contexts of human social activity and are prone to capturing unwanted bias where the data is focused on only some contexts. The question has also been raised whether textual data is enough to capture the semantics of natural language, or whether other modalities, such as visual representations or the situated context of a robot, might be required. Annotator-based resources have been constructed over the years based on theoretical work in linguistics, psychology and related fields, and a large amount of work has been done both theoretically and practically.
The purpose of the workshop is to initiate a discussion between the two communities involved in building resources (data-based vs annotation-based) and to explore their synergies for the new challenges in natural language processing. We encourage contributions in the areas of resource creation, representation learning and interpretability in data-driven and expert-driven machine learning setups and in both uni-modal and multi-modal scenarios.
In particular we would like to open a forum by bringing together students, researchers, and experts to address and discuss the following questions:
- What is the relevant linguistic knowledge that models should capture, and how can this knowledge be sampled and extracted in practice?
- What kind of linguistic knowledge do we want to capture, and what can we capture, in different contexts and tasks?
- To what degree are resources that have been traditionally aimed at rule-based natural language processing approaches relevant today both for machine learning techniques and hybrid approaches?
- How can they be adapted for data-driven approaches?
- To what degree can data-driven approaches be used to facilitate expert-driven annotation?
- What are current challenges for expert-based annotation?
- How can crowd-sourcing and citizen science be used in building resources?
- How can we evaluate and reduce unwanted biases?
Intended participants are researchers, PhD students and practitioners from diverse backgrounds (linguistics, psychology, computational linguistics, speech, computer science, machine learning, computer vision, etc.). We foresee an interactive workshop with plenty of time for discussion, complemented by invited talks and presentations of ongoing or completed research.
This workshop is a continuation of the first workshop on resources and representations for under-resourced languages and domains, held together with SLTC 2020, https://gu-clasp.github.io/resourceful-2020/.
Submission
We invite submissions of both long (8 pages) and short (4 pages) papers, with any number of pages for references. All submissions must follow the NoDaLiDa template, available in both LaTeX and MS Word at the official conference website (https://www.nodalida2023.fo/authorkit-nodalida23). Submissions must be anonymous and submitted in PDF format through OpenReview.
We also invite submissions of non-anonymous extended abstracts of at most 2 pages, with any number of pages for references, describing work in progress, negative results and opinion pieces. Papers related to our theme that have already been presented at other venues or published elsewhere will also be considered for presentation. The abstracts, which should follow the same formatting templates as the archival track, will be reviewed by the workshop organisers, and the accepted ones will be posted on the workshop website.
Workshop organisers
Dana Dannélls, Språkbanken Text, University of Gothenburg
Simon Dobnik, CLASP, University of Gothenburg
Adam Ek, CLASP, University of Gothenburg
Stella Frank, University of Copenhagen
Nikolai Ilinykh, CLASP, University of Gothenburg
Beáta Megyesi, Uppsala University
Felix Morger, Språkbanken Text, University of Gothenburg
Joakim Nivre, RISE and Uppsala University
Magnus Sahlgren, AI Sweden
Sara Stymne, Uppsala University
Jörg Tiedemann, University of Helsinki
Lilja Øvrelid, University of Oslo
resourceful-2023@listserv.gu.se