[Apologies for cross-posting]
The second workshop on resources and representations for under-resourced languages and domains (RESOURCEFUL-2023) explores the role of the kind and quality of the resources available to us, as well as the challenges and directions for constructing new resources in light of the latest trends in natural language processing.
Data-driven machine-learning techniques in natural language processing have achieved remarkable performance (e.g., BERT, GPT, ChatGPT), but to do so they require large quantities of quality data, most of which is text. Interpretability studies of large language models in both text-only and multi-modal setups have revealed that even where large text datasets are available, the models still do not cover all the contexts of human social activity and are prone to capturing unwanted bias where the data is focused on only some contexts. The question has also been raised whether textual data is enough to capture the semantics of natural language, or whether other modalities such as visual representations or the situated context of a robot might be required. Annotator-based resources have been constructed over years based on theoretical work in linguistics, psychology and related fields, and a large amount of work has been done both theoretically and practically.
The purpose of the workshop is to initiate a discussion between the two communities involved in building resources (data-driven vs. annotation-based) and to explore their synergies for the new challenges in natural language processing. We encourage contributions in the areas of resource creation, representation learning and interpretability, in data-driven and expert-driven machine learning setups and in both uni-modal and multi-modal scenarios.
In particular we would like to open a forum by bringing together students, researchers, and experts to address and discuss the following questions:
- What is the relevant linguistic knowledge that models should capture, and how can this knowledge be sampled and extracted in practice?
- What kind of linguistic knowledge do we want to capture, and what can we capture, in different contexts and tasks?
- To what degree are resources that have traditionally been aimed at rule-based natural language processing approaches relevant today, both for machine learning techniques and for hybrid approaches?
- How can such resources be adapted for data-driven approaches?
- To what degree can data-driven approaches be used to facilitate expert-driven annotation?
- What are the current challenges for expert-based annotation?
- How can crowd-sourcing and citizen science be used in building resources?
- How can we evaluate and reduce unwanted biases?
Intended participants are researchers, PhD students and practitioners from diverse backgrounds (linguistics, psychology, computational linguistics, speech, computer science, machine learning, computer vision, etc.). We foresee an interactive workshop with plenty of time for discussion, complemented by invited talks and presentations of on-going or completed research.
This workshop is a continuation of the first workshop on resources and representations for under-resourced languages and domains, held together with SLTC 2020: https://gu-clasp.github.io/resourceful-2020/
** Important dates:
- Submission deadline for archival papers: 28th March 2023
- Submission deadline for non-archival papers: 4th April 2023
- Notification of acceptance: 25th April 2023
- Camera-ready version: 9th May 2023
- Workshop date: 22nd May 2023
All deadlines are 11:59PM UTC-12:00 ("anywhere on Earth").
** Submission
We invite submissions of long papers (8 pages), short papers (4 pages), and extended abstracts describing work in progress (2 pages). Submissions may report negative results or be opinion pieces. Both papers and extended abstracts can include any number of additional pages for references. All submissions must follow the NoDaLida template, which is available in both LaTeX and MS Word from the official conference website:
https://www.nodalida2023.fo/authorkit-nodalida23
Submissions must be anonymous and submitted in PDF format through OpenReview.
We also invite submissions of non-archival papers related to our theme already presented or published at other venues. These can be submitted in their original formatting. They will be reviewed by the workshop organisers and the accepted ones will be posted on the workshop website.
Authors may be asked to contribute peer-reviews of papers.
** Workshop organisers
Dana Dannélls, Språkbanken Text, University of Gothenburg
Simon Dobnik, CLASP, University of Gothenburg
Adam Ek, CLASP, University of Gothenburg
Stella Frank, University of Copenhagen
Nikolai Ilinykh, CLASP, University of Gothenburg
Beáta Megyesi, Uppsala University
Felix Morger, Språkbanken Text, University of Gothenburg
Joakim Nivre, RISE and Uppsala University
Magnus Sahlgren, AI Sweden
Sara Stymne, Uppsala University
Jörg Tiedemann, University of Helsinki
Lilja Øvrelid, University of Oslo
---
Adam Ek
PhD Student
Centre for Linguistic Theories and Studies in Probability (CLASP)
Department of Philosophy, Linguistics and Theory of Science
University of Gothenburg
adam.ek@gu.se