2nd Call for Papers: Workshop on RESOURCEs and representations For Under-resourced Languages and domains (RESOURCEFUL-2023) - Corpora

1 Mar 2023


      [Apologies for cross-posting]
The second workshop on resources and representations for under-resourced
languages and domains (RESOURCEFUL-2023,
https://resourceful-workshop.github.io/resourceful-2023/index.html) 
explores the role of the kind and quality of the resources available to
us, as well as the challenges and directions for constructing new
resources in light of the latest trends in natural language processing.
The workshop is co-located with NoDaLiDa2023 
(https://www.nodalida2023.fo/) at Tórshavn, Faroe Islands on May 
22nd-24th, 2023.
Data-driven machine-learning techniques in natural language processing 
have achieved remarkable performance (e.g., BERT, GPT, ChatGPT), but to
do so they require large quantities of quality data (mostly text).
Interpretability studies of large language models in both
text-only and multi-modal setups have revealed that even in cases where 
large text datasets are available, the models still do not cover all the 
contexts of human social activity and are prone to capturing unwanted 
bias where data is focused on only some contexts. The question has
also been raised whether textual data is enough to capture the semantics
of natural language, or whether other modalities such as visual
representations or the situated context of a robot might be required.
Annotator-based resources have been constructed over the years based on
theoretical work in linguistics, psychology and related fields and a 
large amount of work has been done both theoretically and practically.
The purpose of the workshop is to initiate a discussion between the two 
communities involved in building resources (data-driven vs
annotation-based) and to explore their synergies for the new challenges
in natural language processing. We encourage contributions in the areas of resource
creation, representation learning and interpretability in data-driven 
and expert-driven machine learning setups and both uni-modal and 
multi-modal scenarios.
In particular we would like to open a forum by bringing together 
students, researchers, and experts to address and discuss the following 
questions:
  - What is the relevant linguistic knowledge that models should capture,
and how can this knowledge be sampled and extracted in practice?
  - What kind of linguistic knowledge do we want to capture, and what can
we capture, in different contexts and tasks?
  - To what degree are resources that have been traditionally aimed at 
rule-based natural language processing approaches relevant today both 
for machine learning techniques and hybrid approaches?
  - How can they be adapted for data-driven approaches?
  - To what degree can data-driven approaches be used to facilitate
expert-driven annotation?
  - What are current challenges for expert-based annotation?
  - How can crowd-sourcing and citizen science be used in building 
resources?
  - How can we evaluate and reduce unwanted biases?
Intended participants are researchers, PhD students and practitioners 
from diverse backgrounds (linguistics, psychology, computational 
linguistics, speech, computer science, machine learning, computer vision,
etc.). We foresee an interactive workshop with plenty of time for
discussion, complemented with invited talks and presentations of 
on-going or completed research.
This workshop is a continuation of the first workshop on resources and 
representations for under-resourced languages and domains held together 
with SLTC 2020, https://gu-clasp.github.io/resourceful-2020/.
** Important dates
  - Submission deadline for archival papers: 28th March 2023
  - Submission deadline for non-archival papers: 4th April 2023
  - Notification of acceptance: 25th April 2023
  - Camera-ready version: 9th May 2023
  - Workshop date: 22nd May 2023
All deadlines are 11:59PM UTC-12:00 ("anywhere on Earth").
** Submission
We invite submissions of long papers (8 pages), short papers (4 pages), 
and extended abstracts describing work in progress (2 pages). 
Submissions may report negative results or be opinion pieces. Both
papers and extended abstracts can include any number of pages for
references. All submissions must follow the NoDaLiDa template, which is
available in both LaTeX and MS Word from the official conference website
(https://www.nodalida2023.fo/authorkit-nodalida23).
Submissions must be anonymous and submitted in PDF format through
OpenReview.
We also invite submissions of non-archival papers related to our theme 
already presented or published at other venues. These can be submitted 
in their original formatting. They will be reviewed by the workshop 
organisers and the accepted ones will be posted on the workshop website.
Authors may be asked to contribute peer reviews of papers.
** Workshop organisers
Dana Dannélls, Språkbanken Text, University of Gothenburg
Simon Dobnik, CLASP, University of Gothenburg
Adam Ek, CLASP, University of Gothenburg
Stella Frank, University of Copenhagen
Nikolai Ilinykh, CLASP, University of Gothenburg
Beáta Megyesi, Uppsala University
Felix Morger, Språkbanken Text, University of Gothenburg
Joakim Nivre, RISE and Uppsala University
Magnus Sahlgren, AI Sweden
Sara Stymne, Uppsala University
Jörg Tiedemann, University of Helsinki
Lilja Øvrelid, University of Oslo