*GenBench: The second workshop on generalisation (benchmarking) in NLP*
*Workshop description* The ability to generalise well is often mentioned as one of the primary desiderata for models of natural language processing (NLP). Yet, there are still many open questions related to what it means for an NLP model to generalise well, and how generalisation should be evaluated. LLMs, trained on gigantic training corpora that are – at best – hard to analyse or not publicly available at all, bring a new set of challenges to the topic. The second GenBench workshop aims to serve as a cornerstone to catalyse research on generalisation in the NLP community. The workshop aims to bring together different expert communities to discuss challenging questions relating to generalisation in NLP, crowd-source challenging generalisation benchmarks for LLMs, and make progress on open questions related to generalisation.
Topics of interest include, but are not limited to:
- Opinion or position papers about generalisation and how it should be evaluated;
- Analyses of how existing or new models generalise;
- Empirical studies that propose new paradigms to evaluate generalisation;
- Meta-analyses that compare results from different generalisation studies;
- Meta-analyses that study how different types of generalisation are related;
- Papers that discuss how generalisation of LLMs can be evaluated;
- Papers that discuss why generalisation is (not) important in the era of LLMs;
- Studies on the relationship between generalisation and fairness or robustness.
The second GenBench workshop on generalisation (benchmarking) in NLP will be co-located with EMNLP 2024.
*Submission types* We call for two types of submissions: regular workshop submissions and collaborative benchmarking task submissions. The latter will consist of a data/task artefact and a companion paper motivating and evaluating the submission. In both cases, we accept archival papers and extended abstracts.
*1. Regular workshop submissions* Regular workshop submissions are papers on the topic of generalisation (see the examples listed above). They may be submitted as archival papers, if they report on completed, original and unpublished research, or otherwise as shorter extended abstracts. More details on this category can be found below. If you are unsure whether a specific topic is well suited for submission, feel free to reach out to the workshop organisers at genbench@googlegroups.com.
*2. Collaborative Benchmarking Task (CBT) submissions* The goal of this year's CBT is to generate versions of existing evaluation datasets for LLMs that, given a particular training corpus, exhibit a larger distribution shift than the original test set – in other words, that evaluate generalisation to a stronger degree than the original dataset. For this challenge, we focus on three training corpora: C4, RedPajama-Data-1T, and Dolma. All three corpora are publicly available, and they can be searched via the What's in My Big Data API (https://github.com/allenai/wimbd). We focus on three popular evaluation datasets: MMLU, HumanEval, and SiQA. Submitters to the CBT are asked to design a way to assess distribution shift for one or more of these evaluation datasets, given particular features of the training corpus, and then to generate one or more versions of the dataset that have a larger distribution shift according to this measure. Newly generated sets need not match the size of the original test set, but should contain at least 200 examples. Practically speaking, CBT submissions consist of:
1. the data/task artefact, submitted through https://github.com/GenBench/genbench_cbt
2. a paper describing the dataset and its method of construction, submitted through https://openreview.net/group?id=GenBench.org/2024/Workshop
We accept submissions that consider only a single pretraining corpus and evaluation dataset, but encourage submitters to apply their proposed protocols to all of the pretraining corpora. We also suggest that submitters include results for models trained on these corpora; suggestions are provided on the CBT website: https://genbench.org/cbt. Given enough high-quality submissions, we aim to write a paper with the combined results, of which submitters can be co-authors if they so wish. More detailed guidelines will be given on https://genbench.org/cbt; a minimal illustrative sketch of one possible distribution-shift measure follows below.
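As a purely illustrative, non-authoritative sketch of what such a measure could look like (the CBT does not prescribe any particular approach): one simple option is to score each test example by how many of its n-grams occur in the pretraining corpus, and to keep the least-covered examples. In the Python sketch below, corpus_ngram_count is a hypothetical lookup into a corpus index – for instance, counts retrieved with the What's in My Big Data tooling – and is an assumption of this example, not part of the CBT requirements.

from typing import Callable, Iterable


def ngrams(tokens: list[str], n: int = 3) -> Iterable[tuple[str, ...]]:
    """Yield every contiguous n-gram of a token sequence."""
    return (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_score(text: str, corpus_ngram_count: Callable[[str], int], n: int = 3) -> float:
    """Fraction of the example's n-grams that occur at least once in the pretraining corpus.

    corpus_ngram_count is a hypothetical corpus lookup (e.g. counts obtained via the
    What's in My Big Data tooling); its interface here is an assumption.
    """
    grams = list(ngrams(text.split(), n))
    if not grams:
        return 0.0
    covered = sum(1 for g in grams if corpus_ngram_count(" ".join(g)) > 0)
    return covered / len(grams)


def select_shifted_subset(
    examples: list[str],
    corpus_ngram_count: Callable[[str], int],
    k: int = 200,  # the CBT asks for at least 200 examples in a newly generated set
) -> list[str]:
    """Keep the k examples least covered by the corpus, i.e. with the largest shift."""
    return sorted(examples, key=lambda ex: overlap_score(ex, corpus_ngram_count))[:k]

Submitters are of course free to use entirely different notions of distribution shift, for example embedding-based or topic-based measures.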
*Archival vs extended abstract* Archival papers are up to 8 pages excluding references and report on completed, original and unpublished research. They follow the requirements of regular EMNLP 2024 submissions. Accepted papers will be published in the workshop proceedings and are expected to be presented at the workshop. The papers will undergo double-blind peer review and should thus be anonymised. Extended abstracts can be up to 2 pages excluding references, and may report on work in progress or be cross-submissions of work that has already appeared in another venue. Abstract titles will be posted on the workshop website, but will not be included in the proceedings.
*Submission instructions* For both archival papers and extended abstracts, we refer to the EMNLP 2024 website for paper templates and requirements. Additional requirements for both regular workshop papers and collaborative benchmarking task submissions can be found on our website. All submissions can be made through OpenReview: https://openreview.net/group?id=GenBench.org/2024/Workshop.
*Important dates*
- August 15, 2024: Paper submission deadline
- September 20, 2024: Notification deadline
- October 4, 2024: Camera-ready deadline
- November 15 or 16, 2024: Workshop
Note: all deadlines are 11:59 PM UTC-12:00. Check the website for final updates to these deadlines (https://genbench.org/workshop).
*Preprints* We do not have an anonymity deadline; preprints are allowed both before and after the submission deadline.
*Contact*
Email address: genbench@googlegroups.com
Website: https://genbench.org/workshop
*On behalf of the organisers*
Dieuwke Hupkes, Verna Dankers, Khuyagbaatar Batsuren, Amirhossein Kazemnejad, Christos Christodoulopoulos, Mario Giulianelli, Ryan Cotterell