WebNLG 2023: Call for Participation
Special focus on multilingual NLG for under-resourced languages

We are delighted to announce a new edition of the WebNLG challenge, which will take place in 2023. WebNLG 2023 will focus on multilingual generation for under-resourced languages.
Registration
If you intend to participate or if you download the data, please fill in this form: https://docs.google.com/forms/d/e/1FAIpQLSfytc1rUMUOKrDc9vV658uiLh1_jUS7G0Xe...
Motivation
With the development of large-scale pretrained models, research in automatic text generation has gained new impetus. However, the current state of the art is dominated by a handful of languages for which training data is relatively easy to acquire. At the same time, the field has recently seen some encouraging developments focusing on generation for under-resourced and under-represented languages. This trend is paralleled by a growing interest in multilingual models and applications in NLP more broadly.
The WebNLG 2023 Challenge is being organised in response to these trends and specifically addresses generation in few-shot and/or zero-shot settings for four under-resourced languages.

About WebNLG
The WebNLG Challenge consists of mapping data, in the form of RDF triples, to natural language text. The input is a set of RDF triples sourced from DBpedia, for example:
(John_E_Blaha birthDate 1942_08_26)
(John_E_Blaha birthPlace San_Antonio)
(John_E_Blaha occupation Fighter_pilot)
The corresponding output text might be:
John E Blaha, born in San Antonio on 1942-08-26, worked as a fighter pilot.
The WebNLG challenge was launched in 2017. A second edition, in 2020, extended the task to Russian, in addition to English.
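To make the triple-to-text mapping concrete, here is a minimal rule-based sketch in Python. It assumes triples are written in the space-separated "(subject property object)" notation of the example above and uses hand-written templates for the three example properties only. The names `parse_triple`, `readable` and `verbalise`, the templates and the fallback clause are all hypothetical illustrations, not part of the WebNLG data or tooling.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One RDF triple: (subject, property, object)."""
    subject: str
    prop: str
    obj: str


def parse_triple(text: str) -> Triple:
    """Parse the assumed '(subject property object)' notation."""
    parts = text.strip().strip("()").split()
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {text!r}")
    return Triple(*parts)


def readable(entity: str) -> str:
    """Turn a DBpedia-style identifier into plain text.

    '1942_08_26' becomes '1942-08-26'; 'San_Antonio' becomes 'San Antonio'.
    """
    date = re.fullmatch(r"(\d{4})_(\d{2})_(\d{2})", entity)
    if date:
        return "-".join(date.groups())
    return entity.replace("_", " ")


# Hypothetical hand-written templates, one per property (not part of the
# WebNLG data); a real rule-based system would need far broader coverage.
TEMPLATES = {
    "birthDate": lambda o: f"was born on {o}",
    "birthPlace": lambda o: f"was born in {o}",
    "occupation": lambda o: f"worked as a {o.lower()}",
}


def verbalise(triples: list[Triple]) -> str:
    """Render a non-empty set of triples sharing one subject as a sentence."""
    if not triples or len({t.subject for t in triples}) != 1:
        raise ValueError("this sketch only handles a non-empty set with one subject")
    clauses = []
    for t in triples:
        template = TEMPLATES.get(t.prop)
        obj = readable(t.obj)
        # Fall back to a generic (hypothetical) clause for unknown properties.
        clauses.append(template(obj) if template else f"has {t.prop} {obj}")
    if len(clauses) == 1:
        body = clauses[0]
    else:
        body = ", ".join(clauses[:-1]) + " and " + clauses[-1]
    return f"{readable(triples[0].subject)} {body}."


if __name__ == "__main__":
    example = [
        parse_triple("(John_E_Blaha birthDate 1942_08_26)"),
        parse_triple("(John_E_Blaha birthPlace San_Antonio)"),
        parse_triple("(John_E_Blaha occupation Fighter_pilot)"),
    ]
    print(verbalise(example))
```

On the example triples this prints "John E Blaha was born on 1942-08-26, was born in San Antonio and worked as a fighter pilot." The result is grammatical but clunkier than the reference text, which merges the two birth facts into one clause. Producing that kind of aggregation and fluent phrasing, especially across languages, is what makes the task hard for purely template-based systems.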
WebNLG 2023
The new edition of WebNLG focuses on four under-resourced languages that are severely under-represented in research on text generation, namely Maltese, Irish, Breton and Welsh. In addition, WebNLG 2023 will once again include Russian, which was first featured in WebNLG 2020.
For WebNLG 2023, we are soliciting submissions encompassing a variety of approaches to automatic text generation, from neural architectures to rule-based systems. We especially encourage submissions addressing generation in few-shot or zero-shot settings.
Data
Development and test data have been prepared for all five languages: Breton, Maltese, Irish and Welsh (the target languages for WebNLG 2023), as well as Russian. Participants can download the development data now; the test data is reserved for the final evaluation.
Data for each language was obtained by sourcing high-quality, professional translations of the original English texts in the WebNLG 2020 dev and test sets.
Training data is also available for English (the original WebNLG data) and, as in WebNLG 2020, for Russian. In addition, we provide ‘noisy’ training data for the target languages (Maltese, Breton, Welsh and Irish), obtained via machine translation of the texts in the English WebNLG 2020 train split.
Evaluation
As in previous editions of WebNLG, submitted results will be evaluated using both automatic and human evaluation methods.
Instructions for participants
Data and instructions for the task are available from the WebNLG repository: https://github.com/WebNLG/2023-Challenge
Teams who submit systems for evaluation at WebNLG 2023 will subsequently be invited to contribute a short paper describing their approach and results. The task as a whole, as well as individual submissions, will be presented at a special session in an event to be announced later.
General information about the WebNLG challenges can be found at the following URL: https://synalp.gitlabpages.inria.fr/webnlg-challenge/challenge_2023/
Timeline
February 2023: First call for participation. Development data and noisy training data available.
8 June 2023: Release of test data.
15 June 2023: Deadline for submission of system outputs.
15 August 2023: Deadline for submission of short papers describing systems.
The final presentation of results will be held during a workshop, currently planned for September 2023.

Organisation
WebNLG 2023 is being organised under the auspices of LT-Bridge, supported by the Horizon 2020 Work Programme Spreading Excellence and Widening Participation (WIDESPREAD) 2018-2020 and the ANR-funded xNLG Chair on multilingual, multi-source NLG.
Claire Gardent, CNRS/LORIA, Nancy, France
Albert Gatt, Utrecht University, The Netherlands and University of Malta
Claudia Borg, University of Malta
Enrico Aquilina, University of Malta
Anya Belz, Dublin City University, Ireland
John Judge, Dublin City University, Ireland
Liam Cripwell, CNRS/LORIA and Université de Lorraine, Nancy, France
William Soto-Martinez, CNRS/LORIA and Université de Lorraine, Nancy, France
Contact: webnlg-challenge@inria.fr