FYI
Joseph
-------- Forwarded Message --------
Subject: [2020PC+Board] The State and Fate of Linguistic Diversity and
Inclusion in the NLP World
Date: Wed, 30 Sep 2020 14:40:21 +0100
From: Antonio Branco <antonio.branco(a)di.fc.ul.pt>
Reply-To: LREC 2020 Program Committee plus ELRA Board
<lrec2020-pc-plus-elra-board(a)list.lrec-conf.org>
Organization: University of Lisbon
To: elra-board(a)list.elra.info,
lrec2020-pc-plus-elra-board(a)list.lrec-conf.org
Dear all,
I hope you're all well.
Breaking our summer holidays' radio silence, I'm sharing
the paper below, which brings flattering news for us. My apologies
if you have already come across it.
It is a recent paper, published last July at ACL,
in a special theme track introduced this year,
aimed at positive discrimination, namely at attracting
"out of the(ir) box" papers that would otherwise very likely
have been rejected from ACL 2020.
The key goals of this paper are "making the [ACL] community aware
of the gap that needs to be filled before we can truly claim
state-of-the-art technologies to be language agnostic"/universal,
and "attempt to convince the ACL community to prioritize
the resolution of the predicaments highlighted here,
so that no language is left behind."
One of its major, and duly emphasized, conclusions confirms (objectively)
what we were (subjectively) sure about: "LREC has been more inclusive
across different classes of languages" when compared
to all the other top-tier NLP/CL venues (conferences and journal).
All the best,
António
++++++++++++++++++++++++++++++
The State and Fate of Linguistic Diversity and Inclusion in the NLP World
Joshi et al., 2020, ACL
https://www.aclweb.org/anthology/2020.acl-main.560.pdf
Language technologies contribute to promoting multilingualism
and linguistic diversity around the world. However, only a very small number
of the over 7000 languages of the world are represented in the rapidly
evolving language technologies and applications. In this paper we look at
the relation between the types of languages, resources, and
their representation in NLP conferences to understand the trajectory
that different languages have followed over time. Our quantitative
investigation underlines the disparity between languages, especially in
terms of their resources, and calls into question the “language agnostic” status
of current models and systems.
--
Joseph MARIANI
Emeritus Research Director
LIMSI-CNRS
Rue John von Neumann
Campus Universitaire d'Orsay
Batiment 508
91405 ORSAY Cedex (France)
Tel: +33 1 69 15 78 56
Email: Joseph.Mariani(a)limsi.fr
Web: https://perso.limsi.fr/mariani/index
Web IMMI: http://immi.cnrs.fr/
> Begin forwarded message:
>
> From: "De Wet, F" <fdw(a)sun.ac.za>
> Subject: Lacuna Fund RFP
> Date: 24 September 2020 at 13:18:42 UTC+2
> To: Laurent Besacier <laurent.besacier(a)imag.fr>
>
> Dear Laurent,
>
> This RFP may be of interest to some members of SIGUL. Could you please share it with the community?
>
> Thank you,
> Febe de Wet
>
> Lacuna Fund, a collaborative effort to mobilize datasets for machine learning that solve urgent problems in low- and middle-income contexts globally, has issued a Request for Proposals (RFP) for language datasets in Sub-Saharan Africa.
> The full RFP and more details on eligibility, selection criteria, as well as information about the Fund and upcoming calls, can be found at lacunafund.org <http://www.lacunafund.org/>. Questions about the RFP are welcome through 7 October 2020. The RFP closes on 6 November 2020.
> We are seeking applications to create, expand, or maintain datasets from organizations and partnerships with technical expertise in language data collection and labeling. Proposals should also demonstrate a strong understanding of the machine learning landscape and the needs of end users.
> Applicants must be headquartered in Africa or have a substantial partnership with an entity headquartered in Africa. Lacuna Fund encourages collaboration between organizations to assemble a competitive proposal.
> Proposals will be selected by a Technical Advisory Panel based on the Fund’s principles: transformative potential, quality, accessibility, equity, ethics, and a participatory approach.
> Lacuna Fund values a collaborative and locally driven approach to data creation, expansion, and maintenance. We recognize that the continued usefulness and maintenance of open data will thrive in a community that is collectively invested in that data.
> While the Request for Proposals outlines some data needs identified by our Technical Advisory Panel, proposals are not restricted to these areas, and we welcome other ideas within the domain area that have a clearly articulated benefit.
> Read more about Lacuna Fund’s open RFP in language data and apply on our website here <http://www.lacunafund.org/>. You can also sign up to receive future notifications of available funding.
> Lacuna Fund is a funder collaborative between The Rockefeller Foundation, Google.org, and Canada’s International Development Research Centre, with an upcoming call for proposals on underserved languages also supported by the German development agency GIZ on behalf of the Federal Ministry for Economic Cooperation and Development (BMZ). The Fund is governed by a multi-stakeholder steering committee composed of technical experts, thought leaders, local beneficiaries, and end users. Collectively, the Fund’s stakeholders are committed to creating and mobilizing labeled datasets that both solve urgent local problems and lead to a step change in machine learning’s potential worldwide.
>
**Deadline extended to: 13 October**
Call for Extended Abstracts
Workshop on RESOURCEs and representations For Under-resourced Languages and domains (RESOURCEFUL-2020)
co-located with the Eighth Swedish Language Technology Conference (SLTC)
University of Gothenburg, Sweden
25th November 2020
https://gu-clasp.github.io/resourceful-2020/
The workshop will be held online.
Important dates:
- Submission of extended abstracts: 29th September 2020, extended to 13th October 2020
- Notification of acceptance: 23rd October 2020
- Final version: 10th November 2020
- Workshop date: 25th November 2020
All deadlines are at 11:59 PM UTC-12:00 ("anywhere on Earth").
Workshop description
All areas of natural language processing have achieved visible breakthroughs through the use of data-driven models. Contemporary machine learning is strongly shaped by techniques that rely on large datasets and substantial computational resources to solve practical problems in a tangible way (e.g. transformer-based models such as BERT, ViLBERT, ALBERT, and GPT-2, which are pre-trained on large corpora of unlabelled data).
However, many of the world’s languages lack both linguistic descriptions and sufficiently large computer-readable corpora of linguistic material. Even languages considered well-resourced have domains where resources are scarce, for example corpora of dialogue and situated interaction. These domains share another property with under-resourced languages: because they focus on spoken or spoken-like interaction (in written or audio form), their input data are highly variable. Applying state-of-the-art methods based on deep neural networks to develop data-driven systems in such resource-constrained environments is non-trivial.
For this workshop, we encourage contributions on resource creation and representation learning in limited or low-resource environments that tackle the problems mentioned above. In particular, we would like to open a forum that brings together students, researchers, and experts to discuss the following questions:
- How can new resources be constructed or extended for languages and domains that lack standardised representations of linguistic units?
- What experience from building resources for languages that have good coverage today (for example, the Scandinavian languages) can be ported to building resources for under-resourced languages and domains?
- How can we deal with the variability of data and with its standardisation in machine learning approaches?
- What algorithms and methods can we employ to transfer learning from related domains/languages that have good coverage?
- What is the role of multi-task learning in this domain?
- What representations can be learned and how effective are they in different low-resource scenarios?
- How can newly created resources and learned representations be evaluated?
- What ethical considerations are involved?
Intended participants are researchers, PhD students, and practitioners from diverse backgrounds (linguistics, computational linguistics, speech, machine learning, etc.). We foresee an interactive workshop with plenty of time for discussion, complemented by invited talks and short presentations of ongoing or completed research.
Submission
We invite non-anonymous extended abstracts of 2 pages, plus any number of pages for references, using the ACL/EMNLP template [1]. Papers related to our theme that have already been presented at other venues or published elsewhere will also be considered for presentation. The abstracts will be reviewed by the workshop organisers, and accepted ones will be posted on the website unless the authors prefer otherwise. There will be no workshop proceedings, but post-proceedings may be organised depending on the authors' interest.
[1] https://2020.emnlp.org/files/emnlp2020-templates.zip
Extended abstracts should be submitted in PDF format at https://easychair.org/conferences/?conf=resourceful2020
Workshop organisers
Tewodros Gebreselassie, University of Gothenburg
Simon Dobnik, University of Gothenburg
Barbara Plank, ITU, IT University of Copenhagen
Lars Borin, University of Gothenburg
resourceful2020(a)easychair.org
https://gu-clasp.github.io/resourceful-2020/index.html
--
Claudia Soria
Researcher
Istituto di Linguistica Computazionale "A. Zampolli"
Consiglio Nazionale delle Ricerche
Via Moruzzi 1
56124 Pisa
Italy
Tel. +39 050 3153166
Skype clausor