Dear All,
First of all, we hope you are all safe in these difficult times.
As you may already know, the joint ISCA-ELRA Special Interest Group on
Under-resourced Languages (SIGUL) was created in 2017.
The first Board was elected in January 2018 (current Board members are:
Claudia Soria, Laurent Besacier and Sakriani Sakti) and, according to
ELRA and ISCA by-laws, serves for two years.
Since the term ended in January 2020, it is now time to organize
the elections to renew the Board members.
The elected members of the SIG shall consist of a chairperson, a
co-chair and a secretary. All elected members should commit themselves
to becoming individual members of either ISCA or ELRA. The
term of all elected board members of the SIG shall be 2 years. They may
be re-elected, but appointments are limited to four
consecutive terms.
The schedule of the election can be found here:
http://www.elra.info/en/sig/sigul/
Currently, the *Call for candidates* is open (from now until November 30th,
2020): if you want to be a candidate, please send a message to
choukri(a)elda.org before Nov 30th with a short bio and a
statement/manifesto that explains why you want to be an elected member
of the SIGUL board (please write or keep ‘SIGUL election’ in the mail subject).
We hope you will take an active part in this election and we are looking
forward to seeing you at upcoming SIGUL events, whether virtual or
onsite (ALPS virtual winter school in 2021, next joint CCURL-SLTU
workshop that will be organized in 2022).
Our mailing list sigul(a)list.elra.info now has 334 members and we
have organized several events and activities over the past 3 years.
You can find the latest ISCA SIG reports for 2017-2018 and 2019-2020
on the SIGUL page (http://www.elra.info/en/sig/sigul/) on the ELRA web site.
Best regards,
Khalid Choukri, in charge of supervising these elections for the SIGUL
committee
Dear Colleagues,
'Language In The Human-Machine Era' (LITHME) is a research network
funded by the EU COST programme (https://cost.eu). LITHME is centered
around the relationship between language and machines, with the aim of
forecasting the impact that future technology may have on language, and
preparing research for that future. A COST Action is basically a networking
mechanism that fosters exchange among researchers coming from different
disciplines and countries.
During the COVID travel restrictions we sadly can't arrange large
in-person meetings. But COST also funds individual 'Short-term
Scientific Missions', where one researcher travels across borders to
learn and share knowledge. So for the time being we're focusing on these
short-term individual visits where cross-border travel is allowed, while
ensuring COVID safety protocols during the visit and associated travel.
Intended for novel dialogue as outlined above, these visits should
increase understanding and preparation for language in the human-machine
era.
For eligibility criteria and further information, please see our
website: https://lithme.eu/STSM
Please also forward this message on to anyone who might be interested,
and retweet us here:
https://twitter.com/LgHumanMachine/status/1318260873216012289
All the best,
Claudia
--
Claudia Soria
Researcher
Istituto di Linguistica Computazionale "A. Zampolli"
Consiglio Nazionale delle Ricerche
Via Moruzzi 1
56124 Pisa
Italy
Management Committee member
COST Action CA19102 'Language In The Human-Machine Era' (LITHME)
www.lithme.eu
Tel. +39 050 3153166
Skype clausor
fyi
Joseph
-------- Forwarded Message --------
Subject: [2020PC+Board] The State and Fate of Linguistic Diversity and
Inclusion in the NLP World
Date: Wed, 30 Sep 2020 14:40:21 +0100
From: Antonio Branco <antonio.branco(a)di.fc.ul.pt>
Reply-To: LREC 2020 Program Committee plus ELRA Board
<lrec2020-pc-plus-elra-board(a)list.lrec-conf.org>
Organization: University of Lisbon
To: elra-board(a)list.elra.info,
lrec2020-pc-plus-elra-board(a)list.lrec-conf.org
Dear all,
Hope you're all fine.
Breaking our Summer holidays' radio silence, I'm sharing
the paper below, with flattering news for us. My apologies
if you have already stumbled upon it.
It is a recent paper, published last July in ACL,
in a special theme track they promoted this year,
aimed at positive discrimination, namely at attracting
"out of the(ir) box" papers, which would have been
very likely rejected from ACL2020 otherwise.
The key goals of this paper are "making the [ACL] community aware
of the gap that needs to be filled before we can truly claim
state-of-the-art technologies to be language agnostic"/universal,
and "attempt to convince the ACL community to prioritize
the resolution of the predicaments highlighted here,
so that no language is left behind."
One of its major, and duly emphasized, conclusions confirms (objectively)
what we were (subjectively) sure about: "LREC has been more inclusive
across different classes of languages" when compared
to all the other top-tier NLP/CL venues (conferences and journal).
All the best,
António
++++++++++++++++++++++++++++++
The State and Fate of Linguistic Diversity and Inclusion in the NLP World
Joshi et al., 2020, ACL
https://www.aclweb.org/anthology/2020.acl-main.560.pdf
Language technologies contribute to promoting multilingualism
and linguistic diversity around the world. However, only a very small number
of the over 7000 languages of the world are represented in the rapidly
evolving language technologies and applications. In this paper we look at
the relation between the types of languages, resources, and
their representation in NLP conferences to understand the trajectory
that different languages have followed over time. Our quantitative
investigation underlines the disparity between languages, especially in
terms of their resources, and calls into question the “language agnostic”
status of current models and systems.
--
Joseph MARIANI
Directeur de Recherche Émérite
LIMSI-CNRS
Rue John von Neumann
Campus Universitaire d'Orsay
Batiment 508
91405 ORSAY Cedex (France)
Tel: +33 1 69 15 78 56
Email: Joseph.Mariani(a)limsi.fr
Web: https://perso.limsi.fr/mariani/index
Web IMMI: http://immi.cnrs.fr/
> Begin forwarded message:
>
> From: "De Wet, F" <fdw(a)sun.ac.za>
> Subject: Lacuna Fund RFP
> Date: 24 September 2020 at 13:18:42 UTC+2
> To: Laurent Besacier <laurent.besacier(a)imag.fr>
>
> Dear Laurent,
>
> This RFP may be of interest to some members of SIGUL. Could you please share it with the community?
>
> Thank you,
> Febe de Wet
>
> Lacuna Fund, a collaborative effort to mobilize datasets for machine learning that solve urgent problems in low- and middle-income contexts globally, has issued a Request for Proposals (RFP) for language datasets in Sub-Saharan Africa.
> The full RFP and more details on eligibility, selection criteria, as well as information about the Fund and upcoming calls, can be found at lacunafund.org <http://www.lacunafund.org/>. Questions about the RFP are welcome through 7 October 2020. The RFP closes on 6 November 2020.
> We are seeking applications to create, expand, or maintain datasets from organizations and partnerships with technical expertise in language data collection and labeling. Proposals should also demonstrate a strong understanding of the machine learning landscape and the needs of end users.
> Applicants must be headquartered in Africa or have a substantial partnership with an entity headquartered in Africa. Lacuna Fund encourages collaboration between organizations to assemble a competitive proposal.
> Proposals will be selected by a Technical Advisory Panel based on the Fund’s principles: transformative potential, quality, accessibility, equity, ethics, and a participatory approach.
> Lacuna Fund values a collaborative and locally driven approach to data creation, expansion, and maintenance. We recognize that the continued usefulness and maintenance of open data will thrive in a community that is collectively invested in that data.
> While the Request for Proposals outlines some data needs identified by our Technical Advisory Panel, proposals are not restricted to these areas, and we welcome other ideas within the domain area that have a clearly articulated benefit.
> Read more about Lacuna Fund’s open RFP in language data and apply on our website here <http://www.lacunafund.org/>. You can also sign up to receive future notifications of available funding.
> Lacuna Fund is a funder collaborative between The Rockefeller Foundation, Google.org, and Canada’s International Development Research Centre, with an upcoming call for proposals on underserved languages also supported by the German development agency GIZ on behalf of the Federal Ministry for Economic Cooperation and Development (BMZ). The Fund is governed by a multi-stakeholder steering committee composed of technical experts, thought leaders, local beneficiaries, and end users. Collectively, the Fund’s stakeholders are committed to creating and mobilizing labeled datasets that both solve urgent local problems and lead to a step change in machine learning’s potential worldwide.
>
**Deadline extended to: 13 October**
Call for Extended Abstracts
Workshop on RESOURCEs and representations For Under-resourced Languages and domains (RESOURCEFUL-2020)
co-located with the Eighth Swedish Language Technology Conference (SLTC)
University of Gothenburg, Sweden
25th November 2020
https://gu-clasp.github.io/resourceful-2020/
The workshop will be held online.
Important dates:
- Submission of extended abstracts: 29th September 2020, extended to 13 October
- Notification of acceptance: 23rd October 2020
- Final version: 10th November 2020
- Workshop date: 25th November 2020
All times are 11:59PM UTC-12:00 ("anywhere on Earth").
Workshop description
All areas of natural language processing have achieved visible breakthroughs from the use of data-driven models. Contemporary machine learning is significantly influenced by techniques that rely on large datasets that demand substantial computational resources to solve practical problems in a tangible way (e.g. models based on transformers such as BERT, VilBERT, ALBERT, and GPT-2 that are pre-trained on large corpora of unlabelled data).
However, many of the world’s languages lack the availability of linguistic description as well as of sufficiently large computer-readable corpora of linguistic material. Even those languages that are considered well-resourced have some domains where resources are scarce, for example corpora of dialogue and situated interaction. Another similarity of these domains with under-resourced languages is that since they focus on spoken or spoken-like interaction (either in a written or an audio form) they show a high variability of input data. Applying state-of-the-art deep-neural-network-based methods for the development of data-driven systems in such resource-constrained environments is a non-trivial task.
For this workshop, we encourage contributions in the area of resource creation and representation learning in limited or low-resource environments that are tackling the above mentioned problems. In particular we would like to open a forum by bringing together students, researchers, and experts to address and discuss the following questions:
- How can new resources be constructed or extended for languages and domains that lack standardised representations of linguistic units?
- What experience from building resources for languages that have good coverage today (for example Scandinavian languages) can be ported to building resources for under-resourced languages and domains?
- How to deal with the variability of data and its standardisation in machine learning approaches?
- What algorithms and methods can we employ to transfer learning from related domains/languages that have good coverage?
- What is the role of multi-task learning in this domain?
- What representations can be learned and how effective are they in different low-resource scenarios?
- How can newly created resources and learned representations be evaluated?
- What ethical considerations are involved?
Intended participants are researchers, PhD students and practitioners from diverse backgrounds (linguistics, computational linguistics, speech, machine learning etc). We foresee an interactive workshop with plenty of time for discussion, complemented with invited talks and short presentations of on-going or completed research.
Submission
We invite submissions of 2-page non-anonymous extended abstracts, with any number of additional pages for references, using the ACL/EMNLP template [1]. Papers related to our theme that have already been presented at other venues or published elsewhere will also be considered for presentation. The abstracts will be reviewed by the workshop organisers and the accepted ones will be posted on the website, unless the authors prefer otherwise. There will be no workshop proceedings, but post-proceedings may be organised depending on the interest of authors.
[1] https://2020.emnlp.org/files/emnlp2020-templates.zip
Extended abstracts should be submitted in PDF format at https://easychair.org/conferences/?conf=resourceful2020
Workshop organisers
Tewodros Gebreselassie, University of Gothenburg
Simon Dobnik, University of Gothenburg
Barbara Plank, ITU, IT University of Copenhagen
Lars Borin, University of Gothenburg
resourceful2020(a)easychair.org
https://gu-clasp.github.io/resourceful-2020/index.html
--
Claudia Soria
Researcher
Istituto di Linguistica Computazionale "A. Zampolli"
Consiglio Nazionale delle Ricerche
Via Moruzzi 1
56124 Pisa
Italy
Tel. +39 050 3153166
Skype clausor
---
SECOND CALL FOR PARTICIPATION
Advanced Language Processing School (ALPS)
January 17-22, 2021
Autrans (Grenoble area) - France
We are opening the registration for the first Advanced Language Processing School (ALPS) in Grenoble, co-organized by University Grenoble Alpes and Naver Labs Europe.
*Target Audience*
This is a winter school covering advanced topics in NLP, and we are primarily targeting doctoral students and advanced (research) masters. A few slots will also be reserved for academics and persons working in research-heavy positions in industry.
*Characteristics*
This winter school aims to offer talks by renowned NLP researchers and to create an ideal environment for fostering collaborations.
The speakers are:
- Isabelle Augenstein: _Interpretability and Explainability for NLP_
- Tim Baldwin: _Natural Language Processing for User Generated Content_
- Kyunghyun Cho: _Neural Sequence Modeling: Learning and Inference_
- Yejin Choi: _Neural Commonsense Knowledge and Reasoning_
- Grzegorz Chrupała: _Visually Grounded Models of Spoken Language and their Analysis_
- Claire Gardent: _Neural approaches to Natural Language Generation_
- Sanjeev Khudanpur: _Recent Advances in Automatic Speech Recognition (TBC)_
In addition to the talks, an important aspect of this school is the interaction between participants. The registration fee covers full board in a residence close to a ski resort, and social activities will be organised on some of the afternoons.
In view of the current public health situation, we are preparing to hold the event virtually in case it is not possible to hold it in person in January. Registration fees will be adapted in that case.
*Application*
To apply to this winter school, please follow the instructions at http://alps.imag.fr/index.php/application/. The deadline for applying is July 31st, and we will notify applicants of acceptance in September.
*Contact*
Website: http://alps.imag.fr/
Contact: alps2021(a)univ-grenoble-alpes.fr
> Begin forwarded message:
>
> From: Yannick Parmentier <yannick.parmentier(a)loria.fr>
> Subject: [lift_members] [GdR LIFT] - upcoming LIFT days - call for contributions
> Date: 7 July 2020 at 16:25:54 UTC+2
> To: ln(a)groupes.renater.fr, lue-olki(a)univ-lorraine.fr, crem-sic-chercheurs(a)univ-lorraine.fr, tous_loria(a)loria.fr, lift_resp(a)inria.fr, lift_members(a)inria.fr
> Reply-To: Yannick Parmentier <yannick.parmentier(a)loria.fr>
>
> ******************************
> Call for papers
> LIFT Days, 10-11 December
> https://gdr-lift.loria.fr/
> ******************************
>
> The GdR LIFT research network (Linguistique Informatique, Formelle et de Terrain: computational, formal and field linguistics) is organizing two days of meetings on the themes of LIFT. In addition, two specific themes are proposed for these days: (i) Treebanks and syntax and (ii) Grammar and field linguistics. The aim of these days is to foster interaction between computational linguistics, formal linguistics and field linguistics, in order to encourage linguistic research that makes the most of new technologies.
>
> In addition to invited speakers, the days include time for exchange through presentations of ongoing research. Those wishing to present their work are invited to submit an extended abstract (limited to 2 pages). Abstracts accepted for presentation will be published in the proceedings of the days (and uploaded to the HAL open archive). The possibility of extending some abstracts into a long format for publication in a scientific journal is under consideration.
>
> Dates
>
> Abstract submission: 15 October 2020
> Notification to authors: 7 November 2020
> Submission of final versions: 1 December 2020
> Scientific days: 10-11 December 2020
>
> Topics
>
> Contributions may address any of the LIFT themes as well as the themes of the two special sessions (Treebanks and syntax; Grammar and field linguistics), including, but not limited to:
>
> - Reports on experience with using and/or developing computational tools for linguistic analysis
> - Computational linguistics and open science: prospects opened up by the sharing of data, tools and publications
> - Computational modelling and formal linguistics (formal language theory, unification grammars, proof theory…)
> - Bringing linguistic models and machine learning models into dialogue (of all types: generative and discriminative approaches, neural statistical approaches, encoder-decoder approaches…)
> - Unsupervised or weakly supervised methods for the analysis of under-resourced, rarely written or undocumented languages
> - Reflections on the automation of analysis and validation processes
>
> All proposals that fall within the themes of the days are welcome. Both completed work and work in progress may be presented. The LIFT days aim to present results as well as to stimulate discussion, in particular around ongoing work for which the authors would like to develop collaborations (e.g., an NLP expert wishing to apply a tool to varied language data; a field linguist wishing to automate an annotation task, etc.).
>
> Format
>
> In practice, abstracts will be 2 pages at most. Submissions must comply with the official guidelines contained in the style files (https://mycore.core-cloud.net/index.php/s/YI4oEOQcw0vhQWj) and be in PDF format. Selected papers will be presented as posters.
>
> Venue
>
> Depending on the public health situation and the rules in force, the days will be held in person, virtually, or in a hybrid format with local gatherings bringing together a permitted number of people. The venue chosen will be as central as possible to make it easier to take part in the conference. Grants will be available to allow as many people as possible to participate.
>
> Organizing committee
>
> Yannick Parmentier, LORIA/Université de Lorraine
> Thierry Poibeau, LATTICE/CNRS
> Emmanuel Schang, LLL/Université d’Orléans
>
> Program committee
>
> To be announced
>
> Invited speakers
>
> To be announced
Apologies for cross-posting
----
We are inviting researchers to participate in a shared task at FIRE 2020 on
sentiment analysis for Dravidian languages in code-mixed text.
Website: https://dravidian-codemix.github.io/2020/
The goal of this task is to identify sentiment polarity of the code-mixed
dataset of comments/posts in Dravidian Languages (Malayalam-English and
Tamil-English) collected from social media. The comment/post may contain
more than one sentence but the average number of sentences per comment/post in the corpora is 1.
Each comment/post is annotated with sentiment polarity at the comment/post
level. This dataset also has class imbalance problems depicting real-world
scenarios. Our proposal aims to encourage research that will reveal how
sentiment is expressed in code-mixed scenarios on social media.
The participants will be provided with development, training and test datasets.
Key Dates:
Release of Trial data: 10 June
Release of Training data: 10 June
Release of Test data: 1 August
Run submission deadline: 20 August
Results declared: 31 August
Paper submission: 20 September
Revised paper: 30 October
FIRE 2020: 10-13 December
Dear SIGUL list members,
We are very happy to announce that the SLTU-CCURL2020 Proceedings are available online: https://lrec2020.lrec-conf.org/media/proceedings/Workshops/Books/SLTUCCURLb…
This year, LREC2020 would have featured an extraordinary event: the first joint SLTU-CCURL2020 Workshop, which was planned as a two-day workshop, with 54 papers accepted as either oral or poster presentations.
The workshop program was enriched by two tutorials and two keynote speeches.
We will miss the presentations, the discussions and the overall stimulating environment very deeply.
We are thankful to ELRA and ISCA for their support of the workshop, to our sponsor Google and to the 60 experts of the Program Committee, who worked tirelessly to help us select the best papers representing a wide perspective on NLP, speech and computational linguistics addressing less-resourced languages.
Looking forward to better times when we will be able to meet in person again, we hope that you will find these workshop proceedings relevant and stimulating for your own research.
With our best wishes,
Claudia Soria, Laurent Besacier, Dorothee Beermann, and Sakriani Sakti
hi
I thought some of you might be interested in this online paper
best
laurent
====
>
> https://arxiv.org/abs/2003.07082
>
> Stanza: A Python Natural Language Processing Toolkit for Many Human Languages
>
> (Submitted on 16 Mar 2020)
> We introduce Stanza, an open-source Python natural language processing toolkit supporting 66 human languages. Compared to existing widely used toolkits, Stanza features a language-agnostic fully neural pipeline for text analysis, including tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition. We have trained Stanza on a total of 112 datasets, including the Universal Dependencies treebanks and other multilingual corpora, and show that the same neural architecture generalizes well and achieves competitive performance on all languages tested. Additionally, Stanza includes a native Python interface to the widely used Java Stanford CoreNLP software, which further extends its functionalities to cover other tasks such as coreference resolution and relation extraction. Source code, documentation, and pretrained models for 66 languages are available at https://stanfordnlp.github.io/stanza.