2-year Postdoc position in Natural Language Processing on Incorporating Demographic Factors into Natural Language Processing Models
Funded by the ERC Starting Grant INTEGRATOR <https://milanlproc.github.io/project/integrator/>
Start: from September 2022
Dirk Hovy, Bocconi University and MilanLP group
Posting: https://bit.ly/3tk5UR6
Application Form: https://bit.ly/3Q5j7qv
Project:
The goal of the INTEGRATOR project is to develop novel data sets, theories, and algorithms to incorporate demographic factors into language technology. This will improve performance of existing tools for all users, reduce demographic bias, and enable completely new applications.
Language reflects demographic factors like our age, gender, etc. People actively use this information to make inferences, but current language technology (NLP) fails to account for demographics, both in language understanding (e.g., sentiment analysis) and generation (e.g., chatbots). This failure prevents us from reaching human-like performance, limits possible future applications, and introduces systematic bias against underrepresented demographic groups.
Solving demographic bias is one of the greatest challenges for current language technology. Failing to do so will limit the field and harm public trust in it. Bias in AI systems has recently emerged as a severe problem for privacy, fairness, and the ethics of AI. It is especially prevalent in language technology, due to the rich demographic information that language carries. Since NLP is ubiquitous (translation, search, personal assistants, etc.), demographically biased models create uneven access to vital technology.
Despite increased interest in demographics in NLP, there has been no concerted effort to integrate them: no theory, data sets, or algorithmic solutions. INTEGRATOR will address these gaps by identifying which demographic factors affect NLP systems, devising a bias taxonomy and metrics, and creating new data sets. These will enable us to use transfer and reinforcement learning methods to build demographically aware input representations and systems that incorporate demographics to improve performance and reduce bias.
Demographically aware NLP will lead to high-performing, fair systems for text analysis and generation.
This ground-breaking research advances our understanding of NLP, algorithmic fairness, and bias in AI, and creates new research resources and avenues.
Successful candidates will work actively on novel directions in NLP and machine learning, including neural representation learning and transfer learning across various languages, and will collaborate closely with Prof. Hovy and the rest of the lab. The candidates will innovate in both NLP and the social sciences.
Successful candidates will have to demonstrate
* excellent programming skills in Python (additional languages such as C++, R, or Julia are a plus),
* knowledge of current neural network models for transfer and few-shot learning,
* experience with implementation tools for neural networks (e.g., PyTorch, TensorFlow),
* a strong track record in top-tier venues in NLP/machine learning, and
* fluency in spoken and written English. Knowledge of Italian is NOT a requirement.
INFORMATION
* Application deadline: July 7, 2022
* Skype interviews will take place during July 2022
* Starting date: from September 2022, or any time thereafter
* Duration: 2 years, 1 year extension possible
* Salary: 42k EUR gross per annum (median salary in Milan is 37k EUR). Applicants from outside Italy may qualify for a researcher taxation scheme with reduced tax load.
HOW TO APPLY
The official application must be sent via https://bit.ly/3Q5j7qv
Informal enquiries can be sent by email to Dirk Hovy (dirk.hovy@unibocconi.it).
You can find more information about the call here: https://bit.ly/3tk5UR6
**Deadline fast-approaching**
Dear colleagues and friends,
The Data Science and Digital Libraries research group
<https://www.tib.eu/en/research-development/research-groups-and-labs/data-sc…>
at the TIB – Leibniz Information Centre for Science and Technology and
University Library <https://www.tib.eu/en/> invites applications for a
Research Associate/PhD Candidate position with the following specializations:
Natural Language Processing; Computational Linguistics; Corpus Annotation.
*Description: *
The PhD topics will be in the context of the Open Research Knowledge Graph (
https://www.orkg.org) and the project “SCINEXT - Neural-Symbolic Scholarly
Innovation Extraction”, funded by the Federal Ministry of Education and
Research (BMBF). The aim of these projects is to research and develop
techniques for crowdsourcing, representing, and managing semantically
structured, rich representations of scholarly contributions and research
data in knowledge graphs, and thus to develop a novel model for scholarly
communication. In the context of the PhD thesis, you will be responsible for
conducting independent and original scientific research involving corpus
development and annotation to organize research contributions in the ORKG
in a structured, semantic way, so that other researchers can get a quick
overview of the state of the art in the field. You will participate in
local, national, and international collaboration activities. Given the
multidisciplinary nature of the programme, we encourage applicants with a
strong curiosity and interest in science to apply.
The tasks will focus on
- Collaborating with researchers from different disciplines to gain
familiarity with research problems and their contribution descriptions
expressed in scholarly literature.
- Conceptually designing, modelling and implementing ontology-based
knowledge representations for crowdsourcing of the Open Research Knowledge
Graph.
- Annotating and curating multidisciplinary scholarly contribution
descriptions.
*Application Deadline:* June 28th, 2022
*Please apply online here:*
https://www.tib.eu/en/tib/careers-and-apprenticeships/vacancies/details/ste…
*Contact Information: *Dr. Jennifer D'Souza
*Email:* jennifer.dsouza@tib.eu
Best regards,
Jennifer
Apologies for the multiple postings.
----
*Indian Language Summarization (ILSUM 2022)*
Website: https://ilsum.github.io/
To be organized in conjunction with FIRE 2022 (fire.irsi.res.in)
9th-13th December 2022 (Hybrid Event, hosted in Kolkata)
Registration Deadline: *22nd July 2022*
-------------------------------------------------------
The first shared task on Indian Language Summarization (ILSUM) aims at
creating an evaluation benchmark dataset for Indian languages. While
large-scale datasets exist for languages such as English, Chinese,
French, German, and Spanish, no such datasets exist for Indian
languages. Through this shared task, we aim to bridge this gap.
In the first edition, we cover two major Indian languages, Hindi and
Gujarati, alongside Indian English, a widely recognized dialect of the
English language. It is a classic summarization task: we will provide
~10,000 article-summary pairs for each language, and participants are
expected to generate a fixed-length summary.
*Timeline*
-------------
8th June - Task announced and Registrations open
22nd June - Training Data Release
1st August - Test Data Release
8th August - Run Submission Deadline
15th August - Results Declared
15th September - Working notes due
9th-13th December - FIRE 2022 (Hybrid Event hosted at Kolkata)
*Organisers*
----------------
Bhavan Modha, University of Texas at Dallas, USA
Shrey Satapara, Indian Institute of Technology, Hyderabad, India
Sandip Modha, LDRP-ITR, Gandhinagar, India
Parth Mehta, Parmonic, USA
*For regular updates, subscribe to our mailing list:* ilsum@googlegroups.com
Regards,
Parth Mehta
Co-organiser, ILSUM 2022
Dear all,
On behalf of “Frontiers”, the third most cited publisher, I’m happy to let you know that a new research topic, automatic stance detection, has been launched and will be supervised by Cornelia Caragea and myself. Contributions to “Frontiers in Artificial Intelligence” and/or “Frontiers in Big Data” concerning this hot topic are invited. We are collaborating primarily with “Frontiers in Artificial Intelligence”, but both types of submissions are possible, as you will see on the research topic’s presentation page.
Stance detection in natural language texts deals with determining the position (or stance) of a text producer towards a target or a set of targets. This Research Topic specifically addresses all aspects of stance detection and its applications, from both a theoretical and a practical viewpoint. We are looking for contributions in the form of Review, Original Research, Brief Research Report, Perspective, and Technology and Code articles presenting substantial, original, and unpublished research in the following areas, including, but not limited to:
• sentiment analysis in stance detection
• perspective identification in stance detection
• sarcasm/irony detection in stance detection
• controversy detection in stance detection
• argument mining in stance detection
• biased language detection in stance detection
• novel algorithms for stance detection: feature-based machine learning approaches, deep learning approaches, and ensemble learning approaches
• novel datasets for stance detection
You can check this research topic’s detailed presentation, submission deadlines and instructions for authors at:
https://www.frontiersin.org/research-topics/40998/automatic-stance-detection
Please note that submitting papers is possible even in the absence of a previously submitted abstract (although we would appreciate knowing what to expect in advance). The submission deadline can be extended by one month (but no longer) upon request.
For any questions regarding the topic itself, submission, etc., please don’t hesitate to contact me (fhristea@fmi.unibuc.ro).
Looking forward to receiving your contributions,
Florentina Hristea
https://cs.unibuc.ro/~fhristea/
-- Apologies for cross posting --
The First Workshop on Corpus Generation and Corpus Augmentation for
Machine Translation (CoCo4MT)
https://sites.google.com/view/coco4mt
@ AMTA 2022
The 15th biennial conference of the Association for Machine
Translation in the Americas
12-16 September 2022, Orlando, Florida, USA
INVITED TALKS
Julia Kreutzer Google Research
More TBA...
SCOPE
It is a well-known fact that machine translation systems, especially
those that use deep learning, require massive amounts of data. For
many languages, such resources are not available in a human-created
format. The types of resources that are available include monolingual
and multilingual corpora, translation memories, and lexicons. Parallel
resources are generally created for formal purposes, such as
parliamentary collections, whereas monolingual resources tend to come
from more informal settings. The quality and abundance of corpora
created for formal purposes are generally higher than of those created
for informal purposes. Additionally, corpora for low-resource
languages, i.e., languages with fewer digital resources available,
tend to be less abundant and of lower quality.
CoCo4MT sets out to be the first workshop centered around research
that focuses on corpus creation, cleansing, and augmentation
techniques specifically for machine translation. We accept work that
covers any spoken language (including high-resource languages), but we
are specifically interested in submissions on languages with limited
existing resources (low-resource languages).
The goal of this workshop is to begin to close the gap in corpora
available for low-resource translation systems and to promote
high-quality data for online systems that can be used by native
speakers of low-resource languages. It will therefore be beneficial if
the techniques presented in research papers include their impact on
the quality of MT output and how they can be used in the real world.
CoCo4MT aims to encourage research on new and undiscovered techniques.
We hope that submissions will provide high-quality corpora that are
publicly available for download and can be used to increase machine
translation performance, thus encouraging the creation of new datasets
for multiple languages and, in turn, establishing a general venue to
consult for corpus needs in the future. The workshop’s success will be
measured by the following key performance indicators:
- Promoting the ongoing increase in the quality of machine translation
systems as measured by standard metrics,
- Providing a meeting place for collaboration across several research
areas to increase the availability of commonly used and new corpora,
- Driving innovation to address the need for higher quality and
greater abundance of low-resource language data.
TOPICS
We are highly interested in original research papers on the topics
below; however, we welcome all novel ideas that cover research on
corpora techniques.
- Difficulties with using existing corpora (e.g., political
considerations or domain limitations) and their effects on final MT
systems,
- Strategies for collecting new MT datasets (e.g., via crowdsourcing),
- Data augmentation techniques,
- Data cleansing and denoising techniques,
- Quality control strategies for MT data,
- Exploration of datasets for pretraining or auxiliary tasks for
training MT systems.
SUBMISSION INFORMATION
The workshop accepts a single submission type, covering research,
review, and position papers. Each paper should be at least four (4)
and at most ten (10) pages long, plus unlimited pages for references.
Submissions should be formatted according to the official AMTA 2022
style templates (PDF, LaTeX, Word). Accepted papers will be published
online in the AMTA 2022 proceedings, which will be included in the ACL
Anthology, and will be presented at the conference either orally or as
a poster.
Submissions must be anonymized and should be done using the official
conference management system
(https://cmt3.research.microsoft.com/AMTA2022). Scientific papers that
have been or will be submitted to other venues must be declared as
such, and must be withdrawn from the other venues if accepted and
published at CoCo4MT. The review will be double-blind.
We would like to encourage authors to cite papers written in ANY
language that are related to the topics, as long as both original
bibliographic items and their corresponding English translations are
provided.
Registration will be handled by the main conference. (To be announced)
IMPORTANT DATES
June 1, 2022 – Call for papers released
June 15, 2022 – Second call for papers
June 29, 2022 – Third and final call for papers
July 13, 2022 – Paper submissions due
July 27, 2022 – Notification of acceptance
August 7, 2022 – Camera-ready due
August 31, 2022 – Video recordings due
September 16, 2022 - CoCo4MT workshop
CONTACT
CoCo4MT Workshop Organizers: coco4mt2022@googlegroups.com
ORGANIZING COMMITTEE (listed alphabetically)
Constantine Lignos Brandeis University
John E. Ortega New York University and University of Santiago de
Compostela (CITIUS)
Katharina Kann University of Colorado Boulder
Maja Popović ADAPT Centre at Dublin City University
Marine Carpuat University of Maryland
Shabnam Tafreshi University of Maryland
William Chen Carnegie Mellon University
PROGRAM COMMITTEE (listed alphabetically, tentative)
Abteen Ebrahimi University of Colorado Boulder
Adelani David Saarland University
Ananya Ganesh University of Colorado Boulder
Alberto Poncelas ADAPT Centre at Dublin City University
Amirhossein Tebbifakhr University of Trento
Anna Currey Amazon
Arturo Oncevay University of Edinburgh
Atul Kr. Ojha National University of Ireland Galway
Bharathi Raja Chakravarthi National University of Ireland Galway
Beatrice Savoldi University of Trento
Bogdan Babych Heidelberg University
Briakou Eleftheria University of Maryland
Dossou Bonaventure Mila Quebec AI Institute
Duygu Ataman New York University
Eleni Metheniti Université Toulouse - Paul Sabatier
Francis Tyers Indiana University
Jasper Kyle Catapang University of Birmingham
John E. Ortega New York University and USC - CITIUS
José Ramom Pichel Campos Universidade de Santiago de Compostela - CITIUS
Kalika Bali Microsoft
Koel Dutta Chowdhury Saarland University
Liangyou Li Huawei
Manuel Mager University of Stuttgart
Maria Art Antonette Clariño University of the Philippines Los Baños
Mathias Müller University of Zurich
Nathaniel Oco De La Salle University
Niu Xing Amazon
Pablo Gamallo Universidade de Santiago de Compostela - CITIUS
Rico Sennrich University of Zurich
Sangjee Dondrub Qinghai Normal University
Santanu Pal Saarland University
Sardana Ivanova University of Helsinki
Shantipriya Parida Silo AI
Surafel Melaku Lakew Amazon
Tommi A Pirinen University of Tromsø
Valentin Malykh Moscow Institute of Physics and Technology
Xu Weijia University of Maryland
--
*Shabnam Tafreshi, PhD*
*Assistant Research Scientist*
*Computational Linguistics, NLP*
*UMD: ARLIS @ College Park*
*"All the problems of the world could be settled easily, if people only
willing to think."*
*-Thomas J. Watson*
Dear Colleagues,
** Sorry for cross-postings **
This year our workshop LaCATODA 2022 will be co-located with ACII 2022.
Please consider submitting a paper. The accepted papers will be published in the ACII workshop proceedings, indexed by IEEE Xplore.
Best regards,
Michal Ptaszynski, on behalf of the LaCATODA 2022 organizers
Michal PTASZYNSKI, Ph.D., Associate Professor
Department of Computer Science
Kitami Institute of Technology,
165 Koen-cho, Kitami, 090-8507, Japan
TEL/FAX: +81-157-26-9327
michal@mail.kitami-it.ac.jp
http://arakilab.media.eng.hokudai.ac.jp/~ptaszynski/
==========================================================
The Eighth Linguistic and Cognitive Approaches to Dialog Agents (LaCATODA 2022)
(ACII 2022 Workshop)
http://arakilab.media.eng.hokudai.ac.jp/ACII2022/
Venue: Nara, Japan & online (in conjunction with ACII, https://acii-conf.net/2022/)
==========================================================
WHAT IS LaCATODA?
LaCATODA is a multidisciplinary workshop for researchers who develop dialog agents and methods for achieving more natural machine-generated conversation, or who study problems of human communication that are difficult to mimic algorithmically. We are interested in original papers on systems, and ideas for systems, that use common sense knowledge and reasoning, affective computing, cognitive methods, learning from broad sets of data and acquiring knowledge, or language and user preferences.
------------------------------------------------------------------
Important Dates:
Paper submission: 20 July 2022 (11:59PM UTC-12:00, "anywhere on Earth")
Notification of acceptance: 4 August 2022
Camera-Ready submission: 14 June 2022
LaCATODA 2022 Workshop: 17 October 2022
Submission: https://easychair.org/conferences/?conf=lacatoda2022
------------------------------------------------------------------
Relevant Topics:
- Affective computing
- Agent-based information retrieval
- Attention and focus in dialog processing
- Artificial assistants
- Artificial tutors
- Common sense, knowledge and reasoning
- Computational cognition
- Conversational theories
- Daily life dialog systems
- Emotional intelligence simulations
- Ethical reasoning
- Humor processing
- Language acquisition
- Machine learning for / from dialogs
- Text mining for / from dialogs
- Philosophy of interaction / communication
- Preference models
- Unlimited question answering
- User modeling
- Wisdom of Crowds approaches
- World knowledge acquisition
- Systems and approaches combining the above topics
Organizers:
Rafal Rzepka, Hokkaido University, Japan
Jordi Vallverdú, Autonomous University of Barcelona, Spain
Andre Wlodarczyk, Charles de Gaulle University, France
Michal Ptaszynski, Kitami Institute of Technology, Japan
Pawel Dybala, Jagiellonian University, Poland
The Center for Language and Speech Processing (CLSP) at Johns Hopkins
University seeks applicants for postdoctoral fellowship positions in
speech and language processing, including the areas of natural
language processing, machine learning and health informatics.
Applicants must have a Ph.D. in a relevant discipline and a strong
research record.
Possible research topics include:
- Explainable AI, natural language processing and medicine
- Text generation and training large language models
- Information extraction, retrieval, question answering, and
human-in-the-loop learning
Johns Hopkins University is a private university located in Baltimore,
Maryland with easy access to a number of affordable and vibrant
neighborhoods.
CLSP is one of the world’s largest academic centers focused on speech
and language with a dozen faculty members and over 80 graduate
students. It has a history of placing students in top academic and
industry positions.
Applicants are not required to be US citizens or permanent residents.
Details and application information:
https://www.clsp.jhu.edu/employment-opportunities/
Dear all,
It is our great pleasure to invite you to the second edition of the
Workshop on Multilingual Representation Learning (MRL), which will be held
at EMNLP 2022 (as a hybrid workshop) on December 8, 2022. The details
regarding paper submission and the shared task are given below:
-----------------------------
IMPORTANT DATES
-----------------------------
*- Latest ARR deadline for paper submissions: September 1, 2022*
- The last ARR commitment deadline: October 15, 2022
- Paper notifications: November 2, 2022
- Camera-Ready Papers due: November 9, 2022
- Main Conference: December 7-11, 2022
*- Workshop: December 8, 2022*
(All deadlines are 11.59 pm UTC -12h (“anywhere on Earth”))
---------------------------
AIMS AND SCOPE
---------------------------
The 2nd MRL Workshop invites participants to share and discuss recent
findings on multilingual representation learning methods and their
application in different settings. The main objectives of the workshop
are:
To construct and present a wide array of multilingual representation
learning methods, including their theoretical formulation and analysis, as
well as practical aspects such as the application of current
state-of-the-art approaches in transfer learning to different tasks or
studies on adaptation to previously under-studied contexts;
To provide a better understanding of how language typology may impact the
applicability of these methods and to motivate the development of novel
methods that are more generic or competitive across different languages;
To promote collaboration on developing novel software libraries or
benchmarks for implementing or evaluating multilingual models that would
accelerate progress in the field.
By providing a means of communication for research groups working on
machine learning, linguistic typology, or real-life applications of NLP
tasks in various languages to share and discuss their recent findings, our
ultimate goal is to support the rapid development of NLP methods and tools
that are applicable to a wider range of languages.
----------------------
SUBMISSIONS
----------------------
Research papers:
We invite all potential participants to submit their novel research
contributions in the related fields as long papers following the EMNLP 2021
long paper format (anonymized, up to 8 pages excluding references, with an
additional page allowed for the camera-ready versions of accepted papers).
All accepted research papers will be published as part of our workshop
proceedings and presented either as oral or poster presentations.
Our research paper track only accepts submissions made through ACL Rolling
Review.
Extended abstracts:
Besides long paper submissions, we also invite previously published or
ongoing and incomplete research contributions to our non-archival extended
abstract track. All extended abstracts can use the same EMNLP template with
a 2-page limit, excluding the bibliography. Extended abstracts can be
submitted to the workshop submission system using the following softconf
link: https://softconf.com/emnlp2022/mrl-2022
----------------------
Shared Task
----------------------
MRL 2022 features a new shared task on Multilingual Clause-level
Morphology, which aims to provide a new evaluation benchmark for assessing
multilingual representation learning models in terms of linguistic and
cross-lingual generalization capabilities. We invite all interested peers
to get in touch about participation via the website
https://sigtyp.github.io/st2022-mrl.html and the mailing list
participants-mcmsharedtask-2022@googlegroups.com. We anticipate awarding a
prize to the winning team of the shared task, along with a dedicated
presentation.
-------------------------
Best Paper Award
-------------------------
A Best Paper Award will be presented at the workshop.
-------------------------
Invited Speakers
-------------------------
Ev Fedorenko, Massachusetts Institute of Technology
Kyunghyun Cho, New York University
Razvan Pascanu, DeepMind
------------------------
*Contact *
Email: mrlw2022@gmail.com
Website: https://sigtyp.github.io/ws2022-mrl.html
*Organizers: *
Duygu Ataman
Orhan Firat
Hila Gonen
Jamshidbek Mirzakhalov
Kelechi Ogueji
Sebastian Ruder
Gözde Gül Şahin
On behalf of the organizing committee,
---------------------------
Gözde Gül Şahin
Assistant Professor
Computer Engineering Department
Koç University
https://gozdesahin.github.io/
Dear colleague,
The last post concerning the upcoming HiTZ webinar included incorrect
information about the summary. This email contains the correct information.
Apologies for the mix-up!
We are happy to announce an additional webinar in the Language
Technology webinar series organized by the HiTZ research center (Basque
Center for Language Technology, http://hitz.eus). Instead of the usual
afternoon hour, it will take place at 10:00am CET (June 24).
Next webinar:
Speaker: Mikel Artetxe (FAIR (Meta AI))
Title: Is scale all you need?
Date: Jun 24, 2022, 10:00 CET
Summary: Every once in a while, a new language model with a gazillion
parameters makes a big splash on Twitter, smashing the previous SOTA on
some benchmarks or showing some impressive emerging capabilities. While
some may argue that scaling will eventually solve NLP, others are
skeptical about the scientific value of this trend. In this talk, I will
argue that scaling is not just engineering, but also comes with exciting
research questions. I will present some of our recent work on the topic,
and discuss our efforts to make large language models more accessible
to the community.
Bio: Mikel Artetxe is a Research Scientist at FAIR (Meta AI). His
primary area of research is multilingual NLP. Mikel was one of the
pioneers of unsupervised machine translation, and has done extensive
work on cross-lingual representation learning. More recently, he has
also been working on natural language generation, few-shot learning, and
large-scale language models. Prior to joining FAIR, Mikel did his PhD at
the IXA group at the University of the Basque Country, and interned at
DeepMind, FAIR, and Google.
Check past and upcoming webinars at the following URL:
http://www.hitz.eus/webinars
If you are interested in participating, please complete this
registration form: http://www.hitz.eus/webinar_izenematea
If you cannot attend this seminar, but you want to be informed of the
following HiTZ webinars, please complete this registration form instead:
http://www.hitz.eus/webinar_info
Best wishes,
HiTZ Zentroa
Unsubscribe: If you do not wish to receive further emails from us,
please feel free to contact us.