PhD in ML/NLP – Efficient, fair, robust and knowledge-informed
self-supervised learning for speech processing
Starting date: November 1st, 2022 (flexible)
Application deadline: September 5th, 2022
Interviews (tentative): September 19th, 2022
Salary: ~2000€ gross/month (social security included)
Mission: research oriented (teaching possible but not mandatory)
*Keywords:* speech processing, natural language processing,
self-supervised learning, knowledge-informed learning, robustness, fairness
*CONTEXT*
The ANR project E-SSL (Efficient Self-Supervised Learning for Inclusive
and Innovative Speech Technologies) will start on November 1st 2022.
Self-supervised learning (SSL) has recently emerged as one of the most
promising artificial intelligence (AI) methods, as it is now feasible to
take advantage of the colossal amounts of existing unlabeled data to
significantly improve performance on various speech processing tasks.
*PROJECT OBJECTIVES*
Recent SSL models for speech such as HuBERT or wav2vec 2.0 have shown an
impressive impact on downstream task performance. This is mainly due to
their ability to benefit from a large amount of data, at the cost of a
tremendous carbon footprint, rather than to improvements in the efficiency
of the learning. Another issue with SSL models is their unpredictable
results once applied to realistic scenarios, which reveals their lack of
robustness. Furthermore, as for any pre-trained model applied in
society, it is important to be able to measure the bias of such models
since they can amplify social unfairness.
The goals of this PhD position are threefold:
- to design new evaluation metrics for SSL speech models;
- to develop knowledge-driven SSL algorithms;
- to propose methods for learning robust and unbiased representations.
SSL models are evaluated with downstream task-dependent metrics, e.g.
word error rate for speech recognition. This couples the evaluation of
the universality of SSL representations to a potentially biased and
costly fine-tuning that also hides the efficiency information related to
the pre-training cost. In practice, we will seek to measure training
efficiency as the ratio between the resources (data, computation and
memory) needed and the performance gain they yield on a metric of
interest, whether downstream-dependent or not. The first step will
be to document standard markers that can serve as robust measurements
of these quantities at training time. Potential candidates
are, for instance, floating point operations for computational
intensity, number of neural parameters coupled with precision for
storage, online measurement of memory consumption for training and
cumulative input sequence length for data.
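As a rough illustration of the markers listed above, the following Python sketch records one value per marker and reports the cost paid per unit of metric gain. It is only a sketch under stated assumptions: the names `TrainingCost` and `cost_per_unit_gain`, the precision table and all numeric values are hypothetical placeholders, and the project text does not prescribe this particular formula.

```python
from dataclasses import dataclass

# Bytes per parameter for a few common numeric precisions (assumed set).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


@dataclass
class TrainingCost:
    """Hypothetical container for the cost markers named in the text."""
    flops: float            # floating point operations spent in training
    n_params: int           # number of neural parameters
    precision: str          # numeric precision used to store parameters
    peak_memory_bytes: int  # memory consumption measured online in training
    input_seconds: float    # cumulative input sequence length (audio seconds)

    def storage_bytes(self) -> int:
        """Storage cost: parameter count coupled with precision."""
        return self.n_params * BYTES_PER_PARAM[self.precision]


def cost_per_unit_gain(metric_gain: float, cost: TrainingCost) -> dict:
    """Return each cost marker divided by the observed metric gain.

    One ratio is kept per marker, which avoids choosing arbitrary weights
    to merge compute, storage, memory and data into a single number.
    """
    if metric_gain <= 0:
        raise ValueError("metric_gain must be positive")
    return {
        "flops": cost.flops / metric_gain,
        "storage_bytes": cost.storage_bytes() / metric_gain,
        "memory_bytes": cost.peak_memory_bytes / metric_gain,
        "input_seconds": cost.input_seconds / metric_gain,
    }


if __name__ == "__main__":
    # Placeholder values for illustration only, not measurements.
    cost = TrainingCost(
        flops=1e18, n_params=95_000_000, precision="fp16",
        peak_memory_bytes=16 * 1024**3, input_seconds=960 * 3600,
    )
    # Assume a 2.0-point absolute reduction in word error rate.
    for name, value in cost_per_unit_gain(2.0, cost).items():
        print(f"{name} per point of gain: {value:.3e}")
```

Lower ratios indicate a more efficient model for the same gain; comparing two models on each marker separately keeps the trade-offs visible.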
Most state-of-the-art SSL models for speech rely on masked prediction,
e.g. HuBERT and WavLM, or contrastive losses, e.g. wav2vec 2.0. Such
prevalence in the literature is mostly linked to the model size, amount of
data and computational resources invested by the companies producing these
models. In fact, vanilla masking approaches and contrastive losses may
be regarded as uninformed solutions, as they do not benefit from
in-domain expertise. For instance, it has been demonstrated that blindly
masking frames in the input signal, as in HuBERT and WavLM, results in much
worse downstream performance than applying unsupervised phonetic
boundaries [Yue2021] to generate informed masks. Recently, some studies
have demonstrated the superiority of an informed multitask learning
strategy carefully selecting self-supervised pretext-tasks with respect
to a set of downstream tasks, over the vanilla wav2vec 2.0 contrastive
learning loss [Zaiem2022]. In this PhD project, our objectives are to: 1.
continue to develop knowledge-driven SSL algorithms reaching higher
efficiency ratios and better results in terms of convergence, data
consumption and downstream performance; and 2. scale these novel
approaches to a point enabling comparison with current state-of-the-art
systems, thereby motivating a paradigm change in SSL for the wider speech
community.
Despite remarkable performance on academic benchmarks, SSL-powered
technologies, e.g. speech and speaker recognition, speech synthesis and
many others, may exhibit highly unpredictable results once applied to
realistic scenarios. This can translate into a global accuracy drop due
to a lack of robustness to adversarial acoustic conditions, or biased
and discriminatory behaviors with respect to different pools of end
users. Documenting and facilitating the control of such aspects prior to
the deployment of SSL models in real life is necessary for the
industrial market. To address such aspects, within the project, we will
create novel robustness regularization and debiasing techniques along two
axes: 1. debiasing and regularizing speech representations at the SSL
level; 2. debiasing and regularizing downstream-adapted models (e.g.
using a pre-trained model).
To ensure the creation of fair and robust SSL pre-trained models, we
propose to act both at the optimization and data levels following some
of our previous work on adversarial protected attribute disentanglement
and the NLP literature on data sampling and augmentation [Noé2021].
Here, we wish to extend this technique to more complex SSL architectures
and more realistic conditions by increasing the disentanglement
complexity i.e. the sex attribute studied in [Noé2021] is particularly
discriminatory. Then, to benefit from the expert knowledge induced
by the scope of the task of interest, we will build on the recently
introduced task-dependent counterfactual equal odds criteria
[Sari2021] to minimize the downstream performance gap observed
between individuals who differ in certain protected attributes and to
maximize the overall accuracy. Following this multi-objective
optimization scheme, we will then inject further identified constraints,
as inspired by previous NLP work [Zhao2017]. Intuitively, constraints
are injected so that the predictions are calibrated towards a desired,
i.e. unbiased, distribution.
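To make the notion of a performance gap between protected groups concrete, here is a minimal Python sketch of the standard (non-counterfactual) equalized-odds gap for a binary classifier. It is a simplified stand-in and not the task-dependent counterfactual criterion of [Sari2021]; the function names and the toy labels are hypothetical.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Compute true/false positive rates per protected-attribute group."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            counts[g]["tp" if p == 1 else "fn"] += 1
        else:
            counts[g]["fp" if p == 1 else "tn"] += 1
    rates = {}
    for g, c in counts.items():
        pos = c["tp"] + c["fn"]
        neg = c["fp"] + c["tn"]
        rates[g] = {
            "tpr": c["tp"] / pos if pos else 0.0,
            "fpr": c["fp"] / neg if neg else 0.0,
        }
    return rates


def equalized_odds_gap(y_true, y_pred, groups):
    """Largest between-group difference in TPR or FPR (0.0 means parity)."""
    rates = group_rates(y_true, y_pred, groups)
    tprs = [r["tpr"] for r in rates.values()]
    fprs = [r["fpr"] for r in rates.values()]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


if __name__ == "__main__":
    # Toy labels for illustration only.
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(equalized_odds_gap(y_true, y_pred, groups))
```

In a multi-objective setting, a gap of this kind could be minimized alongside the usual accuracy objective; how the two are weighted is a design choice that the text leaves open.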
*SKILLS*
* Master 2 in Natural Language Processing, Speech Processing, computer
  science or data science.
* Good command of Python programming and deep learning frameworks.
* Previous experience in Self-Supervised Learning, acoustic modeling or ASR
  would be a plus.
* Very good communication skills in English.
* Good command of French would be a plus but is not mandatory.
*SCIENTIFIC ENVIRONMENT*
The thesis will be conducted within the GETALP team of the LIG
laboratory (https://lig-getalp.imag.fr/)
and the LIA laboratory (https://lia.univ-avignon.fr/). The GETALP team
and the LIA have strong expertise and track records in Natural Language
Processing and speech processing. The recruited person will be welcomed
within the teams, which offer a stimulating, multinational and pleasant
working environment.
The means to carry out the PhD will be provided both in terms of missions
in France and abroad and in terms of equipment. The candidate will have
access to the GPU clusters of both the LIG and LIA. Furthermore,
access to the national supercomputer Jean-Zay will make it possible to run
large-scale experiments.
The PhD position will be co-supervised by Mickael Rouvier (LIA, Avignon),
Benjamin Lecouteux and François Portet (Université Grenoble Alpes).
Joint meetings are planned on a regular basis and the student is
expected to spend time in both places. Moreover, the PhD student will
collaborate with several team members involved in the project, in
particular the two other PhD candidates who will be recruited and the
partners from LIA, LIG and Dauphine Université PSL, Paris. Furthermore,
the project will involve one of the founders of SpeechBrain, Titouan
Parcollet, with whom the candidate will interact closely.
*INSTRUCTIONS FOR APPLYING*
Applications must contain a CV, a letter/message of motivation and Master's
grade transcripts; applicants should be ready to provide letter(s) of
recommendation. Applications should be addressed to Mickael Rouvier
(mickael.rouvier(a)univ-avignon.fr), Benjamin Lecouteux
(benjamin.lecouteux(a)univ-grenoble-alpes.fr) and François Portet
(francois.Portet(a)imag.fr). We
celebrate diversity and are committed to creating an inclusive
environment for all employees.
*REFERENCES:*
[Noé2021] Noé, P.-G., Mohammadamini, M., Matrouf, D., Parcollet, T.,
Nautsch, A. & Bonastre, J.-F. Adversarial Disentanglement of Speaker
Representation for Attribute-Driven Privacy Preservation in Proc.
Interspeech 2021 (2021), 1902–1906.
[Sari2021] Sarı, L., Hasegawa-Johnson, M. & Yoo, C. D. Counterfactually
Fair Automatic Speech Recognition. IEEE/ACM Transactions on Audio,
Speech, and Language Processing 29, 3515–3525 (2021).
[Yue2021] Yue, X. & Li, H. Phonetically Motivated Self-Supervised Speech
Representation Learning in Proc. Interspeech 2021 (2021), 746–750.
[Zaiem2022] Zaiem, S., Parcollet, T. & Essid, S. Pretext Tasks Selection
for Multitask Self-Supervised Speech Representation in AAAI, The 2nd
Workshop on Self-supervised Learning for Audio and Speech Processing,
2023 (2022).
[Zhao2017] Zhao, J., Wang, T., Yatskar, M., Ordonez, V. & Chang, K.-W.
Men Also Like Shopping: Reducing Gender Bias Amplification using
Corpus-level Constraints in Proceedings of the 2017 Conference on
Empirical Methods in Natural Language Processing (2017), 2979–2989.
--
François PORTET
Professeur - Univ Grenoble Alpes
Laboratoire d'Informatique de Grenoble - Équipe GETALP
Bâtiment IMAG - Office 333
700 avenue Centrale
Domaine Universitaire - 38401 St Martin d'Hères
FRANCE
Phone: +33 (0)4 57 42 15 44
Email: francois.portet@imag.fr
www: http://membres-liglab.imag.fr/portet/
==11th NLP4CALL, Louvain-la-Neuve, Belgium==
The workshop series on Natural Language Processing (NLP) for Computer-Assisted Language Learning (NLP4CALL) is a meeting place for researchers working on the integration of Natural Language Processing and Speech Technologies in CALL systems and exploring the theoretical and methodological issues arising in this connection. The latter includes, among others, integrating insights from Second Language Acquisition (SLA) research, on the one hand, and promoting the development of "Computational SLA" by setting up Second Language research infrastructure(s), on the other.
The intersection of Natural Language Processing (or Language Technology / Computational Linguistics) and Speech Technology with Computer-Assisted Language Learning (CALL) brings "understanding" of language to CALL tools, thus making CALL intelligent. This fact has given the name for this area of research – Intelligent CALL, ICALL. As the definition suggests, apart from having excellent knowledge of Natural Language Processing and/or Speech Technology, ICALL researchers need good insights into second language acquisition theories and practices, as well as knowledge of second language pedagogy and didactics. This workshop therefore invites a wide range of ICALL-relevant research, including studies where NLP-enriched tools are used for testing SLA and pedagogical theories, and vice versa, where SLA theories, pedagogical practices or empirical data are modeled in ICALL tools.
The NLP4CALL workshop series is aimed at bringing together competences from these areas for sharing experiences and brainstorming around the future of the field.
We welcome papers:
- that describe research directly aimed at ICALL;
- that demonstrate actual or discuss the potential use of existing Language and Speech Technologies or resources for language learning;
- that describe the ongoing development of resources and tools with potential usage in ICALL, either directly in interactive applications, or indirectly in materials, application or curriculum development, e.g. learning material generation, assessment of learner texts and responses, individualized learning solutions, provision of feedback;
- that discuss challenges and/or research agendas for ICALL;
- that describe empirical studies on language learner data.
This year a special focus is given to work done on second language vocabulary and grammar profiling, as well as the use of crowdsourcing for creating, collecting and curating data in NLP projects.
We encourage paper presentations and software demonstrations describing the above-mentioned themes primarily, but not exclusively, for the Nordic languages.
==Invited speakers==
This year, we have the pleasure to announce two invited talks.
The first talk is by Christopher Bryant from Reverso and the University of Cambridge.
The second talk is given by Marije Michel from the University of Amsterdam.
==Submission information==
Authors are invited to submit long papers (8-12 pages) or short papers (4-7 pages), page count not including references. We will be using the NLP4CALL workshop template for the workshop this year. The author kit, including LaTeX and Microsoft Word templates, can be accessed at the links below, or on Overleaf:
<https://spraakbanken.gu.se/sites/default/files/2022/NLP4CALL%20workshop%20t…>
<https://spraakbanken.gu.se/sites/default/files/2022/nlp4call%20template.doc>
<https://www.overleaf.com/latex/templates/nlp4call-workshop-template/qqqzqqy…>
Submissions will be managed through the electronic conference management system EasyChair <https://easychair.org/conferences/?conf=nlp4call2022>. Papers must be submitted digitally through the conference management system, in PDF format. Final camera-ready versions of accepted papers will be given an additional page to address reviewer comments.
Papers should describe original unpublished work or work-in-progress. Papers will be peer reviewed by at least two members of the program committee in a double-blind fashion. All accepted papers will be collected into a proceedings volume to be submitted for publication in the NEALT Proceedings Series (Linköping Electronic Conference Proceedings) and, additionally, double-published through the ACL anthology, following experiences from the previous NLP4CALL editions (<https://www.aclweb.org/anthology/venues/nlp4call/>).
==Important dates==
7 October 2022: paper submission deadline
4 November 2022: notification of acceptance
25 November 2022: camera-ready papers for publication
9 December 2022: workshop date
==Organizers==
David Alfter (1,2), Elena Volodina (2), Thomas François (1), Piet Desmet (3), Frederik Cornillie (3), Arne Jönsson (4), Eveline Rennes (4)
(1) CENTAL, Institute for Language and Communication, Université Catholique de Louvain, Belgium
(2) Språkbanken, University of Gothenburg, Sweden
(3) Itec, Department of Linguistics at KU Leuven & imec, Belgium
(4) Department of Computer and Information Science, Linköping University, Sweden
==Contact==
For any questions, please contact David Alfter, david.alfter(a)uclouvain.be
For further information, see the workshop website <https://spraakbanken.gu.se/en/research/themes/icall/nlp4call-workshop-serie…>
Follow us on Twitter @NLP4CALL <https://twitter.com/NLP4CALL/>
David Alfter, PhD
Post-doctoral researcher
Institut Langage et communication, CENTAL
Université catholique de Louvain
Place Montesquieu, 3 (box L2.06.04)
1348 Louvain-la-Neuve
The Austrian Research Institute for Artificial Intelligence (OFAI) is
delighted to announce its 2022 Lecture Series, featuring an eclectic
lineup of internal and external speakers.
The talks are intended to familiarize attendees with the latest research
developments in AI and related fields (particularly computational
linguistics and natural language processing), and to forge new
connections with those working in other areas.
Most lectures (see prospective schedule below) will take place on
Wednesdays at 18:30 Central European (Summer) Time. All lectures will be
held online via Zoom; in-person attendance at OFAI Headquarters in
Vienna is also possible for certain lectures.
Attendance is open to the public and free of charge. No registration is
required.
Visit https://www.ofai.at/lectures for full details!
29 June
Scott Patterson
McGill University
Domesticating Wealth Inequality: Hybrid Discourse Analysis of UN General
Assembly Speeches, 1971–2018
6 July
Pamela Breda
Independent artist
Feeling for Nonexistent Beings
13 July
Brigitte Krenn
OFAI
Robots as Social Agents: Between Construct and Reality
20 July
Tristan Miller
OFAI
What's in a Pun? Assessing the Relationship Between Phonological and
Semantic Distance and Perceived Funniness of Punning Jokes
27 July
Katrien Beuls
Université de Namur
Unravelling the Computational Mechanisms Underlying the Emergence of
Human-like Communication Systems in Populations of Autonomous Agents
7 September
Steffen Eger
Bielefeld University
Text Generation for the Humanities
14 September
Antti Arppe
University of Alberta
Finding Words that Aren't There: Using Word Embeddings to Improve
Dictionary Search for Low-resource Languages
21 September
Roman Pflugfelder
AIT Austrian Institute of Technology
Title TBA
28 September
Raphael Deimel
TU Wien
Towards Intuitive Object Handovers Between Humans and Robots
5 October
Christoph Scheepers
University of Glasgow
The “Crossword Effect” in Free Word Recall: A Retrieval Advantage for
Words Encoded in Line with their Spatial Associations
12 October
Karën Fort
Sorbonne Université
Title TBA
19 October
Benjamin Roth
University of Vienna
Evaluation and Learning with Structured Test Sets
25 October
Peter Hallman
OFAI
Comparatives in Arabic
2 November
Stephanie Gross
OFAI
Title TBA
9 November
Bernhard Pfahringer
University of Waikato
The World is not IID: Learning from Data Streams to the Rescue
16 November
Paolo Petta
OFAI
Title TBA
23 November
Robert Trappl
OFAI
Title TBA
--
Dr.-Ing. Tristan Miller, Research Scientist
Austrian Research Institute for Artificial Intelligence (OFAI)
Freyung 6/6, 1010 Vienna, Austria | Tel: +43 1 5336112 12
https://logological.org/ | https://punderstanding.ofai.at/
Call for Papers: Semantics-enabled Biomedical Literature Analytics
This Special Issue aims to highlight the development of novel informatics
methods for *retrieval, indexing, and analysis of biomedical literature,
focusing on semantics-based techniques*. We invite researchers working in
biomedical informatics, knowledge representation/ontologies, information
retrieval, natural language processing, artificial intelligence/machine
learning, data mining, and other related areas to submit clear and detailed
descriptions of their novel methodological results.
The topics of interest include but are not limited to:
- Knowledge representation and semantics for biomedical literature
retrieval
- Biomedical ontologies in search
- Biomedical knowledge source integration
- Biomedical knowledge graph construction and embeddings
- Knowledge graphs in biomedical search
- Semantic knowledge in biomedical literature classification and ranking
- Biomedical information extraction
- Entity linking and semantic annotation in biomedical texts
- Literature-based knowledge discovery
- Semantics for biomedical knowledge synthesis and systematic literature
review
All submitted papers must be original and will go through a rigorous
peer-review process with at least two reviewers. Papers previously
published in conference proceedings will not be considered. JBI’s
editorial policy will be strictly followed by special issue reviewers. Note
in particular that JBI emphasizes the publication of papers that introduce
innovative and generalizable methods of interest to the informatics
community. Specific applications can be described to motivate the
methodology being introduced, but papers that focus solely on a specific
application are not suitable for JBI.
*Submission Guidelines*
Authors must submit their papers via the online Editorial Manager (EES) at
https://www.editorialmanager.com/jbi. Authors should select “Semantics-enabled
Biomedical Literature Analytics” as their submission category and note in a
cover letter that their submission is for the “*Special Issue on
Semantics-enabled Biomedical Literature Analytics.*” If the manuscript is
not intended as an original research paper, the cover letter should also
specify if it is, rather, a *Methodological Review, Commentary, or Special
Communication*. Authors should make sure to place their work in the context
of human-focused biomedical research or health care, and to review
carefully the relevant literature.
JBI’s editorial policy, and the types of articles that the journal
publishes, are outlined under *Aims and Scope* on the journal home page at
https://www.sciencedirect.com/journal/journal-of-biomedical-informatics
(click on “View full Aims and Scope” for details). All submissions should
follow the guidelines for authors at
https://www.elsevier.com/journals/journal-ofbiomedical-informatics/1532-0464/guide-for-authors,
including format and manuscript structure.
*Important Dates*
Deadline for submissions: November 15, 2022
First-round review decisions: January 15, 2023
Deadline for revision submissions: February 15, 2023
Notification of final decisions: April 15, 2023
The full Call for Papers is available at
https://doi.org/10.1016/j.jbi.2022.104134. Please direct any questions
regarding the special issue to Dr. Halil Kilicoglu (halil(a)illinois.edu).
*Guest Editors:*
Halil Kilicoglu (University of Illinois Urbana-Champaign, halil(a)illinois.edu)
Faezeh Ensan (Ryerson University, fensan(a)ryerson.ca)
Bridget McInnes (Virginia Commonwealth University, bmtinnes(a)vcu.edu)
Lucy Lu Wang (University of Washington/Allen Institute for AI, lucylw(a)uw.edu)
--Halil
*HALIL KILICOGLU*
*Associate Professor*
School of Information Sciences
University of Illinois at Urbana-Champaign
halil(a)illinois.edu
https://ischool.illinois.edu/people/halil-kilicoglu
*** First Workshop on Information Extraction from Scientific Publications
(WIESP) at AACL-IJCNLP 2022 ***
*** Website: https://ui.adsabs.harvard.edu/WIESP/
*** Twitter: https://twitter.com/wiesp_nlp
The number of scientific papers published per year has exploded in recent
years. Indexing articles' full text in search engines helps discover
and retrieve vital scientific information to continue building on the
shoulders of giants, informing policy, and making evidence-based decisions.
Nevertheless, it is difficult to navigate this ocean of data. Using simple
string matching has substantial limitations: human language is ambiguous in
nature, context matters, and we frequently use the same words and acronyms
to represent a multitude of different meanings. Extracting structured and
semantically relevant information from scientific publications (e.g.,
named-entity recognition, summarization, citation intention, linkage to
knowledge graphs) allows for better selection and filtering of articles.
The First Workshop on Information Extraction from Scientific Publications
(WIESP) will create the necessary forum to foster discussion and research
using Natural Language Processing and Machine Learning. WIESP will
specifically focus on topics related to information extraction from
scientific publications, including (but not limited to):
- Scientific document parsing
- Scientific named-entity recognition
- Scientific article summarization
- Question-answering on scientific articles
- Citation context/span extraction
- Structured information extraction from full-text, tables, figures,
bibliography
- Novel datasets curated from scientific publications
- Argument extraction and mining
- Challenges in information extraction from scientific articles
- Building knowledge graphs via mining scientific literature; querying
scientific knowledge graphs
- Novel tools for IE on scientific literature and interaction with users
- Mathematical information extraction
- Scientific concepts, facts extraction
- Visualizing scientific knowledge
- Bibliometric and Altmetric studies via information extraction from
scientific articles and metadata
- Information extraction from COVID-19 articles to inform public health
policy
In addition to research paper presentations, WIESP will also feature
keynote talks, a panel discussion, and a shared task. We will update the
details on our website as and when they become available. We especially
welcome participation from academic and research institutions, government
and industry labs, publishers, and information service providers. Projects
and organizations using NLP/ML techniques in their text mining and
enrichment efforts are also welcome to participate.
***Call for Papers***
We invite papers of the following categories:
***Long papers*** must describe substantial, original, completed, and
unpublished work. Wherever appropriate, concrete evaluation and analysis
should be included. Papers must not exceed eight (8) pages of content, plus
unlimited pages of references. The final versions of long papers will be
given one additional page of content (up to 9 pages) so that reviewers'
comments can be taken into account.
***Short papers*** must describe original and unpublished work. Please note
that a short paper is not a shortened long paper. Instead, short papers
should have a point that can be made in a few pages, such as a small,
focused contribution, a negative result, or an interesting application
nugget. Short papers must not exceed four (4) pages, plus unlimited pages
of references. The final versions of short papers will be given one
additional page of content (up to 5 pages) so that reviewers' comments can
be taken into account.
***Position papers*** will give voice to authors who wish to take a
position on a topic listed above or the field of scholarly information
extraction. Submissions need not present original work and should be two to
four pages in length, including title, text, figures and tables, and
references.
***Demo papers*** should be no more than four (4) pages in length,
including references, and should describe implemented systems that are of
relevance to the theme of the workshop. Authors of demo papers should be
willing to present a demo of their system during WIESP at AACL-IJCNLP 2022.
***Extended Abstracts*** We welcome submissions of extended abstracts (2
pages max) related to the research topics mentioned above. Submissions may
include previously published results, late-breaking results, or a
description of ongoing projects in the broad field of information
extraction and mining from scientific publications. Extended abstracts can
also summarize existing work, work in progress, or a collection of works
under a unified theme (e.g., a series of closely related papers that build
on each other or tackle a common problem).
***Shared Task: Detecting Entities in the Astrophysics Literature (DEAL)***
A good amount of astrophysics research makes use of data coming from
missions and facilities such as ground observatories in remote locations or
space telescopes, as well as digital archives that hold large amounts of
observed and simulated data. These missions and facilities are frequently
named after historical figures or use some ingenious acronym which,
unfortunately, can be easily confused when searching for them in the
literature via simple string matching. For instance, Planck can refer to
the person, the mission, the constant, or several institutions.
Automatically recognizing entities such as missions or facilities would
help tackle this word sense disambiguation problem.
The shared task consists of Named Entity Recognition (NER) on samples of
text extracted from astrophysics publications. The labels were created by
domain experts and designed to identify entities of interest to the
astrophysics community. They range from simple to detect (e.g. URLs) to
highly unstructured (e.g. Formula), and from useful to researchers (e.g.
Telescope) to more useful to archivists and administrators (e.g. Grant).
Overall, 31 different labels are included, and their distribution is highly
unbalanced (e.g. ~100x more Citations than Proposals). Submissions will be
scored using both the CoNLL-2000 shared task seqeval F1-Score at the entity
level and scikit-learn's Matthews correlation coefficient method at the
token level. We also encourage authors to propose their own evaluation
metrics. A sample dataset and more instructions can be found at:
https://ui.adsabs.harvard.edu/WIESP/2022/SharedTasks
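For participants who want a feel for the two scores before running the official tools, the sketch below re-implements both ideas in plain Python: a micro-averaged entity-level F1 over BIO-tagged spans and a token-level multi-class Matthews correlation coefficient. It is a simplified illustration and not the official scorer: it assumes BIO tagging, ignores stray `I-` tags that do not continue a matching span (the seqeval library may treat these differently), and all function names and example tags are hypothetical. Actual submissions should be checked with seqeval and scikit-learn's `matthews_corrcoef`.

```python
from collections import Counter
from math import sqrt


def extract_entities(tags):
    """Return the set of (label, start, end) spans in a BIO tag sequence."""
    entities = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                entities.add((label, start, i))
                start, label = None, None
            if tag.startswith("B-"):
                start, label = i, tag[2:]
            # Stray "I-" tags are ignored in this simplified version.
    return entities


def entity_f1(true_seqs, pred_seqs):
    """Micro-averaged F1 over exactly matching entity spans."""
    tp = n_true = n_pred = 0
    for true_tags, pred_tags in zip(true_seqs, pred_seqs):
        gold = extract_entities(true_tags)
        pred = extract_entities(pred_tags)
        tp += len(gold & pred)
        n_true += len(gold)
        n_pred += len(pred)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def token_mcc(true_tags, pred_tags):
    """Multi-class Matthews correlation coefficient over token labels."""
    s = len(true_tags)                                     # total tokens
    c = sum(t == p for t, p in zip(true_tags, pred_tags))  # correct tokens
    t_k = Counter(true_tags)   # how often each class truly occurs
    p_k = Counter(pred_tags)   # how often each class is predicted
    numerator = c * s - sum(t_k[k] * p_k[k] for k in t_k)
    denominator = sqrt(
        (s * s - sum(v * v for v in p_k.values()))
        * (s * s - sum(v * v for v in t_k.values()))
    )
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Made-up example: the predicted tags miss the Grant entity.
    gold = ["B-Telescope", "I-Telescope", "O", "B-Grant", "O"]
    pred = ["B-Telescope", "I-Telescope", "O", "O", "O"]
    print(f"entity F1: {entity_f1([gold], [pred]):.4f}")
    print(f"token MCC: {token_mcc(gold, pred):.4f}")
```

The two scores answer different questions: entity-level F1 only rewards spans whose label and boundaries both match, while token-level MCC accounts for every class, which matters when the label distribution is as unbalanced as described above.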
Participants (individuals or groups) will have the opportunity to present
their findings during the workshop and write a short paper. The
best-performing or most interesting approaches might be invited to further
collaborate with the NASA Astrophysical Data System
(https://ui.adsabs.harvard.edu/).
***Important Dates***
- Paper/Abstract Submission Deadline: August 25, 2022
- Notification of workshop paper/abstract acceptance: September 25, 2022
- Camera-ready Submission Deadline: October 10, 2022
- Workshop: November 20, 2022 (online)
***All submission deadlines are 11:59 pm UTC-12 ("Anywhere on Earth")***
***Submission Website and Format***
Submission Link: softconf.com/aacl2022/WIESP
Submission will be via softconf. Submissions should follow the ACLPUB
formatting guidelines (https://acl-org.github.io/ACLPUB/formatting.html)
and template files (https://github.com/acl-org/acl-style-files/tree/master).
Submissions (Long and Short Papers) will be subject to a double-blind
peer-review process. Position papers, Demo papers, and Extended Abstracts
need not be anonymized. The authors will present accepted papers at the
workshop either as a talk or a poster. All accepted papers will be
published in the workshop proceedings.
We follow the same policies as AACL-IJCNLP 2022 regarding preprints and
double submissions. The anonymity period for WIESP 2022 is from July 15 to
September 25.
***Organizers***
- Tirthankar Ghosal, Charles University, CZ
- Sergi Blanco-Cuaresma, Center for Astrophysics | Harvard & Smithsonian,
USA
- Alberto Accomazzi, Center for Astrophysics | Harvard & Smithsonian, USA
- Robert M. Patton, Oak Ridge National Laboratory, USA
- Felix Grezes, Center for Astrophysics | Harvard & Smithsonian, USA
- Thomas Allen, Center for Astrophysics | Harvard & Smithsonian, USA
--
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Tirthankar Ghosal
Researcher at UFAL, Charles University, CZ
https://member.acm.org/~tghosal
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Call for papers for the International Journal of Learner Corpus Research (Benjamins):
Special issue on Cumulative knowledge building and replication in Learner Corpus Research
Guest editors: Tove Larsson & Doug Biber (Northern Arizona University)
Compared to other subfields of linguistics, Learner Corpus Research (LCR) has a relatively short history. For this and other reasons, most of the studies that get published in the field are exploratory in nature and focus on topics that have yet to receive prolonged attention. Such studies no doubt make valuable contributions to the field. However, LCR is arguably mature enough as a field to also have accumulated enough knowledge on certain topics for researchers to be able to instead adopt a cumulative approach.
In the cumulative approach to knowledge building, individual studies are viewed as building blocks, carefully pieced together to help us form an increasingly better understanding of a topic. There are three distinguishing characteristics of this approach: First, the literature review focuses on what we have actually learned from previous research on the topic, rather than merely cataloging individual studies. Second, the research ‘gap’ refers to an important missing element in our cumulative knowledge, rather than to a research angle that has not been explored yet; that is, the literature review is used to identify a missing piece in an existing puzzle, rather than to justify starting a new one. And finally, results of the new study are explicitly compared to previous findings, to discuss the state of our knowledge based on all studies taken together. Through this big-picture thinking, we can collectively refine our understanding of the topic, and further our knowledge in a systematic manner. Put differently, this approach enables us to build a state-of-the-art in the field by moving beyond the results of individual studies.
With this call, we invite studies of two kinds:
* Empirical studies that set out to test hypotheses arrived at from an existing body of research with the explicit aim of adding to our knowledge on a given topic that has received ample attention in LCR. Examples of topics that may be ripe for studies of this kind include, but are not limited to, linguistic complexity and the formulaic nature of learner language.
* Empirical studies that replicate findings from an existing body of research and, importantly, that focus on strengthening and/or tweaking existing generalizations in LCR. Examples of topics include, but are not limited to, claims of the spoken-like nature of learners’ written production.
Timeline:
* August 1, 2022: Abstract and title due
* September 1, 2022: Authors are notified
* September 1, 2023: Full manuscript due
Please send submissions to tove.larsson@nau.edu
---
Tove Larsson, Ph.D.
Assistant Professor of Applied Linguistics
English Department
Northern Arizona University
https://tovelarssoncl.wordpress.com
Dear All,
We are the guest editors of the special issue “Mathematical and Computational Modeling of Language and Social Behaviors” in Mathematics (https://www.mdpi.com/journal/mathematics; IF=2.592, Q1).
We invite submissions to this special issue from researchers whose interests include computational linguistics and related areas. Deadline for manuscript submissions: 30 June 2023.
The aim of the special issue is to highlight the contributions of quantitative modeling and NLP technology to understanding collective human behaviors and to help resolve some of the greatest challenges of our time. We welcome new or improved methods for modeling linked data from heterogeneous sources and their computational application to real-world problems relating to languages and social behaviors. Topics of interest include sentiment and/or emotion analysis, fake news detection, FinNLP and medical informatics.
Check the details about the special issue through the link: https://www.mdpi.com/si/mathematics/Mathe_Compu_NLP
We look forward to your submissions and contribution to this special issue. Thank you very much!
Best,
Clara
(on behalf of the Guest Editors)
** With apologies for multiple posting **
The Seventeenth International Workshop on
ONTOLOGY MATCHING
(OM-2022)
http://om2022.ontologymatching.org/
October 23rd or 24th, 2022,
International Semantic Web Conference (ISWC) Workshop Program,
Hybrid conference, Hangzhou, China
BRIEF DESCRIPTION AND OBJECTIVES
Ontology matching is a key interoperability enabler for the Semantic Web,
as well as a useful technique in some classical data integration tasks
dealing with the semantic heterogeneity problem. It takes ontologies
as input and determines as output an alignment, that is, a set of
correspondences between the semantically related entities of those ontologies.
These correspondences can be used for various tasks, such as ontology
merging, data interlinking, query answering or navigation over knowledge graphs.
Thus, matching ontologies enables the knowledge and data expressed
with the matched ontologies to interoperate.
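As a rough illustration of the input/output behaviour described above, the following minimal sketch takes two toy ontologies and returns an alignment as a list of correspondences. Everything in it is an assumption made for illustration only: the entity identifiers, the representation of an ontology as a mapping from identifier to label, the plain string similarity from the Python standard library, and the 0.8 threshold. It is not the method of any particular matcher or of the OAEI campaign.

```python
from difflib import SequenceMatcher


def label_similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] between two labels, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match(onto1: dict[str, str], onto2: dict[str, str],
          threshold: float = 0.8) -> list[tuple[str, str, str, float]]:
    """Return an alignment as (entity1, entity2, relation, confidence) tuples.

    Each input maps a hypothetical entity identifier to its label. Only the
    equivalence relation "=" is produced, based on label similarity alone.
    """
    alignment = []
    for id1, label1 in onto1.items():
        for id2, label2 in onto2.items():
            score = label_similarity(label1, label2)
            if score >= threshold:
                alignment.append((id1, id2, "=", round(score, 2)))
    return alignment


# Two toy ontologies (hypothetical identifiers and labels).
o1 = {"o1:Author": "Author", "o1:Paper": "Paper", "o1:Venue": "Venue"}
o2 = {"o2:author": "author", "o2:Article": "Article", "o2:Place": "Place"}

for correspondence in match(o1, o2):
    print(correspondence)
```

Purely lexical matching of this kind finds the Author/author pair but misses Paper/Article, whose labels differ although the entities are semantically related.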
The workshop has three goals:
1. To bring together leaders from academia, industry and user institutions
to assess how academic advances are addressing real-world requirements.
The workshop will strive to improve academic awareness of industrial
and final user needs, and therefore, direct research towards those needs.
Simultaneously, the workshop will serve to inform industry and user
representatives about existing research efforts that may meet their
requirements. The workshop will also investigate how the ontology
matching technology is going to evolve, especially with respect to
data interlinking, knowledge graph and web table matching tasks.
2. To conduct an extensive and rigorous evaluation of ontology matching
and instance matching (link discovery) approaches through
the OAEI (Ontology Alignment Evaluation Initiative) 2022 campaign:
http://oaei.ontologymatching.org/2022/
3. To examine similarities to and differences from other techniques and
usages, whether old, new or emerging, such as web table matching or
knowledge embeddings.
TOPICS of interest include but are not limited to:
Business and use cases for matching (e.g., big, open, closed data);
Requirements to matching from specific application scenarios (e.g., public sector);
Application of matching techniques in real-world scenarios (e.g., in cloud, with mobile apps);
Formal foundations and frameworks for matching;
Novel matching methods, including link prediction, ontology-based access;
Matching and knowledge graphs;
Matching and deep learning;
Matching and embeddings;
Matching and big data;
Matching and linked data;
Instance matching, data interlinking and relations between them;
Privacy-aware matching;
Process model matching;
Large-scale and efficient matching techniques;
Matcher selection, combination and tuning;
User involvement (including both technical and organizational aspects);
Explanations in matching;
Social and collaborative matching;
Uncertainty in matching;
Expressive alignments;
Reasoning with alignments;
Alignment coherence and debugging;
Alignment management;
Matching for traditional applications (e.g., data science);
Matching for emerging applications (e.g., web tables, knowledge graphs).
SUBMISSIONS
Contributions to the workshop can be made in terms of technical papers and
posters/statements of interest addressing different issues of ontology matching
as well as participating in the OAEI 2022 campaign. Long technical papers should
be of max. 12 pages. Short technical papers should be of max. 5 pages.
Posters/statements of interest should not exceed 2 pages.
All contributions have to be prepared using the LNCS Style:
http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0
and should be submitted in PDF format (no later than August 9th, 2022)
through the workshop submission site at:
https://www.easychair.org/conferences/?conf=om2022
Contributors to the OAEI 2022 campaign have to follow the campaign conditions
and schedule at http://oaei.ontologymatching.org/2022/.
DATES FOR TECHNICAL PAPERS AND POSTERS:
August 9th, 2022: Deadline for the submission of papers.
September 6th, 2022: Deadline for the notification of acceptance/rejection.
September 20th, 2022: Workshop camera ready copy submission.
October 23rd or 24th, 2022: OM-2022, hybrid conference, Hangzhou, China.
Contributions will be refereed by the Program Committee.
Accepted papers will be published in the workshop proceedings as a volume of CEUR-WS as well as indexed on DBLP.
ORGANIZING COMMITTEE
1. Pavel Shvaiko (main contact)
Trentino Digitale, Italy
2. Jérôme Euzenat
INRIA & Univ. Grenoble Alpes, France
3. Ernesto Jiménez-Ruiz
City, University of London, UK & SIRIUS, University of Oslo, Norway
4. Oktie Hassanzadeh
IBM Research, USA
5. Cássia Trojahn
IRIT, France
PROGRAM COMMITTEE (to be completed):
Alsayed Algergawy, Jena University, Germany
Manuel Atencia, Universidad de Málaga, Spain
Jiaoyan Chen, University of Oxford, UK
Jérôme David, University Grenoble Alpes & INRIA, France
Gayo Diallo, University of Bordeaux, France
Daniel Faria, Instituto Gulbenkian de Ciência, Portugal
Alfio Ferrara, University of Milan, Italy
Marko Gulic, University of Rijeka, Croatia
Wei Hu, Nanjing University, China
Ryutaro Ichise, National Institute of Informatics, Japan
Antoine Isaac, Vrije Universiteit Amsterdam & Europeana, Netherlands
Naouel Karam, Fraunhofer, Germany
Prodromos Kolyvakis, EPFL, Switzerland
Patrick Lambrix, Linköpings Universitet, Sweden
Oliver Lehmberg, University of Mannheim, Germany
Fiona McNeill, University of Edinburgh, UK
Majid Mohammadi, Eindhoven University of Technology, Netherlands
Hoa Ngo, CSIRO, Australia
George Papadakis, University of Athens, Greece
Henry Rosales-Méndez, University of Chile, Chile
Booma Sowkarthiga, Microsoft, USA
Kavitha Srinivas, IBM, USA
Ludger van Elst, DFKI, Germany
Xingsi Xue, Fujian University of Technology, China
Ondrej Zamazal, Prague University of Economics, Czech Republic
Songmao Zhang, Chinese Academy of Sciences, China
Lu Zhou, TigerGraph, USA
---------------------
Best regards,
Cassia
WiNLP 2022 – Call for Submissions
http://www.winlp.org/winlp22-call-for-papers/
Workshop Date
December 7th or 8th, 2022 (date TBD)
EMNLP 2022
Abu Dhabi, UAE
The Sixth Widening Natural Language Processing Workshop (WiNLP) will be held in conjunction with EMNLP 2022 in Abu Dhabi, UAE. Since EMNLP is anticipating a hybrid format for their conference, we also anticipate our workshop will be hybrid, with both online and in-person attendees. The one-day workshop will occur during EMNLP’s workshop period either December 7th or 8th, 2022 (date TBD).
We invite authors from underrepresented groups in Natural Language Processing (NLP) to submit a two-page abstract to be considered for a poster presentation at our workshop.
Important Dates:
Last date to join author workshopping: August 24, 2022
Submission deadline: September 7, 2022
Acceptance notification: October 9, 2022
Travel grant applications due: October 21, 2022
Travel grant notification: October 25, 2022
Workshop Description
The WiNLP workshop is open to all to foster an inclusive and welcoming ACL environment. It aims to promote diversity and highlight the work of underrepresented groups in NLP: anyone who self-identifies within an underrepresented group [based on gender, ethnicity, nationality, sexual orientation, disability status, or otherwise] is invited to submit a two-page abstract for a poster presentation. In our 2022 iteration, we hope to be more intentional about centering discussions of access and disability, as well as contributing to diversity in scientific background, discipline, training, obtained degrees, seniority, and communities from underrepresented languages.
The full-day event includes invited talks, oral presentations, and poster sessions. The workshop provides an excellent opportunity for junior members in the community to showcase their work and connect with senior mentors for feedback and career advice. It also offers recruitment opportunities with leading industrial labs. Most importantly, the workshop will provide an inclusive and accepting space, and work to lower structural barriers to joining and collaborating with the NLP community at large.
Submission guidelines
While everyone is encouraged to attend, the opportunity to present a talk or poster is intended for members of underrepresented groups at all career levels: students, post-docs, professors, and other researchers. Since many submissions are works in progress, we act as a non-archival repository for these works: while authors may elect to have their papers linked from our website, they will not be archived in the ACL Anthology. Authors may elect to not have their submission listed on the website if they wish to avoid de-anonymizing themselves for later submissions to other venues.
Submissions should be two pages long (not including references). Authors must use the ACL Rolling Review style files to format their submission, and must submit it electronically in PDF format via the WiNLP 2022 online submission system: https://softconf.com/emnlp2022/WiNLP22/
Travel Support
There will be a limited amount of travel grants and/or additional funding to cover expenses, similar to the previous editions. Funding is available for travel, lodging, registration, and visa costs for one author for each submission. The funded author may elect to attend virtually if they prefer. The selected author should be identified as part of the travel grant submission form. If we find ourselves with extra funds, we will attempt to support further funding for virtual attendance for additional authors, but we do not guarantee we can support any further in-person attendance. We recommend additional student authors keep an eye out for the EMNLP call for student volunteers or call for D&I subsidies as opportunities for further funding.
For further details please visit our website: http://www.winlp.org/winlp22-call-for-papers/
The Seventeenth International Workshop on
ONTOLOGY MATCHING
(OM-2022)
http://om2022.ontologymatching.org/
October 23rd or 24th, 2022,
International Semantic Web Conference (ISWC) Workshop Program,
Hybrid conference, Hangzhou, China
=====================================================================
The submission deadline for technical papers is two weeks away, on August 9th:
https://www.easychair.org/conferences/?conf=om2022
=====================================================================
-------------------------------------------------------
More about ontology matching:
http://www.ontologymatching.org/
http://book.ontologymatching.org/
-------------------------------------------------------
Best regards
Cassia Trojahn