July 2022 - Corpora - ELRA lists

PhD in ML/NLP – Efficient, Fair, robust and knowledge informed self-supervised learning for speech processing
by François Portet 01 Jun '23

PhD in ML/NLP – Efficient, Fair, robust and knowledge informed self-supervised learning for speech processing

Starting date: November 1st, 2022 (flexible)
Application deadline: September 5th, 2022
Interviews (tentative): September 19th, 2022
Salary: ~2000€ gross/month (social security included)
Mission: research oriented (teaching possible but not mandatory)

*Keywords:* speech processing, natural language processing, self-supervised learning, knowledge informed learning, robustness, fairness

*CONTEXT*

The ANR project E-SSL (Efficient Self-Supervised Learning for Inclusive and Innovative Speech Technologies) will start on November 1st 2022. Self-supervised learning (SSL) has recently emerged as one of the most promising artificial intelligence (AI) methods, as it is now feasible to take advantage of the colossal amounts of existing unlabeled data to significantly improve performance on various speech processing tasks.

*PROJECT OBJECTIVES*

Recent SSL models for speech such as HuBERT or wav2vec 2.0 have shown an impressive impact on downstream task performance. This is mainly due to their ability to benefit from a large amount of data, at the cost of a tremendous carbon footprint, rather than to any improvement in the efficiency of the learning. Another question related to SSL models is their unpredictable results once applied to realistic scenarios, which exposes their lack of robustness. Furthermore, as for any pre-trained model applied in society, it is important to be able to measure the bias of such models, since they can amplify social unfairness.

The goals of this PhD position are threefold:
- to design new evaluation metrics for SSL of speech models;
- to develop knowledge-driven SSL algorithms;
- to propose methods for learning robust and unbiased representations.

SSL models are evaluated with downstream task-dependent metrics, e.g., word error rate for speech recognition.
This couples the evaluation of the universality of SSL representations to a potentially biased and costly fine-tuning that also hides the efficiency information related to the pre-training cost. In practice, we will seek to measure the training efficiency as the ratio between the amount of data, computation and memory needed to observe a certain gain in terms of performance on a metric of interest, i.e., downstream dependent or not. The first step will be to document standard markers that can be used to assess these values robustly at training time. Potential candidates are, for instance, floating point operations for computational intensity, number of neural parameters coupled with precision for storage, online measurement of memory consumption for training and cumulative input sequence length for data.

Most state-of-the-art SSL models for speech rely on masked prediction, e.g. HuBERT and WavLM, or contrastive losses, e.g. wav2vec 2.0. Such prevalence in the literature is mostly linked to the size, amount of data and computational resources injected by the companies producing these models. In fact, vanilla masking approaches and contrastive losses may be identified as uninformed solutions, as they do not benefit from in-domain expertise. For instance, it has been demonstrated that blindly masking frames in the input signal, as in HuBERT and WavLM, results in much worse downstream performance than applying unsupervised phonetic boundaries [Yue2021] to generate informed masks. Recently, some studies have demonstrated the superiority of an informed multitask learning strategy, which carefully selects self-supervised pretext tasks with respect to a set of downstream tasks, over the vanilla wav2vec 2.0 contrastive learning loss [Zaiem2022].

In this PhD project, our objectives are to:
1. continue to develop knowledge-driven SSL algorithms reaching higher efficiency ratios and results at the convergence, data consumption and downstream performance levels; and
2. scale these novel approaches to a point enabling the comparison with current state-of-the-art systems and therefore motivating a paradigm change in SSL for the wider speech community.

Despite remarkable performance on academic benchmarks, SSL-powered technologies, e.g. speech and speaker recognition, speech synthesis and many others, may exhibit highly unpredictable results once applied to realistic scenarios. This can translate into a global accuracy drop due to a lack of robustness to adverse acoustic conditions, or into biased and discriminatory behaviors with respect to different pools of end users. Documenting and facilitating the control of such aspects prior to the deployment of SSL models in real-life settings is necessary for the industrial market. To address such aspects, within the project, we will create novel robustness regularization and debiasing techniques along two axes:
1. debiasing and regularizing speech representations at the SSL level;
2. debiasing and regularizing downstream-adapted models (e.g. using a pre-trained model).

To ensure the creation of fair and robust SSL pre-trained models, we propose to act both at the optimization and data levels, following some of our previous work on adversarial protected attribute disentanglement and the NLP literature on data sampling and augmentation [Noé2021]. Here, we wish to extend this technique to more complex SSL architectures and more realistic conditions by increasing the disentanglement complexity (the sex attribute studied in [Noé2021] is particularly discriminatory). Then, to benefit from the expert knowledge induced by the scope of the task of interest, we will build on a recent introduction of task-dependent counterfactual equal odds criteria [Sari2021] to minimize the downstream performance gap observed between individuals with different values of certain protected attributes and to maximize the overall accuracy.
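As a rough illustration of the "downstream performance gap" between groups of speakers mentioned above, the sketch below computes a pooled word error rate per group and the largest difference between groups. It is a simplified stand-in and not the counterfactual equal odds criterion of [Sari2021]; the group names, function names and toy transcripts are hypothetical and only serve the example.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """samples: iterable of (group, reference, hypothesis) strings.

    Returns {group: WER}, pooling errors and reference words per group.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, ref, hyp in samples:
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        errors[group] += edit_distance(ref_tokens, hyp_tokens)
        words[group] += len(ref_tokens)
    return {g: errors[g] / words[g] for g in words}


def max_gap(per_group):
    """Largest WER difference between any two groups."""
    return max(per_group.values()) - min(per_group.values())


# Hypothetical toy data: (protected-attribute group, reference, ASR output).
samples = [
    ("group_a", "the cat sat on the mat", "the cat sat on the mat"),
    ("group_a", "speech models are large", "speech model are large"),
    ("group_b", "the cat sat on the mat", "the cat sat mat"),
    ("group_b", "speech models are large", "speech models large"),
]
per_group = wer_by_group(samples)
gap = max_gap(per_group)
print(per_group, gap)
```

A debiasing method in the sense described above would aim to shrink `gap` while keeping the overall error rate low; how the two objectives are weighted is left open here.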
Following this multi-objective optimization scheme, we will then inject further identified constraints, as inspired by previous NLP work [Zhao2017]. Intuitively, constraints are injected so that the predictions are calibrated towards a desired, i.e. unbiased, distribution.

*SKILLS*

* Master 2 in Natural Language Processing, Speech Processing, computer science or data science.
* Good command of Python programming and deep learning frameworks.
* Previous experience in Self-Supervised Learning, acoustic modeling or ASR would be a plus.
* Very good communication skills in English.
* Good command of French would be a plus but is not mandatory.

*SCIENTIFIC ENVIRONMENT*

The thesis will be conducted within the GETALP team of the LIG laboratory (https://lig-getalp.imag.fr/) and the LIA laboratory (https://lia.univ-avignon.fr/). The GETALP team and the LIA have strong expertise and a track record in Natural Language Processing and speech processing. The recruited person will be welcomed within the teams, which offer a stimulating, multinational and pleasant working environment.

The means to carry out the PhD will be provided both in terms of missions in France and abroad and in terms of equipment. The candidate will have access to the GPU clusters of both the LIG and the LIA. Furthermore, access to the national supercomputer Jean-Zay will make it possible to run large-scale experiments.

The PhD position will be co-supervised by Mickael Rouvier (LIA, Avignon) and Benjamin Lecouteux and François Portet (Université Grenoble Alpes). Joint meetings are planned on a regular basis and the student is expected to spend time in both places. Moreover, the PhD student will collaborate with several team members involved in the project, in particular the two other PhD candidates who will be recruited and the partners from LIA, LIG and Dauphine Université PSL, Paris. Furthermore, the project will involve one of the founders of SpeechBrain, Titouan Parcollet, with whom the candidate will interact closely.

*INSTRUCTIONS FOR APPLYING*

Applications must contain a CV, a letter/message of motivation and Master's transcripts; applicants should be ready to provide letter(s) of recommendation. Applications must be addressed to Mickael Rouvier (mickael.rouvier(a)univ-avignon.fr), Benjamin Lecouteux (benjamin.lecouteux(a)univ-grenoble-alpes.fr) and François Portet (francois.Portet(a)imag.fr). We celebrate diversity and are committed to creating an inclusive environment for all employees.

*REFERENCES:*

[Noé2021] Noé, P.-G., Mohammadamini, M., Matrouf, D., Parcollet, T., Nautsch, A. & Bonastre, J.-F. Adversarial Disentanglement of Speaker Representation for Attribute-Driven Privacy Preservation in Proc. Interspeech 2021 (2021), 1902–1906.
[Sari2021] Sarı, L., Hasegawa-Johnson, M. & Yoo, C. D. Counterfactually Fair Automatic Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing 29, 3515–3525 (2021).
[Yue2021] Yue, X. & Li, H. Phonetically Motivated Self-Supervised Speech Representation Learning in Proc. Interspeech 2021 (2021), 746–750.
[Zaiem2022] Zaiem, S., Parcollet, T. & Essid, S. Pretext Tasks Selection for Multitask Self-Supervised Speech Representation in AAAI, The 2nd Workshop on Self-supervised Learning for Audio and Speech Processing, 2023 (2022).
[Zhao2017] Zhao, J., Wang, T., Yatskar, M., Ordonez, V. & Chang, K.-W. Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (2017), 2979–2989.
--
François PORTET
Professor - Univ Grenoble Alpes
Laboratoire d'Informatique de Grenoble - Équipe GETALP
Bâtiment IMAG - Office 333
700 avenue Centrale
Domaine Universitaire - 38401 St Martin d'Hères
FRANCE
Phone: +33 (0)4 57 42 15 44
Email: francois.portet@imag.fr
www: http://membres-liglab.imag.fr/portet/

[Corpora-List]NLP4CALL 2022 First call for papers
by David Alfter 20 Feb '23

==11th NLP4CALL, Louvain-la-Neuve, Belgium==

The workshop series on Natural Language Processing (NLP) for Computer-Assisted Language Learning (NLP4CALL) is a meeting place for researchers working on the integration of Natural Language Processing and Speech Technologies in CALL systems and exploring the theoretical and methodological issues arising in this connection. The latter includes, among others, drawing on insights from Second Language Acquisition (SLA) research, on the one hand, and promoting the development of "Computational SLA" through setting up Second Language research infrastructure(s), on the other.

The intersection of Natural Language Processing (or Language Technology / Computational Linguistics) and Speech Technology with Computer-Assisted Language Learning (CALL) brings "understanding" of language to CALL tools, thus making CALL intelligent. This fact has given the name to this area of research: Intelligent CALL, ICALL. As the definition suggests, apart from having excellent knowledge of Natural Language Processing and/or Speech Technology, ICALL researchers need good insights into second language acquisition theories and practices, as well as knowledge of second language pedagogy and didactics. This workshop therefore invites a wide range of ICALL-relevant research, including studies where NLP-enriched tools are used for testing SLA and pedagogical theories, and vice versa, where SLA theories, pedagogical practices or empirical data are modeled in ICALL tools. The NLP4CALL workshop series is aimed at bringing together competences from these areas for sharing experiences and brainstorming around the future of the field.
We welcome papers:
- that describe research directly aimed at ICALL;
- that demonstrate actual or discuss the potential use of existing Language and Speech Technologies or resources for language learning;
- that describe the ongoing development of resources and tools with potential usage in ICALL, either directly in interactive applications, or indirectly in materials, application or curriculum development, e.g. learning material generation, assessment of learner texts and responses, individualized learning solutions, provision of feedback;
- that discuss challenges and/or research agenda for ICALL;
- that describe empirical studies on language learner data.

This year a special focus is given to work done on second language vocabulary and grammar profiling, as well as the use of crowdsourcing for creating, collecting and curating data in NLP projects. We encourage paper presentations and software demonstrations describing the above-mentioned themes primarily, but not exclusively, for the Nordic languages.

==Invited speakers==

This year, we have the pleasure to announce two invited talks. The first talk is by Christopher Bryant from Reverso and the University of Cambridge. The second talk is by Marije Michel from the University of Amsterdam.

==Submission information==

Authors are invited to submit long papers (8-12 pages) or short papers (4-7 pages), page count not including references.

We will be using the NLP4CALL workshop template for the workshop this year. The author kit, including LaTeX and Microsoft Word templates, can be accessed here, or alternatively on Overleaf:
<https://spraakbanken.gu.se/sites/default/files/2022/NLP4CALL%20workshop%20t…>
<https://spraakbanken.gu.se/sites/default/files/2022/nlp4call%20template.doc>
<https://www.overleaf.com/latex/templates/nlp4call-workshop-template/qqqzqqy…>

Submissions will be managed through the electronic conference management system EasyChair <https://easychair.org/conferences/?conf=nlp4call2022>.
Papers must be submitted digitally through the conference management system, in PDF format. Final camera-ready versions of accepted papers will be given an additional page to address reviewer comments.

Papers should describe original unpublished work or work-in-progress. Papers will be peer reviewed by at least two members of the program committee in a double-blind fashion. All accepted papers will be collected into a proceedings volume to be submitted for publication in the NEALT Proceedings Series (Linköping Electronic Conference Proceedings) and, additionally, double-published through the ACL anthology, following experiences from the previous NLP4CALL editions (<https://www.aclweb.org/anthology/venues/nlp4call/>).

==Important dates==

7 October 2022: paper submission deadline
4 November 2022: notification of acceptance
25 November 2022: camera-ready papers for publication
9 December 2022: workshop date

==Organizers==

David Alfter (1,2), Elena Volodina (2), Thomas François (1), Piet Desmet (3), Frederik Cornillie (3), Arne Jönsson (4), Eveline Rennes (4)

(1) CENTAL, Institute for Language and Communication, Université Catholique de Louvain, Belgium
(2) Språkbanken, University of Gothenburg, Sweden
(3) Itec, Department of Linguistics at KU Leuven & imec, Belgium
(4) Department of Computer and Information Science, Linköping University, Sweden

==Contact==

For any questions, please contact David Alfter, david.alfter(a)uclouvain.be

For further information, see the workshop website <https://spraakbanken.gu.se/en/research/themes/icall/nlp4call-workshop-serie…>

Follow us on Twitter @NLP4CALL <https://twitter.com/NLP4CALL/>

David Alfter, PhD
Post-doctoral researcher
Institut Langage et communication, CENTAL
Université catholique de Louvain
Place Montesquieu, 3 (box L2.06.04)
1348 Louvain-la-Neuve

OFAI 2022 Lecture Series
by Tristan Miller 09 Nov '22

The Austrian Research Institute for Artificial Intelligence (OFAI) is delighted to announce its 2022 Lecture Series, featuring an eclectic lineup of internal and external speakers. The talks are intended to familiarize attendees with the latest research developments in AI and related fields (particularly computational linguistics and natural language processing), and to forge new connections with those working in other areas.

Most lectures (see prospective schedule below) will take place on Wednesdays at 18:30 Central European (Summer) Time. All lectures will be held online via Zoom; in-person attendance at OFAI Headquarters in Vienna is also possible for certain lectures. Attendance is open to the public and free of charge. No registration is required. Visit https://www.ofai.at/lectures for full details!

29 June: Scott Patterson (McGill University)
    Domesticating Wealth Inequality: Hybrid Discourse Analysis of UN General Assembly Speeches, 1971–2018
6 July: Pamela Breda (Independent artist)
    Feeling for Nonexistent Beings
13 July: Brigitte Krenn (OFAI)
    Robots as Social Agents: Between Construct and Reality
20 July: Tristan Miller (OFAI)
    What's in a Pun? Assessing the Relationship Between Phonological and Semantic Distance and Perceived Funniness of Punning Jokes
27 July: Katrien Beuls (Université de Namur)
    Unravelling the Computational Mechanisms Underlying the Emergence of Human-like Communication Systems in Populations of Autonomous Agents
7 September: Steffen Eger (Bielefeld University)
    Text Generation for the Humanities
14 September: Antti Arppe (University of Alberta)
    Finding Words that Aren't There: Using Word Embeddings to Improve Dictionary Search for Low-resource Languages
21 September: Roman Pflugfelder (AIT Austrian Institute of Technology)
    Title TBA
28 September: Raphael Deimel (TU Wien)
    Towards Intuitive Object Handovers Between Humans and Robots
5 October: Christoph Scheepers (University of Glasgow)
    The “Crossword Effect” in Free Word Recall: A Retrieval Advantage for Words Encoded in Line with their Spatial Associations
12 October: Karën Fort (Sorbonne Université)
    Title TBA
19 October: Benjamin Roth (University of Vienna)
    Evaluation and Learning with Structured Test Sets
25 October: Peter Hallman (OFAI)
    Comparatives in Arabic
2 November: Stephanie Gross (OFAI)
    Title TBA
9 November: Bernhard Pfahringer (University of Waikato)
    The World is not IID: Learning from Data Streams to the Rescue
16 November: Paolo Petta (OFAI)
    Title TBA
23 November: Robert Trappl (OFAI)
    Title TBA

--
Dr.-Ing. Tristan Miller, Research Scientist
Austrian Research Institute for Artificial Intelligence (OFAI)
Freyung 6/6, 1010 Vienna, Austria | Tel: +43 1 5336112 12
https://logological.org/ | https://punderstanding.ofai.at/

Journal of Biomedical Informatics (JBI) Special Issue on Semantics-enabled Biomedical Literature Analytics
by Halil Kilicoglu 07 Nov '22

*** Apologies for cross-posting ***

Call for Papers: Semantics-enabled Biomedical Literature Analytics

This Special Issue aims to highlight the development of novel informatics methods for *retrieval, indexing, and analysis of biomedical literature, focusing on semantics-based techniques*. We invite researchers working in biomedical informatics, knowledge representation/ontologies, information retrieval, natural language processing, artificial intelligence/machine learning, data mining, and other related areas to submit clear and detailed descriptions of their novel methodological results.

The topics of interest include but are not limited to:
- Knowledge representation and semantics for biomedical literature retrieval
- Biomedical ontologies in search
- Biomedical knowledge source integration
- Biomedical knowledge graph construction and embeddings
- Knowledge graphs in biomedical search
- Semantic knowledge in biomedical literature classification and ranking
- Biomedical information extraction
- Entity linking and semantic annotation in biomedical texts
- Literature-based knowledge discovery
- Semantics for biomedical knowledge synthesis and systematic literature review

All submitted papers must be original and will go through a rigorous peer-review process with at least two reviewers. Papers previously published in conference proceedings will not be considered. JBI’s editorial policy will be strictly followed by special issue reviewers. Note in particular that JBI emphasizes the publication of papers that introduce innovative and generalizable methods of interest to the informatics community. Specific applications can be described to motivate the methodology being introduced, but papers that focus solely on a specific application are not suitable for JBI.

*Submission Guidelines*

Authors must submit their papers via the online Editorial Manager (EES) at https://www.editorialmanager.com/jbi.
Authors should select “Semantics-enabled Biomedical Literature Analytics” as their submission category and note in a cover letter that their submission is for the “*Special Issue on Semantics-enabled Biomedical Literature Analytics.*” If the manuscript is not intended as an original research paper, the cover letter should also specify whether it is, rather, a *Methodological Review, Commentary, or Special Communication*.

Authors should make sure to place their work in the context of human-focused biomedical research or health care, and to review carefully the relevant literature. JBI’s editorial policy, and the types of articles that the journal publishes, are outlined under *Aims and Scope* on the journal home page at https://www.sciencedirect.com/journal/journal-of-biomedical-informatics (click on “View full Aims and Scope” for details). All submissions should follow the guidelines for authors at https://www.elsevier.com/journals/journal-of-biomedical-informatics/1532-0464/guide-for-authors, including format and manuscript structure.

*Important Dates*

Deadline for submissions: November 15, 2022
First-round review decisions: January 15, 2023
Deadline for revision submissions: February 15, 2023
Notification of final decisions: April 15, 2023

The full Call for Papers is available at https://doi.org/10.1016/j.jbi.2022.104134. Please direct any questions regarding the special issue to Dr. Halil Kilicoglu (halil(a)illinois.edu).
*Guest Editors:*

Halil Kilicoglu (University of Illinois Urbana-Champaign, halil(a)illinois.edu)
Faezeh Ensan (Ryerson University, fensan(a)ryerson.ca)
Bridget McInnes (Virginia Commonwealth University, bmtinnes(a)vcu.edu)
Lucy Lu Wang (University of Washington/Allen Institute for AI, lucylw(a)uw.edu)

--Halil

*HALIL KILICOGLU*
*Associate Professor*
School of Information Sciences
University of Illinois at Urbana-Champaign
halil(a)illinois.edu
https://ischool.illinois.edu/people/halil-kilicoglu

[Corpora-List][CfP] First Workshop on Information Extraction from Scientific Publications (WIESP) at AACL-IJCNLP 2022
by Tirthankar Ghosal 25 Aug '22

*** First Workshop on Information Extraction from Scientific Publications (WIESP) at AACL-IJCNLP 2022 ***
*** Website: https://ui.adsabs.harvard.edu/WIESP/ ***
Twitter: https://twitter.com/wiesp_nlp

The number of scientific papers published per year has exploded in recent years. Indexing the full text of articles in search engines helps discover and retrieve vital scientific information to continue building on the shoulders of giants, informing policy, and making evidence-based decisions. Nevertheless, it is difficult to navigate this ocean of data. Using simple string matching has substantial limitations: human language is ambiguous in nature, context matters, and we frequently use the same words and acronyms to represent a multitude of different meanings. Extracting structured and semantically relevant information from scientific publications (e.g., named-entity recognition, summarization, citation intention, linkage to knowledge graphs) allows for better selection and filtering of articles.

The First Workshop on Information Extraction from Scientific Publications (WIESP) will create the necessary forum to foster discussion and research using Natural Language Processing and Machine Learning.
WIESP will specifically focus on topics related to information extraction from scientific publications, including (but not limited to):
- Scientific document parsing
- Scientific named-entity recognition
- Scientific article summarization
- Question-answering on scientific articles
- Citation context/span extraction
- Structured information extraction from full-text, tables, figures, bibliography
- Novel datasets curated from scientific publications
- Argument extraction and mining
- Challenges in information extraction from scientific articles
- Building knowledge graphs via mining scientific literature; querying scientific knowledge graphs
- Novel tools for IE on scientific literature and interaction with users
- Mathematical information extraction
- Scientific concepts, facts extraction
- Visualizing scientific knowledge
- Bibliometric and Altmetric studies via information extraction from scientific articles and metadata
- Information extraction from COVID-19 articles to inform public health policy

In addition to research paper presentations, WIESP will also feature keynote talks, a panel discussion, and a shared task. We will update the details on our website as and when they become available.

We especially welcome participation from academic and research institutions, government and industry labs, publishers, and information service providers. Projects and organizations using NLP/ML techniques in their text mining and enrichment efforts are also welcome to participate.

***Call for Papers***

We invite papers of the following categories:

***Long papers*** must describe substantial, original, completed, and unpublished work. Wherever appropriate, concrete evaluation and analysis should be included. Papers must not exceed eight (8) pages of content, plus unlimited pages of references. The final versions of long papers will be given one additional page of content (up to 9 pages) so that reviewers' comments can be taken into account.
***Short papers*** must describe original and unpublished work. Please note that a short paper is not a shortened long paper. Instead, short papers should have a point that can be made in a few pages, such as a small, focused contribution, a negative result, or an interesting application nugget. Short papers must not exceed four (4) pages, plus unlimited pages of references. The final versions of short papers will be given one additional page of content (up to 5 pages) so that reviewers' comments can be taken into account.

***Position papers*** will give voice to authors who wish to take a position on a topic listed above or the field of scholarly information extraction. Submissions need not present original work and should be two to four pages in length, including title, text, figures and tables, and references.

***Demo papers*** should be no more than four (4) pages in length, including references, and should describe implemented systems that are of relevance to the theme of the workshop. Authors of demo papers should be willing to present a demo of their system during WIESP at AACL-IJCNLP 2022.

***Extended Abstracts*** We welcome submissions of extended abstracts (2 pages max) related to the research topics mentioned above. Submissions may include previously published results, late-breaking results, or a description of ongoing projects in the broad field of information extraction and mining from scientific publications. Extended abstracts can also summarize existing work, work in progress, or a collection of works under a unified theme (e.g., a series of closely related papers that build on each other or tackle a common problem).

***Shared Task: Detecting Entities in the Astrophysics Literature (DEAL)***

A good amount of astrophysics research makes use of data coming from missions and facilities such as ground observatories in remote locations or space telescopes, as well as digital archives that hold large amounts of observed and simulated data.
These missions and facilities are frequently named after historical figures or use some ingenious acronym which, unfortunately, can be easily confused when searching for them in the literature via simple string matching. For instance, Planck can refer to the person, the mission, the constant, or several institutions. Automatically recognizing entities such as missions or facilities would help tackle this word sense disambiguation problem.

The shared task consists of Named Entity Recognition (NER) on samples of text extracted from astrophysics publications. The labels were created by domain experts and designed to identify entities of interest to the astrophysics community. They range from simple to detect (e.g., URLs) to highly unstructured (e.g., Formula), and from useful to researchers (e.g., Telescope) to more useful to archivists and administrators (e.g., Grant). Overall, 31 different labels are included, and their distribution is highly unbalanced (e.g., ~100x more Citations than Proposals).

Submissions will be scored using both the CoNLL-2000 shared task seqeval F1-Score at the entity level and scikit-learn's Matthews correlation coefficient method at the token level. We also encourage authors to propose their own evaluation metrics. A sample dataset and more instructions can be found at: https://ui.adsabs.harvard.edu/WIESP/2022/SharedTasks

Participants (individuals or groups) will have the opportunity to present their findings during the workshop and write a short paper. The best-performing or most interesting approaches might be invited to further collaborate with the NASA Astrophysical Data System (https://ui.adsabs.harvard.edu/).
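To make the two scoring levels named above more concrete, here is a minimal, standard-library-only sketch of an entity-level F1 and a token-level multiclass Matthews correlation coefficient. It is not the organizers' scoring script: in practice one would presumably call `seqeval.metrics.f1_score` and `sklearn.metrics.matthews_corrcoef` directly. The BIO tagging scheme, the toy sentences and the simplified span extraction (which ignores an `I-` tag that does not follow a matching tag) are assumptions made for the example; only the label names Telescope and Grant come from the call.

```python
from collections import Counter
from math import sqrt


def extract_entities(tags):
    """Return a set of (label, start, end) spans from a BIO tag sequence."""
    entities, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        inside = tag.startswith("I-") and label == tag[2:]
        if label is not None and not inside:
            entities.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return entities


def entity_f1(gold_seqs, pred_seqs):
    """Micro-averaged F1 where an entity counts only if label and span match."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_entities(gold), extract_entities(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


def token_mcc(gold_seqs, pred_seqs):
    """Multiclass Matthews correlation coefficient over all tokens."""
    gold = [t for seq in gold_seqs for t in seq]
    pred = [t for seq in pred_seqs for t in seq]
    n = len(gold)
    correct = sum(g == p for g, p in zip(gold, pred))
    true_counts, pred_counts = Counter(gold), Counter(pred)
    labels = set(true_counts) | set(pred_counts)
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in labels)
    cov_tt = n * n - sum(c * c for c in true_counts.values())
    cov_pp = n * n - sum(c * c for c in pred_counts.values())
    denom = sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0


# Hypothetical toy data: the prediction misses the single Grant entity.
gold = [["B-Telescope", "I-Telescope", "O", "B-Grant"], ["O", "B-Telescope", "O"]]
pred = [["B-Telescope", "I-Telescope", "O", "O"], ["O", "B-Telescope", "O"]]

f1 = entity_f1(gold, pred)
mcc = token_mcc(gold, pred)
print(f"entity F1 = {f1:.3f}, token MCC = {mcc:.3f}")
```

The two numbers can diverge noticeably on unbalanced label sets: the entity-level score penalizes any boundary error on a whole span, while the token-level coefficient accounts for the dominant `O` class.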
***Important Dates***

- Paper/Abstract Submission Deadline: August 25, 2022
- Notification of workshop paper/abstract acceptance: September 25, 2022
- Camera-ready Submission Deadline: October 10, 2022
- Workshop: November 20, 2022 (online)

***All submission deadlines are 11.59 pm UTC -12h ("Anywhere on Earth")***

***Submission Website and Format***

Submission Link: softconf.com/aacl2022/WIESP

Submission will be via softconf. Submissions should follow the ACLPUB formatting guidelines (https://acl-org.github.io/ACLPUB/formatting.html) and template files (https://github.com/acl-org/acl-style-files/tree/master).

Submissions (Long and Short Papers) will be subject to a double-blind peer-review process. Position papers, Demo papers, and Extended Abstracts need not be anonymized. The authors will present accepted papers at the workshop either as a talk or a poster. All accepted papers will be published in the workshop proceedings.

We follow the same policies as AACL-IJCNLP 2022 regarding preprints and double submissions. The anonymity period for WIESP 2022 is from July 15 to September 25.

***Organizers***

- Tirthankar Ghosal, Charles University, CZ
- Sergi Blanco-Cuaresma, Center for Astrophysics | Harvard & Smithsonian, USA
- Alberto Accomazzi, Center for Astrophysics | Harvard & Smithsonian, USA
- Robert M. Patton, Oak Ridge National Laboratory, USA
- Felix Grezes, Center for Astrophysics | Harvard & Smithsonian, USA
- Thomas Allen, Center for Astrophysics | Harvard & Smithsonian, USA

--
Tirthankar Ghosal
Researcher at UFAL, Charles University, CZ
https://member.acm.org/~tghosal

[Corpora-List]CfP: Special issue on Cumulative knowledge building and replication in Learner Corpus Research
by Tove E Larsson 23 Aug '22

Call for papers for the International Journal of Learner Corpus Research (Benjamins): Special issue on Cumulative knowledge building and replication in Learner Corpus Research

Guest editors: Tove Larsson & Doug Biber (Northern Arizona University)

Compared to other subfields of linguistics, Learner Corpus Research (LCR) has a relatively short history. For this and other reasons, most of the studies that get published in the field are exploratory in nature and focus on topics that have yet to receive prolonged attention. Such studies no doubt make valuable contributions to the field. However, LCR is arguably mature enough as a field to also have accumulated enough knowledge on certain topics for researchers to be able to instead adopt a cumulative approach.

In the cumulative approach to knowledge building, individual studies are viewed as building blocks, carefully pieced together to help us form an increasingly better understanding of a topic. There are three distinguishing characteristics of this approach: First, the literature review focuses on what we have actually learned from previous research on the topic, rather than merely cataloging individual studies. Second, the research ‘gap’ refers to an important missing element in our cumulative knowledge, rather than to a research angle that has not been explored yet; that is, the literature review is used to identify a missing piece in an existing puzzle, rather than to justify starting a new one. And finally, results of the new study are explicitly compared to previous findings, to discuss the state of our knowledge based on all studies taken together.

Through this big-picture thinking, we can collectively refine our understanding of the topic, and further our knowledge in a systematic manner. Put differently, this approach enables us to build a state-of-the-art in the field by moving beyond the results of individual studies.
With this call, we invite studies of two kinds: * Empirical studies that set out to test hypotheses arrived at from an existing body of research with the explicit aim of adding to our knowledge on a given topic that has received ample attention in LCR. Examples of topics that may be ripe for studies of this kind include, but are not limited to, linguistic complexity and the formulaic nature of learner language. * Empirical studies that replicate findings from an existing body of research and, importantly, that focus on strengthening and/or tweaking existing generalizations in LCR. Examples of topics include, but are not limited to, claims of the spoken-like nature of learners’ written production. Timeline: * August 1, 2022: Abstract and title due * September 1, 2022: Authors are notified * September 1, 2023: Full manuscript due Please send submissions to tove.larsson(a)nau.edu<mailto:tove.larsson@nau.edu> --- Tove Larsson, Ph.D. Assistant Professor of Applied Linguistics English Department Northern Arizona University https://tovelarssoncl.wordpress.com

Invitation of paper submissions to special issue “Mathematical and Computational Modeling of Language and Social Behaviors” in Mathematics (IF=2.592, Q1)
by WAN, Mingyu [CBS] 18 Aug '22

Dear All,

We are the guest editors of the special issue “Mathematical and Computational Modeling of Language and Social Behaviors” in Mathematics (https://www.mdpi.com/journal/mathematics) (IF=2.592, Q1). We would like to invite submissions to this special issue from researchers whose interests include computational linguistics and related areas.

Deadline for manuscript submissions: 30 June 2023.

The aim of the special issue is to highlight the contributions of quantitative modeling and NLP technology to understanding collective human behaviors and to help resolve some of the greatest challenges of our time. We welcome new or improved methods to model linked data from heterogeneous sources and their computational application to real-world problems relating to languages and social behaviors. Topics of interest include Sentiment and/or Emotion Analysis, fake news detection, FinNLP and Medical Informatics.

Details about the special issue are available at: https://www.mdpi.com/si/mathematics/Mathe_Compu_NLP

We look forward to your submissions and contributions to this special issue. Thank you very much!

Best,
Clara (on behalf of the Guest Editors)

[OM-2022] 2nd CFP: 17th workshop on Ontology Matching collocated with ISWC
by Cassia TROJAHN 04 Aug '22

** With apologies for multiple posting **

The Seventeenth International Workshop on ONTOLOGY MATCHING (OM-2022)
http://om2022.ontologymatching.org/
October 23rd or 24th, 2022, International Semantic Web Conference (ISWC) Workshop Program, Hybrid conference, Hangzhou, China

BRIEF DESCRIPTION AND OBJECTIVES

Ontology matching is a key interoperability enabler for the Semantic Web, as well as a useful technique in some classical data integration tasks dealing with the semantic heterogeneity problem. It takes ontologies as input and determines as output an alignment, that is, a set of correspondences between the semantically related entities of those ontologies. These correspondences can be used for various tasks, such as ontology merging, data interlinking, query answering or navigation over knowledge graphs. Thus, matching ontologies enables the knowledge and data expressed with the matched ontologies to interoperate.

The workshop has three goals:

1. To bring together leaders from academia, industry and user institutions to assess how academic advances are addressing real-world requirements. The workshop will strive to improve academic awareness of industrial and final user needs, and therefore, direct research towards those needs. Simultaneously, the workshop will serve to inform industry and user representatives about existing research efforts that may meet their requirements. The workshop will also investigate how the ontology matching technology is going to evolve, especially with respect to data interlinking, knowledge graph and web table matching tasks.
2. To conduct an extensive and rigorous evaluation of ontology matching and instance matching (link discovery) approaches through the OAEI (Ontology Alignment Evaluation Initiative) 2022 campaign: http://oaei.ontologymatching.org/2022/
3. To examine similarities and differences from other, old, new and emerging, techniques and usages, such as web table matching or knowledge embeddings.

TOPICS of interest include but are not limited to:

Business and use cases for matching (e.g., big, open, closed data); Requirements to matching from specific application scenarios (e.g., public sector); Application of matching techniques in real-world scenarios (e.g., in cloud, with mobile apps); Formal foundations and frameworks for matching; Novel matching methods, including link prediction, ontology-based access; Matching and knowledge graphs; Matching and deep learning; Matching and embeddings; Matching and big data; Matching and linked data; Instance matching, data interlinking and relations between them; Privacy-aware matching; Process model matching; Large-scale and efficient matching techniques; Matcher selection, combination and tuning; User involvement (including both technical and organizational aspects); Explanations in matching; Social and collaborative matching; Uncertainty in matching; Expressive alignments; Reasoning with alignments; Alignment coherence and debugging; Alignment management; Matching for traditional applications (e.g., data science); Matching for emerging applications (e.g., web tables, knowledge graphs).

SUBMISSIONS

Contributions to the workshop can be made in terms of technical papers and posters/statements of interest addressing different issues of ontology matching as well as participating in the OAEI 2022 campaign. Long technical papers should be of max. 12 pages. Short technical papers should be of max. 5 pages. Posters/statements of interest should not exceed 2 pages. All contributions have to be prepared using the LNCS Style: http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0 and should be submitted in PDF format (no later than August 9th, 2022) through the workshop submission site at: https://www.easychair.org/conferences/?conf=om2022

Contributors to the OAEI 2022 campaign have to follow the campaign conditions and schedule at http://oaei.ontologymatching.org/2022/.

DATES FOR TECHNICAL PAPERS AND POSTERS:

August 9th, 2022: Deadline for the submission of papers.
September 6th, 2022: Deadline for the notification of acceptance/rejection.
September 20th, 2022: Workshop camera ready copy submission.
October 23rd or 24th, 2022: OM-2022, hybrid conference, Hangzhou, China.

Contributions will be refereed by the Program Committee. Accepted papers will be published in the workshop proceedings as a volume of CEUR-WS as well as indexed on DBLP.

ORGANIZING COMMITTEE

1. Pavel Shvaiko (main contact), Trentino Digitale, Italy
2. Jérôme Euzenat, INRIA & Univ. Grenoble Alpes, France
3. Ernesto Jiménez-Ruiz, City, University of London, UK & SIRIUS, University of Oslo, Norway
4. Oktie Hassanzadeh, IBM Research, USA
5. Cássia Trojahn, IRIT, France

PROGRAM COMMITTEE (to be completed):

Alsayed Algergawy, Jena University, Germany
Manuel Atencia, Universidad de Málaga, Spain
Jiaoyan Chen, University of Oxford, UK
Jérôme David, University Grenoble Alpes & INRIA, France
Gayo Diallo, University of Bordeaux, France
Daniel Faria, Instituto Gulbenkian de Ciência, Portugal
Alfio Ferrara, University of Milan, Italy
Marko Gulic, University of Rijeka, Croatia
Wei Hu, Nanjing University, China
Ryutaro Ichise, National Institute of Informatics, Japan
Antoine Isaac, Vrije Universiteit Amsterdam & Europeana, Netherlands
Naouel Karam, Fraunhofer, Germany
Prodromos Kolyvakis, EPFL, Switzerland
Patrick Lambrix, Linköpings Universitet, Sweden
Oliver Lehmberg, University of Mannheim, Germany
Fiona McNeill, University of Edinburgh, UK
Majid Mohammadi, Eindhoven University of Technology, Netherlands
Hoa Ngo, CSIRO, Australia
George Papadakis, University of Athens, Greece
Henry Rosales-Méndez, University of Chile, Chile
Booma Sowkarthiga, Microsoft, USA
Kavitha Srinivas, IBM, USA
Ludger van Elst, DFKI, Germany
Xingsi Xue, Fujian University of Technology, China
Ondrej Zamazal, Prague University of Economics, Czech Republic
Songmao Zhang, Chinese Academy of Science, China
Lu Zhou, TigerGraph, USA

---------------------
Best regards,
Cassia

CFP - The Sixth Widening Natural Language Processing Workshop (WiNLP)
by haddad.hatem＠gmail.com 31 Jul '22

WiNLP 2022 – Call for Submissions
http://www.winlp.org/winlp22-call-for-papers/

Workshop Date: December 7th or 8th, 2022 (date TBD)
EMNLP 2022, Abu Dhabi, UAE

The Sixth Widening Natural Language Processing Workshop (WiNLP) will be held in conjunction with EMNLP 2022 in Abu Dhabi, UAE. Since EMNLP is anticipating a hybrid format for their conference, we also anticipate our workshop will be hybrid, with both online and in-person attendees. The one-day workshop will occur during EMNLP’s workshop period either December 7th or 8th, 2022 (date TBD). We invite authors from underrepresented groups in Natural Language Processing (NLP) to submit a two-page abstract to be considered for a poster presentation at our workshop.

Important Dates:

Last date to join author workshopping: August 24, 2022
Submission deadline: September 7, 2022
Acceptance notification: October 9, 2022
Travel grant applications due: October 21, 2022
Travel grant notification: October 25, 2022

Workshop Description

The WiNLP workshop is open to all to foster an inclusive and welcoming ACL environment. It aims to promote diversity and highlight the work of underrepresented groups in NLP: anyone who self-identifies within an underrepresented group [based on gender, ethnicity, nationality, sexual orientation, disability status, or otherwise] is invited to submit a two-page abstract for a poster presentation. In our 2022 iteration, we hope to be more intentional about centering discussions of access and disability, as well as contributing to diversity in scientific background, discipline, training, obtained degrees, seniority, and communities from underrepresented languages.

The full-day event includes invited talks, oral presentations, and poster sessions. The workshop provides an excellent opportunity for junior members in the community to showcase their work and connect with senior mentors for feedback and career advice. It also offers recruitment opportunities with leading industrial labs. Most importantly, the workshop will provide an inclusive and accepting space, and work to lower structural barriers to joining and collaborating with the NLP community at large.

Submission guidelines

While everyone is encouraged to attend, the opportunity to present a talk or poster is intended for members of underrepresented groups at all career levels: students, post-docs, professors, and other researchers. Since many submissions are works in progress, we act as a non-archival repository for these works: while authors may elect to have their papers linked from our website, they will not be archived in the ACL Anthology. Authors may elect to not have their submission listed on the website if they wish to avoid de-anonymizing themselves for later submissions to other venues.

Submissions should be two pages long (not including references). Authors must use the ACL Rolling Review style files to format their submission, and must submit it electronically in PDF format via the WiNLP 2022 online submission system: https://softconf.com/emnlp2022/WiNLP22/

Travel Support

There will be a limited number of travel grants and/or additional funding to cover expenses, similar to the previous editions. Funding is available for travel, lodging, registration, and visa costs for one author for each submission. The funded author may elect to attend virtually if they prefer. The selected author should be identified as part of the travel grant submission form. If we find ourselves with extra funds, we will attempt to support further funding for virtual attendance for additional authors, but we do not guarantee we can support any further in-person attendance. We recommend additional student authors keep an eye out for the EMNLP call for student volunteers or call for D&I subsidies as opportunities for further funding.

For further details please visit our website: http://www.winlp.org/winlp22-call-for-papers/

[OM-2022] Final CFP: 17th wshop on Ontology Matching collocated with ISWC: the submission deadline is approaching on Aug. 9th
by Cassia TROJAHN 29 Jul '22

The Seventeenth International Workshop on ONTOLOGY MATCHING (OM-2022)
http://om2022.ontologymatching.org/
October 23rd or 24th, 2022, International Semantic Web Conference (ISWC) Workshop Program, Hybrid conference, Hangzhou, China

=====================================================================
The submission deadline for tech. papers is approaching in 2 weeks on Aug. 9th: https://www.easychair.org/conferences/?conf=om2022
=====================================================================

BRIEF DESCRIPTION AND OBJECTIVES

Ontology matching is a key interoperability enabler for the Semantic Web, as well as a useful technique in some classical data integration tasks dealing with the semantic heterogeneity problem. It takes ontologies as input and determines as output an alignment, that is, a set of correspondences between the semantically related entities of those ontologies. These correspondences can be used for various tasks, such as ontology merging, data interlinking, query answering or navigation over knowledge graphs. Thus, matching ontologies enables the knowledge and data expressed with the matched ontologies to interoperate.

The workshop has three goals:

1. To bring together leaders from academia, industry and user institutions to assess how academic advances are addressing real-world requirements. The workshop will strive to improve academic awareness of industrial and final user needs, and therefore, direct research towards those needs. Simultaneously, the workshop will serve to inform industry and user representatives about existing research efforts that may meet their requirements. The workshop will also investigate how the ontology matching technology is going to evolve, especially with respect to data interlinking, knowledge graph and web table matching tasks.
2. To conduct an extensive and rigorous evaluation of ontology matching and instance matching (link discovery) approaches through the OAEI (Ontology Alignment Evaluation Initiative) 2022 campaign: http://oaei.ontologymatching.org/2022/
3. To examine similarities and differences from other, old, new and emerging, techniques and usages, such as web table matching or knowledge embeddings.

TOPICS of interest include but are not limited to:

Business and use cases for matching (e.g., big, open, closed data); Requirements to matching from specific application scenarios (e.g., public sector); Application of matching techniques in real-world scenarios (e.g., in cloud, with mobile apps); Formal foundations and frameworks for matching; Novel matching methods, including link prediction, ontology-based access; Matching and knowledge graphs; Matching and deep learning; Matching and embeddings; Matching and big data; Matching and linked data; Instance matching, data interlinking and relations between them; Privacy-aware matching; Process model matching; Large-scale and efficient matching techniques; Matcher selection, combination and tuning; User involvement (including both technical and organizational aspects); Explanations in matching; Social and collaborative matching; Uncertainty in matching; Expressive alignments; Reasoning with alignments; Alignment coherence and debugging; Alignment management; Matching for traditional applications (e.g., data science); Matching for emerging applications (e.g., web tables, knowledge graphs).

SUBMISSIONS

Contributions to the workshop can be made in terms of technical papers and posters/statements of interest addressing different issues of ontology matching as well as participating in the OAEI 2022 campaign. Long technical papers should be of max. 12 pages. Short technical papers should be of max. 5 pages. Posters/statements of interest should not exceed 2 pages. All contributions have to be prepared using the LNCS Style: http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0 and should be submitted in PDF format (no later than August 9th, 2022) through the workshop submission site at: https://www.easychair.org/conferences/?conf=om2022

Contributors to the OAEI 2022 campaign have to follow the campaign conditions and schedule at http://oaei.ontologymatching.org/2022/.

DATES FOR TECHNICAL PAPERS AND POSTERS:

August 9th, 2022: Deadline for the submission of papers.
September 6th, 2022: Deadline for the notification of acceptance/rejection.
September 20th, 2022: Workshop camera ready copy submission.
October 23rd or 24th, 2022: OM-2022, hybrid conference, Hangzhou, China.

Contributions will be refereed by the Program Committee. Accepted papers will be published in the workshop proceedings as a volume of CEUR-WS as well as indexed on DBLP.

ORGANIZING COMMITTEE

1. Pavel Shvaiko (main contact), Trentino Digitale, Italy
2. Jérôme Euzenat, INRIA & Univ. Grenoble Alpes, France
3. Ernesto Jiménez-Ruiz, City, University of London, UK & SIRIUS, University of Oslo, Norway
4. Oktie Hassanzadeh, IBM Research, USA
5. Cássia Trojahn, IRIT, France

PROGRAM COMMITTEE:

Alsayed Algergawy, Jena University, Germany
Manuel Atencia, Universidad de Málaga, Spain
Jiaoyan Chen, University of Oxford, UK
Jérôme David, University Grenoble Alpes & INRIA, France
Gayo Diallo, University of Bordeaux, France
Daniel Faria, Instituto Gulbenkian de Ciência, Portugal
Alfio Ferrara, University of Milan, Italy
Marko Gulic, University of Rijeka, Croatia
Wei Hu, Nanjing University, China
Ryutaro Ichise, National Institute of Informatics, Japan
Antoine Isaac, Vrije Universiteit Amsterdam & Europeana, Netherlands
Naouel Karam, Fraunhofer, Germany
Prodromos Kolyvakis, EPFL, Switzerland
Patrick Lambrix, Linköpings Universitet, Sweden
Oliver Lehmberg, University of Mannheim, Germany
Fiona McNeill, University of Edinburgh, UK
Majid Mohammadi, Eindhoven University of Technology, Netherlands
Hoa Ngo, CSIRO, Australia
George Papadakis, University of Athens, Greece
Henry Rosales-Méndez, University of Chile, Chile
Booma Sowkarthiga, Microsoft, USA
Kavitha Srinivas, IBM, USA
Ludger van Elst, DFKI, Germany
Xingsi Xue, Fujian University of Technology, China
Ondrej Zamazal, Prague University of Economics, Czech Republic
Songmao Zhang, Chinese Academy of Science, China
Lu Zhou, TigerGraph, USA

-------------------------------------------------------
More about ontology matching:
http://www.ontologymatching.org/
http://book.ontologymatching.org/
-------------------------------------------------------

Best regards,
Cassia Trojahn
