MATCHING WORKSHOP @ ACL 2023
https://megagon.ai/matching-2023/
*Important Dates*
Direct submission via OpenReview: April 24, 2023
Submission via ARR (commitment deadline): May 8, 2023
Notification of Acceptance: May 22, 2023
Camera-ready papers due: June 6, 2023
Pre-recorded video due: June 12, 2023
MATCHING Workshop @ACL2023 (Toronto and remote): July 13
Matching Entities from structured and unstructured sources is an important
task in many domains and applications such as HR and E-commerce. For
example, in HR platforms/services, it is important to match resumes to job
descriptions and job seekers to companies. Similarly in web
platforms/services, it is important to match customers to businesses such
as hotels and restaurants, among others. In such domains, it is also
relevant to match “textual customer reviews” to customer queries, and
sentences (or phrases) as answers to customer questions. Recent advances in
Natural Language Processing, Natural Language Understanding, Conversational
AI, Language Generation, Machine Learning, Deep Learning, Data Management,
Information Extraction, Knowledge Bases/Graphs, (Multi/Single
Hop/Commonsense) Inference/Reasoning, Recommendation Systems, and others,
have demonstrated promising results in different Matching tasks related
(but not limited) to the previously mentioned domains. We believe that
there is tremendous opportunity to further exploit and explore the use of
advanced NLP (and language related) techniques applied to Matching tasks.
Therefore, the goal of this workshop is to bring together the research
communities (from academia and industry) of these related areas that are
interested in the development and the application of novel
natural-language-based approaches/models/systems to address challenges
around different Matching tasks.
*Confirmed Invited Speakers*
Alan Ritter - Georgia Tech
Ndapa Nakashole - UC San Diego
William Cohen - Google
We invite submissions of long and short papers on original and unpublished
work that addresses challenges around the specific task of matching
information from heterogeneous sources spanning structured (e.g.,
databases, resumes) and unstructured (e.g., online reviews, job
advertisements, social media posts) data. The topics include but are not
limited to:
* Entity Alignment;
* Entity Matching;
* Entity Linking;
* Language-Model-based Matching;
* Data-driven Matching;
* Knowledge-graph-based matching;
* Rule-based Matching;
* Temporal Information Matching;
* Human-Centric AI for Matching;
* Trust, Explainability, and Fairness in Matching;
* Information extraction from user-generated text;
* Robustness to noise;
* Matching applications: recommendations, question answering, fact
checking, news and social media content identification/suggestion, etc.
All machine learning, text mining, and natural language processing
techniques are welcome. All regular papers and short papers should follow
the ACL 2023 style guidelines:
“Both long and short papers must follow the ACL 2023 two-column format,
using the supplied official style files. The templates can be downloaded in
Style Files and Formatting. Please do not modify these style files, nor
should you use templates designed for other conferences. Submissions that
do not conform to the required styles, including paper size, margin width,
and font size restrictions, will be rejected without review.”
The maximum length of a regular paper is 8 pages plus an unlimited number
of pages for references. The maximum length of a short paper is 4 pages
plus an unlimited number of pages for references. At least one author of
every accepted paper is expected to attend the workshop.
MATCHING is using a hybrid submission process. Authors can submit their
papers using the OpenReview platform
(https://openreview.net/group?id=aclweb.org/ACL/2023/Workshop/MATCHING&refer…).
Alternatively, authors can also commit papers and reviews from ARR. We
allow parallel commitment to the EACL 2023 conference and our workshop,
with the requirement that if the paper is accepted at EACL 2023, it will be
withdrawn from archival publication at the workshop. We ask the authors to
notify MATCHING if they have also committed to EACL or other workshops.
All accepted short and long papers must be presented as a talk/poster/demo
at the workshop, depending on the workshop schedule. At least one author of
each accepted paper must register for ACL 2023 and attend the workshop.
*Mentorship Program*
We will be hosting a mentorship program to facilitate exchange between
authors and experts working in areas relevant to the workshop. The goal of
this program is to foster collaborations and increase the quality and
impact of the submissions. Mentors are expected to guide mentees during the
mentorship period as they prepare submissions for this workshop. Their
interactions may include an in-depth discussion of related work to ensure
submissions are well contextualized, discussions on writing and
presentation of the submission, and discussions about possible extensions
of the submissions. Mentees are expected to prepare the submissions by
February 28, 2023 and contact their assigned mentor. Mentors and mentees
are encouraged to dedicate at least 4 hours over the course of the program to
maximize the benefits of the program. They can meet virtually within the
first week after the mentor-mentee matching is made and set expectations
for subsequent meetings. Their efforts should culminate in a final version
of the paper that should be submitted by the deadline.
Applications to the mentorship program are due February 28, 2023.
Application to be a mentor/mentee: Apply Here:
*Organizers:*
Estevam Hruschka - Megagon Labs
Tom Mitchell - Carnegie Mellon University
Dunia Mladenić - Jozef Stefan Institute (JSI)
Marko Grobelnik - Jozef Stefan Institute (JSI)
Sajjadur Rahman - Megagon Labs
*Contact*
If you have any questions or inquiries regarding the workshop or need
further information, please do not hesitate to send an email to
matching-workshop(a)megagon.ai
--
Estevam Hruschka
Lab Director and Staff Research Scientist
Megagon Labs - www.megagon.ai
The Third Workshop on NLP for Indigenous Languages of the Americas
(AmericasNLP 2023)
First Call for Papers
The Third Workshop on NLP for Indigenous Languages of the Americas
(AmericasNLP) will be co-located with the 61st Annual Meeting of the
Association for Computational Linguistics (ACL 2023
<https://2023.aclweb.org/>), which is scheduled to be held in Toronto,
Canada, between July 9-14, 2023.
The goal of the workshop is to encourage and increase the visibility of
work on the Indigenous languages of the Americas. It aims to encourage
research on NLP, computational linguistics, corpus linguistics and speech
for Indigenous languages, to connect researchers and professionals from
underrepresented communities and native speakers of endangered languages
with the ACL community, and, more generally, to promote machine learning
approaches suitable for low-resource languages.
We invite the submission of
- Long papers (8 pages) and short papers (4 pages) on substantial,
  original, and unpublished research
- Non-archival extended abstracts (2 pages), technical reports (8 pages),
  and work which has been presented at other venues (in the format of the
  original publication)
Submissions do not need to describe work on native languages directly, as
long as it is clear why those can benefit from the described approaches.
Areas of interest include but are not limited to:
- Creation of datasets for NLP applications
- Incorporation of external knowledge into neural systems
- Linguistic typology and the use of typological features for NLP
- Transfer learning, meta-learning, and active learning
- Weakly supervised, semi-supervised, and unsupervised learning
- Machine translation of low-resource languages
- Morphology and phonology of low-resource languages
- NLP applications for Indigenous languages of the Americas
Important dates:
- Start of the anonymity period: March 15, 2023
- Submission deadline: April 15, 2023
- Notification of acceptance: May 15, 2023
- Camera-ready papers due: May 26, 2023
- Workshop: July 14, 2023
All deadlines are 11:59 pm UTC-12 (Anywhere on Earth).
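For authors converting these deadlines to their own time zone, the arithmetic can be sketched in Python as below. This is only an illustration: the function name `aoe_deadline_in_utc` is hypothetical, and the sketch assumes the deadline falls at 11:59 pm in a fixed UTC-12 offset, as stated above.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" modelled as a fixed UTC-12 offset (assumption:
# no daylight-saving adjustments apply to this offset).
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_in_utc(year: int, month: int, day: int) -> datetime:
    """Return 11:59 pm AoE on the given date, expressed in UTC."""
    local_deadline = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local_deadline.astimezone(timezone.utc)


# Submission deadline from the list above: April 15, 2023.
submission_utc = aoe_deadline_in_utc(2023, 4, 15)
print(submission_utc.isoformat())  # 2023-04-16T11:59:00+00:00
```

Calling `.astimezone()` with another `tzinfo` on the result gives the deadline in any other zone.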
Link to submission portal:
https://softconf.com/acl2023/AmericasNLP2023/
The workshop also includes:
- An open-ended machine translation shared task on truly low-resource
  languages
- A mentoring program to support students and newcomers from
  underrepresented communities (application form:
  https://forms.gle/afBWauDfDQijXHTy9)
We also have a diverse set of invited speakers, focused on bridging the gap
between linguists, NLP, and machine learning research!
- Steven Bird (linguistics; ethics)
- Emiliana Cruz Cruz (linguistics; anthropology; education)
- Angela Fan (NLP; machine translation)
- Kristine Stenzel (field linguistics; American Indigenous languages)
Organizing Committee
- Manuel Mager, AWS AI Labs
- Arturo Oncevay, University of Edinburgh
- Enora Rice, University of Colorado Boulder
- Abteen Ebrahimi, University of Colorado Boulder
- Shruti Rijhwani, Google Research
- Alexis Palmer, University of Colorado Boulder
- Katharina Kann, University of Colorado Boulder
More information and contact information can be found at
http://turing.iimas.unam.mx/americasnlp/.
--
Dr. Katharina Kann
Assistant Professor of Computer Science
University of Colorado Boulder
Personal page: https://kelina.github.io
Group page: https://nala-cub.github.io
4th Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech2023)
================================================================================
The PatentSemTech2023 workshop aims to establish a long-term collaboration and a
two-way communication channel between the IP industry and academia from relevant
fields such as natural language processing (NLP), text and data mining (TDM), and
semantic technologies (ST) in order to explore and transfer new knowledge, methods,
and technologies for the benefit of industrial applications as well as support
research in applied sciences for the IP and neighbouring domains.
Call for Contributions
======================
PatentSemTech2023 will be held as a full-day event in conjunction with SIGIR 2023.
Workshop website: http://ifs.tuwien.ac.at/patentsemtech/
Important Dates
===============
Submission deadline: April 25, 2023
Notification: May 23, 2023
SIGIR PatentSemTech2023 workshop: July 27, 2023
Topics of Interest
==================
We encourage submissions of high quality research papers on all topics related
to the IP domain. Topics of interest include (but are not limited to):
* Text mining and retrieval from patents, legal documents, or other
scientific-technical information sources
* Machine learning methods, in particular deep learning methods for
- Representation learning (word and document embeddings)
- Query expansion
- Clustering and classification
- Recommendation
- IPC/CPC prediction
- Trend detection
- Entity extraction
* Semantic approaches for
- Linking semantic information
- Integrating external knowledge sources
- Semantic enrichment
* Methods and applications for retrieving, mining, and analysing, including
- Patent landscaping
- Hot spot / White spot analysis
- Multi-modal analysis
- Technology trend analysis
- Innovative user interfaces
- Visual user interface concepts
Contributions
=============
We solicit two types of submissions: full papers and short papers for three tracks: research, demo, and summarization task. Full papers will be limited to 8 pages (including references); short papers will be 4 pages (including references).
The submissions will be peer-reviewed by at least two program committee members and evaluated based on innovativeness, novelty, interestingness, and impact.
We plan for three tracks:
*Research Track*
For this track, we solicit contributions from academia that present
* Novel applications of existing state of the art methods for the IP domain
* Novel methods or tasks in the IP domain
* Novel user interfaces for the IP domain
* Novel evaluation or analysis insights in the IP domain
* Novel benchmark datasets or other resources of interest
* A survey or overview related to a particular task in the IP domain
*Demo/System Track*
We solicit demos, case studies, insights, or novel ideas from industry that present
* Focused case studies making use of semantic technologies or machine learning
* Interesting IP-related task descriptions or best practices for patent analysis
* In-use systems or prototype implementations of semantic technologies
* Demos on processing or analysing data from the IP domain, or user interfaces
* In-use resources related to patents or external resources, e.g., linked open data.
*Summarization Task Track*
Within the patent text mining community, especially from the industry, there is an interest in developing text mining tools targeting text summarization.
* Participants are free to use publicly available data sets to train their models. We recommend exploring US patents, many of which contain the text section SUMMARY OF THE INVENTION.
* We will also publish a small training and test data set on the 23rd of February. The provided data set is composed of patents within the field of Green Plastics Technology.
* Participants are asked to submit a short (4 pages) scientific paper, which will be peer-reviewed by the workshop organizers. The most interesting submissions will be invited to present their solution at the workshop.
* Furthermore, we will have an additional interactive evaluation at the workshop to reflect a more real-life scenario, making it possible to evaluate not only the performance in terms of F1, ROUGE, recall, precision, etc., but also efficiency. Therefore, the invited participants will be asked to set up their solutions as a service and provide a REST API. Input will be a patent document (PDF, DOCX), and output should be a summary of not more than 700 words.
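As an illustration of the 700-word output limit, participants might check a generated summary before returning it from their service. The Python sketch below is only an assumption about how such a check could look: the call does not define how words are counted, so whitespace splitting is used here, and all names (`MAX_SUMMARY_WORDS`, `count_words`, `within_limit`, `truncate_to_limit`) are hypothetical.

```python
MAX_SUMMARY_WORDS = 700  # limit stated in the call


def count_words(summary: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(summary.split())


def within_limit(summary: str, limit: int = MAX_SUMMARY_WORDS) -> bool:
    """Return True if the summary does not exceed the word limit."""
    return count_words(summary) <= limit


def truncate_to_limit(summary: str, limit: int = MAX_SUMMARY_WORDS) -> str:
    """Keep only the first `limit` words, joined by single spaces."""
    return " ".join(summary.split()[:limit])


draft = "This patent describes a biodegradable polymer blend."
print(count_words(draft), within_limit(draft))
```

Hard truncation is a crude fallback; a system would more plausibly constrain length during generation.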
Submission Guidelines
=====================
Submissions must be in English, in PDF, and in the current ACM two-column conference
format. Suitable LaTeX, Word, and Overleaf templates are available from the ACM Website:
https://www.acm.org/publications/proceedings-template ("sigconf" template for LaTeX;
Interim Template for Word). Submissions should be at most 8 (full) or 4 (short) pages (including figures and references) in length. Submissions should be submitted electronically via EasyChair:
https://www.easychair.org/conferences/?conf=patentsemtech2023.
At least one author of each accepted paper is required to register for
the workshop and present the work in person.
Publication
===========
Accepted papers will be published as CEUR proceedings. Selected contributions
will be invited to submit extended, full papers to Elsevier's World Patent
Information (WPI) journal: https://www.journals.elsevier.com/world-patent-information/
Organizers
==========
Ralf Krestel (ZBW & CAU Kiel, Germany), Hidir Aras (FIZ Karlsruhe, Germany),
Linda Andersson (Artificial Researcher, Austria), Florina Piroi (Data Science Studio,
RSA FG, Austria), Allan Hanbury (TU Wien, Austria), Dean Alderucci (CMU, USA)
All questions about submissions should be emailed to:
r.krestel(a)zbw.eu and hidir.aras(a)fiz-karlsruhe.de
In this newsletter:
LDC membership discounts expire March 1
30th Anniversary Highlight: Arabic Treebank
New publications:
2019 NIST Speaker Recognition Evaluation Test Set - Audio-Visual<https://catalog.ldc.upenn.edu/LDC2023V01>
LORELEI Tagalog Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2023T02>
________________________________
LDC membership discounts expire March 1
Time is running out to save on 2023 membership fees. Renew your LDC membership, rejoin the Consortium, or become a new member by March 1 to receive a discount of up to 10%. For more information on membership benefits and options, visit Join LDC<https://www.ldc.upenn.edu/members/join-ldc>.
30th Anniversary Highlight: Arabic Treebank
The Penn/LDC Arabic Treebank (ATB) project began in 2001 with support from the DARPA TIDES program and, later, the DARPA GALE and BOLT programs. The original focus was on Modern Standard Arabic (MSA), which is not natively spoken and not homogeneously acquired across its writing and reading community. In addition to the expected issues associated with complex data annotation, LDC encountered several challenges unique to a highly inflected language with a rich history of traditional grammar. LDC relied on traditional Arabic grammar, as well as established and modern grammatical theories of MSA -- in combination with the Penn Treebank approach to syntactic annotation -- to design an annotation system for Arabic (Maamouri et al., 2004<https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/nemlar2004-penn-ara…>). LDC was innovative with respect to traditional grammar when necessary and when other syntactic approaches were found to account for the data. LDC also developed a wide-coverage MSA morphological analyzer, LDC Standard Arabic Morphological Analyzer (SAMA) Version 3.1 (LDC2010L01<https://catalog.ldc.upenn.edu/LDC2010L01>), which greatly benefited ATB development. Revisions to the annotation guidelines during the DARPA GALE program (principally related to tokenization and syntactic annotation) improved inter-annotator agreement and parsing scores.
ATB corpora were annotated for morphology, part-of-speech, gloss, and syntactic structure. Data sets based on MSA newswire developed under the revised annotation guidelines include Arabic Treebank: Part 1 v 4.1 (LDC2010T13<https://catalog.ldc.upenn.edu/LDC2010T13>), Arabic Treebank: Part 2 v 3.1 (LDC2011T09<https://catalog.ldc.upenn.edu/LDC2011T09>), and Arabic Treebank: Part 3 v 3.2 (LDC2010T08<https://catalog.ldc.upenn.edu/LDC2010T08>). Other genres are represented in Arabic Treebank - Broadcast News v 1.0 (LDC2012T07<https://catalog.ldc.upenn.edu/LDC2012T07>) and Arabic Treebank - Weblog (LDC2016T02<https://catalog.ldc.upenn.edu/LDC2016T02>).
LDC's later work on Egyptian Arabic treebanks in the DARPA BOLT program benefited from the strides in its MSA treebank annotation pipeline. As for the challenges presented by informal, dialectal material, collaborator Columbia University provided a normalized Arabic orthography to account for instances of Romanized script (Arabizi) in the data and developed a morphological analyzer (CALIMA) in parallel, working in a tight feedback loop with LDC's annotation team. SAMA and CALIMA were synchronized in the Egyptian Arabic treebanks, the former used for MSA tokens and the latter used for Egyptian Arabic tokens. Resulting corpora include BOLT Egyptian Arabic Treebank - Discussion Forum (LDC2018T23<https://catalog.ldc.upenn.edu/LDC2018T23>), Conversational Telephone Speech (LDC2021T12<https://catalog.ldc.upenn.edu/LDC2021T12>), and SMS/Chat (LDC2021T17<https://catalog.ldc.upenn.edu/LDC2021T17>).
ATB corpora and their related releases are available for licensing to LDC members and nonmembers. For more information about licensing LDC data, visit Obtaining Data<https://www.ldc.upenn.edu/language-resources/data/obtaining>.
________________________________
New publications:
2019 NIST Speaker Recognition Evaluation Test Set - Audio-Visual<https://catalog.ldc.upenn.edu/LDC2023V01> contains approximately 64 hours of English audio-visual data for development and test, answer keys, enrollment, trial files, and documentation from the NIST-sponsored 2019 Speaker Recognition Evaluation (SRE)<https://www.nist.gov/itl/iad/mig/nist-2019-speaker-recognition-evaluation>.
The 2019 evaluation task was speaker detection, that is, to determine whether a specified target speaker was speaking during a segment of speech. The evaluation was conducted in two parts: (1) a leaderboard-style challenge based on conversational telephone speech and (2) a separate evaluation using audio-visual data. This release relates to the audio-visual evaluation.
The source audio-visual data was collected by LDC for the VAST (Video Annotation for Speech Technology) project. That collection focused on amateur video recordings from various online media hosting services. The recordings vary in duration from 17.5 seconds to 13 minutes; most have two audio channels (stereo), but some are monophonic (one channel).
2023 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
LORELEI Tagalog Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2023T02> was developed by LDC and comprises approximately 4.8 million words of Tagalog monolingual text, 341,000 words of found Tagalog-English parallel text, and 124,000 Tagalog words translated from English data. Approximately 78,000 words were annotated for named entities and over 26,000 words were annotated for entity discovery and linking and situation frames (identifying entities, needs and issues). Data was collected from discussion forum, news, reference, social network, and weblog sources.
The LORELEI (Low Resource Languages for Emergent Incidents) program was concerned with building human language technology for low resource languages in the context of emergent situations. Representative languages were selected to provide broad typological coverage.
The knowledge base for entity linking annotation is available separately as LORELEI Entity Detection and Linking Knowledge Base (LDC2020T10)<https://catalog.ldc.upenn.edu/LDC2020T10>.
2023 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
Dear colleagues,
We are further extending the deadline for this call to the 31st of March. Please find below the updated call:
-------------
We invite researchers in the broad area of computational morphology to submit their recent, unpublished work to a special issue of the Journal of Language Modelling <https://jlm.ipipan.waw.pl/index.php/JLM>.
Motivation:
Computational techniques have a long history of use in the study of morphology, where they have been used both for practical tasks such as the analysis and production of complex word forms and for theoretical ones such as structural and informational analysis of morphological systems. As both systems and datasets improve, these techniques are increasingly developed and evaluated on a typologically diverse array of languages, including many which are endangered or lack large-scale resources. Detailed comparisons across languages can help to reveal typological biases or assumptions within existing computational techniques [1, 2]. Alternatively, computational methods and analyses can also shed light on questions within linguistic typology [3, 4, 5, 6].
The goal of this special issue is to bring researchers from multiple communities together in exploring issues of linguistic typology across a wide range of different languages and phenomena. We encourage the submission of work on endangered or less-studied languages.
The Journal of Language Modelling is a free (for readers and authors alike) open-access peer-reviewed journal. All articles are peer-reviewed by at least 3 reviewers, usually including at least one member of the Editorial Board.
Topics of interest:
- Typological clustering or classification of languages
- Investigation of particular linguistic features which improve or detract from the performance of computational morphology tools
- Comparison of morphological structures (e.g., inflection classes, implicative networks) across typologically different languages
- Investigation of diachronic typological change using computational methods
- Creation, curation or analysis of typological databases via computational methods
Submissions:
The submissions should be journal papers, not proceedings papers, totalling 25-50 pages, excluding references.
Authors are advised to use the online manuscript submission for the journal. Make sure to select the special issue when asked to provide the article type. More information, including formatting instructions for authors, can be found on the journal's webpage at: https://jlm.ipipan.waw.pl/index.php/JLM/about/submissions. An adaptation of the LaTeX template for Overleaf can be found at: https://fr.overleaf.com/latex/templates/template-for-journal-of-language-mo….
Important dates:
Call for papers issued: 15/7/2022
Submissions due: 15/1/2023 --- extended to 31/03/2023
Author notification: Spring 2023
Guest editors:
Sacha Beniamine (University of Surrey)
Micha Elsner (The Ohio State University)
Katharina Kann (University of Colorado, Boulder)
References
[1] Ryan Cotterell, Christo Kirov, John Sylak-Glassman, David Yarowsky, Jason Eisner, and Mans Hulden. 2016. The SIGMORPHON 2016 shared task—Morphological reinflection. In Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 10–22, Berlin, Germany. Association for Computational Linguistics.
[2] Huiming Jin, Liwei Cai, Yihui Peng, Chen Xia, Arya McCarthy, and Katharina Kann. 2020. Unsupervised morphological paradigm completion. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6696–6707, Online. Association for Computational Linguistics.
[3] Neil Rathi, Michael Hahn, and Richard Futrell. 2021. An Information-Theoretic Characterization of Morphological Fusion. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 10115–10120, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
[4] Parker, J., Reynolds, R., & Sims, A. (2022). Network Structure and Inflection Class Predictability: Modeling the Emergence of Marginal Detraction. In A. Sims, A. Ussishkin, J. Parker, & S. Wray (Eds.), Morphological Diversity and Linguistic Cognition (pp. 247-281). Cambridge: Cambridge University Press. DOI: 10.1017/9781108807951.010
[5] Guzmán Naranjo, Matías and Becker, Laura. Statistical bias control in typology. Linguistic Typology, to appear, 2021. DOI: 10.1515/lingty-2021-0002
[6] Sacha Beniamine. 2021. One lexeme, many classes: Inflection class systems as lattices. In Berthold Crysmann & Manfred Sailer (eds.), One-to-many relations in morphology, syntax, and semantics, 23--51. Berlin: Language Science Press. DOI: 10.5281/zenodo.4729789
Senior Research Associate at the Alan Turing Institute, Foundation Models and Commonsense Reasoning
Position
In 2022, the Alan Turing Institute signalled its intention to establish a portfolio of foundational AI research, which would complement the strengths of the institute around applications of AI and AI policy. An initial portfolio of research in foundation models, game theory, and probabilistic programming will be launched in early 2023. Each of these areas is called a ‘Pillar’. It is intended that this portfolio will complement the UK’s current activity rather than duplicate existing efforts, and will promote emerging new areas that show promise for the future.
FOUNDATION MODELS
Foundation models are large ML models trained on large, broad data sets. Foundation models such as GPT-3 have been shown to have remarkable capabilities for generating realistic natural language, and, to some extent, capabilities for problem solving and common-sense reasoning. Developing a Turing Foundation Model is beyond our present capacity. We therefore propose work aimed at developing Turing expertise around the problem of precisely understanding the capabilities of such models. The main issue we aim to address is that of *benchmarking* such models: although such models appear to be very capable in some respects, they fail on apparently simple tasks, in unpredictable ways. In short, we don't have a clear understanding of the capabilities and shortcomings of such systems, which raises concerns for their use.
ROLE PURPOSE
We are looking for a Senior Research Associate to support and enable the delivery of the Foundation Model theme, under the direction of Anthony (Tony) Cohn, and in collaboration with Michael Wooldridge and Nigel Shadbolt.
The successful candidate will primarily focus on evaluating the extent to which, and the conditions under which, existing Foundation Models can support common-sense reasoning (such as naive physics, spatial and temporal reasoning, the concept of agency, and causality). This work will be undertaken in parallel and in collaboration with researchers working on other aspects of the Foundation Models Pillar.
The candidate will join a vibrant team of researchers and will have opportunities to engage with cutting-edge projects and experts at leading universities.
The post can be based either at The Alan Turing Institute site in London, or at the University of Leeds. In either case you will need to travel to the other site when required (travel expenses will be paid as appropriate).
Further details and the online application form can be found here:
https://cezanneondemand.intervieweb.it/turing/jobs/senior-research-associat…
====
SEMANTiCS - 19th International Conference on Semantic Systems
Leipzig, Germany
Workshops and Tutorials
September 20 - 22, 2023
https://2023-eu.semantics.cc/page/cfp_ws
====
SEMANTiCS 2023 is a major venue for research and industrial innovation
and features a workshop and tutorial program addressing the diverse
practical interests of its audience. This program is intended to offer a
rich diversity of topics to conference attendees and local participants
seeking to pick up new skills and stay up-to-date regarding the latest
developments in the community. We encourage submissions of proposals on
all topics in the general areas of SEMANTiCS 2023 and proposals bridging
or introducing new perspectives in these areas. Workshops and tutorials
may incorporate panel discussions, lightning talks, meetings, networking
or hands-on sessions, hackathons and other practical formats where
applicable. Rooms for business or project meetings are available upon
request as well.
=Important Dates for Workshops=
* Proposals WS Deadline: March 07, 2023 (11:59 pm, Hawaii time)
* Notification of Acceptance: March 14, 2023 (11:59 pm, Hawaii time)
=Important Dates for Tutorials (and other meetings, e.g. seminars,
show-cases, etc., without call for papers)=
* Proposals Tutorial Deadline: June 06, 2023 (11:59 pm, Hawaii time)
* Notification of Acceptance: June 20, 2023 (11:59 pm, Hawaii time)
Submission via Easychair on https://easychair.org/conferences/?conf=sem23
=Scope & Goals=
Workshops and tutorials at SEMANTiCS 2023 allow your organisation or
project to advance and promote your topics and gain increased
visibility. The workshops and tutorials will be announced on the
SEMANTiCS website and they will be seen by all participants. SEMANTiCS
2023 workshops and tutorials can be incubators for industrial and
scientific communities that form and share a particular research and
development agenda. They provide a forum for presenting contributions
and findings to a diverse and knowledgeable community.
Furthermore, the event can be used as a dissemination activity in the
scope of large research projects or as a closed format for
research/commercial project consortia meetings.
=Setup and Requirements=
SEMANTiCS 2023 workshops and tutorials may be either half or full day
long. Workshops and tutorials take place on the days before and/or after
the main SEMANTiCS 2023 EU conference (20th, 21st, and/or 22nd of
September 2023). Details will be communicated on time.
Organizers of workshops and tutorials will be granted three free tickets
(only for the workshop & tutorial day) for organization purposes or
keynotes. Participants of workshops and tutorials will be charged a
marginal fee to cover the basic costs.
Workshop and tutorial proposals must include the following information:
* outline of the themes and goals of the event, including a title and a
brief abstract (less than 200 words) intended for the SEMANTiCS 2023 website
* a statement addressing why the event is important, why it is timely,
and how it is relevant to SEMANTiCS 2023 and the field of semantic
web; for tutorials, also why the presenters are qualified to give a
high-quality introduction to the topic
* related workshops and conferences, i.e., specifying if this is a
continuation of a workshop series or is a new workshop to address an
emerging issue. Please provide information about past versions of this
workshop and other related workshops (including URLs and
submission/acceptance counts, if available).
* a statement addressing the quality assurance criterion that will be
used by the event organizers to select the papers for the workshops and
the presenters for the tutorials (e.g., peer review or review/evaluation
by event organizers). If a peer review process is chosen as a quality
assurance criterion for the workshops, the organizers will be
responsible for their own reviewing process. Workshop organizers will be
responsible also for their own publicity (e.g., website, timelines and
call for papers) and proceedings production.
* structure of the event and plans for generating and stimulating
discussion; how will the interaction be organized in case of a hybrid event
* desired minimum and maximum number of event participants, expected
number of participants, and (in case of previously held events) number
of registered attendees and web site for previous editions of the event
* a description of the intended audience and the expected learning outcomes
* desired prerequisite knowledge of the audience
* proposed duration of the event (i.e., half or full day), different
sessions if applicable (final time slot will be assigned in accordance
with the SEMANTiCS program)
* any equipment, room capacity, or other logistic constraints
* full contact information of all organizers of the event and main
contact person; a brief description of each organizer's background,
including relevant past experience in organizing events
Workshop and tutorial proposals must be submitted via
Easychair: https://easychair.org/my/conference?conf=sem23
=Review and Evaluation Criteria=
Workshop and tutorial proposals will be reviewed by the SEMANTiCS 2023
Workshop Chairs, as well as by the SEMANTiCS 2023 organizing committee,
according to the following criteria:
* The potential to advance the state of semantic web research and practice
* The quality assurance criterion proposed by the organizers to select
high-quality papers for workshops and presenters for tutorials
* The organizers' experience and ability to lead a successful event
* Timeliness and expected interest in the event topics
* The balance and synergy between all SEMANTiCS 2023 events
=Topics of interest include (but are not limited to)=
* Web Semantics & Linked (Open) Data
* Enterprise Knowledge Graphs, Graph Data Management and Deep Semantics
* Machine Learning & Deep Learning Techniques
* Semantic Information Management & Knowledge Integration
* Terminology, Thesaurus & Ontology Management
* Data Mining and Knowledge Discovery
* Reasoning, Rules and Policies
* Natural Language Processing and Computational Linguistics
* Social and Human aspects of Semantic Web
* Data Quality Management and Assurance
* Explainable Artificial Intelligence
* Semantics in Data Science
* Semantics of Blockchain & Distributed Ledger Technologies
* Trust, Data Privacy, and Security with Semantic Technologies
* Economics of Data, Data Services and Data Ecosystems
* Applications of Semantic Web technologies in domains such as law,
medicine, life sciences, digital humanities, mobility and smart cities, etc.
We especially invite contributions that illustrate the applicability of
the topics mentioned above for industrial purposes and/or illustrate the
business relevance of their contribution for specific industries.
Workshop proposals on emerging themes for the topics listed above are
encouraged.
In case you have additional questions concerning the submission process,
please do not hesitate to contact us via Easychair.
We are looking forward to your contribution!
Jennifer D’Souza - jennifer.dsouza(a)tib.eu
Anisa Rula - anisa.rula(a)unibs.it
Workshop & Tutorial Chairs
FYI.
---------- Forwarded message ---------
From: 'Archna Bhatia' via MWE Workshop 2023 Organizers <
mweworkshop2023(a)googlegroups.com>
Date: Wed, Feb 8, 2023 at 8:09 PM
Subject: Fwd: [Corpora-List] Deadline extension: 19th Workshop on Multiword
Expressions (MWE 2023)
To: MWE Workshop 2023 Organizers <mweworkshop2023(a)googlegroups.com>
I have no idea why my emails seem like they need to be moderated recently
(I must have sent something inappropriate, just kidding!), but even to my
response below I received a notification that it is awaiting moderation and
that either this would post or I would hear back of the moderator’s
decision. From the past two CFPs (small sample), my experience is that
posts awaiting moderation do not get posted nor do I hear of the
moderator’s decision. So could someone else forward my response?
thanks,
Archna
Begin forwarded message:
*From: *Archna Bhatia <abhatia(a)ihmc.org>
*Subject: **Re: [Corpora-List] Deadline extension: 19th Workshop on
Multiword Expressions (MWE 2023)*
*Date: *February 8, 2023 at 2:59:35 PM EST
*To: *Ada Wan <adawan919(a)gmail.com>
*Cc: *Ken Litkowski <ken(a)clres.com>, corpora(a)list.elra.info
Hi Ada,
While appropriate space is found for this discussion, let me respond to
just your first suggestion (for now): Why do you think they should be
renamed “fixed/idiomatic expressions”? What would your definition of
“fixed” and of “idiomatic” mean? How fixed would you say these expressions
would be? Is morphological variation allowed? Is variation in any of the
other linguistic aspects allowed? From my point of view, “fixed/idiomatic
expressions” results in a much more restricted category than everything
that we consider could be treated as multiwords.
Thanks,
Archna
On Feb 8, 2023, at 2:38 PM, Ada Wan via Corpora <corpora(a)list.elra.info>
wrote:
Hi Ken
Thanks for the message. Unfortunately, it looks like there have been no
prior discussions on any of the topics I suggested, and the earliest post I
can access dates back only to 22Nov2020. I can surely start a discussion,
but that might look to be the first/only discussion on the list? (I went
through all the conversations accessible thus far and only saw
announcements.)
Perhaps more importantly:
as this seems to be an issue that could also affect other areas of concern
to the general audience of the Corpora-List (*not just for MWEs/SIGLEX*),
is there a way that we all can make some changes in the "language space"
across the board?
Thanks and best
Ada
On Wed, Feb 8, 2023 at 5:57 PM Ken Litkowski <ken(a)clres.com> wrote:
> Dear Ada,
>
> When I added the SIGLEX discussion code back in 2010, I did so with the
> idea that we would have discussions of topics just like yours. The
> current form of that discussion is now located on the Google group, via
> https://groups.google.com/g/siglex-members.
> There, you will find a place "Search conversations ..." where you can add
> your topic so that it will be sent to all members, rather than just the
> announcements that are the main topics there.
>
> Ken (webmaster retiree)
> On 2/8/2023 10:18 AM, Ada Wan via Corpora wrote:
>
> Hi Kilian
>
> Hope all has been well.
>
> I'm surprised that people are still "wording around" nowadays. Some
> suggestions:
>
> 1. Can't we rename "MWEs" to "fixed/idiomatic expressions" instead? One
> can reformulate these as sequences/strings/expressions of various
> lengths/vocabs in characters.
> 2. Also, one can interpret these without information/association with any
> syntactic categories, nouns or verbs etc.
> 3. They do just represent lexical info (some reflecting/encoding
> historico-social habits, though one also should be aware of the ethical
> aspects of reinforcing some "traditional values"). Perhaps a more
> sophisticated view of language could help wean practitioners from a
> mindframe that relies on "linguistic structure(s)" as we've had it thus far
> (i.e. based on "words" and "sentences")?
> 4. Re "their meaning often does not result from the direct combination of
> the meanings of their parts": non-compositionality may be a better
> description of a more realistic view of language, it should prob be our
> default expectation (instead of the cherry-picked compositional
> counterparts).
>
> I think efforts towards mitigating a mental dependency on "words" would be
> a good direction to pursue, what do you think?
> Can we get SIGLEX to update in this regard?
>
> Best
> Ada
>
>
> On Wed, Feb 8, 2023 at 11:12 AM Kilian Evang via Corpora <
> corpora(a)list.elra.info> wrote:
>
>> [Apologies for cross-postings]
>>
>>
>> ********************************************************************************
>>
>> Call for Papers: Deadline extended
>>
>> 19th Workshop on Multiword Expressions (MWE 2023)
>>
>> Organized and sponsored by SIGLEX, the Special Interest Group
>> on the Lexicon of the ACL
>>
>> Full-day workshop collocated with EACL 2023, Dubrovnik, Croatia, May 5
>> or 6, 2023
>>
>> Hybrid (on-site & on-line)
>>
>> NEW: Submission deadline: February 20, 2023
>>
>> NEW: Invited speakers announced (see below)
>>
>> NEW: Best paper award (see below)
>>
>> MWE 2023 website: https://multiword.org/mwe2023/
>>
>>
>> ********************************************************************************
>>
>> Multiword expressions (MWEs) are word combinations that exhibit
>> lexical, syntactic, semantic, pragmatic, and/or statistical
>> idiosyncrasies (Baldwin & Kim 2010), such as by and large, hot dog,
>> pay a visit and pull one's leg. The notion encompasses closely related
>> phenomena: idioms, compounds, light-verb constructions, phrasal verbs,
>> rhetorical figures, collocations, institutionalised phrases, etc.
>> Their behaviour is often unpredictable; for example, their meaning
>> often does not result from the direct combination of the meanings of
>> their parts. Given their irregular nature, MWEs often pose complex
>> problems in linguistic modelling (e.g. annotation), NLP tasks (e.g.
>> parsing), and end-user applications (e.g. natural language
>> understanding and MT), hence still representing an open issue for
>> computational linguistics (Constant et al. 2017).
>>
>> For almost two decades, modelling and processing MWEs for NLP has been
>> the topic of the MWE workshop organised by the MWE section of SIGLEX
>> in conjunction with major NLP conferences since 2003. Impressive
>> progress has been made in the field, but our understanding of MWEs
>> still requires much research considering their need and usefulness in
>> NLP applications. This is also relevant to domain-specific NLP
>> pipelines that need to tackle terminologies most often realised as
>> MWEs. Following previous years, for this 19th edition of the workshop,
>> we identified the following topics on which contributions are
>> particularly encouraged:
>>
>> MWE processing and identification in specialized languages and
>> domains: Multiword terminology extraction from domain-specific corpora
>> (Bonin et al. 2010) is of particular importance to various
>> applications, such as MT (Semmar & Laib, 2017), or for the
>> identification and monitoring of neologisms and technical jargon
>> (Chatzitheodorou et al., 2021). We expect that approaches that deal with
>> the processing of MWEs as well as the processing of terminology in
>> specialised domains can benefit from each other.
>>
>> MWE processing to enhance end-user applications: MWEs have gained
>> particular attention in end-user applications, including MT (Zaninello
>> & Birch 2020; Han et al. 2021, 2022), simplification (Kochmar et al.
>> 2020), language learning and assessment (Paquot et al. 2019;
>> Christiansen & Arnon 2017), social media mining (Maisto et al. 2017),
>> and abusive language detection (Zampieri et al. 2020; Caselli et al.
>> 2020). We believe that it is crucial to extend and deepen these first
>> attempts to integrate and evaluate MWE technology in these and further
>> end-user applications.
>>
>> MWE identification and interpretation in pre-trained language models:
>> Most current MWE processing is limited to their identification and
>> detection using pre-trained language models, but we still lack
>> understanding about how MWEs are represented and dealt with therein
>> (Nedumpozhimana & Kelleher 2021; Garcia et al. 2021; Fakharian & Cook
>> 2021) and how to better model the compositionality of MWEs from semantics
>> (Moreau et al. 2018). Now that NLP has shifted towards end-to-end
>> neural models like BERT, capable of solving complex tasks with little
>> or no intermediary linguistic symbols, questions arise about the
>> extent to which MWEs should be implicitly or explicitly modelled
>> (Shwartz & Dagan, 2019).
>>
>> MWE processing in low-resource languages: The PARSEME shared tasks
>> (Ramisch et al. 2020; 2018; Savary et al. 2017), among others, have
>> fostered significant progress in MWE identification, providing
>> datasets that include low-resource languages, evaluation measures, and
>> tools that now allow fully integrating MWE identification into
>> end-user applications. A few efforts have recently explored methods
>> for the automatic interpretation of MWEs (Bhatia et al. 2018; 2017),
>> and their processing in low-resource languages (Liu & Wang 2020; Kumar
>> et al. 2017). Resource creation and sharing should be pursued in
>> parallel with the development of methods able to capitalize on small
>> datasets (Han et al. 2020).
>>
>> Through this workshop, we would like to bring together and encourage
>> researchers in various NLP subfields to submit MWE-related research,
>> so that approaches that deal with processing of MWEs including
>> processing for low-resource languages and for various applications can
>> benefit from each other. We also intend to consolidate the converging
>> effects of previous joint workshops LAW-MWE-CxG 2018, MWE-WN 2019 and
>> MWE-LEX 2020, the joint MWE-WOAH panel in 2021, and the MWE-SIGUL 2022
>> joint session, extending our scope to MWEs in e-lexicons and WordNets,
>> MWE annotation, as well as grammatical constructions. Correspondingly,
>> we call for papers on research related (but not limited) to MWEs and
>> constructions in:
>>
>> Computationally-applicable theoretical work in psycholinguistics and
>> corpus linguistics;
>>
>> Annotation (expert, crowdsourcing, automatic) and representation in
>> resources such as corpora, treebanks, e-lexicons, and WordNets (also
>> for low-resource languages);
>>
>> Processing in syntactic and semantic frameworks (e.g. CCG, CxG, HPSG,
>> LFG, TAG, UD, etc.);
>>
>> Discovery and identification methods, including for specialized
>> languages and domains such as clinical or biomedical NLP;
>>
>> Interpretation of MWEs and understanding of text containing them;
>>
>> Language acquisition, language learning, and non-standard language
>> (e.g. tweets, speech);
>>
>> Evaluation of annotation and processing techniques;
>>
>> Retrospective comparative analyses from the PARSEME shared tasks;
>>
>> Processing for end-user applications (e.g. MT, NLU, summarisation,
>> language learning, etc.);
>>
>> Implicit and explicit representation in pre-trained language models
>> and end-user applications;
>>
>> Evaluation and probing of pre-trained language models;
>>
>> Resources and tools (e.g. lexicons, identifiers) and their integration
>> into end-user applications;
>>
>> Multiword terminology extraction;
>>
>> Adaptation and transfer of annotations and related resources to new
>> languages and domains including low-resource ones.
>>
>>
>> Shared Task
>>
>> We do not have a shared task this year, but a new release of the
>> PARSEME corpus of verbal MWEs is currently underway. We encourage
>> submission of research papers that include analyses of the new edition
>> of the PARSEME data and improvements over the results for PARSEME 2020
>> shared task as well as SemEval 2022 task 2 on idiomaticity prediction.
>>
>>
>> *** Special Track on MWEs in Clinical NLP ***
>>
>> Pursuing the MWE Section’s tradition of synergies with other
>> communities, this year, we are organizing a joint session with the
>> Clinical NLP workshop for shared papers/poster presentations. Since
>> clinical texts contain an important amount of multiword expressions
>> (e.g. medical terms or domain-specific collocations), a joint session
>> is deemed beneficial for both communities. The goal is to foster
>> future synergies that could address scientific challenges in the
>> creation of resources, models and applications to deal with multiword
>> expressions and related phenomena in the specialised domain of
>> ClinicalNLP. Submissions describing research on MWEs in the
>> specialized domain of ClinicalNLP, especially introducing new datasets
>> or new tools and resources, are welcome. Papers accepted in this track
>> will have the option to present their work in the Clinical NLP
>> workshop at ACL 2023 as well, after being presented at MWE 2023.
>>
>>
>> Invited Speakers
>>
>> We are looking forward to invited talks by two amazing speakers:
>>
>> Leo Wanner, Universitat Pompeu Fabra
>>
>> TBD
>>
>>
>> Best paper award
>>
>> All full papers in the workshop will be considered by the program
>> committee for a best paper award. The decision will be announced in
>> the closing session.
>>
>>
>> Submission formats
>>
>> The workshop invites two types of submissions:
>>
>> archival submissions that present substantially original research in
>> both long paper format (8 pages + references) and short paper format
>> (4 pages + references).
>>
>> non-archival submissions of abstracts describing relevant research
>> presented/published elsewhere which will not be included in the MWE
>> proceedings.
>>
>>
>> Paper submission and templates
>>
>> Papers should be submitted via the workshop's START submission page
>> (https://softconf.com/eacl2023/mwe2023/). Please choose the
>> appropriate submission format (archival/non-archival). Submissions
>> must follow the ACL 2023 stylesheet.
>>
>>
>> Archival papers with existing reviews from ACL Rolling Review will
>> also be considered. A paper may not be simultaneously under review
>> through ARR and MWE. A paper that has received or will receive reviews
>> through ARR may not be submitted for review to MWE.
>>
>>
>> Important Dates
>>
>> Paper submission: February 20, 2023
>>
>> ARR paper commitment: March 6, 2023
>>
>> Notification of acceptance: March 13, 2023
>>
>> Camera-ready papers due: March 27, 2023
>>
>> Workshop: May 5 or 6, 2023
>>
>>
>> All deadlines are at 23:59 UTC-12 (Anywhere on Earth).
>>
>>
>> Organizing Committee
>>
>> Program chairs: Marcos Garcia, Voula Giouli, Lifeng Han, Shiva Taslimipoor
>>
>> Publication chair: Archna Bhatia
>>
>> Publicity chair: Kilian Evang
>>
>>
>> Anti-harassment policy
>>
>> The workshop follows the ACL anti-harassment policy.
>>
>>
>> Contact
>>
>> For any inquiries regarding the workshop, please send an email to the
>> Organizing Committee at mweworkshop2023(a)googlegroups.com.
>> _______________________________________________
>> Corpora mailing list -- corpora(a)list.elra.info
>> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
>> To unsubscribe send an email to corpora-leave(a)list.elra.info
>>
>
>
> --
> Ken Litkowski TEL.: 301-482-0237
> CL Research EMAIL: ken(a)clres.com
> 9208 Gue Road Home Page: http://www.clres.com
> Damascus, MD 20872-1025 USA Blog: http://www.clres.com/blog
>
--
Archna Bhatia, Ph.D.
Research Scientist, Institute for Human & Machine Cognition
15 SE Osceola Ave, Ocala, FL 34471
(352) 387-3061
CALL FOR PARTICIPATION
IberLEF 2023 Task - HOPE: Multilingual Hope Speech detection
Held as part of the evaluation forum IberLEF 2023
<https://sites.google.com/view/iberlef-2023> in the XXXIX edition of the
International Conference of the Spanish Society for Natural Language
Processing (SEPLN 2023 <http://sepln2023.sepln.org/en/home/>)
September 26, 2023. Jaén, Andalusia, Spain
Codalab link: https://codalab.lisn.upsaclay.fr/competitions/10215
Dear All,
We invite researchers and students to participate in the shared task HOPE:
Multilingual Hope Speech detection, held as part of IberLEF 2023, the
shared evaluation campaign for Natural Language Processing systems in
Spanish and other Iberian languages, co-located with the SEPLN 2023
conference.
The HOPE shared task is related to the inclusion of vulnerable groups and
focuses on the study of the detection of hope speech, in pursuit of
equality, diversity and inclusion. This task was previously organized at
the second workshop on Language Technology for Equality, Diversity and
Inclusion (LT-EDI-2022), as a part of ACL 2022, but for five languages:
Tamil, Malayalam, Kannada, English and Spanish. The novelties of this
shared task are: i) it is organized in two languages, Spanish and English;
and ii) it provides an expanded and improved dataset. It consists of two
subtasks:
- Subtask 1: Hope Speech detection in Spanish. Given a Spanish tweet,
  identify whether it contains hope speech or not. The possible categories
  for each text are:
  - HS: Hope Speech.
  - NHS: Non Hope Speech.
- Subtask 2: Hope Speech detection in English. Given an English YouTube
  comment, identify whether it contains hope speech or not. The possible
  categories for each text are:
  - HS: Hope Speech.
  - NHS: Non Hope Speech.
In both subtasks there will be a real time leaderboard and the participants
will be allowed to make a maximum of 10 submissions through CodaLab, from
which each team will have to select the best one for ranking.
The dataset for this task comprises two corpora, one in Spanish and another
in English. The Spanish corpus was collected between 2021 and 2022. It is
an extension of the SpanishHopeEDI dataset (García-Baena et al., 2023), to
be published in the journal Language Resources and Evaluation, which was
used in the ACL LT-EDI-2022 Spanish task (Chakravarthi et al., 2022). It
consists of a set of LGBT-related tweets annotated as HS (Hope Speech) or
NHS (Non Hope Speech). A tweet is considered HS if the text: i)
explicitly supports the social integration of minorities; ii) is a positive
inspiration for the LGBTI community; iii) explicitly encourages LGBTI
people who might find themselves in a situation; or iv) unconditionally
promotes tolerance. Conversely, a tweet is marked as NHS if the text:
i) expresses negative sentiment towards the LGBTI community; ii) explicitly
seeks violence; or iii) uses gender-based insults. The English corpus is an
extension of the English part of the HopeEDI dataset (Chakravarthi, 2020).
It consists of comments posted on YouTube videos on a wide range of
socially relevant topics such as Equality, Diversity and Inclusion,
including LGBTIQ issues, COVID-19, women in STEM, Black Lives Matter, etc.
To download the data and participate, go to:
https://codalab.lisn.upsaclay.fr/competitions/10215.
Best regards,
The HOPE 2023 organizing committee
References
- García-Baena, D., García-Cumbreras, M. A., Jiménez-Zafra, S. M.,
  García-Díaz, J. A., Valencia-García, R. (2023). Hope Speech Detection in
  Spanish. The LGBT case. Language Resources and Evaluation. To be published.
- Chakravarthi, B. R. (2020). HopeEDI: A multilingual hope speech detection
  dataset for equality, diversity, and inclusion. In Proceedings of the
  Third Workshop on Computational Modeling of People’s Opinions, Personality,
  and Emotion’s in Social Media, Association for Computational Linguistics,
  Barcelona, Spain (Online), pp. 41–53.
  https://aclanthology.org/2020.peoples-1.5
- Chakravarthi, B. R., Muralidaran, V., Priyadharshini, R., Cn, S.,
  McCrae, J. P., García-Cumbreras, M. Á., Jiménez-Zafra, S. M.,
  Valencia-García, R., Kumar Kumaresan, P., Ponnusamy, R., García-Baena, D. &
  García-Díaz, J. (2022, May). Overview of the Shared Task on Hope Speech
  Detection for Equality, Diversity, and Inclusion. In Proceedings of the
  Second Workshop on Language Technology for Equality, Diversity and
  Inclusion (pp. 378-388). https://aclanthology.org/2022.ltedi-1.58
Important dates
- Release of training + development corpora: Feb 13, 2023.
- Release of test corpora and start of evaluation campaign: Mar 13, 2023.
- End of evaluation campaign (deadline for runs submission): Mar 28, 2023.
- Publication of official results: Mar 30, 2023.
- Paper submission: Apr 25, 2023.
- Review notification: May 23, 2023.
- Camera ready submission: Jun 9, 2023.
- IberLEF Workshop (SEPLN 2023): Sep 26, 2023 (Jaén, Andalusia, Spain).
- Publication of proceedings: Sep ??, 2023
Organizing committee
- Miguel Ángel García Cumbreras (SINAI, Universidad de Jaén)
- Daniel García-Baena (SINAI, Universidad de Jaén)
- Bharathi Raja Chakravarthi (University of Galway)
- Salud María Jiménez-Zafra (SINAI, Universidad de Jaén)
- José Antonio García-Díaz (UMUTeam, Universidad de Murcia)
- Rafael Valencia-García (UMUTeam, Universidad de Murcia)
- L. Alfonso Ureña-López (SINAI, Universidad de Jaén)
*Salud María Jiménez Zafra*
sjzafra(a)ujaen.es
Universidad de Jaén
Grupo de Investigación SINAI <http://sinai.ujaen.es/> | Departamento de
Informática
EPS Jaén, Edificio A3, Despacho 219
Campus Las Lagunillas s/n 23071 - Jaén | +34 953212992
[Apologies for cross-posting]
*************************
Call for Participation
*************************
Task: *Homotransphobia Detection in Italian (HODI)* at EVALITA 2023
<https://www.evalita.it/campaigns/evalita-2023/>
Info: https://hodi-evalita.github.io/
Final Workshop: 7th - 8th September 2023, Parma, Italy
*Registration is required to obtain data and participate in the shared
task.*
To register, follow the instructions here:
https://hodi-evalita.github.io/how_to_participate/
-----------------------------------------------------
🌈 The HODI Shared Task 🌈
-----------------------------------------------------
We invite you to participate in the first shared task on
homotransphobia detection in Italian (HODI). Despite the NLP community’s
interest in hate speech detection datasets and models, very few studies
have covered homotransphobia. This is a concern, due to the target-oriented
nature of hate speech: recent studies have revealed that hate speech
detection methods cannot readily be applied across different kinds of hate
speech targets.
HODI is organized according to two main subtasks:
** Subtask A - Homotransphobia detection:** the objective is to detect if a
text is homotransphobic or not.
** Subtask B - Explainability:** the objective is to extract the rationales
of the classification models trained for Subtask A.
Further details on the task, data, and evaluation are available at the task
website: https://hodi-evalita.github.io/
-----------------------
Important Dates
-----------------------
- 7 Feb 2023: Training data available (training period starts)
- 2 May 2023: Test data available
- 9 May 2023: Systems results due to organizers
- 30 May 2023: Results notification to participants
- 14 Jun 2023: Technical report due to organizers
- 10 Jul 2023: Reviews to participants (peer-reviews)
- 25 Jul 2023: Camera ready due to organizers
- 7 - 8 Sep 2023: EVALITA Workshop
----------------
Organizers
----------------
Debora Nozza, Bocconi University
Greta Damo, Bocconi University
Alessandra Teresa Cignarella, University of Turin
Tommaso Caselli, University of Groningen
Viviana Patti, University of Turin
--
Tommaso Caselli, Ph.D.
Senior Assistant Professor in Computational Semantics
Faculty of Arts, Rijksuniversiteit Groningen
The Netherlands
----------------------------
https://xs4all.academia.edu/TommasoCaselli
https://www.researchgate.net/profile/Tommaso_Caselli
Twitter: @tommaso_caselli