Second call for papers for the LREC-COLING2024 pre-conference workshop:
Holocaust Testimonies as Language Resources
Date: 21 May 2024 (full day)
Venue: Lingotto Conference Centre, Turin, Italy
Webpage: https://www.clarin.eu/HTRes2024
Submission Deadline: 21 February 2024
Submission Portal: https://softconf.com/lrec-coling2024/htres2024/
*Workshop description*
Holocaust testimonies serve as a bridge between survivors and history’s
darkest chapters, providing a connection to the profound experiences of the
past. Testimonies stand as the primary source of information that describes
the Holocaust, offering first-hand accounts and personal narratives of
those who experienced it. The majority of testimonies are captured in an
oral format, as survivors vividly explain and share their personal
experiences and observations from that time period. Transforming Holocaust
testimonies into a machine-processable digital format can be a difficult
task owing to the unstructured nature of the text. The creation of
accessible, comprehensive, and well-annotated Holocaust testimony
collections is of paramount importance to our society. These collections
empower researchers and historians to validate the accuracy of socially and
historically significant information, enabling them to share critical
insights and trends derived from these data. This workshop will investigate
a number of ways in which techniques and tools from natural language
processing and corpus linguistics can contribute to the exploration,
analysis, dissemination and preservation of Holocaust testimonies.
The workshop is supported by CLARIN and the European Holocaust Research
Infrastructure (EHRI).
We expect contributions related to the following topics:
Creation of datasets and development of tools for the study of Holocaust
testimonies:
- Creation of language corpora of Holocaust testimonies
- Digitization and enhancement of oral and written testimonies
(including automatic speech recognition, alignment of text and speech,
format conversion, OCR, handwriting recognition, machine translation)
- Named entity recognition for identifying people, places, and events in
testimonies
- Standards, representation formats, and guidelines for annotations and
vocabularies relevant to the Holocaust testimonies
- Creation, adaptation and tuning of software applications for the
creation, annotation, enhancement and use of Holocaust testimonies as
language resources
- Research using Holocaust testimonies
- Applications of NLP in analysing Holocaust survivor testimonies
- Sentiment analysis and emotional content extraction from survivor
narratives.
- Data Visualisation, Knowledge Representation and Information
Extraction:
- Visualising complex data structures from Holocaust testimonies
- Building knowledge graphs and networks to represent historical
relationships
- Interactive data visualisations for education and research
- Extracting biographical and temporal information relevant to the
Holocaust
- Deep learning and large language models
- Digital Archiving and Long-Term Preservation:
- Methods and tools for digitising and preserving Holocaust
testimonies
- Best practices for metadata standards and cataloguing
- Ensuring long-term accessibility and data integrity
- Ethical Considerations and Privacy
- Ethical challenges in digitising and sharing sensitive testimonies
- Anonymisation and privacy protection in Holocaust data
- Community engagement and consent in digital projects
- User and application aspects
- Development of tools and interfaces for the search, analysis and
exploration of Holocaust testimonies
- Other relevant use cases and application scenarios
All papers must clearly state and explain their relevance to the topic of
'Holocaust Testimonies as Language Resources'.
*Submission & Publication*
All papers must represent original and unpublished work that is not
currently under review. Papers will be evaluated according to their
significance, originality, technical content, style, clarity, and relevance
to the workshop. We welcome the following types of contributions:
- Standard research papers (up to 8 pages, plus more pages for
references if needed);
- Short research papers (from 4 to 6 pages, plus more pages for
references if needed).
Submissions must be anonymous and strictly follow the LREC2024 stylesheet
formatting guidelines.
All papers should be submitted electronically in PDF format through the main
conference platform, START.
*Important Dates*
- *Paper submission deadline:* 21 February 2024
- *Notification of acceptance:* 20 March 2024
- *Camera-ready paper:* 15 April 2024
- *Workshop date:* 21 May 2024
*Organising Committee*
- Isuri Anuradha, University of Wolverhampton, UK
- Ingo Frommholz, University of Wolverhampton, UK
- Francesca Frontini, CNR-ILC, Italy & CLARIN, Italy
- Martin Wynne, Oxford University, UK
- Ruslan Mitkov, Lancaster University, UK
- Paul Rayson, Lancaster University, UK
- Alistair Plum, University of Luxembourg, Luxembourg
*Programme Committee*
- Le An Ha, Ho Chi Minh City University of Foreign Languages and
Information Technology, Vietnam
- Federico Boschetti, CNR-Istituto di Linguistica Computazionale “A.
Zampolli”, Italy
- Estelle Bunout, University of Luxembourg, Luxembourg
- Martin Bulin, University of West Bohemia, Czech Republic
- Tim Cole, University of Bristol, UK
- Angelo Mario Del Grosso, CNR-Istituto di Linguistica Computazionale
“A. Zampolli”, Italy
- Maria Dermentzi, King’s College London, UK
- Robert Ehrenreich, USHMM, USA
- Ignatius Ezeani, Lancaster University, UK
- Ian Gregory, Lancaster University, UK
- Arjan van Hessen, Radboud University
- Henk van den Heuvel, Radboud University & CLARIN ERIC
- Renana Keydar, The Hebrew University of Jerusalem, Israel
- William J.B. Mattingly, USHMM, USA
- Patricia Murrieta-Flores, Lancaster University, UK
- Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of
Sciences, Poland
- Maciej Piasecki, Wroclaw University of Science and Technology, Poland
- Rachel Pistol, King’s College London, UK
- Johannes-Dieter Steinert, University of Wolverhampton, UK
- Jan Svec, University of West Bohemia
- Gabor Toth, University of Luxembourg, Luxembourg
- Eveline Wandl-Vogt, Austrian Academy of Sciences, Vienna
The next meeting of the Edge Hill Corpus Research Group will take place online (via MS Teams) on Thursday 29 February 2024, 2:00-3:30 pm (GMT).
Attendance is free. You can register here:
https://store.edgehill.ac.uk/conferences-and-events/conferences/events/edge…
Registration closes on Wednesday 28 February, 11 am (GMT)
Topic: Corpus Methodology
Speaker: Matteo Di Cristofaro <https://infogrep.it/site/> (University of Modena and Reggio Emilia, Italy)
Title: One dataset, many corpora: Problems of scientific validity in corpora and corpus-derived results
Abstract
Corpus linguistics has, since its inception, recognised the relevance of digital technologies as a major driving force behind corpus techniques and their (r)evolution in the study of language (cf. Tognini-Bonelli 2012). And yet, while both corpus linguistics and digital technologies have frequently benefited from each other (the case of NLP/NLU is one such macro example), their pathways have often diverged. The result is a disconnect between corpus linguistics and digital data processing whose effects directly impinge on the ability to analyse language through software tools. A disconnect becoming more and more relevant as corpus linguistics is being applied to vast amounts of data obtained from manifold sources – including a wide array of social media platforms, each one with its unique linguistic and technical peculiarities.
As the ground-truth of an ever-increasing number of language studies, corpora must be able to correctly treat and represent such peculiarities: e.g. the dialogic dimension of comments or forum posts; the presence (and potential subsequent normalisation) of spelling variations; the use of hashtags and emojis. Failing to do so, the corpus-derived results will likely present researchers with a falsified view of the language under scrutiny.
What is at stake is not the ability to “count” what is in a corpus, but rather whether what is being counted is or is not a feature present in the original data – of which the corpus should be a faithful representation.
The presentation is consequently devoted to tackling digital technicalities, i.e. “those notions and mechanisms that – while not classically associated with natural language – are i) foundational of the digital environments in which language production and exchanges occur and ii) at the core of the techniques that are used to produce, collect, and process the focus of investigation, that is, digital textual data.” (Di Cristofaro 2023:5). One such example is represented by character encodings: although they are at the “core” of the whole corpus linguistics enterprise (cf. McEnery and Xiao 2005; Gries 2016:39,111), since they allow written language to be processed by a computer and understood by humans, they are often overlooked at all stages of corpus compilation and analysis, potentially leading linguists to involuntarily tamper with the data and its linguistic contents.
Starting from practical examples, the presentation discusses the implications that digital technicalities have on corpora and their analyses – or rather, what happens when they are not properly treated – while outlining (also in the form of Python scripts and practical tools) potential new pathways that a “digital-aware” perspective of corpus linguistics can open up.
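As a small illustration of the character-encoding point above (this is not taken from the presentation; the example tokens and variable names are assumptions made for this sketch), the following Python snippet shows how two visually identical spellings of the same word can be counted as different types unless the text is Unicode-normalised first:

```python
import unicodedata
from collections import Counter

# Hypothetical tokens: the same word written with a precomposed "é" (U+00E9)
# and with "e" followed by a combining acute accent (U+0301).
tokens = ["caf\u00e9", "cafe\u0301", "caf\u00e9"]

# Counting the raw strings treats the two spellings as different types.
raw_counts = Counter(tokens)

# Normalising to NFC before counting merges them into a single type.
nfc_counts = Counter(unicodedata.normalize("NFC", t) for t in tokens)

print(len(raw_counts))  # 2
print(len(nfc_counts))  # 1
```

Whether normalisation is appropriate depends on the research question; the sketch only shows that the choice changes what is being counted.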
References
Di Cristofaro, Matteo. Corpus Approaches to Language in Social Media. Routledge Advances in Corpus Linguistics. New York: Routledge, 2023. https://doi.org/10.4324/9781003225218.
Gries, Stefan Th. Quantitative Corpus Linguistics with R: A Practical Introduction. 2nd ed. New York: Routledge, 2016. https://doi.org/10.4324/9781315746210.
McEnery, Tony, and Richard Xiao. ‘Character Encoding in Corpus Construction’. In Developing Linguistic Corpora: A Guide to Good Practice, edited by Martin Wynne, 47–58. Oxford: Oxbow Books, 2005. https://users.ox.ac.uk/~martinw/dlc/index.htm.
Tognini Bonelli, Elena. ‘Theoretical Overview of the Evolution of Corpus Linguistics’. In The Routledge Handbook of Corpus Linguistics, edited by Anne O’Keeffe and Michael McCarthy, 14–27. Routledge Handbooks in Applied Linguistics. Milton Park, Abingdon, Oxon ; New York: Routledge, 2012.
HealTAC 2024 (the 7th Healthcare Text Analytics Conference) is taking place in Lancaster (UK), from June 12th to June 14th, 2024.
Submit your extended abstracts that describe any aspect of healthcare text analytics by March 28th. Submissions (up to 2 pages) should be prepared based on a template that is available at the conference web site:
https://healtac2024.github.io/submissions/
We also invite PhD and fellowship project submissions that describe ongoing PhD research (any stage) or a planned fellowship application. The conference will provide an opportunity to receive constructive feedback from a panel of experts.
As in previous years, there will be a post-conference open call to submit a journal length paper for further peer review and publication in Frontiers in Digital Health.
We are looking forward to seeing you at #HEALTAC2024
**** We apologize for the multiple copies of this email. In case you are
already registered to the next webinar, you do not need to register
again. ****
Dear colleague,
We are happy to announce the next webinar in the Language Technology
webinar series organized by the HiTZ research center (Basque Center for
Language Technology, http://hitz.eus). You can check the videos of
previous webinars and the schedule for upcoming webinars here:
http://www.hitz.eus/webinars
Next webinar:
* *Speaker*: Heng Ji (University of Illinois)
* *Title*: SmartBook: an AI Prophetess for Disaster Reporting and
Forecasting
* *Date*: *Friday*, February 16, 2024 - 15:00 CET
* *Summary*: History repeats itself, sometimes in a bad way. If we
don’t learn lessons from history, we might suffer similar tragedies,
which are often preventable. For example, many experts now agree
that some schools were closed for too long during COVID-19 and that
abruptly removing millions of children from American classrooms has
had harmful effects on their emotional and intellectual health. Also
many wish we had invested in vaccines earlier, prepared more
personal protective equipment and medical facilities, provided
online consultation services for people who suffered from anxiety
and depression, and created better online education platforms for
students. Similarly, genocides throughout history (from those in
World War II to the recent one in Rwanda in 1994) have also all
shared early warning signs (e.g., organization of hate groups,
militias, and armies and polarization of the population) forming
patterns that follow discernible progressions. Preventing natural or
man-made disasters requires being aware of these patterns and taking
pre-emptive action to address and reduce them, or ideally, eliminate
them. Emerging events, such as the COVID pandemic and the Ukraine
Crisis, require a time-sensitive comprehensive understanding of the
situation to allow for appropriate decision-making and effective
action response. Automated generation of situation reports can
significantly reduce the time, effort, and cost for domain experts
when preparing their official human-curated reports. However, AI
research toward this goal has been very limited, and no successful
trials have yet been conducted to automate such report generation
and “what-if” disaster forecasting. Pre-existing natural language
processing and information retrieval techniques are insufficient to
identify, locate, and summarize important information, and lack
detailed, structured, and strategic awareness. We propose SmartBook,
a novel framework targeting situation report generation, a task that
cannot be solved by ChatGPT. SmartBook consumes large volumes of news
data to produce a structured situation report with multiple
hypotheses (claims) summarized and grounded with rich links to
factual evidence by claim detection, fact checking, misinformation
detection and factual error correction. Furthermore, SmartBook can
also serve as a novel news event simulator, or an intelligent
prophetess. Given “What-if” conditions and dimensions elicited from
a domain expert user concerning a disaster scenario, SmartBook will
induce schemas from historical events, and automatically generate a
complex event graph along with a timeline of news articles that
describe new simulated events based on a new Λ-shaped attention mask
that can generate text with infinite length. By effectively
simulating disaster scenarios in both event graph and natural
language format, we expect SmartBook will greatly assist
humanitarian workers and policymakers to exercise reality checks
(what would the next disaster look like under these given
conditions?), and thus better prevent and respond to future disasters.
* *Bio*: Heng Ji is a professor in the Computer Science Department, and an
affiliated faculty member of the Electrical and Computer Engineering
Department and the Coordinated Science Laboratory, at the University of
Illinois Urbana-Champaign. She is an Amazon Scholar. She is the
Founding Director of Amazon-Illinois Center on AI for Interactive
Conversational Experiences (AICE). She received her B.A. and M.A.
in Computational Linguistics from Tsinghua University, and her M.S.
and Ph.D. in Computer Science from New York University. Her research
interests focus on Natural Language Processing, especially on
Multimedia Multilingual Information Extraction, Knowledge-enhanced
Large Language Models, Knowledge-driven Generation and
Conversational AI. She was selected as a Young Scientist to attend
the 6th World Laureates Association Forum, and selected to
participate in DARPA AI Forward in 2023. She was selected as "Young
Scientist" and a member of the Global Future Council on the Future
of Computing by the World Economic Forum in 2016 and 2017. She was
named as part of Women Leaders of Conversational AI (Class of 2023)
by Project Voice. The awards she received include "AI's 10 to Watch"
Award by IEEE Intelligent Systems in 2013, NSF CAREER award in 2009,
PACLIC2012 Best paper runner-up, "Best of ICDM2013" paper award,
"Best of SDM2013" paper award, ACL2018 Best Demo paper nomination,
ACL2020 Best Demo Paper Award, NAACL2021 Best Demo Paper Award,
Google Research Award in 2009 and 2014, IBM Watson Faculty Award in
2012 and 2014 and Bosch Research Award in 2014-2018. She was invited
by the Secretary of the U.S. Air Force and AFRL to join Air Force
Data Analytics Expert Panel to inform the Air Force Strategy 2030,
and invited to speak at the Federal Information Integrity R&D
Interagency Working Group (IIRD IWG) briefing in 2023. She is the
lead of many multi-institution projects and tasks, including the
U.S. ARL projects on information fusion and knowledge networks
construction, DARPA ECOLE MIRACLE team, DARPA KAIROS RESIN team and
DARPA DEFT Tinker Bell team. She coordinated the NIST TAC
Knowledge Base Population task from 2010 to 2021. She was the
associate editor for IEEE/ACM Transaction on Audio, Speech, and
Language Processing, and served as the Program Committee Co-Chair of
many conferences including NAACL-HLT2018 and AACL-IJCNLP2022. She was
elected secretary of the North American Chapter of the Association for
Computational Linguistics (NAACL) for 2020-2023. Her research
has been widely supported by the U.S. government agencies (DARPA,
NSF, DoE, ARL, IARPA, AFRL, DHS) and industry (Amazon, Google,
Facebook, Bosch, IBM, Disney).
* *Upcoming webinars:* Smaranda Muresan (March 7, 2024)
* Ralf Schlüter (May 2, 2024)
* Marco Baroni (June 6, 2024)
Check past and upcoming webinars at the following URL:
http://www.hitz.eus/webinars
If you are interested in participating, please complete this registration
form:
http://www.hitz.eus/webinar_izenematea
If you cannot attend this seminar, but you want to be informed of the
following HiTZ webinars, please complete this registration form instead:
http://www.hitz.eus/webinar_info
Best wishes,
HiTZ Zentroa
P.S.: HiTZ will not grant any type of certificate for attendance at these
webinars.
BIONLP 2024 and Shared Tasks @ ACL 2024
https://aclweb.org/aclwiki/BioNLP_Workshop
*Tentative* Important Dates
(All submission deadlines are 11:59 p.m. UTC-12:00 “anywhere on Earth”)
Paper submission deadline: May 17 (Friday), 2024
Notification of acceptance: June 17 (Monday), 2024
Camera-ready paper due: July 1 (Monday), 2024
Workshop: August 16, 2024, Location: LOTUS SUITE 12
Please watch for the updates!
SUBMISSION INSTRUCTIONS
-----------------------------------------
Two types of submissions are invited: full papers and short papers.
Full papers should not exceed eight (8) pages of text, plus unlimited
references. These are intended to be reports of original research. BioNLP
aims to be the forum for interesting, innovative, and promising work
involving biomedicine and language technology, whether or not yielding high
performance at the moment. This by no means precludes our interest in and
preference for mature results, strong performance, and thorough
evaluation. Both types of research and combinations thereof are
encouraged.
Short papers may consist of up to four (4) pages of content, plus unlimited
references. Appropriate short paper topics include preliminary results,
application notes, descriptions of work in progress, etc.
Electronic Submission
Submissions must be electronic and in PDF format, using the Softconf START
conference management system.
Submissions need to be anonymous.
Submission site for the workshop https://softconf.com/acl2024/BioNLP2024
Please follow the ACL formatting guidelines:
https://github.com/acl-org/acl-style-files
Dual submission policy: papers may NOT be submitted to the BioNLP workshop
if they are or will be concurrently submitted to another meeting or
publication.
INVITED TALK
-----------------------------------------
Titipat Achakulvisut. Biomedical and Data (Bio-Data) lab at Mahidol
University
WORKSHOP OVERVIEW AND SCOPE
-----------------------------------------
The BioNLP workshop, associated with the ACL SIGBIOMED special interest
group, is an established primary venue for presenting research in language
processing and language understanding for the biological and medical
domains. The workshop has been running every year since 2002 and continues
getting stronger. Many other emerging biomedical and clinical language
processing workshops can afford to be more specialized because BioNLP truly
encompasses the breadth of the domain and brings together researchers in
bio- and clinical NLP from all over the world.
BioNLP 2024 will be particularly interested in transparency of the
generative approaches and factuality of the generated text. Language
processing that supports DEIA (Diversity, Equity, Inclusion and
Accessibility) is still of utmost importance. The work on detection and
mitigation of bias and misinformation continues to be of interest. Research
in languages other than English, particularly, under-represented languages,
and health disparities are always of interest to BioNLP. Other active areas
of research include, but are not limited to:
Tangible results of biomedical language processing applications;
Entity identification and normalization (linking) for a broad range of
semantic categories;
Extraction of complex relations and events;
Discourse analysis; Anaphora & coreference resolution;
Text mining & Literature based discovery;
Summarization;
Text simplification;
Question Answering;
Resources and strategies for system testing and evaluation;
Infrastructures and pre-trained language models for biomedical NLP;
Processing and annotation platforms;
Synthetic data generation & data augmentation;
Translating NLP research into practice;
Getting reproducible results.
SHARED TASKS
-----------------------------------------
1. Clinical Text generation
Task 1: Radiology Report Generation
An important medical application of natural language generation (NLG) is to
build assistive systems that take X-ray images of a patient and generate a
textual report describing clinical observations in the images. This is a
clinically important task, offering the potential to reduce radiologists’
repetitive work and generally improve clinical communication. This shared
task is using the first large-scale collection of RRG datasets based on
MIMIC-CXR, CheXpert, PadChest and CANDID-PTX. Participants will need to
generate findings and impression from chest x-rays and will be evaluated on
a common leaderboard with recent proposed metrics such as F1-Radgraph and
RadCliQ. This shared task aims to benchmark recent progress using common
data splits and evaluation implementations.
See details at https://stanford-aimi.github.io/RRG24/
Task 2: Discharge Me!
The primary objective of this task is to reduce the time and effort
clinicians spend on writing detailed notes in the electronic health record
(EHR). Clinicians play a crucial role in documenting patient progress in
discharge summaries, but the creation of concise yet comprehensive hospital
course summaries and discharge instructions often demands a significant
amount of time, especially since these sections cannot be readily copied
from prior notes. This can lead to clinician burnout and operational
inefficiencies within hospital workflows. By streamlining the generation of
these sections, we can not only enhance the accuracy and completeness of
clinical documentation but also significantly reduce the time clinicians
spend on administrative tasks, ultimately improving patient care quality.
See details at https://stanford-aimi.github.io/discharge-me/
2. BioLaySumm
This shared task surrounds the abstractive summarization of biomedical
articles, with an emphasis on catering to non-expert audiences through the
generation of summaries that are more readable, containing more background
information and less technical terminology (i.e., a “lay summary”).
This is the 2nd iteration of BioLaySumm, following the success of the 1st
edition of the task at BioNLP 2023 which attracted 56 submissions across 20
different teams. In this edition, we aim to build on last year’s task by
introducing a new test set, updating our evaluation protocol, and
encouraging participants to explore novel approaches that will help to
further advance the state-of-the-art for Lay Summarization.
See details at https://biolaysumm.org/
Organizers
-----------------------------------------
* Dina Demner-Fushman, US National Library of Medicine
* Sophia Ananiadou, National Centre for Text Mining and University of
Manchester, UK
* Makoto Miwa, Toyota Technological Institute, Japan
* Kirk Roberts, UTHealth, Houston, Texas
* Jun-ichi Tsujii, National Institute of Advanced Industrial Science and
Technology, Japan
On behalf of the Association of Cyber Forensics and Threat
Investigators (ACFTI), I am pleased to invite you to the new
Cybersecurity stream lecture/seminar series.
The presentation is a maximum of 1 hour in length, with an audience of
about 60+, made up of undergraduate and postgraduate students plus
cybersecurity students from developing countries. Our goal is to shine
a spotlight on the broad array of new advances in cybersecurity
science and operations currently adopted in the industry. This session
will be conducted online. It will be fantastic to have any hands-on
topics related to cyber forensics.
Your discussion on this topic will be a great addition to our event.
Expressions of interest to present from anyone doing research or
applying cybersecurity techniques to practical or theoretical
applications related to the interactions between cyber forensics and
threat investigations can be sent as a summary of your work (c.200
words) to acfti (at) acfti (dot) org by February 15, 2024
Thank you in advance for your consideration, and we are very much
looking forward to hearing from you.
To get more news about our events, please join our low-traffic
announcement group @ https://groups.google.com/g/acfti
________________________________________________________
Association of Cyber Forensics and Threat Investigators
https://www.acfti.org
Twitter: @acfti
We are very pleased to share our second call for papers for our workshop on Reference, Framing, and Perspective co-located with LREC-COLING 2024.
Quick overview:
* Workshop website: https://cltl.github.io/reference-framing-perspective/
* When: Saturday, May 25th, 2024
* Where: Torino, Italy (co-located with LREC-COLING 2024)
* Deadline for submissions: February 20th, 2024
* Paper types: regular papers (short and long) and extended abstracts
* Paper submission link: https://softconf.com/lrec-coling2024/reference-framing-perspective2024/user…
* Deadline for camera-ready papers: March 29th, 2024
* Shared dataset: https://github.com/cltl/rfp_corpus_collection
When something happens in the world, we have access to an unlimited range of ways (from lexical choices to specific syntactic structures) to refer to the same real-world event. We can choose to express information explicitly or imply it. Variations in reference may convey radically different perspectives. This process of making reference to something by adopting a specific perspective is also known as framing. Although previous work in this area is present (see Ali and Hassan (2022)’s survey for an overview), there is a lack of a unitary framework and only a few targeted datasets (Chen et al., 2019) and tools based on Large Language Models exist (Minnema et al., 2022). In this workshop, we propose to adopt Frame Semantics (Fillmore, 1968, 1985, 2006) as a unifying theoretical framework and analysis method to understand the choices made in linguistic references to events. The semantic frames (expressed by predicates and roles) we choose give rise to our understanding, or framing, of an event. We aim to bring together different research communities interested in lexical and syntactic variation, referential grounding, frame semantics, and perspectives. We believe that there is significant overlap within the goals and interests of these communities, but not necessarily the common ground to enable collaborative work.
Referentially Grounded Shared Dataset
One way to study variation in framing is to conduct contrastive analyses of texts reporting on the same real-world event. Such an analysis can help to reveal the extent of variation in framing and possibly give rise to the underlying factors that lead to different choices in framing the same event. We collected such a corpus about the Eurovision Song Festival and make it available as a Shared Dataset for the Workshop. The purpose of this corpus is to enable exploratory analyses, facilitate discussion among participants, and, last but not least, make our workshop a real working workshop.
The corpus is composed of news articles reporting on the Eurovision Song Contest that took place in Rotterdam, the Netherlands (canceled in 2020 and held in 2021). The news articles have been collected using the structured data-to-text approach (Vossen et al., 2018). The corpus contains news articles in multiple languages. We invite participants to submit short and targeted analyses using the data (extended abstracts to be discussed in a hands-on data session). Participants are also free to use the data in regular contributions.
Regular contributions:
We aim to lay the groundwork for such efforts. We invite contributions (regular long papers of 8 pages or short papers of 4 pages) targeting any of the following - non-exhaustive - list of topics:
* Theoretical models of framing and perspective
* Annotation frameworks for framing and perspectives
* Computational models of framing and perspective
* Approaches for creating and analyzing referentially grounded datasets (containing different perspectives, written at different points in time, written in different languages)
* Approaches for and analyses of texts about contested and divisive events triggering different opinions and perspectives
* Analyses of and methods for analyzing (diachronic) lexical variation and framing
* Language resources for reference, frames, and perspectives
* Approaches and tools to compare claims of sources
* Frames as expressions of bias in the representation of social groups
* User interface for the visualization of multiple perspectives
Extended abstracts:
We invite extended abstracts (1,500 words maximum) about small analyses or experiments conducted on our Shared Data. The abstracts will be non-archival and discussed in a dedicated data session.
Invited speakers:
Maria Antoniak
Vered Shwartz
Organizers:
Pia Sommerauer, Tommaso Caselli, Malvina Nissim, Levi Remijnse, Piek Vossen
Apologies for cross-posting.
---------------------------------------------------------------------------
The *9th Workshop on Representation Learning for NLP (RepL4NLP 2024)*,
co-located with ACL 2024 in Bangkok, Thailand, invites papers of a
theoretical or experimental nature describing recent advances in vector
space models of meaning, compositionality, and the application of deep
neural networks and spectral methods to NLP. We welcome submissions on
representations of text, as well as representations that are multi-modal,
cross-lingual, representations of symbolic languages, code, enriched with
external knowledge, or structure-informed (syntax, morphology, etc).
*Topics for the workshop will include, but are not limited to:*
- *Developing new representations*: at any level of granularity
(document to character) using supervised, unsupervised or semi-supervised
techniques for a multitude of tasks such as language modeling, similarity
search, clustering, etc.
- *Efficient learning of representations*: with respect to training and
inference time, model size, amount of training data, etc.
- *Evaluating representations*: with respect to training objectives (for
LLMs: next token prediction, RLHF, span-mask denoising, etc), types of test
data (e.g., text vs code), and architectures (decoder-only,
encoder-decoder, etc), as well as assessing representations for
generalization, compositionality, and robustness (e.g., adversarial), etc.
- *Representation analysis*: methods for visualizing, explaining, and
inspecting specific properties of representations (e.g., through probing),
enhancing their interpretability, investigating their influence on the
model's behavior, assessing the causal impact of interventions within the
representation space on the model's behavior, etc.
- *Relating representation to behavior*: whether, and to what extent, a
model’s representations cause, condition, or boost its behavior (e.g., for
LLMs: the relationship between encoded knowledge and task performance). Is
possessing good representations necessary or sufficient for solving a task?
Vice versa, is model behavior informative of its learned representations?
*Key Dates*
Direct paper submission deadline: May 17, 2024
ARR commitment deadline: June 1, 2024
Notification of acceptance: June 17, 2024
Camera-ready papers due: July 1, 2024
Workshop date: Aug 16, 2024
*Submissions*
Papers may be long (maximum 8 pages plus references) or short
(maximum 4 pages plus references). We encourage authors to include a
broader impact and ethical concerns statement, following ARR Ethics Policy
from the main conference. Papers can be submitted directly via OpenReview.
*ACL 2024 fast-track submissions*
Papers submitted to the ACL 2024 main
conference that have not been selected can be submitted to the RepL4NLP
2024 fast-track. We will then make a decision based on your reviews
received from ACL 2024. Note that you do not need to submit the reviews
received from ACL 2024.
*Website*
https://sites.google.com/view/repl4nlp2024/
*Organizers*
Chen Zhao, New York University Shanghai
Marius Mosbach, Saarland University
Pepa Atanasova, University of Copenhagen
Seraphina Goldfarb-Tarrant, Cohere
Peter Hase, University of North Carolina at Chapel Hill
Arian Hosseini, University of Montreal
Maha Elbayad, Meta AI
Sandro Pezzelle, University of Amsterdam
Maximilian Mozes, University College London
Dear colleagues
I would like to invite you to register and join us *online* or *in person*
at the Qatar Foundation Minaretein building Auditorium on Feb 19th, 2024,
starting at 9 AM, to attend two panels related to Data sciences, AI, LLM, NLP and
Social computing. The panels will discuss topics such as Hate Speech Detection, Fake News
Detection and Text Analytics with an impressive lineup of international
speakers such as *Dr. Kareem Darwish* (AIXPLAIN), *Dr. Kiran Garimella* (Rutgers
University), Tuğrulcan Elmas (University of Edinburgh), Roy Lee Ka Wei
(Singapore University of Technology and Design), Patrick Juola (Duquesne
University), Jiří Milička (Charles University) and David Kaufer (Carnegie
Mellon University).
I believe this public event will be useful for faculty, students,
researchers, staff, etc. Please feel free to forward this invitation to your
colleagues and students. The event is organized as part of the second HBKU
MIDDLE EAST CONFERENCE 2024: Gender, Technology, and Digital Cultures in the
Middle East.
Registration link (Free Registration)
https://app.micetribe.com/public/workspaces/chss/events/1282251051/forms/vi…
Full Program (Feb 18 and Feb 19)
https://www.hbku.edu.qa/en/mec2024/program
Best,
Wajdi Zaghouani
----
*Wajdi Zaghouani, Ph.D.*
*Associate Professor in Digital Humanities*
College of Humanities and Social Sciences
P.O. Box 34110 | Education City | Doha, Qatar
tel: +974 4454 5601 | mob: +974 33454992
wzaghouani(a)hbku.edu.qa| Office A141, LAS Building
================[Apologies for any cross-posting]================
**Special issue of the journal Traitement Automatique des Langues (TAL)
Abusive Language Detection: Linguistic Resources, Methods and
Applications**
**Guest Editors**
Farah Benamara (IRIT-Toulouse University, IPAL Singapore), Delphine
Battistelli (MoDyCo, Paris Nanterre University) and Viviana Patti (Turin
University)
**Motivations**
Abusive language - or, in another very common terminology, hate speech -
and the propagation of harmful stereotypes have unfortunately become
commonplace occurrences on various social media platforms, partly due to
users’ freedom and anonymity and the lack of regulation provided by
these platforms. The sheer volume and often implicit nature of such
unwanted content make manual moderation of these user spaces a
formidable task. Various scientific communities interested in automating
it, at least partially, have taken up the problem over the past ten
years. In particular, Computational Social Science, Natural Language
Processing and Computational Linguistics have proposed numerous works to
create resources, datasets, and models aimed at automating the task of
abusive language detection (henceforth ALD). In fact, we see that ALD
has become a research theme in its own right in the field of Natural
Language Processing with an abundant literature.
Abusive language (an umbrella term for the various forms of harmful
language, such as toxic or offensive language, hate speech, and
stereotypes) is topically focused, and each specific manifestation of
abusive language targets different vulnerable groups based on
characteristics such as gender (misogyny, sexism), ethnicity, race,
religion (xenophobia, racism, Islamophobia), sexual orientation
(homophobia), and so on. Most automatic ALD approaches cast the problem
as a binary classification task, but important considerations should be
taken into account, in particular: (1) the topical focus or the
target-oriented nature of hate speech; (2) the degree of engagement of
users in abusive content (e.g., denunciation, approbation, reporting,
neutral attitude); (3) the question of stereotypes and dominant
ideologies; (4) the question of linguistic strategies specific to, or
originating in, social networks (e.g., emoticons, hashtags).
Furthermore, most of the work (resources, classifiers) is developed for
English.
**Topics**
Motivated by the interest of the community in the problem of ALD, we
invite papers from Natural Language Processing, Machine Learning and
Computational Social Sciences. We explicitly encourage interdisciplinary
submissions (resources, computational methods, and user applications at
the interface of linguistics/psychology/socio-linguistics/sociology) but
also position papers on the current state of the art in the field
discussing the limitations of the current approaches and directions for
future work. The topics covered by the special issue include, but are
not limited to:
-- Linguistic resources and evaluation: annotation schemes, corpus
linguistics studies, new datasets, with a particular interest in French
language and/or multilingual resources. In the case of strictly lexical
resources: methods for constituting them and coverage, semantic
categories retained.
-- Formal/Conceptual approaches for ALD as inspired by models in
sociology, socio-linguistics and psychology.
-- Models and Methods: supervised and unsupervised approaches, including
LLMs.
-- Role of contextual phenomena, including discourses, extra-linguistic
contexts (e.g., cultural aspects).
-- Models for cross-lingual and multimodal detection.
-- New approaches beyond binary classification: target-oriented ALD,
degrees of user engagement, etc.
-- Dynamics of online AL in social media, propaganda propagation.
-- Bias detection and removal in resource creation, datasets and methods.
-- Application of ALD tools in education, social media content
moderation, etc.
-- Social, legal, and ethical implications of detecting, monitoring and
moderating AL.
**Important dates**
May 31st, 2024: Submission deadline
July 15th, 2024: Notification of acceptance after the first round of review
End of September 2024: Revised version
Mid October 2024: Final decision
End of November 2024: Camera ready
January 2025: Publication of the special issue
**Submission**
Submissions can either be in French or English and should follow the
journal templates: https://tal-65-3.sciencesconf.org/
**About the journal**
The journal Traitement Automatique des Langues (TAL) is the international
French journal of Natural Language Processing
(https://www.atala.org/revuetal) published by ATALA (French Association
for Natural Language Processing, http://www.atala.org) since 1959 with
the support of CNRS (National Centre for Scientific Research). It is
indexed by ACL Anthology as well as DBLP. It is also supported by the
Institute of Human and Social Sciences of the CNRS.
**Contact**
For any questions, please contact tal-65-3(a)sciencesconf.org
**External committee**
-- Cristina Bosco, University of Turin
-- Elena Cabrio, University of Côte d'Azur
-- Tommaso Caselli, Faculty of Arts, Rijksuniversiteit Groningen
-- Valentina Dragos, ONERA
-- Karën Fort, Sorbonne University
-- Claire Hugonnier, University of Grenoble Alpes
-- Irina Illina, University of Lorraine
-- Roy Ka-Wei Lee, Singapore University of Technology and Design
-- Véronique Moriceau, IRIT, University of Toulouse
-- Frédérique Segond, INRIA Paris
-- Mariona Taulé, University of Barcelona
-- Samuel Vernet, Aix-Marseille University
-- Mathieu Valette, Paris Sorbonne Nouvelle University
-- Marcos Zampieri, George Mason University
--
========================
Farah Benamara Zitoune
Professor in Computer Science, Université Paul Sabatier
IRIT-CNRS
118 Route de Narbonne, 31062, Toulouse.
Tel : +33 5 61 55 77 06
http://www.irit.fr/~Farah.Benamara
==================================