The 1st Workshop on Computational Terminology in NLP and Translation
Studies (ConTeNTs)
Varna, 7th-8th September, 2023
In conjunction with RANLP 2023 – International Conference “Recent
Advances in Natural Language Processing”
Final call for papers
Computational Terminology and new technologies applied to translation
studies have attracted the interest of researchers with very different
multidisciplinary backgrounds and motivations. Those fields cover a
range of areas in Natural Language Processing (NLP) such as information
retrieval, terminology extraction, question-answering systems, ontology
building, machine translation, computer-aided translation, automatic or
semi-automatic abstracting, text generation, etc.
Terminological identification, extraction and coinage of new terms are
essential for knowledge mining from texts, in both high- and
low-resource languages. Quick evolutions and new developments in
specialised domains require efficient and systematic automatic term
management. New terms need to be coined and translated to ensure the
equitable development of domains in all languages.
During the last decade, deep learning and neural methods have become the
state of the art for most NLP applications. Those applications were
shown to outperform previous methods on various tasks, including
automatic term extraction, language mining, assessment of quality in
machine translation, accessibility of terminology, etc. On the one hand,
NLP and computational linguistics try to improve the work of translators
and interpreters by developing Computer-Assisted Translation (CAT)
tools, Translation Memories (TMs), terminological databases and
terminology extraction tools, etc. On the other hand, the NLP field
still needs the efforts and knowledge of translators, interpreters and
linguists to provide better services and tools based on the real
necessities of those language professionals.
The aim of this workshop is to promote new insights into the ongoing and
forthcoming developments in computational terminology by bringing
together NLP experts, as well as terminologists and translators. By
uniting researchers with such diverse profiles, we hope to bridge some
of the gaps between these disciplines and inspire a dialogue between
various parties, thus paving the way to more artificial intelligence
applications based on mutual collaboration between language and
technology.
Topics of Interest
The ConTeNTs workshop invites the submission of papers reporting on
original and unpublished research on topics related to Computational
Terminology in NLP and Translation Studies, including but not limited
to:
- Automatic term extraction: monolingual and multilingual extraction of
terms from parallel and comparable corpora, including single and
multiword expressions;
- Extraction and acquisition of semantic relations between terms;
- Extraction and generation of domain specific definitions and
disambiguation of terms;
- Representation of terms, management of term variation and the
discovery of synonym terms or term clusters and its relation to NLP
applications;
- Extraction of terminological context, through the use of comparable
and parallel corpora;
- Accessibility of terminology in certain domains, relevant to
non-experts or to laypersons, and its relevance to NLP applications such
as chatbots, automatic email generation or spoken language interfaces;
- The impact of terminology on MT (applying terminology constraints,
evaluation of MT in domain-specific settings, etc.);
- The creation of domain ontologies, thesauri and terminological resources
in specialised domains;
- The use of new technologies in translation studies and research and
the use of terminological resources in specialised translation;
- Identification of key problems in terminology and new technologies
used in translation studies;
- Evaluation of terminological resources in various NLP applications and
the impact these resources have on the performance of automatic
systems;
- Emerging language technologies: how the increased reliance on
real-time language technologies would change the structure of language;
- Corpus-based studies applied to translation and interpreting: the use
of parallel and comparable corpora for translating phraseological units;
- Phraseology and multiword expressions in cross-linguistic studies;
- Translation and interpreting tools, such as translation memories,
machine translation and alignment tools;
- User requirements for interpreting and translation tools.
Submission Guidelines
Submissions must consist of full-text papers and should not exceed 7
pages excluding references; they should be a minimum of 5 pages long.
The accepted papers will be published as ConTeNTs workshop e-proceedings
with ISBN, will be assigned a DOI and will also be available at the time
of the conference. The papers should be in English.
Authors of accepted papers will receive guidelines regarding how to
produce camera-ready versions of their papers for inclusion in the
proceedings.
Each submission will be reviewed by at least two programme committee
members. Accepted papers will be presented orally as part of the
programme of the workshop.
Submissions
Link to START system: https://softconf.com/ranlp23/ConTeNTS
Website of the workshop: https://contents2023.kulak.kuleuven.be/
Should you require any assistance with the submission, please do not
hesitate to contact us at amalhaddad(a)ugr.es and
ayla.rigoutsterryn(a)kuleuven.be.
Important Dates
Deadline for paper submission: 15 August 2023
Workshop camera-ready proceedings ready: 31 August 2023
ConTeNTs workshop: 7/8 September 2023
Workshop Chairs & Organising Committee
Ayla Rigouts Terryn, Katholieke Universiteit Leuven, Belgium
Amal Haddad Haddad, Universidad de Granada, Spain
Ruslan Mitkov, University of Wolverhampton, United Kingdom
Programme Committee
- Sophia Ananiadou (University of Manchester)
- Maria Andreeva Todorova (Bulgarian Academy of Sciences)
- Silvia Bernardini (University of Bologna)
- Melania Cabezas García (Universidad de Granada)
- Rute Costa (Universidade Nova de Lisboa)
- Esther Castillo Pérez (Universidad de Granada)
- Patrick Drouin (Université de Montréal)
- Pamela Faber (Universidad de Granada)
- Mercedes García de Quesada (Universidad de Granada)
- Dagmar Gromann (Centre for Translation Studies – University of Vienna)
- Tran Thi Hong Hanh (L3i Laboratory, University of La Rochelle)
- Rejwanul Haque (National College of Ireland)
- Amir Hazem (Nantes University)
- Kyo Kageura (University of Tokyo)
- Barbara Karsch (BIK Terminology – USA)
- Dorothy Kenny (Dublin City University)
- Miloš Jakubíček (Sketch Engine)
- Hendrik Kockaert (KU Leuven)
- Philipp Koehn (Johns Hopkins University)
- Maria Kunilovskaya (Saarland University)
- Marie-Claude L’Homme (Université de Montréal)
- Hélène Ledouble (Université de Toulon)
- Pilar León-Araúz (Universidad de Granada)
- Rodolfo Maslias (former Head of TermCoord, European Parliament)
- Silvia Montero Martínez (Universidad de Granada)
- Emmanuel Morin (LS2N-TALN)
- Rogelio Nazar (Pontificia Universidad Católica de Valparaíso)
- Sandrine Peraldi (University College Dublin)
- Silvia Piccini (Italian National Research Council)
- Thierry Poibeau (CNRS)
- Senja Pollak (Jožef Stefan Institute)
- Maria Pozzi Pardo (El Colegio de México)
- Tharindu Ranasinghe (Aston University)
- Arianne Reimerink (Universidad de Granada)
- Andres Repar (Jožef Stefan Institute)
- Christophe Roche (Université Savoie Mont-Blanc)
- Antonio San Martín Pizarro (Université du Québec à Trois-Rivières)
- Beatriz Sánchez Cárdenas (Universidad de Granada)
- Vilelmini Sosoni (Ionian University)
- Irena Spasic (Cardiff University)
- Elena Isabelle Tamba (Romanian Academy, Iași Branch)
- Rita Temmerman (Vrije Universiteit Brussel)
- Jorge Vivaldi Palatresi (Universitat Pompeu Fabra)
International workshop
NLP for translation and interpreting applications (NLP4TIA)
Varna, Bulgaria, 8 September 2023
https://nlp4tia.web.uah.es/
Last Call for Papers
***Extended deadline: 10 August 2023***
In the last two decades, we have witnessed a technological turn in translation and interpreting studies, with Natural Language Processing (NLP) and deep learning playing an increasingly prominent part. There is already a growing number of NLP applications that are used to support the work of translators and interpreters. In addition, the recent advances in (and latest models of) deep learning have powered the further development and success of high-performing Neural Machine Translation (NMT) systems.
Translation technology has revolutionised the translation profession, and nowadays most professional translators employ tools such as translation memory (TM) systems in their daily work. The latest advances in NMT have resulted in NMT not only becoming an integral part of most state-of-the-art TM tools but also becoming typical of the translation workflow of many companies, organisations and freelance translators.
Although translation has benefited more from technological advances, interpreting has also experienced a technological turn. However, it is only in recent years that soft technology has permeated interpreting practice and research. Computer-assisted translation, MT and NLP tools have been adapted for use by interpreters. In addition, corpus-based studies have also underpinned dialogue interpreting.
The increasing interest in NLP, MT and the automation of processes has led to multidisciplinary projects that deal with the development of models for automated oral communication. Machine interpreting has already been developed and is being improved, with a focus on speed and accuracy. Whether domain-specific (commercial, military, humanitarian) or general (Skype Translator), machine interpreting still has a long way to go to become more human-like.
Many of the above recent developments have to do with the employment of Natural Language Processing tools and resources to support the work of translators and interpreters. This workshop is expected to discuss the growing importance of NLP in different translation and interpreting scenarios.
Workshop topics
The workshop invites submissions reporting original unpublished work on topics including but not limited to:
* NLP and MT for under-resourced languages;
* Translation Memory systems;
* NLP and MT for translation memory systems;
* NLP for CAT and CAI tools;
* Integration of NLP tools in remote interpreting platforms;
* NLP for dialogue interpreting;
* Development of NLP based applications for communication in public service settings (healthcare, education, law, emergency services);
* Corpus-based studies applied to translation and interpreting;
* Machine translation and machine interpreting;
* Resources for translation and machine translation;
* Resources for interpreting and interpreting technology application;
* Quality estimation of human and machine translation;
* Post-editing strategies and tools;
* Automatic post-editing of MT;
* NLP and MT for subtitling;
* Technology acceptance by interpreters and translators;
* Machine Translation and translation tools for literary texts;
* Evaluation of machine translation and translation and interpreting tools in general;
* The impact of the technological turn in translation and interpreting;
* Cognitive effort and eye-tracking experiments in translation and interpreting;
* Development of models for research and practice of translation and interpreting;
* Multidisciplinary cooperation in NLP applied to translation and interpreting.
Submissions and publication
Submissions must consist of full-text papers and should not exceed 7 pages excluding references; they should be a minimum of 5 pages long. The accepted papers will be published as NLP4TIA workshop e-proceedings with ISBN, will be assigned a DOI and will also be available at the time of the conference. The papers should be in English and should be submitted via the conference management system START using this link <https://softconf.com/ranlp23/NLP4TIA/>.
Authors of accepted papers will receive guidelines regarding how to produce camera-ready versions of their papers for inclusion in the proceedings.
Each submission will be reviewed by at least two programme committee members. Accepted papers will be presented orally as part of the programme of the workshop.
Submissions should be compliant with the below templates and should be uploaded as pdf files in START (START is configured to accept pdf files only).
The following templates should be used: LaTeX at Overleaf<https://www.overleaf.com/latex/templates/instructions-for-ranlp-2023-procee…>, LaTeX<http://ranlp.org/ranlp2023/Templates/ranlp2023-LaTeX.zip> , MS Office<http://ranlp.org/ranlp2023/Templates/ranlp2023-word.docx>
Important dates
Deadline for paper submission: 23 July 2023
Deadline for paper submission (extended): 10 August 2023
Acceptance notification: 20 August 2023
Final camera-ready version: 30 August 2023
Workshop camera-ready proceedings ready: 3 September 2023
NLP4TIA workshop: 8 September 2023
Workshop Chairs
Raquel Lázaro Gutiérrez (Universidad de Alcalá)
Antonio Pareja Lora (Universidad de Alcalá)
Ruslan Mitkov (Lancaster University)
Programme Committee
Cristina Aranda (Big Onion)
Juanjo Arevalillo (Hermes Traducciones)
Silvia Bernardini (University of Bologna)
Gabriel Cabrera Méndez (Dualia Teletraducciones)
Matt Coler (University of Groningen)
Gloria Corpas Pastor (University of Malaga)
Elena Davitti (University of Surrey)
Joanna Drugan (Heriot-Watt University)
Marie Escribe (LanguageWire)
Claudio Fantinuoli (Mainz University/KUDO Inc)
Antonio García Cabot (Universidad de Alcalá)
Adriana Jaime Pérez (Migralingua Voze)
Miguel Ángel Jiménez Crespo (Rutgers University)
Óscar Luis Jiménez Serrano (University of Granada)
Koen Kerremans (Free University Brussel)
Maria Kunilovskaya (Saarland University)
Els Lefever (Ghent University)
Pilar León Arauz (University of Granada)
Johanna Monti (University of Naples L'Orientale)
Elena Montiel Ponsoda (Polytechnic University of Madrid)
Helena Moriz (University of Lisbon)
Elena Murgolo (Orbital 14)
Dora Murgu (Interprefy)
Constantin Orasan (University of Surrey)
María Teresa Ortego Antón (University of Valladolid)
Tharindu Ranasinghe (Aston University)
Celia Rico (Universidad Complutense de Madrid)
Caroline Rossi (University Grenoble les Alpes)
María del Mar Sánchez Ramos (Universidad de Alcalá)
Miriam Seghiri (University of Malaga)
Vilelmini Sosoni (Ionian University)
Rui Manuel Sousa Silva (University of Porto)
Nicoletta Spinolo (University of Bologna)
Venue
The workshop will take place at hotel Cherno More<https://www.chernomorebg.com/en/> in Varna.
Further information and contact details
Registration for NLP4TIA is now open and is done via the RANLP main conference page. To register, please complete the registration form<https://url6.mailanyone.net/scanner?m=1pii0v-000B6E-3x&d=4%7Cmail%2F14%2F16…>.
The conference website (https://nlp4tia.web.uah.es/) will be updated on a regular basis. For further information, please email raquel.lazaro(a)uah.es.
DLinNLP 2023 - Deep Learning Summer School at RANLP 2023
Call for Participation
Varna, Bulgaria
30th August - 1st September
https://dlinnlp2023.github.io/
We invite everyone interested in Machine Learning and Natural Language Processing to attend the Deep Learning Summer School at the 14th biennial RANLP conference (RANLP 2023).
Purpose:
Deep Learning is a branch of machine learning that has gained significant traction in the field of Artificial Intelligence, pushing the envelope in the state-of-the-art, with many sub-areas including natural language, image, and speech processing employing it widely in their best-performing models.
This summer school will feature presentations from outstanding researchers in the field of Natural Language Processing (NLP) and Deep Learning. These will include coverage of recent advances in theoretical foundations and extensive practical coding sessions showcasing the latest relevant technology.
The summer school would be of interest to novices and established practitioners in the fields of NLP, corpus linguistics, language technologies, and similar related areas.
Important Dates:
30 August - 1 September: Deep Learning Summer School in NLP
Lectures:
* Lucas Beyer (Google Brain)
* Tharindu Ranasinghe (Aston University, UK)
* Iacer Calixto (University of Amsterdam, Holland)
Practical Sessions:
* Damith Premasiri (University of Wolverhampton, UK)
* Isuri Anuradha (University of Wolverhampton, UK)
* Anthony Hughes (University of Wolverhampton, UK)
Registration:
**** Registration is now open: ******
https://ranlp.org/ranlp2023/index.php/fees-registration/
Programme:
Please refer to the website for the details of the programme:
https://dlinnlp2023.github.io/#programme
Contact Email: dlinnlp2023(a)gmail.com
Apologies for cross-posting.
----------------------------------------
The International Conference on Spoken Language Translation (IWSLT)
<https://iwslt.org/> is the premier annual conference for all aspects of
Spoken Language Translation. Every year, the conference organizes and
sponsors open evaluation campaigns around key challenges in simultaneous
and consecutive translation, under real-time/low latency or offline
conditions, and for a variety of languages in under-resourced or
multilingual conditions. System descriptions and results from participants’
systems and scientific papers related to key algorithmic advances and best
practices are presented.
IWSLT is the venue of the SIGSLT, the Special Interest Group on Spoken
Language Translation of ACL, ISCA, and ELRA. With a track record of 20
years, IWSLT benchmarks and proceedings serve as a reference for all
researchers and practitioners working on speech translation and related
fields. 2024 will mark IWSLT’s 21st edition.
There are many challenges in speech translation that have not yet been
addressed. Among them, we are particularly interested in topics related to new
application scenarios (e.g. meetings, subtitling, dubbing), specific
aspects (e.g. names, accents), different styles, multilinguality, discourse
and summarization, multimodal and multi-party speech translation, and any
other ideas that researchers have not yet focused on. Therefore, we
invite *proposals for shared tasks*. For more details about this
initiative, please refer to
https://iwslt.org/assets/pdfs/IWSLT2024-Call_for_Tasks.pdf
If you want to propose a new task to encourage researchers around the world
to work on particular timely challenges in SLT, please fill out the
following form <https://iwslt.org/assets/pdfs/IWSLT2024-Call_for_Tasks.pdf> and
*send it to iwslt-organizers(a)googlegroups.com by August 31st, 2023.*
Best,
Marine, Marcello, Alex, Jan, Sebastian, Elizabeth, Atul
IWSLT Organisers
Application Deadline: 30 August 2023
Details
This project focuses specifically on managing the single greatest threat to global health: the increasing burden of infections caused by bacteria that are resistant to antibiotics (antimicrobial resistance, AMR). Doctors (humans) can’t reliably know which antibiotic to administer in an emergency. In fact, based on our earlier research, they get it wrong about 20% of the time. A serious bacterial infection will look the same whether the bacteria causing the infection are resistant to certain antibiotics or not, and the first antibiotic must be selected on very limited information and be given within the first hour of admission to hospital if there is a risk that the patient has developed an infection that is spreading through their body. Understandably, this ‘high stakes’ uncertainty promotes the use of ‘broad-spectrum’ antibiotics, which should be held in reserve for known drug-resistant infections.
Natural language processing (NLP) has the potential to safely unlock successful antimicrobial stewardship for AMR at the first dose. In earlier work, we used quantitative and categorical data from electronic health records (EHRs) from patients who needed emergency hospital admission to see which antibiotics were given in the emergency room, how often a patient was prescribed an antibiotic that their bacterial infection was resistant to (under-prescribing), and how often a broad-spectrum antibiotic was used when another antibiotic alternative would have been equally effective (over-prescribing). We trained a machine learning algorithm that was allowed to under-prescribe at the same rate as doctors (about 20% of the time) and that could also reduce the use of broad-spectrum antibiotics by about 40% by anticipating which patients were unlikely to have an AMR infection. This powerful proof-of-concept work shows the huge potential for AI in personalised medicine and antimicrobial stewardship at the first and most important dose.
Taking the next steps in AI for AMR
We know that a lot of important information is held in free-text clinician notes that isn’t reflected in the data we used to build the model, and we want to understand what valuable information contained in the free-text data would help improve prediction accuracy.
This project aims to analyse free-text clinician notes to retrieve valuable information that can improve the prescribing of antibiotics by more accurately predicting an individual patient’s risk of having an antibiotic-resistant infection. We are seeking a motivated student to undertake a 4 year funded PhD, in collaboration with Shionogi, a pharmaceutical company with offices in London.
Eligibility
The successful candidate will hold a bachelor’s degree (or above) in Computer Science, Physics, Mathematics, Psychology or a related discipline and have proven experience in computational linguistics, natural language processing, machine learning. Previous experience of applying AI methods to the medical domain is a strong advantage. Furthermore, the candidate will have strong programming skills, expertise in machine learning approaches and be excited by the challenges of interdisciplinary research between medicine and computer science. We want our PhD student cohorts to reflect our diverse society. UoB is therefore committed to widening the diversity of our PhD student cohorts. UoB studentships are open to all and we particularly welcome applications from under-represented groups, including, but not limited to, BAME, disabled and neuro-diverse candidates. We also welcome applications for part-time study.
The University of Birmingham works closely with University Hospitals Birmingham NHS Foundation Trust (UHB), which is the single-largest Acute NHS Trust in the UK, and serves the healthcare needs of over 1.2m people in the second-largest city in the UK. PIONEER, the Health Data Research Hub for Acute Care, alone includes >1.2m patient episodes per year with >10yrs longitudinal health data. This experienced collaboration means we are uniquely positioned to develop, model and then later embed AI-supported antimicrobial stewardship within a clinical trial and electronic prescribing systems. The student will be located at the Institute of Microbiology and Infection (IMI) of the University of Birmingham, the largest academic research institute in the field of microbiology and infectious diseases in the United Kingdom. The IMI is part of the School of Medical and Dental Sciences, defining the future of health and medicine through the provision of innovative education and exceptional research.
Throughout the PhD project, regular meetings with industry partner colleagues at Shionogi will be held to monitor progression and support the student in their research.
About Shionogi
Established in Japan 140 years ago, Shionogi has a history of drug discovery and scientific rigour in addressing some of the toughest challenges in healthcare. Shionogi’s work in antimicrobial resistance (AMR) is a key part of our contribution to the UN Sustainable Development Goals (SDGs) - we invest the highest proportion of our pharmaceutical revenues in relevant anti-infectives R&D compared to other large pharmaceutical companies. Shionogi announced the first-ever licence agreement for an antibiotic to treat serious bacterial infections between a pharmaceutical company and a non-profit organisation driven by public health priorities. Working with the Global Antibiotic Research and Development Partnership (GARDP) and the Clinton Health Access Initiative (CHAI), the agreement aims to provide 135 countries with access. At Shionogi, our belief is that sustainable growth hinges not only on new drug creation, but also on consolidating our strengths in areas of strategic focus. Through external partnerships, we seek to bring benefits to more patients through collaboration in areas where it would be difficult for us to go it alone. Globally, our many partnerships across a range of sectors, including academia, enable us to accelerate innovation to better help societies manage some of the most important public health threats and to take on areas where the unmet clinical need is greatest.
Funding Notes
The position offered is for three and a half years full-time study. The current (2023-24) value of the award is: stipend £18,622 pa; tuition fee £4,712 pa. Awards are usually incremented on 1 October each following year. The package includes a MacBook Air and funding for additional training and conference attendance.
References
Moran E, Robinson E, Green C, Keeling M, Collyer B. Towards personalized guidelines: using machine-learning algorithms to guide antimicrobial selection. J Antimicrob Chemother. 2020. doi:10.1093/jac/dkaa222
Cavallaro M, Moran E, Collyer B, McCarthy ND, Green C, Keeling MJ. Informing antimicrobial stewardship with explainable AI. bioRxiv. 2022. doi:10.1101/2022.08.12.22278678
https://www.findaphd.com/phds/project/natural-language-processing-of-electr…
With best regards,
Mark Lee
Professor of Artificial Intelligence
School of Computer Science
University of Birmingham
www.cs.bham.ac.uk/~mgl<http://www.cs.bham.ac.uk/~mgl>
15th meeting of the Forum for Information Retrieval Evaluation: *HASOC-2023*
We are excited to announce the 5th edition of HASOC, consisting of four
interesting shared tasks. We invite you to participate.
*Task 1 focuses on identifying hate speech, offensive language, and
profanity in different languages using natural language processing
techniques.*
* Task 1A deals with identifying hate and offensive content in
Sinhala, a low-resource Indo-Aryan language spoken in Sri Lanka. The
task involves classifying tweets into Hate and Offensive (HOF) or
Non-Hate and Offensive (NOT). The dataset for this task is based on
the Sinhala Offensive Language Detection dataset.
* Task 1B focuses on identifying hate and offensive content in
Gujarati, another low-resource Indo-Aryan language spoken by
approximately 50 million people in India. Similarly, participants
need to classify tweets into HOF or NOT categories. The training set
for this task consists of around 200 tweets.
For more details, please visit task 1 page
<https://hasocfire.github.io/hasoc/2023/task1.html>.
*Task 2, Identification of Conversational Hate-Speech in Code-Mixed
Languages (ICHCL), addresses the challenge of identifying hate speech
and offensive content in code-mixed conversations on social media.
Code-mixed text includes multiple languages within a single
conversation. The task is divided into two subtasks.*
* In Task 2a, participants need to perform binary classification on
conversational tweets with tree-structured data. They must determine
whether a tweet, comment, or reply contains hate speech, offensive
language, or profanity (HOF) or is non-hate and offensive (NOT). The
classification should consider both the individual content and
support for hate expressed in the parent tweet.
* Task 2b involves the classification of conversational tweets with
tree-structured data into specific forms of hate. Participants must
identify if the tweet, comment, or reply contains standalone hate
(SHOF), contextual hate (CHOF) that supports hate expressed in the
parent, or if it is non-hate (NONE).
For more details, please visit Task 2 webpage.
<https://hasocfire.github.io/hasoc/2023/ichcl.html>
*Task 3 aims to detect hateful spans within a sentence already
considered hateful. A hate span is a set of continuous tokens that, in
tandem, communicate the explicit hatefulness in a sentence.*
* For instance, in the statement, "Women ... Can't live with them...
Can't shoot them," the final portion ("Can't shoot them") will be
considered a hateful span. This shared task aims to extract all such
spans from a hateful text.
* The input texts are all in English. The detection of hateful spans
is achieved by mapping this into a sequence labeling problem. For
every token of the sequences, we have manually annotated the start
and end of a hateful span. This is achieved by BIO tagging
notation, where "B" represents the beginning of the hate span, "I"
forms the continuation of a hate span, and "O" represents the
non-hate tag. The task is then to learn the correct sequence of the
BIO tags for a given sentence. For example, in the above sentence,
the tag sequence for the preprocessed sentence will be of the form
"women can't live with them can't shoot them" → "O O O O O B I I";
an "I" tag cannot exist on its own and will always be preceded by
either an "I" or a "B". A "B" tag, however, can be
immediately followed by an "O" in case the span is just a single word.
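The BIO scheme described above can be illustrated with a short Python sketch. It is not the organisers' code: the function name bio_to_spans and the choice to raise an error on an orphan "I" tag are assumptions made purely for illustration. It converts a tag sequence into token-index spans and enforces the rule that "I" must follow "B" or "I".

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Convert a BIO tag sequence into (start, end) token spans, end exclusive.

    Hypothetical helper for illustration; raises ValueError when an "I" tag
    is not preceded by "B" or "I", as the task description requires.
    """
    spans: List[Tuple[int, int]] = []
    start = None  # index where the currently open span began, if any
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:  # a new span closes the previous one
                spans.append((start, i))
            start = i
        elif tag == "I":
            if start is None:
                raise ValueError(f"'I' at position {i} has no preceding 'B' or 'I'")
        elif tag == "O":
            if start is not None:
                spans.append((start, i))
                start = None
        else:
            raise ValueError(f"unknown tag {tag!r} at position {i}")
    if start is not None:  # span running to the end of the sentence
        spans.append((start, len(tags)))
    return spans


# The example sentence and tag sequence from the task description.
tokens = "women can't live with them can't shoot them".split()
tags = "O O O O O B I I".split()
span_texts = [" ".join(tokens[s:e]) for s, e in bio_to_spans(tags)]
print(span_texts)
```

For the example sentence, the single recovered span covers the last three tokens. A one-word span is simply a "B" followed directly by "O" (or by the end of the sentence).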
For more details, please visit Task 3 webpage.
<https://lcs2.in/hatenorm-2023/>
*Task 4 aims to detect hate speech in Bengali, Bodo, and Assamese
languages. It is a binary classification task. Each dataset (for the
three languages) consists of a list of sentences with their
corresponding class (hate or offensive (HOF) or not hate (NOT)). Data is
primarily collected from Twitter, Facebook, and YouTube comments.*
The Macro F1 score will be the yardstick of the task. Team rank will be
determined based on the Macro F1 score of the first part.
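For readers unfamiliar with the metric, the following Python sketch shows the standard definition of macro-averaged F1 for the two classes HOF and NOT. It is an illustration only, not the organisers' evaluation script; the function name macro_f1, the toy labels, and the convention of scoring a class as 0.0 when it has no true positives, false positives or false negatives are all assumptions.

```python
from typing import Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str],
             labels: Sequence[str] = ("HOF", "NOT")) -> float:
    """Unweighted mean of the per-class F1 scores (standard macro F1)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); assumed to be 0.0 when undefined.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with made-up labels (not task data).
gold = ["HOF", "HOF", "NOT", "NOT"]
pred = ["HOF", "NOT", "NOT", "NOT"]
print(round(macro_f1(gold, pred), 4))
```

Because each class contributes equally to the average, a system cannot score well by predicting only the majority class, which matters for imbalanced hate speech datasets.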
For more details, please visit Task 4 webpage.
<https://sites.google.com/view/hasoc-2023-annihilate-hates/home>
Registration for all four tasks is open on our registration page.
<https://hasocfire.github.io/hasoc/2023/registration.html>
We believe that your expertise and contribution will be invaluable in
advancing the state of the art in hate speech classification. We encourage
you to participate in this exciting shared task and contribute to the
research community.
Regards,
HASOC organizing team
Hi everyone
City, University of London are looking for a postdoc NLP research fellow to work on the VISION (Violence, Health and Society) project, developing methods for extracting information on violence from public sector records. Details below. Please circulate to anyone you think might be interested.
https://www.city.ac.uk/about/jobs/apply/details.html?nPostingId=1579&nPosti…
best wishes
--
Angus
> [Apologies for cross-posting]
> ======================================================================
> EXTENDED DEADLINE TO **JULY 31**
> ======================================================================
>
> SIMBig 2023 - 10th International Conference on Information Management and Big Data
> Where: Instituto Politécnico Nacional, Mexico DF, MEXICO
> When: October 18 - 20, 2023
> Website: https://simbig.org/SIMBig2023/
>
> ======================================================================
>
> OVERVIEW
> ----------------------------------
>
> SIMBig 2023 seeks to present new methods of Artificial Intelligence (AI), Data Science, Machine Learning, Natural Language Processing, Semantic Web, and related fields, for analyzing, managing, and extracting insights and patterns from large volumes of data.
>
>
> KEYNOTE SPEAKERS (to be confirmed)
> ----------------------------------
>
> * Mona Diab, Meta AI, USA
> * Huan Liu, Arizona State University, USA
>
> and more to be announced soon...
>
> IMPORTANT DATES
> ----------------------------------
>
> July 31, 2023 (extended from July 24, 2023) --> Full papers and short papers due
> August 28, 2023 --> Notification of acceptance
> September 10, 2023 --> Camera-ready versions
> October 18 - 20, 2023 --> Conference held in Mexico DF, Mexico
>
> PUBLICATION
> ----------------------------------
>
> All accepted papers of SIMBig 2023 (including the special tracks) will be published in the Springer CCIS Series <https://www.springer.com/series/7899> (to be confirmed).
>
> Authors of the best papers of SIMBig 2023 (including the special tracks) will be invited to submit an extended version for publication in the Springer SN Computer Science Journal <https://www.springer.com/journal/42979>.
>
> TOPICS OF INTEREST
> ----------------------------------
>
> SIMBig 2023 has a broad scope. We invite contributions on theory and practice, including but not limited to the following technical areas:
>
> Artificial Intelligence
> Big/Massive Data
> Data Science
> Machine Learning
> Deep Learning
> Natural Language Processing
> Semantic Web
> Data-driven Software Engineering
> Data-driven software adaptation
> Healthcare Informatics
> Biomedical Informatics
> Data Privacy and Security
> Information Retrieval
> Ontologies and Knowledge Representation
> Social Networks and Social Web
> Information Visualization
> OLAP and Business intelligence
> Crowdsourcing
>
> SPECIAL TRACKS
> ----------------------------------
>
> SIMBig 2023 proposes six special tracks in addition to the main conference:
>
> ANLP <https://simbig.org/SIMBig2023/en/anlp.html> - Applied Natural Language Processing
> DISE <https://simbig.org/SIMBig2023/en/dise.html> - Data-Driven Software Engineering
> EE-AI-HPC <https://simbig.org/SIMBig2023/en/eeaihpc.html> - Efficiency Enhancement for AI and High-Performance Computing
> SNMAM <https://simbig.org/SIMBig2023/en/snmam.html> - Social Network and Media Analysis and Mining
>
> CONTACT
> ----------------------------------
>
> SIMBig 2023 General Chairs
>
> Juan Antonio Lossio-Ventura, National Institutes of Health, USA (juan.lossio@nih.gov)
> Hugo Alatrista-Salas, Pontificia Universidad Católica del Perú, Peru (halatrista@pucp.pe)
15th meeting of Forum for Information Retrieval Evaluation HASOC-2023
We are excited to announce the 5th edition of HASOC, consisting of four interesting shared tasks. We invite you to participate.
Task 1 focuses on identifying hate speech, offensive language, and profanity in different languages using natural language processing techniques.
* Task 1A is identifying hate and offensive content in Sinhala, a low-resource Indo-Aryan language spoken mainly in Sri Lanka. The task involves classifying tweets into Hate and Offensive (HOF) or Non-Hate and Offensive (NOT). The training set for this task is based on the Sinhala Offensive Language Detection dataset, which contains 10,000 tweets.
* Task 1B focuses on identifying hate and offensive content in Gujarati, another low-resource Indo-Aryan language spoken by approximately 50 million people in India. Similarly, participants need to classify tweets into HOF or NOT categories. The training set for this task consists of around 200 tweets.
For more details, please visit the Task 1 webpage: <https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F16…>
Task 2, Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL), addresses the challenge of identifying hate speech and offensive content in code-mixed conversations on social media. Code-mixed text includes multiple languages within a single conversation. The task is divided into two subtasks.
* In Task 2a, participants need to perform binary classification on conversational tweets with tree-structured data. They must determine whether a tweet, comment, or reply contains hate speech, offensive language, or profanity (HOF) or is non-hate and offensive (NOT). The classification should consider both the individual content and support for hate expressed in the parent tweet.
* Task 2b involves the classification of conversational tweets with tree-structured data into specific forms of hate. Participants must identify if the tweet, comment, or reply contains standalone hate (SHOF), contextual hate (CHOF) that supports hate expressed in the parent, or if it is non-hate (NONE).
For more details, please visit the Task 2 webpage: <https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F16…>
Task 3 aims to detect hateful spans within a sentence already considered hateful. A hate span is a set of contiguous tokens that, in tandem, communicate the explicit hatefulness in a sentence.
* For instance, in the statement "Women ... Can't live with them... Can't shoot them," the final portion, "Can't shoot them" (shown in bold in the original announcement), is considered a hateful span. This shared task aims to extract all such spans from a hateful text.
* The input texts are all in English. Detection of hateful spans is cast as a sequence labeling problem: for every token in a sequence, the start and end of each hateful span have been manually annotated using BIO tagging, where "B" marks the beginning of a hate span, "I" marks its continuation, and "O" marks a non-hate token. The task is then to learn the correct sequence of BIO tags for a given sentence. For example, for the sentence above, the tag sequence for the preprocessed sentence takes the form "women can't live with them can't shoot them" → "O O O O O B I I". An "I" tag cannot exist on its own and is always preceded by either an "I" or a "B". A "B" tag, by contrast, can be immediately followed by an "O" when the span is a single word.
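As an illustration of the BIO scheme described above, here is a minimal Python sketch that checks a tag sequence for validity and recovers the hateful spans from it. The function names `is_valid_bio` and `extract_spans` are hypothetical and not part of any official HASOC tooling, and whitespace tokenization is assumed only for the sake of the example.

```python
from typing import List, Tuple


def is_valid_bio(tags: List[str]) -> bool:
    """Return True if every tag is B, I, or O and no "I" follows an "O" or starts the sequence."""
    previous = "O"
    for tag in tags:
        if tag not in {"B", "I", "O"}:
            return False
        if tag == "I" and previous == "O":
            return False
        previous = tag
    return True


def extract_spans(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Collect (start, end_exclusive, text) for each run of B followed by zero or more I tags."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    if not is_valid_bio(tags):
        raise ValueError("invalid BIO sequence")
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag in {"B", "O"} and start is not None:
            # A new span or a non-hate token closes the span currently open.
            spans.append((start, i, " ".join(tokens[start:i])))
            start = None
        if tag == "B":
            start = i
    if start is not None:
        # The sequence ended while a span was still open.
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans


# The example sentence and tag sequence from the task description.
tokens = "women can't live with them can't shoot them".split()
tags = "O O O O O B I I".split()
print(extract_spans(tokens, tags))
```

For the example sentence, this yields a single span covering the last three tokens, "can't shoot them".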
For more details, please visit the Task 3 webpage: <https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F16…>
Task 4 aims to detect hate speech in the Bengali, Bodo, and Assamese languages. It is a binary classification task. Each dataset (one per language) consists of a list of sentences with their corresponding class: hate or offensive (HOF) or not hate (NOT). Data is primarily collected from Twitter, Facebook, and YouTube comments.
The Macro F1 score will be the yardstick of the task. Team rank will be determined based on the Macro F1 score of the first part.
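For readers unfamiliar with the metric, the following Python sketch shows how a Macro F1 score can be computed for a binary HOF/NOT task: the F1 score is calculated for each class separately and the per-class scores are averaged with equal weight. The function name `macro_f1` and the toy labels are hypothetical, and treating an undefined per-class F1 as 0.0 is an assumption; the organizers' official evaluation script may differ in such details.

```python
from typing import Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str],
             labels: Sequence[str] = ("HOF", "NOT")) -> float:
    """Unweighted mean of per-class F1 scores."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denominator = 2 * tp + fp + fn
        # Assumption: a class with no gold and no predicted instances scores 0.0.
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Toy example (invented labels, not task data).
gold = ["HOF", "HOF", "NOT", "NOT", "NOT", "NOT"]
pred = ["HOF", "NOT", "NOT", "NOT", "NOT", "HOF"]
print(macro_f1(gold, pred))  # HOF F1 = 0.5, NOT F1 = 0.75, macro = 0.625
```

Because every class counts equally, a system that ignores the minority class is penalized more heavily than it would be under plain accuracy.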
For more details, please visit the Task 4 webpage: <https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F16…>
Registration for all four tasks is open on our registration page: <https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F16…>
We believe that your expertise and contribution will be invaluable in advancing the state of the art in hate speech classification. We encourage you to participate in this exciting shared task and contribute to the research community.
Regards,
HASOC organizing team