[Corpora-List] HASOC 2023 tasks - Call for Participation - Hate Speech and Offensive Content Identification

26 Jul 2023


      15th meeting of the Forum for Information Retrieval Evaluation: HASOC-2023
We are excited to announce the 5th edition of HASOC, consisting of four
interesting shared tasks. We invite you to participate.
*Task 1 focuses on identifying hate speech, offensive language, and
profanity in different languages using natural language processing
techniques.*
  * Task 1A deals with identifying hate and offensive content in
    Sinhala, a low-resource Indo-Aryan language spoken in Sri Lanka. The
    task involves classifying tweets into Hate and Offensive (HOF) or
    Non-Hate and Offensive (NOT). The dataset for this task is based on
    the Sinhala Offensive Language Detection dataset.
  * Task 1B focuses on identifying hate and offensive content in
    Gujarati, another low-resource Indo-Aryan language spoken by
    approximately 50 million people in India. Similarly, participants
    need to classify tweets into HOF or NOT categories. The training set
    for this task consists of around 200 tweets.
For more details, please visit the Task 1 webpage.
https://hasocfire.github.io/hasoc/2023/task1.html
*Task 2, Identification of Conversational Hate-Speech in Code-Mixed
Languages (ICHCL), addresses the challenge of identifying hate speech
and offensive content in code-mixed conversations on social media.
Code-mixed text includes multiple languages within a single
conversation. The task is divided into two subtasks.*
  * In Task 2a, participants need to perform binary classification on
    conversational tweets with tree-structured data. They must determine
    whether a tweet, comment, or reply contains hate speech, offensive
    language, or profanity (HOF) or is non-hate and offensive (NOT). The
    classification should consider both the individual content and
    support for hate expressed in the parent tweet.
  * Task 2b involves the classification of conversational tweets with
    tree-structured data into specific forms of hate. Participants must
    identify if the tweet, comment, or reply contains standalone hate
    (SHOF), contextual hate (CHOF) that supports hate expressed in the
    parent, or if it is non-hate (NONE).
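Because both subtasks depend on the parent of each post, one possible first step is to recover the chain of texts from the root tweet down to the post being classified. The sketch below is only an illustration: the `Post` fields and the `context_chain` helper are hypothetical names, and the actual data format is the one described on the task webpage.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Post:
    """One node of a conversation tree (field names are hypothetical)."""
    post_id: str
    text: str
    parent_id: Optional[str] = None  # None marks the root tweet


def context_chain(posts: Dict[str, Post], post_id: str) -> List[str]:
    """Return the texts from the root tweet down to the given post."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = post_id
    while current is not None:
        if current in seen:
            raise ValueError("cycle detected in conversation tree")
        seen.add(current)
        post = posts[current]
        chain.append(post.text)
        current = post.parent_id
    chain.reverse()  # root first, target post last
    return chain


# Placeholder conversation: tweet -> comment -> reply.
posts = {
    "1": Post("1", "root tweet"),
    "2": Post("2", "first comment", parent_id="1"),
    "3": Post("3", "reply to the comment", parent_id="2"),
}
print(context_chain(posts, "3"))
```

A classifier could then receive the last element as the content to label and the preceding elements as context; how to combine them is left open here.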
For more details, please visit the Task 2 webpage.
https://hasocfire.github.io/hasoc/2023/ichcl.html
*Task 3 aims to detect hateful spans within a sentence already
considered hateful. A hate span is a set of continuous tokens that, in
tandem, communicate the explicit hatefulness in a sentence.*
  * For instance, in the statement, "Women ... Can't live with them...
    Can't shoot them," the portion "Can't shoot them" will be
    considered a hateful span. This shared task aims to extract all such
    spans from a hateful text.
  * The input texts are all in English. The detection of hateful spans
    is achieved by mapping this into a sequence labeling problem. For
    every token of the sequences, we have manually annotated the start
    and end of a hateful span. This is done with BIO tagging, where "B"
    marks the beginning of a hate span, "I" marks its continuation, and
    "O" marks a non-hate token. The task is then to learn the correct
    sequence of BIO tags for a given sentence. For example, for the
    sentence above, the tag sequence for the preprocessed sentence is
    "women can't live with them can't shoot them" → "O O O O O B I I".
    An "I" tag cannot exist on its own and is always preceded by either
    an "I" or a "B". Conversely, a "B" tag can be immediately followed
    by an "O" when the span is just a single word.
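The BIO constraints described above can be checked mechanically. The following is a minimal sketch, assuming whitespace tokenization and the plain tags "B", "I", and "O"; the helper names `is_valid_bio` and `extract_spans` are hypothetical and are not part of any official task tooling.

```python
from typing import List, Tuple


def is_valid_bio(tags: List[str]) -> bool:
    """Check that every "I" directly follows a "B" or another "I"."""
    previous = "O"
    for tag in tags:
        if tag not in {"B", "I", "O"}:
            return False
        if tag == "I" and previous == "O":
            return False
        previous = tag
    return True


def extract_spans(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, text) for each tagged span."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    if not is_valid_bio(tags):
        raise ValueError("invalid BIO sequence")
    spans = []
    start = None
    for i, tag in enumerate(tags):
        # A "B" or an "O" closes any span that is currently open.
        if tag in {"B", "O"} and start is not None:
            spans.append((start, i, " ".join(tokens[start:i])))
            start = None
        if tag == "B":
            start = i
    if start is not None:  # span that runs to the end of the sentence
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans


tokens = "women can't live with them can't shoot them".split()
tags = "O O O O O B I I".split()
print(extract_spans(tokens, tags))  # [(5, 8, "can't shoot them")]
```

A single-word span is the case where "B" is followed directly by "O", for example tags "B O" over two tokens yield one span covering only the first token.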
For more details, please visit the Task 3 webpage.
https://lcs2.in/hatenorm-2023/
*Task 4 aims to detect hate speech in Bengali, Bodo, and Assamese.
It is a binary classification task. Each of the three datasets
consists of a list of sentences with their corresponding class: hate
or offensive (HOF) or not hate (NOT). Data is primarily collected from
Twitter, Facebook, and YouTube comments.*
The Macro F1 score will be the yardstick of the task. Team rank will be
determined based on the Macro F1 score of the first part.
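For reference, Macro F1 is the unweighted mean of the per-class F1 scores, so the HOF and NOT classes count equally regardless of how many examples each has. The sketch below shows the standard definition in plain Python; it is not the organizers' scoring script, and the labels in the example are made up for illustration.

```python
from typing import Sequence


def macro_f1(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over all labels that occur."""
    if len(gold) != len(predicted) or len(gold) == 0:
        raise ValueError("gold and predicted must be non-empty and equal in length")
    labels = sorted(set(gold) | set(predicted))
    f1_scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Illustrative labels only, not real task data.
gold = ["HOF", "HOF", "NOT", "NOT"]
predicted = ["HOF", "NOT", "NOT", "NOT"]
print(round(macro_f1(gold, predicted), 3))  # 0.733
```

In this example the HOF class has F1 = 2/3 and the NOT class has F1 = 0.8, so the macro average is about 0.733.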
For more details, please visit the Task 4 webpage.
https://sites.google.com/view/hasoc-2023-annihilate-hates/home
Registration for all four tasks is open on our registration page.
https://hasocfire.github.io/hasoc/2023/registration.html
We believe that your expertise and contribution will be invaluable in
advancing the state of the art in hate speech classification. We
encourage you to participate in this exciting shared task and
contribute to the research community.
Regards,
HASOC organizing team
