15th meeting of Forum for Information Retrieval Evaluation HASOC-2023
We are excited to announce the 5th edition of HASOC, consisting of four interesting shared tasks. We invite you to participate.
Task 1 focuses on identifying hate speech, offensive language, and profanity in different languages using natural language processing techniques.
* Task 1A involves identifying hate and offensive content in Sinhala, a low-resource Indo-Aryan language spoken mainly in Sri Lanka. The task involves classifying tweets into Hate and Offensive (HOF) or Non-Hate and Offensive (NOT). The training set for this task is based on the Sinhala Offensive Language Detection dataset, which contains 10,000 tweets.
* Task 1B focuses on identifying hate and offensive content in Gujarati, another low-resource Indo-Aryan language spoken by approximately 50 million people in India. Similarly, participants need to classify tweets into HOF or NOT categories. The training set for this task consists of around 200 tweets.
Task 2, Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL), addresses the challenge of identifying hate speech and offensive content in code-mixed conversations on social media. Code-mixed text includes multiple languages within a single conversation. The task is divided into two subtasks.
* In Task 2a, participants need to perform binary classification on conversational tweets with tree-structured data. They must determine whether a tweet, comment, or reply contains hate speech, offensive language, or profanity (HOF) or is non-hate and offensive (NOT). The classification should consider both the individual content and support for hate expressed in the parent tweet.
* Task 2b involves the classification of conversational tweets with tree-structured data into specific forms of hate. Participants must identify if the tweet, comment, or reply contains standalone hate (SHOF), contextual hate (CHOF) that supports hate expressed in the parent, or if it is non-hate (NONE).
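As a rough illustration of what "considering the parent" can mean in practice, the sketch below walks up a conversation tree and joins each post with its ancestors into a single classifier input. The `Post` fields, the `posts` lookup, the placeholder texts, and the `[SEP]` separator are all assumptions made for this example; the official data format is described on the task webpage and may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Post:
    """Hypothetical record for one tweet, comment, or reply."""
    post_id: str
    text: str
    parent_id: Optional[str] = None  # None for the root tweet


def context_chain(post_id: str, posts: Dict[str, Post]) -> List[str]:
    """Return the texts from the root tweet down to the given post."""
    chain: List[str] = []
    current = posts.get(post_id)
    while current is not None:
        chain.append(current.text)
        # Move to the parent, or stop once the root has been reached.
        current = posts.get(current.parent_id) if current.parent_id else None
    chain.reverse()
    return chain


def build_input(post_id: str, posts: Dict[str, Post], sep: str = " [SEP] ") -> str:
    """Join a post with its ancestors so a classifier sees the context."""
    return sep.join(context_chain(post_id, posts))


# Placeholder conversation: tweet -> comment -> reply.
posts = {
    "t1": Post("t1", "tweet text"),
    "c1": Post("c1", "comment text", parent_id="t1"),
    "r1": Post("r1", "reply text", parent_id="c1"),
}
print(build_input("r1", posts))
```

With this kind of input, a reply that only agrees with a hateful parent can still be labeled with the parent in view, which is the situation the contextual label targets.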
For more details, please visit the Task 2 webpage: https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F1690315200%2F1qOOEH-000CON-4S%7Cin6f%7C57e1b682%7C10977208%7C9441127%7C64C02A350B98BF72E7D5D52BC3310027&o=%2Fphtk%3A%2Futstl01gani..%2F.en%2FomlackgaQ3cC3KWA3A7oIjI0BCuz5GrEbnXK0YnlINTMPPdrr_X~PcFlF77uHN2da8HxHvcdJf06x3jV-bm5tis8JY8FYsAbnRn98PzzG~bp2fcV5f1ze3iC1rcrZTfSAceIyf9T75A3g3CkkT-bnWf3UsB6kH~mUdRa&s=WtKiQda0FckS0p0KDCBdRk_QxtE
Task 3 aims to detect hateful spans within a sentence already considered hateful. A hate span is a set of contiguous tokens that, in tandem, communicate the explicit hatefulness in a sentence.
* For instance, in the statement "Women ... Can't live with them... Can't shoot them," the portion "Can't shoot them" is considered a hateful span. This shared task aims to extract all such spans from a hateful text.
* The input texts are all in English. Hateful span detection is framed as a sequence labeling problem. For every token of the sequences, we have manually annotated the start and end of a hateful span. The annotation uses BIO tagging, where "B" marks the beginning of a hate span, "I" marks the continuation of a hate span, and "O" marks a non-hate token. The task is then to learn the correct sequence of BIO tags for a given sentence. For example, for the sentence above, the preprocessed form "women can't live with them can't shoot them" has the tag sequence "O O O O O B I I". An "I" tag cannot exist on its own and is always preceded by either an "I" or a "B". Conversely, a "B" tag can be immediately followed by an "O" when the span is just a single word.
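The BIO scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: whitespace tokenization, spans given as (start, end) token indices with an exclusive end, and helper names (`spans_to_bio`, `is_valid_bio`, `bio_to_spans`) invented for this example. The official data format and preprocessing may differ.

```python
def spans_to_bio(tokens, spans):
    """Turn (start, end) token spans (end exclusive) into BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def is_valid_bio(tags):
    """Check that every 'I' is preceded by a 'B' or an 'I'."""
    prev = "O"
    for tag in tags:
        if tag not in {"B", "I", "O"}:
            return False
        if tag == "I" and prev == "O":
            return False
        prev = tag
    return True


def bio_to_spans(tags):
    """Recover (start, end) spans (end exclusive) from a BIO sequence."""
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:  # a new span starts right after another
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:  # span runs to the end of the sentence
        spans.append((start, len(tags)))
    return spans


tokens = "women can't live with them can't shoot them".split()
tags = spans_to_bio(tokens, [(5, 8)])
print(" ".join(tags))  # O O O O O B I I
```

A single-word span such as `["B", "O"]` passes `is_valid_bio`, while `["O", "I"]` does not, matching the two rules stated above.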
For more details, please visit the Task 3 webpage: https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F1690315200%2F1qOOEH-000CON-4S%7Cin6f%7C57e1b682%7C10977208%7C9441127%7C64C02A350B98BF72E7D5D52BC3310027&o=%2Fphtk%3A%2Futstl01gani..%2F.en%2FomlacZyzjZQmnS5rUEIxoaw2FYcG25Z7J_gRHJJUcp4JKXOl4thC6COa9i~RG0N58ogF0DrXuL6YwRU2RjhX8HUMS6wBDbb6tMCc7cBhb9mlhYZJvCBxwmTxeJM01xT5VMX6LQQmNAmsnl2TrRez&s=Dw0BXsV3_dtoHi2T87rE7sFScrk
Task 4 aims to detect hate speech in the Bengali, Bodo, and Assamese languages. It is a binary classification task. Each of the three datasets (one per language) consists of a list of sentences with their corresponding class: hate or offensive (HOF) or not hate (NOT). Data is primarily collected from Twitter, Facebook, and YouTube comments.
The Macro F1 score will be the yardstick of the task. Team rank will be determined based on the Macro F1 score of the first part.
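For readers less familiar with the metric, Macro F1 is the unweighted mean of the per-class F1 scores, so the minority class counts as much as the majority class. The sketch below computes it by hand for the HOF/NOT labels. The function name and the toy labels are made up for illustration, and this is not the organizers' official scoring script.

```python
def macro_f1(y_true, y_pred, labels=("HOF", "NOT")):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with invented labels.
y_true = ["HOF", "HOF", "NOT", "NOT", "NOT", "NOT"]
y_pred = ["HOF", "NOT", "NOT", "NOT", "NOT", "HOF"]
print(macro_f1(y_true, y_pred))  # 0.625
```

Here the HOF class scores 0.5 and the NOT class scores 0.75, giving a Macro F1 of 0.625 even though accuracy is about 0.67.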
For more details, please visit the Task 4 webpage: https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F1690315200%2F1qOOEH-000CON-4S%7Cin6f%7C57e1b682%7C10977208%7C9441127%7C64C02A350B98BF72E7D5D52BC3310027&o=%2Fphti%3A%2Fstsg.teeoolsgem.c%2Fviwo%2F0oha3-22scln-athiani%2Fae-oeshhtem&s=hi9XoHnW5xc1PvQvk_kyIY5yH-Q
Registration for all four tasks is open on our registration page: https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F1690315200%2F1qOOEH-000CON-4S%7Cin6f%7C57e1b682%7C10977208%7C9441127%7C64C02A350B98BF72E7D5D52BC3310027&o=%2Fphtk%3A%2Futstl01gani..%2F.en%2FomlacSEbXAm5ASTyPZp~mwSToakJHxJUigj0TV53jJLP8YRpjnznqUd4TQ~URRk2BF08gL8rxoeodN08p7dnwO2EZCQ6PuQTSx3WgHiC3559Ohe7pr6jBJBqmYxk6crbMjbqJnDqtqEUC560feaATSu1bybrXJD9466xoaj3QsZ&s=bGdnMV6qIjoYsO7tOx7A2JtwHog
We believe that your expertise and contribution will be invaluable in advancing the state of the art in hate speech classification. We encourage you to participate in these shared tasks and contribute to the research community.
Regards, HASOC organizing team