*CoLI-Kanglish: Word Level Language Identification in Code-mixed Kannada-English Texts*
*CoLI-Kanglish shared task@ICON2022*
URL: https://sites.google.com/view/kanglishicon2022/home
Registration link: https://docs.google.com/forms/d/e/1FAIpQLSfFZR_5ugGKQnf2FYNIWnOh4rv6Bz6podD1...
The training and test sets are now available.
*Participants are invited to publish Working Notes of ICON 2022*
*Task Description*
The task of automatically identifying the languages used in a given text is called Language Identification (LI). LI is a pre-processing step for many applications. At the word level, LI can be viewed as a sequence labelling problem in which every word in a sentence is tagged either as mixed-language or as one of the languages in a predefined set. Despite extensive work on LI, the problem is still far from solved in the code-mixed scenario.
India has a rich heritage of languages. Kannada is one of the Dravidian languages and the official language of Karnataka state. People of Karnataka read, write and speak Kannada, but many find it difficult to use Kannada script to post messages or comments on social media. Technological limitations, such as computer and smartphone keyboards, are one reason; another may be the complexity of forming words with consonant conjuncts. Hence, most users write in Roman script only, or in a combination of Kannada and Roman script, when commenting on social media. To address word-level LI in code-mixed Kannada-English (Kn-En) texts, comments on Kannada YouTube videos were collected to construct the Code-mixed Language Identification (CoLI-Kenglish) dataset.
We encourage participants to use the CoLI-Kenglish dataset, which consists of English, Kannada and mixed-language words in Roman script, and to submit their methods to the Kanglish shared task, in which each word is to be identified and assigned to one of the predefined categories.
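To make the word-level setting concrete, the following is a minimal sketch of a character n-gram baseline. It is not the organizers' method or an official baseline. The tag names (`en`, `kn`, `mixed`), the function names and the toy training words are all assumptions made for illustration; the actual label set and data format of the dataset may differ. The sketch also tags each word independently, which simplifies the sequence labelling view described above.

```python
from collections import Counter

# Hypothetical tag names; the dataset's actual label set may differ.
TAGS = ("en", "kn", "mixed")

# Invented toy examples for illustration only, not taken from the dataset.
TOY_TRAIN = [
    ("video", "en"), ("super", "en"), ("song", "en"),
    ("tumba", "kn"), ("chennagide", "kn"), ("nanna", "kn"),
    ("videogalu", "mixed"), ("songannu", "mixed"),
]


def char_ngrams(word, n_min=1, n_max=3):
    """Return character n-grams of a word, padded with boundary markers."""
    padded = f"<{word.lower()}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def train(tagged_words):
    """Count n-gram occurrences per tag from (word, tag) pairs."""
    counts = {tag: Counter() for tag in TAGS}
    for word, tag in tagged_words:
        counts[tag].update(char_ngrams(word))
    return counts


def tag_sentence(sentence, counts):
    """Tag each whitespace-separated word with its best-scoring tag."""
    result = []
    for word in sentence.split():
        grams = char_ngrams(word)
        scores = {}
        for tag in TAGS:
            total = sum(counts[tag].values()) or 1  # avoid division by zero
            # Sum of relative n-gram frequencies under this tag.
            scores[tag] = sum(counts[tag][g] / total for g in grams)
        result.append((word, max(TAGS, key=lambda t: scores[t])))
    return result


if __name__ == "__main__":
    model = train(TOY_TRAIN)
    for word, tag in tag_sentence("nanna video tumba super", model):
        print(word, tag)
```

A real submission would more likely use a model that takes neighbouring words into account, since the tag of a word often depends on its context.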
*Important Dates*
- 2nd November – Train and test datasets are released
- 2nd November – Submission link release
- 16th November – Run submission deadline
- 22nd November – Working Note submission deadline
- 25th November – Review notifications
- 1st December – Camera Ready Due
- 15th–18th December – ICON 2022 Conference (https://lcs2.in/ICON-2022/index.html)
*NOTE:* All dates mentioned here are in Indian Standard Time (IST).
*Organizers*
Fazlourrahman Balouchzahi, Instituto Politecnico Nacional, Mexico
Sabur Butt, Instituto Politecnico Nacional, Mexico
Noman Ashraf, Dana-Farber Cancer Institute, Harvard Medical School, United States
Asha Hegde, Department of Computer Science, Mangalore University, India
Shashirekha Hosahalli Lakshmaiah, Department of Computer Science, Mangalore University, India
Grigori Sidorov, Instituto Politecnico Nacional, Mexico
Alexander Gelbukh, Instituto Politecnico Nacional, Mexico
*Contact*
Email: Kanglish2022@gmail.com
*ICON 2022:* https://www.lcs2.in/ICON-2022/index.html