*Apologies for cross-posting*
Do you believe machine-generated text is becoming an issue? Are you interested in boosting research to automatically detect machine-generated text? 🤖👩🏻
We cordially invite all researchers and practitioners from all fields to participate in the AuTexTification task. If interested, register yourself in the shared task through this link: https://lnkd.in/dzBZsYiD
Once you have registered and the training phase has started, the datasets will be sent to your email address along with a password. For more information on the task description, schedule, or submissions, see the AuTexTification web page: https://sites.google.com/view/autextification
More information on the shared task
A new era of automatic content generation has arrived with powerful causal language models such as GPT, PaLM, and BLOOM, which can be used to spread untruthful news, human-looking reviews, or opinions. It is therefore imperative to develop technology that automatically detects generated text for content moderation, and that attributes generated text to specific models in order to protect intellectual property or determine responsibility. In this context, we propose the “Automatic Text Identification” (AuTexTification) shared task to boost research and development of systems that detect text generated by state-of-the-art language models, in English and Spanish.
We propose two subtasks: (i) Human or Generated, where participants will have to determine whether a given text has been automatically generated or not; and (ii) Model Attribution, where participants will have to determine which model generated a given text. The models used to generate the texts are of increasing size, ranging from 2 to 175 billion parameters, so participants' systems should be versatile enough to detect a diverse set of text generation models and writing styles.
In the training phase, participants will be provided with two partitions for subtask 1, i.e., English and Spanish partitions, with binary labels 👩🏻 and 🤖. Similarly, a partition per language will be released for subtask 2. It will include six labels (A, B, C, D, E, and F), each label representing a text generation model. Later, the unlabeled test data will be released.
Important Dates
March 22, 2023: Release of training data
April 21, 2023: Release of test data
May 10, 2023: Participant system results submission
May 17, 2023: Results notification
June 3, 2023: Paper submission
June 16, 2023: Paper peer-reviewed
July 4, 2023: Camera-ready paper version
September 26, 2023: Conference
Task organizers
José Ángel González (Symanto), contact email: jose.gonzalez@symanto.com
Areg Sarvazyan (Symanto), contact email: areg.sarvazyan@symanto.com
Marc Franco-Salvador (Symanto)
Francisco Rangel (Symanto)
Berta Chulvi (Universitat Politècnica de València)
Paolo Rosso (Universitat Politècnica de València)
Please reach out to the organizers or join the Slack workspace to connect with the other participants and organizers: https://lnkd.in/di_zaMHf