Call for papers
Workshop Discourse studies and linguistic data science: Addressing challenges in interoperability, multilinguality and linguistic data processing - DiSLiDaS 2023
University of Vienna, Vienna, Austria
12-13 September 2023 (TBA)
Website: http://dislidas.mozajka.co/
The fourth biennial conference on Language, Data and Knowledge (LDK 2023) (http://2023.ldk-conf.org/) and COST Action CA18209 NexusLinguarum (https://nexuslinguarum.eu/) are glad to announce the second workshop Discourse studies and linguistic data science: Addressing challenges in interoperability, multilinguality and linguistic data processing – DiSLiDaS 2023.
*Conference aims and topics*
The workshop aims to follow up on the topics discussed during DiSLiDaS 2022 (https://dislidas.mozajka.co/?page_id=211) and to gather current research advances in discourse analysis and representation, in the context of multilinguality, from a linguistic and computational perspective. We invite submissions addressing challenges such as interoperability, linguistic linked open data (LLOD), and language processing and analysis.
Workshop topics include, but are not limited to:
● Discourse and dialogue annotation: Parsing and representation across languages and frameworks
● Discourse markers and discourse relations (RST, PDTB, SDRT): Identification, prediction and extraction
● Attitude discovery and interpretation in discourse: Appraisal and sentiment
● Effects of multimodality on discourse interpretation: Intonation, gesture and text
● Interoperability for multilingual language data: Challenges of rich and distributed data
● Discourse data and machine learning: Methods and tools
Discourse comprises a wide variety of linguistic phenomena, such as discourse markers, discourse relations, and speaker attitude. These have been studied largely by separate communities of practice in linguistics and computation, yielding several theoretical frameworks (for instance, RST, SDRT and PDTB for discourse relations, and appraisal theory for sentiment analysis) and technological approaches, such as transformer models, embeddings and the like. Nonetheless, there are open issues concerning interoperability, multilinguality, and language processing: in particular, the existence of different annotation schemas, disambiguation, the lack of training data for machine learning, the scarcity of effective methods for detecting and interpreting language phenomena, diverse vocabularies, insufficient multilingual parallel corpora of non-dialogue and dialogue texts, and the still early stage of research on multimodality.
Discourse research is also one of the central research areas of natural language processing (NLP). NLP research focuses on the formalisation, identification and discovery of semantic phenomena, dialogue exchange structure, and text coherence. Some of the technological approaches of NLP include the use of transformer models, word embeddings, linguistic linked open data, the constitution of aligned multilingual corpora, vocabularies of language phenomena and the like. Computational discourse research builds on the evidence that language consists not only of placing words in the right order but also of detecting and interpreting meaning and deeper textual relations and organising ideas into a logical flow. Linguistic approaches study language phenomena relating to the coherence and cohesion of discourse and the lexical, phrasal, syntactic, semantic and pragmatic means of expressing discourse relations; they also represent the roles of these means and build language resources for them.
Despite all the advances, there are still plenty of unresolved problems related to interoperability, multilinguality, and language processing. With the growth of the Semantic Web and Linguistic Linked Data, interoperability is key to reading, interpreting and adopting language resources. The existence of different annotation schemas to encode discourse relations constitutes a problem for data exchange and reuse and for theoretical consistency. The treatment of multilinguality is also complicated by the shortage of multilingual parallel corpora of non-dialogue and dialogue texts, which would allow systematic contrastive studies. As to language processing, the lack of training data for machine learning, the scarcity of effective methods for detecting and interpreting language phenomena, the coexistence of diverse vocabularies, and the minimal attention paid to the contribution of tone of voice, intonation, and gestures to the meaning and informative value of discourse elements still make discourse processing very challenging.
The workshop is intended as a discussion forum for researchers interested in addressing the aforementioned challenges and advancing the state of the art in discourse studies and linguistic data science.
*Programme*
The Scientific Programme will include one invited talk and oral presentations.
Invited speaker: Johan Bos, University of Groningen
*Submissions*
Submissions can be in the form of:
• long papers: 9–12 pages;
• short papers: 4–6 pages.
All submission lengths are given including references. Accepted submissions will be published by ACL in an open-access conference proceedings volume, free of charge for authors. The ACL templates should therefore be used for all conference submissions. As the reviewing process is single-blind, submissions should not be anonymised.
The workshop will be hybrid (face-to-face and remote). Note that at least one author of each accepted paper must register to present the paper at the workshop (either remotely or on-site). There is no registration fee for participating in DiSLiDaS 2023.
Papers must be submitted electronically via EasyChair: https://easychair.org/conferences/?conf=dislidas2023
*Important dates*
Time zone: Anywhere on Earth
Papers due: May 19, 2023
Paper acceptance notifications: June 16, 2023
Camera-ready papers due: June 30, 2023
*Programme Committee*
Elena-Simona Apostol, University Politehnica of Bucharest, Romania
Harry Bunt, Tilburg University, Netherlands
Maria Josep Cuenca, Universitat de València
Debopam Das, Humboldt University of Berlin, Germany
Jorge Garcia, University of Zaragoza, Spain
Mikel Iruskieta, University of the Basque Country, Spain
António Leal, University of Porto, Portugal
Chaya Liebeskind, Jerusalem College of Technology, Israel
Amália Mendes, University of Lisbon, Portugal
Maciej Ogrodniczuk, Polish Academy of Sciences, Poland
Giedre Valunaite Oleskevicienė, Mykolas Romeris University, Lithuania
Georg Rehm, DFKI GmbH, Germany
Ted Sanders, Utrecht University, Netherlands
Merel Scholman, University of Saarland, Germany
Dimitar Trajanov, Ss. Cyril and Methodius University, North Macedonia
Radoslava Trnavac, University of Belgrade, Serbia
Ciprian-Octavian Truica, University Politehnica of Bucharest, Romania
Amir Zeldes, Georgetown University, USA
*Organising Committee*
Purificação Silvano, University of Porto, Portugal
Mariana Damova, Mozaika, Ltd., Bulgaria
Christian Chiarcos, Goethe-Universität, Germany
Anna Bączkowska, University of Gdansk, Poland
*Contact*
organizers@dislidas.mozajka.co
*Apologies for cross-posting*
Do you believe machine-generated text is becoming an issue? Are you interested in boosting research on automatically detecting machine-generated text? 🤖👩🏻
We cordially invite researchers and practitioners from all fields to participate in the AuTexTification task. If you are interested, register for the shared task through this link: https://lnkd.in/dzBZsYiD
Once you have registered and the training phase has started, the datasets will be sent to your email address along with a password. More information on the task description, schedule, and submissions is available on the AuTexTification web page: https://sites.google.com/view/autextification
More information on the shared task
A new era of automatic content generation has emerged with powerful causal language models such as GPT, PaLM, or BLOOM, which can be used to spread untruthful news, human-looking reviews, or opinions. It is therefore imperative to develop technology that automatically detects generated text for content moderation and attributes generated text to specific models, in order to protect intellectual property or to determine responsibility. In this context, we propose the “Automatic Text Identification” (AuTexTification) shared task, to boost research and development of systems that detect text automatically generated by state-of-the-art language models, in English and Spanish.
We propose two subtasks: (i) Human or Generated, where, given a text, participants will have to determine whether it has been automatically generated or not; and (ii) Model Attribution, where participants will have to determine which model generated a text. The models used to generate the texts vary in size, ranging from 2 to 175 billion parameters, meaning that participants' systems should be versatile enough to detect a diverse set of text generation models and writing styles.
In the training phase, participants will be provided with two partitions for subtask 1, i.e., English and Spanish partitions, with binary labels 👩🏻 and 🤖. Similarly, a partition per language will be released for subtask 2. It will include six labels (A, B, C, D, E, and F), each label representing a text generation model. Later, the unlabeled test data will be released.
Important Dates
March 22, 2023: Release of training data
April 21, 2023: Release of test data
May 10, 2023: Participant system results submission
May 17, 2023: Results notification
June 3, 2023: Paper submission
June 16, 2023: Paper peer-reviewed
July 4, 2023: Camera-ready paper version
September 26, 2023: Conference
Task organizers
José Ángel González (Symanto), contact email: jose.gonzalez@symanto.com
Areg Sarvazyan (Symanto), contact email: areg.sarvazyan@symanto.com
Marc Franco-Salvador (Symanto)
Francisco Rangel (Symanto)
Berta Chulvi (Universitat Politècnica de València)
Paolo Rosso (Universitat Politècnica de València)
Please reach out to the organizers or join the Slack workspace to connect with the other participants and organizers: https://lnkd.in/di_zaMHf