October 2024 - Corpora

[Deadline Extension] Call for Shared Task Proposals for RANLP 2025
by Ranasinghe, Tharindu 30 Oct '24

[Due to many requests, we have extended the deadlines.]

We invite proposals for tasks to be run as part of RANLP 2025 (Recent Advances in Natural Language Processing): https://ranlp.org/ranlp2025/

RANLP is one of the most influential and competitive NLP conferences. RANLP 2025 will take place in September 2025 in the Black Sea city of Varna. For the first time in RANLP history, we are organising a shared task campaign as part of the main conference and inviting task organisers to submit their task proposals.

Researchers and practitioners from all areas of Natural Language Processing and related communities are invited to submit task proposals. For RANLP 2025, we welcome any task that can evaluate an automatic system for natural language processing. We especially encourage tasks for languages other than English, multi-lingual tasks, and tasks that develop novel applications of natural language processing.

We strongly encourage proposals based on already published datasets, as this can provide concrete examples and help minimise the challenges of organising the shared task. In the event of receiving many proposals, preference will be given to proposals based on already published datasets. If you are unsure whether a task is suitable, please contact the shared task chairs to discuss your idea.

Task Selection

Task proposals will be reviewed by at least two reviewers, and the reviews will serve as the basis for acceptance decisions. Task proposals will be evaluated on:

* Novelty - Is the task based on a new problem that has not been explored much in the community? If similar tasks have been organised before, does this task cover new languages/domains?
* Data - Is the data available and published already? Do annotations have meaningfully high inter-annotator agreement? Have all appropriate licenses for the use and re-use of the data been secured?
* Evaluation - Is the evaluation methodology sound? Is there an automated platform for the evaluation (e.g., CodaLab, Kaggle)?

Task Organisation

We specifically welcome task proposals from early career researchers. However, we strongly encourage tasks that have a diverse team of organisers, as that will ease the task organisation. Apart from providing a dataset, task organisers are expected to:

1. Verify data quality in terms of annotator agreement.
2. Verify licenses for the data to allow its use in the competition.
3. Provide task participants with baseline systems.
4. Create a CodaLab or other similar evaluation platform for the task and manage automatic evaluation.
5. Promote the task within the target research community.
6. Manage and organise the review process of participants' submissions of system description papers.
7. Write a task description paper to be included in RANLP proceedings.
8. Contribute to the tasks overview paper written by shared task chairs and other task organisers, which will also be included in RANLP proceedings.
9. Register and present the shared task description paper at RANLP 2025 on either 11th or 12th September 2025 (the exact date will be confirmed later).

Important Dates

* Task proposals due - November 20, 2024
* Task selection notification - November 25, 2024

Recommended Timeline for the Tasks

* Sample data and task website ready - December 1, 2024
* Training data ready - December 15, 2024
* Evaluation data ready - March 1, 2025
* Evaluation starts - March 10, 2025
* Evaluation ends - March 31, 2025 (latest date; task organisers may choose an earlier date)
* Paper submission due - April 20, 2025
* Notification to authors - May 16, 2025
* Task overview paper due - May 25, 2025
* Camera-ready due - May 31, 2025
* Shared task presentation co-located with RANLP 2025 - September 11 and September 12, 2025

Tasks that do not meet critical deadlines, such as those for launching the task, setting up the CodaLab website, and uploading samples, training, and evaluation data, may be cancelled at the discretion of the shared task chairs.

Submission Details

The task proposal should be a self-contained document of no longer than 2 pages (plus additional pages for references). All submissions must be in PDF format, following the RANLP 2023 template available at https://ranlp.org/ranlp2023/index.php/submissions/

Each proposal should contain the following:

* Overview
  * Summary of the task - What is the goal of the task?
  * Expected number of participants and justification
* Data & Resources
  * How the training/testing data will be produced. Discuss whether the dataset is already published
  * Details of license, so that the data can be used by the research community
  * How much data will be produced
  * How data quality will be ensured and evaluated
  * An example of what the data would look like
* Evaluation
  * The evaluation methodology to be used, including clear evaluation criteria
  * The evaluation platform (e.g., CodaLab, Kaggle)
* Task organisers
  * Names, affiliations, email addresses
  * Brief description of relevant experience or expertise

Submissions should be made via START - https://softconf.com/ranlp25/papers/user/scmd.cgi?scmd=submitPaperCustom&pa…

Proceedings

The tasks overview paper, task description papers and participant papers will be published as part of the RANLP 2025 proceedings in the ACL Anthology. Task organisers and participants are expected to attend RANLP 2025 on September 11 and September 12, 2025, and present their work in order to include it in the proceedings.

Shared Task Chairs

Dr Tharindu Ranasinghe, Lancaster University, UK
Dr Saad Ezzini, Lancaster University, UK

RANLP 2024 Chairs

Programme Committee Chair: Prof Dr Ruslan Mitkov, Lancaster University, UK
Organising Committee Chair: Prof Dr Galia Angelova, Bulgarian Academy of Sciences, Bulgaria

Best Regards

Dr Tharindu Ranasinghe | Lecturer in Security and Protection Science
School of Computing and Communications | Lancaster University
Contact me on Teams <https://teams.microsoft.com/l/chat/0/0?users=t.ranasinghe@lancaster.ac.uk>
www.lancaster.ac.uk

PostDoc at TurkuNLP (NLP/Digital linguistics)
by Veronika Laippala 30 Oct '24

Dear colleagues,

TurkuNLP at the University of Turku, Finland, has a fully funded open PostDoc position in NLP / digital linguistics.

Term: 1.12.2024-31.12.2025 (or upon agreement)
Deadline: 18.11.2024
Topics: Web register (genre) identification / web register studies, processing massively multilingual web crawls, massively multilingual cross-linguistic comparisons of web registers.
Detailed information: https://ats.talentadore.com/apply/tutkijatohtorin-projektitutkijan-maaraaik…

Please contact me for further information. Applications must be made using the Talentadore system linked above.

Best,
Veronika Laippala
Professor of Digital linguistics, TurkuNLP

CfP: Workshop on Generative AI and Knowledge Graphs (GenAIK) co-located with COLING 2025 (with some travel grant for 2 students)
by Genet Asefa Gesese 28 Oct '24

---------------------------------------------------------------------------------
Workshop on Generative AI and Knowledge Graphs (GenAIK), 19 January 2025, Abu Dhabi, UAE
Web: https://genetasefa.github.io/GenAIK2025/
X: @GenAIK25
LinkedIn: https://www.linkedin.com/groups/9868047
Mastodon: https://sigmoid.social/@GenAIK
---------------------------------------------------------------------------------
In conjunction with COLING 2025, January 19-24
---------------------------------------------------------------------------------
Workshop Overview
---------------------------------------------------------------------------------
Generative Artificial Intelligence (GenAI) is a branch of artificial intelligence capable of creating seemingly new, meaningful content, including text, images, and audio. It utilizes deep learning models, such as Large Language Models (LLMs), to recognize and replicate data patterns, enabling the generation of human-like content. Notable families of LLMs include GPT (GPT-3.5, GPT-3.5 Turbo, and GPT-4), LLaMA (LLaMA and LLaMA-2), and Mistral (Mistral and Mixtral). GPT, which stands for Generative Pretrained Transformer, is especially popular for text generation and is widely used in applications like ChatGPT. GenAI has taken the world by storm and revolutionized various industries, including healthcare, finance, and entertainment. However, GenAI models have several limitations, including biases from training data, generating factually incorrect information, and difficulty in understanding complex content. Additionally, their performance can vary based on domain specificity.

In recent times, Knowledge Graphs (KGs) have attracted considerable attention for their ability to represent structured and interconnected information, and have been adopted by many companies in various domains. KGs represent knowledge by depicting relationships between entities, known as facts, usually based on formal ontological models. Consequently, they enable accuracy, decisiveness, interpretability, domain-specific knowledge, and evolving knowledge in various AI applications.

The intersection between GenAI and KGs has ignited significant interest and innovation in Natural Language Processing (NLP). For instance, by integrating LLMs with KGs during pre-training and inference, external knowledge can be incorporated to enhance the model's capabilities and improve interpretability. When integrated, they offer a robust approach to problem solving in diverse areas such as information enrichment, representation learning, conversational AI, cross-domain AI transfer, bias, content generation, and semantic understanding. This workshop aims at reinforcing the relationships between the Deep Learning, Knowledge Graphs, and NLP communities and fostering interdisciplinary research in the area of GenAI.

---------------------------------------------------------------------------------
Topics of Interest
---------------------------------------------------------------------------------
* Enhancing KG construction and completion with GenAI
* Multimodal KG generation
* Text-to-KG using LLMs
* Multilingual KGs
* GenAI for KG embeddings
* GenAI for Temporal KGs
* Dialogue systems enhanced by KG and GenAI
* Cross-domain knowledge transfer with GenAI
* Bias mitigation using KGs in GenAI
* Explainability with KGs and GenAI
* Natural language querying of KGs via GenAI
* NLP tasks using KGs and GenAI
* Prompt Engineering using KGs
* GenAI for Ontology learning and schema induction in KGs
* Hybrid QA systems combining KGs and GenAI
* Recommendation systems and KGs with GenAI
* Creating benchmark datasets relevant for tasks combining KGs and GenAI
* Real-world applications on scholarly data, biomedical domain, etc.
* Knowledge Graph Alignment
* Applying to real-world scenarios

---------------------------------------------------------------------------------
Important Dates
---------------------------------------------------------------------------------
- Submission deadline: 5 November 2024
- Notification of Acceptance: 5 December 2024
- Camera-ready paper due: 13 December 2024
- COLING2025 Workshop day: 19 January 2025

---------------------------------------------------------------------------------
Submissions
---------------------------------------------------------------------------------
- Full research papers (6-8 pages)
- Short research papers (4-6 pages)
- Position papers (2 pages)

These page limits only apply to the main body of the paper. At the end of the paper (after the conclusions but before the references) papers need to include a mandatory section discussing the limitations of the work and, optionally, a section discussing ethical considerations. Papers can include unlimited pages of references and an unlimited appendix.

Papers must follow the two-column format of *ACL conferences, using the official templates ( https://www.overleaf.com/latex/templates/association-for-computational-ling… ). The templates are available for download as style files and formatting guidelines. Submissions that do not adhere to the specified styles, including paper size, font size restrictions, and margin width, will be desk-rejected.

Submissions are open to all and must be anonymous, adhering to COLING 2025's double-blind submission and reproducibility guidelines. All accepted papers (after double-blind review by at least 3 experts) will appear in the workshop proceedings that will be published in the ACL Anthology. At least one of the authors of each accepted paper must register for the workshop for the paper to be included in the workshop proceedings.

The workshop will be a 100% in-person 1-day event at COLING 2025.

Submissions must be made using the START portal: https://softconf.com/coling2025/GenAIK25/

---------------------------------------------------------------------------------
Sponsors
---------------------------------------------------------------------------------
NFDI4DataScience (NFDI4DS - https://www.nfdi4datascience.de/ ) is a national research data infrastructure project for Data Science and AI. The overarching objective of the project is the development, establishment, and sustainment of a national research data infrastructure (NFDI) for the Data Science and Artificial Intelligence community in Germany. The vision of NFDI4DS is to support all steps of the complex and interdisciplinary research data lifecycle, including collecting/creating, processing, analyzing, publishing, archiving, and reusing resources in Data Science and Artificial Intelligence.

NFDI4DS is offering a total of €2000 in travel grants (€1000 each) to two selected students who will attend and present their work at GenAIK 2025! To be considered, submit your paper to the workshop, and if your paper is accepted, you'll be eligible for a chance to receive one of the two grants.

---------------------------------------------------------------------------------
Organization
---------------------------------------------------------------------------------
- Genet Asefa Gesese, FIZ Karlsruhe, KIT, Germany
- Harald Sack, FIZ Karlsruhe, KIT, Germany
- Heiko Paulheim, University of Mannheim, Germany
- Albert Meroño-Peñuela, King's College London, UK
- Lihu Chen, Imperial College London, UK

If you have published in ACL conferences previously and are interested in being part of the program committee of GenAIK2025, please fill in this form: https://forms.gle/t56dP6McD1VJmTfT9

--
Dr.-Ing. Genet Asefa Gesese
Head of Machine Learning Department (Abteilungsleitung Maschinelles Lernen)
FIZ Karlsruhe – Leibniz Institute for Information Infrastructure ( https://www.fiz-karlsruhe.de/en/bereiche/lebenslauf-und-publikationen-dr-ing-genet-asefa-gesese )
AND
Karlsruhe Institute of Technology (KIT) ( https://www.aifb.kit.edu/web/Genet_Asefa_Gesese/en )

SyntaxFest 2025 - First Announcement
by Dobrovoljc, Kaja 28 Oct '24

Dear colleagues,

We are pleased to announce that SyntaxFest 2025 (https://syntaxfest.github.io/syntaxfest25/) will take place in Ljubljana, Slovenia, from 26 to 29 August 2025.

SyntaxFest is a biennial event that brings together a series of events focusing on topics such as empirical syntax, linguistic annotation, statistical language analysis, and natural language processing. SyntaxFest 2025, organized by the University of Ljubljana, will host five events under a unified submission process and program:

* TLT: 23rd Workshop on Treebanks and Linguistic Theories
* DepLing: 8th International Conference on Dependency Linguistics
* UDW: 8th Universal Dependencies Workshop
* IWPT: 18th International Conference on Parsing Technologies
* Quasy: 2nd Workshop on Quantitative Syntax

In addition, the event will be co-located with the UniDive 1st Shared Task on Morphosyntactic Parsing, organized by the UniDive COST Action CA21167, on 26 August 2025.

Preliminary timeline for paper submission procedure:

* First call for papers: December 2024
* Submission deadline: April 2025
* Notification of acceptance: June 2025
* Conference dates: 26 to 29 August 2025

Workshop organizers / Programme chairs:

TLT:
* Heike Zinsmeister (University of Hamburg)
* Sarah Jablotschkin (University of Hamburg)
* Sandra Kübler (Indiana University)

DepLing:
* Eva Hajičová (Charles University, Prague)
* Sylvain Kahane (Université Paris Nanterre)

UDW:
* Gosse Bouma (University of Groningen)
* Cagri Coltekin (University of Tübingen)

IWPT:
* Kenji Sagae (University of California, Davis)
* Stephan Oepen (University of Oslo)

Quasy:
* Xinying Chen (University of Ostrava)
* Yaqin Wang (Guangdong University of Foreign Studies)

Local Organizing Committee:
* Kaja Dobrovoljc (University of Ljubljana, chair)
* Špela Arhar Holdt (University of Ljubljana)
* Marko Robnik Šikonja (University of Ljubljana)
* Matej Klemen (University of Ljubljana)
* Luka Terčon (University of Ljubljana)
* Sara Kos (University of Ljubljana)

We look forward to seeing you in Ljubljana!

On behalf of the SyntaxFest 2025 Organizing Committee,
Kaja Dobrovoljc

2nd Call for papers "WRAICogS1 - Writing Aids at the Crossroads of AI, Cognitive Science, and NLP" (COLING workshop)
by Michael Zock 27 Oct '24

(Apologies for cross-posting)

Second call for papers: "WRAICogS1 - Writing Aids at the Crossroads of AI, Cognitive Science, and NLP"

* Co-located with COLING 2025, Abu Dhabi, https://coling2025.org/
* SUBMISSION DEADLINE: November 25, 2024
* SUBMISSION LINK: https://softconf.com/coling2025/AAC-AI25/

KEYNOTE SPEAKER

Cerstin Mahlow, Professor of Digital Linguistics and Writing Research, ZHAW School of Applied Linguistics, Winterthur, Switzerland

MOTIVATION

This workshop is dedicated to developing writing aids grounded in human cognition (limitations of attention and memory, typically observed habits, knowledge states, and information needs). In other words, we focus on the cognitive and engineering aspects of interactive writing. Our goal is not only to help people acquire and improve their writing skills but also to enhance their productivity. By leveraging computer technology, we aim to enable them to produce better texts in less time.

Writing is one of the four cornerstones of communication. By leaving a trace, it allows us to reach many people, to transcend space and time, and to spare ourselves the trouble of memorization. Writing is undeniably important, whether as a communication tool, a thinking aid, or a memorial support. However, what is less obvious is the process, that is, the precise steps required to transform an intuition or vague idea into concrete, well-polished prose.

Producing readable, well-written text requires many skills, deep and broad knowledge of various sorts (topic, language, audience, and metaknowledge, i.e., how to use the information at hand), a lot of practice, and appropriate feedback. No one can learn all this overnight. The quantity and diversity of knowledge to internalize, as well as the variety of cognitive states encountered, may explain why writing is so difficult and why it takes time to gain control over the whole process and become an expert writer. Unfortunately, knowledge alone is not enough.
Writing is also a time- and energy-consuming endeavor. It is very hard work.

Since writing is difficult, and since there are now computer programs capable of doing it, one may wonder:

- whether we should leave the job entirely to the machine, or
- whether we could use these programs to help people write or to acquire the skill of writing.

Indeed, there are situations where it makes sense to rely on machines (e.g., routine work, business letters), but there are also many situations where this strategy is not recommended (e.g., writing to understand, writing to enrich and clarify our thoughts, writing to support thinking). That being said, one may find a middle ground where humans and machines work together, each contributing their strengths. It remains to be seen where machines can assist in the process (e.g., idea generation, idea structuring, translation into language, revision, editing) and where it is better to leave control to humans. Hence, the main question is not whether we should use LLMs to produce texts, but rather how, when, and at what level to use them or other techniques to help people produce written text.

In sum, our main goal is not to substitute machines for people or to have them do the job in people's place, but rather to have machines assist people. Specifically, we aim to help people learn to write, speed up the process, gain better control, and reduce stress and cognitive load. Our motivation is largely practical and educational.

Obviously, we are not the first ones to pursue this goal. However, while many workshops focused on developing educational software, creating intelligent writing assistants, or evaluating written text, the submitted papers have primarily addressed formal aspects, such as grammatical error detection and spotting spelling mistakes. Yet good writing (text composition) requires much more than just the production of well-formed sentences.
Our mission is to go beyond merely identifying errors or mistakes made at the very end of the writing process, such as those due to ignorance or inattention. Instead, we aim to evaluate the quality of the choices made at higher levels. In other words, we are interested in the full spectrum of writing, including technology-based writing aids that address all tasks involved in writing: conceptual planning (ideation, organization), linguistic expression, editing, and revision. Hence, we welcome papers that focus on the higher levels of composition, such as thinking, reasoning, and planning (idea generation, outline planning), as well as those concerned with the lower levels (grammar, spelling, and punctuation).

Arguably, this is the first workshop to:

- Consider the entire spectrum of writing rather than only the lower levels,
- Integrate humans right from the start into the development cycle of writing aids, and
- Provide support and feedback at any moment (before, during, and after writing) rather than only at the very end.

TOPICS

We welcome contributions on all topics related to writing aids, including but not limited to the following:

1. THE HUMAN PERSPECTIVE: Cognitive scientific viewpoints, including education, psycholinguistics, and neuroscience.

(a) Support: How can AI tools support critical thinking and logical reasoning in writing? How can writing assistants tailor feedback to individual writers, considering their unique needs and styles? How can we assess the quality and impact of AI-generated feedback on students' writing (methods, metrics, etc.)?

(b) Topical coherence: How can we help people organize their ideas into a coherent whole? How do we model or operationalize the concept of a topic, the paragraph's most central element? How do we detect possible topics within our data? What are typical subtopics of a given topic, and how do we identify them? How do we cluster content/ideas into topics and give the clusters appropriate names?

(c) Building software: How do we include humans in the development cycle of writing aids? How and at what level can engineers use insights from psycholinguistics and neuroscience? How can they model the writing process while accounting for human and technological factors?

(d) Metacognition: What do people typically know about writing in general and their own writing in particular? What are their problems and needs? How do people manage to coordinate the different processes? What should an authoring ecosystem look like (components)? What could be automated, and what is best left for interactive processing?

(e) Shared tasks: What kinds of shared task would be meaningful while being technically feasible?

2. THE ENGINEERING SIDE

(a) LLMs: Where in the writing process could we use methods developed in AI (e.g., LLMs) or computational linguistics (e.g., content generation, content structuring, translation into language, revision)? What are the potential benefits, dangers, and limitations of LLMs as writing aids? How could revealing the 'knowledge' embedded within black-box models improve their effectiveness, particularly in terms of increasing the accuracy and relevance of the feedback they provide? How can we address challenges related to data collection, privacy, and ethical considerations in developing and deploying AI writing tools?

(b) Tools and resources: What kinds of tools and resources (e.g., Sketch Engine, Rhetorical Structure Theory, knowledge graphs, and linked data) could be useful?

(c) Quality assessment: How can we check the veracity of facts, relevance, cohesion, coherence, style, fluency, proper use of pronouns, grammar, word choice, spelling, and punctuation?

(d) Enhancement and evaluation: How do we enhance text analysis during or after writing (e.g., quality of coherence, style) using corpus linguistic tools? How do we evaluate or compare existing writing assistants (e.g., adequacy, design features, ease of use, lessons learned)?
SUBMISSION INSTRUCTIONS

Please submit your papers via the START/SoftConf submission portal (https://softconf.com/coling2025/AAC-AI25/), following the COLING 2025 templates. Submitted versions must be anonymous and should not exceed 8 pages for long papers and 4 pages for short papers. References do not count toward the page limit, and may be up to 4 pages long. Supplementary material and appendices are also allowed. We also invite papers discussing tools and applications (system demonstrations) related to our workshop topics.

PUBLICATION

All accepted papers (whether for oral presentation or as posters) will be published in proceedings appearing in the ACL Anthology.

PARTICIPATION

The workshop requires a physical presence. If any authors are unable to attend and present in person, alternative arrangements (such as remote presentations or video recordings) may be considered. However, we cannot guarantee these options, as the COLING organizers and local chairs have informed us that they will not provide technical support or online access. Generally, work presented in person will be given preference over work presented virtually.

ORGANIZERS

* Michael Zock (CNRS, LIS, Aix-Marseille University, Marseille, France)
* Kentaro Inui (Mohamed bin Zayed University of Artificial Intelligence, UAE; Tohoku University, Japan; RIKEN, Japan)
* Zheng Yuan (King's College London and the University of Cambridge, UK)

MORE DETAILS:

* homepage: https://sites.google.com/view/wraicogs1
* more readable CFP: https://sites.google.com/view/wraicogs1/home/call-for-papers
* program committee: https://sites.google.com/view/wraicogs1/home/programme-committee
* background information: https://sites.google.com/view/wraicogs1/home/background-and-topics

--
Michael ZOCK
Emeritus Research Director CNRS
LIS UMR 7020 (Group TALEP)
Aix Marseille Université
163 avenue de Luminy - case 901
13288 Marseille / France
Mail: michael.zock(a)lis-lab.fr
Tel.: +33 (0)6 51.70.97.22
Secr.: +33 (0)4.86.09.04.60
http://pageperso.lif.univ-mrs.fr/~michael.zock/

[CFP] : Call for Tutorial : FIRE 2024
by Bhargav Dave 26 Oct '24

Apologies for the multiple postings.

---------------------------------------------
Call for Tutorial
FIRE 2024: 16th meeting of the Forum for Information Retrieval Evaluation
12th - 15th December 2024
DA-IICT, Gandhinagar, India
Submission Deadline: 15th November 2024
Website: fire.irsi.org.in
Submission Link: https://cmt3.research.microsoft.com/FIRE2024
---------------------------------------------

The 16th meeting of the Forum for Information Retrieval Evaluation 2024 will be held at Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, India. It will be an in-person conference.

We are inviting proposals for half-day tutorials covering topics relevant to information retrieval (IR) and its applications. We welcome topics that range from the theoretical foundations of IR to practical applications, as well as tutorials on IR and machine learning (ML) systems. Each tutorial should cover a single topic in depth. Tutorial proposals should include details according to the guidelines below.

Submission Guidelines

Proposals should be at most 4 pages (excluding references) and must follow the ACM SIG template available at https://authors.acm.org/proceedings/production-information/taps-production-…
The only accepted format for submissions is PDF. We strongly encourage the proposers to attend and present in person.

Submissions should include:

- Title and abstract
- Duration: Half Day
- Proposed content of the tutorial
- Target audience
- Speaker's bio: name, affiliations, contact and short bio

Submissions are not anonymous (reviewing will be single-blind) and should contain speaker details. Proposals which do not conform to the requirements are likely to be rejected without review.

All proposals should be submitted via Microsoft CMT: https://cmt3.research.microsoft.com/FIRE2024

Important dates

- Tutorial proposal due: Nov 15, 2024
- Tutorials notification: Nov 20, 2024
- Camera ready due: Nov 30, 2024
- Tutorial day: Dec 12-15, 2024

Note: All submission deadlines are 11:59 PM AoE (Anywhere on Earth).

Presentation Requirements

If accepted, at least one author will have to register for the conference and present the tutorial in person.

For queries related to the conference, please email us at [ clia(a)isical.ac.in ]
For the latest updates, subscribe to the FIRE mailing list [ https://groups.google.com/forum/#!forum/fire-list ]

Final CfP: 1st Workshop on Computational Humor (CHum @ COLING 2025)
by Tristan Miller 26 Oct '24

Call for papers: 1st Workshop on Computational Humor (CHum 2025)
================================================================

The 1st Workshop on Computational Humor (CHum 2025) will take place virtually on January 19, 2025 as part of the 31st International Conference on Computational Linguistics (COLING 2025).

Scope and topics
----------------

CHum 2025 aims to foster further work on modeling the processes of humor with current methods in computational linguistics and natural language processing, against the theoretical backdrop of humor research and with reference to relevant corpora of textual, visual, and multimodal materials. A principal goal of the workshop is to unite researchers who can together probe the limits of various meaning representations -- symbolic, neural, and hybrid -- for humor processing.

We welcome contributions on any topic relevant to the computational processing of humor, including but not limited to the following:

* LLMs, knowledge representation
* Resources and evaluation
* Human-computer interaction
* Computer-mediated communication
* Assisted content creation
* Machine and computer-assisted translation
* Digital humanities applications
* Formal modeling of humor
* Proof-of-concept humor detection and classification

Particularly encouraged are submissions describing inter- or multi-disciplinary work, whether completed or in progress, and position papers that critically discuss the past, present, and future of computational humor systems.

Submission instructions
-----------------------

Long and short papers should be formatted according to the same guidelines for the main COLING 2025 conference papers <https://coling2025.org/calls/submission_guidlines/> and submitted through START: <https://softconf.com/coling2025/CompHum25/>

Important dates
---------------

All deadlines are at 23:59 UTC-12:00 ("anywhere on Earth").

* Initial submission: November 15, 2024
* Notification of acceptance: December 2, 2024
* Camera-ready submission: December 13, 2024
* Workshop: January 19, 2025

Organizers
----------

* Christian F. Hempelmann, Texas A&M University-Commerce
* Julia Rayz, Purdue University
* Tiansi Dong, Fraunhofer IAIS
* Tristan Miller, University of Manitoba

Further information
-------------------

* Website: <https://chum2025.github.io/>
* E-mail: chum(a)groups.io

--
Dr. Tristan Miller, Assistant Professor
Department of Computer Science, University of Manitoba
https://clam.cs.umanitoba.ca/ | Tel. +1 204 474 6792
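The CHum deadlines in the call above are stated as 23:59 UTC-12:00 ("anywhere on Earth"). As a rough illustration only, and not something provided by the workshop organisers, the sketch below converts those deadlines to UTC with Python's standard library. The helper name `aoe_deadline_to_utc` and the `deadlines` dictionary are hypothetical; the dates are copied from the call.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" (AoE) is the fixed offset UTC-12:00.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_to_utc(year: int, month: int, day: int) -> datetime:
    """Return the UTC instant corresponding to 23:59 AoE on the given date.

    Hypothetical helper written for this illustration.
    """
    local = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local.astimezone(timezone.utc)


# Dates copied from the call; the labels are only dictionary keys.
deadlines = {
    "Initial submission": (2024, 11, 15),
    "Notification of acceptance": (2024, 12, 2),
    "Camera-ready submission": (2024, 12, 13),
}

for label, ymd in deadlines.items():
    print(f"{label}: {aoe_deadline_to_utc(*ymd):%Y-%m-%d %H:%M} UTC")
```

Because AoE is twelve hours behind UTC, each deadline falls at 11:59 UTC on the following calendar day.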

Lin Lougheed Doctoral Fellowship in Language & Technology
by Voss, Erik 25 Oct '24

Apologies for cross-posting

*Lin Lougheed Doctoral Fellowship in Language & Technology*

The Applied Linguistics & TESOL program at Teachers College, Columbia University announces the Lin Lougheed Doctoral Fellowship in Language & Technology. The fellowship will be offered to one student to develop and research AI technologies for language learning or language assessment. This 4-year fellowship will provide tuition and a stipend for a student in good standing during the program.

Teachers College Application Portal: https://apply.tc.edu/apply/
The application deadline is December 1, 2024.

Contact Prof. Erik Voss at ev2449(a)tc.columbia.edu with questions about this fellowship.

--
Erik Voss, Ph.D.
Assistant Professor, Applied Linguistics & TESOL program
Language & Technology Specialization
Department of Arts & Humanities
Teachers College, Columbia University
TC Faculty Profile <https://www.tc.columbia.edu/faculty/ev2449/>, LinkedIn Profile <https://www.linkedin.com/in/erik-voss-ph-d-941a3ab9>, Google Scholar <https://scholar.google.com/citations?user=FMnVdjcAAAAJ&hl=en>
ALTESOL Language & Technology Research Group <https://sites.google.com/tc.columbia.edu/al-tesol-language-technology/home>
Editor-in-Chief of NYS TESOL Journal
Associate Editor of Language Assessment Quarterly

*Latest Publications*
Voss, E. et al. (2023). The Use of Assistive Technologies Including Generative AI by Test Takers in Language Assessment: A Debate of Theory and Practice. <https://doi.org/10.1080/15434303.2023.2288256> LAQ Journal
Voss, E. (2024). Duolingo Webinar: Current Applications of Artificial Intelligence in Language Assessment <https://youtu.be/b-mjLmvXLBU?si=nmph76-lizkfzi1J> (1 hour)
Voss, E. (2024). Language Assessment and Artificial Intelligence. <https://books.google.com/books?hl=en&lr=&id=ht8aEQAAQBAJ&oi=fnd&pg=PA112&ot…> The Concise Companion to Language Assessment.

New issue of Research in Corpus Linguistics
by Paula Rodríguez-Puente 25 Oct '24

We are very pleased to announce that the second issue of volume 12 (2024) of *Research in Corpus Linguistics* (RiCL) has just been published. This issue, guest edited by Robbie Love (Aston University), brings together a set of articles intended to shed new light on *"Innovations in the compilation and analysis of spoken corpora"*. Please visit our website <https://ricl.aelinco.es/index.php/ricl/issue/view/26> to see the whole volume. The table of contents can be found below.

With best wishes,
Paula Rodríguez-Puente & Carlos Prado-Alonso
Editors of RiCL

*Articles*

- *Introduction: Innovation in spoken corpus linguistics <https://urldefense.com/v3/__https:/ricl.aelinco.es/index.php/ricl/article/v…>*. Robbie Love
- *“We’ve lost you Ian”: Multi-modal corpus innovations in capturing, processing and analysing professional online spoken interactions <https://urldefense.com/v3/__https:/ricl.aelinco.es/index.php/ricl/article/v…>*. Anne O'Keeffe, Dawn Knight, Geraldine Mark, Christopher Fitzgerald, Justin McNamara, Svenja Adolphs, Benjamin Cowan, Tania Fahey Palma, Fiona Farr, Sandrine Peraldi
- *Building LANA-CASE, a spoken corpus of American English conversation: Challenges and innovations in corpus compilation <https://urldefense.com/v3/__https:/ricl.aelinco.es/index.php/ricl/article/v…>*. Elizabeth Hanks, Tony McEnery, Jesse Egbert, Tove Larsson, Douglas Biber, Randi Reppen, Paul Baker, Vaclav Brezina, Gavin Brookes, Isobelle Clarke, Raffaella Bottini
- *Compiling a corpus of African American Language from oral histories <https://urldefense.com/v3/__https:/ricl.aelinco.es/index.php/ricl/article/v…>*. Sarah Moeller, Alexis Davis, Wilermine Previlon, Michael Bottini, Kevin Tang
- *Addressing comparability and retrieval issues in conversation corpora: A case study on the Spoken British National Corpora (1994 and 2014), using the past perfect <https://urldefense.com/v3/__https:/ricl.aelinco.es/index.php/ricl/article/v…>*. Nicholas Smith, Cristiano Broccias, Cathleen Waters
- *Rethinking interviews as representations of spoken language in learner corpora <https://urldefense.com/v3/__https:/ricl.aelinco.es/index.php/ricl/article/v…>*. Pascual Pérez-Paredes, Geraldine Mark
- *Developing a coding scheme for annotating opinion statements in L2 interactive spoken English with application for language teaching and assessment <https://urldefense.com/v3/__https:/ricl.aelinco.es/index.php/ricl/article/v…>*. Yejin Jung, Dana Gablasova, Vaclav Brezina, Hanna Schmück
- *Corpus as a slice of life: Representing naturally occurring language and its speakers <https://urldefense.com/v3/__https:/ricl.aelinco.es/index.php/ricl/article/v…>*. Giorgia Troiani, John W. Du Bois, Andrey Filchenko
- *Design and construction of a social media corpus: Influencers’ speech in vlogs <https://urldefense.com/v3/__https:/ricl.aelinco.es/index.php/ricl/article/v…>*. Hülya Mısır

*Book Reviews*

- *Review of Gillings, Mathew, Gerlinde Mautner and Paul Baker. 2023. Corpus-Assisted Discourse Studies. Cambridge: Cambridge University Press. ISBN: 978-1-009-16815-1. DOI: https://doi.org/10.1017/9781009168144 <https://urldefense.com/v3/__https:/ricl.aelinco.es/index.php/ricl/article/v…>*. Tamsin Parnell
- *Review of Brookes, Gavin and Luke C. Collins. 2023. Corpus Linguistics for Health Communication: A Guide for Research. London: Routledge. ISBN: 978-1-003-09965-9 https://doi.org/10.4324/9781003099659 <https://urldefense.com/v3/__https:/ricl.aelinco.es/index.php/ricl/article/v…>*. Ovidia Martínez Sánchez
- *Review of Pettersson-Traba, Daniela. 2022. The Development of the Concept of SMELL in American English. A Usage-Based View of Near-Synonymy. Berlin: De Gruyter Mouton. ISBN: 978-3-11079-2201. DOI: https://doi.org/10.1515/9783110792294 <https://urldefense.com/v3/__https:/ricl.aelinco.es/index.php/ricl/article/v…>*. Daniel Granados-Meroño
- *Review of Izquierdo, Marlén and Zuriñe Sanz-Villar eds. 2023. Corpus Use in Cross-linguistic Research: Paving the Way for Teaching, Translation and Professional Communication. Amsterdam: John Benjamins. ISBN: 978-9-027-21430-0. DOI: https://doi.org/10.1075/scl.113 <https://urldefense.com/v3/__https:/ricl.aelinco.es/index.php/ricl/article/v…>*. Isabel Pizarro-Sánchez
- *Review of Viana, Vander ed. 2023. Teaching English with Corpora: A Resource Book. London: Routledge. ISBN: 978-1-032-25297-1. DOI: https://doi.org/10.4324/b22833 <https://urldefense.com/v3/__https:/ricl.aelinco.es/index.php/ricl/article/v…>*. Gaëtanelle Gilquin

*Paula Rodríguez-Puente*
Departamento de Filología Inglesa, Francesa y Alemana
Universidad de Oviedo
Campus El Milán
C/ Amparo Pedregal s/n
33011 Oviedo
*Tel*. 985-104570
Variation, Linguistic Change and Grammaticalization <http://www.usc-vlcg.es/PRP.htm>
LINGUO <https://linguo.grupos.uniovi.es/presentacion/miembros/detalle/-/asset_publi…>
Academia <https://uniovi.academia.edu/PaulaRodr%C3%ADguezPuente>
Scholar <https://scholar.google.es/citations?user=I4axNDUAAAAJ&hl=es>

Genre-enriched web corpora and multilingual genre classifier
by Taja Kuzman 25 Oct '24

Dear all,

If you are involved in (web) corpora creation and curation, interested in large multilingual corpora for European languages, or working with automatic genre annotation, the following resources might be useful to you. Several multilingual genre-related resources and technologies are now available on the CLARIN.SI and Hugging Face repositories:

- *Genre-enriched MaCoCu-Genre web corpora* - MaCoCu web corpora for 13 European languages (Albanian, Bosnian, Bulgarian, Catalan, Croatian, Greek, Icelandic, Macedonian, Montenegrin, Serbian, Slovenian, Turkish, and Ukrainian), automatically annotated with genre labels. In total, the corpus collection comprises 67 million texts and 28.5 billion words. The corpora are available on the CLARIN.SI repository: http://hdl.handle.net/11356/1969
- *X-GENRE classifier* - a multilingual text genre classifier, applicable to any of the 100 languages included in the XLM-RoBERTa model. Available on Hugging Face (https://huggingface.co/classla/xlm-roberta-base-multilingual-text-genre-cla…) and the CLARIN.SI repository (http://hdl.handle.net/11356/1961)
- *English-Slovenian X-GENRE dataset* - a manually annotated genre dataset, used for training and evaluating the X-GENRE classifier. Available on Hugging Face (https://huggingface.co/datasets/TajaKuzman/X-GENRE-text-genre-dataset) and the CLARIN.SI repository (http://hdl.handle.net/11356/1960).

Additionally, we set up a *benchmark for automatic genre identification* (https://github.com/TajaKuzman/AGILE-Automatic-Genre-Identification-Benchmark) for continuous evaluation of emerging technologies on this task. The benchmark is based on unpublished manually annotated datasets. If you wish to test your own systems on the task, let me know, and we'll be happy to share them with you.

Best regards,

--
Taja Kuzman
Research Assistant
Department of Knowledge Technologies | Jožef Stefan Institute, Slovenia
CLASSLA Knowledge Centre for South Slavic languages | CLARIN.SI
twitter <https://twitter.com/TajaKuzman> linkedin <https://www.linkedin.com/in/taja-kuzman/>
