---------------------------------------------------------------------------------
Workshop on Generative AI and Knowledge Graphs (GenAIK),
19 January 2025, Abu Dhabi, UAE
Web: https://genetasefa.github.io/GenAIK2025/
X: @GenAIK25
LinkedIn: https://www.linkedin.com/groups/9868047
Mastodon: https://sigmoid.social/@GenAIK
---------------------------------------------------------------------------------
In conjunction with COLING 2025, January 19-24
---------------------------------------------------------------------------------
Workshop Overview
---------------------------------------------------------------------------------
Generative Artificial Intelligence (GenAI) is a branch of artificial
intelligence capable of creating seemingly new, meaningful content,
including text, images, and audio. It utilizes deep learning models, such
as Large Language Models (LLMs), to recognize and replicate data patterns,
enabling the generation of human-like content. Notable families of LLMs
include GPT (GPT-3.5, GPT-3.5 Turbo, and GPT-4), LLaMA (LLaMA and LLaMA-2),
and Mistral (Mistral and Mixtral). GPT, which stands for Generative
Pretrained Transformer, is especially popular for text generation and is
widely used in applications like ChatGPT. GenAI has taken the world by
storm and revolutionized various industries, including healthcare, finance,
and entertainment. However, GenAI models have several limitations,
including biases from training data, generating factually incorrect
information, and difficulty in understanding complex content. Additionally,
their performance can vary based on domain specificity.
In recent times, Knowledge Graphs (KGs) have attracted considerable
attention for their ability to represent structured and interconnected
information, and have been adopted by many companies in various domains. KGs
represent knowledge by depicting relationships between entities, known as
facts, usually based on formal ontological models. Consequently, they
enable accuracy, decisiveness, interpretability, domain-specific knowledge,
and evolving knowledge in various AI applications. The intersection between
GenAI and KG has ignited significant interest and innovation in Natural
Language Processing (NLP). For instance, by integrating LLMs with KGs
during pre-training and inference, external knowledge can be incorporated
for enhancing the model’s capabilities and improving interpretability. When
integrated, they offer a robust approach to problem solving in diverse
areas such as information enrichment, representation learning,
conversational AI, cross-domain AI transfer, bias, content generation, and
semantic understanding. This workshop aims to reinforce the relationships
between the Deep Learning, Knowledge Graph, and NLP communities and to foster
interdisciplinary research in the area of GenAI.
---------------------------------------------------------------------------------
Topics of Interest
---------------------------------------------------------------------------------
* Enhancing KG construction and completion with GenAI
* Multimodal KG generation
* Text-to-KG using LLMs
* Multilingual KGs
* GenAI for KG embeddings
* GenAI for Temporal KGs
* Dialogue systems enhanced by KG and GenAI
* Cross-domain knowledge transfer with GenAI
* Bias mitigation using KGs in GenAI
* Explainability with KGs and GenAI
* Natural language querying of KGs via GenAI
* NLP tasks using KGs and GenAI
* Prompt Engineering using KGs
* GenAI for Ontology learning and schema induction in KGs
* Hybrid QA systems combining KGs and GenAI
* Recommendation systems and KGs with GenAI
* Creating benchmark datasets relevant for tasks combining KGs and GenAI
* Real-world applications on scholarly data, biomedical domain, etc.
* Knowledge Graph Alignment
* Applying to real-world scenarios
------------------------------------------------------------------------------------
Important Dates
------------------------------------------------------------------------------------
- Submission deadline: 5 November 2024
- Notification of Acceptance: 5 December 2024
- Camera-ready paper due: 13 December 2024
- COLING2025 Workshop day: 19 January 2025
------------------------------------------------------------------------------------
Submissions
------------------------------------------------------------------------------------
Full research papers (6-8 pages)
Short research papers (4-6 pages)
Position papers (2 pages)
These page limits only apply to the main body of the paper. At the end of
the paper (after the conclusions but before the references) papers need to
include a mandatory section discussing the limitations of the work and,
optionally, a section discussing ethical considerations. Papers can include
unlimited pages of references and an unlimited appendix.
Papers must follow the two-column format of *ACL conferences, using the
official templates (
https://www.overleaf.com/latex/templates/association-for-computational-ling…
).
The templates are available for download as style files and formatting
guidelines. Submissions that do not adhere to the specified styles,
including paper size, font size restrictions, and margin width, will be
desk-rejected. Submissions are open to all and must be anonymous, adhering
to COLING 2025's double-blind submission and reproducibility guidelines.
All accepted papers (after double-blind review by at least 3 experts) will
appear in the workshop proceedings, which will be published in the ACL Anthology.
At least one author of each accepted paper must register for the workshop
for the paper to be included in the proceedings. The workshop will be
a 100% in-person 1-day event at COLING 2025.
Submissions must be made using the START portal:
https://softconf.com/coling2025/GenAIK25/
---------------------------------------------------------------------------------
Sponsors
---------------------------------------------------------------------------------
NFDI4DataScience (NFDI4DS - https://www.nfdi4datascience.de/ )
is a national research data infrastructure project for Data Science and AI.
The overarching objective of the project is the development, establishment,
and sustainment of a national research data infrastructure (NFDI) for the
Data Science and Artificial Intelligence community in Germany. The vision
of NFDI4DS is to support all steps of the complex and interdisciplinary
research data lifecycle, including collecting/creating, processing,
analyzing, publishing, archiving, and reusing resources in Data Science and
Artificial Intelligence. NFDI4DS is offering a total of €2000 in travel
grants (€1000 each) to two selected students who will attend and present
their work at GenAIK 2025! To be considered, submit your paper to the
workshop, and if your paper is accepted, you’ll be eligible for a chance to
receive one of the two grants.
---------------------------------------------------------------------------------
Organization
---------------------------------------------------------------------------------
- Genet Asefa Gesese, FIZ Karlsruhe, KIT, Germany
- Harald Sack, FIZ Karlsruhe, KIT, Germany
- Heiko Paulheim, University of Mannheim, Germany
- Albert Meroño-Peñuela, King’s College London, UK
- Lihu Chen, Imperial College London, UK
If you have published in ACL conferences previously and are interested in
joining the program committee of GenAIK2025, please fill in this form:
https://forms.gle/t56dP6McD1VJmTfT9
--
Dr.-Ing. Genet Asefa Gesese
Head of Machine Learning Department (Abteilungsleitung Maschinelles Lernen)
FIZ Karlsruhe – Leibniz Institute for Information Infrastructure
( https://www.fiz-karlsruhe.de/en/bereiche/lebenslauf-und-publikationen-dr-ing-genet-asefa-gesese )
AND
Karlsruhe Institute of Technology (KIT)
( https://www.aifb.kit.edu/web/Genet_Asefa_Gesese/en )
Dear colleagues,
We are pleased to announce that SyntaxFest 2025 (https://syntaxfest.github.io/syntaxfest25/) will take place in Ljubljana, Slovenia, from 26 to 29 August 2025. SyntaxFest is a biennial event that brings together a series of events focusing on topics such as empirical syntax, linguistic annotation, statistical language analysis, and natural language processing.
SyntaxFest 2025, organized by the University of Ljubljana, will host five events under a unified submission process and program:
* TLT: 23rd Workshop on Treebanks and Linguistic Theories
* DepLing: 8th International Conference on Dependency Linguistics
* UDW: 8th Universal Dependencies Workshop
* IWPT: 18th International Conference on Parsing Technologies
* Quasy: 2nd Workshop on Quantitative Syntax
In addition, the event will be co-located with the UniDive 1st Shared Task on Morphosyntactic Parsing, organized by the UniDive COST Action CA21167, on 26 August 2025.
Preliminary timeline for paper submission:
* First call for papers: December 2024
* Submission deadline: April 2025
* Notification of acceptance: June 2025
* Conference dates: 26 to 29 August 2025
Workshop organizers / Programme chairs:
TLT:
* Heike Zinsmeister (University of Hamburg)
* Sarah Jablotschkin (University of Hamburg)
* Sandra Kübler (Indiana University)
DepLing:
* Eva Hajičová (Charles University, Prague)
* Sylvain Kahane (Université Paris Nanterre)
UDW:
* Gosse Bouma (University of Groningen)
* Cagri Coltekin (University of Tübingen)
IWPT:
* Kenji Sagae (University of California, Davis)
* Stephan Oepen (University of Oslo)
Quasy:
* Xinying Chen (University of Ostrava)
* Yaqin Wang (Guangdong University of Foreign Studies)
Local Organizing Committee:
* Kaja Dobrovoljc (University of Ljubljana, chair)
* Špela Arhar Holdt (University of Ljubljana)
* Marko Robnik Šikonja (University of Ljubljana)
* Matej Klemen (University of Ljubljana)
* Luka Terčon (University of Ljubljana)
* Sara Kos (University of Ljubljana)
We look forward to seeing you in Ljubljana!
On behalf of the SyntaxFest 2025 Organizing Committee,
Kaja Dobrovoljc
(Apologies for cross-posting)
Second call for papers:
"WRAICogS1 - Writing Aids at the Crossroads of AI, Cognitive Science,
and NLP"
* Co-located with COLING 2025, Abu Dhabi, https://coling2025.org/
* SUBMISSION DEADLINE: November 25, 2024
* SUBMISSION LINK: https://softconf.com/coling2025/AAC-AI25/
KEYNOTE SPEAKER
Cerstin Mahlow, Professor of Digital Linguistics and Writing Research,
ZHAW School of Applied Linguistics, Winterthur, Switzerland
MOTIVATION
This workshop is dedicated to developing writing aids grounded in human
cognition (limitations of attention and memory, typically observed
habits, knowledge states, and information needs). In other words, we
focus on the cognitive and engineering aspects of interactive writing.
Our goal is not only to help people acquire and improve their writing
skills but also to enhance their productivity. By leveraging computer
technology, we aim to enable them to produce better texts in less time.
Writing is one of the four cornerstones of communication. By leaving a
trace, it allows us to reach many people, to transcend space and time,
and to spare ourselves the trouble of memorization. Writing is
undeniably important, whether as a communication tool, a thinking aid,
or a memorial support. However, what is less obvious is the process—that
is, the precise steps required to transform an intuition or vague idea
into concrete, well-polished prose. Producing readable, well-written
text requires many skills, deep and broad knowledge of various sorts
(topic, language, audience, and metaknowledge, i.e., how to use the
information at hand), a lot of practice, and appropriate feedback.
No one can learn all this overnight. The quantity and diversity of
knowledge to interiorize, as well as the variety of cognitive states
encountered, may explain why writing is so difficult and why it takes
time to gain control over the whole process and become an expert writer.
Unfortunately, knowledge alone is not enough. Writing is also a time-
and energy-consuming endeavor. It is very hard work.
Since writing is difficult, and since there are now computer programs
capable of doing it, one may wonder:
- whether we should leave the job entirely to the machine, or
- whether we could use these programs to help people write or to acquire
the skill of writing.
Indeed, there are situations where it makes sense to rely on machines
(e.g., routine work, business letters), but there are also many
situations where this strategy is not recommended (e.g., writing to
understand, writing to enrich and clarify our thoughts, writing to
support thinking). That being said, one may find a middle ground where
humans and machines work together, each contributing their strengths. It
remains to be seen where machines can assist in the process (e.g., idea
generation, idea structuring, translation into language, revision,
editing) and where it is better to leave control to humans. Hence, the
main question is not whether we should use LLMs to produce texts, but
rather how, when, and at what level to use them or other techniques to
help people produce written text.
In sum, our main goal is not to substitute machines for people or to
have them do the job in people's place, but rather to have machines
assist people. Specifically, we aim to help people learn to write, speed
up the process, gain better control, and reduce stress and cognitive
load. Our motivation is largely practical and educational.
Obviously, we are not the first ones to pursue this goal. However, while
many workshops have focused on developing educational software, creating
intelligent writing assistants, or evaluating written text, the
submitted papers have primarily addressed formal aspects, such as
grammatical error detection and spotting spelling mistakes. Yet good
writing (text composition) requires much more than just the production
of well-formed sentences.
Our mission is to go beyond merely identifying errors or mistakes made
at the very end of the writing process, such as those due to ignorance
or inattention. Instead, we aim to evaluate the quality of the choices
made at higher levels. In other words, we are interested in the full
spectrum of writing, including technology-based writing aids that
address all tasks involved in writing: conceptual planning (ideation,
organization), linguistic expression, editing, and revision. Hence, we
welcome papers that focus on the higher levels of composition—such as
thinking, reasoning, and planning (idea generation, outline planning)—as
well as those concerned with the lower levels (grammar, spelling, and
punctuation).
Arguably, this is the first workshop to:
- Consider the entire spectrum of writing rather than only the lower levels,
- Integrate humans right from the start into the development cycle of
writing aids, and
- Provide support and feedback at any moment (before, during, and after
writing) rather than only at the very end.
TOPICS
We welcome contributions on all topics related to writing aids,
including but not limited to the following:
1. THE HUMAN PERSPECTIVE: Cognitive scientific viewpoints, including
education, psycholinguistics, and neuroscience.
(a) Support: How can AI tools support critical thinking and logical
reasoning in writing? How can writing assistants tailor feedback to
individual writers, considering their unique needs and styles? How can
we assess the quality and impact of AI-generated feedback on students'
writing (methods, metrics, etc.)?
(b) Topical coherence: How can we help people organize their ideas into
a coherent whole? How do we model or operationalize the concept of a
topic, the paragraph's most central element? How do we detect possible
topics within our data? What are typical subtopics of a given topic, and
how do we identify them? How do we cluster content/ideas into topics and
give the clusters appropriate names?
(c) Building software: How do we include humans in the development cycle
of writing aids? How and at what level can engineers use insights from
psycholinguistics and neuroscience? How can they model the writing
process while accounting for human and technological factors?
(d) Metacognition: What do people typically know about writing in
general and their own writing in particular? What are their problems and
needs? How do people manage to coordinate the different processes? What
should an authoring ecosystem look like (components)? What could be
automated, and what is best left for interactive processing?
(e) Shared tasks: What kinds of shared tasks would be meaningful while
being technically feasible?
2. THE ENGINEERING SIDE
(a) LLMs: Where in the writing process could we use methods developed in
AI (e.g., LLMs) or computational linguistics (e.g., content generation,
content structuring, translation into language, revision)? What are the
potential benefits, dangers, and limitations of LLMs as writing aids?
How could revealing the 'knowledge' embedded within black-box models
improve their effectiveness, particularly in terms of increasing the
accuracy and relevance of the feedback they provide? How can we address
challenges related to data collection, privacy, and ethical
considerations in developing and deploying AI writing tools?
(b) Tools and resources: What kinds of tools and resources (e.g., Sketch
Engine, Rhetorical Structure Theory, knowledge graphs, and linked data)
could be useful?
(c) Quality assessment: How can we check the veracity of facts,
relevance, cohesion, coherence, style, fluency, proper use of pronouns,
grammar, word choice, spelling, and punctuation?
(d) Enhancement and evaluation: How do we enhance text analysis during
or after writing (e.g., quality of coherence, style) using corpus
linguistic tools? How do we evaluate or compare existing writing
assistants (e.g., adequacy, design features, ease of use, lessons learned)?
SUBMISSION INSTRUCTIONS
Please submit your papers via the START/SoftConf submission portal
(https://softconf.com/coling2025/AAC-AI25/), following the COLING 2025
templates. Submitted versions must be anonymous and should not exceed 8
pages for long papers and 4 pages for short papers. References do not
count toward the page limit, and may be up to 4 pages long.
Supplementary material and appendices are also allowed. We also invite
papers discussing tools and applications (system demonstrations) related
to our workshop topics.
PUBLICATION
All accepted papers (whether for oral or poster presentation) will
be published in proceedings appearing in the ACL Anthology.
PARTICIPATION
The workshop requires a physical presence. If any authors are unable to
attend and present in person, alternative arrangements (such as remote
presentations or video recordings) may be considered. However, we cannot
guarantee these options, as the COLING organizers and local chairs have
informed us that they will not provide technical support or online
access. Generally, work presented in person will be given preference
over work presented virtually.
ORGANIZERS
* Michael Zock (CNRS, LIS, Aix-Marseille University, Marseille, France)
* Kentaro Inui (Mohamed bin Zayed University of Artificial
Intelligence, UAE; Tohoku University, Japan; RIKEN, Japan)
* Zheng Yuan (King's College London and the University of Cambridge, UK)
MORE DETAILS:
* homepage : https://sites.google.com/view/wraicogs1
* more readable CFP :
https://sites.google.com/view/wraicogs1/home/call-for-papers
* program committee :
https://sites.google.com/view/wraicogs1/home/programme-committee
* background information :
https://sites.google.com/view/wraicogs1/home/background-and-topics
--
Michael ZOCK
Emeritus Research Director CNRS
LIS UMR 7020 (Group TALEP)
Aix Marseille Université
163 avenue de Luminy - case 901
13288 Marseille / France
Mail: michael.zock(a)lis-lab.fr
Tel.: +33 (0)6 51.70.97.22
Secr.: +33 (0)4.86.09.04.60
http://pageperso.lif.univ-mrs.fr/~michael.zock/
Apologies for the multiple postings.
---------------------------------------------
*Call for Tutorials*
*FIRE 2024: 16th meeting of the Forum for Information Retrieval Evaluation*
12th - 15th December 2024
DA-IICT, Gandhinagar, India
*Submission Deadline: 15th November 2024*
Website: fire.irsi.org.in
Submission Link : https://cmt3.research.microsoft.com/FIRE2024
------------------------------
The 16th meeting of the Forum for Information Retrieval Evaluation 2024
will be held at Dhirubhai Ambani Institute of Information and Communication
Technology (DA-IICT), Gandhinagar, India. It will be an in-person
conference. We are inviting proposals for half-day tutorials covering
topics relevant to information retrieval (IR) and its applications. We
welcome topics that range from the theoretical foundations of IR to
practical applications, as well as tutorials on IR and machine learning
(ML) systems. Each tutorial should cover a single topic in depth. Tutorial
proposals should include details according to guidelines below.
*Submission Guidelines*
Proposals should be *at most 4 pages (excluding references)* and must follow
the ACM SIG template available at
https://authors.acm.org/proceedings/production-information/taps-production-….
The only accepted format of submissions is PDF. We strongly encourage
proposers to attend and present in person. Submissions should include:
- Title and abstract
- Duration: Half Day
- Proposed content of the tutorial
- Target Audience
- Speaker's bio: Name, affiliations, contact and short bio.
Submissions are not anonymous (reviewing will be *single-blind*) and should
contain speaker details. Proposals which do not conform to the requirements
are likely to be rejected without review.
All proposals should be submitted via Microsoft CMT:
https://cmt3.research.microsoft.com/FIRE2024
*Important dates*
Tutorial proposal due *Nov 15, 2024*
Tutorials notification *Nov 20, 2024*
Camera ready due *Nov 30, 2024*
Tutorial day *Dec 12-15, 2024*
Note: All submission deadlines are 11:59 PM AoE Time Zone (Anywhere on
Earth).
*Presentation Requirements*
If accepted, at least one author will have to register for the conference
and present the tutorial in person.
For queries related to the conference, please email us at [ clia(a)isical.ac.in ]
For the latest updates, subscribe to the FIRE mailing list [
https://groups.google.com/forum/#!forum/fire-list ]
Call for papers: 1st Workshop on Computational Humor (CHum 2025)
================================================================
The 1st Workshop on Computational Humor (CHum 2025) will take place
virtually on January 19, 2025 as part of the 31st International
Conference on Computational Linguistics (COLING 2025).
Scope and topics
----------------
CHum 2025 aims to foster further work on modeling the processes of humor
with current methods in computational linguistics and natural language
processing, against the theoretical backdrop of humor research and with
reference to relevant corpora of textual, visual, and multimodal
materials. A principal goal of the workshop is to unite researchers who
can together probe the limits of various meaning representations --
symbolic, neural, and hybrid -- for humor processing.
We welcome contributions on any topic relevant to the computational
processing of humor, including but not limited to the following:
* LLMs, knowledge representation
* Resources and evaluation
* Human-computer interaction
* Computer-mediated communication
* Assisted content creation
* Machine and computer-assisted translation
* Digital humanities applications
* Formal modeling of humor
* Proof-of-concept humor detection and classification
Particularly encouraged are submissions describing inter- or
multi-disciplinary work, whether completed or in progress, and position
papers that critically discuss the past, present, and future of
computational humor systems.
Submission instructions
-----------------------
Long and short papers should be formatted according to the same
guidelines as the main COLING 2025 conference papers
<https://coling2025.org/calls/submission_guidlines/> and submitted
through START: <https://softconf.com/coling2025/CompHum25/>
Important dates
---------------
All deadlines are at 23:59 UTC-12:00 ("anywhere on Earth").
* Initial submission: November 15, 2024
* Notification of acceptance: December 2, 2024
* Camera-ready submission: December 13, 2024
* Workshop: January 19, 2025
Organizers
----------
* Christian F. Hempelmann, Texas A&M University-Commerce
* Julia Rayz, Purdue University
* Tiansi Dong, Fraunhofer IAIS
* Tristan Miller, University of Manitoba
Further information
-------------------
* Website: <https://chum2025.github.io/>
* E-mail: chum(a)groups.io
--
Dr. Tristan Miller, Assistant Professor
Department of Computer Science, University of Manitoba
https://clam.cs.umanitoba.ca/ | Tel. +1 204 474 6792
Apologies for cross posting
*Lin Lougheed Doctoral Fellowship in Language & Technology*
The Applied Linguistics & TESOL program at Teachers College, Columbia
University announces the Lin Lougheed Doctoral Fellowship in Language &
Technology. The fellowship will be offered to one student to develop and
research AI technologies for language learning or language assessment. This
4-year fellowship will provide tuition and a stipend for a student in good
standing during the program.
Teachers College Application Portal: https://apply.tc.edu/apply/
The application deadline is December 1, 2024.
Contact Prof. Erik Voss at ev2449(a)tc.columbia.edu with questions about this
fellowship.
--
Erik Voss, Ph.D.
Assistant Professor, Applied Linguistics & TESOL program
Language & Technology Specialization
Department of Arts & Humanities
Teachers College, Columbia University
TC Faculty Profile <https://www.tc.columbia.edu/faculty/ev2449/>, Linkedin
Profile <https://www.linkedin.com/in/erik-voss-ph-d-941a3ab9>, Google
Scholar <https://scholar.google.com/citations?user=FMnVdjcAAAAJ&hl=en>
ALTESOL Language & Technology Research Group
<https://sites.google.com/tc.columbia.edu/al-tesol-language-technology/home>
Editor-in-Chief of NYS TESOL Journal
Associate Editor of Language Assessment Quarterly
*Latest Publications*
Voss, E. et al. (2023). The Use of Assistive Technologies Including
Generative AI by Test Takers in Language Assessment: A Debate of Theory and
Practice. <https://doi.org/10.1080/15434303.2023.2288256> LAQ Journal
Voss, E. (2024) Duolingo Webinar: Current Applications of Artificial
Intelligence in Language Assessment
<https://youtu.be/b-mjLmvXLBU?si=nmph76-lizkfzi1J> (1 hour)
Voss, E. (2024). Language Assessment and Artificial Intelligence.
<https://books.google.com/books?hl=en&lr=&id=ht8aEQAAQBAJ&oi=fnd&pg=PA112&ot…>
The Concise Companion to Language Assessment.
Dear all,
If you are involved in (web) corpora creation and curation, interested
in large multilingual corpora for European languages, or working with
automatic genre annotation, the following resources might be useful for
you. Multiple multilingual genre-related resources and technologies are
now available on the CLARIN.SI and Hugging Face repositories:
- Genre-enriched MaCoCu-Genre web corpora - MaCoCu web corpora for 13
European languages (Albanian,
Bosnian, Bulgarian, Catalan, Croatian, Greek, Icelandic, Macedonian,
Montenegrin, Serbian, Slovenian, Turkish, and Ukrainian), automatically
annotated with genre labels. In total, the corpus collection comprises
67 million texts and 28.5 billion words. They are available on the
CLARIN.SI repository: http://hdl.handle.net/11356/1969
- X-GENRE classifier - multilingual text genre
classifier, applicable to any of the 100 languages that are included in
the XLM-RoBERTa model - available on Hugging Face
(https://huggingface.co/classla/xlm-roberta-base-multilingual-text-genre-cla…)
and CLARIN.SI repository (http://hdl.handle.net/11356/1961)
- English-Slovenian X-GENRE dataset -
manually-annotated genre dataset, used for training and evaluation of
the X-GENRE classifier - available on Hugging Face
(https://huggingface.co/datasets/TajaKuzman/X-GENRE-text-genre-dataset)
and CLARIN.SI repository (http://hdl.handle.net/11356/1960).
Additionally, we set up a benchmark for automatic
genre identification
(https://github.com/TajaKuzman/AGILE-Automatic-Genre-Identification-Benchmark)
for continuous evaluation of the emerging technologies on this task. The
benchmark is based on unpublished manually-annotated datasets - if you
wish to test your own systems on the task, let me know, and we'll be
happy to share them with you.
Best regards,
--
Taja Kuzman
Research Assistant
Department of Knowledge Technologies | Jožef Stefan Institute, Slovenia
CLASSLA Knowledge Centre for South Slavic languages | CLARIN.SI
twitter <https://twitter.com/TajaKuzman>
linkedin <https://www.linkedin.com/in/taja-kuzman/>
Dear Colleagues,
The COLING2025 conference aims for a high-quality peer reviewing process by experts on a wide range of topics.
We are looking for people who are willing to serve as emergency reviewers for COLING 2025. Emergency reviewing will take place between 23 and 29 October. We are counting on your help to provide 1 or 2 reviews with a quick turnaround time during this period.
To volunteer, please fill in your information on this link: https://docs.google.com/forms/d/e/1FAIpQLSfDIxZGeyKOe5nV8YFUreNxA4Uw367oYpP…
The Natural Language Learning Group (NLLG) at University of Technology Nuremberg (UTN) has three fully funded PhD / PostDoc positions in Natural Language Processing.
Deadlines: 31.11.2024, 02.11.2024, 04.11.2024
Topics are: Next Generation LLMs, Multimodal Evaluation Metrics for Generative AI, NLP for science
Detailed information: https://nl2g.github.io/positions
Please direct all inquiries regarding scientific content to Steffen Eger (steffen.eger(a)utn.de).
For general questions, contact stars(a)utn.de.
Applications must be made to stars(a)utn.de with the corresponding reference numbers.
---------------------------------------------
Heisenberg Professor
Natural Language Learning & Generation (NLLG)
University of Technology Nuremberg (UTN)
https://nl2g.github.io/
https://www.utn.de/en/person/prof-dr-steffen-eger/
https://www.utn.de/en/departments/department-engineering/nllg-lab/
Ulmenstraße 52i
90443 Nürnberg