*Apologies for crossposting*
LLMs Beyond the Cutoff: 1st International Workshop on Computational Methods Beyond the Temporal Borders of Training Data
https://llmsbeyondthecutoff2024.wordpress.com
Collocated with CIKM 2024 October 25, 2024 — Boise (Idaho), USA
* July 29, 2024: Paper submission deadline
* August 30, 2024: Paper acceptance notification
* September 15, 2024: Camera-ready version submission
* October 25, 2024: Workshop date
=== NEWS ===
* Accepted papers of LLMs Beyond the Cutoff will be published as a volume of Springer Nature’s LNCS post-proceedings
* Submission via EasyChair: https://easychair.org/conferences/?conf=llmsbeyondthecut0ff
* Springer guidelines for authors: https://www.springer.com/gp/computer-science/lncs/conference-proceedings-gui...
SUMMARY
LLMs are trained on large amounts of web data that extend up to a specific moment in time. For instance, ChatGPT’s LLM “knows” the world before May 2023, with no real-time access to information beyond this limit other than a browsing tool, similar to a search engine, that enables simple lookup. However, in many scenarios, being able to analyze and reason about novel emerging events and topics is crucial for facing the challenges of rapidly evolving information landscapes. The workshop provides an interdisciplinary forum for discussing the temporal limitations of LLMs and proposing technical solutions for applying and developing LLMs beyond their cutoff dates.
We explore two prominent scenarios in which contexts tend to evolve faster than the LLMs used to analyze them: (1) journalism and (2) industry. For (1), the goal is to propose methods for detecting, classifying and reasoning about emerging topics that infuse public discourse on social or mainstream media. An example of such a topic is COVID-19 at the onset of the pandemic. Downstream tasks of interest are fake news detection and fact-checking on novel topics, including claim analysis, opinion mining and narrative extraction. For (2), the goal is to shed light on the limits of LLMs for companies in sectors such as international geopolitical monitoring and corporate intelligence, finance and stock market trading, or insurance, where companies need to track their interests and products in real time. The focus is not on including corporate data in the LLMs, but on solutions that use publicly available and constantly growing data.
An overarching problem that will be studied is that of the cross-language and cross-country specificities of emerging data, where novel information in underrepresented languages or contexts may be more challenging to analyze.
We welcome insights and parallels from the field of knowledge representation, where a similar problem with the cutoff dates of knowledge graphs (dynamics and regular updates) is well understood. The expected outcomes are: (1) insights into the temporal limitations of LLMs, with the workshop outlining concrete challenges and bottlenecks in the identified scenarios; and (2) novel methodological and technical solutions, in terms of (incremental) machine learning models, for reasoning about, extracting and classifying information beyond the cutoff dates of current LLMs.
TOPICS OF INTEREST
* Analysis of emerging topics and events, including counterfactual/what-if reasoning
* Methods for few-shot or zero-shot learning
* Large language models for online discourse
* Large language models for corporate near real-time data analysis
* Large language models for multimodal understanding and generation
* Multilingual and cross-country emerging information extraction
* Computational journalism, disinformation spread, fact-checking and fake news detection
* Stance and viewpoint discovery for novel information
* Detection and classification of claims within emerging narratives
* Social, ethical and legal aspects of LLM up-to-dateness
* Interpretability / explainability of computational methods beyond the cutoff
* Linking and enrichment of data beyond the LLM cutoff
* Foundational models for knowledge graph building and entity alignment
* Recommender systems for novel information
* Quality, provenance, uncertainty and trust of emerging information and data
* Use-cases, applications and cross-community interfaces
* Evaluation frameworks and benchmarks
SUBMISSION
We welcome the following types of contributions:
* Full papers (12-15 pages including references): contain original research.
* Short papers (up to 11 pages including references): contain original research in progress.
* Demo papers (up to 11 pages including references): contain descriptions of prototypes, demos or software systems.
* Data papers (up to 11 pages including references): contain descriptions of resources related to the workshop topics, such as datasets, knowledge graphs, corpora, annotation protocols, etc.
* Position papers (up to 11 pages including references): discuss vision statements or research directions.
Workshop papers must be self-contained and in English. They must not have been previously published and must not be under consideration or under review for another workshop, conference, or journal. Manuscripts should be submitted via EasyChair (https://easychair.org/conferences/?conf=llmsbeyondthecut0ff) in PDF format, using the Springer LNCS format. For full author instructions, please check Springer’s website: https://www.springer.com/gp/computer-science/lncs/conference-proceedings-gui.... The review of manuscripts will be double-blind. Papers will be evaluated according to their significance, originality, technical content, style, clarity, and relevance to the workshop. At least one author of each accepted contribution must register for the workshop and present the paper. Pre-prints of all contributions will be made available during the conference. The accepted papers will appear as a volume of Springer Nature’s LNCS post-proceedings.
For any enquiries, please contact the workshop organizers: todorov@lirmm.fr, rettinger@uni-trier.de, jmgomez@expert.ai, croitoru@lirmm.fr
IMPORTANT DATES
* July 29, 2024: Paper submission deadline
* August 30, 2024: Paper acceptance notification
* September 15, 2024: Camera-ready version submission
* October 25, 2024: Workshop date
All submission deadlines are end-of-day in the Anywhere on Earth (AoE) time zone.
KEYNOTES
* TBA
AWARD
* All contributions are eligible for the "Best Paper" award
ORGANIZING COMMITTEE
* Konstantin Todorov (University of Montpellier, CNRS, LIRMM, France)
* José Manuel Gómez Pérez (Expert.ai, Spain)
* Madalina Croitoru (University of Montpellier, CNRS, LIRMM, France)
* Achim Rettinger (University of Trier, Germany)
PROGRAM COMMITTEE
* Preslav Nakov, MBZUAI, United Arab Emirates
* Serena Villata, I3S, CNRS, France
* Ronald Denaux, Amazon, USA
* Filip Ilievski, Vrije Universiteit Amsterdam, The Netherlands
* Elena Montiel, Universidad Politécnica de Madrid, Spain
* Sandra Bringay, University Paul Valéry, France
* Carlos Badenes, Universidad Politécnica de Madrid, Spain
* Ioana Manolescu, Inria Saclay, France
* Dino Ienco, INRAE, France
* Colin Porlezza, Univ. della Svizzera Italiana, Switzerland
* Katarina Boland, Heinrich Heine Universität, Germany
* Gabriella Lapesa, GESIS, Germany
* Jonas Fegert, FZI, Germany
* Michael Färber, TU Dresden, Germany
* Salim Hafid, University of Montpellier, France
* Pavlos Fafalios, FORTH, Greece
* Andrés García Silva, Expert.ai, Spain
* Sarah Labelle, University Paul Valéry, France
* Pablo Calleja, Universidad Politécnica de Madrid, Spain
Patricia Martín Chozas
Assistant Professor at the Applied Linguistics Department
Postdoctoral Researcher at the Ontology Engineering Group (Artificial Intelligence Department)
ETSI Informáticos - Universidad Politécnica de Madrid
Phone: (+34) 910673091
Dear Sender,
I am currently out of the office and will not be checking emails regularly. I will return on September 9, and will respond to your message as soon as possible after that date.
Best regards, Charlott Jakob
On 18 Jul 2024, at 17:04, Patricia Martín Chozas via Corpora corpora@list.elra.info wrote: