CALL FOR PARTICIPATION AT IberLEF 2025
PastReader 2025
IberLEF Task on Transcription of Historical Content
First edition - Transcribing texts from the past
Shared task website: https://sites.google.com/view/pastreader2025/home
Held as part of the evaluation forum IberLEF 2025
(https://sites.google.com/view/iberlef-2025), within the XLI edition of the
International Conference of the Spanish Society for Natural Language
Processing (SEPLN 2025, https://eventos.ita.es/sepln_2025/inicio/)
September 23, 2025. Zaragoza, Spain
Dear All,
We are pleased to announce that registration is now open for the task
'PastReader 2025: IberLEF Task on Transcription of Historical Content
(First Edition) – Transcribing Texts from the Past'.
PastReader is held as part of IberLEF 2025, the shared evaluation
campaign for Natural Language Processing systems in Spanish and other
Iberian languages, co-located with the SEPLN 2025 Conference.
This is a novel task focusing on the correction of text extracted from
digitized historical documents. Participants must generate clean and
corrected versions of texts extracted via OCR from the Spanish historical
press. The corrected text should be faithful to the original and take into
account common errors introduced by the digitization and OCR process. For
this edition, the collection is based on the Hemeroteca Digital (digital
newspaper archive) of the National Library of Spain (BNE).
- A dataset of digitized historical press from the BNE will be used.
- The collection contains millions of digitized pages of Spanish
  newspapers and magazines.
- The texts are in PDF format with OCR.
- The corpus includes publications from the 17th to the 20th century.
- The publications cover a wide variety of topics: politics, satire,
  humor, science, religion, illustration, entertainment, sports, art, and
  literature.
- The goal is to advance the automation of the transcription process.
TASK
Two tasks have been defined, covering the basic workflow of a
transcription process: extraction of text from scanned documents (OCR) and
curation of the extracted text to fix errors:
- Task 1: Error correction. Participants will be provided with the output
  of an OCR system and asked to generate clean and corrected versions of
  the extracted texts.
- Task 2: End-to-end extraction. Given recent advances in multimodal
  systems, this task explores end-to-end approaches that take scanned pages
  as input and produce curated text as output.
DATA
For these shared tasks, three data subsets have been prepared:
- Training set: 8,959 pages (scanned PDF, OCR output, and corrected text).
- Development set: 500 pages (scanned PDF, OCR output, and corrected text).
- Test set: Task 1: 2,736 pages (only the OCR output is released to
  participants). Task 2: 2,736 pages (only the scanned PDFs are released
  to participants).
The quality of the OCR results varies due to several factors, such as the
date of digitization, available technology, the state of preservation of
the originals, and the complexity of the text structure. Efforts have been
made to improve these texts, including collaborative corrections through
the ComunidadBNE platform. The manually corrected output serves as a
valuable resource for testing and training technology.
Participating in this task is a great opportunity to advance historical
text processing. You'll work with a large dataset from the National Library
of Spain (BNE), improving OCR correction skills and contributing to
research. Your contribution will aid in digitizing historical documents for
future access.
To participate, go to: https://forms.gle/iBwuUzjZdc2JyFDKA
IMPORTANT DATES
Feb 3rd: Registration open
Mar 17th: Release of training corpora
Mar 31st: Registration closed
Apr 7th: Release of test corpora and start of the evaluation campaign
Apr 14th: End of evaluation campaign (deadline for submission of runs)
Apr 18th: Publication of official results and release of test gold labels
May 12th: Deadline for paper submission
May 30th: Acceptance notification
Jun 16th: Camera-ready submission deadline
Jul 3rd: Final camera-ready submission deadline (to IberLEF organizers)
Sep, TBD: Publication of proceedings
Sep, TBD: IberLEF Workshop at SEPLN 2025
ORGANIZING COMMITTEE
- Arturo Montejo Ráez (Universidad de Jaén).
- Elena Sánchez Nogales (Biblioteca Nacional de España).
- Gloria Expósito Álvarez (Biblioteca Nacional de España).
- L. Alfonso Ureña López (Universidad de Jaén).
- María Teresa Martín Valdivia (Universidad de Jaén).
- Jaime Collado Montañez (Universidad de Jaén).
- Isabel Cabrera De Castro (Universidad de Jaén).
- María Victoria Cantero Romero (Universidad de Jaén).
- Ana García Serrano (UNED).
- Rocio Ortuño Casanova (UNED).
- Yanco Amor Torterolo Orta (UNED).
Best regards,
The PastReader 2025 organizing committee
Arturo Montejo Ráez
Profesor Titular de Universidad | Associate Professor (Tenured)
amontejo(a)ujaen.es
Universidad de Jaén
Departamento de Informática (Department of Computer Science), A3-114
Las Lagunillas s/n, 23071 - Jaén (Spain)
+34 953 212 882
ORCID: http://orcid.org/0000-0002-8643-2714
Researcher ID: D-3387-2009
SINAI Research Group <https://sinai.ujaen.es>
10th Symposium on Corpus Approaches to Lexicogrammar (LxGr2025)
CALL FOR PAPERS
Deadline for abstract submission: 4 April 2025
The symposium will take place online on Friday 11 and Saturday 12 July 2025.
LxGr primarily welcomes papers reporting on corpus-based research on any aspect of the interaction of lexis and grammar, particularly studies that interrogate the system lexicogrammatically to get lexicogrammatical answers. However, position papers discussing theoretical or methodological issues, as well as descriptions or demonstrations of tools or resources, are also welcome, as long as they are relevant to both lexicogrammar and corpus linguistics.
The theme of LxGr2025 is: Conceptions of Lexicogrammar: How can corpus linguistics shed light on its nature?
If you would like to present, send an abstract of 500 words (excluding references) to lxgr(a)edgehill.ac.uk.
* Abstracts for research papers should specify the research focus (research questions or hypotheses), the corpus, the methodology (techniques, metrics), the theoretical orientation, and the main findings.
* Abstracts for position papers should specify the theoretical orientation and the potential contribution to both lexicogrammar and corpus linguistics.
* Abstracts for tools or resources should provide a clear description of the main functions, and specify the potential contribution to both lexicogrammar and corpus linguistics.
Full papers will be allocated 35 minutes (including 10 minutes for discussion).
Work-in-progress reports will be allocated 20 minutes (including 5 minutes for discussion).
There will be no parallel sessions.
Participation is free.
For details, visit the LxGr website: https://sites.edgehill.ac.uk/lxgr
If you have any questions, please contact lxgr(a)edgehill.ac.uk.
Dear Friends,
I hope this email finds you well. I would like to share my latest article
on hate speech in social media. The analysis is based on appraisal and
collocation networks.
Lima-Lopes, Rodrigo Esteves de. 2025. “Lexical Patterns of Religious
Conservatism: A Study of Social Media Reactions to an Art Exhibition in
Brazil.” *Digital Studies/Le champ numérique* 15(1): 1–36.
https://doi.org/10.16995/dscn.15132.
*Abstract*
*This study examines the dynamics of hate speech on social media, focusing
on comments opposing the announcement of the Queermuseu – Cartografias da
Diferença na Arte Brasileira (Cartographies of Diversity in Brazilian Art)
exhibition in southern Brazil. Employing appraisal system,
Systemic-Functional Linguistics, and network analysis, the research
investigates the linguistic and semantic patterns underlying the
interactions. Lexical and interaction networks were qualitatively
interpreted in order to understand the interaction amongst users and the
reaction towards such an exhibition. Results reveal that conservative
discourse significantly drives comments towards negative evaluation,
framing the exhibition as a threat to religious and traditional family
values. This research contributes to understanding the role of digital
platforms in amplifying intolerance and the mechanisms of polarized
discourse in Brazilian Portuguese.*
All the best,
Rodrigo
*---*
*Prof. Dr. LD. Rodrigo Esteves de Lima Lopes*
*(Ele/He/Er/Il)*
*Universidade Estadual de Campinas*
Livre Docente em Linguagem e Tecnologia ||
Prof. Hab. in Language and Technology ||
Professor Associado || Tenured Associate Professor ||
Depto. de Linguística Aplicada || Dept. of Applied Linguistics ||
CV (in Portuguese): http://lattes.cnpq.br/1654734521861377 ||
ORCID: https://orcid.org/0000-0003-3681-1553 ||
Google Scholar: https://scholar.google.com.br/citations?user=q1V4jksAAAAJ&hl=pt-BR ||
rll307(a)unicamp.br
PhD scholarship in Culture-Aware Evaluation of Large Language Models in Low-Resourced Languages at Centre for Language Technology, The Department of Nordic Studies and Linguistics, University of Copenhagen
The Centre for Language Technology (https://cst.ku.dk/english/) at the Department of Nordic Studies and Linguistics at the University of Copenhagen invites applications for a 3-year PhD scholarship starting on 1 September 2025 or soon after. The selected candidate will be co-supervised by the Centre for Language Technology and the Department of Computer Science and will be physically located at both during their employment.
The PhD scholarship is funded by the AI initiative ‘Sikker platform til udvikling af transparente danske sprogmodeller’ (‘secure platform for the development of transparent Danish language models’), which is part of the AI Strategy launched in 2024 by the Danish Ministry of Digital Affairs (www.digmin.dk). The candidate will be affiliated with the project group ‘Danish Foundation Models’ (https://www.foundationmodels.dk/), a collaboration between several major Danish universities and research institutions addressing this initiative, including Aarhus University, the University of Southern Denmark, the Alexandra Institute and the University of Copenhagen.
For more information, please see https://jobportal.ku.dk/phd/?show=163674
Patrizia Paggio
Associate Professor
University of Copenhagen
Centre for Language Technology
paggio(a)hum.ku.dk
Professor
University of Malta
Institute of Linguistics and Language Technology
patrizia.paggio(a)um.edu.mt
Selected recent publications and upcoming projects:
Paggio, P., Agirrezabal, M., Navarretta, C. and Vitasovic, L. (2024) Multimodal behaviour in an online environment: The GEHM Zoom corpus collection. In Proceedings of LREC-COLING 2024, Torino, Italy. https://archive.org/details/GEHM_meeting_corpus
MultiplEYE DK - Enabling multilingual eye-tracking data collection - Funded by the Carlsberg Foundation https://www.carlsbergfondet.dk/det-har-vi-stoettet/cf24-2005/
We are happy to announce the 7th edition of the Summer School in Digital Humanities and Digital Communication, which will be hosted by the Department of Studies on Language and Culture of the University of Modena and Reggio Emilia, in collaboration with the Fondazione Marco Biagi and with the patronage of AIA. As part of the Doctoral Programme in Human Sciences, the Summer School aims to provide PhD students and young researchers with methodological tools for the study of digital communication and data analysis. This year’s focus is on the challenges and opportunities of integrating traditional methods with innovative tools, with topics ranging from digital resources for research in the humanities to the use of new information technologies for data analysis. The programme combines lectures by invited speakers with workshops where young researchers can present their work and get feedback from the invited speakers.
Abstract submission deadline: March 28th
Notification of acceptance: April 11th
Date: June 3rd-6th, 2025
Location: Modena, Italy
Registration fee: €100.00
Further information can be found here: https://www.summerschooldigitalhumanities.unimore.it/2025-edition/
COLLOQUIUM “AUTORSCHAFT UND INDIVIDUELLER SPRACHGEBRAUCH” (AUTHORSHIP AND INDIVIDUAL LANGUAGE USE) - Call for Abstracts
From 14 to 15 November 2025, a linguistics colloquium titled "Autorschaft und individueller Sprachgebrauch" (Authorship and Individual Language Use) will take place at Ruhr-Universität Bochum.
The event is organized by the German Studies department (Germanistik) at RUB in cooperation with the authorship analysis unit (Bereich Autorenerkennung) of the Federal Criminal Police Office (Bundeskriminalamt, BKA) in Wiesbaden.
Depending on context and motivation (linguistic, philological, forensic, technical), determining authorship serves different goals. In the forensic domain, linguistic analysis is meant to help law enforcement identify the individual behind a text, since the text has legal consequences for its author. Authorship attribution is therefore a central field of application of forensic linguistics.
Whenever the question is who might have written a text, interest turns to the individualizing side of language use: the individual, and with it linguistic variation, its forms and its functions, come to the fore over the systemic and general aspects of language. Individual language use becomes the means by which a text is attributed to an author.
The event aims to stimulate discussion among different approaches to the topic and to provide a forum for exchange. The focus should be on research on German. Contributions set in the forensic context are explicitly encouraged, but other approaches are equally welcome. We invite colleagues who work on forensic authorship analysis based on German-language material, as well as researchers who, beyond forensic goals, are interested in and work on the relationships between the individual, language, and authorship.
We hope that contributions will provide new impulses in the following areas:
• In which contexts and for which research questions are texts analysed with regard to their authorship? Which linguistic aspects are considered relevant?
• What makes language use individual, and how can this be determined?
• Which linguistic variables enter the analysis, and how are they realized in writing? How do other, extralinguistic factors complement the analysis?
• Which tools and analytical methods are used for author profiles and/or text comparisons, and with what results? How are more complex features handled (e.g. argumentation, stylistic traits, speech acts)?
• Which forms of authorship are relevant (from a forensic-linguistic perspective)? What defines authorship in, for example, collaborative writing, and how is multiple authorship dealt with (forensic-linguistically)?
• What hinders and what facilitates the identification of individual language use? Where does linguistic analysis reach its limits? What else must be considered for authors in a criminal context?
Further possible topics:
• Variables/parameters of the author profile
• Evaluation and interpretation of features
• Inter- and intra-individual variation, style and text type
• Group language, group identity and author
• Text production conditions and their effect on individually typical features
• Forms and strategies of disguise
• Combinations of qualitative and quantitative approaches, corpus-based analyses
• Application examples (from practice)
• Interdisciplinary questions and analytical approaches
We welcome all contributions that address these and similar questions from a theoretical, empirical or practical perspective. Talks should not exceed 20 minutes and will each be followed by a 10-minute discussion. Please send abstracts of approximately 350 words (excluding references), with at most five references, in PDF format to autorschaft2025(a)rub.de. A time slot for poster presentations is also planned. The conference languages are German and English.
Invited speakers:
Lars Bülow (LMU München)
Dana Roemling (University of Birmingham)
Markus Schiegg (Universität Freiburg)
Call for abstracts:
30.05.2025 Deadline for abstract submission
31.07.2025 Notification of speakers and poster presenters
Registration:
Registration for participants opens on 1.9.2025. Further information will follow.
Venue:
Landesspracheninstitut (LSI) at Ruhr-Universität Bochum, Max-Kade-Halle
Laerholzstraße 84
44801 Bochum
Organizing committee:
Maria Berger (RUB)
Eilika Fobbe (BKA)
Nora Giljohann (RUB)
Steffen Hessler (RUB)
Kerstin Kucharczik (RUB)
Karin Pittner (RUB)
Tatjana Scheffler (RUB)
Website: https://staff.germanistik.rub.de/kolloquium-autorschaftsanalyse/
---
Tatjana Scheffler (she/her)
GB 5/157
Ruhr-Universität Bochum
Digital Forensic Linguistics
Faculty of Philology, Institute of German Studies
Universitätsstraße 150
44780 Bochum
Germany
Mail: tatjana.scheffler(a)rub.de
Web: http://staff.germanistik.rub.de/digitale-forensische-linguistik/
Mastodon: https://fediscience.org/@tschfflr
Tel.: +49 234 32-21471
Dear colleagues,
(Apologies if you receive multiple copies of this email via different mailing lists)
We are delighted to announce the call for task proposals for NTCIR-19.
NTCIR (NII Testbeds and Community for Information Access Research) is a
series of evaluation conferences that mainly focus on information access
with East Asian languages and English. The first NTCIR conference (NTCIR-1)
took place in August/September 1999, and the latest NTCIR-18 conference
will be held on June 10-13, 2025. Research teams from all over the world
participate in one or more NTCIR tasks to advance the state of the art and
to learn from one another's experiences.
It is time to call for task proposals for the next NTCIR (NTCIR-19), which
will start in September 2025 and conclude in December 2026. Task proposals
will be reviewed by the NTCIR Program Committee, and organizers of accepted
tasks will have a chance to present their proposed tasks at the NTCIR-18
Conference held in NII, Tokyo, Japan, from June 10-13, 2025.
* IMPORTANT DATES:
March 31, 2025: Task proposal submission due (Anywhere on Earth)
May 15, 2025: Acceptance notification of task proposals
June 10-13, 2025: NTCIR-18 Conference (organizers of accepted tasks have a
chance to present their proposed tasks)
* SUBMISSION LINK:
https://easychair.org/conferences/?conf=ntcir19proposal
* NTCIR-19 TENTATIVE SCHEDULE:
January 2026: Dataset release*
January-June 2026: Dry run*
March-July 2026: Formal run*
August 1, 2026: Evaluation results return
August 1, 2026: Task overview release (draft)
September 1, 2026: Submission due of participant papers (draft)
November 1, 2026: Camera-ready participant paper due
December 2026: NTCIR-19 Conference at NII, Tokyo, Japan
(* indicates that the schedule can be different for different tasks)
* WHO SHOULD SUBMIT NTCIR-19 TASK PROPOSALS?
We invite new task proposals within the expansive field of information
access. Organizing an evaluation task entails pinpointing significant
research challenges, strategically addressing them through collaboration
with fellow researchers (including co-organizers and participants),
developing the requisite evaluation framework to propel advancements in the
state of the art, and generating a meaningful impact on both the research
community and future developments.
Prospective applicants are urged to underscore the real-world applicability
of their proposed tasks by utilizing authentic data, focusing on practical
tasks, and solving tangible problems. Additionally, they should confront
challenges in evaluating information access technology, such as the
extensive number of assessments needed for evaluation, ensuring privacy
while using proprietary data, and conducting live tests with actual users.
In the era of large language models (LLMs), these models are anticipated to
significantly influence daily human activities. Nonetheless, the content
produced by LLMs often exhibits issues such as hallucinations. NTCIR-19
encourages tasks that continue the NTCIR-18 focus on evaluating the quality
of LLM-generated content, as well as tasks on information access that
exploits LLMs, including generative information retrieval (IR), IR with
generated queries, conversational search with generated utterances,
LLM-based evaluation (e.g., relevance judgements or language annotation by
LLMs), and retrieval-augmented generation (RAG).
* PROPOSAL TYPES:
We will accept two types of task proposals:
- Proposal of a Core task:
This is for fostering research on a particular information access problem
by providing researchers with a common ground for evaluation. New test
collections and evaluation methods may be developed through the
collaboration between task organizers (proposers) and task participants. At
NTCIR-18, the core tasks are AEOLLM, FairWeb-2, FinArg-2, Lifelog-6,
MedNLP-CHAT, RadNLP, and Transfer-2. Details can be found at
http://research.nii.ac.jp/ntcir/NTCIR-18/tasks.html.
- Proposal of a Pilot task:
This is recommended for organizers who propose to focus on a novel
information access problem for which there are still uncertainties in task
design or organization. It may focus on a sub-problem of an information
access problem and attract a smaller group of participating teams than core
tasks. However, it may grow into a core task in the next round
of NTCIR. At NTCIR-18, the pilot tasks are HIDDEN-RAD, SUSHI, and U4.
Details can be found at http://research.nii.ac.jp/ntcir/NTCIR-18/tasks.html.
Organizers are expected to run their tasks mainly with their own funding
and to make the task as self-sustaining as possible. Part of the funding can
be provided by NTCIR as "seed funding," which is usually used for limited
purposes such as hiring relevance assessors. The seed funding allocated to
each task varies depending on requirements and the number of accepted
tasks. Typical amounts would be around 1M JPY for a core task and around
0.5M JPY for a pilot task (note that the amount is subject to change).
Please submit your task proposal as a PDF file via EasyChair by March 31,
2025 (Anywhere on Earth).
https://easychair.org/conferences/?conf=ntcir19proposal
* TASK PROPOSAL FORMAT:
The proposal should not exceed four pages in A4 single-column format. The
first three pages should contain the main part and appendix, and the last
page should contain only a description of the data to be used in the task.
Please describe the data in as much detail as possible so that we can help
with your data release process after the proposal is accepted. In past
NTCIR rounds, preparing memoranda for data release took considerable time,
which sometimes slowed down task organization.
Main part
- Task name and short name
- Task type (core or pilot)
- Abstract
- Motivation
- Methodology
- Expected results
Appendix
- Names and contact information of the organizers
- Prospective participants
- Data to be used and/or constructed
- Budget planning
- Schedule
- Other notes
Data (to be used in your task)
- Details
(Please describe the details of the data, which should include the source
of the data, methods to collect the data, range of the data, etc.)
- License
(Please make sure that you have a license to distribute the data, and
details of the license should be provided. If you do not have permission to
release the data yet, please describe your plan to get the permission.)
- Distribution
(Please describe how you plan to distribute the data to participants. There
are mainly three choices: distributed by the data provider, distributed by
organizers, and distributed by NII.)
- Legal / Ethical issues
(If the data can cause legal or ethical problems, please describe how you
propose to address them. For example, some medical data may need approval
from an ethics committee, and some Web data may need filtering to exclude
discriminatory messages.)
If you want NII to distribute your data to task participants on your
behalf, please email ntc-admin(a)nii.ac.jp before your task proposal
submission attaching the task proposal.
* REVIEW CRITERIA:
- Importance of the task to the information access community and to
society
- Timeliness of the task
- Organizers’ commitment in ensuring a successful task
- Financial sustainability (self-sustainable tasks are encouraged)
- Soundness of the evaluation methodology
- Detailed description of the data to be used
- Language scope
* NTCIR-19 PROGRAM CO-CHAIRS:
Qingyao Ai (Tsinghua University, China)
Chung-Chi Chen (National Institute of Advanced Industrial Science and
Technology (AIST), Japan)
Shoko Wakamiya (Nara Institute of Science and Technology (NAIST), Japan)
* NTCIR-19 GENERAL CHAIRS:
Charles Clarke (University of Waterloo, Canada)
Noriko Kando (National Institute of Informatics, Japan)
Makoto P. Kato (University of Tsukuba, Japan)
Yiqun Liu (Tsinghua University, China)
SciVQA: Scientific Visual Question Answering Shared Task
Hosted as part of the SDP 2025 Workshop
July 31 or August 1st, 2025 (tbc)
Vienna, Austria
(co-located with ACL 2025)
SciVQA Shared Task: https://sdproc.org/2025/scivqa.html
SDP 2025 Workshop: https://sdproc.org/2025/index.html
Task Overview
Scholarly articles convey valuable information not only through unstructured text but also via (semi-)structured figures such as charts and diagrams. Automatically interpreting the semantics of knowledge encoded in these figures can be beneficial for downstream tasks such as question answering (QA).
In the SciVQA challenge, participants will develop multimodal QA systems using a dataset of scientific figures from ACL Anthology and arXiv papers. Each figure image is annotated with seven QA pairs and includes metadata such as the caption, figure ID, figure type (e.g., compound, line graph, bar chart, scatter plot), and QA pair type. This shared task specifically focuses on closed-ended visual questions (i.e., those addressing visual attributes of a figure such as colour, shape, size, or height) and non-visual questions (those not addressing the figure's visual attributes).
Evaluation
Systems will be evaluated using metrics such as BLEU, METEOR, and ROUGE. Automated evaluations of submitted systems will be done through the Codabench platform (link will be provided soon on the webpage).
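The announcement does not say which ROUGE variant, tokenization, or weighting the official scorer uses. As a rough illustration only, here is a minimal sketch of a ROUGE-L F1 score between a predicted and a gold answer. It assumes lowercased whitespace tokenization and equal weighting of precision and recall. These are assumptions, not the Codabench scorer, and the function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (O(len(a)*len(b)) DP)."""
    prev = [0] * (len(b) + 1)  # DP row for the previous token of `a`
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)          # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))  # carry over the best so far
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Hypothetical ROUGE-L F1: LCS-based precision/recall, equally weighted.

    Assumption: lowercased whitespace tokens, no stemming or punctuation handling.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # "the" and "bar" form the LCS, so precision = recall = 2/3.
    print(rouge_l_f1("the red bar", "the blue bar"))
```

ROUGE-L rewards in-order token overlap without requiring contiguity, so it is lenient with short closed-ended answers that differ by a word. Official implementations may also apply stemming or a different precision/recall weighting.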
Important Dates
Release of training data: April 1, 2025
Release of testing data: April 15, 2025
Deadline for system submissions: May 16, 2025
Paper submission deadline: May 23, 2025
Notification of acceptance: June 13, 2025
Camera-ready paper due: June 20, 2025
Workshop: July 31, 2025 or August 1, 2025 (TBA)
Participants are also invited to submit papers on their systems. Successful submissions will be published in the proceedings of the SDP 2025 workshop.
Organizers
Ekaterina Borisova (DFKI, Berlin, Germany)
Georg Rehm (DFKI, Berlin, Germany)