[Corpora-List] Re: Fwd: looking for summarization datasets in italian and spanish langs

6 Aug 2023


Hi Saeed,
There are a few summarization datasets in Italian.
We have published a dataset extracted from Wikipedia
(https://ceur-ws.org/Vol-3033/paper65.pdf), available from Hugging Face
(https://huggingface.co/datasets/Silvia/WITS).
Another group has recently published some datasets in the news domain
(https://www.mdpi.com/2078-2489/13/5/228), namely from the newspapers Il Post
(https://huggingface.co/datasets/ARTeLab/ilpost) and Fanpage
(https://huggingface.co/datasets/ARTeLab/fanpage). They also automatically
translated MLSum into Italian.
Before that, some multilingual datasets already included Italian splits,
e.g., WikiLingua.
Unfortunately, I do not know much about datasets in Spanish.
I hope this helps.
Regards,
Silvia
On Fri, 4 Aug 2023 at 11:49, Saeed Farzi via Corpora <corpora@list.elra.info>
wrote:
...
Hi guys,
I am going to implement a summarization system for the medical domain in
Italian and Spanish, so I am looking for free summarization datasets in both
the public and medical domains, in both languages.
Any help would be appreciated.
Sincerely,
Ciao
--
*Dr. Saeed Farzi,*
Faculty of Computer Engineering,
K. N. Toosi University of Technology, Tehran, Iran.
Phone: +98-21-8462450-401
Fax:   +98-21-88462066
P.O. Box: 16315-1355,
Web: http://wp.kntu.ac.ir/saeedfarzi/
Lab: https://www.trlab.ir/

Corpora mailing list -- corpora@list.elra.info
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to corpora-leave@list.elra.info

--
The information transmitted is 
intended only for the person or entity to which it is addressed and may 
contain confidential and/or privileged material. If you received this in 
error, please contact the sender and delete the material.
