Hi Saeed,
There are a few summarization datasets in Italian.
We have published a dataset extracted from Wikipedia (https://ceur-ws.org/Vol-3033/paper65.pdf), available on Hugging Face (https://huggingface.co/datasets/Silvia/WITS).
Another group has recently published some datasets in the news domain (https://www.mdpi.com/2078-2489/13/5/228), namely from the newspapers Il Post (https://huggingface.co/datasets/ARTeLab/ilpost) and Fanpage (https://huggingface.co/datasets/ARTeLab/fanpage). They also automatically translated MLSum into Italian.
Previously, there were some Italian splits of multilingual datasets, e.g. WikiLingua.
Unfortunately, I do not know much about datasets in Spanish.
I hope this helps.
Regards,
Silvia
On Fri, 4 Aug 2023, 11:49, Saeed Farzi via Corpora corpora@list.elra.info wrote:
Hi guys,
I am going to implement a summarization system in the medical domain in Italian and Spanish. So I am looking for free summarization datasets both in the public and medical domains in both languages. Any help would be appreciated.
Sincerely,
Ciao
--
*Dr. Saeed Farzi,*
Faculty of Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran.
Phone: +98-21-8462450-401
Fax: +98-21-88462066
P.O. Box: 16315-1355
Web: http://wp.kntu.ac.ir/saeedfarzi/
Lab: https://www.trlab.ir/
Corpora mailing list -- corpora@list.elra.info https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/ To unsubscribe send an email to corpora-leave@list.elra.info