A position as Postdoctoral Research Fellow in Natural Language Processing is available within MediaFutures: Research Centre for Responsible Media Technology & Innovation at the Language Technology Group (LTG) at the University of Oslo (UiO), Norway.
The closing date is December 13th, 2024.
For more information about the position and the research group, please see the full announcement here:
https://www.jobbnorge.no/en/available-jobs/job/270966/postdoctoral-research…
Please do not hesitate to contact me for any further information.
Best regards,
Lilja
Note the paper submission deadline: 30 November, 2024
Workshop website: https://comparable.lisn.upsaclay.fr/bucc2025/
COLING website: https://coling2025.org/
Keynote speaker: Preslav Nakov, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi
**************************************************************
* Motivation
In the language engineering and linguistics communities, research in
comparable corpora has been motivated by two main reasons. In language
engineering, on the one hand, it is chiefly motivated by the need to
use comparable corpora as training data for statistical NLP
applications such as statistical and neural machine translation or
cross-lingual retrieval. In linguistics, on the other hand, comparable
corpora are of interest because they enable cross-language discoveries
and comparisons. It is generally accepted in both communities that
comparable corpora consist of documents that are comparable in content
and form in various degrees and dimensions across several
languages. Parallel corpora are at one end of this spectrum, and
unrelated corpora are at the other.
In recent years, the use of comparable corpora for pre-training Large
Language Models (LLMs) has led to their impressive multilingual and
cross-lingual abilities, which are relevant to a range of applications,
including Information Retrieval, Machine Translation, Cross-lingual text
classification, etc. The linguistic definitions and observations related
to comparable corpora can help improve methods for mining such corpora or
the cross-lingual transfer of LLMs. Therefore, it is of great interest
to bring together builders and users of such corpora.
* Shared Task
This year we will run a shared task aimed at detecting translations of
terms via comparable corpora. Please see the website for details: https://comparable.limsi.fr/bucc2025/bucc2025-task.html
* Topics
We solicit contributions on all topics related to comparable (and parallel) corpora, including but not limited to the following:
Building Comparable Corpora:
- Automatic and semi-automatic methods
- Methods to mine parallel and non-parallel corpora from the web
- Tools and criteria to evaluate the comparability of corpora
- Parallel vs non-parallel corpora, monolingual corpora
- Rare and minority languages, across language families
- Multi-media/multi-modal comparable corpora
Applications of comparable corpora:
- Human translation
- Language learning
- Cross-language information retrieval & document categorization
- Bilingual and multilingual projections
- (Unsupervised) Machine translation
- Writing assistance
- Machine learning techniques using comparable corpora
Mining from Comparable Corpora:
- Cross-language distributional semantics, word embeddings and
pre-trained multilingual transformer models
- Extraction of parallel segments or paraphrases from comparable corpora
- Methods to derive parallel from non-parallel corpora (e.g. to provide
for low-resource languages in neural machine translation)
- Extraction of bilingual and multilingual translations of single words,
multi-word expressions, proper names, named entities, sentences, and
paraphrases from comparable corpora, etc.
- Induction of morphological, grammatical, and translation rules from
comparable corpora
- Induction of multilingual word classes from comparable corpora
Comparable Corpora in the Humanities:
- Comparing linguistic phenomena across languages in contrastive
linguistics
- Analyzing properties of translated language in translation studies
- Studying language change over time in diachronic linguistics
- Assigning texts to authors via authors' corpora in forensic
linguistics
- Comparing rhetorical features in discourse analysis
- Studying cultural differences in sociolinguistics
- Analyzing language universals in typological research
* Workshop Organizers
- Serge Sharoff (University of Leeds)
- Ayla Rigouts Terryn (Université de Montréal (UdeM), Mila)
- Pierre Zweigenbaum (Université Paris-Saclay, CNRS, LISN, Orsay)
- Reinhard Rapp (University of Mainz, Germany)
* Program Committee
- Ebrahim Ansari (Institute for Advanced Studies in Basic Sciences,
Iran)
- Eleftherios Avramidis (DFKI, Germany)
- Gabriel Bernier-Colborne (National Research Council, Canada)
- Thierry Etchegoyhen (Vicomtech, Spain)
- Alex Fraser (University of Munich, Germany)
- Natalia Grabar (University of Lille, France)
- Amal Haddad Haddad (Universidad de Granada, Spain)
- Amir Hazem (University of Tokyo, Japan)
- Kyo Kageura (University of Tokyo, Japan)
- Natalie Kübler (Université Paris Cité, France)
- Philippe Langlais (Université de Montréal, Canada)
- Yves Lepage (Waseda University, Japan)
- Shervin Malmasi (Amazon, USA)
- Michael Mohler (Language Computer Corporation, USA)
- Emmanuel Morin (Nantes Université, France)
- Dragos Stefan Munteanu (RWS, USA)
- Ted Pedersen (University of Minnesota, Duluth, USA)
- Nasredine Semmar (CEA LIST, Paris, France)
- Silvia Severini (Leonardo Labs, Italy)
- Pranaydeep Singh (University of Gent, Belgium)
- Richard Sproat (Google, USA)
- Marko Tadić (University of Zagreb, Croatia)
- François Yvon (Sorbonne Université, France)
ROMCIR 2025: The 5th International Workshop on Reducing Online Misinformation through Credible Information Retrieval
Co-located with ECIR 2025: The 47th European Conference on Information Retrieval
Lucca, Italy | April 10, 2025
Workshop website: https://romcir.disco.unimib.it
Submission link: https://easychair.org/conferences/?conf=romcir2025
____________________________________________________________________________________________________
GENERAL DESCRIPTION
The fifth edition of ROMCIR is concerned with giving users access to (topically) relevant and factually accurate information, in order to
mitigate the human-generated or AI-generated information disorder phenomenon across distinct domains.
By "information disorder" we mean all forms of communication pollution, from misinformation spread out of ignorance or generated automatically
from biased content, to the intentional sharing of false content (generated both manually and automatically).
In this context, all those approaches that can serve to assess the factual accuracy of information circulating online and
in social media in particular find their place. This topic is very broad, as it concerns different contents (e.g., Web pages,
news, reviews, medical information, online accounts, etc.), different Web and social media platforms (e.g., microblogging
platforms, social networking services, social question-answering systems, etc.), different purposes (e.g., identifying false
information, accessing information based on its truthfulness, retrieving truthful information, etc.), and different open issues
related in particular to AI (e.g., explainability of search results, assessment of the truthfulness of automatically generated
content, generative models to support IRSs, etc.).
****************************************************************************************************
THEMES
The themes of interest include, but are not limited to, the following:
* Access to and retrieval of truthful information
* Bot/spam/troll detection
* Computational fact-checking
* Credibility assessment of online documents
* Crowdsourcing for information truthfulness assessment
* Disinformation/misinformation/bias detection
* Evaluation strategies to assess information truthfulness
* Generative models and information truthfulness assessment
* Human-in-the-loop misinformation detection
* Information polarization in online communities, echo chambers
* Propaganda identification/analysis
* Retrieval of credible and truthful information
* Security, privacy, and information truthfulness
* Societal reaction to misinformation
* Stance detection
* Trust and reputation
Data-driven approaches, supported by publicly available datasets, are more than welcome.
****************************************************************************************************
CONTRIBUTIONS
The Workshop solicits two types of contributions that are relevant to the Workshop and likely to generate
discussion:
* Original, unpublished contributions (pre-prints submitted to ArXiv are eligible) that will be included in an open-access
post-proceedings volume of CEUR Workshop Proceedings (http://ceur-ws.org/), indexed by both Scopus and DBLP.
* Already published or preliminary work that will not be included in the post-proceedings volume.
All submissions will undergo SINGLE-BLIND peer review by the Program Committee.
Submissions must be made electronically through EasyChair at:
https://easychair.org/conferences/?conf=romcir2025
****************************************************************************************************
SUBMISSION INSTRUCTIONS
Submissions must be:
* Regular papers: between 10 and 14 pages long
* Short papers: between 5 and 9 pages long
We recommend that authors use the new CEUR-ART style for writing papers to be published:
* An Overleaf page for LaTeX users is available at:
https://www.overleaf.com/project/671e05abc213fddad9644a94
* An offline version with the style files including DOCX template files is available at:
http://ceur-ws.org/Vol-XXX/CEURART.zip
* The paper must contain, as the name of the conference: ROMCIR 2025: The 5th Workshop
on Reducing Online Misinformation through Credible Information Retrieval (held as part of ECIR
2025: The 47th European Conference on Information Retrieval), April 10, 2025, Lucca, Italy
* The title of the paper should follow the regular capitalization of English (e.g., Example of a Title of a Paper Correctly Capitalized)
* Please choose the one-column template
* According to CEUR-WS policy, the papers will be published under a CC BY 4.0 license:
https://creativecommons.org/licenses/by/4.0/deed.en
If the paper is accepted, authors will be asked to sign (by hand) an author agreement with CEUR:
* If you do not use Third-Party Material (TPM) in your paper, sign the document at:
https://ceur-ws.org/ceur-author-agreement-ccby-ntp.pdf?ver=2024-06-04
* If you do use TPM, the agreement can be found at:
https://ceur-ws.org/ceur-author-agreement-ccby-tp.pdf?ver=2024-06-04
For further information: https://ceur-ws.org/HOWTOSUBMIT.html
****************************************************************************************************
IMPORTANT DATES (AoE)
* Abstract submission: January 05, 2025
* Paper submission: January 12, 2025
* Decision notification: February 16, 2025
* Workshop day: April 10, 2025
****************************************************************************************************
ORGANIZERS
* Udo Kruschwitz (https://www.linkedin.com/in/udo-kruschwitz-57106b5/), University of Regensburg, Regensburg, Germany
* Marinella Petrocchi (https://www.iit.cnr.it/en/marinella.petrocchi/), IIT-CNR, Pisa, Italy
* Marco Viviani (https://ikr3.disco.unimib.it/people/marco-viviani/), University of Milano-Bicocca, Milan, Italy
********************************************************************************
CoMeDiNLP: Context and Meaning--Navigating Disagreements in NLP Annotations
https://unimplicit.github.io/
Workshop held in conjunction with COLING 2025
January 19/20, 2025
********************************************************************************
Disagreements among annotators pose a significant challenge in Natural Language
Processing, impacting the quality and reliability of datasets and consequently
the performance of NLP models. This workshop aims to explore the complexities of
annotation disagreements, their causes, and strategies towards their effective
resolution, with a focus on meaning in context.
The quality and reliability of annotated data are crucial for the development of
robust NLP models. However, managing disagreements among annotators poses
significant challenges to researchers and practitioners. Such disagreements can
stem from various factors, including subjective interpretations, cultural biases
and ambiguous guidelines. Early research has highlighted the impact of annotator
disagreements on data quality and model performance (e.g. Artstein and Poesio,
2008; Pustejovsky and Stubbs, 2012; Plank et al., 2014).
More recent work on perspectivism in NLP, such as that by Basile et al. (2021),
highlights the importance of embracing multiple perspectives in annotation tasks
to better capture the diversity of human language. This approach argues for the
inclusion of various viewpoints to improve the robustness and fairness of NLP
models. On the modeling side, various methods for dealing with annotation
disagreements have been proposed. For example, Hovy et al. (2013) and Passonneau
and Carpenter (2014) identify and weigh annotator reliability to better aggregate
contributions, whereas recent approaches in the perspectivist tradition
leverage inherent disagreements in subjective tasks to train models handling
diverse opinions (Davani et al., 2022; Deng et al., 2023).
== Call for Submissions ==
We invite both long (8 pages) and short (4 pages) papers. The limits refer to the
content; any number of additional pages for references is allowed. The
papers should follow the COLING 2025 formatting instructions.
Each submission must be anonymized, written in English, and contain a title and
abstract. We especially welcome papers that address the following themes, for a
single type of disagreement or annotation disagreements in general:
- New benchmarks for detecting or categorizing disagreements
- Models and modeling strategies for variations in annotation
- Evaluation schemes and metrics for phenomena without a single ground truth
- Phenomena that are not yet within reach with current NLP technology.
To encourage discussion and community building and to bootstrap potential
collaborations, we also invite non-archival submissions, in addition to shared
task papers and regular "archival" track papers. These can take two forms:
- Works in progress that are not yet mature enough for a full submission can
be submitted in the form of a title and abstract. Abstracts may be up to two
pages in length.
- Already published work, or work currently under submission elsewhere, can be
submitted in the form of an abstract and a copy of the submission/publication.
These works will be reviewed for topical fit and accepted submissions will be
presented as posters. Depending on the final workshop program, selected works
may be presented in panels. We plan for these to be an opportunity for
researchers to present and discuss their work with the relevant community.
Please submit your papers here: https://softconf.com/coling2025/CM-ND-NLP25/
== Important Dates ==
November 18, 2024: Due date for workshop and shared task papers [1]
December 1-3, 2024: Author response period
December 5, 2024: Notification of acceptance
December 13, 2024: Camera-ready submission deadline
January 19/20, 2025: Workshop date
All deadlines are 11:59pm UTC-12 ("anywhere on Earth").
[1] If you plan to submit a paper but require a deadline extension, please email
us at michael.roth(a)utn.de and dominik.schlechtweg(a)ims.uni-stuttgart.de
== Organizers ==
Michael Roth, University of Technology Nuremberg
Dominik Schlechtweg, University of Stuttgart
== Program Committee ==
David Alfter, University of Gothenburg
Valerio Basile, University of Turin
Felipe Bravo, University of Chile
Jing Chen, Hong Kong Polytechnic University
Naihao Deng, University of Michigan
Aida Mostafazadeh Davani, Google Research
Diego Frassinelli, University of Konstanz / LMU Munich
Haim Dubossarsky, Queen Mary University
Simon Hengchen, iguanodon.ai & Université de Genève
Sandra Kübler, Indiana University
Andrei Kutuzov, University of Oslo
Elisa Leonardelli, Fondazione Bruno Kessler
Marie-Catherine de Marneffe, UCLouvain
Maja Pavlovic, Queen Mary University
Siyao Peng, LMU Munich
Pauline Sander, University of Stuttgart
Pia Sommerauer, Vrije Universiteit Amsterdam
Nina Tahmasebi, University of Gothenburg
Alexandra Uma
Frank D. Zamora-Reina, University of Chile
Wei Zhao, University of Aberdeen
---
Prof. Michael Roth [he/him]
Natural Language Understanding Lab
University of Technology Nuremberg
Technische Universität Nürnberg
Deadline extended: 28 November 2024
Keynote Speaker: Ilan Pappe
Panel Discussion: Digital Archives and Cultural Heritage in the LLMs Era
Nakba-NLP 2025
International Workshop on Nakba Narratives as Language Resources
Part of the COLING 2025 Conference (virtual)
January 19, 2025
https://sina.birzeit.edu/nakba-nlp [1]
Enriching the Palestinian narrative and the Nakba with language processing and artificial intelligence techniques
(corpora, images, video, news, discourse, bias, social networks, language
models, classification, events, ...)
We invite submissions for Nakba-NLP 2025, a workshop dedicated to
exploring and preserving Nakba narratives through the application of
artificial intelligence, natural language processing, and corpus
linguistics. We seek contributions on the following topics:
◈ Digitization of oral and written narratives
◈ Creation and labeling of language corpora and datasets
◈ Digital archives, metadata, and semantic/content mark-up
◈ Annotation tools and annotation guidelines
◈ Document classification, topic modeling, and information retrieval
◈ Named entity recognition for identifying people, places,
organizations, and events
◈ Entity linking and relationship extraction
◈ Event detection and event argument extraction
◈ Knowledge Graphs and Linked Data
◈ Vocabularies, dictionaries, and ontologies
◈ Data visualization
◈ Knowledge representation
◈ Machine translation, summarisation, and paraphrasing
◈ Natural Language Generation
◈ Large Language Models
◈ Sentiment analysis and emotional content extraction
◈ Discourse analysis (e.g., bias, offensive language, and
misinformation) related to Nakba narratives
◈ Voice & dialogue-based systems; ASR
◈ Palestinian dialects (written and spoken)
Suggested Datasets: a list of datasets can be found here
https://t.ly/00Ul6 [2]
Important Dates:
=====================
All deadlines are 11:59 pm UTC-12 (anywhere on Earth).
- Submission Deadline: 28 November 2024
- Notifications of Acceptance: 5 December 2024
- Camera Ready Deadline: 13 December 2024 (cannot be changed)
Organizing Committee:
=====================
- Mustafa Jarrar, Birzeit University, Palestine
- Nizar Habash, New York University, UAE
- Mo El-Haj, Lancaster University, UK
- Zeina Jallad, Harvard Law School, USA
- Camille Mansour, Paris-Sorbonne University, France
- Diana Allan, McGill University, Canada
- Paul Rayson, Lancaster University, UK
Publicity Chairs
=====================
- Amal Haddad, University of Granada, Spain
- Sanad Malaysha, Birzeit University, Palestine
Contact: Nakba-NLP25_coling2025(a)softconf.com
--
Links:
------
[1]
https://urldefense.com/v3/__https://sina.birzeit.edu/nakba-nlp/__;!!D9dNQww…
[2]
https://urldefense.com/v3/__https://t.ly/00Ul6__;!!D9dNQwwGXtA!Qs4o1RM4JHxc…
In this newsletter:
Join LDC for membership year 2025
Spring 2025 data scholarship application deadline
New publications:
LORELEI Yoruba Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2024T10>
Samrómur Synthetic<https://catalog.ldc.upenn.edu/LDC2024S12>
________________________________
Join LDC for membership year 2025
It's time to renew your LDC membership for 2025. Current (2024) members who renew their membership before March 3, 2025, will receive a 10% discount. New or returning organizations will receive a 5% discount if they join the Consortium by March 3.
In addition to receiving new publications, current LDC members enjoy the benefit of licensing older data from our Catalog of 950+ holdings at reduced fees. Current-year for-profit members may use most data for commercial applications.
Plans for next year's publications are in progress. Among the expected releases are:
* Iraqi Arabic - English Lexical Database: a set of six interrelated tables (roots, lemmas, wordforms, multi-word expressions, English definitions, example phrases) presenting each Iraqi Arabic word in Arabic script and IPA format, a result of LDC's collaboration with Georgetown University Press to enhance and update three dialectal Arabic dictionaries
* AIDA topic source data and annotations: multimodal source data and annotations in multiple languages (Russian, English, Spanish) for information and entity extraction
* 2015 NIST Language Recognition Evaluation Test Set: 164,000+ segments of conversational telephone speech and broadcast narrow band speech in six linguistic varieties (Arabic, Spanish, English, Chinese, Slavic, French) representing 20 languages, used in NIST's 2015 language recognition evaluation
* BOLT CALLFRIEND CALLHOME CTS audio, transcripts and translations: previously unpublished Chinese and Egyptian Arabic telephone conversations from the CALLFRIEND and CALLHOME collections, with transcripts and translations developed by LDC for the DARPA BOLT program
* Chinese Sentence Pattern Structure Treebank: 5,000+ sentences from ancient and modern Chinese texts with syntactic annotation based on sentence constituent analysis, developed by Beijing Normal University and Peking University
* IARPA MATERIAL language packs: conversational telephone speech, transcripts, English translations, annotations, and queries in multiple languages (e.g., Georgian, Kazakh, Lithuanian)
* LORELEI: representative and incident language packs containing monolingual text, bi-text, translations, annotations, supplemental resources, and related tools in various languages (e.g., Hungarian, Hindi, Amharic, Somali)
For full descriptions of all LDC data sets, browse our Catalog<https://catalog.ldc.upenn.edu/>. Visit Join LDC<https://www.ldc.upenn.edu/members/join-ldc> for details on membership, user accounts and payment.
Spring 2025 data scholarship application deadline
Applications are now being accepted through January 15, 2025, for the Spring 2025 LDC data scholarship program which provides university students with no-cost access to LDC data. Consult the LDC Data Scholarships<https://www.ldc.upenn.edu/language-resources/data/data-scholarships> page for more information about program rules and submission requirements.
________________________________
New publications:
LORELEI Yoruba Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2024T10> was developed by LDC and is comprised of approximately 7.2 million words of Yoruba monolingual text, 127,000 Yoruba words translated from English data, and 810,000 words of Yoruba-English parallel text. Approximately 77,000 words were annotated for named entities, over 25,000 words were annotated for full entity (including nominals and pronouns) and simple semantic annotation, and around 10,000 words were annotated for noun phrase chunking. Data was collected from discussion forum, news, reference, social network, and weblogs.
The LORELEI (Low Resource Languages for Emergent Incidents) program was concerned with building human language technology for low resource languages in the context of emergent situations. Representative languages were selected to provide broad typological coverage.
The knowledge base for entity linking annotation is available separately as LORELEI Entity Detection and Linking Knowledge Base (LDC2020T10)<https://catalog.ldc.upenn.edu/LDC2020T10>.
2024 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
Samrómur Synthetic<https://catalog.ldc.upenn.edu/LDC2024S12> was developed by the Language and Voice Lab, Reykjavik University<https://lvl.ru.is/> and contains 72 hours of Icelandic synthetic speech, transcripts and metadata. Source sentences were extracted from the Samrómur platform<https://samromur.is>, comprised of texts and transcripts covering various genres. Text was processed through a text-to-speech system developed by Reykjavik University's Language and Voice Lab to generate speech files. Synthesized speech was created with 44 voices (22 male, 22 female) at four different speed rates for a total of 220 speakers and 62,700 utterances (with 285 sentences/speaker).
2024 members can access this corpus through their LDC accounts provided they have submitted a completed copy of the special license agreement. Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
LaTeCH-CLfL 2025:
The 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
to be held on May 3rd or 4th, 2025 in conjunction with NAACL 2025 <https://2025.naacl.org/> in Albuquerque, NM.
https://sighum.wordpress.com/latech-clfl-2025/
First Call for Papers (with apologies for cross-posting)
Organisers: Diego Alves, Yuri Bizzoni, Stefania Degaetano-Ortlieb, Anna Kazantseva, Janis Pagel, Stan Szpakowicz
LaTeCH-CLfL 2025 is the ninth in a series of meetings for NLP researchers who work with data from the broadly understood arts, humanities and social sciences, and for specialists in those disciplines who apply NLP techniques in their work. The workshop continues a long tradition of annual meetings. The SIGHUM Workshops on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH) ran ten times in 2007-2016. The five Workshops on Computational Linguistics for Literature (CLfL) took place in 2012-2016. The first eight joint workshops (LaTeCH-CLfL) were held in 2017-2024.
Topics and content
In the Humanities, Social Sciences, Cultural Heritage and literary communities, there is increasing interest in, and demand for, NLP methods for semantic and structural annotation, intelligent linking, discovery, querying, cleaning and visualization of both primary and secondary data. This is even true of primarily non-textual collections, given that text is also the pervasive medium for metadata. Such applications pose new challenges for NLP research: noisy, non-standard textual or multi-modal input, historical languages, vague research concepts, multilingual parts within one document, and so on. Digital resources often have insufficient coverage; resource-intensive methods require (semi-) automatic processing tools and domain adaptation, or intense manual effort (e.g., annotation).
Literary texts bring their own problems, because navigating this form of creative expression requires more than the typical information-seeking tools. Examples of advanced tasks include the study of literature of a certain period, author or sub-genre, recognition of certain literary devices, or quantitative analysis of poetry.
Topics of interest include, but are not limited to, the following:
• adaptation of NLP tools to Cultural Heritage, Social Sciences, Humanities and literature;
• automatic error detection and cleaning of textual data;
• complex annotation schemas, tools and interfaces;
• creation (fully- or semi-automatic) of semantic resources;
• creation and analysis of social networks of literary characters;
• discourse and narrative analysis/modelling, notably in literature;
• emotion analysis for the humanities and for literature;
• generation of literary narrative, dialogue or poetry;
• identification and analysis of literary genres;
• interpretability of large language model output for DH-related tasks (explainable AI);
• linking and retrieving information from different sources, media, and domains;
• low-resource and historical language processing;
• modelling dialogue literary style for generation;
• modelling of information and knowledge in the Humanities, Social Sciences, and Cultural Heritage;
• profiling and authorship attribution;
• search for scientific and/or scholarly literature;
• work with linguistic variation and non-standard or historical use of language.
Information for authors
We invite papers on original, unpublished work in the topic areas of the workshop. In addition to long papers, we will consider short papers and system descriptions (demos). We also welcome position papers. Please find submission requirements on the website https://sighum.wordpress.com/latech-clfl-2025/.
Important dates (tentative)
Workshop paper due: January 30, 2025
Notification of acceptance: March 1, 2025
Camera-ready papers due: March 10, 2025
Workshop date: May 3rd or 4th, 2025
More on the organizers
Diego Alves, Language Science and Technology, Saarland University
Yuri Bizzoni, Center for Humanities Computing / School for Communication and Culture, Århus University
Stefania Degaetano-Ortlieb, Language Science and Technology, Saarland University
Anna Kazantseva, National Research Council Canada
Janis Pagel, Department of Digital Humanities, University of Cologne
Stan Szpakowicz, School of Electrical Engineering and Computer Science, University of Ottawa
Contact
latech-clfl(a)googlegroups.com <mailto:latech-clfl@googlegroups.com>
AbjadNLP 2025 [1]
The 1st Workshop on NLP for Languages Using Arabic Script
https://wp.lancs.ac.uk/abjad/cfp/
CALL FOR PAPERS: THE 1ST WORKSHOP ON NLP FOR LANGUAGES USING ARABIC
SCRIPT (ABJADNLP 2025)
Co-located with COLING 2025 Conference, Abu Dhabi, UAE (19-20 January
2025)
Submission URL [2]
AbjadNLP is dedicated to advancing innovation and gaining deeper
insights into Natural Language Processing (NLP) for languages that use
the Arabic script. Our primary focus is on Abjad and Ajami languages
that utilise the Arabic script or its variations. Traditionally
associated with Semitic languages, Abjad scripts represent consonants in
every syllable. In contrast, Ajami scripts denote the alphabetic use of
the Arabic script in various African contexts, representing non-Arabic
languages. We are interested in research on languages that fall under
the Abjad or Ajami categories that use the Arabic script or any
variations of it.
We invite contributions, discussions, and explorations that delve deep
into the unique linguistic structures, resources, challenges, and
untapped potential presented by Abjad and Ajami languages within the
realm of NLP and language resources. Our goal is to create synergies
among researchers by addressing the diverse phenomena and challenges
inherent in these rich linguistic traditions.
The workshop is proud to highlight our connections with the Masakhane
NLP community and collaborations with institutions worldwide, such as
COMSATS on Urdu, and the long-standing UCREL NLP Group at Lancaster
University, whose work encompasses over 20 languages worldwide,
including Abjad and Ajami languages.
Note: We chose the name Abjad for simplicity, but our focus includes
Abjad and other languages that have adopted the Arabic and Perso-Arabic
scripts, as well as Ajami languages. We acknowledge that Sorani Kurdish,
when written in Arabic script, follows an alphabet style rather than an
Abjad style.
TOPICS OF INTEREST:
* Core Technologies: morphological analysis, disambiguation,
tokenisation, POS tagging, named entity detection, chunking, parsing,
semantic role labelling, sentiment analysis, language modelling, etc.
* Applications: machine translation, speech recognition, speech
synthesis, optical character recognition, assistive technologies, social
media, etc.
* Resources and Tools: dictionaries, annotated data, corpora,
orthography descriptions, font technology, glyph rendering, text input
methodologies, spell-checking, speech-to-text solutions, BLARK
descriptions, open access corpora.
* Cultural and Sociolinguistic Considerations: text processing,
transliteration challenges and solutions, cultural contexts in NLP
applications.
SUBMISSION GUIDELINES:
We follow the COLING 2025 standards for submission format and
guidelines. Submissions should conform to the following types:
* Long papers: Up to eight (8) pages, presenting substantial,
original, completed, and unpublished work.
* Short papers: Up to four (4) pages, describing a small focused
contribution, negative results, system demonstrations, etc.
KEY DATES:
* 1st Call for Papers Announcement: 16 July 2024
* 2nd Call for Papers Announcement: 16 August 2024
* Paper Submission Deadline: 2 December 2024
* Notification of Paper Acceptance: 6 December 2024
* Camera-ready Paper Deadline: 13 December 2024
* Workshop Date: 19 or 20 January 2025
ORGANISING COMMITTEE:
General Chair: Mo El-Haj, Lancaster University
Programme Chairs:
* Hugh Paterson III, Collaborative Scholar
* Saad Ezzini, Lancaster University
* Ignatius Ezeani, Lancaster University
Review Committee:
* Mahum Hayat Khan, University of La Rioja
* Muhammad Sharjeel, COMSATS University Islamabad
Publication Chair: Sina Ahmadi, University of Zurich
Publicity Chairs:
* Cynthia Amol, Maseno University
* Amal Haddad Haddad, University of Granada
* Jaleh Delfani, University of Surrey
Advisory Committee:
* Ruslan Mitkov, Lancaster University
* Paul Rayson, Lancaster University
--
Amal Haddad Haddad (She/her)
Facultad de Traducción e Interpretación
Universidad de Granada |https://www.ugr.es/personal/amal-haddad-haddad
Lexicon Research Group |http://lexicon.ugr.es/haddad
Co-Convenor, BAAL SIG 'Humans, Machines,
Language'|https://r.jyu.fi/humala
Event Coordinator, BAAL SIG 'Language, Learning and Teaching'
Links:
------
[1] https://wp.lancs.ac.uk/abjad/
[2] https://softconf.com/coling2025/AbjadNLP25/
NAKBA-NLP 2025
The 1st International Workshop on Nakba Narratives as Language Resources
Part of the COLING-2025 [1] Conference
Abu Dhabi, UAE (Fully Virtual)
January 20, 2025
CALL FOR PAPERS
We invite submissions for Nakba-NLP 2025, a workshop dedicated to the
exploration and preservation of Nakba narratives through the application
of artificial intelligence, natural language processing, and corpus
linguistics.
We seek contributions on the following issues of interest:
* Digitisation of oral and written narratives
* Creation and labelling of language corpora and datasets
* Digital archives, metadata, and semantic/content mark-up
* Annotation tools and annotation guidelines
* Document classification, topic modelling, and information retrieval
* Named entity recognition for identifying people, places,
organisations, and events
* Entity linking and relationship extraction
* Event detection and event argument extraction
* Knowledge Graphs and Linked Data
* Vocabularies, dictionaries, and ontologies
* Data visualisation
* Knowledge representation
* Machine translation, summarisation, and paraphrasing
* Natural Language Generation
* Large Language Models
* Sentiment analysis and emotional content extraction
* Discourse analysis (e.g., bias, offensive language, and
misinformation) related to Nakba narratives
* Voice & dialogue-based systems; ASR
* Palestinian dialects (written and spoken)
Participants are invited to use the following archives: Institute for
Palestine Studies [2], The Palestinian Museum [3], Nakba-Archive [4],
POHA [5], Alhaq [6], ICHR [7], as well as Wikipedia and the Wikidata
Knowledge Graph.
SUBMISSION DETAILS
All submitted papers must clearly state and explain their relevance to
the topic of 'Nakba Narratives as Language Resources'. The organisers
reserve the right to reject any papers that incite hatred, refute
established facts, or undermine the suffering of individuals.
Submissions may be of two types:
* Long papers - up to eight (8) pages, presenting substantial,
original, completed, and unpublished work.
* Short papers - up to four (4) pages, describing a small focused
contribution, negative results, system demonstrations, etc.
The workshop supports the COLING anti-harassment policy [8].
COLING 2025 submission templates: see [9].
Submission URL: see [10].
IMPORTANT DATES
* Submission Deadline: 25 November 2024
* Notifications of Acceptance: 5 December 2024
* Camera Ready Deadline: 13 December 2024 (cannot be changed).
Links:
------
[1] https://coling2025.org/
[2] https://www.palestine-studies.org/
[3] https://palmuseum.org/en
[4] https://www.nakba-archive.org/
[5] https://libraries.aub.edu.lb/poha/
[6] https://www.alhaq.org/
[7] https://www.ichr.ps/en
[8] https://coling2022.org/policy
[9] https://coling2025.org/calls/main_conference_papers/
[10] https://softconf.com/coling2025/Nakba-NLP25/