---------- Call for Papers: Canadian AI 2025 ----------
---------- May 26-29, 2025, in Calgary, Alberta ----------
We are now inviting researchers to submit papers in all areas of Artificial Intelligence, either theoretical or applied, to the 38th Canadian Conference on Artificial Intelligence taking place in Calgary on May 26-29. We also welcome the submission of position papers, which present evidence-based arguments for a particular point of view without necessarily presenting a new system.
**Paper submissions are due by Monday, Feb 10, 2025 (11:59 p.m. AoE time zone).**
Conference proceedings will be published in PubPub open-access online format and submitted to be indexed/abstracted in leading indexing services such as DBLP, ACM, and Google Scholar.
---------- Submission details ----------
Canadian AI is accepting submissions of both long and short papers. Long papers must be no longer than 12 pages, and short papers must be no longer than 6 pages, including references. Submissions in both LaTeX and Word are accepted. More information and submission templates are available under Submission Details here:
https://www.caiac.ca/en/conferences/canadianai-2025/call-papers
**The portal for submission is now open and can be found here:**
https://cmt3.research.microsoft.com/CANADIANAI2025/
Papers submitted to the conference must not have already been published or accepted for publication, nor be under review by a journal or another conference (a preprint is acceptable if its title is different). Submissions will go through a double-blind review process by Program Committee members to assess originality, significance, technical merit, and clarity of presentation. Submissions must therefore be anonymized; papers that are not anonymized will be desk rejected without review.
---------- Topics of interest include: ----------
- Agent Systems
- AI Applications
- Automated Reasoning
- Case‐based Reasoning
- Cognitive Models
- Constraint Satisfaction
- Data Mining
- Deep Learning and Neural Models
- E‐Commerce
- Ethics in AI, AI for social good
- Evolutionary Computation
- Explainable AI
- Fair, Secure, Private, and Trusted AI
- Games
- Information Retrieval and Search
- Knowledge Management
- Knowledge Representation
- Large Language Models
- Machine Learning
- Multimedia Processing
- Natural Language Processing
- Planning
- Robotics
- Uncertainty
- User Modeling
- Web Mining and Applications
Authors of accepted long papers will be allotted time for an oral presentation during the conference. Accepted short papers will also be allotted time for a 5-minute oral presentation, followed by a poster session presentation. It is mandatory for at least one author of each accepted paper to attend the conference in person to present their work. Authors are expected to agree to this requirement before submitting their paper for review.
Furthermore, the corresponding author of each paper must complete and sign a copyright form on behalf of all authors associated with the paper. It is important that the corresponding author who signs the copyright form matches the corresponding author listed on the paper.
---------- Awards ----------
A Best Paper Award and a Best Student Paper Award will be given at the conference to the authors of the winning papers, as judged by the Best Paper Award Selection Committee. For the Best Student Paper Award, the first author must be a registered student at the time of submitting the paper.
---------- Important dates ----------
- Submission deadline: Monday, Feb 10, 2025 (11:59 p.m. AoE time zone)
- Author notification: Tuesday, April 1, 2025
- Camera-ready copy due: Tuesday, April 15, 2025 (11:59 p.m. AoE time zone)
- Conference dates: May 26-29, 2025
---------- Program Chairs ----------
Paula Branco
School of Electrical Engineering and Computer Science, University of Ottawa
pbranco(a)uottawa.ca
https://uniweb.uottawa.ca/view/profile/members/4218?lang=en
Amine Trabelsi
Département d'informatique, Université de Sherbrooke
Amine.Trabelsi(a)USherbrooke.ca
https://www.usherbrooke.ca/informatique/trabelsi
We look forward to your participation in Canadian AI 2025!
[Apologies for cross-posting]
The Laboratoire de Linguistique Formelle (LLF, http://www.llf.cnrs.fr/) is seeking to support applications in linguistics and language sciences to Research Associate positions at the French Centre National de la Recherche Scientifique (CNRS, http://cnrs.fr/).
CNRS Research Associate positions are full-time permanent positions intended for candidates in their early career. Applicants must hold a PhD by the application deadline. Knowledge of French is not required.
Although CNRS recruits researchers by way of a national competition, applicants are encouraged to select one or more research labs to which they would like to be assigned, and the support of the chosen lab is crucial for a successful application.
Located at Université Paris Cité (http://u-paris.fr/), the LLF has about 80 members, including 36 permanent faculty members, working on every subfield of linguistics. In recent years, it has extended its focus from formal and theoretical linguistics to domains such as psycholinguistics, sociolinguistics, experimental linguistics, computational linguistics, dialogue, typology, and sign language linguistics.
The LLF is interested in supporting a limited number of applicants who have an excellent research record and are willing to develop a project that fits the lab's areas of inquiry.
The official call for applications will be published in early December 2024, with an application deadline in early January 2025 (https://carrieres.cnrs.fr/en/external-competitions-for-researchers-m-f/). Prospective applicants who wish to be supported by the LLF are invited to contact the lab by December 13, sending a CV (including a publication list) and a short description of their research profile to direction.llf(a)listes.u-paris.fr. Decisions on whether support is granted will be taken by December 18.
Olivier Bonami
Professeur de linguistique, Université Paris Cité
Directeur du Laboratoire de Linguistique Formelle
UMR 7110 - Université de Paris & CNRS
Tel: +33 1 57 27 57 97
Bâtiment Olympe de Gouges
8 place Paul Ricoeur
75013 Paris
Bureau 520
============================================
Interspeech 2025
17 - 21 August, Rotterdam, The Netherlands
https://www.interspeech2025.org/
============================================
Call for Satellite Workshops
https://www.interspeech2025.org/call-for-workshops
============================================
Important Dates
===============
Proposals of workshops to ISCA Workshop portal (for ISCA endorsement):
8 January 2025
Proposals of workshops to IS2025 Satellite Workshop Committee (after
having gotten ISCA endorsement): 1 February 2025
Notification by IS2025 Satellite Workshop Committee:
15 February 2025
Submissions for satellite workshop proposals are invited for Interspeech 2025!
The Interspeech 2025 satellite workshops committee calls for proposals
for satellite workshops. The aim of these satellite workshops is to
stimulate discussion in research areas related to speech and language.
Advertising your workshop as an Interspeech 2025 satellite event
greatly increases the visibility of your workshop.
Please note that all Interspeech 2025 satellite workshops need to
obtain ISCA endorsement. This is a novel Interspeech requirement. ISCA
endorsement may entail financial and/or practical help in organizing
your workshop. For details, please see the guidelines for ISCA
endorsement here.
In order to obtain ISCA endorsement, proposals for Interspeech 2025
satellite workshops should first be submitted to the ISCA workshop
portal. This ISCA endorsement procedure takes about two weeks. Once
ISCA endorsement has been granted, proposals for Interspeech 2025
satellite workshops should be submitted to the Interspeech satellite
workshop committee at satelliteevents(a)interspeech2025.org.
For more information about organizing a satellite workshop, please
contact the satellite workshop chairs at
satelliteevents(a)interspeech2025.org.
Interspeech 2025 satellite workshop proposal criteria
We invite workshop proposals that meet the general criteria below:
• Workshops should take place around the same time as Interspeech
2025. Interspeech 2025 takes place from 17 to 21 August. On Sunday 17
August several tutorials are planned. Applicants who wish to organize
a satellite event are therefore discouraged from scheduling it on
Sunday 17 August, so as to avoid thematic overlap with any Interspeech
tutorial planned for that day.
• Workshops should take place within reasonable travel distance
from Rotterdam, the Netherlands.
In addition, proposals need to meet the ISCA criteria below:
• topic should be in the ISCA scope
• organizing committees as well as invited speakers have to be
international and diverse
• organizing committee has to (ideally) cover more than one university
• keynote speaker(s) have to be relevant to their domain
Roadmap
1. As of now: Applicants may contact the satellite workshop committee
(for questions, or if you would like to have an idea of the planned
satellite workshop proposals so far).
2. 8 January 2025: Submission deadline for applicants to submit
satellite workshop proposals to ISCA workshop portal in order to
obtain ISCA endorsement
3. 1 February 2025: Submission deadline for applicants to submit their
proposal to the satellite workshop committee after having gotten ISCA
endorsement (submissions after 1 February 2025 might be considered
with less priority). We as Interspeech 2025 satellite workshop
committee will only check whether multiple research teams intend to
organize satellite events around the same (or largely overlapping)
topics. Please send us a brief description (approx. 2 pages) of your
proposed workshop, including target audience, title, topic, location
and date, organizational team, and website. Please also inform us
about the status of your ISCA endorsement application.
4. 15 February 2025: Decision notification by the satellite workshop
committee. Thereafter, satellite workshop organizers may contact the
professional conference organizer (PCO) for additional advice
(pco(a)interspeech2025.org).
A position as Postdoctoral Research Fellow in Natural Language Processing is available within MediaFutures: Research Centre for Responsible Media Technology & Innovation at the Language Technology Group (LTG) at the University of Oslo (UiO), Norway.
The closing date is December 13th, 2024.
For more information about the position and the research group, please see the full announcement here:
https://www.jobbnorge.no/en/available-jobs/job/270966/postdoctoral-research…
Please do not hesitate to contact me for any further information.
Best regards,
Lilja
Note the paper submission deadline: 30 November, 2024
Workshop website: https://comparable.lisn.upsaclay.fr/bucc2025/
COLING website: https://coling2025.org/
Keynote speaker: Preslav Nakov, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi
**************************************************************
* Motivation
In the language engineering and linguistics communities, research in
comparable corpora has been motivated by two main reasons. In language
engineering, on the one hand, it is chiefly motivated by the need to
use comparable corpora as training data for statistical NLP
applications such as statistical and neural machine translation or
cross-lingual retrieval. In linguistics, on the other hand, comparable
corpora are of interest because they enable cross-language discoveries
and comparisons. It is generally accepted in both communities that
comparable corpora consist of documents that are comparable in content
and form in various degrees and dimensions across several
languages. Parallel corpora are on the one end of this spectrum, and
unrelated corpora are on the other.
In recent years, the use of comparable corpora for pre-training Large
Language Models (LLMs) has led to their impressive multilingual and
cross-lingual abilities, which are relevant to a range of applications,
including Information Retrieval, Machine Translation, Cross-lingual text
classification, etc. The linguistic definitions and observations related
to comparable corpora can improve methods for mining such corpora or
for improving the cross-lingual transfer of LLMs. It is therefore of great
interest to bring together builders and users of such corpora.
* Shared Task
This year we will run a shared task aimed at detecting translations of
terms via comparable corpora. Please see the website for details: https://comparable.limsi.fr/bucc2025/bucc2025-task.html
* Topics
We solicit contributions on all topics related to comparable (and parallel) corpora, including but not limited to the following:
Building Comparable Corpora:
- Automatic and semi-automatic methods
- Methods to mine parallel and non-parallel corpora from the web
- Tools and criteria to evaluate the comparability of corpora
- Parallel vs non-parallel corpora, monolingual corpora
- Rare and minority languages, across language families
- Multi-media/multi-modal comparable corpora
Applications of comparable corpora:
- Human translation
- Language learning
- Cross-language information retrieval & document categorization
- Bilingual and multilingual projections
- (Unsupervised) Machine translation
- Writing assistance
- Machine learning techniques using comparable corpora
Mining from Comparable Corpora:
- Cross-language distributional semantics, word embeddings and
pre-trained multilingual transformer models
- Extraction of parallel segments or paraphrases from comparable corpora
- Methods to derive parallel from non-parallel corpora (e.g. to provide
for low-resource languages in neural machine translation)
- Extraction of bilingual and multilingual translations of single words,
multi-word expressions, proper names, named entities, sentences, and
paraphrases from comparable corpora, etc.
- Induction of morphological, grammatical, and translation rules from
comparable corpora
- Induction of multilingual word classes from comparable corpora
Comparable Corpora in the Humanities:
- Comparing linguistic phenomena across languages in contrastive
linguistics
- Analyzing properties of translated language in translation studies
- Studying language change over time in diachronic linguistics
- Assigning texts to authors via authors' corpora in forensic
linguistics
- Comparing rhetorical features in discourse analysis
- Studying cultural differences in sociolinguistics
- Analyzing language universals in typological research
* Workshop Organizers
- Serge Sharoff (University of Leeds)
- Ayla Rigouts Terryn (Université de Montréal (UdeM), Mila)
- Pierre Zweigenbaum (Université Paris-Saclay, CNRS, LISN, Orsay)
- Reinhard Rapp (University of Mainz, Germany)
* Program Committee
- Ebrahim Ansari (Institute for Advanced Studies in Basic Sciences,
Iran)
- Eleftherios Avramidis (DFKI, Germany)
- Gabriel Bernier-Colborne (National Research Council, Canada)
- Thierry Etchegoyhen (Vicomtech, Spain)
- Alex Fraser (University of Munich, Germany)
- Natalia Grabar (University of Lille, France)
- Amal Haddad Haddad (Universidad de Granada, Spain)
- Amir Hazem (University of Tokyo, Japan)
- Kyo Kageura (University of Tokyo, Japan)
- Natalie Kübler (Université Paris Cité, France)
- Philippe Langlais (Université de Montréal, Canada)
- Yves Lepage (Waseda University, Japan)
- Shervin Malmasi (Amazon, USA)
- Michael Mohler (Language Computer Corporation, USA)
- Emmanuel Morin (Nantes Université, France)
- Dragos Stefan Munteanu (RWS, USA)
- Ted Pedersen (University of Minnesota, Duluth, USA)
- Nasredine Semmar (CEA LIST, Paris, France)
- Silvia Severini (Leonardo Labs, Italy)
- Pranaydeep Singh (University of Gent, Belgium)
- Richard Sproat (Google, USA)
- Marko Tadić (University of Zagreb, Croatia)
- François Yvon (Sorbonne Université, France)
ROMCIR 2025: The 5th International Workshop on Reducing Online Misinformation through Credible Information Retrieval
Co-located with ECIR 2025: The 47th European Conference on Information Retrieval
Lucca, Italy | April 10, 2025
Workshop website: https://romcir.disco.unimib.it
Submission link: https://easychair.org/conferences/?conf=romcir2025
____________________________________________________________________________________________________
GENERAL DESCRIPTION
The fifth edition of ROMCIR is concerned with giving users access to (topically) relevant and factually accurate information, in order to
mitigate the human-generated or AI-generated information disorder phenomenon across distinct domains.
By "information disorder" we mean all forms of communication pollution, from misinformation spread out of ignorance or automatically
generated from biased content, to the intentional sharing of false content (generated both manually and automatically).
In this context, all those approaches that can serve to assess the factual accuracy of information circulating online and
in social media in particular find their place. This topic is very broad, as it concerns different contents (e.g., Web pages,
news, reviews, medical information, online accounts, etc.), different Web and social media platforms (e.g., microblogging
platforms, social networking services, social question-answering systems, etc.), different purposes (e.g., identifying false
information, accessing information based on its truthfulness, retrieving truthful information, etc.), and different open issues
related in particular to AI (e.g., explainability of search results, assessment of the truthfulness of automatically generated
content, generative models to support IRSs, etc.).
****************************************************************************************************
THEMES
The themes of interest include, but are not limited to, the following:
* Access to and retrieval of truthful information
* Bot/spam/troll detection
* Computational fact-checking
* Credibility assessment of online documents
* Crowdsourcing for information truthfulness assessment
* Disinformation/misinformation/bias detection
* Evaluation strategies to assess information truthfulness
* Generative models and information truthfulness assessment
* Human-in-the-loop misinformation detection
* Information polarization in online communities, echo chambers
* Propaganda identification/analysis
* Retrieval of credible and truthful information
* Security, privacy, and information truthfulness
* Societal reaction to misinformation
* Stance detection
* Trust and reputation
Data-driven approaches, supported by publicly available datasets, are more than welcome.
****************************************************************************************************
CONTRIBUTIONS
The Workshop solicits two types of contributions that are relevant to the Workshop and suitable for generating
discussion:
* Original, unpublished contributions (pre-prints submitted to ArXiv are eligible) that will be included in an open-access
post-proceedings volume of CEUR Workshop Proceedings (http://ceur-ws.org/), indexed by both Scopus and DBLP.
* Already published or preliminary work that will not be included in the post-proceedings volume.
All submissions will undergo SINGLE-BLIND peer review by the Program Committee.
Submissions are to be made electronically through EasyChair at:
https://easychair.org/conferences/?conf=romcir2025
****************************************************************************************************
SUBMISSION INSTRUCTIONS
Submissions must be:
* Regular papers: between 10 and 14 pages long
* Short papers: between 5 and 9 pages long
We recommend that authors use the new CEUR-ART style for writing papers to be published:
* An Overleaf page for LaTeX users is available at:
https://www.overleaf.com/project/671e05abc213fddad9644a94
* An offline version with the style files including DOCX template files is available at:
http://ceur-ws.org/Vol-XXX/CEURART.zip
* The paper must contain, as the name of the conference: ROMCIR 2025: The 5th Workshop
on Reducing Online Misinformation through Credible Information Retrieval (held as part of ECIR
2025: The 47th European Conference on Information Retrieval), April 10, 2025, Lucca, Italy
* The title of the paper should follow the regular capitalization of English (e.g., Example of a Title of a Paper Correctly Capitalized)
* Please choose the one-column template
* According to CEUR-WS policy, the papers will be published under a CC BY 4.0 license:
https://creativecommons.org/licenses/by/4.0/deed.en
If the paper is accepted, authors will be asked to sign by hand (in pen) an author agreement with CEUR:
* In case you do not employ Third-Party Material (TPM) in your draft, sign the document at:
https://ceur-ws.org/ceur-author-agreement-ccby-ntp.pdf?ver=2024-06-04
* If you do use TPM, the agreement can be found at:
https://ceur-ws.org/ceur-author-agreement-ccby-tp.pdf?ver=2024-06-04
For further information: https://ceur-ws.org/HOWTOSUBMIT.html
****************************************************************************************************
IMPORTANT DATES (AoE)
* Abstract submission: January 05, 2025
* Paper submission: January 12, 2025
* Decision notification: February 16, 2025
* Workshop day: April 10, 2025
****************************************************************************************************
ORGANIZERS
* Udo Kruschwitz (https://www.linkedin.com/in/udo-kruschwitz-57106b5/), University of Regensburg, Regensburg, Germany
* Marinella Petrocchi (https://www.iit.cnr.it/en/marinella.petrocchi/), IIT-CNR, Pisa, Italy
* Marco Viviani (https://ikr3.disco.unimib.it/people/marco-viviani/), University of Milano-Bicocca, Milan, Italy
********************************************************************************
CoMeDiNLP: Context and Meaning--Navigating Disagreements in NLP Annotations
https://unimplicit.github.io/
Workshop held in conjunction with COLING 2025
January 19/20, 2025
********************************************************************************
Disagreements among annotators pose a significant challenge in Natural Language
Processing, impacting the quality and reliability of datasets and consequently
the performance of NLP models. This workshop aims to explore the complexities of
annotation disagreements, their causes, and strategies towards their effective
resolution, with a focus on meaning in context.
The quality and reliability of annotated data is crucial for the development of
robust NLP models. However, managing disagreements among annotators poses
significant challenges to researchers and practitioners. Such disagreements can
stem from various factors, including subjective interpretations, cultural biases
and ambiguous guidelines. Early research has highlighted the impact of annotator
disagreements on data quality and model performance (e.g. Artstein and Poesio,
2008; Pustejovsky and Stubbs, 2012; Plank et al., 2014).
More recent work on perspectivism in NLP, such as that by Basile et al. (2021),
highlights the importance of embracing multiple perspectives in annotation tasks
to better capture the diversity of human language. This approach argues for the
inclusion of various viewpoints to improve the robustness and fairness of NLP
models. On the modeling side, various methods for dealing with annotation
disagreements have been proposed. For example, Hovy et al. (2013) and Passonneau
and Carpenter (2014) identify and weigh annotator reliability to better aggregate
contributions, whereas recent approaches following the perspectivism approach
leverage inherent disagreements in subjective tasks to train models handling
diverse opinions (Davani et al., 2022; Deng et al., 2023).
== Call for Submissions ==
We invite both long (8 pages) and short (4 pages) papers. The limits refer to the
content; any number of additional pages for references is allowed. The
papers should follow the COLING 2025 formatting instructions.
Each submission must be anonymized, written in English, and contain a title and
abstract. We especially welcome papers that address the following themes, for a
single type of disagreement or annotation disagreements in general:
- New benchmarks for detecting or categorizing disagreements
- Models and modeling strategies for variations in annotation
- Evaluation schemes and metrics for phenomena without a single ground truth
- Phenomena that are not yet within reach with current NLP technology.
To encourage discussion and community building and to bootstrap potential
collaborations, we also solicit non-archival submissions, in addition to shared
task papers and regular "archival" track papers. These can take two forms:
- Works in progress that are not yet mature enough for a full submission can
be submitted in the form of a title and abstract. Abstracts may be up to two
pages in length.
- Already published work, or work currently under submission elsewhere, can be
submitted in the form of an abstract and a copy of the submission/publication.
These works will be reviewed for topical fit and accepted submissions will be
presented as posters. Depending on the final workshop program, selected works
may be presented in panels. We plan for these to be an opportunity for
researchers to present and discuss their work with the relevant community.
Please submit your papers here: https://softconf.com/coling2025/CM-ND-NLP25/
== Important Dates ==
November 18, 2024: Due date for workshop and shared task papers [1]
December 1-3, 2024: Author response period
December 5, 2024: Notification of acceptance
December 13, 2024: Camera-ready submission deadline
January 19/20, 2025: Workshop date
All deadlines are 11:59pm UTC-12 ("anywhere on Earth").
[1] If you plan to submit a paper but require a deadline extension, please send
us an email to michael.roth(a)utn.de and dominik.schlechtweg(a)ims.uni-stuttgart.de
== Organizers ==
Michael Roth, University of Technology Nuremberg
Dominik Schlechtweg, University of Stuttgart
== Program Committee ==
David Alfter, University of Gothenburg
Valerio Basile, University of Turin
Felipe Bravo, University of Chile
Jing Chen, Hong Kong Polytechnic University
Naihao Deng, University of Michigan
Aida Mostafazadeh Davani, Google Research
Diego Frassinelli, University of Konstanz / LMU Munich
Haim Dubossarsky, Queen Mary University
Simon Hengchen, iguanodon.ai & Université de Genève
Sandra Kübler, Indiana University
Andrei Kutuzov, University of Oslo
Elisa Leonardelli, Fondazione Bruno Kessler
Marie-Catherine de Marneffe, UCLouvain
Maja Pavlovic, Queen Mary University
Siyao Peng, LMU Munich
Pauline Sander, University of Stuttgart
Pia Sommerauer, Vrije Universiteit Amsterdam
Nina Tahmasebi, University of Gothenburg
Alexandra Uma
Frank D. Zamora-Reina, University of Chile
Wei Zhao, University of Aberdeen
---
Prof. Michael Roth [he/him]
Natural Language Understanding Lab
University of Technology Nuremberg
Technische Universität Nürnberg
Deadline extended: 28 November
Keynote Speaker: Ilan Pappe
Panel Discussion: Digital Archives and Cultural Heritage in the LLMs Era
Nakba-NLP 2025
International Workshop on Nakba Narratives as Language Resources
Part of the COLING 2025 Conference (virtual)
January 19, 2025
https://sina.birzeit.edu/nakba-nlp
Enriching the Palestinian narrative and the Nakba with natural language
processing and artificial intelligence techniques (corpora, images, video,
news, discourse, bias, social media networks, language models,
classification, events, ...)
We invite submissions for Nakba-NLP 2025, a workshop dedicated to
exploring and preserving Nakba narratives through the application of
artificial intelligence, natural language processing, and corpus
linguistics. We seek contributions on the following topics:
◈ Digitization of oral and written narratives
◈ Creation and labeling of language corpora and datasets
◈ Digital archives, metadata, and semantic/content mark-up
◈ Annotation tools and annotation guidelines
◈ Document classification, topic modeling, and information retrieval
◈ Named entity recognition for identifying people, places,
organizations, and events
◈ Entity linking and relationship extraction
◈ Event detection and event argument extraction
◈ Knowledge Graphs and Linked Data
◈ Vocabularies, dictionaries, and ontologies
◈ Data visualization
◈ Knowledge representation
◈ Machine translation, summarisation, and paraphrasing
◈ Natural Language Generation
◈ Large Language Models
◈ Sentiment analysis and emotional content extraction
◈ Discourse analysis (e.g., bias, offensive language, and
misinformation) related to Nakba narratives
◈ Voice & dialogue-based systems; ASR
◈ Palestinian dialects (written and spoken)
Suggested Datasets: a list of datasets can be found here
https://t.ly/00Ul6
Important Dates:
=====================
All deadlines are 11:59 pm UTC-12 (anywhere on Earth).
- Submission Deadline: 28 November 2024
- Notifications of Acceptance: 5 December 2024
- Camera Ready Deadline: 13 December 2024 (cannot be changed)
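
For authors unsure how an "anywhere on Earth" deadline maps onto their own clock, the following is a minimal sketch in Python. It assumes only what is stated above, namely that a deadline falls at 11:59 pm in the fixed UTC-12 offset; the helper name `aoe_deadline_to_utc` is hypothetical and used purely for illustration.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" (AoE) is the fixed offset UTC-12 (no daylight saving).
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_to_utc(year: int, month: int, day: int) -> datetime:
    """Return the UTC instant of 11:59 pm AoE on the given calendar date."""
    local = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local.astimezone(timezone.utc)


# Submission deadline from the list above: 28 November 2024.
submission_utc = aoe_deadline_to_utc(2024, 11, 28)
print(submission_utc.isoformat())  # 2024-11-29T11:59:00+00:00
```

Calling `submission_utc.astimezone()` with no argument converts the result to the local time zone of the machine running the script.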
Organizing Committee:
=====================
- Mustafa Jarrar, Birzeit University, Palestine
- Nizar Habash, New York University Abu Dhabi, UAE
- Mo El-Haj, Lancaster University, UK
- Zeina Jallad, Harvard Law School, USA
- Camille Mansour, Paris-Sorbonne University, France
- Diana Allan, McGill University, Canada
- Paul Rayson, Lancaster University, UK
Publicity Chairs
=====================
- Amal Haddad, University of Granada, Spain
- Sanad Malaysha, Birzeit University, Palestine
Contact: Nakba-NLP25_coling2025(a)softconf.com
In this newsletter:
Join LDC for membership year 2025
Spring 2025 data scholarship application deadline
New publications:
LORELEI Yoruba Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2024T10>
Samrómur Synthetic<https://catalog.ldc.upenn.edu/LDC2024S12>
________________________________
Join LDC for membership year 2025
It's time to renew your LDC membership for 2025. Current (2024) members who renew their membership before March 3, 2025, will receive a 10% discount. New or returning organizations will receive a 5% discount if they join the Consortium by March 3.
In addition to receiving new publications, current LDC members enjoy the benefit of licensing older data from our Catalog of 950+ holdings at reduced fees. Current-year for-profit members may use most data for commercial applications.
Plans for next year's publications are in progress. Among the expected releases are:
* Iraqi Arabic - English Lexical Database: a set of six interrelated tables (roots, lemmas, wordforms, multi-word expressions, English definitions, example phrases) presenting each Iraqi Arabic word in Arabic script and IPA format, a result of LDC's collaboration with Georgetown University Press to enhance and update three dialectal Arabic dictionaries
* AIDA topic source data and annotations: multimodal source data and annotations in multiple languages (Russian, English, Spanish) for information and entity extraction
* 2015 NIST Language Recognition Evaluation Test Set: 164,000+ segments of conversational telephone speech and broadcast narrow band speech in six linguistic varieties (Arabic, Spanish, English, Chinese, Slavic, French) representing 20 languages, used in NIST's 2015 language recognition evaluation
* BOLT CALLFRIEND CALLHOME CTS audio, transcripts and translations: previously unpublished Chinese and Egyptian Arabic telephone conversations from the CALLFRIEND and CALLHOME collections, with transcripts and translations developed by LDC for the DARPA BOLT program
* Chinese Sentence Pattern Structure Treebank: 5,000+ sentences from ancient and modern Chinese texts with syntactic annotation based on sentence constituent analysis, developed by Beijing Normal University and Peking University
* IARPA MATERIAL language packs: conversational telephone speech, transcripts, English translations, annotations, and queries in multiple languages (e.g., Georgian, Kazakh, Lithuanian)
* LORELEI: representative and incident language packs containing monolingual text, bi-text, translations, annotations, supplemental resources, and related tools in various languages (e.g., Hungarian, Hindi, Amharic, Somali)
For full descriptions of all LDC data sets, browse our Catalog<https://catalog.ldc.upenn.edu/>. Visit Join LDC<https://www.ldc.upenn.edu/members/join-ldc> for details on membership, user accounts and payment.
Spring 2025 data scholarship application deadline
Applications are now being accepted through January 15, 2025, for the Spring 2025 LDC data scholarship program which provides university students with no-cost access to LDC data. Consult the LDC Data Scholarships<https://www.ldc.upenn.edu/language-resources/data/data-scholarships> page for more information about program rules and submission requirements.
________________________________
New publications:
LORELEI Yoruba Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2024T10> was developed by LDC and comprises approximately 7.2 million words of Yoruba monolingual text, 127,000 Yoruba words translated from English data, and 810,000 words of Yoruba-English parallel text. Approximately 77,000 words were annotated for named entities, over 25,000 words were annotated for full entity (including nominals and pronouns) and simple semantic annotation, and around 10,000 words were annotated for noun phrase chunking. Data was collected from discussion forums, news, reference material, social networks, and weblogs.
The LORELEI (Low Resource Languages for Emergent Incidents) program was concerned with building human language technology for low resource languages in the context of emergent situations. Representative languages were selected to provide broad typological coverage.
The knowledge base for entity linking annotation is available separately as LORELEI Entity Detection and Linking Knowledge Base (LDC2020T10)<https://catalog.ldc.upenn.edu/LDC2020T10>.
2024 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
Samrómur Synthetic<https://catalog.ldc.upenn.edu/LDC2024S12> was developed by the Language and Voice Lab, Reykjavik University<https://lvl.ru.is/> and contains 72 hours of Icelandic synthetic speech, transcripts and metadata. Source sentences were extracted from the Samrómur platform<https://samromur.is>, which comprises texts and transcripts covering various genres. Text was processed through a text-to-speech system developed by Reykjavik University's Language and Voice Lab to generate speech files. Synthesized speech was created with 44 voices (22 male, 22 female) at five different speed rates for a total of 220 speakers and 62,700 utterances (with 285 sentences/speaker).
2024 members can access this corpus through their LDC accounts provided they have submitted a completed copy of the special license agreement. Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104