First call for participation in the SemEval 2025 shared task: Multilingual Characterization and Extraction of Narratives from Online News.
This task challenges participants to analyze news articles and automatically identify narratives, classify them, and determine the roles played by relevant entities. The task is multilingual, covering five languages: Bulgarian, English, Hindi, Portuguese, and Russian.
Participants can choose to participate in one or more of the following subtasks:
- Subtask 1: Entity Framing – Classify the roles of named entities within news articles.
- Subtask 2: Narrative Classification – Classify each article based on all the (sub)narratives given a specific domain.
- Subtask 3: Narrative Extraction – Generate short textual explanations for dominant narratives in the articles.
The task covers news articles from two domains, namely, Ukraine-Russia War and Climate Change.
Important Dates:
- 31 January 2025: Test submission deadline
- 28 February 2025: System paper submission deadline
- 31 March 2025: Notification to authors
- 21 April 2025: Camera-ready papers due
- Summer 2025: SemEval workshop
For more details, visit https://propaganda.math.unipd.it/semeval2025task10/ or contact us at semeval2025narratives-task-participants(a)googlegroups.com.
We look forward to your participation and to advancing the state of multilingual news narrative analysis together!
--
Ion Androutsopoulos (http://www.aueb.gr/users/ion/)
Professor of AI, Head of Department, Information Processing Lab Director, NLP Group Co-director
Department of Informatics, Athens University of Economics and Business
and Adjunct Researcher, "Archimedes" Research Unit, Research Center "Athena"
The Computer Science department at Johns Hopkins University is hiring
tenure-track faculty. Our search includes two tracks: 1) Data Science and
AI, and 2) All other areas of Computer Science.
We offer special support for *spousal/partner placement*. Additionally,
starting October 1, *Early Action* hiring will consider candidates for fall
semester interviews with potentially early offers that have typical spring
deadlines. We encourage candidates to apply early to take advantage of
flexible scheduling and potentially receive an early offer before they
proceed to spring interviews. All applications submitted by December 1,
2024 will receive full consideration.
Our search supports the large-scale expansion of the Whiting School of
Engineering, which will add 150 new tenure-track professors at all ranks,
including 30 Bloomberg Distinguished Professorships and 80 positions that
will be part of the university’s new Data Science and AI Institute. This
expansion includes a new building and extensive computational resources
that will establish Johns Hopkins as one of the largest and leading
engineering schools with a top AI research program. The expansion will grow
JHU CS to become one of the largest computer science departments at a U.S.
private university.
Feel free to forward this to interested parties.
Job Ad: https://www.cs.jhu.edu/about/employment-opportunities/
Applications: http://apply.interfolio.com/153420
Data Science AI: https://engineering.jhu.edu/Datascience-AI/
Computer Science DSAI: https://engineering.jhu.edu/Datascience-AI/CS/
Best,
Mark Dredze
==============
John C Malone Professor
Associate Head of Research and Strategic Initiatives, Computer Science
Johns Hopkins University
Call for papers: 1st Workshop on Computational Humor (CHum 2025)
================================================================
The 1st Workshop on Computational Humor (CHum 2025) will take place
virtually on January 19, 2025 as part of the 31st International
Conference on Computational Linguistics (COLING 2025).
Scope and topics
----------------
CHum 2025 aims to foster further work on modeling the processes of humor
with current methods in computational linguistics and natural language
processing, against the theoretical backdrop of humor research and with
reference to relevant corpora of textual, visual, and multimodal
materials. A principal goal of the workshop is to unite researchers who
can together probe the limits of various meaning representations --
symbolic, neural, and hybrid -- for humor processing.
We welcome contributions on any topic relevant to the computational
processing of humor, including but not limited to the following:
* LLMs, knowledge representation
* Resources and evaluation
* Human-computer interaction
* Computer-mediated communication
* Assisted content creation
* Machine and computer-assisted translation
* Digital humanities applications
* Formal modeling of humor
* Proof-of-concept humor detection and classification
Particularly encouraged are submissions describing inter- or
multi-disciplinary work, whether completed or in progress, and position
papers that critically discuss the past, present, and future of
computational humor systems.
Submission instructions
-----------------------
Long and short papers should be formatted according to the same
guidelines as the main COLING 2025 conference papers
<https://coling2025.org/calls/submission_guidlines/> and submitted
through START: <https://softconf.com/coling2025/CompHum25/>
Important dates
---------------
All deadlines are at 23:59 UTC-12:00 ("anywhere on Earth").
* Initial submission: November 15, 2024
* Notification of acceptance: December 2, 2024
* Camera-ready submission: December 13, 2024
* Workshop: January 19, 2025
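
For authors converting the "anywhere on Earth" deadlines above to their own time zone, the following is a minimal sketch, assuming only that AoE means the fixed offset UTC-12:00 as stated. The helper name aoe_deadline_in_utc is hypothetical and not part of any workshop tooling.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" (AoE) treated as the fixed offset UTC-12:00.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_in_utc(year: int, month: int, day: int) -> datetime:
    """Return the UTC instant corresponding to 23:59 AoE on the given date.

    Hypothetical helper for illustration only.
    """
    local = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local.astimezone(timezone.utc)


# Initial submission deadline listed above: November 15, 2024.
print(aoe_deadline_in_utc(2024, 11, 15).isoformat())
# 2024-11-16T11:59:00+00:00
```

In other words, a 23:59 AoE deadline falls at 11:59 UTC on the following calendar day.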
Organizers
----------
* Christian F. Hempelmann, Texas A&M University-Commerce
* Julia Rayz, Purdue University
* Tiansi Dong, Fraunhofer IAIS
* Tristan Miller, University of Manitoba
Further information
-------------------
* Website: <https://chum2025.github.io/>
* E-mail: chum(a)groups.io
--
Dr. Tristan Miller, Assistant Professor
Department of Computer Science, University of Manitoba
https://logological.org/ | Tel. +1 204 474 6792
The HITS Independent Postdoc Program offers a great opportunity for highly talented young scientists wanting to transition from PhD student to junior group leader. It supports young scientists in exploring their own ideas and testing new hypotheses. High-risk, high-gain projects are encouraged. Selected postdocs will collaborate with group leaders at HITS while developing and pursuing their independent research projects.
What we offer:
The Fellowship is awarded for 2 years, with an option for a 1-year extension after positive evaluation. We offer a vibrant research community and a highly interdisciplinary and international working environment, with close links to Heidelberg University and the Karlsruhe Institute of Technology (KIT). In addition, successful candidates benefit from outstanding computing resources and various courses offered at HITS. A competitive salary, relocation and childcare allowances are provided.
Find the current job opening here: https://www.h-its.org/research/independent-postdoc-program/
2024 Deadline: November 3rd, 2024
Please don't hesitate to get in contact with Michael Strube (michael.strube ät h-its.org) if you have any questions.
--
Michael Strube
NLP Group
HITS gGmbH
Schloss-Wolfsbrunnenweg 35
69118 Heidelberg, Germany
https://www.h-its.org/nlp
The 40th ACM/SIGAPP Symposium on Applied Computing
ACM SAC 2025
March 31 - April 4, 2025 - Catania, Italy
Knowledge and Natural Language Processing Track
*****************************************
Important Dates
Author deadline for submissions: October 4, 2024 (extended from September 20, 2024)
Author notification of acceptance: October 30, 2024
Author camera ready and registration due: November 29, 2024
*****************************************
Aim
The aim of the Knowledge and Natural Language Processing (KNLP) track at ACM SAC is to investigate techniques and applications of knowledge engineering and natural language processing, two extremely interdisciplinary and lively research areas at the core of Artificial Intelligence.
In particular, the track welcomes contributions combining and complementing methods and approaches from both areas.
Topics of interest include, but are not limited to:
- Natural Language Processing
    NLP tasks for Knowledge Extraction
    NLP for Ontology Population and Learning
    Sentiment Analysis and Opinion Mining for Knowledge Applications
    Interplay between Language and Ontologies
    NLP for Explainable Knowledge
    Machine Translation techniques for Multi-lingual Knowledge
    NLP for the Web
    (Large) Language Models and Knowledge
- Knowledge
    Knowledge to improve NLP tasks
    Knowledge for Information Retrieval
    Knowledge-based Sentiment Analysis and Opinion Mining
    Combining Knowledge and Deep Learning for NLP
    Knowledge for Text Summarization and Generation
    Knowledge for Persuasion
    Knowledge-based Machine Translation
    Knowledge for the Web
    Linked Data for NLP
    Knowledge-based NL Explainability
    RAG and Knowledge injection for Language Models
- Applications
    Real-world applications that exploit Knowledge and NLP
    Knowledge and NLP Systems for Big Data scenarios
    Knowledge and NLP technology for diverse, equitable, and inclusive society
    Deployment of Knowledge and NLP Systems in specific domains, such as:
        Digital Humanities and Social Sciences
        eGovernment and public administration
        Life sciences, health and medicine
        News and Data Streaming
*****************************************
Paper Submission
Research papers and experience reports related to the above topics are solicited. Submissions must not have been published or be concurrently considered for publication elsewhere. Papers should be submitted in PDF using the ACM-SAC proceedings format using the submission link on the SAC 2025 website (https://www.sigapp.org/sac/sac2025/). Authors' names and affiliations should be entered separately at the submission site and not appear in the submitted papers. Each submission will be reviewed in a DOUBLE-BLIND process according to the ACM-SAC Regulations. Student Research Competition (SRC) submissions are welcome (see SAC 2025 website for details).
Full papers are limited to 8 pages, in camera-ready format, included in the registration fee. Authors have the option to include up to two (2) extra pages (paying an extra charge).
Posters are limited to 2 pages, in camera-ready format, included in the registration fee. Authors have the option to include only one (1) extra page (paying an extra charge).
SRC Abstracts are limited to 3 pages, in camera-ready format, included in the registration fee. No extra pages are allowed.
Paper selection is based on originality, technical contribution, presentation quality, and relevance to the Knowledge and Natural Language Processing Track. Some papers may be accepted as posters.
Paper registration is required, allowing the inclusion of the paper/poster in the conference proceedings. An author or a proxy attending SAC MUST present the paper. This is a requirement for the paper/poster to be included in the ACM digital library. No-show of registered papers and posters will result in excluding them from the ACM digital library.
*****************************************
Track Co-Chairs
Patrizio Bellan, Fondazione Bruno Kessler (FBK)
Marco Bombieri, Università degli Studi di Verona
Mauro Dragoni, Fondazione Bruno Kessler (FBK)
Marco Rospocher, Università degli Studi di Verona
*****************************************
Programme Committee
TBA
*****************************************
General Inquiries
For further information, please visit the SAC Knowledge and Natural Language Processing Track website (https://knlp-sac.github.io/2025/) and the SAC 2025 conference website (https://www.sigapp.org/sac/sac2025/), or feel free to contact the Track Co-Chairs at knlp(a)fbk.eu.
*Call for Papers*
*FIRE 2024: 16th meeting of the Forum for Information Retrieval Evaluation*
12th - 15th December 2024
DA-IICT, Gandhinagar, India
*Submission Deadline: 22nd September 2024 (Extended)*
Website: fire.irsi.org.in
Submission Link : https://cmt3.research.microsoft.com/FIRE2024
------------------------------
The 16th meeting of the Forum for Information Retrieval Evaluation 2024
will be held at Dhirubhai Ambani Institute of Information and Communication
Technology (DA-IICT), Gandhinagar, India. It will be an in-person
conference. We are seeking submissions of high-quality and original papers.
Submissions will be reviewed by experts on the basis of the originality of
the work, the validity of the results, chosen methodology, writing quality
and the overall contribution to the field of IR/NLP. Authors are also
encouraged to describe work in progress and late-breaking research results.
*Topics of interest include, but are not limited to*
1. Search and Ranking: Research on core algorithmic topics in IR:
   1. Queries and query analysis (e.g., query understanding, query
      reformulation, query representation etc.)
   2. Retrieval models and ranking (e.g., cross-lingual IR with a
      particular focus on Indian languages, ranking algorithms, language
      models, retrieval algorithms, learning to rank etc.)
   3. Efficiency and scalability (e.g., distributed search, search
      engine architecture, indexing, crawling etc.)
   4. Supervised/Weakly supervised deep neural networks.
   5. Other domain-specific applications of IR.
2. Evaluation: Research on evaluation of IR systems:
   1. User centric evaluation (e.g., user experience, user engagement
      etc.)
   2. System centric evaluation (e.g., evaluation metrics).
   3. Query Performance Prediction and its applications.
3. Generative Models for IR/NLP.
   1. Conversational and Interactive Search Systems
   2. Simulated Data for Personalized IR.
   3. Issues related to fairness and trustworthiness of IR/Recsys models
   4. In-Context Learning or Retrieval Augmented Generation for Search
      and NLP downstream tasks, such as Question Answering, Summarization
   5. Domain-specific generation, such as Code generation, Argument
      generation, Workflow generation.
4. Explainability, Fairness and Trust of IR/Recsys models
   1. Explainable models for ranking, text classification/clustering,
      summarization etc.
   2. User studies for explainable AI (XAI) applied to IR/Recsys
   3. Issues related to fairness and trustworthiness of IR/Recsys models
5. Multimodal and Crossmodal IR/Recsys models
   1. Visual Question Answering
   2. Image search/recommendation
   3. Question answering
   4. Multimodal document summarization
*Important dates*
*25th July 2024* Paper submission link
<https://cmt3.research.microsoft.com/FIRE2024> will be available
*22nd September 2024* (extended from 5th September 2024) Paper submission deadline
*30th October 2024* Paper acceptance notification
*10th November 2024* Camera ready copy submission deadline
*12th-15th December 2024* In-person conference
Note: All submission deadlines are 11:59 PM AoE Time Zone (Anywhere on
Earth).
*Submission Guidelines*
Submissions must describe substantial, original and unpublished work.
Wherever appropriate, concrete evaluation and analysis must be included. If
the paper being submitted is under review at any other venue, the same
should be explicitly mentioned when making the submission. Such a paper, if
accepted, should be withdrawn from all other places.
The FIRE conference track, this year, is subdivided into 2 different
subtracks (described later), each with a different scope and objective, and
with different reviewing policy. Please make sure that you are submitting
your paper in the correct track. No requests for switching papers across
tracks will be entertained after the deadline for paper submission expires.
Submissions will be taken through Microsoft CMT. Select the "Conference
Track" while submitting the paper and in the subject area select the
appropriate paper type (Regular Paper, Resource and Demo Paper, Extended
Abstract). Please note that incorrect submissions will be desk rejected.
Link to submit the paper: https://cmt3.research.microsoft.com/FIRE2024
*Paper Template and Submission*
The submitted papers must follow the LNCS (Springer conference) template
available at
https://www.overleaf.com/latex/templates/springer-lecture-notes-in-computer…
The only accepted format of submissions is PDF. Papers which do not
conform to the requirements may get rejected without review. Please note
that it is the responsibility of the authors to ensure that the PDF
submission has been uploaded successfully (we suggest that you try
downloading your paper again yourself, to check). Authors are invited to
submit in any of the following tracks:
- *Regular paper*
As last year, we make no explicit distinction between long and short
papers. More time will be allocated for longer papers during the
presentation. Submitted papers can be of *variable length* up to a
*maximum of 12* pages of *content (excluding references)*.
Reviewing policy: Double-blind.
- *Resource and Demo paper*
Papers submitted to this track should describe data or software
resources for a research problem that will be helpful to the IR/NLP/AI
community. Such resources should ideally be made publicly available for
reviewers to judge their merit. Demo papers should contain a link to
working software that demonstrates the application of existing research
methods as a proof of concept.
Reviewing policy: Single-blind.
Number of pages: *variable length* up to a *maximum of 9* pages of *content
(excluding references)*.
*Double-blind Reviewing Policy*
All submissions to the regular track of FIRE conference 2024 will be
reviewed on the basis of originality, relevance, importance, and clarity.
For papers submitted to the regular track, the authors must not mention
their names or institutional details anywhere in the paper. Authors should
refer to themselves in third person when citing their own work. Expressions
like "In our earlier work..." or "We previously showed that..." must be
avoided.
*Presentation Requirements*
If accepted, at least one author will have to register for the
conference and present their work in-person.
*Conference Track Co-ordinator*
- Debasis Ganguly (University of Glasgow, UK)
- Debarshi Kumar Sanyal (Indian Association for the Cultivation of
Science, Kolkata, India)
For queries related to the conference, please email us at [ clia(a)isical.ac.in ]
For the latest updates, subscribe to the FIRE mailing list [
https://groups.google.com/forum/#!forum/fire-list ]
Dear Colleagues,
The CfP for the 22nd Annual Workshop of the Australasian Language Technology Association (ALTA 2024) is now open, and the submission deadline has been extended to 27 September 2024 (23:59 Anywhere on Earth, UTC-12).
Details are available on our website at https://alta2024.alta.asn.au/calls/papers and a summary follows.
---
Important Dates
* Submission deadline for short/long papers, presentation abstracts and industry demonstrations:
27 September 2024, extended from 20 September 2024 (23:59 Anywhere On Earth UTC-12).
* Main conference: 3 December and 4 December 2024, ANU, Canberra, ACT, hybrid (in person and online)
Overview
The 22nd Annual Workshop of the Australasian Language Technology Association (ALTA) will be held in a hybrid format at the Australian National University, Canberra, from 2 December to 4 December 2024 and also online.
The ALTA 2024 workshop is the key local forum for socialising research results in Natural Language Processing (NLP) and Computational Linguistics (CL). It will feature presentations, posters, and demonstrations from students, industry, and academic researchers. As in previous years, we also encourage submissions and participation from industry and government researchers and developers. Note that ALTA is listed in the CORE 2023 Conference Rankings as Australasian C <https://www.core.edu.au/conference-portal>.
Topics
ALTA invites the submission of papers and presentations on all aspects of NLP and CL, including, but not limited to:
* Commonsense Reasoning.
* Computational Social Science and Cultural Analytics.
* Dialogue and Interactive Systems.
* Discourse and Pragmatics.
* Efficient Methods for NLP.
* Ethics in NLP.
* Information Extraction.
* Information Retrieval and Text Mining.
* Interpretability, Interactivity and Analysis of Models for NLP.
* Language Grounding to Vision, Robotics and Beyond.
* Language Modeling and Analysis of Language Models.
* Linguistic Theories, Cognitive Modeling and Psycholinguistics.
* Machine Learning for NLP.
* Machine Translation.
* Multilinguality and Linguistic Diversity.
* Natural Language Generation.
* NLP Applications.
* Phonology, Morphology and Word Segmentation.
* Question Answering.
* Resources and Evaluation.
* Semantics: Lexical, Sentence level, Document Level, Textual Inference, etc.
* Sentiment Analysis, Stylistic Analysis, and Argument Mining.
* Speech and Multimodality.
* Summarisation.
* Syntax, Parsing and their Applications.
We particularly encourage submissions that broaden the scope of our community by considering practical applications of language technology and multidisciplinary research. We also specifically encourage submissions from the industry.
Format and instructions for authors
Please refer to our CfP webpage for specifics: <https://alta2024.alta.asn.au/calls/papers>
We are using OpenReview for submissions, and invite submissions of three different formats: (1) Original Research Papers, (2) Abstract-based Presentations, and (3) Industry Demonstrations.
---
You can follow ALTA on social media at the following links:
* LinkedIn (page): https://www.linkedin.com/company/australasian-language-technology-associati…
* LinkedIn (group): https://www.linkedin.com/groups/1849979/
* Twitter: https://twitter.com/altanlp
* Mastodon: https://sigmoid.social/@ALTAnlp
* Hashtag is #ALTA2024
With kind regards, on behalf of the ALTA 2024 Team:
Dr Gabriela Ferraro, General Chair
Professor Tim Baldwin, Program Chair
Dr Sergio José Rodríguez Méndez, Program Chair
Dr Nicholas Kuo, Program Chair
Dr Anton Malko, Publication Chair
Dr Dawei Chen, Technology Chair
A/Prof Shunichi Ishihara, Finance Chair
Charbel El-Khaissi, PhD candidate, Sponsorship Chair
Ned Cooper, PhD candidate, Local Chair
Kathy Reid, PhD candidate, Publicity Chair
Apologies for cross-posting!
There’s still time to register for the Artificial Intelligence Research in
Applied Linguistics (AIRiAL) 2024 Conference. Register here
<https://sites.google.com/tc.columbia.edu/airialconference/airial-2024/regis…>
for the conference.
Theme
AI in Education: Empowering Learners & Preparing Educators
Location
Teachers College, Columbia University
Smith Learning Theater (in person)
Dates
September 27-28, 2024
Plenary Speakers
Yanis Ben Amor, Columbia University
Monica Arés, Imperial College Business School
The AL & TESOL Language and Technology Research Group
<https://sites.google.com/tc.columbia.edu/al-tesol-language-technology/home>
in the Applied Linguistics & TESOL program at Teachers College will host
the second annual Conference on Artificial Intelligence Research in Applied
Linguistics (AIRiAL). The theme of this conference emphasizes the
transformative role of artificial intelligence (AI) in education and
language teaching, focusing on AI literacy among learners and the
preparedness of educators for the AI-driven future. We are interested in
contributions that showcase AI technologies prioritizing human values,
ethics, and the enhancement of human capabilities in the context of applied
linguistics. Submissions may cover a wide array of topics within the scope
of AI literacy and applied linguistics.
We look forward to seeing you at the conference!
For more details, visit our conference website here.
<https://sites.google.com/tc.columbia.edu/airialconference/airial-2024-home>
--
Erik Voss, Ph.D.
Assistant Professor, Applied Linguistics & TESOL program
Language & Technology Specialization
Department of Arts & Humanities
Teachers College, Columbia University
TC Faculty Profile <https://www.tc.columbia.edu/faculty/ev2449/>, Linkedin
Profile <https://www.linkedin.com/in/erik-voss-ph-d-941a3ab9>, Google
Scholar <https://scholar.google.com/citations?user=FMnVdjcAAAAJ&hl=en>
ALTESOL Language & Technology Research Group
<https://sites.google.com/tc.columbia.edu/al-tesol-language-technology/home>
Editor-in-Chief of NYS TESOL Journal
Associate Editor of Language Assessment Quarterly
*Latest Publications*
Voss, E. et al. (2023). The Use of Assistive Technologies Including
Generative AI by Test Takers in Language Assessment: A Debate of Theory and
Practice. <https://doi.org/10.1080/15434303.2023.2288256> LAQ Journal
Voss, E. (2024) Duolingo Webinar: Current Applications of Artificial
Intelligence in Language Assessment
<https://youtu.be/b-mjLmvXLBU?si=nmph76-lizkfzi1J> (1 hour)
Voss, E. (2024). Language Assessment and Artificial Intelligence.
<https://books.google.com/books?hl=en&lr=&id=ht8aEQAAQBAJ&oi=fnd&pg=PA112&ot…>
The Concise Companion to Language Assessment.
The 9th Biomedical Linked Annotation Hackathon (BLAH9)
13 - 17 January, 2025
Tachikawa, Tokyo, Japan
https://blah9.linkedannotation.org/
Project proposal submission deadline: 18 Oct., 2024
SPECIAL THEME
Ensuring Robustness in LLM-based Research: Reproducibility,
Interoperability, and Reliable Evaluation.
INTRODUCTION
BLAH (Biomedical Linked Annotation Hackathon) represents a series of annual
hackathon events, specifically designed to foster open collaboration. The
goal is to achieve a breakthrough in the sharing and linking of various
resources for biomedical literature annotation and mining. By enhancing the
interoperability of these resources, the initiative aims to substantially
increase both the productivity and the impact within the community.
The 9th edition of BLAH (BLAH9) will be held under the special theme "Ensuring
Robustness in LLM-based Research: Reproducibility, Interoperability, and
Reliable Evaluation."
Reproducibility and reliable evaluation are key to ensuring that research
remains robust and trustworthy. However, with the recent surge in research
using large language models (LLMs), these important principles have become
largely unclear. Interoperability, a vital component for fostering robust
collaboration and promoting open science, has similarly faced challenges as
LLM-based research expands. Now, two years into the surge of LLM-based
research, it is an opportune moment to reassess and prioritize these
critical aspects of research and development to ensure long-term
sustainability and rigor in the field.
CALL FOR PROJECT PROPOSALS
We are seeking project proposals from individuals and teams interested in
advancing biomedical literature annotation and mining, with a particular
focus this year on enhancing reproducibility, interoperability, and
reliable evaluation in the context of using large language models (LLMs).
Proposals should be structured to achieve measurable outcomes through
collaboration during the hackathon, with clearly defined objectives that
can lead to meaningful insights by the end of the event.
Suggested proposal topics may include, but are not limited to:
- Enhancing interoperability in LLM-based annotation and mining
- Developing reliable evaluation frameworks for LLM-based annotation and
mining
- Improving reproducibility in LLM-based annotation and mining
- ...
The submission deadline for project proposals is 18 Oct., 2024.
TRAVEL SUPPORT
Those who submit project proposals are eligible to apply for travel
support. See the homepage for detailed information.
PROGRAM COMMITTEE
- Jin-Dong Kim - DBCLS, ROIS-DS
- Fabio Rinaldi - IDSIA
- Zhiyong Lu - NCBI, NLM
- Lars Juhl Jensen - Univ. Copenhagen
In this newsletter:
LDC data and commercial technology development
New publications:
L2-KSU Native and Non-Native Arabic Speech<https://catalog.ldc.upenn.edu/LDC2024S11>
MATERIAL Somali-English Language Pack<https://catalog.ldc.upenn.edu/LDC2024S10>
________________________________
LDC data and commercial technology development
For-profit organizations are reminded that an LDC membership is a pre-requisite for obtaining a commercial license to almost all LDC databases. Non-member organizations, including non-member for-profit organizations, cannot use LDC data to develop or test products for commercialization, nor can they use LDC data in any commercial product, or for any commercial purpose. LDC data users should consult corpus-specific license agreements for limitations on the use of certain corpora. Visit the Licensing<https://www.ldc.upenn.edu/data-management/using/licensing> page for further information.
________________________________
New publications:
L2-KSU Native and Non-Native Arabic Speech<https://catalog.ldc.upenn.edu/LDC2024S11> was developed by King Saud University<http://ksu.edu.sa/en/> (KSU) and contains approximately six hours of Modern Standard Arabic read speech from 80 subjects, along with transcripts and speaker metadata.
The speech data was collected in 2022 from 40 native and 40 non-native speakers. Native speakers were from Saudi Arabia, Egypt, and Palestine, and provided audio recordings through the crowd sourcing platform Khamsat<https://khamsat.com/>. Non-native speakers were Central and West African students enrolled in KSU's Arabic Linguistics Institute; they provided speech recordings on site. All subjects read a series of ten sentences, repeating each sentence multiple times.
2024 members can access this corpus through their LDC accounts provided they have submitted a completed copy of the special license agreement. Non-members may license this data for a fee.
*
MATERIAL Somali-English Language Pack<https://catalog.ldc.upenn.edu/LDC2024S10> was developed by Appen<http://www.appen.com/> for the IARPA (Intelligence Advanced Research Projects Activity) MATERIAL<https://www.iarpa.gov/index.php/research-programs/material> (Machine Translation for English Retrieval of Information in Any Language) program. It contains 80 hours of Somali conversational telephone speech, transcripts, English translations, annotations, and queries.
Calls were made using different telephones (e.g., mobile, landline) from a variety of environments. Transcripts cover approximately 10% of the speech files, and approximately 4% of the speech files were translated into English. This release also includes domain annotations, English queries, and their relevance annotations.
The MATERIAL program focused on underserved languages with the ultimate goal to build cross language information retrieval systems to find speech and text content using English search queries.
2024 members can access this corpus through their LDC accounts provided they have submitted a completed copy of the special license agreement. Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu
M: 3600 Market St. Suite 810
Philadelphia, PA 19104