The Research Training Group 2853 “Neuroexplicit Models of Language,
Vision, and Action” is looking for
Twelve PhD Students - Fall 2025
Neuroexplicit models combine neural and human-interpretable (“explicit”)
models in order to overcome the limitations that each model class has
separately. They include neurosymbolic models, which combine neural and
symbolic models, but also e.g. combinations of neural and physics-based
models. In the RTG, we will improve the state of the art in natural
language processing (“Language”), computer vision (“Vision”), and
planning and reinforcement learning (“Action”). We also develop novel
machine learning techniques for neuroexplicit models (“Foundations”).
Our overarching aim is to contribute to a better understanding of the
cross-cutting design principles of effective neuroexplicit models
through interdisciplinary collaboration.
The RTG is scheduled to grow to a total of 24 PhD students by 2025. An
excellent, international group of twelve PhD students and one postdoc
has already joined the RTG. Through the inclusion of ~20 associated PhD
students and postdocs funded from other sources, it will be one of the
largest research centers on neuroexplicit or neurosymbolic models in the
world.
The RTG brings together researchers at Saarland University, the Max
Planck Institute for Informatics, the Max Planck Institute for Software
Systems, the CISPA Helmholtz Center for Information Security, and the
German Research Center for Artificial Intelligence (DFKI). All of these
institutions are collocated on the same campus in Saarbrücken, Germany.
The positions will be funded for four years at the TV-L E13 100% pay
scale. They are intended to start in September 2025, but could start a
little earlier or later depending on the student’s availability. You
should have or be about to complete an MSc degree in computer science or
a related field and have demonstrated expertise in one of the research
areas of the RTG, e.g. through an excellent Master’s thesis or relevant
publications.
The RTG is part of the Saarland Informatics Campus, one of the leading
centers for research in computer science, artificial intelligence, and
natural language processing in Europe. The Saarland Informatics Campus
brings together 900 researchers and 2500 students from 81 countries. The
CISPA Helmholtz Center, located on the same campus, is home to an
additional 350 researchers and on track to grow to 800 by 2026.
Researchers at SIC and CISPA are part of the ELLIS network and have been
awarded more than 40 ERC grants.
Each PhD student in the RTG will be jointly supervised by two PhD
advisors from the list of Principal Investigators below. Each student
will freely define their own research topic; we encourage the choice of
topics that cross the traditional boundaries of research fields.
Students may be affiliated with Saarland University or with one of the
participating institutes.
Vera Demberg, Saarland University - Computational Linguistics
Jörg Hoffmann, Saarland University - AI Planning
Dietrich Klakow, Saarland University - Natural Language Processing
Alexander Koller, Saarland University - Computational Linguistics
Bernt Schiele, MPI for Informatics - Computer Vision, Machine Learning
Philipp Slusallek, DFKI and Saarland University - Computer Graphics,
Artificial Intelligence
Christian Theobalt, MPI for Informatics - Visual Computing, Machine Learning
Mariya Toneva, MPI for Software Systems - Computational Neuroscience,
Machine Learning
Isabel Valera, Saarland University - Machine Learning
Jilles Vreeken, CISPA - Machine Learning, Causality
Joachim Weickert, Saarland University - Mathematical Data Analysis
Verena Wolf, DFKI and Saarland University - Modeling and Simulation,
Reinforcement Learning
Ellie Pavlick, Brown University and Google AI, will join us regularly as
a Mercator Fellow.
Please send your application by 26 November 2024 to
apply@neuroexplicit.org and include the
reference number W2543. We aim to conduct job interviews in January 2025.
For more details on the position, including what materials to submit
with your application, please see our website:
https://www.neuroexplicit.org/jobs/
<https://www.neuroexplicit.org/jobs/#phd-2023>
Dear all,
Our Chair of Multilingual Computational Linguistics is offering a
position for an Akademischer Rat (research assistant) for 3 years with
the possibility of extension by 3 more years. We are looking for a candidate
who can teach topics in Multilingual Computational Linguistics with an
open focus on any topic related to multilingual computational approaches
to linguistic typology, historical linguistics, or psycholinguistics.
The application deadline is November 20; more information can be found here:
https://www.uni-passau.de/fileadmin/dokumente/beschaeftigte/Stellenangebote…
Sincerely,
Mattis List
--
Prof. Dr. Johann-Mattis List
Chair of Multilingual Computational Linguistics
University of Passau
Dr.-Hans-Kapfinger-Str. 16
94032 Passau
Germany
Chair Website: https://phil.uni-passau.de/multilinguale-computerlinguistik/
Personal Website: https://lingulist.de
Telephone: +49(0)851/509-3480
** Apologies for cross-postings **
==============
Call for Papers @ Fifth Conference on Language, Data and Knowledge (LDK
2025)
Dates: 9-12 September 2025
Location: Naples, Italy
Website: http://2025.ldk-conf.org
Twitter/X: https://x.com/LDKconference
Submission Deadline: 6 March 2025
Submission page: https://openreview.net/group?id=LDK/2025/Conference
==============
We invite submissions to the fifth biennial conference on Language, Data
and Knowledge (LDK 2025) to be held in Naples, Italy in September 2025.
This conference aims to bring together researchers from across different
disciplines concerned with the acquisition, treatment, curation and the
use of language data in the context of data science and knowledge-based
applications. This edition builds upon the success of the inaugural
event held in Galway, Ireland in 2017, the second LDK in Leipzig,
Germany in 2019, the third LDK in Zaragoza, Spain in 2021, and the
fourth LDK in Vienna, Austria in 2023.
Paper Submission
We welcome submissions of relevance to the topics listed below.
Submissions can be in the form of:
Long papers: 9–12 pages;
Short papers: 4–6 pages.
All submission lengths are given including references. Accepted
submissions will be published in an open-access conference proceedings
volume and indexed in the ACL Anthology and DBLP, free of charge for
authors. The ACL templates should therefore be used for all conference
submissions.
As the reviewing process is single-blind, submissions should not be
anonymised. Papers should be submitted via OpenReview at the following
address:
https://openreview.net/group?id=LDK/2025/Conference
All papers must represent original work. When submitted, the submission
must not have been previously published*, and the material in it must
not have been/be submitted for review at another journal or conference
while under review at LDK 2025.
*This excludes papers on preprint archives, such as arXiv, which we do
not consider to have been previously published.
The conference will be hybrid (face-to-face and remote). Note that at
least one author of each accepted paper must register to present the
paper at the conference (either remotely or on-site).
Topics
Relevant topics for the conference include, but are not limited to, the
following fields:
Language Data
>Language data construction and acquisition
>Language data annotation
>FAIR data practices for language data
>Language data portals and metadata about language data
>Organisational and infrastructural management of language data
>Multilingual, multimedia and multimodal language data
>Evaluation, provenance and quality of language data
>Visualisation of language data
>Standards and interoperability of language data
>Legal aspects of publishing language data
>Under-resourced languages
>e-Lexicography
>Semantic processing
Knowledge Graphs
>Linguistic Linked Data and the multilingual Semantic Web
>Ontologies, terminologies, wordnets, framenets and related resources
>Information and knowledge extraction (taxonomy extraction, ontology learning)
>Data, information and knowledge integration across languages
>(Cross-lingual) ontology alignment
>Entity linking and relatedness
>Linked data profiling
>Knowledge representation and reasoning
>Knowledge graphs for corpora processing and analysis
>Neuro Symbolic Artificial Intelligence
Methods and Applications for Language, Data and Knowledge
>Question answering and semantic search
>Text analytics on big data
>NLP for language documentation and preservation
>Speech recognition and synthesis
>Spoken language processing
>Semantic content management
>Computer-aided language learning
>Natural language interfaces to big data
>Knowledge-based NLP
>Deep learning and machine learning for and on LLOD
>Language Models and Foundation Models (Language and Multimodal Models)
>Generative Artificial Intelligence and Language, Data, Knowledge Graphs
>Use Cases in Language, Data and Knowledge
Contributions are welcome where the topics above - and others within the
scope of Language, Data and Knowledge - are applied to domain-specific
use cases, including but not limited to: social sciences and humanities,
legal, life sciences, FinTech, cybersecurity.
Organising Committee
Conference Chairs:
Jorge Gracia, University of Zaragoza, Spain
Dagmar Gromann, University of Vienna, Austria
Program Chairs:
Mehwish Alam, Telecom Paris, Institut Polytechnique de Paris, France
Andon Tchechmedjiev, Institut Mines Telecom | EuroMov Digital Health in
Motion
Workshop and Tutorial Chairs:
Katerina Gkirtzou, ILSP/Athena Research Center, Greece
Slavko Zitnik, University of Ljubljana, Slovenia
Local Organisers:
Maria Pia Buono - University of Naples “L’Orientale”, Italy
Johanna Monti - University of Naples “L’Orientale”, Italy
Important Dates:
Paper submission deadline: 6th March, 2025
Acceptance/Rejection Notification: 8th May, 2025
Pre/Post Conference events: 9 to 12 September, 2025
Main conference: 10-11 September, 2025
All deadlines are 23:59 AoE (anywhere on Earth)
Dear All,
We're pleased to announce an extension for the special issue of Language and Law / Linguagem e Direito on "Language, Law and Rights: Balancing AI Driven Technology and Equity." Due to demand, the new deadline for submissions is November 15th, 2024.
For this special issue, we particularly welcome both theoretical and empirical contributions that challenge prominent understandings, from a language, law and rights perspective, on:
* relationships and tensions between language, law, rights, and technology;
* linguistic imperialism via technology;
* emerging digital divides and other social issues;
* co-designing technology with diverse communities;
* assistive technology for language access;
* accessibility considerations for language rights;
* best practice to balance innovation and equity by maintaining a dialogue with technology developers, communities, researchers and policymakers;
* best practice for promoting linguistic equality and equity through regulations and policy.
Keywords: Language, technology, human-machine interaction, minorities, human-centricity, law, rights, justice, equity
Themes:
* Co-creation between humans; co-creation with AI-driven technology (co-AI)
* Non-converging goals (e.g. efficiency vs. customization, bias vs. fairness, short-term gains vs. long-term sustainability, commercial interests vs. social good, power dynamics and control vs. individual choices)
* Quality of life, law, regulation and ethics
* Linguistic justice
* Cultural issues
* Sociological issues of language rights in relation to technology, its development, and deployment
* Accessibility - access to services (public services and otherwise)
* Language rights and language policy and planning
* Glottopolitics and computer-assisted communication
* Language rights and multilingual administrations
* Minoritised languages and human geography
* Technology-mediated communication in multilingual democracies
* Participation of linguistic minority groups through remote interpreting
* Minoritised/indigenous language media and social media
* AI and data mining in under-resourced languages
* Agency issues in human-machine interaction and language rights
* Language rights and international law (normative frameworks)
* Linguistic Human Rights as individual and collective rights to choose the language/s for communication
* Language Rights of vulnerable witnesses
* Psychosocial factors in using linguistic varieties in public services
* Human-Centred augmented translation
* Machine translation, post-editing tools and minority languages
Length: ≃ 7000-8000 words [Guidelines available here<https://drive.google.com/file/d/1_9rd9r4XmgSY2twXSVab-ZQWFDjmevy8/view?usp=…>]
Book reviews: Suggest books published recently to be reviewed for the special issue.
Important Dates for Vol. 12(1), 2025 (June, 2025):
Full article submission: November 15, 2024
For more information please follow the link: https://ojs.letras.up.pt/index.php/LLLD/announcement/view/158
For further information, please contact Angela Soltan <angela@soltan.md> or one of the other guest editors: Rebekah Rousi <rebekah.rousi@uwasa.fi>, Lucia Ruiz Rosendo <Lucia.Ruiz@unige.ch>.
On behalf of the editorial team, Angela Soltan
Rui Sousa Silva
Faculdade de Letras, Universidade do Porto | Faculty of Arts and Humanities, University of Porto
www.linguisticaforense.pt | https://s.up.pt/qjur | http://tinyurl.com/37w2ec6x
Publicação mais recente / Latest publication: Cyber Hate Speech Detection and Analysis: An Evidence-Based Forensic Linguistics Approach<https://doi.org/10.1007/978-3-031-51248-3_8>
Bielefeld University has an open position as group leader in Natural Language Processing at the Faculty of Technology.
We are looking for a postdoctoral researcher who can independently set up and lead a group on Natural Language Processing. The group leader is expected to build up an independent research profile in the field of Natural Language Processing (NLP) and to publish the research work at international conferences (e.g. ACL, EMNLP, COLING, AAAI, …).
The group leader will be affiliated with the “Semantic Computing Lab” [1], led by Prof. Philipp Cimiano.
The research of the group leader should be compatible with the research topics of the “Semantic Computing” lab. Possible research topics include:
• Robustness and safety of large language models for NLP
• Ethical aspects and FAIRness of NLP systems
• Fine-tuning and transfer learning for the adaptation of LLMs
• LLMs for information extraction
• Auto-ML for LLMs
• Argument Mining
• Common Sense Knowledge for NLP
• Temporal Reasoning for NLP
The successful candidate has an excellent track record in the field of NLP with a demonstrated ability to publish at top-tier conferences. Ideally, the candidate will have teaching experience and experience in third-party funding acquisition.
The position involves teaching duties corresponding to 4 hours per week.
The position is available for 3 years and can be extended for up to 3 further years.
Inquiries and applications should be sent to cimiano@techfak.uni-bielefeld.de. The application deadline is October 25th. Applications should include a CV, research statement and teaching concept.
[1] https://www.uni-bielefeld.de/fakultaeten/technische-fakultaet/arbeitsgruppe…
Prof. Dr. Philipp Cimiano
AG Semantic Computing
Coordinator of the Cognitive Interaction Technology Center (CITEC)
Co-Director of the Joint Artificial Intelligence Institute (JAII)
Universität Bielefeld
Tel: +49 521 106 12249
Fax: +49 521 106 6560
Mail: cimiano(a)cit-ec.uni-bielefeld.de
Personal Zoom Room: https://uni-bielefeld.zoom-x.de/my/pcimiano
Office CITEC-2.307
Universitätsstr. 21-25
33615 Bielefeld, NRW
Germany
[Spanish version below]
Please consider contributing and/or forwarding to appropriate colleagues and groups.
*******We apologize for the multiple copies of this e-mail******
Call for papers for issue 74 of the journal Procesamiento del Lenguaje Natural
http://www.sepln.org/en/journal
http://www.sepln.org/en/journal/author-guidelines
Introduction
The aim of the journal Procesamiento del Lenguaje Natural is to provide a forum for the publication of scientific-technical articles in the field of Natural Language Processing (NLP), for both the national and international scientific community. The articles must be unpublished and cannot be simultaneously submitted for publication in other journals or conference proceedings. The journal also aims to promote the development of areas related to NLP, disseminate research carried out, identify future guidelines for basic research, and present software applications in this field. Every year the Sociedad Española de Procesamiento del Lenguaje Natural (SEPLN) (Spanish Society for Natural Language Processing) publishes two issues of the journal, including original articles, presentations of R&D projects, book reviews, and summaries of PhD theses.
The scientific quality of the Journal is supported by the 2023 JCR index (JIF: 1.2, JCI: 0.39, Q2-Linguistics - Q4-Computer Sciences, Artificial Intelligence ESCI), the SCImago Journal Ranking (2023 SJR: 0.677, Q2-Computer Science Applications, Q1-Linguistics and Language), the Scopus Index (2023 CiteScore: 5.4) and the index SNIP (Source Normalized Impact per Paper) with 2.07 points. More information at: http://www.sepln.org/en/journal/quality.
Topics
NLP for low-resource languages
Efficient and sustainable NLP methods
Ethics, Bias and Fairness in NLP
Trustworthiness and explainability in NLP
Security and privacy in NLP
Text and Multimodal Generation
Multimodality and Language Grounding to Vision
Knowledge and common sense
Computational lexicography and terminology
Linguistic theories, Cognitive Modeling and Psycholinguistics
Morphological and Syntactic analysis
Corpus linguistics
Development of linguistic resources and tools
Semantics, pragmatics, and discourse
Machine translation
Speech synthesis and recognition
Audio indexing and retrieval
Dialogue systems and interactive systems/ Conversational assistants
Monolingual and multilingual information extraction and retrieval
Question answering systems
Automatic textual content analysis
Sentiment analysis, opinion mining and argument mining
Plagiarism detection
Negation and speculation processing
Text summarization
Text simplification
Image retrieval
NLP in specific domains (Medicine, Law, Education)
Submission Information
The proposal must be submitted by November 22nd, 2024 and must meet certain format and style requirements.
All submissions must be in PDF format and submitted electronically using the OpenReview system.
Submitted papers will be subjected to a blind review by at least three members of the program committee.
Categories of papers
Regular papers with original contributions.
Summary of PhD thesis.
Information for Authors
The proposals can be written in Spanish or English and should be at most 10 A4-size pages of content, plus unlimited pages for references, and 4 pages maximum for summaries of PhD theses.
The papers must include the following sections:
The title of the communication (in English and Spanish).
An abstract in English and Spanish (maximum 150 words).
A list of keywords or related topics (in English and Spanish).
The documents must not include headers or footers.
As reviewing will be blind, the paper should not include the authors’ names and affiliation. Furthermore, self-references that reveal the author’s identity should be avoided. The articles should only include the title, the abstract, the keywords and the proposal.
We recommend using the LaTeX and Word templates that can be downloaded from the SEPLN web (author guidelines have been updated): http://www.sepln.org/index.php/en/journal/author-guidelines
Note on camera ready
The final version of the paper (camera ready) should be submitted together with a cover letter explaining how the suggestions of the reviewers were implemented in the final version. This cover letter will be considered in order to accept or finally reject the selected paper.
Preprint policy
The Journal allows the publication of preprints (non-refereed papers posted online, e.g. on arXiv) at any time, but during the review period the preprint must indicate that the paper is “under review” in the journal Procesamiento del Lenguaje Natural. Likewise, if the paper is accepted, the preprint must be updated with the DOI, the name of the Journal and the bibliographic information of the paper.
Important dates
Submission deadline: November 22nd, 2024
Notification of acceptance: January 27th, 2025
Camera ready: February 7th, 2025
Publication: March 2025
Contact person: Aitziber Atutxa (aitziber.atucha(a)ehu.eus)
Editorial Committee of the Procesamiento del Lenguaje Natural
--------------------------------------------------------------------------------------------------------------------
***********Disculpen si reciben varias copias de este mensaje ************
Por favor, si lo considera oportuno, distribuya este llamamiento entre sus colegas.
Petición de artículos para la revista Procesamiento del Lenguaje Natural nº 74.
http://www.sepln.org/la-revista
http://www.sepln.org/la-revista/informacion-para-autores
Objetivos de la revista
La revista Procesamiento del Lenguaje Natural es un foro de publicación de artículos científico-técnicos en el ámbito del Procesamiento del Lenguaje Natural (PLN), tanto para la comunidad científica nacional como internacional. Los artículos tienen que ser inéditos y no haber sido postulados para ser publicados simultáneamente en otras revistas o actas de congresos. La revista quiere potenciar el desarrollo de las diferentes áreas relacionadas con el PLN, mejorar la divulgación de las investigaciones que se llevan a cabo, identificar las futuras directrices de la investigación básica y mostrar las posibilidades reales de aplicación en este campo. Anualmente la SEPLN (Sociedad Española para el Procesamiento del Lenguaje Natural) publica dos números de la revista, que incluyen artículos originales, presentaciones de proyectos, reseñas bibliográficas y resúmenes de tesis doctorales.
La calidad científica de la Revista está respaldada por el índice del JCR 2023 (JIF: 1.2, JCI: 0.39, Q2-Linguistics - Q4-Computer Sciences, Artificial Intelligence ESCI), el índice SCImago Journal Ranking (2023 SJR: 0.677, Q2-Computer Science Applications, Q1-Linguistics and Language), el índice de Scopus (2023 CiteScore: 5.4) y el índice SNIP (Source Normalized Impact per Paper) con 2,07 puntos. Más información en http://www.sepln.org/la-revista/calidad.
Áreas temáticas
PLN para lenguas con recursos limitados
Diversidad y PNL para lenguas de bajos recursos
Métodos de PNL eficientes y sostenibles
LLM: Diseño, Creación, Evaluación
Ética, Sesgo y Equidad en la PNL
PNL veraz y explicable
Seguridad y Privacidad en PNL
Generación Texto y Multimodal
Multimodalidad y fundamento del lenguaje para la visión
Conocimiento y sentido común
Teorías lingüísticas, modelado cognitivo y psicolingüística
Análisis Morfológico y Sintáctico
Lingüística de corpus
Desarrollo de recursos y herramientas lingüísticas
Semántica, pragmática y discurso
Traducción automática
Reconocimiento y síntesis de habla
Indexación y recuperación de Audio
Sistemas de diálogo y sistemas interactivos/Asistentes conversacionales
Recuperación y extracción de información monolingüe y multilingüe
Sistemas de búsqueda de respuestas
Análisis automático de contenido textual
Análisis de opiniones, emociones y minería de la argumentación
Detección de plagio
Procesamiento de la negación y la especulación
Resumen automático de texto
Simplificación de texto
Recuperación de imágenes
PLN especifico al dominio (Medico, Juridico-administrativo, Educación, etc)
Envío de trabajos
Las propuestas de trabajos (artículos y resúmenes de tesis) podrán ser enviadas hasta la fecha límite del 22 de Noviembre de 2024.
El envío y la revisión de las propuestas se realizarán exclusivamente en formato PDF y se gestionarán a través del sistema OpenReview.
La evaluación de los trabajos pasará por un proceso de revisión ciego realizado como mínimo por tres miembros del consejo asesor de la SEPLN.
Tipos de trabajos
Artículos sobre contribuciones originales.
Reseñas de tesis doctorales.
Instrucciones para los Autores
Los trabajos pueden estar escritos en español o en inglés y su longitud máxima será de 10 páginas de contenido más un número ilimitado de páginas de referencias para los artículos científicos, y de un máximo de 4 páginas para los resúmenes de tesis.
Las propuestas deben contener los siguientes apartados:
El título del artículo (en español e inglés).
Un resumen en español y un abstract en inglés de un máximo de 150 palabras.
Un listado de temas relacionados o palabras clave (en español e inglés).
Los documentos no podrán incluir cabeceras ni pies de página.
Como la fase de revisión de los trabajos es ciega, en los artículos que se envíen no se debe incluir ninguna referencia a los autores ni referencias propias que revelen la identidad de los mismos. Todas las contribuciones deben contener únicamente el título, el resumen, las palabras claves y la propuesta.
En el caso de los resúmenes de tesis, el anonimato no es necesario.
Los trabajos deben seguir el formato de las revistas de la SEPLN disponible en la siguiente dirección: http://www.sepln.org/la-revista/informacion-para-autores
Las guías se han actualizado, por favor, utilicen las que están disponibles en la página web de la revista.
Nota sobre la versión final
La versión final del trabajo (camera ready) debe enviarse con un documento en el que se explique cómo se han implementado las sugerencias de los revisores. Dicho documento se tendrá en cuenta para aceptar o rechazar el trabajo en cuestión.
Política de prepublicación
La revista permite publicar una versión no revisada de los artículos en plataformas de prepublicación (plataformas de artículos no evaluados como ArXiv). Sin embargo, durante el periodo de revisión se debe indicar que el artículo está “en revisión” en la revista Procesamiento del Lenguaje Natural. Si el artículo es aceptado, se debe actualizar la publicación en la plataforma de prepublicación con el DOI, nombre de la revista y la información bibliográfica del artículo.
Fechas importantes
Envío de trabajos: 22 de Noviembre 2024
Notificación de aceptación/rechazo: 27 de Enero 2025
Versión final: 7 de febrero de 2025
Publicación: Marzo de 2025
Persona de contacto: Aitziber Atutxa (aitziber.atucha(a)ehu.eus)
Consejo de redacción de la revista Procesamiento del Lenguaje Natural.
In this newsletter:
LDC/Penn receives US Dept of Education research grant
Membership year 2025 publication preview
Fall 2024 data scholarship recipients
New publications:
RST Continuity Corpus<https://catalog.ldc.upenn.edu/LDC2024T08>
MultiTACRED<https://catalog.ldc.upenn.edu/LDC2024T09>
________________________________
LDC/Penn receives US Dept of Education research grant
LDC and Penn's Graduate School of Education and Department of Computer and Information Science are part of a team that was recently awarded a $10 million grant from the US Department of Education<https://ies.ed.gov/funding/grantsearch/details.asp?ID=6066> to develop the Using Generative Artificial Intelligence for Reading R&D Center (U-GAIN Reading) which will explore using generative AI to improve elementary school reading instruction for English learners. Led by the education nonprofit Digital Promise, U-GAIN Reading will build on an existing research-based tutoring platform, Amira Learning, that is used by more than 1 million students each year. The LDC/Penn team will contribute expertise in computational linguistics, computer science, and learning analytics. An evaluation team at MDRC will measure learner outcomes both to improve the R&D and to benchmark its eventual impacts. Additional experts in the science of reading, ethics, and strategies for national impact will support the project's work. Data developed in the project will be shared with the community through the LDC Catalog.
Membership year 2025 publication preview
The 2025 membership year is approaching and plans for next year's publications are in progress. Among the expected releases are:
* Iraqi Arabic - English Lexical Database: a set of six interrelated tables (roots, lemmas, wordforms, multi-word expressions, English definitions, example phrases) presenting each Iraqi Arabic word in Arabic script and IPA format, a result of LDC's collaboration with Georgetown University Press to enhance and update three dialectal Arabic dictionaries
* AIDA topic source data and annotations: multimodal source data and annotations in multiple languages (Russian, English, Spanish) for information and entity extraction
* 2015 NIST Language Recognition Evaluation Test Set: 164,000+ segments of conversational telephone speech and broadcast narrow band speech in six linguistic varieties (Arabic, Spanish, English, Chinese, Slavic, French) representing 20 languages, used in NIST's 2015 language recognition evaluation
* BOLT CALLFRIEND CALLHOME CTS Audio, Transcripts and Translations: previously unpublished Chinese and Egyptian Arabic telephone conversations from the CALLFRIEND and CALLHOME collections, with transcripts and translations developed by LDC for the DARPA BOLT program
* Chinese Sentence Pattern Structure Treebank: 5,000+ sentences from ancient and modern Chinese texts with syntactic annotation based on sentence constituent analysis, developed by Beijing Normal University and Peking University
* IARPA MATERIAL language packs: conversational telephone speech, transcripts, English translations, annotations, and queries in multiple languages (e.g., Georgian, Kazakh, Lithuanian)
* LORELEI: representative and incident language packs containing monolingual text, bi-text, translations, annotations, supplemental resources, and related tools in various languages (e.g., Hungarian, Hindi, Amharic, Somali)
Check your inbox for more information about membership renewal.
Fall 2024 data scholarship recipients
Congratulations to the recipients of LDC's Fall 2024 data scholarships:
Yomma Gamaleldin: Alexandria University (Egypt): Master's student, Computer and Systems Engineering Department. Yomma is awarded a copy of Qatari Corpus of Argumentative Writing (LDC2022T04) for her work in Arabic automated essay scoring.
Arhane Mahaganapathy: Jaffna University (Sri Lanka): Master's student, Department of Computer Science. Ahrane is awarded copies of IARPA Babel Tamil Language Pack (LDC2017S13) and Multi-Language Telephone Speech 2011 - South Asian (LDC2017S14) for her work in Tamil speech-to-text systems.
Sivashanth Suthakar: Jaffna University (Sri Lanka): Master's student, Department of Computer Science. Sivashanth is awarded copies of CAMIO Transcription Languages (LDC2022T07) and LORELEI Tamil Representative Language Pack (LDC2023T03) for his work in Tamil OCR systems.
Oshan Yalegama: University of Moratuwa (Sri Lanka): BSc, Electronic and Telecommunication Engineering. Oshan is awarded copies of CSR-I (WSJ0) Complete (LDC93S6A) and TIMIT Acoustic-Phonetic Continuous Speech Corpus (LDC93S1) for his work in audio signal processing.
Samer Mohammed Yaseen: Sana'a University (Yemen): PhD candidate, Faculty of Computer and Information Technology. Samer is awarded a copy of Arabic Newswire Part 1 (LDC2001T55) for his work in Arabic information retrieval.
________________________________
New publications:
RST Continuity Corpus<https://catalog.ldc.upenn.edu/LDC2024T08> was developed at Åbo Akademi University and Humboldt-Universität zu Berlin and contains annotations for continuity dimensions added to RST Discourse Treebank (LDC2002T07)<https://catalog.ldc.upenn.edu/LDC2002T07>. RST Discourse Treebank is a collection of English news texts from the Penn Treebank<https://catalog.ldc.upenn.edu/LDC99T42> annotated for rhetorical relations under the RST (Rhetorical Structure Theory) framework. In RST Continuity Corpus, the relations are annotated for the seven continuity dimensions: time, space, reference, action, perspective, modality, and speech act. The relations are also annotated for polarity, order of segments, nuclearity, and context.
2024 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
MultiTACRED<https://catalog.ldc.upenn.edu/LDC2024T09> was developed by the German Research Center for Artificial Intelligence (DFKI) Speech and Language Technology Lab<https://www.dfki.de/en/web/research/research-departments/speech-and-languag…> and is a machine translation of TAC Relation Extraction Dataset (LDC2018T24)<https://catalog.ldc.upenn.edu/LDC2018T24> (TACRED) into twelve languages with projected entity annotations. TACRED is a large-scale relation extraction dataset containing 106,264 examples built over English newswire and web text used in the NIST TAC KBP English slot filling evaluations during the period 2009-2014. The training and evaluation data for the TAC KBP slot filling tasks was developed by the Linguistic Data Consortium.
TACRED training, development, and test splits were translated into Arabic, Chinese, Finnish, French, German, Hindi, Hungarian, Japanese, Polish, Russian, Spanish, and Turkish using DeepL<https://www.deepl.com/> or Google Translate<https://translate.google.com>. The test split was back-translated into English to generate machine-translated English test data. TACRED annotations are specified by token offsets. For translation, tokens were concatenated with white space, and the entity offsets were converted into XML-style markers that denote the relation arguments.
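The marker conversion described above can be illustrated with a minimal Python sketch. It is not the MultiTACRED code: the function name, the tag names "H" and "T", and the assumption that spans are inclusive (start, end) token offsets are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Assumption: a span is an inclusive (start, end) pair of token offsets.
Span = Tuple[int, int]


def mark_entities(tokens: List[str], subj: Span, obj: Span,
                  subj_tag: str = "H", obj_tag: str = "T") -> str:
    """Join tokens with single spaces and wrap both argument spans in XML-style tags.

    The tag names are placeholders; the spans are assumed not to overlap.
    """
    opens = {subj[0]: f"<{subj_tag}>", obj[0]: f"<{obj_tag}>"}
    closes = {subj[1]: f"</{subj_tag}>", obj[1]: f"</{obj_tag}>"}
    pieces = []
    for i, tok in enumerate(tokens):
        # Attach an opening tag before and a closing tag after the boundary tokens.
        pieces.append(opens.get(i, "") + tok + closes.get(i, ""))
    return " ".join(pieces)


example = mark_entities(["Bill", "Gates", "founded", "Microsoft", "."], (0, 1), (3, 3))
print(example)  # <H>Bill Gates</H> founded <T>Microsoft</T> .
```

After translation, the markers in the translated string can be located again to recover the argument positions in the target language.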
2024 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
We are happy to release
SinaTools - Open Source Toolkit for Arabic NLP and NLU
We are excited to release SinaTools, an open-source toolkit for Arabic NLP and NLU. It consists of Python APIs, command-line tools, online demos, and many datasets, all free for both commercial and non-commercial use. It outperforms all related tools on all tasks in both speed and accuracy. It includes the following modules:
▸ Morphology Tagger: Lemmatizer, POS tagger, root tagger.
▸ WSD Tagger: Pipeline of semantic taggers: single-word WSD, multi-word WSD, and NER.
▸ Synonyms Generator: Extends a set of synonyms with more synonyms.
▸ Semantic Relatedness: Measures the association between two sentences across various dimensions: meaning, underlying concepts, domain-specificity, etc.
▸ Named Entity Recognition: Nested and flat NER, 21 entity types.
▸ Relation Extraction: Extracts events and their arguments (agents, locations, and dates).
▸ Diacritic-Based Matching: Decides whether two Arabic words are the same, taking into account diacritization compatibility.
▸ Utilities: A set of useful NLP methods for sentence splitting, duplicate word removal, Arabic Jaccard similarity metrics, transliteration, and others.
Try and Download: https://sina.birzeit.edu/sinatools.
Article:
Tymaa Hammouda, Mustafa Jarrar, Mohammed Khalilia: SinaTools: Open Source Toolkit for Arabic Natural Language Understanding <https://www.jarrar.info/publications/HJK24.pdf>. In Proceedings of the 2024 AI in Computational Linguistics (ACLING 2024), Procedia Computer Science, Dubai. Elsevier.
--Mustafa
__________________________
Mustafa Jarrar, PhD
Professor of Artificial Intelligence
Chair, PhD Program in Computer Science
Birzeit University, Palestine
Page: http://www.jarrar.info
SinaLab: https://sina.birzeit.edu
The Department of Linguistics at Boston University seeks to hire an Assistant Professor specializing in Computational Linguistics as part of a cluster hiring initiative in AI led by the Faculty of Computing & Data Sciences (CDS). Applications will be handled centrally through CDS, and we encourage candidates interested in the Department of Linguistics to explicitly include the keyword "Linguistics" in their primary research interests in their Academic Jobs Online (AJO) application. We will give full consideration to applications submitted by December 1, 2024, and will continue the review on a rolling basis until April 15, 2025.
The candidate will hold a primary appointment in Linguistics, with a secondary appointment in or affiliation with CDS. We expect the candidate to conduct research and teach courses in Computational Linguistics and related areas at all levels, and to advise undergraduate and graduate students. The candidate's research must be at the intersection of Linguistics and AI; this could include, but is not limited to: studying how linguistic theory and experimental linguistic methodology can productively inform the evaluation of new language technologies, how domain knowledge about specific languages (especially understudied languages) can help generalize the benefits of such technologies in more equitable ways, or how the newly emerging capacities of AI can expand computational linguists' modeling and analysis toolkit, potentially leading to methodological innovations and discoveries that were not possible before. We furthermore expect the candidate to hold a secondary specialization in at least one subfield of Linguistics. Requirements for this position include a PhD in a relevant discipline (e.g., Linguistics, Cognitive Science, Computer Science) in hand by the start date, plus demonstrated excellence in teaching and research.
For questions, please contact the search chair (Paul Hagstrom, hagstrom(a)bu.edu). For details on the cluster hiring initiative, and on how to apply, see the official advertisement from CDS below.
Official CDS announcement: https://www.bu.edu/cds-faculty/culture-community/join-us/faculty-positions/…
Application link: https://academicjobsonline.org/ajo/jobs/28310
Najoung Kim
Assistant Professor
Department of Linguistics & Computer Science (affiliate), Boston University
https://najoung.kim 🍪
====================================================================
CFP II Andaluz.IA Forum
December 20, 2024, Antigua Escuela de Magisterio (Universidad de Jaén)
*Deadline for submissions today*
====================================================================
Ten Andalusian universities, the Joint Research Center of the European
Commission, and several Andalusian researchers currently at other national
and international institutions are organizing the II Andaluz.IA Forum
<https://sites.google.com/view/andaluzia/home>, a meeting whose main
objective is to showcase the potential of the academic and research
community in Artificial Intelligence in our region and to give it
visibility. This forum seeks to highlight the work of Andalusian
scientists, both those currently working in Andalusia and those who have
spent part of their training or career in the region, regardless of their
current place of work.
The first edition of the Andaluz.IA forum
<https://sites.google.com/view/andaluzia2023/> was organized at Universidad
Pablo de Olavide in Seville, and this year it will take place at
Universidad de Jaén. With this second edition, we want to continue
highlighting Andalusia's great potential for research and academic
development in Artificial Intelligence, in areas such as machine learning,
deep learning, robotics, and natural language processing.
The event will be held in person on December 20, 2024 at the Antigua
Escuela de Magisterio (Universidad de Jaén). Interested researchers can
participate by presenting their results in oral or poster format, provided
that the work has been accepted at relevant conferences or journals in the
area. In addition, professionals and companies wishing to participate may
do so through sponsorship or direct participation by registering on the
event's website.
For more information on registration, submission of papers and forms of
sponsorship, please consult the following link:
https://sites.google.com/view/andaluzia/call-for-papers.
IMPORTANT DATES
● *Deadline for submission of papers: October 15, 2024*
● Notification of acceptance: November 4, 2024
● Deadline for registration at a reduced rate: November 15, 2024
CONTACT INFORMATION
maite(a)ujaen.es
sjzafra(a)ujaen.es
ORGANIZING COMMITTEE
https://sites.google.com/view/andaluzia/organisers
*Salud María Jiménez Zafra*
sjzafra(a)ujaen.es
Universidad de Jaén
Grupo de Investigación SINAI <http://sinai.ujaen.es/> | Departamento de
Informática
EPS Jaén, Edificio A3, Despacho 326
Campus Las Lagunillas s/n 23071 - Jaén | +34 953212992