Job offer: Researcher for Multimodal Fake-News and Disinformation Detection at DFKI Berlin
Final reminder: Application deadline: Jan 23, 2024.
The German Research Center for Artificial Intelligence (DFKI) has operated as a non-profit public-private partnership (PPP) since 1988. DFKI combines scientific excellence and commercially oriented value creation with social awareness and is recognized as a major "Center of Excellence" by the international scientific community. As Germany’s biggest public and independent organisation dedicated to AI research and development, DFKI has focused on the goal of human-centric AI for more than 30 years. Its research is committed to essential, future-oriented areas of application and socially relevant topics.
We are looking for a highly motivated research assistant to join our existing team and work on a project focused on fake-news and disinformation detection from speech and multimedia data. Content authenticity verification of speech combined with other modalities such as text, visuals or metadata will be a central part of the work. Explainable AI (xAI) and bias analysis are also highly relevant to the position.
The successful candidate will work closely with high-impact partners in this field, e.g. Technical University of Berlin, RBB (Berlin TV and news broadcaster), Deutsche Welle (Germany's broadcaster abroad), and 5 other partners.
Responsibilities will include developing and testing different AI/NLP models and techniques, analyzing the performance of machine learning models in the context of practical tools that help journalists fight fake news and disinformation, and communicating project progress and results to relevant stakeholders. The position offers opportunities for pursuing a doctorate and publishing research results in scientific journals and conferences.
Qualified candidates will have a completed university degree in (technical) computer science or computational linguistics, excellent programming skills in Python, and a strong background in machine learning/AI and signal processing or NLP. Previous experience in the field of fake-news or spoofing / authenticity detection of multimedia data is an advantage.
DFKI offers an agile and lively international and interdisciplinary environment for working in a self-determined manner. If you are interested in contributing to cutting-edge research and working with a dynamic team, please apply!
More details and link: https://jobs.dfki.de/en/vacancy/researcher-m-f-d-547585.html
Application deadline: Jan 23, 2024.
If you have any questions, please don’t hesitate to contact tim.polzehl(a)dfki.de
--
Dr.-Ing. Tim Polzehl
Senior Researcher
Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI)
German Research Center for Artificial Intelligence
Speech & Language Technology
Associate Senior Researcher
Technische Universität Berlin
Quality and Usability Lab
DFKI Labor Berlin
Alt-Moabit 91c, D-10559 Berlin, Germany
Tel.: +49.30.238951863
Fax: +49 30 23895 1810
E-Mail: tim.polzehl(a)dfki.de
-------------------------------------------------------------
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH
Trippstadter Straße 122, 67663 Kaiserslautern, Germany
Geschäftsführung:
Prof. Dr. Antonio Krüger (Vorsitzender)
Helmut Ditzer
Vorsitzender des Aufsichtsrats:
Dr. Ferri Abolhassan
Amtsgericht Kaiserslautern, HRB 2313
-------------------------------------------------------------
Journal of Data Mining and Digital Humanities (JDMDH)
organizes a special issue about the topic
Chinese Natural Language Processing for Digital Humanities (CNLP4DH)
As a reminder, JDMDH is an international journal managed by French
national research institutions and is green open access (no charges for
readers or authors).
This special issue is dedicated to natural language processing for digital
humanities involving documents written in Chinese, including Modern,
Ancient and dialectal Chinese. Work on Mandarin, the official national and
main common language, is accepted, and research on texts written in other
languages, such as those of Tibet and Inner Mongolia, is also welcome.
Suitable topics include, but are not limited to:
- Text analysis and processing related to humanities using computational
methods
- Dataset creation and curation for NLP (e.g. digitization, datafication,
and data preservation).
- Research on cultural heritage collections such as national archives and
libraries using NLP
- NLP for error detection, correction, normalization and denoising data
- Generation and analysis of literary works such as poetry and novels
- Analysis and detection of text genres
- Word segmentation, part-of-speech tagging of Ancient Chinese
- Large Language Models (LLM) for Chinese in Digital Humanities
- Cross modal Models (text-speech-video-image) for Chinese in Digital
Humanities
- Visualization of text analytics
- Ontology models for natural language text
- Applications in Chinese Literature, Traditional Chinese medicine,
Learning Chinese language as second language, Sentiment Analysis in Chinese
Social Media, China Cultural Heritage, Chinese History, Ancient Chinese
language
Submission guidelines: https://jdmdh.episciences.org/page/submissions
Paper submission: https://jdmdh.episciences.org/submit
Website and more details: https://jdmdh.episciences.org/page/chinese-natural-language-processing-for-digital-humanities-cnlp4dh#
Email: jdmdh(a)episciences.org
Guest Editors:
Dr. Wenhe FENG (Guangdong University of Foreign Studies, Laboratory of
Language Engineering and Computing)
Dr. Bin LI (Nanjing Normal University, School of Chinese Language and
Literature, Center of Linguistic Big Data and Computational Humanities)
Dr. Nicolas TURENNE (Guangdong University of Foreign Studies, School of
Information Science and Technology)
Dr. Tong WEI (Beijing University, Digital Humanities Center)
************************************************************
June 10-13, 2024
Held in conjunction with the UMR Parsing Workshop, June 14, 2024
University of Colorado, Boulder
https://umr4nlp.github.io/web/SummerSchool.html
Impressive progress has been made in many aspects of natural language processing (NLP) in recent years. Most notably, the achievements of transformer-based large language models such as ChatGPT would seem to obviate the need for any type of semantic representation beyond what can be encoded as contextualized word embeddings of surface text. Advances have been particularly notable in areas where large training data sets exist, and it is advantageous to build an end-to-end training architecture without resorting to intermediate representations. For any truly interactive NLP applications, however, a more complete understanding of the information conveyed by each sentence is needed to advance the state of the art. Here, "understanding" entails the use of some form of meaning representation. NLP techniques that can accurately capture the required elements of the meaning of each utterance in a formal representation are critical to making progress in these areas and have long been a central goal of the field. As with end-to-end NLP applications, the dominant approach for deriving meaning representations from raw textual data is through the use of machine learning and appropriate training data. This allows the development of systems that can assign appropriate meaning representations to previously unseen text.
In this four-day course, instructors from the University of Colorado and Brandeis University will describe the framework of Uniform Meaning Representations (UMRs), a recent cross-lingual, multi-sentence incarnation of Abstract Meaning Representations (AMRs), that addresses these issues and comprises such a transformative representation. Incorporating Named Entity tagging, discourse relations, intra-sentential coreference, negation and modality, and the popular PropBank-style predicate argument structures with semantic role labels into a single directed acyclic graph structure, UMR builds on AMR and keeps the essential characteristics of AMR while making it cross-lingual and extending it to be a document-level representation. It also adds aspect, multi-sentence coreference and temporal relations, and scope. Each day will include lectures and hands-on practice.
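For readers new to graph-based meaning representations, here is a minimal sketch (not UMR tooling) of the kind of structure described above. It encodes the textbook AMR-style graph for "The boy wants to go" as triples and checks two properties: that the graph is acyclic, and which nodes are shared by more than one predicate (re-entrancy). The triple layout and the helper names are assumptions made purely for illustration; a real UMR annotation additionally carries document-level information such as aspect, modality, and cross-sentence coreference.

```python
from collections import defaultdict

# AMR-style graph for "The boy wants to go" as (source, role, target) triples.
# Variable names (w, b, g) and the triple layout are illustrative assumptions.
triples = [
    ("w", ":instance", "want-01"),
    ("b", ":instance", "boy"),
    ("g", ":instance", "go-02"),
    ("w", ":ARG0", "b"),   # the boy is the wanter
    ("w", ":ARG1", "g"),   # what is wanted: the going event
    ("g", ":ARG0", "b"),   # the boy is also the goer (re-entrancy)
]


def edges(graph):
    """Return (source, target) pairs for relations between variables."""
    return [(s, t) for s, role, t in graph if role != ":instance"]


def reentrant_nodes(graph):
    """Nodes with more than one incoming relation."""
    indegree = defaultdict(int)
    for _, target in edges(graph):
        indegree[target] += 1
    return sorted(n for n, d in indegree.items() if d > 1)


def is_acyclic(graph):
    """Depth-first search that fails when it revisits a node on the current path."""
    children = defaultdict(list)
    for source, target in edges(graph):
        children[source].append(target)
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        if state.get(node) == "done":
            return True
        if state.get(node) == "visiting":
            return False  # back edge: a cycle
        state[node] = "visiting"
        ok = all(visit(child) for child in children[node])
        state[node] = "done"
        return ok

    return all(visit(node) for node in list(children))


print(is_acyclic(triples))
print(reentrant_nodes(triples))
```

The node `b` is reached from both `w` and `g`, which is why the structure is a graph rather than a tree while still containing no cycle.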
Topics to be covered June 10-13:
1. The basic structural representation of UMR and its application to multiple languages;
2. How UMR encodes different types of MWE (multi-word expressions), discourse and temporal relations, and TAM (tense-aspect-modality) information in multiple languages, and differences between AMR and UMR;
3. Going from IGT (interlinear glossed text) to UMR graphs semi-automatically;
4. Formal semantic interpretation of UMR incorporating a continuation-based semantics for scope phenomena involving modality, negation, and quantification;
5. Extension to UMR for encoding gesture in multimodal dialogue, Gesture AMR (GAMR), which aligns with speech-based UMR to account for situated grounding in dialogue.
The fifth day of the summer school, June 14, will be co-located with a UMR Parsing Workshop, focusing on parsing algorithms that generate AMR and UMR representations over multiple languages.
https://umr4nlp.github.io/web/UMRParsingWorkshop.html
To apply, please complete this form by Jan. 30, 2024.
https://www.colorado.edu/linguistics/umrs-boulder-summer-school-application
Other important dates:
● Notification of acceptance: Feb. 20, 2024
● Confirmation of participation: Mar. 1, 2024
● Arrival in Boulder June 9, departure June 15, 2024.
Participation will be fully funded (reasonable airfare, lodging, and meals). This summer school has been made possible by funding from NSF Collaborative Research: Building a Broad Infrastructure for Uniform Meaning Representations (Award # 2213805), with additional support from the University of Colorado Boulder and the CLEAR Center.
*First Call for Participation*
The fourth edition of the MEDIQA shared tasks
<https://sites.google.com/view/mediqa-shared-tasks> includes three tasks on
Multimodal Medical Answer Generation & Medical Error Correction, organized
at CLEF <https://clef2024.imag.fr/> & NAACL-ClinicalNLP 2024
<https://clinical-nlp.github.io/2024/>.
- *Website:* https://sites.google.com/view/mediqa2024
*1) Multimodal & Multilingual Medical Answer Generation *
The rapid development of telecommunication technologies, the increased
demand for healthcare services, and recent pandemic needs have
accelerated the adoption of remote clinical diagnosis and treatment. In
addition to live meetings with doctors, which may be conducted by
telephone or video, asynchronous options such as e-visits, emails, and
messaging chats have also proven to be cost-effective and convenient. We
focus on the problem of clinical dermatology multimodal query response
generation. Consumer health question answering has been the subject of past
challenges and research; however, these prior works focus only on text.
Previous work on visual question answering has focused mainly on radiology
images and did not include additional clinical text input. Also, while there
is much work on dermatology image classification, much of it is related to
lesion malignancy classification for dermatoscope images. To the best of
our knowledge, this is the first challenge and study of a problem that
seeks to automatically generate clinical responses given a textual clinical
history as well as user-generated images and queries.
MEDIQA-MAGIC <https://www.imageclef.org/2024/medical/mediqa>: Multimodal &
Generative Telemedicine in Dermatology *@ CLEF 2024, September 2024,
Grenoble, France*
- Participants will be given textual inputs which may include clinical
history and a query, along with one or more associated images. The
task will consist in generating a relevant textual response.
MEDIQA-M3G <https://sites.google.com/view/mediqa2024/mediqa-m3g>:
Multilingual & Multimodal Medical Answer Generation *@ NAACL-ClinicalNLP,
June 2024, Mexico City, Mexico*
- Inputs will include text that gives clinical context and queries, as
well as one or more images. The challenge will tackle the generation of a
relevant textual response to the query. Participants can opt to work
on one or multiple languages: *Chinese* (Simplified), *English*, and
*Spanish*.
*2) Medical Error Detection & Correction *
Large language models (LLMs) show promise in being applied to unseen tasks
with competitive ability. However, by construction, such models have a key
vulnerability: their ability is only as good as their underlying training
data. Since LLMs rely on large corpora of textual data (often from the World
Wide Web) for training, their data is almost impossible to manually curate at
scale. If the data contains false information or only one perspective or
type of information, the ability of LLMs to discern factual information may
be hindered. Also, as a consequence of their own success, some online content
may be entirely generated by LLMs, which are prone to hallucinating
information. In addition, in specialized domains, online information can be
unreliable, harmful, and contain logical inconsistencies that may hinder
the models' reasoning ability. However, most previous work on common sense
detection has focused on the general domain. In this task, we seek to
address the problem of identifying and correcting (common sense) medical
errors in clinical notes. From a human perspective, these errors require
medical expertise and knowledge to be both identified and corrected.
MEDIQA-CORR
<https://sites.google.com/view/mediqa2024/mediqa-corr?authuser=0>: Medical
Error Detection & Correction *@ NAACL-ClinicalNLP, June 2024, Mexico City,
Mexico*
- Participants will be given a snippet of clinical text and asked to
(i) detect whether the text includes a medical error, (ii) identify the
text span associated with the error, if a medical error exists, and
(iii) provide a free-text correction.
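As a rough illustration of how the three subtasks fit together, the sketch below models one system prediction and checks that its fields are mutually consistent. This is not the official submission format: the class name, field names, the use of character offsets for the span, and the example sentence are all assumptions invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CorrectionOutput:
    """Hypothetical container for one prediction (not the official format)."""
    has_error: bool                        # (i) does the text contain a medical error?
    error_span: Optional[Tuple[int, int]]  # (ii) character offsets [start, end) of the error
    correction: Optional[str]              # (iii) free-text correction


def is_consistent(text: str, out: CorrectionOutput) -> bool:
    """Check that the three fields agree with each other and with the input text."""
    if not out.has_error:
        # No error flagged: there should be no span and no correction.
        return out.error_span is None and out.correction is None
    if out.error_span is None or not out.correction:
        return False
    start, end = out.error_span
    return 0 <= start < end <= len(text)


# Invented example sentence, used only to exercise the code.
note = "Patient with strep throat was prescribed acetaminophen to clear the infection."
start = note.index("acetaminophen")
pred = CorrectionOutput(
    has_error=True,
    error_span=(start, start + len("acetaminophen")),
    correction="Patient with strep throat was prescribed penicillin to clear the infection.",
)

print(is_consistent(note, pred))
print(note[pred.error_span[0]:pred.error_span[1]])
```

Participants should consult the task website for the actual input and output formats.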
Contact
- For more updates, join our mailing list
https://groups.google.com/g/mediqa-nlp
- If you have any questions, please email us at
mediqa-nlp(a)googlegroups.com
Organizers
- Asma Ben Abacha
<https://www.microsoft.com/en-us/research/people/abenabacha/>,
Microsoft, USA
- Wen-wai Yim <https://www.linkedin.com/in/wen-wai-yim-b20b2420>,
Microsoft, USA
- Meliha Yetisgen <https://faculty.washington.edu/melihay/>, University
of Washington, USA
- Fei Xia <https://faculty.washington.edu/fxia/>, University of
Washington, USA
- Martin Krallinger <https://www.bsc.es/krallinger-martin>, Barcelona
Supercomputing Center (BSC), Spain
If you are interested in applying large language models (LLMs) for
information extraction (IE),
we have open internship and PhD positions at LIPN / LLF (CNRS) in Paris.
Details can be found here: http://tinyurl.com/3pkcnwyb
Best regards,
Nadi Tomeh
LIPN-CNRS
Université Sorbonne Paris Nord
The Digital Linguistics Group at Bielefeld University is seeking applications for a
** Full-time (100%) Research Assistant / Ph.D. Student position **
to work in a newly established project on computationally modelling linguistic creativity in reference games between interactive dialogue agents. The project “B02” is part of a newly established Collaborative Research Center (CRC 1646) on “Linguistic Creativity in Communication”, funded by the German Research Foundation (DFG) for 4 years.
The project will investigate human speakers' linguistic creativity in iterated dialogue tasks in changing environments and model the creative formation of reference strategies in artificial dialogue agents whose linguistic knowledge is represented in a language model trained on interaction data. The main goal is to develop a computational referring expression generation and dialogue architecture that accounts for individual and partner-specific linguistic creativity and incorporates general reasoning and planning mechanisms that explore and transform the agent's language model according to the needs of an ongoing interaction. It will be carried out by the Digital Linguistics Group (Prof. Hendrik Buschmeier) and the Computational Linguistics Group (Prof. Sina Zarrieß) at Bielefeld University, and will encompass experimental and computational methods.
The announced position will focus on analyzing dialogue data and computationally modeling the dialogue and interactional components of a dialogue agent. In addition, the PhD student will work on related empirical and computational aspects of linguistic creativity, in collaboration with other projects of the CRC. A project description is available here: https://www.techfak.uni-bielefeld.de/~hbuschme/files/CRC1646-B02.pdf
The duration of the position is about 3 1/2 years (until end of 2027). Salary is 100% TVL-E13 scale (about 4.000,- EUR per month before taxes, depending on relevant work experience).
Bielefeld is the vibrant center of the region of East Westphalia and Germany’s greenest big city with a lot of cultural, entertainment, and recreational opportunities. It is located in the center of Germany, surrounded by beautiful forests, and connected to Germany’s high-speed rail system. Bielefeld University is a strong research-oriented university with more than 20.000 students and a famous commitment to interdisciplinary research. It hosts major research centers such as the Center for Cognitive Interaction Technology (CITEC) or the Center for Interdisciplinary Research (ZiF).
The application deadline is January 25, but later applications will also be considered until the position has been filled. If you are interested in learning more about the position, please contact Hendrik Buschmeier (hbuschme(a)uni-bielefeld.de).
For information on how to apply please refer to:
https://uni-bielefeld.hr4you.org/job/view/3067/research-positions?page_lang…
More information on the CRC:
https://www.uni-bielefeld.de/fakultaeten/linguistik-literaturwissenschaft/f…
--
Hendrik Buschmeier
Digital Linguistics Lab
Faculty of Linguistics and Literary Studies, Bielefeld University
https://purl.org/net/hbuschme
*Apologies for cross-posting*
In 2024, WILDRE is hosting a *Shared Task on Code-mixed Less-resourced
Sentiment Analysis for Indo-Aryan Languages.*
Code-mixing, the dynamic interplay of multiple languages within a single
discourse, is a widespread linguistic phenomenon observed in multilingual
societies. Code-mixing is particularly intriguing when observed in closely
related languages.
We invite you to participate in our shared task at the WILDRE workshop,
which is co-located with LREC-COLING 2024. This shared task addresses the
complexities of code-mixed data from less-resourced similar languages for
sentiment analysis. We will provide annotated data for the following
code-mixed languages:
1. Magahi-Hindi-English
2. Bangla-English-Hindi
3. Hindi-English
The evaluation will be in two different Tracks:
*A. Track 1:* Given training and validation data, determine a comment's
polarity (positive, negative, neutral or mixed) in the same code-mixed
setting.
1. Hindi-English
2. Magahi-Hindi-English
3. Bangla-English
4. All language pairs combined (1+2+3)
*B. Track 2:* Given unlabelled test data for the code-mixed Maithili
language (Maithili-Hindi-English), leverage any or all of the available
training datasets in Track 1 to determine the sentiment of a comment in the
target language.
Important Links:
- Registration Link <https://forms.gle/HVRK1W1hHqBwtgpu6>
- WILDRE Workshop Link <http://sanskrit.jnu.ac.in/conf/wildre7/index.jsp>
- GitHub
<https://github.com/wildre-workshop/wildre-7_code-mixed-sentiment-analysis>
Important Dates:
- Dec 22, 2023: Registration
- Jan 10, 2024: Train and Validation Data set Release [to get the data,
please register]
- Feb 15, 2024: Test Set Release
- Feb 23, 2024: System Submission Due
- Feb 29, 2024: System Results
- March 15, 2024: System Description Paper Due
- March 28, 2024: Paper notification of acceptance
*SEM brings together researchers interested in the semantics of (many and diverse!) natural languages and their computational modeling. The conference embraces data-driven, neural, and probabilistic approaches, as well as symbolic approaches and everything in between; practical applications as well as theoretical contributions are welcome. The long-term goal of *SEM is to provide a stable forum for the growing number of NLP researchers working on all aspects of the semantics of (many and diverse!) natural languages.
Topics of interest:
* Lexical semantics and word representations
* Compositional semantics and sentence representations
* Statistical, machine learning, and deep learning methods in semantic tasks
* Multilingual and cross-lingual semantics
* Word sense disambiguation and induction
* Semantic parsing, and syntax-semantics interface
* Frame semantics and semantic role labeling
* Textual inference, textual entailment, and question answering
* Formal approaches to semantics
* Extraction of events and of causal and temporal relations
* Entity linking, pronouns and coreference
* Discourse, pragmatics, and dialogue
* Machine reading
* Extra-propositional aspects of meaning
* Multiword and idiomatic expressions
* Metaphor, irony, and humor
* Knowledge mining and acquisition
* Common sense reasoning
* Language generation
* Semantics in NLP applications: sentiment analysis, abusive language detection, summarization, fact-checking, etc.
* Multidisciplinary research on semantics
* Grounding and multimodal semantics
* Psycholinguistics
* Interpretability and Explainability
* Human semantic processing
* Semantic annotation, evaluation, and resources
* Ethical aspects and bias in semantic representations
We encourage authors to think about the ethical aspects of their work, and to address and discuss all ethical questions and implications relevant to their research. STARSEM values reproducibility and particularly welcomes submissions that adhere to the reproducibility guidelines as specified here.
Submission Instructions
Submissions must describe unpublished work and be written in English. We solicit both long and short papers. Please note that double submissions must be declared at submission time.
Long papers describe original research and may consist of up to eight (8) pages of content, plus unlimited pages for references. Appendices are allowed after the references, but the paper should be self-contained, and reviewers will not be required to check the appendices, if any. Final versions of long papers will be given one additional page of content (up to 9 pages) so that reviewers' comments can be taken into account. Short papers describe original focused research and may consist of up to four (4) pages, plus unlimited pages for references. Upon acceptance, short papers will be given five (5) content pages in the proceedings. Authors are encouraged to use this additional page to address reviewers' comments in their final versions.
Submissions should follow the ARR formatting requirements. The deadline for direct submissions is Feb 22, 2024, and these submissions will be reviewed by the *SEM-2024 program committee. ACL Rolling Review (ARR) submissions can be committed to *SEM up to March 22, 2024 (authors of ARR-reviewed papers need to include their OpenReview link with reviews in the submission form). Both types of submissions are made through OpenReview. Limitations and Ethics Statement sections are allowed and encouraged, but they are not mandatory. They should be placed after the conclusion and will not count towards the overall page limit. *SEM has no special policy against multiple submissions, but the Program Chairs should be notified of them.
Submission link: https://openreview.net/group?id=aclweb.org/StarSEM/2024/Conference
Important Dates
Direct submission deadline: Feb 22, 2024
ARR-reviewed paper submission deadline: Mar 22, 2024
Notification of acceptance: Apr 22, 2024
Camera-ready deadline: May 5, 2024
Conference date: Jun 16, 2024
[apologies for cross-posting]
ISIR, in Paris, has an open position for a two-year non-permanent junior
researcher / postdoc in Machine Translation / Large Language Models.
Details available here:
https://emploi.cnrs.fr/Offres/CDD/UMR7222-FRAYVO-001/Default.aspx?lang=EN
Please apply before Jan 31, 2024.
Best
F
--
François Yvon
ISIR/CNRS
4 Place Jussieu
F 75005 Paris
https://fyvo.github.io
The TEICAI Workshop is welcoming paper commitments from ARR; due to the delay in EACL notifications, we are extending the deadline to January 20th.
Workshop website: https://sites.google.com/view/teicai2024
Submission link: https://softconf.com/eacl2024/TEICAI-2024/
Submission Deadline: 20 January 2024 (Anywhere on Earth)
Authors are required to submit their paper alongside the reviews it received, provided as a supplementary PDF file. In cases where the paper has undergone revisions since its original submission, authors should also include a separate file briefly outlining the changes made. The acceptance of these papers for the workshop will be determined by the organizers based on the paper's quality and relevance to the workshop's theme.
We are also pleased to announce that our sponsor, e-COST ACTION Language in the Human-Machine Era (LITHME), is offering two to three travel grants for authors of selected accepted papers. More information about LITHME can be found at https://lithme.eu/.
Workshop Organizers:
Sviatlana Höhn, LuxAI, Luxembourg
Nina Hosseini-Kivanani, Faculty of Science, Technology and Medicine (FSTM), University of Luxembourg, Luxembourg
Dimitra Anastasiou, Luxembourg Institute of Science and Technology, Luxembourg
Angela Soltan, State University of Moldova, Moldova
Bettina Migge, University College Dublin, Ireland
Doris Dippold, University of Surrey, UK
Fred Philippy, Zortify, Luxembourg
Ekaterina Kamlovskaya, Translatables
Program Committee:
A list of program committee members is available on the workshop website.
For any preliminary questions, you're welcome to reach out to teicai2024(a)gmail.com.
You can follow us on LinkedIn (TEICAI) and Twitter (teicai2024) to get more updates about the workshop.
On behalf of the organizers
Nina Hosseini-Kivanani
University of Luxembourg