Job Title: Postdoctoral Researcher in Natural Language Processing
Institution: Technical University of Munich (TUM)
Location: Heilbronn, Germany
Salary: TVL E-13 (possible option: E-14)
Contract: 100%, two years, renewal is possible
Start Date: September 1st, 2024
Responsibilities:
Conduct independent research in the area of Natural Language
Processing (NLP)
Develop and apply new NLP techniques to solve real-world problems
Publish high-quality research papers in top-tier journals and conferences
Supervise PhD students
Teach Bachelor's and Master's courses in Machine Learning and/or Natural
Language Processing
Qualifications:
PhD in NLP or a related field
Strong publication record in top-tier NLP conferences and journals
Expertise in one or more NLP techniques, such as machine translation,
natural language understanding, or natural language generation
Excellent programming skills in Python and other relevant programming
languages
Strong communication and presentation skills in English (German is not
necessary, but would be a plus; other languages are also of strong
interest)
Not required, but a big plus: teaching experience, supervising experience
Benefits:
Competitive salary and benefits package
Opportunity to work with leading researchers in the field of NLP
Access to state-of-the-art research facilities
Opportunity to publish your research in top-tier journals and conferences
Opportunity to teach and mentor students
About the School:
The School of Computation, Information and Technology at the Technical
University of Munich (TUM) is one of the leading research institutions
in the fields of Natural Language Processing and Machine Learning.
The new Fraser NLP lab at TUM is seeking a highly motivated and
talented postdoctoral researcher to join our team. We are located at
the new multi-university campus (the Bildungscampus, where the ETH
Zurich center will also be located in the future) in Heilbronn, near
Stuttgart.
The research interests of the lab include all areas of NLP, with a
particular focus on multilingual language models and machine
translation.
The postdoc position will involve work on low-resource languages in
both machine translation and multilingual language models.
To Apply:
Please submit a brief application letter, your CV, a research
statement, contact information for two references, and a list of
publications in a single PDF named postdoc_lastname_firstname.pdf (e.g.,
postdoc_fraser_alexander.pdf).
Please send the PDF (or any questions) to Dr. Lukas Edman at:
lukas(a)cis.lmu.de
Deadline:
Applications will be accepted on a rolling basis. We encourage you to apply
early.
--
Prof. Dr. Alexander Fraser
Chair for Data Analytics & Statistics
Technical University of Munich
School of Computation, Information and Technology
Heilbronn Campus
Web: http://alexfraser.github.io
Job Title: PhD Student Researcher in Natural Language Processing
Institution: Technical University of Munich (TUM)
Location: Heilbronn, Germany
Salary: TVL E-13
Contract: 100%, three years
Start Date: September 1st, 2024
Responsibilities:
Conduct research in the area of Natural Language Processing (NLP)
Develop and apply new NLP techniques to solve real-world problems
Publish high-quality research papers in top-tier journals and conferences
Supervise Bachelor's and Master's students
Assist in teaching Bachelor's and Master's courses in Machine Learning
and/or Natural Language Processing
Qualifications:
Master's degree in NLP, Computer Science, or a related field
Projects in NLP or machine learning
Excellent programming skills in Python and other relevant programming
languages
Strong communication and presentation skills in English (German is not
necessary, but would be a plus; other languages are also of strong
interest)
Benefits:
Competitive salary and benefits package
Opportunity to work with leading researchers in the field of NLP
Access to state-of-the-art research facilities
Opportunity to publish your research in top-tier journals and conferences
About the School:
The School of Computation, Information and Technology at the Technical
University of Munich (TUM) is one of the leading research institutions
in the fields of Natural Language Processing and Machine Learning.
The new Fraser NLP lab at TUM is seeking a highly motivated and
talented PhD student researcher to join our team. We are located at
the new multi-university campus (the Bildungscampus, where the ETH
Zurich center will also be located in the future) in Heilbronn, near
Stuttgart.
The research interests of the lab include all areas of NLP, with a
particular focus on multilingual language models and machine
translation.
For the PhD student position, two possible research foci are:
machine translation of metaphors and machine translation of
low-resource languages.
To Apply:
Please submit a brief application letter, your CV, a research
statement, contact information for two references, and a list of
publications in a single PDF named phd_lastname_firstname.pdf (e.g.,
phd_fraser_alexander.pdf).
Please send the PDF (or any questions) to Wen Lai at:
lavine(a)cis.lmu.de
Deadline:
Applications will be accepted on a rolling basis. We encourage you to apply
early.
--
Prof. Dr. Alexander Fraser
Chair for Data Analytics & Statistics
Technical University of Munich
School of Computation, Information and Technology
Heilbronn Campus
Web: http://alexfraser.github.io
In this newsletter:
LDC at LREC-COLING 2024
New publications:
Call My Net 1<https://catalog.ldc.upenn.edu/LDC2024S05>
Automatic Content Extraction for Portuguese<https://catalog.ldc.upenn.edu/LDC2024T05>
________________________________
LDC at LREC-COLING 2024
LDC will be exhibiting at LREC-COLING 2024<https://lrec-coling-2024.org/> hosted by the European Language Resources Association (ELRA) and the International Committee on Computational Linguistics (ICCL) May 20-25 in Turin, Italy. Stop by our table to learn more about recent developments at the Consortium and the latest publications.
LDC staff members will also be presenting current work on topics including Spanless Event Annotation for Corpus-Wide Complex Event Understanding; Schema Learning Corpus: Data and Annotation Focused on Complex Events; and KoFREN: Comprehensive Korean Word Frequency Norms Derived from Large Scale Free Speech Corpora.
LDC will post conference updates via social media. We look forward to seeing you in Italy!
________________________________
New publications:
Call My Net 1<https://catalog.ldc.upenn.edu/LDC2024S05> was developed by LDC and contains 364 hours of conversational telephone speech in four languages (Tagalog, Cebuano, Cantonese, and Mandarin) collected in 2015 from 221 native speakers located in the Philippines and China along with metadata and speaker demographic information. Recordings and data from this collection were used to support the NIST 2016 Speaker Recognition Evaluation<https://www.nist.gov/itl/iad/mig/speaker-recognition-evaluation-2016>.
Speakers made 10 telephone calls each to people within their existing social networks, using different handsets and under a variety of noise conditions. Speakers were connected through a robot operator to carry on casual conversations on topics of their choice. All recordings were manually audited to confirm language and speaker requirements. The documentation for this release includes metadata about phone type, noise conditions, and call quality. Speaker demographic information on year of birth, sex, and native language is also included.
2024 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
Automatic Content Extraction for Portuguese<https://catalog.ldc.upenn.edu/LDC2024T05> was developed at INESC TEC - Instituto de Engenharia de Sistemas e Computadores, Tecnologia e Ciência<https://www.inesctec.pt/en> and consists of automatic Brazilian Portuguese and European Portuguese translations of the English text and annotations in ACE 2005 Multilingual Training Corpus (LDC2006T06)<https://catalog.ldc.upenn.edu/LDC2006T06>.
ACE 2005 Multilingual Training Corpus was developed by LDC to support the Automatic Content Extraction (ACE)<https://www.ldc.upenn.edu/collaborations/past-projects/ace> program, specifically by providing training data for the 2005 technology evaluation. It contains 1,800 files of mixed genre text in Arabic, English, and Chinese annotated for entities, relations, and events. The objective of the ACE program was to develop automatic content extraction technology to support automatic processing of human language in text form. Text genres included newswire, broadcast news, broadcast conversation, weblog, discussion forums, and conversational telephone speech.
For this translation, the English data was partitioned into training, development, and test sets. The documents were split into sentences and each event mention was assigned to its sentence. Source sentences and their annotations were translated into Brazilian Portuguese using Google Translate<https://translate.google.com/> and into European Portuguese using DeepL Translate<https://www.deepl.com/en/translator>. An alignment algorithm and a parallel corpus word aligner were used to handle mismatches between translated annotations and their translated sentences.
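As an illustration of the sentence-assignment step described above, here is a minimal Python sketch. It assumes that sentences and event mentions are both given as character offsets into the same document; the function name, argument names, and data layout are hypothetical and are not taken from the corpus documentation.

```python
from bisect import bisect_right


def assign_mentions_to_sentences(sentence_spans, mentions):
    """Map each event mention to the index of the sentence that contains it.

    sentence_spans: sorted, non-overlapping (start, end) character offsets,
                    with `end` exclusive (hypothetical layout).
    mentions:       (mention_id, start, end) character offsets.
    Returns a dict mention_id -> sentence index, or None when the mention
    does not fit entirely inside one sentence.
    """
    starts = [start for start, _ in sentence_spans]
    assignment = {}
    for mention_id, m_start, m_end in mentions:
        # Last sentence whose start offset is <= the mention's start offset.
        idx = bisect_right(starts, m_start) - 1
        if idx >= 0 and m_end <= sentence_spans[idx][1]:
            assignment[mention_id] = idx
        else:
            assignment[mention_id] = None
    return assignment


sentences = [(0, 10), (11, 25)]
events = [("e1", 2, 6), ("e2", 12, 20), ("e3", 8, 14)]
print(assign_mentions_to_sentences(sentences, events))
```

Mentions that cross a sentence boundary (such as `e3` here) are mapped to `None` so that they can be inspected separately rather than silently attached to the wrong sentence.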
2024 members can access this corpus through their LDC account. Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
Event Notification Type: Test set and submission instructions released.
Website:
https://pan.webis.de/clef24/pan24-web/oppositional-thinking-analysis.html#s…
*TEST SET AND INSTRUCTIONS RELEASED*
*Oppositional Thinking Analysis PAN@CLEF*
Dear all,
As announced, we are pleased to let you know that the test set for the
evaluation phase, together with instructions on how to participate, has been
released and is available on the shared task website:
https://pan.webis.de/clef24/pan24-web/oppositional-thinking-analysis.html#s…
Please find below some important dates to keep in mind:
- February 23rd, 2024: Training Set Release
- May 15th, 2024: Test Set Release
- May 30th, 2024: Submission Deadline
- June 15th, 2024: Participant paper submission (midnight CEST)
- July 1st, 2024: Peer review notification
- July 7th, 2024: Camera-ready participant paper submission (midnight CEST)
Once again, thank you for your interest and support. If you have any
questions or need assistance at any point during the campaign, please feel
free to reach out to us.
Warm regards,
Francisco Rangel
on behalf of the Oppositional Thinking Analysis Task Committee
Dear colleagues,
Please find below the first call for new shared task proposals to be
presented during the Generation Challenges session of INLG.
=======================================
GenChal @ 17th International Conference on Natural Language Generation
Tokyo, Japan, September 23-27 2024
INLG Twitter: @inlgmeeting
INLG website: https://inlg2024.github.io/
=======================================
Submission deadline: June 24th 2024
We invite submissions of papers describing ideas for future shared tasks in
the general area of language generation (Generation Challenges 2024).
Proposed tasks can be in the area of core NLG, or in other research areas
in which language is generated. Examples include, but are not limited to:
data-to-text NLG, text-to-text generation (including MT and summarisation),
combining core NLG and MT, combining core NLG and text summarisation, NLG
quality estimation, NLG evaluation metrics, and/or generating language from
heterogeneous data, including image and video.
The Generation Challenges (GenChal) are an umbrella event designed to bring
together a variety of shared-task efforts that involve the generation of
natural language. This year, Generation Challenges will be held as a
workshop at the 17th International Conference on Natural Language
Generation (INLG 2024 <https://inlg2024.github.io/>), scheduled on
September 23-27 2024. The workshop will follow the format of previous
GenChal results sessions, with presentations of results by the organisers
of the generation challenges that are currently running, a poster session
for task participants to present their submissions, as well as
presentations of proposals for new shared tasks in the Task Proposals
Track, and discussion sessions. You can see some of the previous GenChal
tasks in the past GenChal proceedings on the ACL Anthology (see e.g. 2022
<https://aclanthology.org/volumes/2022.inlg-genchal/> or 2023
<https://aclanthology.org/volumes/2023.inlg-genchal/>) or on the dedicated
repository <https://sites.google.com/view/genchalrepository/home>.
Submissions should describe possible future tasks in detail, including
information regarding organisers, task description, motivating theoretical
interest and/or application context, size and state of completion of data
to be used, schedule and evaluation plans. Accepted shared tasks will be
run in the 2025 iteration of INLG.
Important dates
- Submission deadline: June 24th 2024
- Notification: July 15th 2024
- Camera-ready submission: August 16th 2024
- Workshop at INLG conference: September 23rd-24th 2024
All deadlines are 11:59 pm UTC-12 ("anywhere on Earth").
Submissions and format
Submissions in the Shared Task Proposals track should be no more than 4
(four) pages long excluding citations, and should follow the ACLPUB
formatting guidelines <https://acl-org.github.io/ACLPUB/formatting.html> (you
will find LaTeX style files and Microsoft Word templates under this link).
Proposals should be uploaded to the SoftConf
<https://softconf.com/n/inlg2024/user/scmd.cgi?scmd=submitPaperCustom&pageid…>
GenChal submission page, using the submission type "New shared task proposal".
Submissions will be peer-reviewed by the program committee. As reviewing
will not be blind, there is no need to anonymise papers.
This is not intended to be a selective process, since the aim is to discuss
new potential shared tasks with INLG delegates. However, the organisers
reserve the right to reject proposals which do not fall within the scope of
the GenChal initiative, or which do not follow guidelines. Accepted
submissions will be published in separate GenChal 2024 proceedings on the
ACL Anthology, as was done in 2022
<https://aclanthology.org/volumes/2022.inlg-genchal/> and 2023
<https://aclanthology.org/volumes/2023.inlg-genchal/>.
Looking forward to seeing you at INLG!
Miruna and Simon, GenChal chairs, on behalf of the INLG'24 organisers
*ADAPT Research Centre / Ionaid Taighde ADAPT*
*School of Computing, Dublin City University, Glasnevin Campus
/ Scoil na Ríomhaireachta,
Campas Ghlas Naíon, Ollscoil Chathair Bhaile Átha Cliath*
PrivateNLP 2024: Fifth Workshop on Privacy in Natural Language Processing at ACL 2024
Final Call For Papers
ACL PrivateNLP is a full day workshop taking place on August 15, 2024 in conjunction with ACL 2024.
Workshop website: https://sites.google.com/view/privatenlp/
Important Dates:
• [Extended] Submission Deadline: May 30, 2024
• Acceptance Notification: June 17, 2024
• Camera-ready versions: July 01, 2024
• Workshop: August 15, 2024
Privacy-preserving data analysis has become essential in the age of Large Language Models (LLMs), where access to vast amounts of data can provide gains over tuned algorithms. A large proportion of user-contributed data comes from natural language, e.g., text transcriptions from voice assistants.
It is therefore important to curate NLP datasets while preserving the privacy of the users whose data is collected, and to train LLMs that retain only non-identifying user data.
The workshop aims to bring together practitioners and researchers from academia and industry to discuss the challenges and approaches to designing, building, verifying, and testing privacy preserving systems in the context of Natural Language Processing.
Topics of interest include but are not limited to:
* Privacy in Large Language Models
* Generating privacy preserving test sets
* Inference and identification attacks
* Generating differentially private derived data
* NLP, privacy and regulatory compliance
* Private Generative Adversarial Networks
* Privacy in Active Learning and Crowdsourcing
* Privacy and Federated Learning in NLP
* User perceptions on privatized personal data
* Auditing provenance in language models
* Continual learning under privacy constraints
* NLP and summarization of privacy policies
* Ethical ramifications of AI/NLP in support of usable privacy
* Homomorphic encryption for language models
Submissions:
Accepted papers will be presented orally or as posters and included in the workshop proceedings. Submissions are open to all, and are to be submitted anonymously. All papers will be refereed through a double-blind peer review process by at least three reviewers with final acceptance decisions made by the workshop organizers.
OpenReview direct submission: https://openreview.net/group?id=aclweb.org/ACL/2024/Workshop/PrivateNLP
Organizers:
Sepideh Ghanavati, University of Maine
Abhilasha Ravichander, Allen AI
Niloofar Mireshghallah, University of Washington
Ivan Habernal, Paderborn University
Seyi Feyisetan, Amazon
Patricia Thaine, Private AI
Vijayanta Jain, University of Maine
Timour Igamberdiev, Technical University of Darmstadt
Contact us: privatenlp24-orga(a)lists.uni-paderborn.de
GermEval2024 Shared Task: GerMS-Detect -- Sexism Detection and Annotator Disagreement Prediction in German Online News Fora
=====================================================================================
2nd CALL FOR PARTICIPATION
We would like to invite you to the GermEval Shared Task GerMS-Detect on Sexism Detection and Annotator Disagreement Prediction in German Online News Fora, co-located with Konvens 2024 (https://konvens-2024.univie.ac.at/).
Competition Website: https://ofai.github.io/GermEval2024-GerMS/
Important Dates
------------------
Development phase: May 1 - June 5, 2024 (ongoing)
Competition phase: June 7 - June 25, 2024
Paper submission due: July 1, 2024
Camera ready due: July 20, 2024
Shared Task @KONVENS: September 10, 2024
Task description
------------------
This shared task is not just about detecting sexism/misogyny in comments posted, mostly in German, to the comment section of an Austrian online newspaper. Many of the texts to be classified contain ambiguous language, express misogyny or sexism in very subtle ways, or lack important context, so annotators can disagree considerably on the appropriate label, and in many cases there is no single correct label. The task therefore goes beyond correctly predicting a single label chosen from all the labels assigned by human annotators: it asks for models that can predict the level of disagreement, the range of labels assigned by annotators, or the distribution of labels to expect for a specific group of annotators.
For details see the Competition Website (https://ofai.github.io/GermEval2024-GerMS/).
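To make the prediction targets in the task description more concrete, here is a minimal Python sketch of how a label distribution and a simple disagreement score could be derived from the annotations of a single comment. The integer label values and the use of Shannon entropy as a disagreement measure are illustrative assumptions, not the official definitions or evaluation metrics of the task; please refer to the Competition Website for those.

```python
from collections import Counter
from math import log2


def label_distribution(labels):
    """Relative frequency of each label among one comment's annotations."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def disagreement(labels):
    """Shannon entropy (in bits) of the label distribution.

    0.0 means all annotators chose the same label; larger values mean the
    annotations are spread more evenly over more labels. This is only one
    possible measure, chosen here for illustration.
    """
    dist = label_distribution(labels)
    if len(dist) == 1:
        return 0.0
    return -sum(p * log2(p) for p in dist.values())


# Hypothetical annotations: four annotators, integer labels.
annotations = [0, 0, 1, 2]
print(label_distribution(annotations))
print(disagreement(annotations))
```

A model for this kind of task could be trained to predict the distribution directly, or a summary of it such as the score above, instead of a single majority label.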
Organizers
------------
The task is organized by the Austrian Research Institute for Artificial Intelligence (OFAI).
Organizing team
------------------
Brigitte Krenn (brigitte.krenn (AT) ofai.at)
Johann Petrak (johann.petrak (AT) ofai.at)
Stephanie Gross (stephanie.gross (AT) ofai.at)
I have written several little text manipulation tools that I would like anyone interested to try out and give me comments and suggestions on. They are pure JavaScript (no libraries) and work using arrays so they can handle texts up to about the size of a novel. They can be used from their website or alternatively saved and used offline on any device from a phone to a laptop.
The tools include:
vlviewtext.html: a tool for viewing a text file in either text or concordance mode with fast switching between the two views.
vlmakelist.html: a tool for creating a wordlist or frequency list from a text file, the former as a csv file, the latter as html or csv
vltaglist.html: a tool for creating or editing tagged wordlists with up to three levels of tags
The tools may be found at:
https://vincilingua.ca/Tools/index.html
Each comes with a basic online manual, and sample texts in English and French may be found on the site.
Please send any comments or suggestions to me directly at lessardg(a)protonmail.com.
With thanks in advance,
Greg Lessard
Dear All,
We invite paper submissions to the Workshop on COuntering Disinformation
with AI (CODAI), which will take place on 20 October at ECAI 2024.
*Website:* https://codai2024.github.io/
*Important dates*
Submission deadline: 24th May 2024
Accept/Reject Communications: 1st July 2024
Camera-ready papers due: 22nd July 2024
Workshop date: 20 October 2024
All deadlines are 11:59 pm UTC-12 (“anywhere on earth”).
*Overview*
Social media platforms, which were designed primarily to allow users to
create and share content with others, have become integral parts of modern
communication, enabling people to connect with each other and to broadcast
information to a wider audience. On the one hand, these platforms
facilitate discussion in an open and free environment. On the other hand,
new societal issues have started emerging on them. Among these issues,
misinformation has been prevalent. Misinformation is an umbrella term that
encompasses fake news, hoaxes, and rumors, to name a few. While
misinformation refers to the unintentional spread of inauthentic
information, disinformation refers to the spread of inauthentic
information with malign intent.
*Topics*
Areas of interest include, but are not limited to, the following:
- Information diffusion models for understanding and thwarting the
spread of low-quality information;
- Characterization and detection of coordinated inauthentic behavior;
- Novel techniques for detecting malicious accounts (e.g., bots, cyborgs
and trolls);
- Understanding and detection of disinformation;
- Study, inference and detection of narratives in disinformation
campaigns;
- Impact/Harm of misinformation on society;
- Case-studies on the spread and impact of fake news in controversial
topics such as politics, health, climate change, economics, migration;
- Social and psychological studies, or data analytics related to
misinformation spreaders;
- Metrics, tools and methods for measuring the impact of fake news and
of coordinated inauthentic behaviors;
- Datasets for evaluation.
*Submission Link:* https://chairingtool.com/
*Submission Types*
*Original submissions:* The submissions will be reviewed through a
double-blind process and must remain anonymous. They can be either short
papers (2-4 pages) or long papers (6-8 pages), with additional pages
allowed for references.
*Non-archival option:* In addition to regular paper submissions, authors
have the option of submitting previous research or an abstract as a
non-archival submission.
Accepted submissions will be presented at the workshop as oral
presentations.
*Format and styling*
Submissions should be formatted according to the ECAI formatting
instructions and not exceed 7 pages (plus 1 extra page for references).
All submissions should use the ECAI 2024 template and formatting
requirements specified by ECAI.
Please send any questions about the workshop to codaihelp(a)gmail.com
*Organisers*
Rajesh Sharma, University of Tartu, Estonia
Anselmo Peñas, Universidad Nacional de Educación a Distancia (UNED), Spain