Dear list members (esp. those in the Japanese community),
for a cross-linguistic evaluation of co-reference annotations, I was
interested in looking into the NAIST Coreference Corpus, which is based
on the Kyoto Corpus. Luckily, both annotations are available, but not the
primary text. According to the documentation of both corpora, it is
necessary to acquire the Mainichi Shimbun CD-ROM (1995) first. I really
tried my best, and I followed several catalogues (incl.
https://www.jaist.ac.jp/project/NLP_Portal/doc/LR/lr-cat-e.html#jp:mainichi…),
but the URL it points to (
https://www.nichigai.co.jp/sales/mainichi/mainichi-data.html) isn't
operational any more. Does anyone know where and how to buy that CD-ROM? Is
there another way to get access to that data?
Thanks a lot,
Christian
Journal of Data Mining and Digital Humanities (JDMDH)
organizes a call for papers about the topic
Chinese Natural Language Processing for Digital Humanities (CNLP4DH)
As a reminder, JDMDH is an international journal managed by French
national research institutions and is green open access (no charge for
readers and authors).
This special issue is dedicated to natural language processing for digital
humanities involving the documents written in Chinese, including Modern,
Ancient and dialectal Chinese. Work on Mandarin, the official national and
main common language, is accepted, and research on texts written in other
languages, such as those of Tibet and Inner Mongolia, is also welcome.
Suitable topics include, but are not limited to:
- Text analysis and processing related to humanities using computational
methods
- Dataset creation and curation for NLP (e.g. digitization, datafication,
and data preservation).
- Research on cultural heritage collections such as national archives and
libraries using NLP
- NLP for error detection, correction, normalization and denoising of data
- Generation and analysis of literary works such as poetry and novels
- Analysis and detection of text genres
- Word segmentation and part-of-speech tagging of Ancient Chinese
- Large Language Models (LLM) for Chinese in Digital Humanities
- Cross-modal models (text-speech-video-image) for Chinese in Digital
Humanities
- Visualization of text analytics
- Ontology models for natural language text
- Applications in Chinese Literature, Traditional Chinese medicine,
Learning Chinese language as second language, Sentiment Analysis in Chinese
Social Media, China Cultural Heritage, Chinese History, Ancient Chinese
language
Website and more details:
https://jdmdh.episciences.org/page/chinese-natural-language-processing-for-…
Submission guidelines: https://jdmdh.episciences.org/page/submissions
Paper submission: https://jdmdh.episciences.org/submit
Guest Editors:
Dr. Wenhe FENG (Guangdong University of Foreign Studies, Laboratory of
Language Engineering and Computing)
Dr. Bin LI (Nanjing Normal University, School of Chinese Language and
Literature, Center of Linguistic Big Data and Computational Humanities)
Dr. Nicolas TURENNE (Guangdong University of Foreign Studies, School of
Information Science and Technology)
Dr. Tong WEI (Beijing University, Digital Humanities Center)
************************************************************************************
Second Call for papers: CALD-pseudo workshop on Computational Approaches to Language Data Pseudonymization @ EACL 2024, March 21 or 22, 2024
Website:
https://mormor-karl.github.io/events/CALD-pseudo/
Submission website: https://softconf.com/eacl2024/CALD-pseudo-2024/
Submission Deadline: Monday, 18 December 2023
We invite submissions to the first edition of the CALD-pseudo workshop on Computational Approaches to Language Data Pseudonymization, to be held at EACL 2024 on March 21 or 22, 2024.
[Important Dates]
* December 18, 2023: paper submission deadline
* January 17, 2024: resubmission of already pre-reviewed ARR papers
* January 20, 2024: notification of acceptance
* January 30, 2024: camera-ready papers due
* March 21 or 22, 2024: workshop date (the date to be confirmed by the EACL)
[Introduction]
Accessibility of research data is critical for advances in many research fields, but textual data often cannot be shared due to the personal and sensitive information it contains, e.g. names, political opinions, sensitive personal information and medical data. The General Data Protection Regulation, GDPR (EU Commission, 2016), suggests pseudonymization as a solution to secure open access to research data, but we need to learn more about pseudonymization as an approach before adopting it for manipulation of research data (Volodina et al., 2023). The main challenge is how to effectively pseudonymize data so that individuals cannot be identified, while at the same time keeping the data usable for the research for which it was collected in, among others, computational linguistics, linguistics and natural language processing.
[Topics of Interest]
The CALD-pseudo workshop invites a broad community of researchers in all concerned cross-disciplinary fields to jointly discuss challenges within pseudonymization, such as
* automatic approaches to detection and labelling of personal information in unstructured language data, including events and other context-dependent cues revealing a person;
* developing context-sensitive algorithms for replacement of personal information in unstructured data;
* studies into the effects of pseudonymization on unstructured data, e.g. applicability of pseudonymised data for the intended research questions, readability of pseudonymised data or addition of unwelcome biases through pseudonymization;
* effectiveness of pseudonymization as a way of protecting writer identity;
* reidentification studies, e.g. adversarial learning techniques that attempt to breach the privacy protections of pseudonymized data;
* constructing datasets for automatic pseudonymization, including methodological and ethical aspects of those;
* approaches to the evaluation of automatic pseudonymization both in concealing the private information and preserving the semantics of the non-personal data;
* pseudonymization tools and software: evaluating the available tools and software for pseudonymization in different languages, and their ease of use, scalability, and performance;
* and numerous other open questions.
[Submission Guidelines]
Authors are invited to submit original and unpublished research papers in the following categories by December 18, 2023:
* Full papers (up to 8 pages) for substantial contributions
* Short papers (up to 4 pages) for ongoing or preliminary work
All submissions must be in PDF format, must follow the EACL 2024 guidelines described in the ARR CfP (https://aclrollingreview.org/cfp), and use the official ACL style templates available here: https://github.com/acl-org/acl-style-files
Direct submission deadline: December 18, 2023 at https://softconf.com/eacl2024/CALD-pseudo-2024/
Deadline for registration of ARR reviewed papers: January 17, 2024. (Further instructions will follow.)
We also invite authors of papers on the topics of the workshop accepted to Findings to reach out to the organizing committee of CALD-pseudo to present them at the workshop.
[Invited speakers]
We are happy to announce that the workshop will host two invited speakers:
* Anders Søgaard, University of Copenhagen, Denmark
* Ildikó Pilán, the Norwegian Computing Center, Norway
[Workshop Organizers]
* Elena Volodina, University of Gothenburg, Sweden
* Therese Lindström Tiedemann, University of Helsinki, Finland
* Simon Dobnik, University of Gothenburg, Sweden
* Xuan-Son Vu, Umeå University, Sweden
[Program Committee]
A list of program committee members is available on the workshop website.
[Contact]
For inquiries, please contact mormor.karl(a)svenska.gu.se
ACL link to the call: https://www.aclweb.org/portal/content/computational-approaches-language-dat…
___________________
Elena Volodina, PhD, Docent
https://spraakbanken.gu.se/en/about/staff/elena
Life is like a mirror. Smile at it and it smiles back at you.
Peace Pilgrim
[apologies for potential cross-posting]
==================================================================================================
Bridging Neurons and Symbols for Natural Language Processing and
Knowledge Graphs Reasoning @ LREC-COLING 2024
=====================================
Co-located with LREC-COLING in Turin, Italy
21st May 2024
Workshop webpage: https://neusymbridge.github.io/
Call for Papers
--------------------
The 1st Workshop on Bridging Neurons and Symbols for Natural Language
Processing and Knowledge Graphs Reasoning — to be held at LREC-COLING
2024 — will promote two directions for exploring neural reasoning:
starting from existing neural networks to enhance the reasoning
performance with the target of symbolic-level reasoning, and starting
from symbolic reasoning to explore its novel neural implementation.
These two directions will ideally meet somewhere in the middle and will
lead to representations that can act as a bridge for novel neural
computing, which qualitatively differs from traditional neural networks,
and for novel symbolic computing, which inherits the good features of
neural computing. Hence the name of our workshop, with a focus on
Natural Language Processing and Knowledge Graph reasoning.
Topics (include, but are not limited to)
--------------------------------------------------
• Proposing novel knowledge representations that are derived from
transdisciplinary research
• Using knowledge graphs or other types of symbolic Knowledge to improve
the quality of LLMs
• Exploring the reasoning mechanism of LLMs
• Distilling symbolic knowledge from LLMs
• Proposing benchmark datasets and evaluation metrics for
neuro-symbolic approaches to NLP tasks
• Proposing novel NLP tasks for neuro-symbolic approaches
• NLP applications in classification, sense-disambiguation, sentiment
analysis, question-answering, knowledge graph reasoning
• Critical analysis of traditional deep learning or LLMs
• Analysing spatial reasoning of LLMs
• Proposing novel neural computing that may reach symbolic-level reasoning
• Proposing benchmark datasets and metrics to evaluate the gap between
neural reasoning and symbolic reasoning
• Addressing efficiency issues in neuro-symbolic systems
• Identifying challenges and opportunities of neuro-symbolic systems
• Developing retrieval augmented models for combining KG and LLMs
• Applying neuro-symbolic approaches to humor generation and other
real-life applications
Submissions:
------------------
• The papers should be submitted as a PDF document, conforming to the
formatting guidelines provided in the call for papers of LREC-COLING
conference (https://lrec-coling-2024.org/authors-kit/)
• Submissions via Softconf/START Conference Manager
at https://softconf.com/lrec-coling2024/neusymbridge2024/
Important Dates
---------------------
• Submission Deadline: March 3rd
• Notification of Acceptance: April 10th
• Camera Ready Deadline: April 21st
• Workshop: May 21st
Keynotes
--------------------------------
• Pascale Fung - The Hong Kong University of Science and Technology
• Alessandro Lenci - Università di Pisa
• Juanzi Li - Tsinghua University
• Volker Tresp - Ludwig Maximilian University of Munich
Organisation Committee
--------------------------------
• Tiansi Dong - Fraunhofer IAIS
• Erhard Hinrichs - University of Tübingen
• Zhen Han - Amazon Inc.
• Kang Liu - Chinese Academy of Sciences
• Yangqiu Song - The Hong Kong University of Science and Technology
• Yixin Cao - Singapore Management University
• Christian F. Hempelmann - Texas A&M-Commerce
• Rafet Sifa - University of Bonn
Programme Committee
-------------------------------
• Claire Bonial - U.S. Army DEVCOM Army Research Laboratory
• Meiqi Chen - Peking University
• Shuo Chen - Ludwig Maximilian University of Munich
• Hejie Cui - Emory University
• Xinyu Dai - Nanjing University
• Zifeng Ding - Ludwig Maximilian University of Munich
• Kathrin Erk - The University of Texas at Austin
• Irlan G Gonzalez - Bosch Center for Artificial Intelligence
• Shizhu He - Institute of Automation, Chinese Academy of Sciences
• Bailan He - Ludwig Maximilian University of Munich
• Jens U. Kreber - Saarland University
• Sandra Kübler - Indiana University
• Hang Li - Ludwig Maximilian University of Munich
• Honglei Li - Northumbria University
• Yong Liu - Plunk
• Xinze Liu - Nanyang Technological University
• Xin Liu - Amazon Inc.
• Tong Liu - Ludwig Maximilian University of Munich
• Yunfei Long - Essex University
• Yubo Ma - Nanyang Technological University
• Emanuele Marconato - University of Trento
• Petra Osenova - University of Sofia
• Parth Padalkar - University of Texas at Dallas
• Martha Palmer - University of Colorado
• Barbara Plank - Ludwig Maximilian University of Munich
• Julia Rayz - Purdue University
• Ryan Riegel - IBM Research
• Timo Schick - Meta AI
• Christoph Schommer - University of Luxembourg
• Wangtao Sun - Institute of Automation, Chinese Academy of Sciences
• Xun Wang - Microsoft Corporation
• Jingpei Wu - Ludwig Maximilian University of Munich
• Kai Xiong - Harare Institute of Technology
• Yuan Yang - Georgia Institute of Technology
• Michihiro Yasunaga - Stanford University
• Jiahao Ying - Singapore Management University
• Ziqian Zeng - South China University of Technology
• Hongming Zhang - Tencent AI Lab, Seattle
• Gengyuan Zhang - Ludwig Maximilian University of Munich
==================================================================================================
Call for papers: Second Workshop on Computation and Written Language (CAWL
2024)
CAWL 2024 will be held in conjunction with LREC-COLING 2024 on May 21 in
Torino, Italy. The workshop will feature an invited talk by Nizar Habash
(NYU Abu Dhabi), and has a special theme for workshop submissions: Writing
Systems of Africa. Annual CAWL workshops are organized under the guidance
of the newly formed ACL Special Interest Group on Writing Systems and
Written Language (SIGWrit). We welcome submissions of scientific papers to
be presented at the workshop and archived in the ACL Anthology. Please see
explicit submission guidelines below, including details on topics of
interest and the special workshop theme, and see the workshop webpage
https://sigwrit.org/workshops/cawl2024/ for additional relevant information.
Most work in NLP focuses on language in its canonical written form. This
has often led researchers to ignore the differences between written and
spoken language or, worse, to conflate the two. Instances of conflation are
statements like “Chinese is a logographic language” or “Persian is a
right-to-left language”, variants of which can be found frequently in the
ACL anthology. These statements confuse properties of the language with
properties of its writing system. Ignoring differences between written and
spoken language leads, among other things, to conflating different words
that are spelled the same (e.g., English bass), or treating as different,
words that have multiple spellings (e.g., Japanese umai ‘tasty’, which can
be written 旨い, うまい, ウマい, or 美味い).
Furthermore, methods for dealing with written language issues (e.g.,
various kinds of normalization or conversion) or for recognizing text input
(e.g. OCR & handwriting recognition or text entry methods) are often
regarded as precursors to NLP rather than as fundamental parts of the
enterprise, despite the fact that most NLP methods rely centrally on
representations derived from text rather than (spoken) language. This
general lack of consideration of writing has led much of the research on
such topics to appear largely outside of ACL venues, in conferences or
journals of neighboring fields such as speech technology (e.g., text
normalization) or human-computer interaction (e.g., text entry).
This workshop will bring together researchers who are interested in the
relationship between written and spoken language, the properties of written
language, the ways in which writing systems encode language, and
applications specifically focused on characteristics of writing systems.
Topics of interest include but are not limited to:
- Text entry
- Text tokenization
- Disambiguation of abbreviations and homographs
- Grapheme-to-phoneme conversion, transliteration, and diacritization
- Text normalization for speech and for processing "informal" genres of
text
- Computational study of literary devices involving writing systems,
such as eye dialect
- Information-theoretic and machine-learning approaches to decipherment
- Methods for specialized text genres, e.g., clinical notes
- Optical character (incl. handwriting) recognition and historical
document processing
- Orthographic representation for unwritten languages
- Spelling error detection and correction
- Script normalization and encoding
- Writing system typology and its relevance to speech and language
processing
We invite submissions on the relationship between written and spoken
language, the properties of written language, the ways in which writing
systems encode language, and applications specifically focused on
characteristics of writing systems.
Additionally, we particularly encourage, and will prioritize, papers on the
special theme of the workshop: Writing Systems of Africa. African languages make
use of a wide variety of writing systems, from those based on the
Perso-Arabic or Latin scripts throughout Africa, the Ge'ez script in the
Horn of Africa, or the Tifinagh script for Berber languages in North
Africa, to recently invented writing systems such as the Adlam alphabet
created for Fula. Issues arising from the adaptation of scripts to new
languages, such as Ajami or orthographies using the Latin script, would be
of interest. For example, the primary language of instruction in the
schools of Mali is French, so that speakers of Bambara, despite not
generally being taught to read that language in the schools, will often
make use of either the Latin script that they learned via French in school
or the Perso-Arabic (Ajami) script from religious instruction to write
their language. Bambara is also sometimes written with the modern N'Ko
script. Given this diversity of options, Bambara written language can be
extremely varied, presenting major challenges to corpus building and
automatic language processing methods.
Important dates:
Paper submission deadline: February 22, 2024 (anywhere in the world)
Notification of acceptance: March 25, 2024
Camera-ready paper due: April 5, 2024
Workshop date: May 21, 2024
Submission Guidelines
Please submit short (4 page) or long (8 page) submissions in PDF format to
https://softconf.com/lrec-coling2024/cawl2024/. Both short and long paper
submissions will be reviewed in the same process. Authors should follow the
formatting guidelines of LREC-COLING 2024, available in the authors kit (
https://lrec-coling-2024.org/authors-kit/), and we will follow the paper
submission and reviewing policies detailed in the LREC-COLING 2024 call for
papers (https://lrec-coling-2024.org/2nd-call-for-papers/). Note that, as
with the main conference, reviewing is double-anonymous, i.e., reviewers
will not know author identity and vice versa, hence no author information
should be included in the papers; self-reference that identifies the
authors should be avoided or anonymised. Accepted papers will appear in the
workshop proceedings in the ACL anthology.
For questions about the submission guidelines, please contact workshop
organizers at cawl.workshop.2024(a)gmail.com.
Organizers:
- Kyle Gorman <https://wellformedness.com/>, Graduate Center, City
University of New York & Google, USA
- Emily Prud’hommeaux <http://cs.bc.edu/~prudhome/>, Boston College, USA
- Brian Roark <https://lanzaroark.org/brian-roark/>, Google, USA
- Richard Sproat <https://rws.xoba.com/>, Google DeepMind, Japan
Program Committee:
- David Ifeoluwa Adelani <https://dadelani.github.io/>, University
College London, UK
- Manex Agirrezabal <https://manexagirrezabal.github.io/>, University of
Copenhagen, Denmark
- Sina Ahmadi <https://sinaahmadi.github.io/>, George Mason University,
USA
- Cecilia Alm <https://www.rit.edu/directory/coagla-cecilia-alm>,
Rochester Institute of Technology, USA
- Mark Aronoff <https://linguistics.stonybrook.edu/faculty/mark.aronoff/>,
Stony Brook University, USA
- Steven Bedrick
<https://www.ohsu.edu/school-of-medicine/csee/steven-bedrick>, Oregon
Health & Science University, USA
- Taylor Berg-Kirkpatrick <https://cseweb.ucsd.edu/~tberg/>, UC San
Diego, USA
- Amalia Gnanadesikan
<https://scholar.google.com/citations?user=HkNhAoAAAAAJ&hl=en>,
University of Maryland, USA
- Christian Gold
<https://www.fernuni-hagen.de/english/research/clusters/catalpa/about-catalp…>,
CATALPA, FernUniversität in Hagen, Germany
- Alexander Gutkin <https://research.google/people/AlexanderGutkin/>,
Google, UK
- Nizar Habash
<https://nyuad.nyu.edu/en/academics/divisions/science/faculty/nizar-habash.h…>,
NYU Abu Dhabi, United Arab Emirates
- Yannis Haralambous
<https://www.imt-atlantique.fr/en/person/yannis-haralambous>, IMT
Atlantique & CNRS Lab-STICC, France
- Cassandra Jacobs <https://www.acsu.buffalo.edu/~cxjacobs/>, University
at Buffalo, USA
- Martin Jansche
<https://scholar.google.com/citations?user=z8yPdQQAAAAJ&hl=en>, Amazon,
UK
- Kathryn Kelley
<https://www.unibo.it/sitoweb/kathrynerin.kelley/research>, Università
di Bologna, Italy
- George Kiraz <https://www.ias.edu/scholars/george-kiraz>, Princeton
University, USA
- Christo Kirov <https://ckirov.github.io/>, Google, USA
- Jordan Kodner <https://jkodner05.github.io/>, Stony Brook University,
USA
- Anoop Kunchukuttan <http://anoopk.in/>, Microsoft, India
- Yang Li <https://npuliyang.github.io/>, Northwestern Polytechnical
University, China
- Constantine Lignos <https://lignos.org/>, Brandeis University, USA
- Zoey Liu <https://zoeyliu18.github.io/>, University of Florida, USA
- Jalal Maleki <https://liu.se/en/employee/jalma87>, Linköping
University, Sweden
- M. Willis Monroe <https://www.willismonroe.com/>, University of New
Brunswick, Canada
- Gerald Penn <http://www.cs.toronto.edu/~gpenn/>, University of
Toronto, Canada
- Yuval Pinter <https://www.cs.bgu.ac.il/~pintery/>, Ben-Gurion
University of the Negev, Israel
- William Poser <https://billposer.org/>, independent scholar, Canada
- Shruti Rijhwani <https://shrutirij.github.io/>, Google, USA
- Maria Ryskina <https://ryskina.github.io/>, MIT, USA
- Anoop Sarkar
<https://www.sfu.ca/computing/people/faculty/anoopsarkar.html>, Simon
Fraser University, Canada
- Lane Schwartz <http://dowobeha.github.io/>, University of Alaska,
Fairbanks, USA
- Djamé Seddah <http://pauillac.inria.fr/~seddah/>, Sorbonne University
& Inria, France
- Shuming Shi
<https://scholar.google.com/citations?user=Lg31AKMAAAAJ&hl=en>, Tencent,
China
- Claytone Sikasote <https://csikasote.github.io/>, University of Zambia
(UNZA), Zambia
- Fabio Tamburini <https://corpora.ficlit.unibo.it/People/Tamburini/>,
University of Bologna, Italy
- Kumiko Tanaka-Ishii <https://www.cl.rcast.u-tokyo.ac.jp/Top.html>,
University of Tokyo, Japan
- Lawrence Wolf-Sonkin
<https://aclanthology.org/people/l/lawrence-wolf-sonkin/>, Google, USA
- Martha Yifiru Tachbelie
<https://scholar.google.com/citations?user=9N37SgoAAAAJ>, Addis Ababa
University, Ethiopia
Call for Participation
We are announcing the first BEA (2024) shared task on automated prediction of Difficulty And Response Time for Multiple Choice Questions (DART-MCQ).
Motivation
For standardized exams to be fair and valid, test questions, otherwise known as items, must meet certain criteria. One important criterion is that the items should cover a wide range of difficulty levels to gather information about the abilities of test takers effectively. Additionally, it is essential to allocate an appropriate amount of time for each item: too little time can make the exam speeded, while too much time can make it inefficient.
There is growing interest in predicting item characteristics such as difficulty and response time based on the item text. However, due to difficulties with sharing exam data, efforts to advance the state-of-the-art in item parameter prediction have been fragmented and conducted in individual institutions, with no transparent evaluation on a publicly available dataset. In this Shared Task, we bridge this gap by sharing practice item content and characteristics from a high-stakes medical exam called the United States Medical Licensing Examination® (USMLE®) for the exploration of two topics: predicting item difficulty (Track 1) and item response time (Track 2) based on item text.
Participation
The shared task has two separate tracks as follows:
• Track 1: Given the item text and metadata, predict the item difficulty variable.
• Track 2: Given the item text and metadata, predict the time intensity variable.
Important Dates
Training data release: January 15
Test data release: February 10
Results due: February 16
Announcement of winners: February 21
Paper submissions due: March 10
Camera-ready papers due: April 22
Links
For more information about the shared task, see: https://sig-edu.org/sharedtask/2024
Organizers
Victoria Yaneva, National Board of Medical Examiners
Peter Baldwin, National Board of Medical Examiners
Kai North, George Mason University
Brian Clauser, National Board of Medical Examiners
Saed Rezayi, National Board of Medical Examiners
Yiyun Zhou, National Board of Medical Examiners
Le An Ha, Ho Chi Minh City University of Foreign Languages - Information Technology (HUFLIT)
Polina Harik, National Board of Medical Examiners
The SemEval-2024 Task 8 test set is now available!
(apologies for cross-posting)
For “Multigenerator, Multidomain, and Multilingual Black-Box
Machine-Generated Text Detection”, we have prepared machine-generated and
human-written texts in multiple languages.
You can access the test set at the link below:
https://drive.google.com/drive/folders/10DKtClzkwIIAatzHBWXZXuQNID-DNGSG?us…
Submit your solution by 31 January 2024.
The task description and the training data are available at:
https://github.com/mbzuai-nlp/SemEval2024-task8
Hello all,
I’m posting this on behalf of Suzan Verberne for a vacancy in our joint 4D Picture project, please note the deadline is rapidly approaching …
We have a vacancy for a postdoctoral researcher on Natural Language Processing in the Health and Medical Domain, in Leiden, the Netherlands:
https://www.lumc.nl/en/about-lumc/werken-bij/vacancies/d.23.bh.ak.116-postd…
The deadline for application is January 22nd.
-----
Postdoc researcher Natural Language Processing in the Health and Medical Domain
-----
About your role
-----
The position is part of the interdisciplinary, international Horizon Europe project 4D PICTURE (https://4dpicture.eu/). The 4D PICTURE project aims to improve shared decision making between patients with cancer (and their families) and healthcare providers, by using a design method called ‘MetroMapping’ to improve care paths. For these aims, the project draws on large amounts of evidence from different types of European data.
The postdoc position is embedded in work package 3: ‘Text mining and citizen science’, under the supervision of Suzan Verberne, professor of Natural Language Processing. The key project tasks of the postdoc are medical named entity recognition, medical entity linking, and analysis of written (informal) data by patients and healthcare providers. You will work with Dutch-language data, but fluency in Dutch is not required for the position.
There is space in the position to engage in curiosity-driven research in the context of domain-specific NLP. Method development, paper writing, and participation in key conferences are part of the job, and grant proposal writing for personal development is encouraged.
As a postdoc researcher, your key responsibilities will include conducting research in the area of health/medical NLP and actively participating in activities of the 4D PICTURE project (project meetings, research collaboration, organizational activities). You will also co-supervise BSc, MSc and PhD students on topics related to domain-specific NLP. Lastly, you will actively participate in the Text Mining and Retrieval research group (group meetings, research collaboration).
About you
-----
- A PhD in Natural Language Processing or a strongly related field.
- Knowledge of the health/medical domain and domain-specific NLP.
- First author papers published in respected and relevant conference proceedings or journals.
- Good writing skills and proficiency in the English language.
- Able to work independently, in a team, and in a student (co-)supervisory role.
- An academic, creative, and curious mindset.
- Willing to learn Dutch on a basic level.
Our offer
-----
Getting better by breaking new ground; that's our mission. This applies not only to healthcare, but also to our employees. In order to be able to continue to learn and develop, we offer internal and external training. You are also entitled to an end-of-year bonus (8,3%), holiday allowance, sports budget and bicycle scheme. Furthermore, as an employee of LUMC, you are also affiliated with the ABP pension fund. This means that 70% of your pension premium is paid by LUMC, leaving you with a higher net salary.
About your workplace
-----
You will be appointed as a researcher in the interdisciplinary European project 4D PICTURE (Work package 3: Text mining and citizen science). Your appointment is at the Leiden University Medical Center (LUMC), and you will have a guest appointment and office in the Leiden Institute of Advanced Computer Science (LIACS), where you will be embedded in the Text Mining and Retrieval Group to work on Natural Language Processing. The research group has many active collaborations, weekly group meetings, and discussions among all group members. The 4D PICTURE project is a stimulating, interdisciplinary environment that offers many opportunities for expanding your network.
--
Suzan Verberne, full professor
Leiden Institute of Advanced Computer Science
Email: s.verberne(a)liacs.leidenuniv.nl
http://liacs.leidenuniv.nl/~verbernes
http://tmr.liacs.nl
--
Paul Rayson
Director of UCREL and Professor of Natural Language Processing
SCC Data Theme Lead
School of Computing and Communications, InfoLab21, Lancaster University, Lancaster, LA1 4WA, UK.
Web: https://www.research.lancs.ac.uk/portal/en/people/Paul-Rayson/
Tel: +44 1524 510357
Contact me on Teams<https://teams.microsoft.com/l/chat/0/0?users=p.rayson@lancaster.ac.uk>
*******************************************************
EAMT 2024: The 25th Annual Conference of
The European Association for Machine Translation
24 - 27 June 2024
Sheffield, UK
https://eamt2024.sheffield.ac.uk/
@eamt_2024 (X account)
Keynote speaker: Alexandra Birch (University of Edinburgh, UK)
Paper submission deadline: 08 March 2024
More information:
https://eamt2024.sheffield.ac.uk/conference-calls/call-for-papers
*******************************************************
The European Association for Machine Translation (EAMT) invites everyone
interested in machine translation (MT) and translation-related tools and
resources ― developers, researchers, users, translation and localization
professionals and managers ― to participate in this conference.
Driven by the state of the art, the research community will demonstrate
their cutting-edge research and results. Professional MT users will provide
insights into successful implementation of MT in business scenarios as
well as implementation scenarios involving large corporations, governments,
or NGOs. Translation scholars and translation practitioners are also
invited to share their first-hand MT experience, which will be addressed
during a special track.
Note that papers that have been archived in arXiv can be accepted for
submission provided that they have not already been published elsewhere.
EAMT 2024 has four tracks, namely Research: Technical, Research:
Translators & Users, Implementations & Case Studies, and Products &
Projects.
*** Research: technical ***
Submissions (up to 10 pages, plus unlimited pages for references and
appendices) are invited for reports of significant research results in any
aspect of MT and related areas. Such reports should include a substantial
evaluation component, or have a strong theoretical and/or methodological
contribution where results and in-depth evaluations may not be appropriate.
Papers are welcome on all topics in the areas of MT and translation-related
technologies, including, but not limited to:
- Deep-learning approaches for MT and MT evaluation
- Advances in classical MT paradigms: statistical, rule-based, and hybrid
approaches
- Comparison of various MT approaches
- Technologies for MT deployment: quality estimation, domain adaptation,
etc.
- Resources and evaluation
- MT in special settings: low resources, massive resources, high volume,
low computing resources
- MT applications: translation/localization aids, speech translation,
multimodal MT, MT for user generated content (blogs, social networks), MT
in computer-aided language learning, etc.
- Linguistic resources for MT: corpora, terminologies, dictionaries, etc.
- MT evaluation techniques, metrics, and evaluation results
- Human factors in MT and user interfaces
- Related multilingual technologies: natural language generation,
information retrieval, text categorization, text summarization, information
extraction, optical character recognition, etc.
Papers should describe original work. They should emphasise completed work
rather than intended work, and should indicate clearly the state of
completion of the reported results. Where appropriate, concrete evaluation
results should be included.
Papers should be anonymized, prepared according to the templates specified
below, and be no longer than 10 pages (plus unlimited pages for references
and appendices). Submit the paper as a PDF to OpenReview:
https://openreview.net/group?id=EAMT.org/2024/Technical_Track. Submissions
that do not conform to the required styles may be rejected without review.
** Track co-chairs
Rachel Bawden (Inria, Paris)
Víctor M Sánchez-Cartagena (University of Alicante)
*** Research: translators & users ***
Submissions (up to 10 pages, plus unlimited pages for references and
appendices) are invited for academic research on all topics related to how
professional translators and other types of MT users interact with, are
affected by, or conceptualise MT. Papers should report significant research
results with a strong theoretical and/or methodological contribution.
Topics for the track include, but are not limited to:
- The impact of MT and post-editing: including studies on processes,
effort, strategies, usability, productivity, pricing, workflows, and
post-editese
- Human factors and psycho-social aspects of MT adoption (ergonomics,
motivation, and social impact on the profession, relationship between user
profiles and MT adoption)
- Emerging areas for MT & post-editing: e.g. audiovisual, game
localisation, literary texts, creative texts, social media, health care
communication, crisis translation
- MT and ethics
- The impact of using translators’ metadata and user activity data for
monitoring their work
- The evaluation and reception of different modalities of translation:
human translation, post-edited, raw MT
- MT and interpreting
- Human evaluations of MT output
- MT for gisting and the impact of MT on users: use cases, expectations,
perceptions, trust, views on acceptability
- MT and usability
- MT and education/language learning
- MT in the translation/interpreting classroom
Papers should describe original work. They should emphasise completed work
rather than intended work, and should indicate clearly the state of
completion of the reported results.
Papers should be anonymized, prepared according to the templates specified
below, and be no longer than 10 pages (plus unlimited pages for references
and appendices). Submit the paper as a PDF to OpenReview:
https://openreview.net/group?id=EAMT.org/2024/Research_Translators_Users_Tr….
Submissions that do not conform to the required styles may be rejected
without review.
** Track co-chairs
Patrick Cadwell (DCU)
Ekaterina Lapshinova-Koltunski (University of Hildesheim)
*** Implementations & case studies ***
Submissions (approximately 4–6 pages) are invited for reports on case
studies and implementation experience with MT in organisations of all
types, including small businesses, large corporations, governments, NGOs,
or language service providers. We also invite translation practitioners to
share their views and observations based on their day-to-day experience
working with MT in a variety of environments.
Topics for the track include, but are not limited to:
- Integrating or optimising MT and computer-assisted translation in
translation production workflows (translation memory/MT thresholds, mixing
online and offline tools, using interactive MT, dealing with MT confidence
scores)
- Managing change when implementing and using MT (e.g. switching between
multiple MT systems, limiting degradations when updating or upgrading an MT
system)
- Implementing open-source MT (e.g. strategies to get support, reports on
taking pilot results into full deployment, examples of advanced
customization sought and obtained thanks to the open-source paradigm,
collaboration within open-source MT projects)
- Evaluating MT in a real-world setting (e.g. error detection strategies
employed, metrics used, productivity or translation quality gains achieved)
- Ethical and confidentiality issues when using MT, especially MT in the
cloud
- Using MT in social networking or real-time communication (e.g. enterprise
support chat, multilingual content for social media)
- MT and usability
- Implementing MT to process multilingual content for assimilation purposes
(e.g. cross-lingual information retrieval, MT for e-discovery or spam
detection, MT for highly dynamic content)
- MT in literary, audiovisual, game localization and creative texts
- Impact of MT and post-editing on translation practices and the
profession: processes, effort, compensation
- Psycho-social aspects of MT adoption (ergonomics, motivation, and social
impact on the profession)
- Error analysis and post-editing strategies (including automatic
post-editing and automation strategies)
- The use of translators’ metadata and user activity data in MT development
- Freelance translators’ independent use of MT
- MT and interpreting
Papers should highlight real-world use scenarios, solutions, and problems
in addition to describing MT integration processes and project settings.
Where solutions do not seem to exist, suggestions for MT researchers and
developers should be clearly emphasized. For papers on implementations and
case studies produced by academics, we require co-authorship with the
actual organizations working with MT implementations.
Papers (approximately 4–6 pages, with a maximum of 10 pages -- plus
unlimited pages for references) should be formatted according to the
templates specified below and submitted as PDF files to OpenReview:
https://openreview.net/group?id=EAMT.org/2024/Implementations_Case_Studies_….
Anonymization is not required in the Implementations & Case Studies track
submissions. Submissions that do not conform to the required styles may be
rejected without review.
** Track co-chairs
Vera Cabarrão (Unbabel)
Konstantinos Chatzitheodorou (Strategic Agenda)
*** Products & Projects ***
Submissions (2 pages, including references) are invited on either of the
subtracks (Products or Projects).
- Products: Tools for MT, computer-aided translation, and other translation
technologies (including commercial products and free/open-source
software). Descriptions should include information about product
availability and licensing, an indication of cost if applicable, basic
functionality, (optionally) a comparison with other products, and a
description of the technologies used. The authors should be ready to
present the tools in the form of demos or posters during the conference.
- Projects: Research projects, funded through grants obtained in
competitive public or private calls related to MT. Descriptions should
contain: project title and acronym, funding agency, project reference,
duration, list of partner institutions or companies in the consortium if
there is one, project objectives, and a summary of partial results
available or final results if the project has ended. The authors should be
ready to present the projects in the form of posters during the conference.
This follows on from the successful ‘project villages’ held at the last
EAMT conferences.
There will be a poster boaster session for this track, in which authors
will have 120 seconds to attract attendees to their posters or demos with a
two-slide presentation.
Submissions should be formatted according to the templates specified
below. Anonymization is not required. Submissions should be no longer than
2 pages (including references), and submitted as PDF files to OpenReview:
https://openreview.net/group?id=EAMT.org/2024/Products_Projects_Track.
** Track co-chairs
Helena Moniz (University of Lisbon (FLUL), INESC-ID)
Mikel Forcada (University of Alicante)
*** Templates for writing your proposal ***
There are templates available in the following formats (check our website --
https://eamt2024.sheffield.ac.uk/conference-calls/call-for-papers):
- LaTeX
- Cloneable Overleaf template
- Word
- Libre Office/Open Office
- PDF
*** Important deadlines ***
- Deadline for paper submission: 8 March 2024
- Notification to authors: 8 April 2024
- Camera ready deadline: 22 April 2024
- Author Registration: 8 May 2024
All deadlines are at 23:59 CEST.
*** Local organising committee ***
Carolina Scarton (University of Sheffield)
Charlotte Prescott (ZOO Digital)
Chris Bayliss (ZOO Digital)
Chris Oakley (ZOO Digital)
Xingyi Song (University of Sheffield)
--
*Carolina Scarton*
Lecturer in Natural Language Processing
Department of Computer Science
University of Sheffield
http://staffwww.dcs.shef.ac.uk/people/C.Scarton/
*******************************************************
EAMT 2024: The 25th Annual Conference of
The European Association for Machine Translation
24 - 27 June 2024
Sheffield, UK
https://eamt2024.sheffield.ac.uk/
@eamt_2024 (X account)
Keynote speaker: Alexandra Birch (University of Edinburgh, UK)
Tutorial proposal deadline: 08 March 2024
Tutorial date: 27 June 2024
More information:
https://eamt2024.sheffield.ac.uk/conference-calls/call-for-tutorials
*******************************************************
*** Overview ***
The European Association for Machine Translation (EAMT) invites proposals
for tutorials to be held in conjunction with the EAMT 2024 conference
taking place in Sheffield, UK, from 24 to 27 June, with tutorials held on
27 June. We seek proposals in all areas of machine translation (see the
call for papers of the main conference for the focus areas of EAMT 2024).
The aim of a tutorial is primarily to help the audience develop an
understanding of particular technical, applied, and business matters
related to research, development, and use of MT and translation technology.
Presentations of particular technological solutions or systems are welcome,
provided that they serve as illustrations of broader scientific
considerations.
We recommend that the tutorial covers work by the presenters as well as by
other researchers. The submission should explain how this breadth is
ensured. Tutorials should not be “self-invited talks”.
*** Submission Details ***
Proposals should not exceed 4 pages of content (plus unlimited pages for
references), should be in PDF format, and should contain the following:
- A title and authors, affiliations, and contact information.
- A brief description of the tutorial content and its relevance to the
machine translation community.
- A short description of the target audience and any prerequisite
background the audience is expected to have.
- An outline of the tutorial structure and content, and how it will be covered
in a three-hour slot (half-day). In exceptional cases, six-hour tutorial
slots (full day) are available. These time limits do not include coffee
breaks, e.g., a three-hour tutorial, in fact, occupies a 3.5-hour slot, and
a six-hour tutorial occupies a 7-hour slot.
- Diversity considerations, e.g. use of multilingual data, indications of
how the described methods scale up to various languages or domains,
participation of both senior and junior instructors, demographic and
geographical diversity of the instructors, plans for how to diversify
audience participation, etc.
- Reading list. Work that you expect the audience to read before the
tutorial can be indicated by an asterisk. Recommended papers should provide
the breadth of authorship and include work by other authors, and work from
other disciplines is welcome if relevant.
- For each tutorial presenter, a one-paragraph statement of their research
interests and areas of expertise for the tutorial topic, as well as
experience in instructing an international audience.
- An estimate of the audience size for the tutorial. If the same or a similar
tutorial has been given before, include information on where any previous
version of the tutorial was given and how many attendees the tutorial
attracted.
- A description of special requirements for technical equipment.
Tutorial proposals should be submitted as PDF files to OpenReview:
https://openreview.net/group?id=EAMT.org/2024/Tutorials_Track.
Submissions should be formatted according to the templates specified below.
Anonymisation is not required. Submissions should be no longer than 4 pages
(excluding references).
*** Templates for writing your proposal ***
There are templates available in the following formats (check our website --
https://eamt2024.sheffield.ac.uk/conference-calls/call-for-papers):
- LaTeX
- Cloneable Overleaf template
- Word
- Libre Office/Open Office
- PDF
*** Evaluation Criteria ***
Each tutorial proposal will be evaluated according to its clarity and
preparedness, novelty or timely character of the topic, and instructors’
experience.
*** Tutorial Instructor Responsibilities ***
Accepted tutorial presenters will be notified by 8 April 2024. They must
then provide abstracts of their tutorials for inclusion in the conference
registration material by the specific conference deadlines. The description
should be in two formats: (a) an ASCII version that can be included in
email announcements and published on the conference website, and (b) a PDF
version for inclusion in the electronic proceedings (detailed instructions
will be provided). Tutorial speakers must provide tutorial materials by 15
May 2024. The final submitted tutorial materials must minimally include
copies of the course slides and a bibliography for the material covered in
the tutorial.
For each tutorial being held at EAMT 2024, we offer free registration to
the conference for one tutor only.
*** Important Dates ***
- Submission deadline for tutorial proposals: 8 March 2024
- Notification of acceptance: 8 April 2024
- Tutorial slides + abstract + bibliography + any other materials: 15 May
2024
All deadlines are at 23:59 CEST.
*** Workshop Co-Chairs ***
Mary Nurminen (Tampere University)
Diptesh Kanojia (University of Surrey)
*** Local organising committee ***
Carolina Scarton (University of Sheffield)
Charlotte Prescott (ZOO Digital)
Chris Bayliss (ZOO Digital)
Chris Oakley (ZOO Digital)
Xingyi Song (University of Sheffield)