UCLouvain is looking for:
a postdoctoral researcher in machine learning / natural language processing
- Full-time (100%) fixed-term contract of two years
- for the Centre de traitement automatique du langage (Cental) within the
Institut Langage & Communication (IL&C) in UCLouvain (Louvain-la-Neuve)
- Start date: as soon as possible
This postdoctoral position offer is part of a research project led by the
Cental (https://uclouvain.be/fr/instituts-recherche/ilc/cental) around
legal data processing.
Regarding the concrete application, the project aims at automating the
analysis of documents related to clinical trials (meeting minutes, legal
documents, contracts, ...) to assess their compliance with the GDPR (RGPD).
The proposed solution should thus be flexible enough to, on the one hand,
ensure that the model(s) can be adapted to the various document types and,
on the other hand, limit the need for specialist expertise in annotating
training data. Consequently, the scientific core of this project is directly
related to the question of few-shot learning, which we intend to address
through active learning and meta-learning.
The role of the hired postdoc will be to (1) develop the resources needed
for learning, (2) implement an architecture that incorporates active
learning and meta-learning, (3) evaluate the models and (4) implement the
components into a web service. The postdoc will also be required to
disseminate the results through scientific publications and/or reports.
Work environment:
CENTAL is part of the Institut Langage & Communication (
https://uclouvain.be/fr/instituts-recherche/ilc), in UCLouvain. This
university is located in Louvain-la-Neuve, Belgium (
https://uclouvain.be/fr/sites/louvain-la-neuve), a walkable city that offers
a pleasant and dynamic living environment. The research project will be
supervised by Patrick Watrin.
Required skills:
- A completed PhD in Computer Science, Machine Learning, NLP or a similar
domain.
- Excellent programming skills:
- Python
- TensorFlow/Keras or PyTorch
- Linux (server administration)
- Knowledge of the main supervised learning algorithms and deep learning
algorithms is required
- A good knowledge of the main NLP tools and algorithms is a plus
- Strong research track record (publications, conferences, etc.)
- Autonomy, teamwork, ability to understand and analyze needs,
adaptability
- Excellent command of the French language (at least C1) and good command
of English (at least B2)
Conditions:
- Fixed-term contract of one year, renewable once
- Salary based on experience, ranging from 4250€ to 4850€ (monthly, gross)
The position requires residency in Belgium. Candidates from outside the EU
are responsible for obtaining the appropriate visa and/or permits, with
support from UCLouvain.
How to apply:
- Deadline: February 15
- The application file should be sent electronically to Patrick Watrin (
patrick.watrin(a)uclouvain.be) and contain:
- A detailed resume showing the relevant qualifications and skills,
as well as scientific/academic experience and publications;
- A cover letter in French, describing your interest in the role,
how your profile meets the project's needs, etc.;
- A recommendation letter in French or English.
The shortlisted candidates will be invited to participate in a remote videocall
(details will be communicated in a timely manner).
The Autogramm project (https://autogramm.github.io/en) invites applications for a 3-year PhD position starting between now and October 2023. The position is funded by ANR (Agence Nationale de la Recherche), France.
Applications and questions can be sent to Sylvain Kahane <sylvain(a)kahane.fr>
Applications should include:
- Cover letter outlining interest in the position
- Names of two referees
- Curriculum Vitae (CV) with publications (if applicable)
- Copy of MA degree
- University grade sheet of at least the two last years
Today, we have databases concerning several dozen languages, including corpora annotated according to the same principle, thanks in particular to corpora annotated in interlinear gloss (IGT, see for example the Pangloss collection, https://pangloss.cnrs.fr) or with the Universal Dependencies annotation scheme (UD, https://universaldependencies.org and its SUD variant, https://surfacesyntacticud.github.io/). These databases allow typological studies and have several advantages:
- the results obtained are based directly on primary data (corpora) and not secondary data (grammars written by linguists). (This is only partially true, since the results still depend on the choices made by a linguist in selecting the corpus and annotating it; nevertheless, these choices are visible and can be discussed.)
- the results are reproducible as long as the data are freely accessible;
- the nature of the data allows for quantitative results: we will not say that a language is OV or VO, but that it has such and such a percentage of OV constructions, and we will be able to observe directly on the data which factors determine the distribution between OV and VO (Levshina 2019, Gerdes et al. 2019, Futrell et al. 2015). (See also https://typometrics.elizia.net/#/.)
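The kind of quantitative result described in the last point can be sketched in a few lines. The example below is only an illustration under stated assumptions: it reads UD-style CoNLL-U rows, treats every token with the `obj` relation as an object, and calls the construction OV when the object precedes its head. The three sentences and the helper names (`make_conllu`, `object_verb_order_counts`, `ov_percentage`) are invented for the example and do not come from the Autogramm project or from any real treebank. SUD treebanks use different relation labels (e.g. `comp:obj`), so the relation test would need adapting.

```python
from collections import Counter

# Hypothetical toy sentences, written with spaces for readability and
# converted to real tab-separated CoNLL-U rows by make_conllu().
TOY_SENTENCES = [
    ["1 she she PRON _ _ 2 nsubj _ _",
     "2 reads read VERB _ _ 0 root _ _",
     "3 books book NOUN _ _ 2 obj _ _"],
    ["1 books book NOUN _ _ 3 obj _ _",
     "2 she she PRON _ _ 3 nsubj _ _",
     "3 reads read VERB _ _ 0 root _ _"],
    ["1 he he PRON _ _ 2 nsubj _ _",
     "2 eats eat VERB _ _ 0 root _ _",
     "3 apples apple NOUN _ _ 2 obj _ _"],
]


def make_conllu(sentences):
    """Join toy rows into CoNLL-U text (10 tab-separated columns per token)."""
    blocks = ["\n".join("\t".join(row.split()) for row in sent) for sent in sentences]
    return "\n\n".join(blocks) + "\n"


def object_verb_order_counts(conllu_text):
    """Count objects that precede (OV) or follow (VO) their syntactic head."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token row
        token_id, head, deprel = cols[0], cols[6], cols[7]
        if not token_id.isdigit() or not head.isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        if deprel.split(":")[0] != "obj":
            continue  # assumption: UD labels, subtypes such as obj:xyz included
        counts["OV" if int(token_id) < int(head) else "VO"] += 1
    return counts


def ov_percentage(counts):
    """Return the share of OV constructions in percent, or None without data."""
    total = counts["OV"] + counts["VO"]
    return None if total == 0 else 100.0 * counts["OV"] / total


counts = object_verb_order_counts(make_conllu(TOY_SENTENCES))
print(dict(counts), ov_percentage(counts))
```

On real data, the same function could be applied to the text of a `.conllu` file; restricting the count to objects whose head is tagged `VERB` would be a natural refinement.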
The goal of the thesis topic is to contribute to the development of quantitative typology by participating in the construction of a quantitative database on a large number of typologically diverse languages and by focusing on the exploitation of such a dataset (Levshina 2022). The originality of the project lies in the fact that we are working on quantitative data and not on categorical features like existing typological databases (see in particular the World Atlas of Language Structures Online, https://wals.info/, which gives access to data on more than 2500 languages).
The following questions can be studied:
- How to identify cross-linguistic regularities, such as quantitative entailment universals, from a set of corpora of world languages (see for example Gerdes et al. 2021)? How can we make inferences between quantitatively valued features?
- What quantitative information can be extracted from a corpus that is useful for a typological study? Which features require prior annotation of the data and what is the nature of the annotations needed (see for example the case of IGT for morphosyntactic features and treebanks for word order)?
- How to identify the typological signature of a language from an annotated corpus and determine what makes it special within a group of languages (see Bickel & Nichols 2002 and the AutoTyp project)?
- How to take into account the imbalance of a database that is not representative of the distribution of languages in the world, but includes a higher proportion of languages from certain regions or families (Indo-European languages, Semitic languages, East Asian languages, etc.) to the detriment of other regions or families (Papua New Guinea, Oceania, Sub-Saharan Africa, Amerindian languages, aboriginal languages)? (see Guzmán Naranjo & Becker 2022).
- How to solve the question of the commensurability of the categories used in the description of the different languages? How can we check the consistency of the data? This question can be addressed by studying the consistency of treebanks of the same language or language family. How to detect the presence of aberrations in some treebanks (categorization choices not conforming to the universal scheme, e.g. assignment of the subject relation in ergative languages, use of the ADJ category in languages without real adjectives, etc.)?
- How to visualize multidimensional quantitative data? Linguistic data pose many challenges.
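To make the imbalance question in the list above concrete, here is a minimal sketch of one very simple baseline: averaging a quantitative feature within each language family first, so that a family with many sampled languages does not dominate the overall figure. This is not the statistical approach of the work cited above, only a toy illustration. The language names, family labels and values are invented, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (language, family, OV percentage) rows, invented for illustration.
TOY_SAMPLE = [
    ("lang_a", "Family1", 10.0),
    ("lang_b", "Family1", 20.0),
    ("lang_c", "Family1", 30.0),
    ("lang_d", "Family2", 90.0),
]


def naive_mean(rows):
    """Average over languages; over-represented families weigh more."""
    return mean(value for _, _, value in rows)


def family_balanced_mean(rows):
    """Average within each family first, then across families."""
    by_family = defaultdict(list)
    for _, family, value in rows:
        by_family[family].append(value)
    return mean(mean(values) for values in by_family.values())


print(naive_mean(TOY_SAMPLE), family_balanced_mean(TOY_SAMPLE))
```

With these toy values the two averages differ noticeably, because three of the four languages belong to the same family. The same grouping could be done by region instead of family.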
The work will be conducted in collaboration with the members of the ANR Autogramm project (https://autogramm.github.io/), researchers in field linguistics, typology, formal linguistics and automatic language processing. It could lead, with the help of engineers, to the constitution of a typometric database accompanied by query and data visualization tools.
Bickel & Nichols 2021
Futrell et al. 2015
Gerdes et al. 2019
Gerdes et al. 2021
Guzmán Naranjo & Becker 2022
Levshina 2019
Levshina 2022
Hello All,
Happy New Year 2023! Sorry for cross-posting.
Please feel free to spread the word about the PhD position on "Computational
Journalism" in my group.
The Computational Social Science group (https://css.cs.ut.ee/) is looking for
motivated researchers who are interested in working on topics in
computational journalism, especially understanding echo
chambers, bias in news media, and fairness in news media
applications (recommendation). We expect the candidate to know one or more
aspects of the following techniques and programming languages (if not all):
(i) Preferred programming languages: Python or R.
(ii) Exploratory data analysis: feature extraction, visualization, etc.
(iii) Machine learning and deep learning with some hands-on experience.
(iv) Social media analysis: this includes collecting data
from Twitter/Reddit and analyzing it for insights. An ideal
candidate should also be mindful of what is happening on social media.
(v) Social network analysis and Natural Language Processing.
Program Benefits
================
The funding covers a monthly stipend of 2000 euros (gross salary) for
4 years, and the tuition fee is waived.
Health insurance is provided.
Academic and industrial professional development including travel support.
Interaction with world-renowned external board members and speakers.
Travel grant for attending conferences and workshops.
Location of PhD study: Institute of Computer Science, University of Tartu,
Estonia.
The Institute of Computer Science is located in the University of Tartu Delta
Centre (https://delta.ut.ee/en/), a unique multidisciplinary
centre for digital technology, analytics and economic thought, bringing
together more than 2500 students, university teachers, scientists and R&D
staff from companies. In short, you will get an opportunity to work in a
diverse environment and collaborate with colleagues. The Delta Centre opened in
January 2020 and is one of the most modern centres of digital technology,
analytical and economic thought in the Nordic region.
University of Tartu is the leading higher education and research center in
Estonia, with more than 16000 students and 1800 academic staff. It is also
the highest ranked university in the Baltic States according to both the
Times Higher Education and the QS World University rankings. University of
Tartu's Institute of Computer Science ranks 176-200 (according to Times
Higher Education), and hosts 750 Bachelors and Masters students and 60
doctoral students. The institute has a strong international orientation:
over 40% of graduate students and a quarter of academic and research staff
members are international. Graduate teaching in the institute is in English.
Estonia is famous for its e-approach and is home to many startups such as
Skype, Transferwise and Bolt, to name a few. Tartu, a university town, is the
second largest city of Estonia. It is relatively inexpensive (compared to
neighbors like Sweden and Finland) and is surrounded by nature within
walking distance of the city.
The applicant should have:
- A master's degree in computer science, mathematics
or another relevant discipline,
- Excellent programming skills,
- A good command of spoken and written English,
- A background in statistics/data mining/machine learning; experience in
social media analysis would be ideal. Knowledge of social network analysis
would be an additional advantage.
Applications with a CV (max. 2 pages), with experience in research
(publications) and knowledge of programming languages/tools, can be sent to
rajesh.sharma(a)ut.ee with the subject "PhD application".
If you have any queries, please do not hesitate to contact me.
Kind Regards
Rajesh Sharma,
Associate Professor
Head, Computational Social Science Group
Institute of Computer Science
University of Tartu, Estonia.
Group webpage https://css.cs.ut.ee/
Dear colleagues,
Happy New Year! We are extending the deadline for this call to the 15th of February. At the request of some authors, we have also adapted the most recent JLM LaTeX template to be compatible with Overleaf; it can be found here: https://fr.overleaf.com/latex/templates/template-for-journal-of-language-mo…
Please find below the updated call:
-------------
We invite researchers in the broad area of computational morphology to submit their recent, unpublished work to a special issue of the Journal of Language Modelling <https://jlm.ipipan.waw.pl/index.php/JLM>.
Motivation:
Computational techniques have a long history of use in the study of morphology, where they have been used both for practical tasks such as the analysis and production of complex word forms and for theoretical ones such as structural and informational analysis of morphological systems. As both systems and datasets improve, these techniques are increasingly developed and evaluated on a typologically diverse array of languages, including many which are endangered or lack large-scale resources. Detailed comparisons across languages can help to reveal typological biases or assumptions within existing computational techniques [1, 2]. Alternatively, computational methods and analyses can also shed light on questions within linguistic typology [3, 4, 5, 6].
The goal of this special issue is to bring researchers from multiple communities together in exploring issues of linguistic typology across a wide range of different languages and phenomena. We encourage the submission of work on endangered or less-studied languages.
The Journal of Language Modelling is a free (for readers and authors alike) open-access peer-reviewed journal. All articles are peer-reviewed by at least 3 reviewers, usually including at least one member of the Editorial Board.
Topics of interest:
- Typological clustering or classification of languages
- Investigation of particular linguistic features which improve or detract from the performance of computational morphology tools
- Comparison of morphological structures (e.g., inflection classes, implicative networks) across typologically different languages
- Investigation of diachronic typological change using computational methods
- Creation, curation or analysis of typological databases via computational methods
Submissions:
The submissions should be journal papers, not proceedings papers, totalling 25-50 pages, excluding references.
Authors are advised to use the online manuscript submission for the journal. Make sure to select the special issue when asked to provide the article type. More information, including formatting instructions for authors, can be found on the journal's webpage at: https://jlm.ipipan.waw.pl/index.php/JLM/about/submissions. An adaptation of the LaTeX template for Overleaf can be found at: https://fr.overleaf.com/latex/templates/template-for-journal-of-language-mo….
Important dates:
Call for papers issued: 15/7/2022
Submissions due: 15/1/2023 --- extended to 15/02/2023
Author notification: Spring 2023
Guest editors:
Sacha Beniamine (University of Surrey)
Micha Elsner (The Ohio State University)
Katharina Kann (University of Colorado, Boulder)
References
[1] Ryan Cotterell, Christo Kirov, John Sylak-Glassman, David Yarowsky, Jason Eisner, and Mans Hulden. 2016a. The SIGMORPHON 2016 shared Task— Morphological reinflection. In Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 10–22, Berlin, Germany. Association for Computational Linguistics.
[2] Huiming Jin, Liwei Cai, Yihui Peng, Chen Xia, Arya McCarthy, and Katharina Kann. 2020. Unsupervised morphological paradigm completion. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6696–6707, Online. Association for Computational Linguistics.
[3] Neil Rathi, Michael Hahn, and Richard Futrell. 2021. An Information-Theoretic Characterization of Morphological Fusion. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 10115–10120, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
[4] Parker, J., Reynolds, R., & Sims, A. (2022). Network Structure and Inflection Class Predictability: Modeling the Emergence of Marginal Detraction. In A. Sims, A. Ussishkin, J. Parker, & S. Wray (Eds.), Morphological Diversity and Linguistic Cognition (pp. 247-281). Cambridge: Cambridge University Press. DOI: 10.1017/9781108807951.010
[5] Guzmán Naranjo, Matías and Becker, Laura. Statistical bias control in typology. Linguistic Typology, to appear, 2021. DOI: 10.1515/lingty-2021-0002
[6] Sacha Beniamine. 2021. One lexeme, many classes: Inflection class systems as lattices. In Berthold Crysmann & Manfred Sailer (eds.), One-to-many relations in morphology, syntax, and semantics, 23--51. Berlin: Language Science Press. DOI: 10.5281/zenodo.4729789
Sinn und Bedeutung 28 will take place at Ruhr University Bochum (RUB) from September 5-8, 2023. The conference is jointly organized by the RUB Department of Linguistics, the Linguistic Data Science Lab, the Department of German Language and Literature, and the Departments of Philosophy I and II. The conference will feature a three-day main session (Sept. 6-8) and two parallel one-day special sessions on The Semantics and Pragmatics of Co-Speech / Co-Sign Communication and on Big Data in Semantics and Pragmatics (Sept. 5).
Conference Website: https://www.ruhr-uni-bochum.de/sub28/
Invited Speakers (main session):
— Dorothy Ahn (Rutgers University)
— Hazel Pearson (Queen Mary University of London)
— Graham Priest (City University of New York, University of Melbourne, RUB)
Invited Speakers (special sessions):
Semantics and Pragmatics of Co-Speech / Co-Sign Communication
— Cornelia Ebert (Goethe University Frankfurt)
Big Data in Semantics and Pragmatics
— Raquel Fernández (University of Amsterdam)
Call for Papers
We invite abstract submissions for talks or posters on topics pertaining to natural language semantics, pragmatics, the syntax-semantics interface, super semantics, philosophy of language, and psycho-/neurolinguistic investigations related to meaning. We specifically welcome submissions on the semantics of under-represented languages and phenomena.
Abstracts should contain original research that, at the time of submission, has neither been published nor accepted for publication. One person can submit at most one abstract as sole author and one abstract as co-author (or two co-authored abstracts) for the main session and special session combined.
Submissions must be anonymous and must not reveal the identity of the authors in any form.
Abstracts should fit two pages (letter size or A4 paper, 2.54cm or 1 inch margins on all sides, 12 point font, Times New Roman), with an additional third page used *exclusively* for the following elements: references (obligatory), large figures or tables, as many lines of text as there are lines of glosses and translations in non-English glossed examples.
Abstracts must be submitted in PDF format via EasyChair by Wednesday, March 15, 2023 (23:59 Central European Standard Time): https://easychair.org/conferences/?conf=sub28. EasyChair will open for submissions on January 15, 2023.
Note: Since Bochum gets very busy during the summer, we strongly recommend booking your accommodation as early as possible (with a cancellation option).
Important Dates:
— Submission deadline: March 15, 2023
— Notification of acceptance: May 30, 2023
— Special sessions: September 5, 2023
— Main session: September 6-8, 2023
Organizers:
— Kristina Liefke (RUB Philosophy II)
— Ralf Klabunde (RUB Linguistics, Linguistic Data Science Lab)
— Agata Renans (RUB Linguistics)
— Daniel Gutzmann (RUB German Language & Literature)
— Tatjana Scheffler (RUB German Language & Literature)
— Dolf Rami (RUB Philosophy I)
— Heinrich Wansing (RUB Philosophy I)
— Markus Werning (RUB Philosophy II)
Email: sub28(a)ruhr-uni-bochum.de
---
Jun.-Prof. Dr. Tatjana Scheffler (she/her)
GB 5/157
Ruhr-Universität Bochum
Fakultät für Philologie, Germanistik
Universitätsstraße 150
44780 Bochum
Germany
Mail: tatjana.scheffler(a)rub.de
Web: http://staff.germanistik.rub.de/digitale-forensische-linguistik/
Tel.: +49 234 32-21471
Apologies for cross-posting.
----------------------------------------
*The International Conference on Spoken Language Translation*
ACL – 20th IWSLT 2023
July 13-14, 2023 – Toronto, Canada
http://iwslt.org
The International Conference on Spoken Language Translation (IWSLT) is the
premier annual conference for all aspects of Spoken Language Translation.
Every year, the conference organizes and sponsors open evaluation campaigns
around key challenges in simultaneous and consecutive translation, under
real-time/low latency or offline conditions and under low-resource or
multilingual constraints. System descriptions and results from
participants’ systems and scientific papers related to key algorithmic
advances and best practices are presented.
IWSLT is the venue of SIGSLT, the Special Interest Group on Spoken
Language Translation of ACL, ISCA and ELRA. With a track record of 19
years, IWSLT benchmarks and proceedings serve as references for all
researchers and practitioners working on speech translation and related
fields.
In 2023, IWSLT will be co-located with ACL and will be run as a hybrid
meeting.
Important Dates
January 14, 2023: Release of shared task training and dev data
April 01-15, 2023: Evaluation period
April 24, 2023: Scientific paper submission deadline
May 22, 2023: Notification of acceptance
June 06, 2023: Camera-ready paper due
July 12, 2023: Pre-recorded video due
July 13-14, 2023: Conference
Evaluation
IWSLT 2023 features shared tasks <https://iwslt.org/2023/#shared-tasks>
that address the following focus areas:
– Speech translation of talks
– Speech-to-speech translation of multi-source data
– Speech dubbing of multi-source data
– Dialectal and Low-resource speech translation
– Formality control for SLT
Training and development data for each shared task will be prepared and
released by the respective organizers (for further information on this
initiative, please refer to the website). Participants will receive
instructions about how to submit their runs. The results of all tasks will
be collected and discussed in an overview paper that will be presented at
the conference. In addition, participants have the opportunity to present
their work through a system paper that will be published in the ACL
Proceedings.
Conference
IWSLT also invites submissions of scientific papers to be published in the
ACL Proceedings and presented either in oral or poster format. The
conference selects high-quality, original contributions on theoretical and
practical issues of spoken language translation research, technologies and
applications.
Contact
Please send an email to iwslt-evaluation-campaign(a)googlegroups.com if you
have any questions related to the shared tasks.
Thanks,
Marcello, Alex, Jan, Sebastian, Elizabeth, Atul
(IWSLT organisers)
Dear Corpora members,
Please find below a CFP for the next "Journées de la Linguistique de
Corpus".
* The 11th International Conference on Corpus Linguistics *
3-6 July 2023, Grenoble, France
* Call for Papers *
https://jlc2023.sciencesconf.org/
The International Conference on Corpus Linguistics (JLC), founded by
Geoffrey Williams in 2001 at the University of South Brittany, Lorient,
France, regularly draws together an interdisciplinary community whose
research focus is corpus linguistics. After seven gatherings in Lorient
and an interlude in Orleans in 2015 (8th International Conference on
Corpus Linguistics), the conference alighted in Grenoble in early July
2017 and in November 2019, organized by the LIDILEM Laboratory with
contributions from LIG, ILCEA4, Litt&Arts and the MSH-Alpes. Université
Grenoble Alpes is honored to host this international conference again
from July 3rd to July 6th 2023. JLC'23 is organized in
collaboration with other labs from French universities (Lyon,
Montpellier, Toulouse): DDL, ICAR, Praxiling, CLLE.
The objective of JLC'23 is to (re)unite a community that adopts various
approaches, be they methodological or disciplinary, to promote corpus
linguistics, and to contribute to the evolution of practices in the
field by building bridges between different approaches to digital
corpora. The participants are invited to share and compare their
knowledge of tools, experiences, and findings.
In the tradition of previous conferences, the JLC in Grenoble will offer
three days of presentations, guest speakers and discussion sessions
among the participants. Training sessions on tools and methods will be
organized over a half day.
This edition of the JLC will put a particular focus on corpora and
didactics. A part of the conference will be specifically dedicated to
this theme. We expect papers that show and question the use of corpora
in teaching, be they feedback from real uses, presentation of
methodological approaches for various audiences, or more theoretical
points of view...
The conference will not be limited to this theme and is open to all
kinds of contributions on written, oral or multimodal corpora, including
but not limited to:
1. Linguistic approaches to corpora
2. Methods and tools
3. Variations, genres, and discourse
4. Applications and uses of corpora for teaching and learning,
translation, terminology...
Guest speakers include: Florence Mourlhon-Dallies + another speaker to
be confirmed
Submissions for a presentation or a demonstration, in French or English,
should not exceed three pages (excluding figures and bibliographic
references) and must be anonymous. They will undergo double peer review
by members of the scientific board. JLC2023 will use the SciencesConf
system to manage submissions. In addition to classic
presentations, you may also propose a demonstration (identical
submission guidelines).
Publication: following the colloquium, authors are welcome to submit an
article. This collection of articles will be reviewed and published online.
Timetable:
1. First CFP: November 2022
2. Submission deadline: *Friday February 3rd 2023*
3. Notification of acceptance: Mid-April 2023
4. Final submission version: Friday May 19th 2023
5. Registration begins: May 2023
Best regards
--
Marie-Paule Jacques
/Mobilized for the defense of public service in higher education and research/
Maitre de conférences HDR, Sciences du langage - Senior Lecturer in Linguistics
INSPE and LIDILEM (Laboratoire de linguistique et didactique des langues
étrangères et maternelles)
Université Grenoble Alpes
BIONLP 2023 and Shared Tasks @ ACL 2023
https://aclweb.org/aclwiki/BioNLP_Workshop#SHARED_TASKS_2023
*Tentative* Important Dates
(All submission deadlines are 11:59 p.m. UTC-12:00 “anywhere on Earth”)
May 1, 2023: Workshop Paper Due Date
June 15, 2023: Camera-ready papers due
BioNLP 2023 Workshop at ACL, July 13 OR 14, 2023, Toronto, Canada
Please watch for the updates!
SUBMISSION INSTRUCTIONS
-----------------------------------------
Two types of submissions are invited: full papers and short papers.
Full papers should not exceed eight (8) pages of text, plus unlimited
references. These are intended to be reports of original research. BioNLP
aims to be the forum for interesting, innovative, and promising work
involving biomedicine and language technology, whether or not yielding high
performance at the moment. This by no means precludes our interest in and
preference for mature results, strong performance, and thorough
evaluation. Both types of research and combinations thereof are
encouraged.
Short papers may consist of up to four (4) pages of content, plus unlimited
references. Appropriate short paper topics include preliminary results,
application notes, descriptions of work in progress, etc.
Electronic Submission
Submissions must be electronic and in PDF format, using the Softconf START
conference management system. Submissions need to be anonymous.
*The submission site will be announced shortly.*
Dual submission policy: papers may NOT be submitted to the BioNLP 2023
workshop if they are or will be concurrently submitted to another meeting
or publication.
WORKSHOP OVERVIEW AND SCOPE
---------------------------------------------------
The BioNLP workshop associated with the ACL SIGBIOMED special interest
group has established itself as the primary venue for presenting
foundational research in language processing for the biological and medical
domains. The workshop has been running every year since 2002 and continues
to grow stronger. BioNLP welcomes and encourages work on languages other
than English, and inclusion and diversity. BioNLP truly encompasses the
breadth of the domain and brings together researchers in bio- and clinical
NLP from all over the world. The workshop will continue presenting work on
a broad and interesting range of topics in NLP. Interest in biomedical
language has broadened significantly due to the COVID-19 pandemic and
continues to grow: as access to information becomes easier and more people
generate and access health-related text, it becomes clearer that only
language technologies can enable and support adequate use of biomedical
text.
BioNLP 2023 will be particularly interested in language processing that
supports DEIA (Diversity, Equity, Inclusion and Accessibility). The work on
detection and mitigation of bias and misinformation continues to be of
interest. Research on languages other than English, particularly
under-represented languages, and on health disparities is always of
interest to BioNLP.
Other active areas of research include, but are not limited to:
- Tangible results of biomedical language processing applications;
- Entity identification and normalization (linking) for a broad range of
semantic categories;
- Extraction of complex relations and events;
- Discourse analysis;
- Anaphora/coreference resolution;
- Text mining / Literature based discovery;
- Summarization;
- Text simplification;
- Question Answering;
- Resources and strategies for system testing and evaluation;
- Infrastructures and pre-trained language models for biomedical NLP
(Processing and annotation platforms);
- Development of synthetic data & data augmentation;
- Translating NLP research into practice;
- Getting reproducible results.
SHARED TASKS 2023
-------------------------------------
Shared Tasks on Summarization of Clinical Notes and Scientific Articles
The first task focuses on Clinical Text.
Task 1A. Problem List Summarization
Automatically summarizing patients’ main problems from the daily care notes
in the electronic health record can help mitigate information and cognitive
overload for clinicians and provide augmented intelligence via computerized
diagnostic decision support at the bedside. The task of Problem List
Summarization aims to generate a list of diagnoses and problems in a
patient’s daily care plan using input from the provider’s progress notes
during hospitalization. This task aims to promote NLP model development for
downstream applications in diagnostic decision support systems that could
improve efficiency and reduce diagnostic errors in hospitals. This task
will contain 768 hospital daily progress notes and 2783 diagnoses in the
training set, and a new set of 300 daily progress notes will be annotated
by physicians as the test set. The annotation methods and annotation
quality have been reported previously. The goal of this shared task is to
attract future research efforts in building NLP models for real-world
decision support applications, where a system generating relevant and
accurate diagnoses will assist the healthcare providers’ decision-making
process and improve the quality of care for patients.
Task 1B. Radiology report summarization
Radiology report summarization is a growing area of research. Given the
Findings and/or Background sections of a radiology report, the goal is to
generate a summary (called an Impression section) that highlights the key
observations and conclusions of the radiology study.
The research area of radiology report summarization currently faces an
important limitation: most research is carried out on chest X-rays. To
address this limitation, we propose two datasets:
- A shared summarization task that includes six different modalities and
anatomies, totalling 79,779 samples, based on the MIMIC-III database.
- A shared summarization task on chest X-ray radiology reports with images
and a brand new out-of-domain test set from Stanford.
SEE MORE at: https://vilmedic.app/misc/bionlp23/sharedtask
Task 2. Lay Summarization of Biomedical Research ArticlesBiomedical
publications contain the latest research on prominent health-related
topics, ranging from common illnesses to global pandemics. This can often
result in their content being of interest to a wide variety of audiences
including researchers, medical professionals, journalists, and even members
of the public. However, the highly technical and specialist language used
within such articles typically makes it difficult for non-expert audiences
to understand their contents.
Abstractive summarization models can be used to generate a concise summary
of an article, capturing its salient point using words and sentences that
aren’t used in the original text. As such, these models have the potential
to help broaden access to highly technical documents when trained to
generate summaries that are more readable, containing more background
information and less technical terminology (i.e., a “lay summary”).
This shared task surrounds the abstractive summarization of biomedical
research articles, with an emphasis on controllability and catering to
non-expert audiences. Through this task, we aim to help foster increased
research interest in controllable summarization that helps broaden access
to technical texts and progress toward more usable abstractive
summarization models in the biomedical domain.
For more information, see:
Main site: https://biolaysumm.org/CodaLab page - subtask 1:
https://codalab.lisn.upsaclay.fr/competitions/9541CodaLab page - subtask 2:
https://codalab.lisn.upsaclay.fr/competitions/9544
*Workshop Organizers* Dina Demner-Fushman, US National Library of
Medicine Kevin Bretonnel Cohen, University of Colorado School of Medicine
Sophia Ananiadou, National Centre for Text Mining and University of
Manchester, UK Jun-ichi Tsujii, National Institute of Advanced Industrial
Science and Technology, Japan
BIONLP 2023 and Shared Tasks @ ACL 2023
https://aclweb.org/aclwiki/BioNLP_Workshop#SHARED_TASKS_2023
*Tentative* Important Dates
(All submission deadlines are 11:59 p.m. UTC-12:00 “anywhere on Earth”)
May 1, 2023: Workshop Paper Due Date
June 15, 2023: Camera-ready papers due
BioNLP 2023 Workshop at ACL, July 13 OR 14, 2023, Toronto, Canada
Please watch for updates!
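The deadlines above are given in "anywhere on Earth" time (UTC-12:00), which is easy to misread. As a minimal sketch, the conversion to UTC can be checked in Python; the helper name `aoe_deadline_to_utc` is hypothetical and is not part of any workshop tooling.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" (AoE) is the fixed offset UTC-12:00.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_to_utc(year: int, month: int, day: int) -> datetime:
    """Return the UTC instant of 11:59 p.m. AoE on the given date."""
    local_deadline = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local_deadline.astimezone(timezone.utc)


# Tentative workshop paper due date listed above (May 1, 2023).
paper_due_utc = aoe_deadline_to_utc(2023, 5, 1)
print(paper_due_utc.isoformat())  # 2023-05-02T11:59:00+00:00
```

In other words, a deadline of May 1 AoE closes at 11:59 UTC on May 2.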
SUBMISSION INSTRUCTIONS
-----------------------------------------
Two types of submissions are invited: full papers and short papers.
Full papers should not exceed eight (8) pages of text, plus unlimited
references. These are intended to be reports of original research. BioNLP
aims to be the forum for interesting, innovative, and promising work
involving biomedicine and language technology, whether or not yielding high
performance at the moment. This by no means precludes our interest in and
preference for mature results, strong performance, and thorough
evaluation. Both types of research and combinations thereof are
encouraged.
Short papers may consist of up to four (4) pages of content, plus unlimited
references. Appropriate short paper topics include preliminary results,
application notes, descriptions of work in progress, etc.
Electronic Submission
Submissions must be electronic and in PDF format, using the Softconf START
conference management system.
Submissions need to be anonymous.
*The submission site will be announced shortly.*
Dual submission policy: papers may NOT be submitted to the BioNLP 2023
workshop if they are or will be concurrently submitted to another meeting
or publication.
WORKSHOP OVERVIEW AND SCOPE
---------------------------------------------------
The BioNLP workshop associated with the ACL SIGBIOMED special interest
group has established itself as the primary venue for presenting
foundational research in language processing for the biological and medical
domains. The workshop has run every year since 2002 and continues to grow
stronger. BioNLP welcomes and encourages work on languages other
than English, and inclusion and diversity. BioNLP truly encompasses the
breadth of the domain and brings together researchers in bio- and clinical
NLP from all over the world. The workshop will continue presenting work on
a broad and interesting range of topics in NLP. Interest in biomedical
language has broadened significantly due to the COVID-19 pandemic and
continues to grow: as access to information becomes easier and more people
generate and access health-related text, it becomes clearer that only
language technologies can enable and support adequate use of biomedical
text.
BioNLP 2023 will be particularly interested in language processing that
supports DEIA (Diversity, Equity, Inclusion and Accessibility). The work on
detection and mitigation of bias and misinformation continues to be of
interest. Research on languages other than English, particularly
under-represented languages, and on health disparities is always of interest
to BioNLP.
Other active areas of research include, but are not limited to:
Tangible results of biomedical language processing applications;
Entity identification and normalization (linking) for a broad range of
semantic categories;
Extraction of complex relations and events;
Discourse analysis;
Anaphora/coreference resolution;
Text mining / Literature-based discovery;
Summarization;
Text simplification;
Question Answering;
Resources and strategies for system testing and evaluation;
Infrastructures and pre-trained language models for biomedical NLP
(Processing and annotation platforms);
Development of synthetic data & data augmentation;
Translating NLP research into practice;
Getting reproducible results.
SHARED TASKS 2023
-------------------------------------
Shared Tasks on Summarization of Clinical Notes and Scientific Articles
The first task focuses on Clinical Text.
Task 1A. Problem List Summarization
Automatically summarizing patients’ main problems from the daily care notes
in the electronic health record can help mitigate information and cognitive
overload for clinicians and provide augmented intelligence via computerized
diagnostic decision support at the bedside. The task of Problem List
Summarization aims to generate a list of diagnoses and problems in a
patient’s daily care plan using input from the provider’s progress notes
during hospitalization. This task aims to promote NLP model development for
downstream applications in diagnostic decision support systems that could
improve efficiency and reduce diagnostic errors in hospitals. This task
will contain 768 hospital daily progress notes and 2783 diagnoses in the
training set, and a new set of 300 daily progress notes will be annotated
by physicians as the test set. The annotation methods and annotation
quality have previously been reported here. The goal of this shared task is
to attract future research efforts in building NLP models for real-world
decision support applications, where a system generating relevant and
accurate diagnoses will assist the healthcare providers’ decision-making
process and improve the quality of care for patients.
Task 1B. Radiology report summarization
Radiology report summarization is a growing area of research. Given the
Findings and/or Background sections of a radiology report, the goal is to
generate a summary (called an Impression section) that highlights the key
observations and conclusions of the radiology study.
The research area of radiology report summarization currently faces an
important limitation: most research is carried out on chest X-rays. To
address this limitation, we propose two datasets:
A shared summarization task that includes six different modalities and
anatomies, totalling 79,779 samples, based on the MIMIC-III database;
A shared summarization task on chest X-ray radiology reports with images
and a brand new out-of-domain test set from Stanford.
SEE MORE at: https://vilmedic.app/misc/bionlp23/sharedtask
Task 2. Lay Summarization of Biomedical Research Articles
Biomedical publications contain the latest research on prominent
health-related topics, ranging from common illnesses to global pandemics.
This can often result in their content being of interest to a wide variety
of audiences including researchers, medical professionals, journalists, and
even members of the public. However, the highly technical and specialist
language used within such articles typically makes it difficult for
non-expert audiences to understand their contents.
Abstractive summarization models can be used to generate a concise summary
of an article, capturing its salient points using words and sentences that
aren’t used in the original text. As such, these models have the potential
to help broaden access to highly technical documents when trained to
generate summaries that are more readable, containing more background
information and less technical terminology (i.e., a “lay summary”).
This shared task concerns the abstractive summarization of biomedical
research articles, with an emphasis on controllability and catering to
non-expert audiences. Through this task, we aim to help foster increased
research interest in controllable summarization that helps broaden access
to technical texts and progress toward more usable abstractive
summarization models in the biomedical domain.
For more information, see:
Main site: https://biolaysumm.org/
CodaLab page - subtask 1: https://codalab.lisn.upsaclay.fr/competitions/9541
CodaLab page - subtask 2: https://codalab.lisn.upsaclay.fr/competitions/9544
*Workshop Organizers*
Dina Demner-Fushman, US National Library of Medicine
Kevin Bretonnel Cohen, University of Colorado School of Medicine
Sophia Ananiadou, National Centre for Text Mining and University of
Manchester, UK
Jun-ichi Tsujii, National Institute of Advanced Industrial Science and
Technology, Japan
Workshop on Language-Based AI Agent Interaction with Children
https://aichildinteraction.github.io/
February 21st, 2023, in Los Angeles, USA & Virtual (Hybrid Format)
Paper Submission Deadline: January 13th, 2023 (extended)
Easychair: https://easychair.org/conferences/?conf=aiaic23
Contact: https://groups.google.com/g/ai-child-interactions or
aichildinteraction(a)gmail.com
More information about registering for our workshop for those not
attending IWSDS will be shared in a separate call for participation
closer to the workshop day.
===================================================
In this workshop, we aim to bring together researchers looking into
multimodal interactions between children and artificial agents to
discuss research problems that center around interactivity and go beyond
just processing child speech. We are interested in discussing approaches
to collecting and annotating datasets involving child speech, intent
classification in child speech, designing dialogue flow with artificial
agents that primarily interact with children, as well as repair
strategies, active listening behavior, and other aspects of dialogue
modeling. Moreover, multiparty conversations involving several children,
children and their adult caregivers, or several artificial agents are of
particular interest to this workshop.
Acknowledging the early-stage nature of research in this area, the
workshop will invite short position papers as contributions. In addition
to selected talks that will be invited based on the submitted papers, we
will host roundtable discussions allowing attendees to discuss ideas,
share challenges they have faced, and highlight ideas for future research.
## Topics of Interest
The workshop welcomes contributions across a wide range of topics
including, but not limited to:
Natural Language Understanding of Child Speech
Dialogue Modeling of Child-Agent, Child-Robot, Child-Child, and
Child-Adult Speech
Conversational Flow and Repair in Dialogue with Children
Multiparty-Interaction Involving Children
Multimodal Processing of Child Interactions
Automatic Speech Recognition of Child Speech
Evaluating Child Interactions with Artificial Agents/Robots
Challenges in Designing Interactions for Children
Datasets of Child-Child, Child-Adult, or Child-Agent/Robot Interaction
Ethics and Responsible AI for Child-Agent/Robot Interaction
Related Topics
## Important Dates
- Paper submission deadline: January 13th, 2023 (extended)
- Author Notification deadline: February 1st, 2023
- Workshop: February 21st, 2023 (morning session)
## Submission Guidelines
We invite short position papers of 3-4 pages (plus additional pages for
references and appendices without page limitation), including work in
progress containing preliminary results, technical reports, case
studies, surveys, and state-of-the-art research in language-based AI
agent interactions with children. Recently submitted or published papers
are welcome to be submitted to this workshop if they are highly relevant
to the topic of the workshop. Please select the appropriate track during
the EasyChair submission to mark the submission accordingly.
Papers will be reviewed for their relevance, novelty, and scientific and
technical soundness.
Submissions do not need to be anonymized for review. All manuscripts
must be written in English and submitted electronically in PDF format
via EasyChair: https://easychair.org/conferences/?conf=aiaic23
Accepted papers will be published on the workshop website. However,
papers are still considered non-archival and can be submitted to other
conferences.
Authors of accepted papers are expected to present their paper during
the workshop in the form of a short talk, which can either be given in
person in Los Angeles, USA, or virtually via Zoom.
Authors should use the official IWSDS template:
LaTeX Style and Template:
https://drive.google.com/open?id=1mnzjvTlIVEsdPb2IZXbzxU8WRJj3mLiJ
Overleaf: https://www.overleaf.com/read/djcrwzgrdjvj
Word Template:
https://drive.google.com/open?id=1WmO9iLvJtO0cH1E0VSC1bPsC0vRDpzbd
## Participation
Participation in our workshop will be possible online or in person in
Los Angeles. If you are planning on attending IWSDS, then you can
register for the conference here:
https://sites.google.com/view/iwsds2023/registration
Registration for the main conference includes workshop participation.
If you would like to participate in our workshop only, please wait for
further instructions on the registration process, which we will share in
a separate call for participation closer to the workshop day.
## Contact
If you have questions, please get in touch via our public Google Group
https://groups.google.com/g/ai-child-interactions or by sending an
e-mail to aichildinteraction(a)gmail.com