Application Deadline: 30 August 2023
Details
This project focuses on managing the single greatest threat to global health: the increasing burden of infections caused by bacteria that are resistant to antibiotics (antimicrobial resistance, AMR). Human doctors cannot reliably know which antibiotic to administer in an emergency; our earlier research suggests they get it wrong about 20% of the time. A serious bacterial infection looks the same whether or not the bacteria causing it are resistant to certain antibiotics, and the first antibiotic must be selected on very limited information and given within the first hour of admission to hospital if there is a risk that the patient has developed an infection that is spreading through their body. Understandably, this ‘high stakes’ uncertainty promotes the use of ‘broad-spectrum’ antibiotics, which should be held in reserve for known drug-resistant infections.
Natural language processing (NLP) has the potential to safely unlock successful antimicrobial stewardship for AMR at the first dose. In earlier work, we used quantitative and categorical data from the electronic health records (EHRs) of patients who needed emergency hospital admission to see which antibiotics were given in the emergency room, how often a patient was prescribed an antibiotic that their bacterial infection was resistant to (under-prescribing), and how often a broad-spectrum antibiotic was used when an alternative antibiotic would have been equally effective (over-prescribing). We trained a machine learning algorithm that was allowed to under-prescribe at the same rate as doctors (about 20% of the time) and that could also reduce the use of broad-spectrum antibiotics by about 40% by anticipating which patients were unlikely to have an AMR infection. This powerful proof-of-concept work shows the huge potential for AI in personalised medicine and antimicrobial stewardship at the first and most important dose.
Taking the next steps in AI for AMR
We know that a lot of important information is held in free-text clinician notes and is not reflected in the data we used to build the model, and we want to understand which information in the free-text data would help improve prediction accuracy.
This project aims to analyse free-text clinician notes to retrieve valuable information that can improve the prescribing of antibiotics by more accurately predicting an individual patient’s risk of having an antibiotic-resistant infection. We are seeking a motivated student to undertake a 4 year funded PhD, in collaboration with Shionogi, a pharmaceutical company with offices in London.
Eligibility
The successful candidate will hold a bachelor’s degree (or above) in Computer Science, Physics, Mathematics, Psychology or a related discipline and have proven experience in computational linguistics, natural language processing or machine learning. Previous experience of applying AI methods to the medical domain is a strong advantage. Furthermore, the candidate will have strong programming skills and expertise in machine learning approaches, and be excited by the challenges of interdisciplinary research between medicine and computer science. We want our PhD student cohorts to reflect our diverse society. UoB is therefore committed to widening the diversity of our PhD student cohorts. UoB studentships are open to all and we particularly welcome applications from under-represented groups, including, but not limited to, BAME, disabled and neuro-diverse candidates. We also welcome applications for part-time study.
The University of Birmingham works closely with University Hospitals Birmingham NHS Foundation Trust (UHB), which is the single-largest Acute NHS Trust in the UK, and serves the healthcare needs of over 1.2m people in the second-largest city in the UK. PIONEER, the Health Data Research Hub for Acute Care, alone includes >1.2m patient episodes per year with >10yrs longitudinal health data. This experienced collaboration means we are uniquely positioned to develop, model and then later embed AI-supported antimicrobial stewardship within a clinical trial and electronic prescribing systems. The student will be located at the Institute of Microbiology and Infection (IMI) of the University of Birmingham, the largest academic research institute in the field of microbiology and infectious diseases in the United Kingdom. The IMI is part of the School of Medical and Dental Sciences, defining the future of health and medicine through the provision of innovative education and exceptional research.
Throughout the PhD project, regular meetings with industry partner colleagues at Shionogi will be held to monitor progress and support the student in their research.
About Shionogi
Established in Japan 140 years ago, Shionogi has a history of drug discovery and scientific rigour in addressing some of the toughest challenges in healthcare. Shionogi’s work in antimicrobial resistance (AMR) is a key part of our contribution to the UN Sustainable Development Goals (SDGs): we invest the highest proportion of our pharmaceutical revenues in relevant anti-infectives R&D compared to other large pharmaceutical companies. Shionogi announced the first-ever licence agreement for an antibiotic to treat serious bacterial infections between a pharmaceutical company and a non-profit organisation driven by public health priorities. Working with the Global Antibiotic Research and Development Partnership (GARDP) and the Clinton Health Access Initiative (CHAI), the agreement aims to provide 135 countries with access. At Shionogi, our belief is that sustainable growth hinges not only on new drug creation, but also on consolidating our strengths in areas of strategic focus. Through external partnerships, we seek to bring benefits to more patients through collaboration in areas where it would be difficult for us to go it alone. Globally, our many partners, across a range of industries and including academia, enable us to accelerate innovation to better help societies manage some of the most important public health threats and to take on areas where the unmet clinical need is greatest.
Funding Notes
The position offered is for three and a half years of full-time study. The current (2023-24) value of the award is: stipend £18,622 pa; tuition fee £4,712 pa. Awards are usually incremented on 1 October each following year. The package includes a MacBook Air and funding for additional training and conference attendance.
References
Moran E, Robinson E, Green C, Keeling M, Collyer B. Towards personalized guidelines: using machine-learning algorithms to guide antimicrobial selection. J Antimicrob Chemother. 2020. doi:10.1093/jac/dkaa222
Cavallaro M, Moran E, Collyer B, McCarthy ND, Green C, Keeling MJ. Informing antimicrobial stewardship with explainable AI. bioRxiv. 2022. doi:10.1101/2022.08.12.22278678
https://www.findaphd.com/phds/project/natural-language-processing-of-electr…
With best regards,
Mark Lee
Professor of Artificial Intelligence
School of Computer Science
University of Birmingham
http://www.cs.bham.ac.uk/~mgl
15th meeting of the Forum for Information Retrieval Evaluation: HASOC-2023
We are excited to announce the 5th edition of HASOC, consisting of four
interesting shared tasks. We invite you to participate.
*Task 1 focuses on identifying hate speech, offensive language, and
profanity in different languages using natural language processing
techniques.*
* Task 1A deals with identifying hate and offensive content in
Sinhala, a low-resource Indo-Aryan language spoken in Sri Lanka. The
task involves classifying tweets into Hate and Offensive (HOF) or
Non-Hate and Offensive (NOT). The dataset for this task is based on
the Sinhala Offensive Language Detection dataset.
* Task 1B focuses on identifying hate and offensive content in
Gujarati, another low-resource Indo-Aryan language spoken by
approximately 50 million people in India. Similarly, participants
need to classify tweets into HOF or NOT categories. The training set
for this task consists of around 200 tweets.
For more details, please visit task 1 page
<https://hasocfire.github.io/hasoc/2023/task1.html>.
*Task 2, Identification of Conversational Hate-Speech in Code-Mixed
Languages (ICHCL), addresses the challenge of identifying hate speech
and offensive content in code-mixed conversations on social media.
Code-mixed text includes multiple languages within a single
conversation. The task is divided into two subtasks.*
* In Task 2a, participants need to perform binary classification on
conversational tweets with tree-structured data. They must determine
whether a tweet, comment, or reply contains hate speech, offensive
language, or profanity (HOF) or is non-hate and offensive (NOT). The
classification should consider both the individual content and
support for hate expressed in the parent tweet.
* Task 2b involves the classification of conversational tweets with
tree-structured data into specific forms of hate. Participants must
identify if the tweet, comment, or reply contains standalone hate
(SHOF), contextual hate (CHOF) that supports hate expressed in the
parent, or if it is non-hate (NONE).
For more details, please visit Task 2 webpage.
<https://hasocfire.github.io/hasoc/2023/ichcl.html>
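The tree-structured setting described for Task 2 can be illustrated with a minimal Python sketch. It is not the organisers' data format: the `Post` class, its field names, and the `[SEP]` separator are hypothetical choices made only for illustration. It shows one way to gather a reply together with its ancestors so that a classifier can see the parent context.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Post:
    """One node of a conversation tree (hypothetical schema)."""
    post_id: str
    text: str
    parent_id: Optional[str] = None  # None marks the root tweet


def context_chain(posts: Dict[str, Post], post_id: str) -> List[str]:
    """Return the texts from the root tweet down to the given post."""
    chain: List[str] = []
    current = posts.get(post_id)
    while current is not None:
        chain.append(current.text)
        # Walk up to the parent; stop once the root has been reached.
        current = posts.get(current.parent_id) if current.parent_id else None
    chain.reverse()
    return chain


def build_input(posts: Dict[str, Post], post_id: str, sep: str = " [SEP] ") -> str:
    """Join the context chain into one string for a text classifier."""
    return sep.join(context_chain(posts, post_id))


# Placeholder conversation: a tweet, a comment on it, and a reply to the comment.
posts = {
    "t1": Post("t1", "root tweet"),
    "c1": Post("c1", "comment on the tweet", parent_id="t1"),
    "r1": Post("r1", "reply to the comment", parent_id="c1"),
}
```

Calling `build_input(posts, "r1")` yields the three texts joined in root-to-reply order, which a model could then label as HOF/NOT (Task 2a) or SHOF/CHOF/NONE (Task 2b).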
*Task 3 aims to detect hateful spans within a sentence already
considered hateful. A hate span is a set of continuous tokens that, in
tandem, communicate the explicit hatefulness in a sentence.*
* For instance, in the statement, "Women ... Can't live with them...
Can't shoot them," the portion "Can't shoot them" (highlighted in bold
in the original announcement) will be considered a hateful span. This
shared task aims to extract all such spans from a hateful text.
* The input texts are all in English. The detection of hateful spans
is achieved by mapping this into a sequence labeling problem. For
every token of the sequences, we have manually annotated the start
and end of a hateful span. This is achieved by BIO notation
tagging, where 'B' represents the beginning of the hate span, 'I'
forms the continuation of a hate span, and 'O' represents the
non-hate tag. The task is then to learn the correct sequence of the
BIO tags for a given sentence. For example, in the above sentence,
the tag sequence for the preprocessed sentence will be of the form
"women can't live with them can't shoot them" → "O O O O O B I I";
"I" notation cannot exist on its own and will always be preceded by
either an "I" or "B". Consequently, a “B” notation can be
immediately followed by an “O” in case the span is just a single word.
For more details, please visit Task 3 webpage.
<https://lcs2.in/hatenorm-2023/>
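The BIO constraints described for Task 3 can be sketched in a few lines of Python. This is an illustrative sketch only, not the organisers' evaluation code; the function names `is_valid_bio` and `bio_to_spans` are hypothetical, and spans are assumed to be reported as half-open token index ranges.

```python
from typing import List, Tuple


def is_valid_bio(tags: List[str]) -> bool:
    """Check that every 'I' is preceded by a 'B' or another 'I'."""
    previous = "O"
    for tag in tags:
        if tag not in ("B", "I", "O"):
            return False
        if tag == "I" and previous == "O":
            return False
        previous = tag
    return True


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Convert a valid BIO sequence into (start, end) token ranges, end exclusive."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:  # a new span begins right after another one
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
        # 'I' simply extends the currently open span
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# The example from the announcement.
tokens = "women can't live with them can't shoot them".split()
tags = "O O O O O B I I".split()
spans = bio_to_spans(tags)
span_texts = [" ".join(tokens[s:e]) for s, e in spans]
```

For the announcement's example, `spans` is `[(5, 8)]` and `span_texts` is `["can't shoot them"]`. A single-word span, where 'B' is followed directly by 'O', comes out as a range of length one.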
*Task 4 aims to detect hate speech in Bengali, Bodo, and Assamese
languages. It is a binary classification task. Each dataset (for the
three languages) consists of a list of sentences with their
corresponding class (hate or offensive (HOF) or not hate (NOT)). Data is
primarily collected from Twitter, Facebook, and YouTube comments.*
The Macro F1 score will be the yardstick of the task. Team rank will be
determined based on the Macro F1 score of the first part.
For more details, please visit Task 4 webpage.
<https://sites.google.com/view/hasoc-2023-annihilate-hates/home>
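For readers unfamiliar with the Macro F1 score used for ranking, the following is a minimal Python sketch of how it is commonly computed for the two classes HOF and NOT. It is not the organisers' scoring script; the function name `macro_f1` and the toy labels are hypothetical, and a class with no true or predicted instances is assumed to score 0.

```python
from typing import Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str],
             labels: Sequence[str] = ("HOF", "NOT")) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); assumed to be 0 when the class never occurs.
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Toy example with five sentences.
gold = ["HOF", "HOF", "NOT", "NOT", "NOT"]
pred = ["HOF", "NOT", "NOT", "NOT", "HOF"]
score = macro_f1(gold, pred)
```

Here the HOF class has F1 = 1/2 and the NOT class has F1 = 2/3, so `score` is their mean, 7/12. Because each class counts equally, the minority class matters as much as the majority class.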
Registration for all four tasks is open on our registration page.
<https://hasocfire.github.io/hasoc/2023/registration.html>
We believe that your expertise and contribution will be invaluable in
advancing the state of the art in hate speech classification. We encourage
you to participate in this exciting shared task and contribute to the
research community.
Regards,
HASOC organizing team
Hi everyone
City, University of London is looking for a postdoctoral NLP research fellow to work on the VISION (Violence, Health and Society) project, developing methods for extracting information on violence from public sector records. Details below. Please circulate to anyone you think might be interested.
https://www.city.ac.uk/about/jobs/apply/details.html?nPostingId=1579&nPosti…
best wishes
--
Angus
> [Apologies for cross-posting]
> ======================================================================
> EXTENDED DEADLINE TO **JULY 31**
> ======================================================================
>
> SIMBig 2023 - 10th International Conference on Information Management and Big Data
> Where: Instituto Politécnico Nacional, Mexico DF, MEXICO
> When: October 18 - 20, 2023
> Website: https://simbig.org/SIMBig2023/
>
> ======================================================================
>
> OVERVIEW
> ----------------------------------
>
> SIMBig 2023 seeks to present new methods of Artificial Intelligence (AI), Data Science, Machine Learning, Natural Language Processing, Semantic Web, and related fields, for analyzing, managing, and extracting insights and patterns from large volumes of data.
>
>
> KEYNOTE SPEAKERS (to be confirmed)
> ----------------------------------
>
> Mona Diab, Meta AI, USA
> Huan Liu, Arizona State University, USA
>
> and more to be announced soon...
>
> IMPORTANT DATES
> ----------------------------------
>
> July 31, 2023 (extended from July 24, 2023) --> Full papers and short papers due
> August 28, 2023 --> Notification of acceptance
> September 10, 2023 --> Camera-ready versions
> October 18 - 20, 2023 --> Conference held in Mexico DF, Mexico
>
> PUBLICATION
> ----------------------------------
>
> All accepted papers of SIMBig 2023 (including tracks) will be published in the Springer CCIS Series <https://www.springer.com/series/7899> (to be confirmed).
>
> Authors of the best papers of SIMBig 2023 (including tracks) will be invited to submit an extended version for publication in the Springer SN Computer Science Journal <https://www.springer.com/journal/42979>.
>
> TOPICS OF INTEREST
> ----------------------------------
>
> SIMBig 2023 has a broad scope. We invite contributions on theory and practice, including but not limited to the following technical areas:
>
> Artificial Intelligence
> Big/Massive Data
> Data Science
> Machine Learning
> Deep Learning
> Natural Language Processing
> Semantic Web
> Data-driven Software Engineering
> Data-driven software adaptation
> Healthcare Informatics
> Biomedical Informatics
> Data Privacy and Security
> Information Retrieval
> Ontologies and Knowledge Representation
> Social Networks and Social Web
> Information Visualization
> OLAP and Business intelligence
> Crowdsourcing
>
> SPECIAL TRACKS
> ----------------------------------
>
> SIMBig 2023 proposes six special tracks in addition to the main conference:
>
> ANLP <https://simbig.org/SIMBig2023/en/anlp.html> - Applied Natural Language Processing
> DISE <https://simbig.org/SIMBig2023/en/dise.html> - Data-Driven Software Engineering
> EE-AI-HPC <https://simbig.org/SIMBig2023/en/eeaihpc.html> - Efficiency Enhancement for AI and High-Performance Computing
> SNMAM <https://simbig.org/SIMBig2023/en/snmam.html> - Social Network and Media Analysis and Mining
>
> CONTACT
> ----------------------------------
>
> SIMBig 2023 General Chairs
>
> Juan Antonio Lossio-Ventura, National Institutes of Health, USA (juan.lossio(a)nih.gov <mailto:juan.lossio@nih.gov>)
> Hugo Alatrista-Salas, Pontificia Universidad Católica del Perú, Peru (halatrista(a)pucp.pe <mailto:halatrista@pucp.pe>)
15th meeting of Forum for Information Retrieval Evaluation HASOC-2023
We are excited to announce the 5th edition of HASOC, consisting of four interesting shared tasks. We invite you to participate.
Task 1 focuses on identifying hate speech, offensive language, and profanity in different languages using natural language processing techniques.
* Task 1A is identifying hate and offensive content in Sinhala, a low-resource Indo-Aryan language spoken mainly in Sri Lanka. The task involves classifying tweets into Hate and Offensive (HOF) or Non-Hate and Offensive (NOT). The training set for this task is based on the Sinhala Offensive Language Detection dataset, which contains 10,000 tweets.
* Task 1B focuses on identifying hate and offensive content in Gujarati, another low-resource Indo-Aryan language spoken by approximately 50 million people in India. Similarly, participants need to classify tweets into HOF or NOT categories. The training set for this task consists of around 200 tweets.
For more details, please visit task 1 page<https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F16…>.
Task 2, Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL), addresses the challenge of identifying hate speech and offensive content in code-mixed conversations on social media. Code-mixed text includes multiple languages within a single conversation. The task is divided into two subtasks.
* In Task 2a, participants need to perform binary classification on conversational tweets with tree-structured data. They must determine whether a tweet, comment, or reply contains hate speech, offensive language, or profanity (HOF) or is non-hate and offensive (NOT). The classification should consider both the individual content and support for hate expressed in the parent tweet.
* Task 2b involves the classification of conversational tweets with tree-structured data into specific forms of hate. Participants must identify if the tweet, comment, or reply contains standalone hate (SHOF), contextual hate (CHOF) that supports hate expressed in the parent, or if it is non-hate (NONE).
For more details, please visit Task 2 webpage.<https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F16…>
Task 3 aims to detect hateful spans within a sentence already considered hateful. A hate span is a set of continuous tokens that, in tandem, communicate the explicit hatefulness in a sentence.
* For instance, in the statement, "Women ... Can't live with them... Can't shoot them," the portion "Can't shoot them" (highlighted in bold in the original announcement) will be considered a hateful span. This shared task aims to extract all such spans from a hateful text.
* The input texts are all in English. The detection of hateful spans is achieved by mapping this into a sequence labeling problem. For every token of the sequences, we have manually annotated the start and end of a hateful span. This is achieved by BIO notation tagging, where 'B' represents the beginning of the hate span, 'I' forms the continuation of a hate span, and 'O' represents the non-hate tag. The task is then to learn the correct sequence of the BIO tags for a given sentence. For example, in the above sentence, the tag sequence for the preprocessed sentence will be of the form "women can't live with them can't shoot them" → "O O O O O B I I"; "I" notation cannot exist on its own and will always be preceded by either an "I" or "B". Consequently, a “B” notation can be immediately followed by an “O” in case the span is just a single word.
For more details, please visit Task 3 webpage.<https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F16…>
Task 4 aims to detect hate speech in Bengali, Bodo, and Assamese languages. It is a binary classification task. Each dataset (for the three languages) consists of a list of sentences with their corresponding class (hate or offensive (HOF) or not hate (NOT)). Data is primarily collected from Twitter, Facebook, and YouTube comments.
The Macro F1 score will be the yardstick of the task. Team rank will be determined based on the Macro F1 score of the first part.
For more details, please visit Task 4 webpage.<https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F16…>
Registration for all four tasks is open on our registration page.<https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F16…>
We believe that your expertise and contribution will be invaluable in advancing the state of the art in hate speech classification. We encourage you to participate in this exciting shared task and contribute to the research community.
Regards,
HASOC organizing team
Apologies for multiple posting
***********************************
*Machine Translation for Indian Languages (MTIL) 2023*
We invite all IR and NLP researchers and enthusiasts to participate in the
MTIL track (https://mtilfire.github.io/mtil/2023/) held in conjunction with
the Forum for Information Retrieval Evaluation (FIRE) 2023 (
http://fire.irsi.res.in/).
Indian languages have many linguistic complexities. Though some Indian
languages share syntactic similarities, some possess intricate
morphological structures. At the same time, some Indian languages are
low-resource. Therefore the machine translation models should address these
unique challenges in translating between Indian languages.
The MTIL track consists of two tasks:
1. *General Translation Task (Task 1):* Task participants should build a
   machine translation model to translate sentences of the following language
   pairs:
   1. Hindi-Gujarati
   2. Hindi-Kannada
   3. Kannada-Hindi
   4. Hindi-Odia
   5. Odia-Hindi
   6. Hindi-Punjabi
   7. Punjabi-Hindi
   8. Hindi-Sindhi
   9. Urdu-Kashmiri
   10. Telugu-Hindi
   11. Hindi-Telugu
   12. Urdu-Hindi
   13. Hindi-Urdu
2. *Domain Specific Translation Task (Task 2)*: Task participants will
   build machine translation models for Governance and Healthcare domains.
   1. Healthcare:
      a. Hindi-Gujarati
      b. Kannada-Hindi
      c. Hindi-Odia
      d. Odia-Hindi
      e. Hindi-Punjabi
      f. Kannada-Hindi
   2. Governance:
      a. Hindi-Gujarati
      b. Kannada-Hindi
      c. Hindi-Odia
      d. Odia-Hindi
      e. Hindi-Punjabi
      f. Kannada-Hindi
*Dataset:*
The primary source of parallel language pairs is the Bharat Parallel Corpus
Collection (BPCC), released by AI4Bharat
(https://ai4bharat.iitm.ac.in/bpcc).
Participants are encouraged to add datasets of their choice, including
parallel corpora and monolingual datasets, to train their models.
More information on registration and participation in the track can be
found here: https://mtilfire.github.io/mtil/2023/
This track is being done in association with BHASHINI (
https://bhashini.gov.in/)
*Organisers*
- Prasenjit Majumder, DAIICT Gandhinagar, India and TCG CREST,
Kolkata, India
- Arafat Ahsan, IIIT-Hyderabad, India
- Asif Ekbal, IIT-Patna, India
- Saran Pandian, DAIICT Gandhinagar, India
- Ramakrishna Appicharla, IIT-Patna, India
- Surupendu Gangopadhyay, DAIICT Gandhinagar, India
- Ganesh Epili, DAIICT Gandhinagar, India
- Dreamy Pujara, DAIICT Gandhinagar, India
- Misha Patel, DAIICT Gandhinagar, India
- Aayushi Patel, DAIICT Gandhinagar, India
- Bhargav Dave, DAIICT Gandhinagar, India
- Mukesh Jha, DAIICT Gandhinagar, India
We are seeking a highly motivated and talented individual to join our research team as a Postdoctoral Researcher in the field of Natural Language Processing. The position offers an exciting opportunity to investigate the computational and algorithmic aspects underlying modern Artificial Intelligence systems, with a specific focus on the algorithmic and application aspects of NLP-based technologies. Successful candidates will work closely with Prof. Debora Nozza, Prof. Dirk Hovy, and the MilaNLP lab.
Your profile:
- A Ph.D. in Computer Science, Computational Linguistics/NLP, Machine Learning, Data Science, or related fields.
- Excellent programming skills in Python.
- Fluency in spoken and written English. Knowledge of Italian is NOT a requirement.
- Knowledge of current neural network models and implementation tools for neural networks (e.g., PyTorch).
- Experience with publications in top-tier venues in the field of NLP/Computational Linguistics.
Position Details:
- Starting date: Oct 1 2023, or any time thereafter
- Duration: 2 years
- Deadline: 1st September 2023
- Competitive Salary: Applicants from outside Italy may qualify for a researcher taxation scheme
How to apply:
Go to the Bocconi postdoc job market page https://jobmarket.unibocconi.eu/?type=a&urlBack=/wps/wcm/connect/Bocconi/Si… and search for “Natural language processing”, where you can also find the official job description. Candidates should attach publications and a cover letter to their application.
Online interviews will take place during September 2023. Please contact debora.nozza(at)unibocconi.it if you have any questions.
Dear colleagues,
Our research group TurkuNLP at the University of Turku, Finland, has an opening for *a postdoc position in corpus linguistics or NLP*.
The position is part of the research project "Massively Multilingual Modeling of Registers in Web-Scale Data," (MMMReg) which is funded by the Academy of Finland. The project aims to explore language use in the digital world at a massively multilingual scale using neural networks. The specific focus of the project is on web registers, such as news, blogs, and how-to pages. The primary goals of the project are to analyze the linguistic characteristics of web registers across languages and to develop machine learning methods for modeling registers in large web datasets at a massively multilingual scale.
The position is for one year, starting on September 1, 2023.
The closing date for the applications is August 7, 2023 (UTC+3)
For more information on the position, please visit https://www.utu.fi/en/university/come-work-with-us/open-vacancies
Do not hesitate to get in touch if you have any questions!
Best regards,
Veronika Laippala
TurkuNLP, University of Turku, Finland
Third Workshop DL4LD 2023
Deep learning for linguistic linked data: Addressing Deep Learning,
Relation Extraction, and Linguistic Data with a Case Study on The Bigger
Analogy Test Set (BATS) https://vecto.space/projects/BATS/
Venue: Vienna, Austria, University of Vienna & online
Website: http://dl4ld2023.mruni.eu/
Date: 13 September 2023
The Cost Action CA18209 NexusLinguarum ( https://nexuslinguarum.eu )
invites you to attend the Third Workshop Deep Learning for Linguistic
Linked Data: Addressing Deep Learning, Relation Extraction, and
Linguistic Data with a Case Study on The Bigger Analogy Test Set (BATS)
– DL4LD 2023, organized as part of LDK 2023 ( http://2023.ldk-conf.org ).
We are glad to announce that the program features one keynote, Assistant
Professor Michael Cochez, and seven oral presentations.
DL4LD 2023 will be a hybrid event (in-person and online) open to anyone
interested in the topic. Online participation is still possible but
requires prior registration. The registration form (
https://ldk-registration.univie.ac.at ) for online participation will be
open until 3rd September 2023.
We are very much looking forward to seeing you in Vienna or online.
Program available here: http://dl4ld2023.mruni.eu/?page_id=323