---------------------------------------------------
TREC 2023 NeuCLIR
---------------------------------------------------
Cross-language Information Retrieval (CLIR) has been studied at TREC and subsequent evaluations for more than twenty years. Prior to the application of deep learning, strong statistical approaches were developed that work well across many languages. As with most other language technologies though, neural computing has led to significant performance improvements in information retrieval. CLIR has just begun to incorporate neural advances.
The TREC 2023 NeuCLIR track presents a cross-language information retrieval challenge. NeuCLIR topics are written in English. NeuCLIR has three target language collections in Chinese, Persian, and Russian. Topics are written in the traditional TREC format: a short title and a sentence-length description. Systems are to return a ranked list of documents for each topic. Results will be pooled, and systems will be evaluated on a range of metrics.
This year, we include two new challenges: retrieval from a corpus that includes multiple languages, and retrieval from a corpus of technical documents.
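As a rough illustration of what "a ranked list of documents for each topic" can look like on disk, the sketch below formats results in the six-column run layout conventionally used for TREC submissions (topic ID, the literal Q0, document ID, rank, score, run tag). The topic IDs, document IDs, scores, run tag, and the 1000-documents-per-topic cutoff are all made-up assumptions; the track guidelines remain the authority on the exact requirements.

```python
from collections import defaultdict


def format_trec_run(scores, run_tag, max_per_topic=1000):
    """Return run-file lines 'topic Q0 docid rank score tag', best score first.

    max_per_topic is an assumed cutoff, not a value taken from the track.
    """
    by_topic = defaultdict(list)
    for topic_id, doc_id, score in scores:
        by_topic[topic_id].append((doc_id, score))

    lines = []
    for topic_id in sorted(by_topic):
        # Sort by descending score; break ties on document ID for determinism.
        ranked = sorted(by_topic[topic_id], key=lambda pair: (-pair[1], pair[0]))
        for rank, (doc_id, score) in enumerate(ranked[:max_per_topic], start=1):
            lines.append(f"{topic_id} Q0 {doc_id} {rank} {score:.4f} {run_tag}")
    return lines


# Hypothetical scores: (topic ID, document ID, retrieval score)
example_scores = [
    ("200", "doc-b", 11.2),
    ("200", "doc-a", 14.7),
    ("201", "doc-c", 3.9),
]
run_lines = format_trec_run(example_scores, run_tag="my-run")
print("\n".join(run_lines))
```

Sorting inside the function keeps the rank column consistent with the score column, whatever order the retrieval system emitted its results in.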
--- Task Description ---
* Single-Language News Retrieval
* Multi-Language News Retrieval
* Single-Language Technical Abstract Retrieval
* Website: https://neuclir.github.io/
* Mailing List: https://groups.google.com/g/neuclir-participants
--- Important Dates ---
Already: Evaluation document collection released
Already: Track guidelines released
Already: CLIR/MLIR: Topics released
June 30, 2023: CLIR/MLIR: Submissions due to NIST
June 30, 2023: Technical Document Topic Release
August 1, 2023: Technical Document Submission
September 30, 2023: Results distributed to participants
November 2023: TREC 2023
--- Organizing Committee ---
Dawn Lawrie, Johns Hopkins University, HLTCOE
Sean MacAvaney, University of Glasgow
James Mayfield, Johns Hopkins University, HLTCOE
Paul McNamee, Johns Hopkins University, HLTCOE
Douglas W. Oard, University of Maryland
Luca Soldaini, Allen Institute for AI
Eugene Yang, Johns Hopkins University, HLTCOE
8th Symposium on Corpus Approaches to Lexicogrammar (LxGr2023)
The symposium will take place online on 6-8 July 2023.
The programme, links to abstracts, and registration details are here:
https://sites.edgehill.ac.uk/lxgr/lxgr2023
Participation is free. Last day of registration is 4 July 2023.
If you have any questions, please contact lxgr(a)edgehill.ac.uk.
Apologies for cross-posting
Submission deadline: June 16, 2023 (extended from June 2, 2023)
Artificial Intelligence Research in Applied Linguistics (AIRiAL)
Conference at Teachers College, Columbia University
Theme
The Future of Artificial Intelligence in Applied Linguistics
Location
Teachers College, Columbia University
Dates
September 29-30, 2023
Plenary Speakers
Kadriye Ercikan, Vice President of Research at ETS
Alina von Davier, Chief of Assessment at Duolingo
CALL FOR PROPOSALS
The AL & TESOL Language and Technology Research Group
<https://sites.google.com/tc.columbia.edu/al-tesol-language-technology/home>
in the Applied Linguistics & TESOL program at Teachers College will host
the Conference on Artificial Intelligence Research in Applied Linguistics
(AIRiAL)
<https://sites.google.com/tc.columbia.edu/al-tesol-language-technology/event…>.
This conference is a forum for scholarly discussions on Artificial
Intelligence research in Applied Linguistics (e.g., Natural Language
Processing, Speech Technologies, Computer Vision, and Biometrics). Applied
Linguistics is a broad field including scholarship about language analysis
and how language is learned in order to achieve some purpose or solve some
problem in the real world. It includes areas such as language acquisition,
language assessment, language use, language & technology and other related
sub-fields.
We welcome abstracts exploring the relationship between Applied Linguistics
and Artificial Intelligence that align with our conference theme. Research
areas include, but are not limited to:
- Affective computing
- Automated scoring
- Conversational AI
- Intelligent tutoring
- Immersive technologies
- Language models
- Multilingual ASR
- AI literacy education
- AI policy decisions
- and other related topics
Presentation Types
Papers
Formal presentations of completed research making original scholarly
contributions. Presenters will have 15 minutes to discuss their papers,
followed by 5 minutes for questions and comments from the audience.
Posters
Poster sessions provide an opportunity for the presentation of work
visually. Poster topics can be works-in-progress and research that is being
planned as well as completed projects. Presenters will discuss their
posters with participants informally during a one-hour poster session.
Proposal Evaluation Criteria
Proposals will be evaluated on (1) Contribution to the field of AI in AL,
(2) Quality of the proposal, and (3) Clarity of the abstract.
Preparation and Submission of Proposals
Please submit your abstract through this submission form
<https://tccolumbia.qualtrics.com/jfe/form/SV_50jI0FuFAvt5xem>.
Abstracts should be max. 250 words.
Submission deadline: June 16, 2023 (extended from June 2, 2023)
Notification date: June 30, 2023
Student Paper Award
An award will be presented for the best student paper presentation at the
conference. All authors of student papers must be actively enrolled
graduate students at the time of the conference.
--
Erik Voss, Ph.D.
Assistant Professor, Applied Linguistics & TESOL program
Language & Technology Specialization
Department of Arts & Humanities
Teachers College, Columbia University
Dear Colleagues,
We invite you to submit a paper to a special issue about "Artificial
Intelligence and Smart Technologies for Achieving Sustainable Goals
<https://www.mdpi.com/journal/sustainability/special_issues/Z1I87XD9C3>".
Deadline for manuscript submissions: *30 August 2023*
*Introduction*
The United Nations’ Sustainable Development Goals (SDGs) provide a
blueprint for a fairer and more sustainable world for everyone. One of the
key ways to achieve these goals is through the use of artificial
intelligence (AI) and smart technologies.
AI has been hailed as a game-changer for sustainable development. It can
help achieve the SDGs in several ways, including by providing decision
support for sustainable development planning, helping to optimize resource
use, and increasing transparency and accountability.
Smart technologies likewise offer great potential for sustainable
development applications: they can improve the efficiency of resource use,
e.g., by reducing wastage, and can also improve transparency and
accountability.
In this Special Issue, we invite papers that explore the potential of AI
and smart technologies for sustainable development. We are particularly
interested in papers that describe applications of these technologies that
have the potential to make a real difference to the achievement of SDGs. We
welcome papers from all sectors, including academia, industry, government,
and non-governmental organizations (NGOs).
Topics of interest include, but are not limited to, the following:
- Applications of AI and smart technologies in sustainable development;
- The potential of AI and smart technologies to help achieve the SDGs;
- Barriers and challenges to the use of AI and smart technologies for
sustainable development;
- Future directions for the use of AI and smart technologies in
sustainable development;
- Ethical considerations in the use of AI and smart technologies for
sustainable development;
- The role of AI and smart technologies in sustainable development
policy.
Submitted papers should not have been previously published nor be currently
under consideration for publication elsewhere. All papers will be
thoroughly refereed through a single-blind peer-review process. A guide for
authors and other relevant information for submission of manuscripts is
available on the Instructions for Authors page.
Special issue page:
https://www.mdpi.com/journal/sustainability/special_issues/Z1I87XD9C3
--
Hend S. Al-Khalifa, PhD
Professor
Information Technology Department
CCIS, King Saud University
Saudi Arabia, Riyadh
Website: http://fac.ksu.edu.sa/hendk
Research: http://iwan.ksu.edu.sa/
Tel: +966-11-8051437
Dear colleagues,
We are happy to invite you to join the *Arabic NER SharedTask 2023*
<https://dlnlp.ai/st/wojood/>, which will be organized as part of WANLP
2023. We will provide you with a large corpus and Google Colab notebooks to
help you reproduce the baseline results.
Invitation to participate in the shared task on extracting named entities
from Arabic text. We will provide participants with a corpus and software
for obtaining baseline results that they can build on.
*INTRODUCTION*
Named Entity Recognition (NER) is integral to many NLP applications. It is
the task of identifying named entity mentions in unstructured text and
classifying them to predefined classes such as person, organization,
location, or date. Due to the scarcity of Arabic resources, most of the
research on Arabic NER focuses on flat entities and addresses a limited
number of entity types (person, organization, and location). The goal of
this shared task is to alleviate this bottleneck by providing Wojood, a
large and rich Arabic NER corpus. Wojood consists of about 550K tokens (MSA
and dialect, in multiple domains) that are manually annotated with 21
entity types.
*REGISTRATION*
Participants need to register via this form (
*https://forms.gle/UCCrVNZ2LaPviCZS6* <https://forms.gle/UCCrVNZ2LaPviCZS6>).
Participating teams will be provided with common training and development
datasets. No external manually labelled datasets are allowed. A blind test
set will be used to evaluate the output of the participating teams.
Each team is allowed a maximum of 3 submissions. All teams are required to
report results on the development and test sets (after results are
announced) in their write-ups.
*FAQ*
For any questions related to this task, please check our *Frequently Asked
Questions*
<https://docs.google.com/document/d/1XE2n89mFLic2P9DO_sAD51vy734BOt0kgtZ6bFf…>
*IMPORTANT DATES*
- March 03, 2023: Registration available
- May 25, 2023: Data sharing and evaluation on development set available
- June 30, 2023: Registration deadline (extended from June 10, 2023)
- July 20, 2023: Test set made available
- July 30, 2023: Evaluation on test set (TEST) deadline
- August 29, 2023: Shared task system paper submissions due
- October 12, 2023: Notification of acceptance
- October 30, 2023: Camera-ready version
- TBA: WANLP 2023 Conference
* All deadlines are 11:59 PM UTC-12:00 (Anywhere On Earth).
*CONTACT*
For any questions related to this task, please contact the organizers
directly using the following email address: *NERShare...(a)gmail.com
<https://groups.google.com/>* or join the google group:
*https://groups.google.com/g/ner_sharedtask2023*
<https://groups.google.com/g/ner_sharedtask2023>.
*SHARED TASK*
As described, this shared task targets both flat and nested Arabic NER. The
subtasks are:
*Subtask 1:* *Flat NER*
In this subtask, we provide the Wojood-Flat train (70%) and development
(10%) datasets. The final evaluation will be on the test set (20%). The
flat NER dataset uses the same train/dev/test split as the nested NER
dataset, and each split contains the same content. The only difference is
that in the flat dataset each token is assigned a single tag: the first
high-level tag assigned to that token in the nested NER dataset.
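To make the relationship between the two datasets concrete, here is a minimal sketch of deriving flat tags from nested ones by keeping only the first tag of each token. The in-memory representation (a token paired with a list of tags, outermost first), the transliterated tokens, and the tag names are assumptions for illustration only; the actual Wojood file layout should be checked in the repository.

```python
def nested_to_flat(nested_rows):
    """Keep only the first (top-level) tag of each token.

    nested_rows: list of (token, [tag, ...]) pairs, outermost tag first.
    Tokens with an empty tag list are treated as outside any entity ("O").
    """
    flat_rows = []
    for token, tags in nested_rows:
        flat_rows.append((token, tags[0] if tags else "O"))
    return flat_rows


# Hypothetical nested annotation: an organization mention ("Birzeit
# University", transliterated) whose second token is also tagged as a place.
nested_example = [
    ("Jamiat", ["B-ORG"]),
    ("Birzeit", ["I-ORG", "B-GPE"]),
    ("fi", ["O"]),
]
flat_example = nested_to_flat(nested_example)
print(flat_example)
```

Under this assumed layout the inner entity is simply dropped, which is why the flat and nested datasets can share identical splits and content.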
*Subtask 2:* *Nested NER*
In this subtask, we provide the Wojood-Nested train (70%) and development
(10%) datasets. The final evaluation will be on the test set (20%).
*METRICS*
The evaluation metrics will include precision, recall, and F1-score. However,
our official metric will be the micro F1-score.
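For readers unfamiliar with micro-averaging, the sketch below pools true positives, false positives, and false negatives across all entity types before computing precision, recall, and F1. It assumes entity-level exact matching on hypothetical (sentence ID, start, end, type) tuples; the official CODALAB scorer may count matches differently, so treat this as an illustration rather than the task's evaluation script.

```python
from collections import Counter


def micro_prf(gold_entities, predicted_entities):
    """Micro-averaged precision, recall and F1 over entity tuples.

    Each entity is a hashable tuple such as (sentence_id, start, end, type).
    A prediction counts as correct only if the whole tuple matches.
    """
    gold = Counter(gold_entities)
    pred = Counter(predicted_entities)
    true_pos = sum((gold & pred).values())  # multiset intersection
    pred_total = sum(pred.values())
    gold_total = sum(gold.values())
    precision = true_pos / pred_total if pred_total else 0.0
    recall = true_pos / gold_total if gold_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical annotations: two of three predictions match the gold standard.
gold = [(0, 0, 2, "ORG"), (0, 1, 2, "GPE"), (1, 3, 4, "PERS"), (1, 6, 7, "DATE")]
pred = [(0, 0, 2, "ORG"), (1, 3, 4, "PERS"), (1, 6, 7, "GPE")]
precision, recall, f1 = micro_prf(gold, pred)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

Because counts are pooled before averaging, frequent entity types weigh more heavily in the micro F1-score than rare ones.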
The evaluation of shared tasks will be hosted through CODALAB. Teams will
be provided with a CODALAB link for each shared task.
-*CODALAB link for NER Shared Task Subtask 1 (Flat NER)*
<https://codalab.lisn.upsaclay.fr/competitions/11594>
-*CODALAB link for NER Shared Task Subtask 2 (Nested NER)*
<https://dlnlp.ai/st/wojood/>
*BASELINES*
Two baseline models trained on Wojood (flat and nested) are provided:
*Nested NER baseline:* presented in this *article*
<https://aclanthology.org/2022.lrec-1.387/>; code is available on
*GitHub* <https://github.com/SinaLab/ArabicNER>. The model achieves a micro
F1-score of 0.9059 (note that this baseline does not handle nested entities
of the same type).
*Flat NER baseline:* the same code repository (*GitHub*
<https://github.com/SinaLab/ArabicNER>) can also be used to train a model
for the flat NER task. Our flat NER baseline achieved a micro F1-score of
0.8785.
*GOOGLE COLAB NOTEBOOKS*
To allow you to experiment with the baseline, we authored four Google Colab
notebooks that demonstrate how to train and evaluate our baseline models.
[1] *Train Flat NER*
<https://gist.github.com/mohammedkhalilia/72c3261734d7715094089bdf4de74b4a>:
This notebook can be used to train our ArabicNER model on the flat NER task
using the sample Wojood data found in our repository.
[2] *Evaluate Flat NER*
<https://gist.github.com/mohammedkhalilia/c807eb1ccb15416b187c32a362001665>:
This notebook uses the trained model saved by the notebook above to
perform evaluation on an unseen dataset.
[3] *Train Nested NER*
<https://gist.github.com/mohammedkhalilia/a4d83d4e43682d1efcdf299d41beb3da>:
This notebook can be used to train our ArabicNER model on the nested NER
task using the sample Wojood data found in our repository.
[4] *Evaluate Nested NER*
<https://gist.github.com/mohammedkhalilia/9134510aa2684464f57de7934c97138b>:
This notebook uses the trained model saved by the notebook above to
perform evaluation on an unseen dataset.
*ORGANIZERS*
- Mustafa Jarrar, Birzeit University
- Muhammad Abdul-Mageed, University of British Columbia & MBZUAI
- Mohammed Khalilia, Birzeit University
- Bashar Talafha, University of British Columbia
- AbdelRahim Elmadany, University of British Columbia
- Nagham Hamad, Birzeit University
- Alaa Omer, Birzeit University
[Apologies for multiple postings]
We are happy to announce that one new written corpus is now available in
our catalogue.
*Archives of "El Mundo" Newspaper – Years 2020-2022
<http://catalog.elra.info/en-us/repository/browse/ELRA-W0332/>*
ISLRN: 124-545-396-179-3 <http://www.islrn.org/resources/124-545-396-179-3>
This corpus consists of 45,658 articles in Spanish from the electronic
archives of the "El Mundo" newspaper between 2020 and 2022. A few articles
also come from other related publications: El Mundo Alicante,
El Mundo Andalucía, El Mundo Baleares, El Mundo Catalunya, El Mundo
Valéncia and Expansión. The number of articles available per year is as
follows:
- 2020: 15,073 articles
- 2021: 14,461 articles
- 2022: 16,124 articles
TOTAL: 45,658 articles
All articles are provided in text format, including HTML tags.
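Since the articles are delivered as text that still contains HTML tags, a user will often want plain text before further processing. The following is a minimal sketch using only Python's standard library; the sample string is invented, and the markup actually found in the corpus may call for more careful handling (for example of scripts, captions, or embedded metadata).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes and drop the markup around them."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw):
    """Return the text content of an HTML fragment with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join("".join(parser.parts).split())


# Invented sample; not taken from the corpus.
sample = "<p>El <b>Mundo</b> publica noticias &amp; reportajes.</p>"
clean = strip_html(sample)
print(clean)
```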
This data is released thanks to Unidad Editorial Información General,
S.L.U., Spain.
This corpus may also be obtained as separate yearly sets, as follows:
Archives of "El Mundo" Newspaper – Year 2020
<http://catalog.elra.info/en-us/repository/browse/ELRA-W0333/>
Archives of "El Mundo" Newspaper – Year 2021
<http://catalog.elra.info/en-us/repository/browse/ELRA-W0334/>
Archives of "El Mundo" Newspaper – Year 2022
<http://catalog.elra.info/en-us/repository/browse/ELRA-W0335/>
For more information on the catalogue or if you would like to enquire
about having your resources distributed by ELRA, please *contact us*
<mailto:contact@elda.org>.
_________________________________________
Visit the *ELRA Catalogue of Language Resources* <http://catalog.elra.info>
Visit the *Universal Catalogue* <http://universal.elra.info>
*Archives* of ELRA Language Resources Catalogue Updates
<http://www.elra.info/en/catalogues/language-resources-announcements>
/Our apologies if you have received multiple copies of this announcement./
*Post Title:* *PhD Studentship in Causal Machine Learning for Multi-modal
data in NLP/Healthcare*
*Location:* *ADAPT Centre, MTU, Cork Campus, Ireland*
*Anticipated Start Date:* *September 2023*
*Closing Date:* *27 June 2023*
We are seeking a highly motivated and talented individual to join our
research team as a PhD candidate. This is a full-time, fully funded position
that offers the opportunity to work on innovative projects and make a
significant contribution at the interface between machine learning/deep
learning, healthcare, and Natural Language Processing (NLP). Potential
research directions include causal reasoning for multi-modal generation,
causal discovery from multi-modal data, causal reasoning for multi-modal
decision making, causal inference across modalities, and evaluation metrics
for multi-modal causal learning. However, we are open to aligning the PhD
research project with your individual interests and expertise: the specific
focus and trajectory of the research will be shaped by your preferences
and research objectives, and your own perspective and ideas are encouraged.
The successful candidate will be hosted at the ADAPT Centre @ MTU, Ireland,
and will work closely with a team of mentors from academia and industry.
*Why ADAPT Centre?*
- Contribute to the ADAPT research agenda that pioneers and combines
research in AI-driven technologies: Natural Language Processing,
Video/Text/Image/Speech processing, digital engagement & HCI, semantic
modeling, personalisation, privacy & data governance.
- Work with our interdisciplinary team of leading experts from the
complementary fields of Social Sciences, Communications, Commerce/Fintech,
Ethics, Law, Health, Environment and Sustainability.
- Leverage our success. ADAPT’s researchers have signed 43 collaborative
research projects, 52 licence agreements and oversee 16 active
commercialisation funds and 52 commercialisation awards. ADAPT has won 40
competitive EU research projects and obtained €18.5 million in
non-exchequer non-commercial funding. Additionally, six spinout companies
have been formed. ADAPT’s researchers have produced over 1,500 journal and
conference publications and nearly 100 PhD students have been trained.
As an ADAPT funded PhD researcher you will have access to a network of 85
global experts and over 250 staff as well as a wide multi-disciplinary
ecosystem across 8 leading Irish universities. We can influence and inform
your work, share our networks and collaborate with you to increase your
impact, and accelerate your career opportunities. Specifically we offer:
1. Opportunity to build your profile at international conferences and
global events.
2. A solid career pathway through formalised training & development, expert
one-on-one supervision and exposure to top specialists.
3. A fully funded, four-year PhD studentship, which includes a tax-free
stipend of approx. €18,500 per year for up to four years and covers
tuition fees, research and equipment costs, and all training-related costs.
*Minimum qualifications*
- Master’s degree in either Natural Language Processing, Artificial
Intelligence, Machine Learning, Data Science, Computer Science, Computer
Engineering, Electrical and Electronic Engineering or related disciplines
with strong programming skills.
- Expertise and interest in Machine Learning/Natural Language
Processing/Causal Machine Learning.
- Previous scientific publication experience preferred.
- Excellent written and verbal communication and interpersonal skills.
*Application Process* *(incomplete applications will not be considered)*
Interested candidates can send an application with the following documents
directly to Mohammed Hasanuzzaman (mohammed.hasanuzzaman(a)adaptcentre.ie):
1. Detailed curriculum vitae, including, if applicable, relevant
publications
2. Transcripts of degrees
3. The names and email contacts of two academic referees
4. A cover letter/letter of introduction (max. 2000 words). In the letter,
applicants should include the following details:
   1. An explanation of your interest in the research to be conducted and
   why you believe you are suitable for the position
   2. Details of your final-year undergraduate project (if applicable)
   3. Details of your MSc project (if applicable)
   4. Details of any relevant modules previously taken, at undergraduate
   and/or Master’s level
   5. Details of any relevant work experience (if applicable)
------------------------------------------------------------------------------------------------------
Dr. Mohammed Hasanuzzaman, Lecturer, Munster Technological University
<https://www.mtu.ie/>
Funded Investigator, ADAPT Centre, a World-Leading SFI Research Centre
<https://www.adaptcentre.ie/>
Member, Lero, the SFI Research Centre for Software <https://lero.ie/>
Associate Researcher (Chercheur Associé), GREYC UMR CNRS 6072 Research
Centre, France <https://www.greyc.fr/en/home/>
Associate Editor: IEEE Transactions on Affective Computing, Nature
Scientific Reports, IEEE Transactions on Computational Social Systems, ACM
TALLIP, PLOS One, Computer Speech and Language
Dept. of CS, Munster Technological University, Bishopstown campus, Cork,
Ireland
e: mohammed.hasanuzzaman(a)adaptcentre.ie
https://mohammedhasanuzzaman.github.io/
Deadline extension: 3rd Workshop on Computational Linguistics for the Political and Social Sciences (CPSS 2023): https://sites.google.com/view/cpss2023konvens/home-page
* Workshop description *
This workshop aims at bringing together researchers and ideas from computational linguistics/NLP and the text-as-data community from political and social science to foster collaboration and catalyze further interdisciplinary research efforts between these communities.
* Potential topics *
- Modeling political communication with NLP (e.g. topic classification, position measurement)
- Mining policy debates from heterogeneous textual sources
- Modeling complex social constructs (e.g. populism, polarization, identity) with NLP methods
- Political and social bias in language models
- Methodological insights in interdisciplinary collaboration: workflows, challenges, best practices
- Application of NLP methods to understand and support democratic decision making
- Resources and tools for Political/Social Science research
- … and more
* Important dates *
- Submission deadline: June 24, 2023
- Notification of acceptance: July 18, 2023
- Camera-ready deadline: July 22, 2023
- Workshop: September 22, 2023
The workshop is co-located with KONVENS 2023 in Ingolstadt (https://www.thi.de/konvens-2023).
* Submissions *
We solicit two types of submissions:
- archival papers describing original and unpublished work (long papers: max. 8 pages, references/appendix excluded; short papers: max. 4 pages, references/appendix excluded). Accepted papers will be published in the ACL Anthology. For the submission format, refer to the KONVENS template.
- non-archival papers (1-page abstracts, references excluded) describing already published research or ongoing work
The two formats will meet the need of researchers from different communities, allowing the exchange of ideas in a "get to know each other" environment which we hope will foster future collaborations.
For more information, please refer to the workshop website: https://sites.google.com/view/cpss2023konvens/home-page
If you have any questions, please feel free to contact the workshop organizers.
* Organizers *
Gabriella Lapesa (U-Stuttgart)
Christopher Klamm (U-Mannheim)
Theresa Gessler (European University Viadrina)
Valentin Gold (U-Göttingen)
Simone Ponzetto (U-Mannheim)
Application links
*Netherlands*: https://jobs.lever.co/veeva/a6c967ac-5bbb-412b-9c3d-b72d709b8da7
*Germany*: https://jobs.lever.co/veeva/e73b2147-5e3c-41cf-8f9e-db64dcdd1d3a
Linkedin: https://www.linkedin.com/posts/activity-7061693573410254848-wJ6R
*What You'll Do*
- Bring the latest technologies and trends in NLP to our platform
- Train, fine-tune, and serve Large Language Models
- Design, develop, and implement an end-to-end pipeline for extracting
predefined categories of information from large-scale, unstructured data
across multi-domain and multilingual settings
- Create a robust semantic search functionality that effectively answers
user queries related to various aspects of the data
- Use and develop named entity recognition, entity-linking,
slot-filling, few-shot learning, active learning, question/answering, dense
passage retrieval, and other statistical techniques and models for
information extraction and machine reading
- Deeply understand and analyze our data model per data source and
geo-region and interpret model decisions
- Collaborate with data quality teams to define annotation tasks and
metrics and perform a qualitative and quantitative evaluation. We have more
than 1900 curators!
- Utilize cloud infrastructure for model development, ensuring seamless
collaboration with our team of software developers and DevOps engineers for
efficient deployment to production
*Requirements*
- 4+ years of experience as a data scientist (or 2+ years with a Ph.D.
degree)
- Master's or Ph.D. in Computer Science, Artificial Intelligence,
Computational Linguistics, or a related field
- Strong theoretical knowledge of Natural Language Processing, Machine
Learning, and Deep Learning techniques
- Proven experience working with large language models and transformer
architectures, such as GPT, BERT, or similar
- Familiarity with large-scale data processing and analysis, preferably
within the medical domain
- Proficiency in Python and relevant NLP libraries (e.g., NLTK, SpaCy,
Hugging Face Transformers)
- Experience in at least one framework for BigData (e.g., Ray, Spark)
and one framework for Deep Learning (e.g., PyTorch, JAX)
- Experience working with cloud infrastructure (e.g., AWS, GCP, Azure),
containerization technologies (e.g., Docker, Kubernetes), and Bash
scripting
- Strong collaboration and communication skills, with the ability to
work effectively in a cross-functional team
- Used to start-up environments
- Social competence and a team player
- High energy and ambitious
- Agile mindset
*You can work remotely anywhere in Germany or The Netherlands, but you have
to live in Germany or The Netherlands and be legally authorized to work
there without requiring Veeva's support for a visa or relocation. If you do
not meet this condition but you think you are an exceptional candidate,
please clarify it in a separate note, and we will consider it.*
*About Link:* Our product offers real-time academic, social, and medical
data to build comprehensive profiles. These profiles help our life-science
industry partners find the right experts to accelerate the development and
adoption of new therapeutics. We accelerate clinical trials and equitable
care. We are proud that our work helps patients receive their most urgent
care sooner.
*About Veeva:* Veeva is a mission-driven organization that aspires to help
our customers in Life Sciences and Regulated industries bring their
products to market, faster. We are shaped by our values: Do the Right
Thing, Customer Success, Employee Success, and Speed. Our teams develop
transformative cloud software, services, consulting, and data to make our
customers more efficient and effective in everything they do. Veeva is a
work anywhere company. You can work at home, at a customer site, or in an
office on any given day. Veeva is a Public Benefit Corporation, so you will
also work for a company focused on making a positive impact on its
customers, employees, and communities.
Ehsan Khoddam
Data Science Manager at Veeva Systems Inc.