*GenBench: The second workshop on generalisation (benchmarking) in NLP*
*Workshop description*
The ability to generalise well is often mentioned as
one of the primary desiderata for models of natural language processing
(NLP).
Yet, there are still many open questions related to what it means for an
NLP model to generalise well, and how generalisation should be evaluated.
LLMs, trained on gigantic training corpora that are – at best – hard to
analyse or not publicly available at all, bring a new set of challenges to
the topic.
The second GenBench workshop aims to serve as a cornerstone to catalyse
research on generalisation in the NLP community.
The workshop aims to bring together different expert communities to discuss
challenging questions relating to generalisation in NLP, crowd-source
challenging generalisation benchmarks for LLMs, and make progress on open
questions related to generalisation.
Topics of interest include, but are not limited to:
- Opinion or position papers about generalisation and how it should be
evaluated;
- Analyses of how existing or new models generalise;
- Empirical studies that propose new paradigms to evaluate
generalisation;
- Meta-analyses that compare results from different generalisation
studies;
- Meta-analyses that study how different types of generalisation are
related;
- Papers that discuss how generalisation of LLMs can be evaluated;
- Papers that discuss why generalisation is (not) important in the era
of LLMs;
- Studies on the relationship between generalisation and fairness or
robustness.
The second GenBench workshop on generalisation (benchmarking) in NLP will
be co-located with EMNLP 2024.
*Submission types*
We call for two types of submissions: regular workshop submissions and
collaborative benchmarking task submissions.
The latter will consist of a data/task artefact and a companion paper
motivating and evaluating the submission.
In both cases, we accept archival papers and extended abstracts.
*1. Regular workshop submissions*
Regular workshop submissions present papers on the topic of generalisation
(see examples listed above).
Regular workshop papers may be submitted as an archival paper, when they
report on completed, original and unpublished research, or as a shorter
extended abstract, otherwise.
More details on this category can be found below.
If you are unsure whether a specific topic is well-suited for submission,
feel free to reach out to the organisers of the workshop at
genbench(a)googlegroups.com.
*2. Collaborative Benchmarking Task (CBT) submissions*
The goal of this year's CBT is to generate versions of existing evaluation
datasets for LLMs which, given a particular training corpus, have a larger
distribution shift than the original test set, or – in other words –
evaluate generalisation to a stronger degree than the original dataset.
For this particular challenge, we focus on three training corpora: C4,
RedPajama-Data-1T, and Dolma.
All three corpora are publicly available, and they can be searched via the
What's in My Big Data API (https://github.com/allenai/wimbd).
We will focus on three popular evaluation datasets: MMLU, HumanEval, and
SiQA.
Submitters to the CBT are asked to design a way to assess distribution
shift for one or more of these evaluation datasets, given particular
features of the training corpus, and then generate one or more versions of
the dataset that have a larger distribution shift according to this method.
Newly generated sets do not have to have the same size as the original test
set, but should have at least 200 examples.
Practically speaking, CBT submissions consist of:
1. the data/task artefact, submitted through
https://github.com/GenBench/genbench_cbt
2. a paper describing the dataset and its method of construction,
submitted through
https://openreview.net/group?id=GenBench.org/2024/Workshop
We accept submissions that consider only one pretraining dataset and
evaluation dataset, but encourage submitters to apply their suggested
protocols to multiple pretraining datasets.
We also suggest that submitters include model results for models trained on
these datasets.
Suggestions are provided on the CBT website: https://genbench.org/cbt.
Given enough high-quality submissions, we aim to write a paper with the
combined results, on which submitters can be co-authors if they so wish.
More detailed guidelines will be given on https://genbench.org/cbt.
*Archival vs extended abstract*
Archival papers are up to 8 pages excluding references and report on
completed, original and unpublished research.
They follow the requirements of regular EMNLP 2024 submissions.
Accepted papers will be published in the workshop proceedings and are
expected to be presented at the workshop.
The papers will undergo double-blind peer review and should thus be
anonymised.
Extended abstracts can be up to 2 pages excluding references, and may
report on work in progress or be cross-submissions of work that has already
appeared in another venue.
Abstract titles will be posted on the workshop website, but will not be
included in the proceedings.
*Submission instructions*
For both archival papers and extended abstracts,
we refer to the EMNLP 2024 website for paper templates and requirements.
Additional requirements for both regular workshop papers and collaborative
benchmarking task submissions can be found on our website.
All submissions can be submitted through OpenReview:
https://openreview.net/group?id=GenBench.org/2024/Workshop.
*Important dates*
These deadlines are tentative; for the latest version, see
https://genbench.org/workshop
- August 15, 2024: Paper submission deadline
- September 20, 2024: Notification deadline
- October 4, 2024: Camera-ready deadline
- November 15 or 16, 2024: Workshop
Note: all deadlines are 11:59 PM UTC-12:00
*Preprints*
We do not have an anonymity deadline. Preprints are allowed both before
and after the submission deadline.
*Contact*
Email address: genbench(a)googlegroups.com
Website: https://genbench.org/workshop
*On behalf of the organisers*
Dieuwke Hupkes
Verna Dankers
Khuyagbaatar Batsuren
Amirhossein Kazemnejad
Christos Christodoulopoulos
Mario Giulianelli
Ryan Cotterell
Dear colleagues and friends,
This year, we are organizing the MedVidQA
<https://medvidqa.github.io/> challenge
with TREC 2024 <https://trec.nist.gov/pubs/call2024.html>. This challenge
aims at developing models for (1) retrieving the relevant videos and
locating the visual answer in those videos for a medical or
health-related question and (2) generating step-by-step textual summaries
of the visual instructional segment that can be considered the answer to
the medical query. This track comprises two main tasks: Video Corpus Visual
Answer Localization (VCVAL) and Query-Focused Instructional Step
Captioning (QFISC).
For more details, please visit the challenge website (
https://medvidqa.github.io/) and the TREC 2024 website (
https://trec.nist.gov/pubs/call2024.html).
Participants are required to complete their registration by submitting the
TREC 2024 Registration Form
<https://ir.nist.gov/evalbase/accounts/login/?next=/evalbase/>.
*Important Dates*
- *Release of the training and validation datasets:* April 30, 2024
- *Release of the video corpus:* May 12, 2024
- *Release of the test sets:* June 7 (Task A), September 2, 2024 (Task
B)
- *Run submission deadline:* August 12 (Task A), September 16, 2024
(Task B)
- *Release of the official results:* September 8 (Task A), October
11, 2024 (Task B)
We look forward to your participation in MedVidQA at TREC 2024.
Join our Google Group <https://groups.google.com/g/trec-medvidqa2024> for
important updates! If you have any questions, ask us on our Google Group
<https://groups.google.com/g/trec-medvidqa2024> or email us at
<deepak.gupta(a)nih.gov>.
Thank you,
MedVidQA 2024 Organizers
We are seeking highly motivated and talented individuals to join our research team as Postdoctoral Researchers in the field of Natural Language Processing. The position offers an exciting opportunity to investigate the computational and algorithmic aspects underlying modern Artificial Intelligence systems, with a specific focus on the personalized and subjective approaches to Natural Language Processing. Successful candidates will work closely with Prof. Debora Nozza, Prof. Dirk Hovy, and the MilaNLP lab.
Your profile:
- A Ph.D. in Computer Science, Computational Linguistics/NLP, Machine Learning, Data Science, or related fields.
- Excellent programming skills in Python.
- Fluency in spoken and written English. Knowledge of Italian is NOT a requirement.
- Knowledge of current neural network models and implementation tools for neural networks (e.g., PyTorch).
- Experience with publications in top-tier venues in the field of NLP/Computational Linguistics.
Position Details:
- Starting date: September 1, 2024, or any time thereafter
- Duration: 2 years
- Deadline: 30th May 2024
- Competitive Salary: Applicants from outside Italy may qualify for a researcher taxation scheme
How to apply:
Go to the Bocconi postdoc job market page (https://jobmarket.unibocconi.eu/?id=601). Candidates should attach a CV and a cover letter to their application.
Online interviews will take place during June 2024. Please contact debora.nozza(at)unibocconi.it if you have any questions.
We invite applications for a 2-3 year postdoc position at CLASP,
University of Gothenburg, Sweden.
The overarching goal of the project is to construct a multimodal NLP system
that uses text and visual input alongside large language models to assure
text coherence over long documents and long sentence contexts.
Candidates should have experience with machine learning including deep
learning and neural networks, preferably with a background in mainstream
natural language processing tasks and methods (e.g., language modeling,
parsing, coreference, machine translation, discourse parsing, etc).
Interest in the cognitive processes/models behind human language processing
and understanding is a key advantage for the position.
The application deadline is June 3rd. If you have questions, don't hesitate
to email me at sharid.loaiciga(a)gu.se.
More details in the complete official call here:
https://web103.reachmee.com/ext/I005/1035/job?site=7&lang=UK&validator=9b89…
--------------------------------
Call for Extended Abstracts: Large Language Models and Lexicography 2024
Hotel Croatia, Cavtat, Croatia | 8 October 2024
You are invited to submit extended abstracts for the workshop Large Language Models and Lexicography, which will be held in conjunction with the Euralex 2024 congress <https://euralex.jezik.hr/>. The workshop is organised jointly by the Centre for Language Resources and Technologies, University of Ljubljana, and Jožef Stefan Institute, Ljubljana, Slovenia.
Key Information
Workshop link: https://www.cjvt.si/en/research/community/llm-lex-2024/
Date: 8 October 2024
Venue: Hotel Croatia, Cavtat, Croatia
Deadline: 3 June 2024
Submission link: https://easychair.org/conferences/?conf=llmlex2024
Notification of acceptance: beginning of July 2024
Best regards,
Simon Krek
You are kindly invited to submit a paper to the Slovenian Language
Technologies and Digital Humanities Conference (JTDH 2024), which will take
place on September 19 and 20, 2024 in Ljubljana, Slovenia:
https://www.sdjt.si/wp/jtdh-2024-en.
We welcome extended abstracts and full papers on topics that include but
are not limited to:
- speech and other mono- and multilingual language technologies;
- digital linguistics: translation studies, corpus linguistics, lexicology
and lexicography, standardisation;
- digital humanities and historical studies, ethnology, musicology,
cultural heritage, archaeology, and fine arts;
- digital humanities in education and digital publishing.
The official languages of the conference are Slovene and English.
We are happy to announce the keynote speakers Simon Dobnik and Barbara
McGillivray, a round table on Frontiers in Speech Communication Research,
and two pre-conference events:
- The final stop of CLASSLA Express – a series of workshops on
investigating South Slavic corpora using CLARIN.SI concordancers.
- A joint business meeting of the CLASSLA Knowledge Centre for South
Slavic Languages and the ReLDI Centre Belgrade.
Important dates:
May 17, 2024: Deadline for abstract/paper submission
July 5, 2024: Notification of acceptance
August 23, 2024: Final abstract/paper submission
August 23, 2024: Registration deadline
September 18, 2024: Pre-conference events and workshops
September 19 & 20, 2024: JTDH 2024 Conference
More information: https://www.sdjt.si/wp/jtdh-2024-en
[Apologies for cross-posting]
DDHI-2024@JOHD
Special Collection on DATA-DRIVEN HISTORY OF IDEAS of the JOURNAL OF OPEN
HUMANITIES DATA
Guest editors: Arianna Betti & Hein van den Berg
=====================================
Important Dates
Abstracts due: 1st June 2024
Full papers due: 1st December 2024
Call for Papers 2024
----------------------------------------------------------
The new field of data-driven history of ideas combines qualitative,
quantitative and computational methods for the study of the origins,
development and spread of ideas from any time and place. It also comes with
two challenging demands that are distinctive in the landscape of
computational humanities. The first is the demand for the adequate
representation and detection of concepts, rather than words; the second is
the need for high-quality, virtually 100% accurate large corpora in many
languages across centuries by both known and virtually unknown authors seen
as carriers of ideas. These two main demands generate in turn further needs
on resources that must be, typically, newly created or substantially
adapted for the field: datasets such as expertly curated sets of
bibliographic metadata, annotation sets and historical gazetteers,
ontologies, and network data; infrastructural facilities for collaborative
environments, and workflows that suit and support the field; ground truths
for the evaluation of models from language technology, and techniques
integrating language models with approaches and tools from data science,
visual analytics, and knowledge representation.
Results produced in the field can be published in the same way as
traditional articles in in-domain journals and books. The resources that
make data-driven enterprises in the history of ideas possible, however,
still lack an apt venue, despite the fact that work on such resources is
key to the field and can be extremely time-consuming. It is with the
intention of creating a home for openly shareable corpora, datasets and
other resources, as well as to support the work of the next generation of
researchers, that we invite submissions to a special collection of the
Journal of Open Humanities Data on Data-Driven History of Ideas.
Submissions for this special collection are welcome that focus on,
facilitate or support the study of philosophical and scientific thought of
any epoch and geographical area, geared in particular towards the origin,
development and spread of ideas.
Submission topics include, but are not limited to
--------------------------------------------------
* Textual data: high-quality, virtually 100% accurate corpora from any
epoch and language
* Ground truths and annotation datasets
* Curated collections of bibliographical metadata and full bibliographies
* Ontologies
* Lexica
* Historical gazetteers
* Collections of (historical):
* Geographic-political data, e.g. political affiliation of cities through the
centuries
* Timeline data of authors, printers, countries
* Complete publishing histories of books
* Unique identifiers
* Network data
* Academic conference data
* Computational tools focused on DDHI:
* Multilingual and multi-layout OCR postcorrection
* Transkribus models
* Applied concept-focused work in computational linguistics, data science,
visual analytics, and knowledge representation (concept-detection,
concept-change)
* Networks and graphs
* Data visualisations for DDHI
Manuscripts will be peer reviewed after editorial consideration, and
accepted papers will be published online on a rolling basis. Please note
that there are Publication Fees
<https://openhumanitiesdata.metajnl.com/about/submissions/> for accepted
papers. Follow the submission guidelines
<https://openhumanitiesdata.metajnl.com/about/submissions/> to submit your
manuscript.
The Journal of Open Humanities Data (JOHD)
<https://openhumanitiesdata.metajnl.com/> is a growing open-access
peer-reviewed academic journal specifically dedicated to publications
describing humanities research objects, software, and methods with high
potential for reuse. These might include curated resources like (annotated)
linguistic corpora, ontologies, and lexicons, as well as databases, maps,
atlases, linked data objects, and other data sets created with qualitative,
quantitative, or computational methods.
JOHD publishes two types of papers:
- Short data papers contain a concise description of a humanities research
object with high reuse potential from research related to the ancient
world. These are short (1000 words) highly structured narratives and must
conform to the data paper template
<https://s3-eu-west-1.amazonaws.com/ubiquity-partner-network/up/journal/johd…>.
A data paper does not replace a traditional research article, but rather
complements it.
- Full length research papers discuss and illustrate methods, challenges,
and limitations in the creation, collection, management, access,
processing, or analysis of data in Humanities research related to the
ancient world, including standards and formats. These are intended to be
longer narratives (3000 - 5000 words), which give authors the ability to
contribute to a broader discussion around the study and representation of
the ancient world through data.
JOHD provides immediate open access to its content on the principle that
making research freely available to the public supports a greater global
exchange of knowledge. Authors remain the copyright holders and grant third
parties the right to use, reproduce, and share the article according to the
Creative Commons <http://creativecommons.org/licenses/by/4.0/> licence
agreement.
Authors are encouraged to publish their data in recommended repositories.
Please note that there are Publication Fees
<https://openhumanitiesdata.metajnl.com/about/submissions/> for accepted
papers, but authors can ask for a waiver if they do not have funding for
the fees.
Submission deadline:
1 June 2024 (abstracts due)
1 December 2024 (full papers due, upon abstract acceptance)
Submissions:
------------------
If you are interested in submitting an article, please submit an abstract
of max. 300 words by June 1, 2024 using this form:
https://docs.google.com/forms/d/e/1FAIpQLSfpHO3RYHTNJRtmJRZ4QHkorN5buq8KnwK…
You will be asked to paste the text of the abstract in the form.
Special collection guest editors: Arianna Betti (lead guest editor), Hein
van den Berg
About the Guest Editors:
Arianna Betti is Professor and Chair of Philosophy of Language at the
University of Amsterdam, and leader of the Concepts in Motion group at the
Institute for Logic, Language and Computation. After studying historical
and systematic aspects of ideas such as axiom, truth, and fact (Against
Facts, MIT Press, 2015), they now specialise in data-driven research aimed
at tracing the development of ideas such as these in a strongly
interdisciplinary setting. They have been a member of the Young Academy of
the Royal Netherlands Academy of Arts and Sciences (KNAW), of the
Scientific Council of the Italian Research Council (CNR), of the Global
Young Academy (GYA), and recipient of two ERC grants (2008–2013, 2014–2015)
as well as of several major Dutch NWO grants, including a VICI (2017–2024).
Hein van den Berg obtained his PhD at the VU Amsterdam in history and
philosophy of science in 2011, with a prize-winning dissertation on Kant’s
conception of proper science and Kant’s philosophy of biology. After
obtaining a postdoctoral grant from the Royal Netherlands Academy of Arts
and Sciences (KNAW) for conducting research on the history of biology at
the Technical University Dortmund, he became assistant professor at the
Institute for Logic, Language, and Computation of the University of
Amsterdam in 2016. He does research on the history and philosophy of logic,
biology, and psychiatry. As a member of the Concepts in Motion group since
2011, he has been involved in a large number of computational and
data-driven history of ideas projects.
Fancy a trip to Amsterdam? The 12th edition of our PhD Symposium on Future Directions in Information Access (FDIA 2024), organised by the BCS Information Retrieval Specialist Group, will be held in conjunction with the 15th European Summer School on Information Retrieval (ESSIR 2024, https://2024.essir.eu/, 1-5 July 2024).
We cordially invite Masters and doctoral (PhD) students as well as early-stage researchers to submit a paper on their research topic to the symposium. You’ll learn a lot about Information Retrieval while at the school and get great feedback on your topic, meet lots of other students, and hear inspiring talks.
The FDIA Symposium provides an excellent opportunity for students to give pointers to their work and obtain experience in presenting and communicating their research.
FDIA 2024 is the next chapter in a long series of events. Previous symposiums were held in Vienna, Austria (with ESSIR 2023); Lisbon, Portugal (with ESSIR 2022); Milan, Italy in 2019 (with ESSIR 2019); Tianjin, China in 2018 (with ICTIR 2018); Barcelona, Spain in 2017 (with ESSIR 2017); Thessaloniki, Greece in 2015 (with ESSIR 2015); Granada, Spain in 2013 (with ESSIR 2013); Koblenz, Germany in 2011 (with ESSIR 2011); Padova, Italy in 2009 (with ESSIR 2009); London, England in 2008; and Glasgow, Scotland in 2007 (with ESSIR 2007). They have provided an entertaining and exciting forum for early-stage researchers to share new research ideas.
Why future directions? Because we encourage submissions that focus on early research such as pilot studies, presenting challenges and future opportunities, conceptual and theoretical work, and contributions from doctoral work.
Why information access? Because it captures the broader ideas of information retrieval, storage, and management to include interaction and usage.
We especially encourage submissions on formative research ideas which present a summary of their doctoral work, initial empirical findings/pilot studies, explore conceptual and/or theoretical models, and/or describe current challenges and opportunities. Submissions focusing on new directions and emerging work in Information Access/Retrieval which create discussion and provoke a reaction are strongly encouraged.
Areas of research include, but are not limited to:
- Information Retrieval Theory
- Human-Computer Interaction and Information Retrieval, User Modelling, Interactive IR
- Collaborative Information Seeking and Searching
- IR for Good
- IR Evaluation
- Learning to Rank
- Retrieval-augmented Generation
- Neural and Generative IR
- Multimedia and Multimodal IR
- Recommender Systems
- Web IR
- Clustering and Categorization
- Enterprise Search
- Conversational Agents, Knowledge Graphs
- IR Applications (e.g. Digital Humanities, News IR, Legal IR, IR and Bibliometrics, Academic Search and Recommendation, etc.)
- NeuraSearch (use of fMRI, EEG, fNIRS, Eye Tracking, etc. in IR)
Papers should be 4-8 pages in length excluding references for presentation and poster (e.g., an outline of the PhD or Master’s project, a discussion of topics and ideas). Submissions should be converted to PDF and submitted via EasyChair: https://easychair.org/my/conference?conf=fdia2024. We plan to publish the proceedings at CEUR-WS.org. Please use the one-column CEUR style (CEURART.zip).
We strongly encourage students to submit as a solo author, but papers with several authors are welcome as well. A selection of papers will be invited to give a short oral and/or poster presentation.
IMPORTANT DATES
June 6, 2024: Submission deadline
July 20, 2024: Notification deadline
July 3, 2024: FDIA in Amsterdam (during ESSIR July 1-5, 2024)
CONTACT
Please email fdia2024(a)easychair.org.
ORGANISERS
PC Chairs
Haiming Liu, University of Southampton, UK
Ingo Frommholz, University of Wolverhampton, UK
Yashar Moshfeghi, University of Strathclyde, UK
--
Ingo Frommholz (he/him), PhD, FBCS, FHEA
Reader (~Associate Professor) in Data Science
ACM CIKM 2023 General Chair
Head of Data, AI, Interaction, Retrieval and Language Group http://dairel.org
Deputy Head Digital Innovations and Solutions Centre (DISC)
University of Wolverhampton, UK
Adjunct Professor, Bern University of Applied Sciences, Switzerland
Web: http://www.frommholz.org/ | Email: ifrommholz(a)acm.org
Twitter: @iFromm | Mastodon: @ingo@idf.social
You are invited to participate in the *ArAIEval Shared Task* at *ArabicNLP
2024*:
(i) detection of propagandistic textual spans with persuasion technique
identification (unimodal) on Arabic news articles and tweets.
(ii) distinguishing between propagandistic and non-propagandistic memes
(multimodal) in Arabic.
The shared task will be held alongside the Second Arabic Natural Language
Processing Conference (ArabicNLP 2024), co-located with the ACL 2024
Conference in Bangkok, Thailand (11-16 Aug, 2024).
*Tasks*
Task 1: Unimodal (Text) Propagandistic Technique Detection:
The task is to detect the propaganda techniques used in a text and identify
the exact span(s) in which each propaganda technique appears. You will be
given text snippets from news paragraphs or tweets (multigenre). This is a
sequence tagging task.
Task 2: Multimodal propagandistic memes classification:
We offer the three subtasks as defined below:
- Subtask 2A: Given a text extracted from a meme, detect whether it is
propagandistic or not.
- Subtask 2B: Given a meme (text overlaid on an image), the task is to detect
whether the content is propagandistic.
- Subtask 2C: Given multimodal content (text extracted from a meme and the
meme itself), the task is to detect whether the content is propagandistic.
Website for detailed information: https://araieval.gitlab.io/
Registration and submission:
Task 1: https://codalab.lisn.upsaclay.fr/competitions/18111
Task 2: https://codalab.lisn.upsaclay.fr/competitions/18099
Important Dates
- 15 Mar 2024: Registration on CodaLab and the start of the development
cycle (release of training and development datasets, along with submission
for the development phase on CodaLab)
- 27 April 2024 23:59 AOE: Beginning of the evaluation cycle (test sets
release)
- 4 May 2024 23:59 AOE: End of the evaluation cycle (run submission)
- 5 May 2024: Release of the leaderboard
- 15 May 2024: Deadline for the submission of shared task papers
- 17 June 2024: Notification of acceptance of shared task papers
- 1 July 2024: Deadline for submission of camera-ready papers
- 16 August 2024: ArabicNLP Conference (co-located with ACL 2024)
Organizers
- Firoj Alam, Qatar Computing Research Institute, HBKU, Qatar
- Maram Hasanain, Qatar Computing Research Institute, HBKU, Qatar
- Reem Suwaileh, HBKU, Qatar
- Md. Arid Hasan, University of New Brunswick, Canada
- Fatema Ahmed, Qatar Computing Research Institute, HBKU, Qatar
- Md. Rafiul Biswas, HBKU, Qatar
- Wajdi Zaghouani, HBKU, Qatar
----
*Wajdi Zaghouani, Ph.D.*
*Associate Professor in Digital Humanities*
College of Humanities and Social Sciences
P.O. Box 34110 | Education City | Doha, Qatar
tel: +974 4454 5601 | mob: +974 33454992
wzaghouani(a)hbku.edu.qa | Office A141, LAS Building
========================================
1st UniDive Training Summer School 2024
========================================
===== LATEST NEWS AND CLARIFICATIONS =====
- Due to a recently *extended budget*, UniDive can now fund more than 50
trainees.
- The *funding* covers travel and stay for 6 nights.
- The project submission deadline is extended to *Monday 6 May* (or until
budget exhaustion).
- One does not have to be a UniDive member to apply.
- Graduate research master's students are eligible.
- Eligibility for funding depends on the *affiliation* (not the
nationality). The eligible countries are:
* COST countries <https://www.cost.eu/about/members/>
* Near-Neighbor Countries (Algeria, Armenia, Azerbaijan, Belarus,
Egypt, Jordan, Kosovo, Lebanon, Libya, Morocco, Palestine, Syria, and
Tunisia)
- Please *circulate this call* to all potential candidates to help us
enhance the coverage of low-resourced languages.
======================================
Dates: *8-12 July 2024*
Location: *Technical University of Moldova*, Chișinău, Moldova
Coordinating Project: UNIDIVE
<https://unidive.lisn.upsaclay.fr/doku.php?id=start> (Universality,
Diversity and Idiosyncrasy in Language Technology)
Website:
https://unidive.lisn.upsaclay.fr/doku.php?id=meetings:other-events:1st_unid…
Cost: *Participants selected on the basis of their application will be
reimbursed, details are below.*
*Apply by:* *May 06, 2024*
======================================
CALL FOR APPLICATIONS
We are happy to announce the 1st edition of the UNIDIVE Summer School on
Universality, Diversity and Idiosyncrasy in Language Technology. It is
dedicated mainly (but not exclusively) to young researchers and
investigators. Researchers working on low-resourced languages, dialects and
varieties are particularly welcome.
SUMMER SCHOOL ACTIVITIES
- Annotation of Universal Dependencies treebank for a new language - a
course by Sylvain Kahane (Université Paris Nanterre, France)
- Annotation of multiword expressions in a new language - a course by
Verginica Mititelu (Romanian Academy) and Voula Giouli (Aristotle
University of Thessaloniki and ILSP, ATHENA RC, Greece)
- Corpus annotation infrastructure - a course by Daniel Zeman (Charles
University, Czechia), Bruno Guillaume (LORIA, France) and Agata Savary
(Université Paris-Saclay, France)
- A brainstorming hackathon on topics submitted by the trainees
- Poster sessions
APPLICATIONS AND SUBMISSION GUIDELINES
Each applicant should *submit a project* for the construction of a resource
related to the topics of the training school (e.g. a new/enhanced UD
treebank, a new PARSEME corpus, a resource adding a new annotation layer on
top of a UD/PARSEME corpus, etc.). The length of the application should be
2 pages (excluding references). The application should contain:
- The title
- Applicant's name and affiliation (including the country)
- A list of 3-4 keywords
- Description of a resource related to the topics of the training
school
- Explanation of how participation in the training school will be
useful for the project
- Open questions related to the project which could be addressed
during the brainstorming hackathon
- Short statement of the project phase (planning, started, in the
process of creation)
The projects are to be submitted via the OpenReview
<https://openreview.net/group?id=UniDive/2024/Training_School> portal.
TRAINEE SELECTION CRITERIA
We can fund at least 40 trainees. The selection criteria include:
- Trainee's country: only trainees from COST countries
and Near-Neighbour Countries can be funded. See here
<https://www.cost.eu/about/members/> and here
<https://www.cost.eu/about/strategy/international-collaboration/>.
- Age: Young Researchers and Investigators, i.e. under the age of 40,
are promoted
- Gender and geographical balance (notably between Inclusiveness
Target Countries and other COST countries)
- Relevance and quality of the project submitted by the trainee
- Status of the language on which the trainee intends to work
(low-resourced languages, dialects or varieties are promoted)
*If you are not selected on the basis of these criteria and you can find
other financial sources to cover your travel, accommodation and meals, you
are also welcome to participate.*
*The authors of the selected projects may optionally present them in a
poster session during the Training School.*
IMPORTANT DATES
Deadline for project submission: May 6, 2024
Notification of acceptance: May 23, 2024
Summer school: July 8-12, 2024
For any inquiry, please contact the organisers at:
victoria.bobicev(a)ia.utm.md
Looking forward to seeing you in Moldova,
Organizing Committee