I will start a new research group on natural language processing as part
of the Bamberg AI Center (https://www.uni-bamberg.de/en/bacai/). We do
fundamental NLP research at the intersection of computational psychology,
digital humanities, and the computational social sciences.
There are currently four open positions (application deadline: February 28, 2024):
1. Postdoc, Open Topic (3 years)
2. PhD student in interactive prompt optimization (3 years)
3. Researcher in event-centered emotion analysis (1 year)
4. Researcher in multimodal emotion analysis (1 year)
Positions 3 and 4 can be combined into a single two-year position.
Please find more details at
https://www.bamnlp.de/openpositions/
Do not hesitate to contact me if you have questions!
Roman Klinger
Dear all,
Some of you might be interested in LancsLex, a new free online tool developed at Lancaster University for the analysis of English vocabulary. It is available at https://lancslex.lancs.ac.uk/
It is based on recent research (2024) that led to the publication of the Frequency Dictionary of British English: Core Vocabulary and Exercises for Learners https://cass.lancs.ac.uk/words-words-words-a-new-frequency-dictionary-of-br…
Best,
Vaclav
Professor Vaclav Brezina
Professor in Corpus Linguistics
Department of Linguistics and English Language
ESRC Centre for Corpus Approaches to Social Science
Faculty of Arts and Social Sciences, Lancaster University
Lancaster, LA1 4YD
Office: County South, room C05
T: +44 (0)1524 510828
@vaclavbrezina
<http://www.lancaster.ac.uk/arts-and-social-sciences/about-us/people/vaclav-…>
We invite proposals for tasks to be run as part of SemEval-2025
<https://semeval.github.io/SemEval2025/>. SemEval (the International
Workshop on Semantic Evaluation) is an ongoing series of evaluations of
computational semantics systems, organized under the umbrella of SIGLEX
<https://siglex.org/>, the Special Interest Group on the Lexicon of the
Association for Computational Linguistics.
SemEval tasks explore the nature of meaning in natural languages: how to
characterize meaning and how to compute it. This is done in practical
terms, using shared datasets and standardized evaluation metrics to
quantify the strengths and weaknesses of possible solutions. SemEval tasks
encompass a broad range of semantic topics from the lexical level to the
discourse level, including word sense identification, semantic parsing,
coreference resolution, and sentiment analysis, among others.
For SemEval-2025 <https://semeval.github.io/SemEval2025/cft>, we welcome
tasks that can test an automatic system for the semantic analysis of text
(e.g., intrinsic semantic evaluation, or an application-oriented
evaluation). We especially encourage tasks for languages other than
English, cross-lingual tasks, and tasks that develop novel applications of
computational semantics. See the websites of previous editions of SemEval
to get an idea about the range of tasks explored, e.g. SemEval-2020
<http://alt.qcri.org/semeval2020/> and SemEval-2021 through SemEval-2024
<https://semeval.github.io/>.
We strongly encourage proposals based on pilot studies that have already
generated initial data, evaluation measures and baselines. In this way, we
can avoid unforeseen challenges down the road that may delay the task.
In case you are not sure whether a task is suitable for SemEval, please
feel free to get in touch with the SemEval organizers at
semevalorganizers(a)gmail.com to discuss your idea.
=== Task Selection ===
Task proposals will be reviewed by experts, and reviews will serve as the
basis for acceptance decisions. All else being equal, innovative new
tasks will be given preference over task reruns. Task
proposals will be evaluated on:
- Novelty: Is the task on a compelling new problem that has not been
explored much in the community? Is the task a rerun, but covering
substantially new ground (new subtasks, new types of data, new languages,
etc.)?
- Interest: Is the proposed task likely to attract a sufficient number
of participants?
- Data: Are the plans for collecting data convincing? Will the resulting
data be of high quality? Will annotations have meaningfully high
inter-annotator agreements? Have all appropriate licenses for use and
re-use of the data after the evaluation been secured? Have all
international privacy concerns been addressed? Will the data annotation be
ready on time?
- Evaluation: Is the methodology for evaluation sound? Is the necessary
infrastructure available or can it be built in time for the shared task?
Will research inspired by this task be able to evaluate in the same manner
and on the same data after the initial task?
- Impact: What is the expected impact of the data in this task on future
research beyond the SemEval Workshop?
- Ethical: The data must comply with privacy policies. For example:
a) avoid personally identifiable information (PII); tasks aimed at
identifying specific people will not be accepted;
b) avoid medical decision making (ensure compliance with HIPAA and do not
attempt to replace medical professionals, especially where mental health
is concerned);
c) these examples are representative, not exhaustive.
=== New Tasks vs. Task Reruns ===
We welcome both new tasks and task reruns. For a new task, the proposal
should address whether the task would be able to attract participants.
Preference will be given to novel tasks that have not received much
attention yet.
For reruns of previous shared tasks (whether or not the previous task was
part of SemEval), the proposal should address the need for another
iteration of the task. Valid reasons include: a new form of evaluation
(e.g. a new evaluation metric, a new application-oriented scenario), new
genres or domains (e.g. social media, domain-specific corpora), or a
significant expansion in scale. We further discourage carrying over a
previous task and just adding new subtasks, as this can lead to the
accumulation of too many subtasks. Evaluating on a different dataset with
the same task formulation, or evaluating on the same dataset with a
different evaluation metric, typically should not be considered a separate
subtask.
=== Task Organization ===
We welcome people who have never organized a SemEval task before, as well
as those who have. Apart from providing a dataset, task organizers are
expected to:
- Verify the data annotations have sufficient inter-annotator agreement
- Verify licenses for the data allow its use in the competition and
afterwards. In particular, text that is publicly available online is not
necessarily in the public domain; unless a license has been provided, the
author retains all rights associated with their work, including copying,
sharing and publishing. For more information, see:
https://creativecommons.org/faq/#what-is-copyright-and-why-does-it-matter
- Resolve any potential security, privacy, or ethical concerns about the
data
- Commit to making the data available after the task
- Provide task participants with format checkers and standard scorers.
- Provide task participants with baseline systems to use as a starting
point (in order to lower the obstacles to participation). A baseline system
typically contains code that reads the data, creates a baseline response
(e.g. random guessing, majority class prediction), and outputs the
evaluation results. Whenever possible, baseline systems should be written
in widely used programming languages and/or should be implemented as a
component for standard NLP pipelines.
- Create a mailing list and website for the task and post all relevant
information there.
- Create a CodaLab or other similar competition for the task and upload
the evaluation script.
- Manage submissions on CodaLab or a similar competition site.
- Write a task description paper to be included in SemEval proceedings,
and present it at the workshop.
- Manage participants’ submissions of system description papers, manage
participants’ peer review of each others’ papers, and possibly shepherd
papers that need additional help in improving the writing.
- Review other task description papers.
- Define roles for each organizer:
   - Lead Organizer: main point of contact; expected to ensure
deliverables are met on time and to contribute to task duties (see below).
   - Co-Organizers: provide significant contributions to ensure the
task runs smoothly. Examples include maintaining communication with
task participants, preparing data, creating and running evaluation
scripts, and leading paper reviewing and acceptance.
   - Advisory Organizers: a more supervisory role; may not contribute
to detailed tasks but will provide guidance and support.
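To make the baseline-system duty above concrete, here is a minimal sketch of a majority-class baseline of the kind the call describes. The function name and labels are hypothetical illustrations, not part of any actual task kit; a real baseline would also read the task's data format and write predictions in the format the official scorer expects.

```python
from collections import Counter

def majority_class_baseline(train_labels, test_labels):
    """Predict the most frequent training label for every test instance
    and report accuracy; a trivial shared-task baseline."""
    majority = Counter(train_labels).most_common(1)[0][0]
    predictions = [majority] * len(test_labels)
    correct = sum(p == g for p, g in zip(predictions, test_labels))
    return majority, correct / len(test_labels)

# Toy run with made-up sentiment labels
train = ["pos", "neg", "pos", "pos", "neu"]
test = ["pos", "neg", "pos"]
label, acc = majority_class_baseline(train, test)
print(label, round(acc, 2))  # pos 0.67
```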
=== Important dates ===
- Task proposals due March 31, 2024 (Anywhere on Earth)
- Task selection notification May 18, 2024
=== Preliminary timetable ===
- Sample data ready July 15, 2024
- Training data ready September 1, 2024
- Evaluation data ready December 1, 2024 (internal deadline; not for public
release)
- Evaluation starts January 10, 2025
- Evaluation end by January 31, 2025 (latest date; task organizers may
choose an earlier date)
- Paper submission due February 2025
- Notification to authors in March 2025
- Camera-ready due April 2025
- SemEval workshop Summer 2025 (co-located with a major NLP conference)
Tasks that fail to keep up with crucial deadlines (such as the dates for
having the task and CodaLab website up and dates for uploading sample,
training, and evaluation data) or that diverge significantly from the
proposal may be cancelled at the discretion of SemEval organizers. While
consideration will be given to extenuating circumstances, our goal is to
provide sufficient time for the participants to develop strong and
well-thought-out systems. Organizers of cancelled tasks will be encouraged
to submit proposals for the subsequent year’s SemEval. To reduce the risk of tasks
failing to meet the deadlines, we are unlikely to accept multiple tasks
with overlap in the task organizers.
=== Submission Details ===
The task proposal should be a self-contained document of no longer than 3
pages (plus additional pages for references). All submissions must be in
PDF format, following the ACL template
<https://github.com/acl-org/acl-style-files>.
Each proposal should contain the following:
- Overview
- Summary of the task
- Why this task is needed and which communities would be interested
in participating
- Expected impact of the task
- Data & Resources
- How the training/testing data will be produced. Please discuss whether
existing corpora will be re-used.
- Details of copyright, so that the data can be used by the research
community both during the SemEval evaluation and afterwards
- How much data will be produced
- How data quality will be ensured and evaluated
- An example of what the data would look like
- Resources required to produce the data and prepare the task for
participants (annotation cost, annotation time, computation time, etc.)
- Assessment of any concerns with respect to ethics, privacy, or
security (e.g. personally identifiable information of private
individuals;
potential for systems to cause harm)
- Pilot Task (strongly recommended)
- Details of the pilot task
- What lessons were learned and how these will impact the task design
- Evaluation
- The evaluation methodology to be used, including clear evaluation
criteria
- For Task Reruns
- Justification for why a new iteration of the task is needed (see
criteria above)
- What will differ from the previous iteration
- Expected impact of the rerun compared with the previous iteration
- Task organizers
- Names, affiliations, email addresses
- (optional) brief description of relevant experience or expertise
- (if applicable) years and task numbers of any SemEval tasks you
have run in the past
- Role of each organizer
Proposals will be reviewed by an independent group of area experts who may
not have familiarity with recent SemEval tasks, and therefore all proposals
should be written in a self-explanatory manner and contain sufficient
examples.
*The submission webpage is:* SemEval2025 Task Proposal Submission
<https://openreview.net/group?id=aclweb.org/ACL/2024/Workshop/SemEval>
For further information on this initiative, please refer to
https://semeval.github.io/SemEval2025/cft
=== Chairs ===
Atul Kr. Ojha, Insight SFI Centre for Data Analytics, DSI, University of
Galway
A. Seza Doğruöz, Ghent University
Giovanni Da San Martino, University of Padua
Harish Tayyar Madabushi, The University of Bath
Sara Rosenthal, IBM Research AI
Aiala Rosá, Universidad de la República - Uruguay
Contact: semevalorganizers(a)gmail.com
*** With apologies for multiple postings ***
Call for Applications
Resident Academic Full-Time Post in Experimental Linguistics
Institute of Linguistics and Language Technology
Applications are invited for a Resident Academic full-time post in Experimental Linguistics at the Institute of Linguistics and Language Technology of the University of Malta.
The appointment will be on an initial definite four-year contract of employment. Following the successful completion of the one-year probationary period, the Resident Academic will commence a two-year Tenure Track process. At the conclusion of this two-year period of service, the Resident Academic will be subject to a Tenure Review.
Candidates must be in possession of a PhD in Linguistics/Language Sciences, and have demonstrable expertise in empirical linguistics and commitment to research that involves linguistic data processing and statistics from an experimental perspective. Teaching experience at tertiary level will be considered an asset.
The appointee will be required to contribute to the teaching and supervision of students, research and administrative duties, as well as outreach activities as may be required by the Institute and/or the University. More specifically, s/he will be required to teach courses in areas which apply quantitative approaches to the study of language and speech; these include courses on empirical linguistics, linguistic data processing and statistics, as well as experimental techniques. Moreover, s/he will be required to contribute to the development of the research programmes of the Institute and is additionally expected to be available to offer support to staff and students on matters such as experimental design, practical aspects in the implementation of an experiment and data processing and use of appropriate statistical tests, should the need arise. Good communicative skills, the ability to work in a team, and a strong willingness to engage in the Institute’s administrative and outreach activities are also essential.
The Resident Academic Stream is composed of four grades; these being Professor, Associate Professor, Senior Lecturer and Lecturer. Entry into the grade of Lecturer or above shall only be open to persons in possession of a PhD or an equivalent research-based doctorate within strict guidelines established by the University.
The annual salary for 2024 attached to the respective grades in the Resident Academic Stream is as follows:
*Professor: €48,818 plus an Academic Supplement of €33,119 and a Professorial Allowance of €2,330
*Associate Professor: €44,864 plus Academic Supplement of €25,377 and a Professorial Allowance of €1,423
Senior Lecturer: €40,677 plus an Academic Supplement of €18,358
Lecturer: €34,008 with an annual increment of €641 to €35,931 and an Academic Supplement of €14,932
*The University will only consider appointing an applicant at the grade of Professor or Associate Professor, when the applicant already holds an equivalent appointment at a University or Research Institute of repute.
The University of Malta will provide academic staff with financial resources through the Academic Resources Fund to support continuous professional development and to provide the tools and resources required by an academic to adequately fulfil the teaching and research commitments within the University.
The University of Malta may also appoint promising and exceptional candidates into the grade of Assistant Lecturer, provided that they are committed to obtaining the necessary qualifications to enter the Resident Academic Stream. Such candidates will either have achieved exceptional results at undergraduate level, already be in possession of a relevant Masters qualification, or have been accepted for, or already be in the process of completing, their PhD.
Assistant Lecturer with Masters: €31,764 with an annual increment of €596 to €33,552 and an Academic Supplement of €5,294
Assistant Lecturer: €29,589 with an annual increment of €531 to €31,182 and an Academic Supplement of €5,037.
Candidates must upload a covering letter, curriculum vitae, certificates (certificates should be submitted in English), and the names and email addresses of three referees through this form: https://www.um.edu.mt/hrmd/workatum-general
Applications should be received by Sunday, 25 February 2024.
Late applications will not be considered.
For more detail, see https://www.um.edu.mt/hrmd/recruitment/generalrecruitment/residentacademicf…
*********************
Patrizia Paggio
Professor
University of Malta
Institute of Linguistics and Language Technology
patrizia.paggio(a)um.edu.mt
Associate Professor
University of Copenhagen
Centre for Language Technology
paggio(a)hum.ku.dk
The fifth workshop on Resources for African Indigenous Languages (RAIL)
Colocated with LREC-COLING 2024
https://bit.ly/rail2024
Conference dates: 20-25 May 2024
Workshop date: 25 May 2024
Venue: Lingotto Conference Centre, Torino (Italy)
The fifth RAIL workshop website: https://bit.ly/rail2024
LREC-COLING 2024 website: https://lrec-coling-2024.org/
Submission website: https://softconf.com/lrec-coling2024/rail2024/
The fifth Resources for African Indigenous Languages (RAIL) workshop
will be co-located with LREC-COLING 2024 in Lingotto Conference Centre,
Torino, Italy on 25 May 2024. The RAIL workshop is an interdisciplinary
platform for researchers working on resources (data collections, tools,
etc.) specifically targeted towards African indigenous languages. In
particular, it aims to create the conditions for the emergence of a
scientific community of practice that focuses on data, as well as
computational linguistic tools specifically designed for or applied to
indigenous languages found in Africa.
Many African languages are under-resourced, while only a few of them are
somewhat better resourced. These languages often share interesting
properties, such as writing systems or tone, that set them apart from
most high-resourced languages. From a computational perspective, these
languages lack enough corpora to undertake high level development of
Human Language Technologies (HLT) and Natural Language Processing (NLP)
tools, which in turn impedes the development of African languages in
these areas. During previous workshops, it has become clear that the
problems and solutions presented are not only applicable to African
languages but are also relevant to many other low-resource languages.
Because these languages share similar challenges, this workshop
provides researchers with opportunities to work collaboratively on
issues of language resource development and learn from each other.
The RAIL workshop has several aims. First, the workshop brings together
researchers who work on African indigenous languages, forming a
community of practice for people working on indigenous languages.
Second, the workshop aims to reveal currently unknown or unpublished
existing resources (corpora, NLP tools, and applications), resulting in
a better overview of the current state-of-the-art, and also allows for
discussions on novel, desired resources for future research in this
area. Third, it enhances sharing of knowledge on the development of
low-resource languages. Finally, it enables discussions on how to
improve the quality as well as availability of the resources.
The workshop has “Creating resources for less-resourced languages” as
its theme, but submissions on any topic related to properties of
African indigenous languages (including non-African languages) may be
accepted. Suggested topics include (but are not limited to) the
following:
* Digital representations of linguistic structures
* Descriptions of corpora or other data sets of African indigenous
languages
* Building resources for (under-resourced) African indigenous languages
* Developing and using African indigenous languages in the digital age
* Effectiveness of digital technologies for the development of African
indigenous languages
* Revealing unknown or unpublished existing resources for African
indigenous languages
* Developing desired resources for African indigenous languages
* Improving quality, availability and accessibility of African
indigenous language resources
Submission requirements:
We invite papers on original, unpublished work related to the topics of
the workshop. Submissions, presenting completed work, may consist of up
to eight (8) pages of content for a long submission and up to four (4)
pages of content for a short submission plus additional pages of
references. The final camera-ready version of accepted long papers is
allowed one additional page of content (up to 9 pages) so that
reviewers’ feedback can be incorporated. Papers should be formatted
according to the LREC-COLING style sheet
(https://lrec-coling-2024.org/authors-kit/), which is provided on the
LREC-COLING 2024 website (https://lrec-coling-2024.org/). Reviewing is
double-blind, so make sure to anonymise your submission (e.g., do not
provide author names, affiliations, project names, etc.). Limit the
number of self-citations (anonymised citations should not be used). The
RAIL workshop follows the LREC-COLING submission requirements.
Please submit papers in PDF format to the START account
(https://softconf.com/lrec-coling2024/rail2024/). Accepted papers will
be published in proceedings linked to the LREC-COLING conference.
Important dates:
Submission deadline: 23 February 2024
Date of notification: 15 March 2024
Camera ready deadline: 29 March 2024
RAIL workshop: 25 May 2024
Organising Committee
Rooweither Mabuya, South African Centre for Digital Language Resources
(SADiLaR), South Africa
Muzi Matfunjwa, South African Centre for Digital Language Resources
(SADiLaR), South Africa
Mmasibidi Setaka, South African Centre for Digital Language Resources
(SADiLaR), South Africa
Menno van Zaanen, South African Centre for Digital Language Resources
(SADiLaR), South Africa
--
Prof Menno van Zaanen menno.vanzaanen(a)nwu.ac.za
Professor in Digital Humanities
South African Centre for Digital Language Resources
https://www.sadilar.org
The 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL 2024) will be held in Kyoto, Japan on September 18-20, 2024. We now welcome the submission of special sessions proposals, which will take place during the main conference. The submission deadline for special session proposals is March 10, 2024, AOE.
We encourage submissions of proposals on any topic of interest to the discourse and dialogue communities. This program is intended to offer new perspectives and bring together researchers working on related topics.
Topics of interest include all aspects related to Dialogue and Discourse including (but not limited to) annotation and resources, evaluation, large language models, adversarial and RL methods, explainable/ethical AI, interactive/multimodal/situated/incremental systems, and applications of dialogue and discourse.
A SIGDIAL special session is the length of a regular session at the conference, and may be organized as a poster session, a panel session, a poster session with panel discussion, a dialogue challenge, or an oral presentation session. Special sessions may, at the discretion of the SIGDIAL organizers, be held as parallel sessions. The papers submitted to special sessions are handled by the special session organizers, but for the submitted papers to be in the SIGDIAL proceedings, they have to undergo the same review process as regular papers. The reviewers for the special session papers will be taken from the SIGDIAL program committee itself, taking into account the suggestions of the session organizers, and the program chairs will make acceptance decisions. In other words, special session organizers decide what appears in the session, while the program chairs decide what appears in the proceedings and the rest of the conference program.
** Submissions **
Special Session proposals should be 2-4 pages containing: title, a summary of the topic, motivating theoretical interest and/or application context; a list of organizers and sponsors; and a requested session format(s): poster/panel/oral session. The special session proposals will be reviewed jointly by the general chair and program co-chairs.
** Links **
Those wishing to propose a special session may want to look at some of the sessions organized at recent SIGDial meetings:
Natural Language in Human Robot Interaction (NLiHRI 2022) (https://2022.sigdial.org/call-for-papers-nlihri/)
SummDial 2021 (https://elitr.github.io/automatic-minuting/summdial.html)
SafeConvAI 2021 (https://sites.google.com/view/safety4convai/home)
RoboDIAL 2022 (https://robodial.github.io/)
Interactive Natural Language Technology for Explainable Artificial Intelligence 2019 (https://sites.google.com/view/nl4xai2019/)
https://www.sigdial.org/files/workshops/conference18/sessions.htm
** Important Dates **
Mar 10, 2024: Special Session Proposal Submission Deadline
Mar 22, 2024: Special Session Notifications
The proposals should be sent to conference(a)sigdial.org.
SIGDIAL 2024 Program Committee
Vera Demberg and Stefan Ultes
Conference Website: https://2024.sigdial.org/
--
Prof. Dr.-Ing. Stefan Ultes
Natural Language Generation and Dialogue Systems
Otto-Friedrich-University of Bamberg
Bamberg Center for Artificial Intelligence
Faculty Information Systems and Applied Computer Sciences
phone: +49 (0) 951 863 2900 or +49 (0) 951 863 2901 (secretary)
https://www.uni-bamberg.de/en/ds
Special issue of the TAL journal: Scholarly Document Processing
https://tal-65-2.sciencesconf.org/
** Deadline for submission: March, 15th 2024 **
** Guest Editors **
Florian Boudin, JFLI/LS2N, Nantes University
Akiko Aizawa, National Institute of Informatics
** Context **
The body of scholarly literature is steadily and rapidly expanding. In arXiv alone, the number of scientific articles submitted in 2022 exceeded 185,000, averaging nearly 500 submissions per day. In the face of this exponential growth, researchers and institutions are continually challenged to keep pace with the sheer volume of new knowledge being created. Automated methods for analyzing and interpreting scientific papers are therefore urgently needed to assist researchers in navigating through the expanding volume of scientific information, enabling more efficient and targeted acquisition of new knowledge across various fields. More precisely, the development of methods capable of extracting reliable, valuable and verifiable information from scientific papers is crucial for many downstream tasks including retrieval, recommendation, summarization, question-answering and document understanding.
The uniqueness of scientific papers, marked by intricate technical language, discipline-specific terminology, a distinct structural organization and the inclusion of complex elements such as equations, tables, and figures, poses a significant challenge for existing natural language processing and information retrieval methods. Furthermore, these methods should also account for additional features provided at the collection level (e.g., citation networks) or embedded in rich paper metadata (e.g., authors, keywords, publication venues), each introducing its own set of challenges. This special issue of the TAL journal is dedicated to papers describing work that addresses these challenges, and more broadly to papers describing research on *natural language processing and information retrieval of scholarly and scientific documents*. Relevant topics for this issue include, but are not limited to, the following areas (in alphabetical order):
- Bibliometrics, scientometrics
- Citation analysis and recommendation
- Claim verification
- Datasets, tools and resources
- Information extraction, NER
- Large Language Models (LLMs)
- Plagiarism detection
- Question-answering
- Retrieval and recommendation
- Scientific document analysis
- Scientific writing assistance
- Text simplification
- Summarization and generation
** Important dates **
• Submission deadline: 15 March 2024
• Notification to the authors after first review: May 2024
• Notification to the authors after second review: September 2024
• Publication: December 2024
** Submission format **
The length of the papers must be between 20 and 25 pages.
Style sheets are available on the journal's website
(https://www.atala.org/content/instruction-authors-style-files-0).
Authors are invited to submit their paper on this platform: https://tal-65-2.sciencesconf.org/
To do so, authors will need to first create an account by clicking on "Create account" (Créer un compte) next to the “Login" (Connexion) button at the top of this page. To submit a paper, authors can connect to their account and upload their submission in "My Space" > "My submissions”.
The articles can be written in English or in French.
The TAL journal has a double-blind review process. It is necessary to anonymize the article, the name of the file, and to avoid self-references. Each article is evaluated by three reviewers, two external reviewers and a member of the editorial board of the journal TAL.
** TAL Journal **
TAL (Traitement Automatique des Langues / Natural Language Processing) is an international journal published by ATALA (French Association for Natural Language Processing, http://www.atala.org) since 1959 with the support of CNRS (National Centre for Scientific Research). It has moved to an electronic mode of publication, with printing on demand. The TAL journal is open-access. Paper submission, publication and access are free of charge.
Papers published in the TAL journal will be made available on the ATALA website and on ACL Anthology.
*The Third Ukrainian Natural Language Processing Workshop (UNLP 2024)*
<https://unlp.org.ua/>
UNLP 2024 features the first *Shared Task on Fine-Tuning Large Language
Models for Ukrainian*.
This Shared Task aims to challenge and assess LLMs' capabilities to
understand and generate Ukrainian, paving the way for LLM development in
Slavic languages.
*News:* The training data was released today.
*Task Description*
In this shared task, your goal is to instruction-tune a large language
model that can answer questions and perform tasks in Ukrainian. The model
should possess knowledge of Ukrainian history, language, and literature, as
well as common knowledge, and should be capable of generating fluent and
factually accurate responses.
The evaluation will be two-fold: accuracy of answers to multiple-choice
questions and human evaluation on a selection of text generation tasks.
You can find the instructions, sample data, and scripts at
https://github.com/unlp-workshop/unlp-2024-shared-task.
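The authoritative instructions and scoring scripts are in the repository linked above. As a rough sketch only, the multiple-choice half of such an evaluation reduces to comparing submitted option letters against an answer key (the function name and labels here are illustrative, not taken from the official scripts):

```python
def multiple_choice_accuracy(predictions, gold):
    """Fraction of questions where the submitted option matches the answer key."""
    if len(predictions) != len(gold):
        raise ValueError("prediction/key length mismatch")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Toy answer sheet: options A-D for four questions
print(multiple_choice_accuracy(["A", "C", "B", "D"], ["A", "B", "B", "D"]))  # 0.75
```

The human-evaluated generation tasks, by contrast, cannot be scored this way and follow the procedure described on the shared-task page.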
*Registration*
Teams that intend to participate should register by filling in this form
<https://forms.gle/MiC7pWsWbwBdSmoX9>.
*Publication*
Participants in the shared task are invited to submit a paper to the UNLP
2024 <https://unlp.org.ua/call-for-papers/> workshop. Submitting a
paper is *not
mandatory* for participating in the Shared Task. Papers must follow the
workshop submission instructions and will undergo regular peer review.
Their acceptance will depend on the quality of the paper, not on the results obtained in the shared task. Accepted papers will appear in the
ACL Anthology and will be presented at a session of UNLP 2024 specially
dedicated to the Shared Task.
*Important Dates*
February 12, 2024 — Release of train data
February 16, 2024 — Release of test data to registered participants
February 24, 2024 — Registration deadline; release of open questions
February 26, 2024 — Submission of system responses
March 4, 2024 — Results of the Shared Task announced
March 6, 2024 — Shared Task paper due
March 27, 2024 — Notification of acceptance
April 5, 2024 — Camera-ready papers due
May 25, 2024 — Workshop
*Contact*
Discord for the shared task: https://discord.gg/kCc6xgWbCJ
Email: info@unlp.org.ua
Website: https://unlp.org.ua/
Twitter: https://twitter.com/UNLP_workshop
Telegram: https://t.me/UNLP_workshop
Facebook: https://www.facebook.com/UNLPworkshop
*The Third Ukrainian Natural Language Processing Workshop (UNLP 2024)*
<https://unlp.org.ua/>
*Call For Papers*
UNLP 2024 <https://unlp.org.ua/> will be held *online* on May 25, 2024, in
conjunction with LREC-COLING 2024.
The workshop will bring together academics, researchers, and practitioners
in the fields of NLP and Computational Linguistics who work with the
Ukrainian language or do cross-Slavic research that can be applied to the
Ukrainian language.
The workshop will accept research papers for the Crimean Tatar language
with the aim of supporting this severely endangered language of the
indigenous people of Ukraine. The workshop will also accept papers with
negative results.
*Shared Task*
The Third UNLP workshop hosts the first Shared Task on Fine-Tuning Large
Language Models (LLMs) for Ukrainian. The aim is to challenge and assess
LLMs’ capabilities to understand and generate Ukrainian, paving the way for
LLM development in Slavic languages.
In this shared task, your goal is to instruction-tune a large language
model that can answer questions and perform tasks in Ukrainian. The model
should possess knowledge of Ukrainian history, language, and literature, as
well as common knowledge, and should be capable of generating fluent and
factually accurate responses.
You can find the detailed instructions, limitations, baseline, and
evaluation sample at https://github.com/unlp-workshop/unlp-2024-shared-task.
*Important dates*
March 4, 2024 — Workshop paper due
March 27, 2024 — Notification of acceptance
April 5, 2024 — Camera-ready papers due
May 25, 2024 — Workshop
*Submissions*
UNLP invites submissions of completed and ongoing projects. Submissions
describing resources or solutions that have been made available to the
broader public are strongly encouraged.
We invite two types of submissions: long and short papers. Long papers
should describe original, unpublished, and completed work. Short papers
may describe work in progress, small focused contributions, system
demonstrations, new linguistic resources, or experiments based on existing
software and resources.
The workshop will provide *Grammarly Premium* to all authors. To request
Grammarly Premium, please submit the form provided on the website home page
<https://unlp.org.ua/>.
Learn more at https://unlp.org.ua/call-for-papers/.
Link for paper submission: https://softconf.com/lrec-coling2024/unlp2024/.
*Share your LRs!*
When submitting a paper from the START page, authors will be asked to
provide essential information about the resources (in a broad sense, i.e.
also technologies, standards, evaluation kits, etc.) that were used in the
work described in the paper or that are a new result of the research.
Moreover, ELRA encourages all LREC-COLING authors to share the described
LRs (data, tools, services, etc.) to enable their reuse and the
replicability of experiments (including evaluation experiments).
*Workshop Organizers*
Andrii Hlybovets, National University of Kyiv-Mohyla Academy, Ukraine
Mariana Romanyshyn, Grammarly, Ukraine
Nataliia Romanyshyn, Ukrainian Catholic University, Ukraine
Oleksii Ignatenko, Ukrainian Catholic University, Ukraine
Find our program committee members at https://unlp.org.ua/committees/.
*Follow us*
Website: https://unlp.org.ua/
Twitter: https://twitter.com/UNLP_workshop
Telegram: https://t.me/UNLP_workshop
Facebook: https://www.facebook.com/UNLPworkshop
Email: info@unlp.org.ua