Dear Colleagues,
We are pleased to announce that the 2024 edition of the *Lectures on
Computational Linguistics*, a series of lectures dedicated to central topics in
Computational Linguistics and Natural Language Processing, will be held in
Bari from June 19 to 21.
The programme and all information are available on this site:
<https://www.ai-lc.it/en/lectures-2/lectures-2024/>
The 2024 edition is organized by the Italian Association of Computational
Linguistics/Associazione Italiana di Linguistica Computazionale (AILC) with
the Department of Computer Science and the Department of Humanistic
Research and Innovation of the University of Bari 'Aldo Moro'.
The interdisciplinary nature of the school crosses several areas,
particularly the Humanities, Computer Science and Artificial Intelligence.
The programme includes tutorials, labs, evening lectures, and two student
presentation sessions. The 2024 edition features a four-hour tutorial
dedicated to introducing Large Language Models to a broad audience.
*Programme*
*Wednesday, June 19, 2024*
9:00 – 9:30: Welcome and opening
9:30 – 11:30: Tutorial 1 (part 1) – Introduction to Large Language Models –
Andrey Kutuzov, Language Technology Group, University of Oslo
11:30 – 12:00: BREAK
12:00 – 13:30: Student session
13:30 – 15:00: LUNCH
15:00 – 17:00: Tutorial 1 (part 2) – Introduction to Large Language
Models – Andrey Kutuzov, Language Technology Group, University of Oslo
17:00 – 17:30: BREAK
17:30 – 18:30: Evening lecture
19:30: Welcome drink
*Thursday, June 20, 2024*
9:00 – 11:00: Tutorial 2 – Computational methods for lexical semantic
change detection – Nina Tahmasebi, University of Gothenburg
11:00 – 11:30: BREAK
11:30 – 13:30: Lab. 1 (part 1) – Hands-on Large Language Models – Marco
Polignano & Lucia Siciliani, University of Bari Aldo Moro
13:30 – 15:00: LUNCH
15:00 – 17:00: Lab. 1 (part 2) – Hands-on Large Language Models – Marco
Polignano & Lucia Siciliani, University of Bari Aldo Moro
17:00 – 17:30: BREAK
17:30 – 18:30: Evening lecture
19:00: Tour of the Old Town and dinner with typical food
*Friday, June 21, 2024*
9:00 – 11:00: Tutorial 3 – Dissociating language and thought in Large
Language Models – Anna Ivanova, School of Psychology, Georgia Tech
11:00 – 11:30: BREAK
11:30 – 13:00: Student session
13:00 – 14:00: LUNCH
14:00 – 16:00: Lab. 2 – Computational methods for lexical semantic
change detection – Pierluigi Cassotti, University of Gothenburg
*Registration*
The school is mainly aimed at Doctoral and Master's degree students,
although a minimum qualification is not required for access. Participation
is free but subject to registration, and places are limited to 200.
Students wishing to present aspects of their work in the "Student
Presentations" sessions are asked to send a 500-word abstract to
ailc.lectures(a)gmail.com by May 10, 2024. Notifications of acceptance will
be sent by May 31.
Scientific Committee
Pierpaolo Basile (University of Bari Aldo Moro)
Raffaella Bernardi (University of Trento)
Tommaso Caselli (University of Groningen)
Felice Dell'Orletta (Institute of Computational Linguistics CNR – Pisa)
Elisabetta Jezek (University of Pavia)
Local Organizing Committee
Pierpaolo Basile (Department of Computer Science, University of Bari Aldo
Moro)
Marco de Gemmis (Department of Computer Science, University of Bari Aldo
Moro)
Maristella Gatto (Department of Humanistic Research and Innovation,
University of Bari Aldo Moro)
Olimpia Imperio (Coordinator of the Doctorate in Letters, Languages and
Arts, Department of Humanistic Research and Innovation, University of Bari
Aldo Moro)
Secretariat
Lucia Siciliani (Department of Computer Science, University of Bari Aldo
Moro)
Contacts: ailc.lectures(a)gmail.com
--
*Linguistica computazionale. Introduzione all'analisi automatica dei testi
<https://www.mulino.it/isbn/9788815290359>.*
Bologna, Il Mulino, in bookshops since 3 March 2023
--
Elisabetta Jezek
Department of Humanities
Associate Professor of Glottology and Linguistics
Corso Strada Nuova 65 - 27100 Pavia (Italia)
<http://maps.google.com/?q=Corso+Strada+Nuova+65+27100+Pavia+%28Italia%29>
T. 0382984391
https://studiumanistici.unipv.it/?pagina=docenti&id=135
Elisabetta Jezek's Personal Meeting Room
https://us02web.zoom.us/j/7814331810
--
The information transmitted is intended only for the person or entity to
which it is addressed and may contain confidential and/or privileged
material. If you received this in error, please contact the sender and
delete the material.
<http://lettere.unipv.it/diplinguistica/docenti.php>
--
Elisabetta Jezek, PhD
Department of Humanities
Associate Professor of Linguistics and Glottology
Chair of the international Master's degree programme in European
Languages, Cultures and Societies in Contact
Member of the Board of Directors of the Italian Association of Computational
Linguistics
<https://firmamail.unipv.it/index.php/firme/genera>
https://unipv.unifind.cineca.it/resource/person/659960
Elisabetta Jezek's Personal Meeting Room
https://us02web.zoom.us/j/7814331810
*** CMCL – 2nd Call for Papers***
The 13th edition of the Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2024) will be co-located with the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024).
Webpage: https://cmclorg.github.io/
Direct submission page: https://openreview.net/group?id=aclweb.org/ACL/2024/Workshop/CMCL
ARR commitment page: https://openreview.net/group?id=aclweb.org/ACL/2024/Workshop/CMCL_ARR_Commi…
*Workshop Description*
CMCL 2024 is a one-day workshop held in conjunction with ACL 2024. CMCL invites papers on cognitive modeling, cognitively-inspired natural language processing, and, more broadly, the alignment of language models with human cognition/perception. The 2024 workshop follows in the tradition of earlier meetings at ACL 2010, ACL 2011, NAACL-HLT 2012, ACL 2013, ACL 2014, NAACL 2015, EACL 2017, LSA 2018, NAACL 2019, EMNLP 2020, NAACL 2021, and ACL 2022.
*Scope and Topics*
The research interests/questions include, but are not limited to:
- Human-like language acquisition/learning: How is language acquisition of language models (LMs) (dis)similar to humans, and why?
- Contrasting/aligning NLP models with human behavior data: What do humans compute during language comprehension/production, and how/why?
- Linguistic probing of NLP models: How well do current language models understand/represent/generalize language behaviorally/internally?
- Linguistically-motivated data modeling/analysis: How can one quantify a particular aspect of language?
- Emergent communication/language: What are the sufficient conditions for the emergence of language?
A more formal description of the workshop scope is:
- Stochastic models of factors influencing a speaker's production or comprehension decisions.
- Models of semantic interpretation, including psychologically realistic notions of word and phrase meaning and composition.
- Incremental parsers for diverse grammar formalisms and their psychological plausibility.
- Models of speaker-specific linguistic adaptation and/or generalization.
- Models of first and second language acquisition and bilingual language processing.
- Behavioral tasks for better understanding neural models of linguistic representation.
- Models and empirical analysis of the relationship between mechanistic psycholinguistic principles and pragmatics or semantics.
- Models of lexical acquisition, including phonology, morphology, and semantics.
- Psychologically motivated models of grammar induction.
- Psychologically plausible models of lexical or conceptual representations.
- Models of language disorders, such as aphasia, dyslexia, or dysgraphia.
- Behavioral datasets or resources for modeling language processing or production in languages other than English.
- Models of language comprehension difficulty.
- Models of language learning and generalization.
- Models of linguistic information propagation and language evolution in communities.
- Cognitively-motivated models of discourse and dialogue.
*Invited Speakers*
Aida Nematzadeh (Google DeepMind)
Frank Keller (University of Edinburgh)
*Important Dates*
- May 17, 2024: Paper submission/commitment deadline (cf. May 15, 2024: notification of ACL 2024)
- June 17, 2024: Notification of acceptance
- July 1, 2024: Camera-ready paper due
- August 15, 2024: Workshop date
Deadlines are at 11:59 pm AOE.
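For readers unsure what an AoE (Anywhere on Earth) deadline means in their own time zone, here is a minimal sketch. It relies on the usual convention that AoE is a fixed UTC-12 offset; the function name `aoe_deadline_in_utc` is only illustrative and is not taken from the workshop materials.

```python
from datetime import datetime, timedelta, timezone

# Anywhere on Earth (AoE) is conventionally a fixed UTC-12 offset.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_in_utc(year, month, day):
    """Return the UTC instant of 11:59 pm AoE on the given calendar date."""
    local = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local.astimezone(timezone.utc)


# Paper submission/commitment deadline listed above: May 17, 2024.
submission_deadline = aoe_deadline_in_utc(2024, 5, 17)
print(submission_deadline.isoformat())
```

Calling `submission_deadline.astimezone()` without arguments converts the same instant to the local time zone of the machine running the code.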
*Workshop submissions*
CMCL accepts direct submissions through the OpenReview site: https://openreview.net/group?id=aclweb.org/ACL/2024/Workshop/CMCL
We also accept papers already reviewed in the ACL Rolling Review (ARR) February cycle or earlier: https://openreview.net/group?id=aclweb.org/ACL/2024/Workshop/CMCL_ARR_Commi…
CMCL does not need to be listed as a preferred venue in the original ARR submission.
The detailed submission flow and schedule are shown on our workshop webpage: https://cmclorg.github.io/
*Submission types*
We invite three types of submissions:
(1) Archival regular workshop submissions that present original research in either long (8 pages + references) or short (4 pages + references) paper format.
(2) Non-archival submissions of extended abstracts that present preliminary results (from 2 to 4 pages + references).
(3) Non-archival cross-submission of long/short papers that present relevant research submitted/published elsewhere (including ACL "Findings of..." papers).
- Only regular workshop papers submitted via (1) will be included in the proceedings, but all types of papers will have a presentation opportunity in the workshop.
- Submissions must be formatted using the ACL style template (https://github.com/acl-org/acl-style-files) and be submitted as a PDF file.
- We adhere to the ACL anonymity policy: https://www.aclweb.org/adminwiki/index.php/ACL_Anonymity_Policy
- This year, we don't host a shared task.
*Workshop Organizers*
Tatsuki Kuribayashi (MBZUAI, tatsuki.kuribayashi(a)mbzuai.ac.ae)
Giulia Rambelli (University of Bologna, giulia.rambelli4(a)unibo.it)
Ece Takmaz (University of Amsterdam, ece.takmaz(a)uva.nl)
Philipp Wicke (Ludwig Maximilian University LMU, pwicke(a)cis.lmu.de)
Yohei Oseki (University of Tokyo, oseki(a)g.ecc.u-tokyo.ac.jp)
*Program Committee*
Abdellah Fourtassi (Aix-Marseille University)
Adina Williams (FAIR)
Afra Alishahi (Tilburg University)
Aniello De Santo (University of Utah)
Carina Kauf (MIT)
Cassandra Jacobs (University at Buffalo)
Christos Christodoulopoulos (Amazon)
Cory Shain (MIT)
Ethan Wilcox (ETH Zurich)
Frances Yung (Saarland University)
Fred Mailhot (Dialpad)
Gianluca Lebani (University Ca' Foscari Venezia)
James Michaelov (The University of California San Diego)
John Hale (University of Georgia)
Laurent Prévot (Aix-Marseille University)
Lisa Beinborn (VU Amsterdam)
Ludovica Pannitto (University of Trento)
Micha Elsner (Ohio State University)
Nora Hollenstein (University of Copenhagen)
Rachel Ryskin (University of California Merced)
Raquel Garrido Alhama (Tilburg University)
Richard Futrell (UC Irvine Language Science)
Robert Frank (Yale University)
Ryo Yoshida (The University of Tokyo)
Samar Husain (IIT Delhi)
Sandra Kuebler (Indiana University)
Tal Linzen (New York University)
Ted Briscoe (MBZUAI)
Tiago Pimentel (ETH Zurich)
Tim Hunter (UCLA)
Vera Demberg (Saarland University)
William Schuler (Ohio State University)
Yao Yao (Hong Kong Polytechnic University)
*Website*
https://cmclorg.github.io/
*Sponsoring Institutions*
Japan Society for the Promotion of Science
*Contact*
cmclorganizers2024(a)gmail.com
The University of Amsterdam has a fully funded PhD position on AI/NLP/IR
for information access.
We seek an ambitious PhD student with a background in artificial
intelligence, natural language processing and information retrieval.
Your focus will be on large language models (LLMs) for information
access. How can we search specific collections, including full text,
metadata, and multimodal content? How can we support complex search
tasks and practices, such as scholarly research on cultural data, and
the research and workflow of investigative journalism?
The PhD position is one of four PhD vacancies in the digital
Humanities, Artificial Intelligence, Cultural Heritage (HAICu) project,
a large national science agenda project funded by the Netherlands
Organization for Scientific Research. We are one of the best European
and global places to study AI, and you will work together with other AI
and Digital Humanities researchers, and a range of external partners on
scientific breakthroughs. HAICu deploys artificial intelligence (AI) to
make digital heritage collections more accessible, and the extraordinary
challenges of cultural heritage provide a unique opportunity to push the
boundaries of AI. The PhD position is fully funded and you will be
employed by the University of Amsterdam for four years (full-time, with
all employment benefits) and are expected to complete a PhD thesis
within this period.
Are you interested? Strong candidates with an AI/NLP/IR background are
encouraged to apply by May 15. Details are in:
https://vacatures.uva.nl/UvA/job/4PhDs/792167402/ (Project #1).
Feel free to reach out with questions or comments!
Jaap Kamps
We invite you to participate and submit your work to the First Workshop
on Data Contamination (CONDA) co-located with ACL 2024 in Bangkok, Thailand.
Data contamination, where evaluation data is inadvertently included in
pre-training corpora of large scale models, and language models (LMs) in
particular, has become a concern in recent times. The growing scale of
both models and data, coupled with massive web crawling, has led to the
inclusion of segments from evaluation benchmarks in the pre-training
data of LMs. The scale of internet data makes it difficult to prevent
this contamination from happening, or even detect when it has happened.
Crucially, when evaluation data becomes part of pre-training data, it
introduces biases and can artificially inflate the performance of LMs on
specific tasks or benchmarks. This poses a challenge for fair and
unbiased evaluation of models, as their performance may not accurately
reflect their generalization capabilities.
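To make the notion of contamination concrete, here is a toy sketch of one simple heuristic: measuring how many word n-grams of a benchmark example also appear verbatim in a pre-training document. This is only an illustration and not a method endorsed by the workshop; the function names, the tokenisation, and the default n-gram length of 8 are all assumptions chosen for the example.

```python
import re


def ngrams(text, n):
    """Return the set of lowercase word n-grams in `text`."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(benchmark_example, corpus_document, n=8):
    """Fraction of the example's n-grams that also occur in the document."""
    example_grams = ngrams(benchmark_example, n)
    if not example_grams:
        return 0.0  # example shorter than n tokens: nothing to compare
    shared = example_grams & ngrams(corpus_document, n)
    return len(shared) / len(example_grams)


# Hypothetical strings, used only to exercise the functions.
example = "The quick brown fox jumps over the lazy dog"
leaked = "Forum post: the quick brown fox jumps over the lazy dog, as they say."
clean = "An unrelated page about railway timetables in spring."

print(overlap_ratio(example, leaked, n=4))
print(overlap_ratio(example, clean, n=4))
```

A high ratio suggests that the example may have been seen verbatim during pre-training. Paraphrased or translated copies would not be caught by exact matching of this kind.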
Although a growing number of papers and state-of-the-art models mention
issues of data contamination, there is no agreed-upon definition or
standard methodology to ensure that a model does not report results on
contaminated benchmarks. Addressing data contamination is a shared
responsibility among researchers, developers, and the broader community.
By adopting best practices, increasing transparency, documenting
vulnerabilities, and conducting thorough evaluations, we can work
towards minimizing the impact of data contamination and ensuring fair
and reliable evaluations.
We welcome paper submissions on all topics related to data
contamination, including but not limited to:
* Definitions, taxonomies, and gradings of contamination
* Contamination detection (both manual and automatic)
* Community efforts to discover, report, and organize contamination events
* Documentation frameworks for datasets or models
* Methods to avoid data contamination
* Methods to forget contaminated data
* Scaling laws and contamination
* Memorization and contamination
* Policies to avoid impact of contamination in publication venues and
open source communities
* Reproducing and attributing results from previous work to data
contamination
* Survey work on data contamination research
* Data contamination in other modalities
*Submission Instructions*
We welcome two types of papers: regular workshop papers and non-archival
submissions. Regular workshop papers will be included in the workshop
proceedings. All submissions must be in PDF format and made through
OpenReview.
* Regular workshop papers: Authors can submit papers up to 8 pages,
with unlimited pages for references. Authors may submit up to 100 MB
of supplementary materials separately and their code for
reproducibility. All submissions undergo a double-blind single-track
review. Best Paper Award(s) will be given based on nomination by the
reviewers. Accepted papers will be presented as posters with the
possibility of oral presentations.
* Non-archival submissions: Cross-submissions are welcome. Accepted
papers will be presented at the workshop but not included in the
workshop proceedings. Papers must be in PDF format and will be
reviewed in a double-blind fashion by workshop reviewers. We also
welcome extended abstracts (up to 2 pages) of papers that are work
in progress, under review or to be submitted to other venues. Papers
in this category need to follow the ACL format.
In addition to papers submitted directly to the workshop, which will be
reviewed by our Programme Committee, we also accept papers reviewed
through ACL Rolling Review and committed to the workshop. Please check
the relevant dates for each type of submission.
*Important dates*
Relevant deadlines to consider when submitting your paper are:
* Paper submission deadline: May 17 (Friday), 2024
* ARR pre-reviewed commitment deadline: TBD, 2024
* Notification of acceptance: June 17 (Monday), 2024
* Camera-ready paper due: July 1 (Monday), 2024
* Workshop date: August 16, 2024
*Sponsors*
* AWS AI and Amazon Bedrock
* HuggingFace
* Google
*Contact*
* Website: https://conda-workshop.github.io/
* Email: conda-workshop(a)googlegroups.com
*Organizers*
Oscar Sainz, University of the Basque Country (UPV/EHU)
Iker García Ferrero, University of the Basque Country (UPV/EHU)
Eneko Agirre, University of the Basque Country (UPV/EHU)
Jon Ander Campos, Cohere
Alon Jacovi, Bar Ilan University
Yanai Elazar, Allen Institute for Artificial Intelligence and University
of Washington
Yoav Goldberg, Bar Ilan University and Allen Institute for Artificial
Intelligence
--
Eneko Agirre
HiTZ Hizkuntza Teknologiako Zentroa - Ixa Taldea
Centro Vasco de Tecnología de la Lengua - Grupo Ixa
Basque Center for Language Technology - Ixa NLP Group
University of the Basque Country (UPV/EHU)
hitz.ehu.eus/eneko <https://hitz.ehu.eus/eneko>
The research group Data Mining and Machine Learning at the University of Vienna is looking for a Postdoctoral Researcher in Natural Language Processing.
Possible research topics are:
- Analysis, explainability and interpretability of large language models
- Linguistic capabilities of large language models
- Extraction of structured information from text, linking knowledge graphs and language
- Weak supervision of natural language processing models
- Multimodal and multilingual deep learning
For more details see:
https://jobs.univie.ac.at/job/Postdoctoral-Researcher-in-Natural-Language-P…
--
Univ.-Prof. Dr. Benjamin Roth
Digitale Textwissenschaften
Universität Wien
Kolingasse 14
Raum 5.17
1090 Wien
email: benjamin.roth(a)univie.ac.at
tel: +43 14277 79513
virtual coffee (Tuesday 2pm CEST): https://www.benjaminroth.net/virtual_coffee
video call: https://univienna.zoom.us/j/93796507934?pwd=VFg5dW9JbStPUml6WFVtOWJXV3phQT09
web: https://dm.cs.univie.ac.at/team/person/112089/
Dear all,
Please find below more information about a conference we are organising.
The conference, which will take place on 21-22 November 2024 at the University of Liège (Belgium), is meant for researchers interested in metaphor and national identity discourse from different perspectives (linguistics, cognitive science, etc.).
Regards,
_____________
CALL FOR PAPERS
METAPOL3: DISCOURSE, IDEOLOGIES AND SUB-STATE NATIONALISM
21-22 November 2024, University of Liège, Belgium
Despite attempts to discourage sub-state nationalism and keep the political map of the world in its present form, the struggle for separate identities still remains a serious issue in modern-day countries. Sub-state nationalism has led to violent conflicts in postcolonial Africa, the former Yugoslavia and Soviet Union, and has been one of the main causes of political upheaval in Belgium, Britain, Spain, China, etc.
As Anderson (1983) indicates, nations are imagined communities whose formation involves the spread of discourses aimed at establishing a clear difference between in-groups and out-groups. While national identity has attracted a fair amount of scholarly interest in the field of political science, it was only in the early 1990s that studies emphasizing the discursive manifestations of nationalism began to be conducted (Wodak & Matouschek, 1993; Wodak & Reisigl, 1999; Wodak et al. 1999).
Over the last two decades, the study of political discourse has been consolidated by metaphor analysis (Musolff, 2006; 2016; Saric & Stanojevic, 2019). Even though great strides have been made in political discourse analysis, research on sub-state nationalism remains scant. To help fill this gap, we are organizing this conference, which we hope will bring together researchers from different fields (linguistics, sociology, political science, cognitive science) interested in discourse, metaphor and nationalist ideologies.
Topics of interest include, but are not limited to,
- The discursive construction of (sub-state) national identity
- Characteristics of separatist discourse
- Conceptualisations of the body politic
- Metaphor scenarios in national identity discourse
- Visual metaphor in (sub-state) nationalist discourse
- Gender and metaphor in (sub-state) nationalist discourse
- ...
KEYNOTE SPEAKER
Professor Martin Reisigl, University of Vienna
SUBMISSION OF PROPOSALS
The conference will be held in person at the University of Liège, Belgium.
Each presentation will last 20 minutes, followed by 10 minutes of Q&A.
Conference proposals should include:
- A title (max. 15 words)
- Key words (max. 5 words)
- An abstract (300 words, excluding references)
Guide for submitting a proposal
To submit a proposal, you must create a user account on sciencesconf.org and log in as a registered user.
You can create an account either directly on the SciencesConf portal or by clicking the Login button at the top right of the conference website (https://metapol3.sciencesconf.org/).
Once connected, access "My submissions" and then go to New submission > Submit an abstract.
Individual and co-authored papers in English or French are welcome.
All abstracts will go through double-blind peer review.
IMPORTANT DATES
Submission deadline: 15/05/24
Notification of acceptance: 15/07/24
CONTACT AND MORE INFORMATION
For more information, please visit our website (https://metapol3.sciencesconf.org/), and if necessary, do not hesitate to email us at metapol3(a)sciencesconf.org.
The following Research Fellow position is available as part of the Edinburgh Clinical NLP Group and the Advanced Care Research Centre at the University of Edinburgh. The deadline is 19th of April 2024.
https://www.jobs.ac.uk/job/DGS698/research-fellow-in-clinical-natural-langu…
----------------------------------------------------------------
Dr. Beatrice Alex
Senior Lecturer and Chancellor’s Fellow
University of Edinburgh
Head of the Edinburgh Language Technology Group
Co-lead of the Edinburgh Clinical NLP Group
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. Is e buidheann carthannais a th’ ann an Oilthigh Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.
Registration Deadline 10 April
The 2nd Arabic Named Entity Recognition Shared Task, at ArabicNLP’24
https://dlnlp.ai/st/wojood/
Dataset: Wojood-Fine <https://aclanthology.org/2023.arabicnlp-1.25/> New version: Arabic Fine-Grained Entity Recognition (Wojood + Subtypes of entity types).
Subtask-1 (Closed-Track Flat Fine-Grain NER): We provide the Wojood-Fine Flat train (70%) and development (10%) datasets. The final evaluation will be on the test set (20%). External data is not allowed .... (read more <https://dlnlp.ai/st/wojood/>).
Subtask-2 (Closed-Track Nested Fine-Grain NER): This subtask is similar to Subtask-1. We provide the Wojood-Fine Nested train (70%) and development (10%) datasets. The final evaluation will be on the test set (20%) .... (read more <https://dlnlp.ai/st/wojood/>).
Subtask-3 (Open-Track NER - Gaza War): This subtask aims to let participants reflect on the utility of NER in the context of real-world events, allows them to use external resources, and encourages them to use generative models in different ways (fine-tuned, zero-shot learning, in-context learning, etc.). The goal of focusing on generative models in this particular subtask is to help the Arabic NLP research community better understand the capabilities and performance gaps of LLMs in information extraction, an area currently understudied.
We provide development and test data related to the current War on Gaza. This is motivated by the assumption that discourse about recent global events will involve mentions from a different data distribution. For this subtask, we include data from five different news domains related to the War on Gaza, but we keep the names of the domains hidden. Participants will be given a development dataset (10K tokens, 2K from each of the five domains) and a testing dataset (50K tokens, 10K from each domain). Both development and testing sets are manually annotated with fine-grain named entities using the same annotation guidelines used in Subtask-1 and Subtask-2 (also described in Liqreina et al., 2023). .... (read more <https://dlnlp.ai/st/wojood/>).
BASELINES
Two baseline models trained on WojoodFine (flat and nested) are provided (See Liqreina et al., 2023 <https://aclanthology.org/2023.arabicnlp-1.25/>). The code used to produce these baselines is available on GitHub <https://github.com/SinaLab/ArabicNER>.
Subtask                            | Precision | Recall | Average Micro-F1
Flat Fine-Grain NER (Subtask 1)    | 0.8870    | 0.8966 | 0.8917
Nested Fine-Grain NER (Subtask 2)  | 0.9179    | 0.9279 | 0.9229
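As a quick sanity check on the baseline figures above, the sketch below recomputes F1 as the harmonic mean of the reported precision and recall. It assumes the reported Micro-F1 is derived from these micro-averaged values. The official scoring script may compute it differently (for example from raw counts), so small rounding differences are to be expected.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported Micro-F1) copied from the baseline table.
baselines = {
    "Flat Fine-Grain NER (Subtask 1)": (0.8870, 0.8966, 0.8917),
    "Nested Fine-Grain NER (Subtask 2)": (0.9179, 0.9279, 0.9229),
}

for name, (precision, recall, reported) in baselines.items():
    computed = f1_score(precision, recall)
    print(f"{name}: computed={computed:.4f}, reported={reported:.4f}")
```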
GOOGLE COLAB NOTEBOOKS
To allow you to experiment with the baseline, we authored four Google Colab notebooks that demonstrate how to train and evaluate our baseline models.
[1] Train Flat Fine-Grain NER <https://gist.github.com/mohammedkhalilia/72c3261734d7715094089bdf4de74b4a>: This notebook can be used to train our ArabicNER model on the flat Fine-grain NER task using the sample Wojood_Fine data.
[2] Evaluate Flat Fine-Grain NER <https://gist.github.com/mohammedkhalilia/c807eb1ccb15416b187c32a362001665>: This notebook uses the trained model saved by the notebook above to perform evaluation on an unseen dataset.
[3] Train Nested Fine-Grain NER <https://gist.github.com/mohammedkhalilia/a4d83d4e43682d1efcdf299d41beb3da>: This notebook can be used to train our ArabicNER model on the nested Fine-grain task using the sample Wojood data.
[4] Evaluate Nested Fine-Grain NER <https://gist.github.com/mohammedkhalilia/9134510aa2684464f57de7934c97138b>: This notebook uses the trained model saved by the notebook above to perform evaluation on an unseen dataset.
REGISTRATION
Participants need to register via this form (NERSharedTask 2024) <https://docs.google.com/forms/d/1ISMILgQYfUug3XuDpxFmuPASXkWaduYOUc3xOZuGwq…>. Participating teams will be provided with common training and development datasets. No external manually labelled datasets are allowed. A blind test set will be used to evaluate the output of the participating teams. Each team is allowed a maximum of 3 submissions. All teams are required to report results on the development and test sets (after results are announced) in their write-ups.
FAQ
For any questions related to this task, please check our Frequently Asked Questions <https://docs.google.com/document/d/1W_13FRpP3NbDx_ALYJWA3-ESXPRVomOjNovUuYf…>
IMPORTANT DATES
- February 25, 2024: Shared task announcement.
- March 1, 2024: Release of training data, development sets, scoring script, and Codalab links.
- April 10, 2024: Registration deadline.
- April 26, 2024: Test set made available.
- May 3, 2024: Codalab Test system submission deadline.
- May 10, 2024: Shared task system paper submissions due.
- June 17, 2024: Notification of acceptance.
- July 1, 2024: Camera-ready version.
- August 16, 2024: ArabicNLP 2024 conference in Thailand.
CONTACT
For any questions related to this task, please contact the organizers directly using the following email address: NERSharedtask(a)gmail.com.
(Re-sending due to the initial attempt bouncing, apologies in advance if
you receive multiple copies!)
Professor and co-principal investigator Najoung Kim <https://najoung.kim/> of
the Boston University Department of Linguistics <https://ling.bu.edu/> (with
active affiliations in Computer Science <https://www.bu.edu/cs/> and Data
Science <https://www.bu.edu/cds-faculty/>) is seeking a Postdoctoral
Associate to join the Professor's TIN Lab in Fall 2024. The successful
applicant will have a background in one of the following disciplines:
Artificial Intelligence, Natural Language Processing, Computational
Linguistics, Cognitive Science, or other relevant areas. The postdoctoral
associate will work closely with the PIs (Najoung Kim, Boston University &
Sebastian Schuster, UCL) and will be responsible for co-leading a
collaborative research project minimally involving two PhD-level graduate
students.
*Responsibilities*
The postdoctoral associate’s primary responsibility is to lead a research
project, the aim of which is to develop detailed evaluation protocols for
AI technology applied to a consequential task of real-world
complexity—specifically, AI in the domain of academic AI research—and to
apply this evaluation to estimate the capacities of the current best
models. We expect there to be a substantial system development component
(for building the baselines) as well as a substantial human study design
component (for a rigorous evaluation of the system outputs), where
different expertise can be contributed by different members of the research
team (minimally, two PIs, the postdoctoral associate, and two PhD-level
graduate students).
The postdoctoral associate is also invited to engage with the broader
academic community at BU, spanning Linguistics, Computer Science, and the
Center for Computing & Data Sciences, and academic communities in Boston
and New England. There will also be regular opportunities to connect with
the community at UCL.
*Qualifications*
The postdoctoral associate needs to hold a PhD degree at the start of their
appointment. Hands-on experience in either: (1) building systems that use
language models as a core component to solve complex tasks, or (2) leading
human annotation efforts or human behavioral experiments is required.
Publications or prior research experience in one of the following topic
areas are desired, but not required:
- Compositional generalization
- Data-efficient training methods (e.g., BabyLM-scale)
- Language model evaluation
- General-purpose prompting techniques
*Location*
The postdoctoral associate will be based at Boston University. They will be
physically located in one of the office spaces in 621 Commonwealth Avenue
or 665 Commonwealth Avenue, subject to space availability.
*Duration*
This is a one-year position with the base expectation that it will renew
for a second year, conditioned on satisfactory progress.
*Compensation*
The 12-month compensation for this position will be $90K-100K USD,
commensurate with experience.
*Application*
Candidates must submit a CV, two pieces representing their most significant research
contributions, and contacts of two references at the time of application. We
will only contact the reference writers for letters of recommendation when
we decide to interview the candidate. Application materials should be
uploaded as individual PDF files through Academic Jobs Online at
https://academicjobsonline.org/ajo/jobs/27426. We will give full
consideration to applications received by April 15, 2024 (two weeks from
the job posting date). Afterwards, applications will be considered on a
rolling basis until the position is filled.
Inquiries should be directed to najoung(a)bu.edu and s.schuster(a)ucl.ac.uk.
--
Najoung Kim
Assistant Professor
Department of Linguistics & Computer Science, Boston University
https://najoung.kim 🍪