9th Symposium on Corpus Approaches to Lexicogrammar (LxGr2024)
CALL FOR PAPERS
Extended deadline for abstract submission: 15 April 2024
The symposium will take place online on Friday 5 and Saturday 6 July 2024.
Invited Speakers
Lise Fontaine <http://www.uqtr.ca/PagePerso/Lise.Fontaine> (Université du Québec à Trois-Rivières): Reconciling (or not) lexis and grammar
Ute Römer-Barron <http://alsl.gsu.edu/profile/ute-romer> (Georgia State University): Phraseology research in second language acquisition
LxGr primarily welcomes papers reporting on corpus-based research on any aspect of the interaction of lexis and grammar - particularly studies that interrogate the system lexicogrammatically to get lexicogrammatical answers. However, position papers discussing theoretical or methodological issues are also welcome, as long as they are relevant to both lexicogrammar and corpus linguistics.
If you would like to present, send an abstract of 500 words (excluding references) to lxgr(a)edgehill.ac.uk.
Abstracts for research papers should specify the research focus (research questions or hypotheses), the corpus, the methodology (techniques, metrics), the theoretical orientation, and the main findings. Abstracts for position papers should specify the theoretical orientation and the potential contribution to both lexicogrammar and corpus linguistics.
Abstracts will be double-blind reviewed by members of the Programme Committee <https://sites.edgehill.ac.uk/lxgr/committee>.
Full papers will be allocated 35 minutes (including 10 minutes for discussion).
Work-in-progress reports will be allocated 20 minutes (including 5 minutes for discussion).
There will be no parallel sessions.
Participation is free.
For details, visit the LxGr website: https://sites.edgehill.ac.uk/lxgr/lxgr2024
If you have any questions, contact gabrielc(a)edgehill.ac.uk.
The Department of Digital Humanities, Faculty of Arts, University of Helsinki, invites applications for the position of
UNIVERSITY LECTURER IN HUMANITIES DATA SCIENCE / COMPUTATIONAL HUMANITIES
for a permanent appointment starting 1st of September 2024.
https://jobs.helsinki.fi/job/Helsinki-University-Lecturer-in-Humanities-Dat…
Due date: April 25, 2024
The position relates to the application of computational and/or statistical methods in the humanities. The application areas are to be interpreted broadly, from area studies to cognitive science, linguistics to history, phonetics to literature. Application, on the other hand, is to be understood primarily from the viewpoint of end-use across this plethora of humanistic research, e.g. through matching approaches to research questions and data, and not as a focus on methodological development itself. The lecturer will be attached to the Liberal Arts and Sciences bachelor’s programme currently under preparation at the university.
——————————————
Jörg Tiedemann
University of Helsinki
https://blogs.helsinki.fi/language-technology/
The first workshop on evaluating IR systems with Large Language Models
(LLMs) is accepting submissions that describe original research findings,
preliminary research results, proposals for new work, and recent relevant
studies already published in high-quality venues.
Topics of interest
We welcome both full papers and extended abstract submissions on the
following topics, including but not limited to:
- LLM-based evaluation metrics for traditional IR and generative IR.
- Agreement between human and LLM labels.
- Effectiveness and/or efficiency of LLMs to produce robust relevance
labels.
- Investigating LLM-based relevance estimators for potential systemic
biases.
- Automated evaluation of text generation systems.
- End-to-end evaluation of Retrieval Augmented Generation systems.
- Trustworthiness in the world of LLM evaluation.
- Prompt engineering in LLM evaluation.
- Effectiveness and/or efficiency of LLMs as ranking models.
- LLMs in specific IR tasks such as personalized search, conversational
search, and multimodal retrieval.
- Challenges and future directions in LLM-based IR evaluation.
Submission guidelines
We welcome the following submissions:
- Previously unpublished manuscripts will be accepted as extended
abstracts and full papers (any length between 1 - 9 pages) with unlimited
references, formatted according to the latest ACM SIG proceedings template
available at http://www.acm.org/publications/proceedings-template.
- Published manuscripts can be submitted in their original format.
All submissions should be made through Easychair:
https://easychair.org/conferences/?conf=llm4eval
All papers will be peer-reviewed (single-blind) by the program committee
and judged by their relevance to the workshop, especially to the main
themes identified above, and their potential to generate discussion. For
already published studies, the paper can be submitted in the original
format. These submissions will be reviewed for their relevance to this
workshop. All submissions must be in English (PDF format).
All accepted papers will have a poster presentation with a few selected for
spotlight talks. Accepted papers may be uploaded to arXiv.org, allowing
submission elsewhere as they will be considered non-archival. The
workshop’s website will maintain a link to the arXiv versions of the papers.
Important Dates
- Submission Deadline: April 25th, 2024 (AoE time)
- Acceptance Notifications: May 31st, 2024 (AoE time)
- Workshop date: July 18, 2024
Website
For more information, visit the workshop website:
https://llm4eval.github.io/
Contact
For any questions about paper submission, you may contact the workshop
organizers at llm4eval(a)easychair.org
Have you recently completed, or do you expect to complete very soon, an MSc
or equivalent degree
in computer science, artificial intelligence, computational linguistics,
engineering, or a related area? Are you interested in carrying out research
on automatic translation during the next few years? Are you excited to
spend a part of your life in a pleasant city in the heart of the Italian
Alps?
WE ARE LOOKING FOR YOU!!!
The Machine Translation <https://mt.fbk.eu/> (MT) group at Fondazione Bruno
Kessler (Trento, Italy) in conjunction with the ICT International Doctorate
School of the University of Trento <https://iecs.unitn.it/> is pleased to
announce the availability of the following fully-funded PhD position:
TITLE: Resource-efficient Foundation Models for Automatic Translation
DESCRIPTION:
The advent of foundation models has led to impressive advancements in all
areas of natural language processing. However, their huge size poses
limitations due to the significant computational costs associated with
their use or adaptation. When applying them to specific tasks, fundamental
questions arise: do we actually need all the architectural complexity of
large and - by design - general-purpose foundation models? Can we optimize
them to achieve higher efficiency? These questions spark interest in
research aimed at reducing models’ size, or deploying efficient decoding
strategies, so as to accomplish the same tasks while maintaining or even
improving performance. Success in this direction would lead to significant
practical and economic benefits (e.g., lower adaptation costs, the
possibility of local deployment on small-sized hardware devices), as well
as advantages from an environmental impact perspective towards sustainable
AI. Focusing on automatic translation, this PhD aims to understand the
functioning dynamics of general-purpose massive foundation models and
explore possibilities to streamline them for specific tasks. Possible areas
of interest range from textual and speech translation (e.g., how to
streamline a massively multilingual model to best handle a subset of
languages?) to scenarios where the latency is a critical factor, such as in
simultaneous/streaming translation (e.g., how to streamline the model to
reduce latency?), to automatic subtitling of audiovisual content (e.g., how
to streamline the model without losing its ability to generate compact
outputs suitable for subtitling?).
CONTACTS: Matteo Negri (negri(a)fbk.eu), Luisa Bentivogli (bentivo(a)fbk.eu)
COMPLETE DETAILS AVAILABLE AT:
https://iecs.unitn.it/education/admission/call-for-application
IMPORTANT DATES:
The deadline for applications is May 7th, 2024, at 4:00 PM (CEST).
Prospective candidates are strongly encouraged to contact us in advance for
preliminary interviews. Precedence for interviews will be given to
short-listed candidates who send us a complete CV via email
(negri(a)fbk.eu, bentivo(a)fbk.eu) by April 22, 2024.
Candidate profile
The ideal candidate must have recently completed, or expect to complete very
soon, an MSc
or equivalent degree in computer science, artificial intelligence,
computational linguistics, engineering, or a closely related area. In
addition, the applicant should:
- Have an interest in Machine and Speech Translation
- Have experience in deep learning and machine learning, in general
- Have good programming skills in Python and experience in PyTorch
- Enjoy working with real-world problems and large data sets
- Have good knowledge of written and spoken English
- Enjoy working in a closely collaborating team
Working Environment
The doctoral student will be employed at the MT group at Fondazione Bruno
Kessler, Trento, Italy. The group (about 10 people including staff and
students) has a long tradition in research on machine and speech
translation and is currently involved in several projects. Former students
are now employed at leading IT companies around the world.
Benefits
Fondazione Bruno Kessler offers an attractive benefits package, including a
flexible work week, full reimbursement for conferences and summer schools,
a competitive salary, an excellent team of supervisors and mentors, help
with housing, full health insurance, the possibility of Italian courses,
and sporting facilities.
Further Information
For preliminary interviews, and should you need further information about
the position, please contact Matteo Negri (negri(a)fbk.eu) and Luisa
Bentivogli (bentivo(a)fbk.eu).
Best Regards,
Matteo Negri
Dear Colleagues,
We are pleased to announce that the 2024 edition of the *Lectures on
Computational Linguistics*, a series of lectures dedicated to central topics in
Computational Linguistics and Natural Language Processing, will be held in
Bari from June 19 to 21.
The programme and all further information are available at
https://www.ai-lc.it/en/lectures-2/lectures-2024/
The 2024 edition is organized by the Italian Association of Computational
Linguistics/Associazione Italiana di Linguistica Computazionale (AILC) with
the Department of Computer Science and the Department of Humanistic
Research and Innovation of the University of Bari 'Aldo Moro'.
The school is interdisciplinary and spans several areas,
particularly the Humanities, Computer Science and Artificial Intelligence.
The program includes tutorials, labs, evening lectures, and two student
presentation sessions. The 2024 edition features a four-hour tutorial
dedicated to introducing Large Language Models to a broad audience.
*Programme*
*Wednesday, June 19, 2024*
09:00 – 09:30: Welcome and opening
09:30 – 11:30: Tutorial 1 (part 1) – Introduction to Large Language Models –
Andrey Kutuzov, Language Technology Group, University of Oslo
11:30 – 12:00: BREAK
12:00 – 13:30: Student session
13:30 – 15:00: LUNCH
15:00 – 17:00: Tutorial 1 (part 2) – Introduction to Large Language
Models – Andrey Kutuzov, Language Technology Group, University of Oslo
17:00 – 17:30: BREAK
17:30 – 18:30: Evening lecture
19:30: Welcome drink
*Thursday, June 20, 2024*
09:00 – 11:00: Tutorial 2 – Computational methods for lexical semantic
change detection – Nina Tahmasebi, University of Gothenburg
11:00 – 11:30: BREAK
11:30 – 13:30: Lab. 1 (part 1) – Hands-on Large Language Models – Marco
Polignano & Lucia Siciliani, University of Bari Aldo Moro
13:30 – 15:00: LUNCH
15:00 – 17:00: Lab. 1 (part 2) – Hands-on Large Language Models – Marco
Polignano & Lucia Siciliani, University of Bari Aldo Moro
17:00 – 17:30: BREAK
17:30 – 18:30: Evening lecture
19:00: Tour of the Old Town and dinner with typical food
*Friday, June 21, 2024*
09:00 – 11:00: Tutorial 3 – Dissociating language and thought in Large
Language Models – Anna Ivanova, School of Psychology, Georgia Tech
11:00 – 11:30: BREAK
11:30 – 13:00: Student session
13:00 – 14:00: LUNCH
14:00 – 16:00: Lab. 2 – Computational methods for lexical semantic
change detection – Pierluigi Cassotti, University of Gothenburg
*Registration*
The school is mainly aimed at doctoral and Master's degree students,
although no minimum qualification is required to attend. Participation
is free but subject to registration, and places are limited to 200.
Students wishing to present aspects of their work in the "Student
Presentations" sessions are asked to send a 500-word abstract to
ailc.lectures(a)gmail.com by May 10, 2024. Notifications of acceptance will
be sent by May 31.
Scientific Committee
Pierpaolo Basile (University of Bari Aldo Moro)
Raffaella Bernardi (University of Trento)
Tommaso Caselli (University of Groningen)
Felice Dell'Orletta (Institute of Computational Linguistics CNR – Pisa)
Elisabetta Jezek (University of Pavia)
Local Organizing Committee
Pierpaolo Basile (Department of Computer Science, University of Bari Aldo
Moro)
Marco de Gemmis (Department of Computer Science, University of Bari Aldo
Moro)
Maristella Gatto (Department of Humanistic Research and Innovation,
University of Bari Aldo Moro)
Olimpia Imperio (Coordinator of the Doctorate in Letters, Languages and
Arts, Department of Humanistic Research and Innovation, University of Bari
Aldo Moro)
Secretariat
Lucia Siciliani (Department of Computer Science, University of Bari Aldo
Moro)
Contacts: ailc.lectures(a)gmail.com
--
Elisabetta Jezek
Department of Humanities
Associate Professor of Glottology and Linguistics
Corso Strada Nuova 65 - 27100 Pavia (Italia)
*** CMCL – 2nd Call for Papers***
The 13th edition of the Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2024) will be co-located with the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024).
Webpage: https://cmclorg.github.io/
Direct submission page: https://openreview.net/group?id=aclweb.org/ACL/2024/Workshop/CMCL
ARR commitment page: https://openreview.net/group?id=aclweb.org/ACL/2024/Workshop/CMCL_ARR_Commi…
*Workshop Description*
CMCL 2024 is a one-day workshop held in conjunction with ACL 2024. CMCL invites papers on cognitive modeling, cognitively-inspired natural language processing, and, more broadly, the alignment of language models with human cognition/perception. The 2024 workshop follows in the tradition of earlier meetings at ACL 2010, ACL 2011, NAACL-HLT 2012, ACL 2013, ACL 2014, NAACL 2015, EACL 2017, LSA 2018, NAACL 2019, EMNLP 2020, NAACL 2021, and ACL 2022.
*Scope and Topics*
The research interests/questions include, but are not limited to:
- Human-like language acquisition/learning: How is language acquisition in language models (LMs) (dis)similar to that of humans, and why?
- Contrasting/aligning NLP models with human behavior data: What do humans compute during language comprehension/production, and how/why?
- Linguistic probing of NLP models: How well do current language models understand/represent/generalize language behaviorally/internally?
- Linguistically-motivated data modeling/analysis: How can one quantify a particular aspect of language?
- Emergent communication/language: What are the sufficient conditions for the emergence of language?
A more formal description of the workshop scope is:
- Stochastic models of factors influencing a speaker's production or comprehension decisions.
- Models of semantic interpretation, including psychologically realistic notions of word and phrase meaning and composition.
- Incremental parsers for diverse grammar formalisms and their psychological plausibility.
- Models of speaker-specific linguistic adaptation and/or generalization.
- Models of first and second language acquisition and bilingual language processing.
- Behavioral tasks for better understanding neural models of linguistic representation.
- Models and empirical analysis of the relationship between mechanistic psycholinguistic principles and pragmatics or semantics.
- Models of lexical acquisition, including phonology, morphology, and semantics.
- Psychologically motivated models of grammar induction.
- Psychologically plausible models of lexical or conceptual representations.
- Models of language disorders, such as aphasia, dyslexia, or dysgraphia.
- Behavioral datasets or resources for modeling language processing or production in languages other than English.
- Models of language comprehension difficulty.
- Models of language learning and generalization.
- Models of linguistic information propagation and language evolution in communities.
- Cognitively-motivated models of discourse and dialogue.
*Invited Speakers*
Aida Nematzadeh (Google DeepMind)
Frank Keller (University of Edinburgh)
*Important Dates*
- May 17, 2024: Paper submission/commitment deadline (cf. May 15, 2024: notification of ACL 2024)
- June 17, 2024: Notification of acceptance
- July 1, 2024: Camera-ready paper due
- August 15, 2024: Workshop date
Deadlines are at 11:59 pm AOE.
*Workshop submissions*
CMCL accepts direct submissions through the OpenReview site: https://openreview.net/group?id=aclweb.org/ACL/2024/Workshop/CMCL
We also accept papers already reviewed in the ACL Rolling Review (ARR) February cycle or earlier: https://openreview.net/group?id=aclweb.org/ACL/2024/Workshop/CMCL_ARR_Commi…
CMCL does not need to be listed as a preferred venue in the original ARR submission.
The detailed submission flow and schedule are shown on our workshop webpage: https://cmclorg.github.io/
*Submission types*
We invite three types of submissions:
(1) Archival regular workshop submissions that present original research in either long (8 pages + references) or short (4 pages + references) paper format.
(2) Non-archival submissions of extended abstracts that present preliminary results (from 2 to 4 pages + references).
(3) Non-archival cross-submission of long/short papers that present relevant research submitted/published elsewhere (including ACL "Findings of..." papers).
- Only regular workshop papers submitted via (1) will be included in the proceedings, but all types of papers will have a presentation opportunity in the workshop.
- Submissions must be formatted using the ACL style template (https://github.com/acl-org/acl-style-files) and be submitted as a PDF file.
- We adhere to the ACL anonymity policy: https://www.aclweb.org/adminwiki/index.php/ACL_Anonymity_Policy
- We are not hosting a shared task this year.
*Workshop Organizers*
Tatsuki Kuribayashi (MBZUAI, tatsuki.kuribayashi(a)mbzuai.ac.ae)
Giulia Rambelli (University of Bologna, giulia.rambelli4(a)unibo.it)
Ece Takmaz (University of Amsterdam, ece.takmaz(a)uva.nl)
Philipp Wicke (Ludwig Maximilian University LMU, pwicke(a)cis.lmu.de)
Yohei Oseki (University of Tokyo, oseki(a)g.ecc.u-tokyo.ac.jp)
*Program Committee*
Abdellah Fourtassi (Aix-Marseille University)
Adina Williams (FAIR)
Afra Alishahi (Tilburg University)
Aniello De Santo (University of Utah)
Carina Kauf (MIT)
Cassandra Jacobs (University at Buffalo)
Christos Christodoulopoulos (Amazon)
Cory Shain (MIT)
Ethan Wilcox (ETH Zurich)
Frances Yung (Saarland University)
Fred Mailhot (Dialpad)
Gianluca Lebani (University Ca' Foscari Venezia)
James Michaelov (The University of California San Diego)
John Hale (University of Georgia)
Laurent Prévot (Aix-Marseille University)
Lisa Beinborn (VU Amsterdam)
Ludovica Pannitto (University of Trento)
Micha Elsner (Ohio State University)
Nora Hollenstein (University of Copenhagen)
Rachel Ryskin (University of California Merced)
Raquel Garrido Alhama (Tilburg University)
Richard Futrell (UC Irvine Language Science)
Robert Frank (Yale University)
Ryo Yoshida (The University of Tokyo)
Samar Husain (IIT Delhi)
Sandra Kuebler (Indiana University)
Tal Linzen (New York University)
Ted Briscoe (MBZUAI)
Tiago Pimentel (ETH Zurich)
Tim Hunter (UCLA)
Vera Demberg (Saarland University)
William Schuler (Ohio State University)
Yao Yao (Hong Kong Polytechnic University)
*Website*
https://cmclorg.github.io/
*Sponsoring Institutions*
Japan Society for the Promotion of Science
*Contact*
cmclorganizers2024(a)gmail.com
The University of Amsterdam has a fully funded PhD position on AI/NLP/IR
for information access.
We seek an ambitious PhD student with a background in artificial
intelligence, natural language processing and information retrieval.
Your focus will be on large language models (LLMs) for information
access. How can we search specific collections, including full text,
metadata, and multimodal content? How can we support complex search
tasks and practices, such as scholarly research on cultural data, and
the research and workflow of investigative journalism?
The PhD position is one of four PhD vacancies in the digital
Humanities, Artificial Intelligence, Cultural Heritage (HAICu) project,
a large national science agenda project funded by the Netherlands
Organization for Scientific Research. We are one of the leading places
in Europe and worldwide to study AI, and you will work together with
other AI and Digital Humanities researchers, and a range of external
partners, on scientific breakthroughs. HAICu deploys artificial intelligence (AI) to
make digital heritage collections more accessible, and the extraordinary
challenges of cultural heritage provide a unique opportunity to push the
boundaries of AI. The PhD position is fully funded and you will be
employed by the University of Amsterdam for four years (full-time, with
all employment benefits) and are expected to complete a PhD thesis
within this period.
Are you interested? Strong candidates with an AI/NLP/IR background are
encouraged to apply by May 15. Details are in:
https://vacatures.uva.nl/UvA/job/4PhDs/792167402/ (Project #1).
Feel free to reach out with questions or comments!
Jaap Kamps
We invite you to participate and submit your work to the First Workshop
on Data Contamination (CONDA) co-located with ACL 2024 in Bangkok, Thailand.
Data contamination, where evaluation data is inadvertently included in
pre-training corpora of large-scale models, and language models (LMs) in
particular, has become a concern in recent times. The growing scale of
both models and data, coupled with massive web crawling, has led to the
inclusion of segments from evaluation benchmarks in the pre-training
data of LMs. The scale of internet data makes it difficult to prevent
this contamination from happening, or even detect when it has happened.
Crucially, when evaluation data becomes part of pre-training data, it
introduces biases and can artificially inflate the performance of LMs on
specific tasks or benchmarks. This poses a challenge for fair and
unbiased evaluation of models, as their performance may not accurately
reflect their generalization capabilities.
Although a growing number of papers and state-of-the-art models mention
issues of data contamination, there is no agreed-upon definition or
standard methodology to ensure that a model does not report results on
contaminated benchmarks. Addressing data contamination is a shared
responsibility among researchers, developers, and the broader community.
By adopting best practices, increasing transparency, documenting
vulnerabilities, and conducting thorough evaluations, we can work
towards minimizing the impact of data contamination and ensuring fair
and reliable evaluations.
We welcome paper submissions on all topics related to data
contamination, including but not limited to:
* Definitions, taxonomies, and gradings of contamination
* Contamination detection (both manual and automatic)
* Community efforts to discover, report, and organize contamination events
* Documentation frameworks for datasets or models
* Methods to avoid data contamination
* Methods to forget contaminated data
* Scaling laws and contamination
* Memorization and contamination
* Policies to avoid impact of contamination in publication venues and
open source communities
* Reproducing and attributing results from previous work to data
contamination
* Survey work on data contamination research
* Data contamination in other modalities
*Submission Instructions*
We welcome two types of papers: regular workshop papers and non-archival
submissions. Regular workshop papers will be included in the workshop
proceedings. All submissions must be in PDF format and made through
OpenReview.
* Regular workshop papers: Authors can submit papers up to 8 pages,
with unlimited pages for references. Authors may submit up to 100 MB
of supplementary materials separately and their code for
reproducibility. All submissions undergo a double-blind single-track
review. Best Paper Award(s) will be given based on nomination by the
reviewers. Accepted papers will be presented as posters with the
possibility of oral presentations.
* Non-archival submissions: Cross-submissions are welcome. Accepted
papers will be presented at the workshop but not included in the
workshop proceedings. Papers must be in PDF format and will be
reviewed in a double-blind fashion by workshop reviewers. We also
welcome extended abstracts (up to 2 pages) of papers that are work
in progress, under review or to be submitted to other venues. Papers
in this category need to follow the ACL format.
In addition to papers submitted directly to the workshop, which will be
reviewed by our Programme Committee, we also accept papers reviewed
through ACL Rolling Review and committed to the workshop. Please check
the relevant dates for each type of submission.
*Important dates*
Relevant deadlines to consider when submitting your paper:
* Paper submission deadline: May 17 (Friday), 2024
* ARR pre-reviewed commitment deadline: TBD, 2024
* Notification of acceptance: June 17 (Monday), 2024
* Camera-ready paper due: July 1 (Monday), 2024
* Workshop date: August 16, 2024
*Sponsors*
* AWS AI and Amazon Bedrock
* HuggingFace
* Google
*Contact*
* Website: https://conda-workshop.github.io/
* Email: conda-workshop(a)googlegroups.com
*Organizers*
Oscar Sainz, University of the Basque Country (UPV/EHU)
Iker García Ferrero, University of the Basque Country (UPV/EHU)
Eneko Agirre, University of the Basque Country (UPV/EHU)
Jon Ander Campos, Cohere
Alon Jacovi, Bar Ilan University
Yanai Elazar, Allen Institute for Artificial Intelligence and University
of Washington
Yoav Goldberg, Bar Ilan University and Allen Institute for Artificial
Intelligence
--
Eneko Agirre
HiTZ Hizkuntza Teknologiako Zentroa - Ixa Taldea
Centro Vasco de Tecnología de la Lengua - Grupo Ixa
Basque Center for Language Technology - Ixa NLP Group
University of the Basque Country (UPV/EHU)
hitz.ehu.eus/eneko <https://hitz.ehu.eus/eneko>
The research group Data Mining and Machine Learning at the University of Vienna is looking for a Postdoctoral Researcher in Natural Language Processing.
Possible research topics are:
- Analysis, explainability and interpretability of large language models
- Linguistic capabilities of large language models
- Extraction of structured information from text, linking knowledge graphs and language
- Weak supervision of natural language processing models
- Multimodal and multilingual deep learning
For more details see:
https://jobs.univie.ac.at/job/Postdoctoral-Researcher-in-Natural-Language-P…
--
Univ.-Prof. Dr. Benjamin Roth
Digitale Textwissenschaften
Universität Wien
Kolingasse 14
Raum 5.17
1090 Wien
email: benjamin.roth(a)univie.ac.at
tel: +43 14277 79513
virtual coffee (Tuesday 2pm CEST): https://www.benjaminroth.net/virtual_coffee
video call: https://univienna.zoom.us/j/93796507934?pwd=VFg5dW9JbStPUml6WFVtOWJXV3phQT09
web: https://dm.cs.univie.ac.at/team/person/112089/