Dear all,
We are pleased to announce the schedule and lineup of speakers & tutors for the UCREL NLP Summer School 2024! The school will be led by 16 experts, covering a wide range of NLP talks and hands-on tutorials. We are also introducing a mini team-based hackathon and a poster session. There will be plenty of time for knowledge exchange and discussions both within the sessions and during breaks.
Upon request, we have extended the early bird registration until April 1, 2024. Please note that registrations are processed on a first-come, first-served basis. As we are offering in-person sessions only, spaces are limited.
- Date: 24-26 July 2024
- Venue: InfoLab21, School of Computing and Communications, Lancaster University, UK.
- Registration: https://bit.ly/UCREL2024
- Schedule and speakers: https://ucrel.lancs.ac.uk/uss2024
Registered applicants who plan to attend the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security (NLPAICS 2024) https://www.nlpaics.com/, which takes place at Lancaster University just two days after the summer school, will receive a 20% discount on their NLPAICS conference registration fees.
For any questions, please email us at ucrel(a)lancaster.ac.uk
Best wishes,
Mo
--------------------------------
Dr Mo El-Haj
Senior Lecturer in NLP
Director of Admissions (SCC)
Co-Director of UCREL NLP Group https://ucrel.lancs.ac.uk/
Natural Language Engineering (NLE) Journal Editorial Board
https://www.cambridge.org/core/journals/natural-language-engineering
Advisory Board of the Natural Language Processing Book Series
https://benjamins.com/catalog/nlp
School of Computing and Communications, Lancaster University
https://www.lancaster.ac.uk/staff/elhaj
@DocElhaj
You may receive emails from me outside your typical office hours.
I do not expect you to respond to my email outside your working hours.
> CALL FOR PAPERS
>
> SLATE - Symposium on Languages, Applications and Technologies
>
> Águeda, Portugal, July 4-5, 2024
>
> https://slate-conf.org/2024/home
>
>
>
> IMPORTANT DATES
>
> Paper Submission Deadline: April 25, 2024
>
> Paper Authors' Notification: May 24, 2024
>
> Final Paper Submission: May 31, 2024
>
> Conference Date: July 4-5, 2024
>
>
>
> Context
>
> We often use languages. Earlier, to communicate between ourselves. Later, to communicate with computers. And more recently, with the advent of networks, we found a way to make computers communicate between themselves. All these different forms of communication use languages, different languages, but they still share many similarities. In SLATE, we are interested in discussing these languages, organised in three main tracks:
>
>
>
> - HHL Track: Processing Human-Human Languages, dedicated to the presentation and discussion of Natural Language Processing (NLP) tools and applications.
>
> - HCL Track: Processing Human-Computer Languages, where researchers, developers, and educators exchange ideas and information on the latest academic or industrial work on the design, processing, assessment, and applications of programming languages.
>
> - CCL Track: Processing Computer-Computer Languages, a broad space for discussing (mark-up) languages for communication between computers, including those used for visualisation and presentation of information to the end-user.
>
> List of topics
>
> * Human-Human Languages (HHL) track:
>
> - Computational approaches to Morphology, Syntax, and Semantics;
>
> - Machine translation and tools for Computer Assisted Translation;
>
> - Computational terminology and lexicography;
>
> - Information Retrieval and Automatic Question Answering;
>
> - Information Extraction;
>
> - Natural Language Understanding;
>
> - Corpus Linguistics;
>
> - Statistical Methods for NLP;
>
> - Tools and resources for NLP;
>
> - Natural Language Generation;
>
> - Speech Recognition and Synthesis;
>
> - NLP system and resource evaluation;
>
> - Language teaching support tools.
>
>
>
> * Human-Computer Languages (HCL) track:
>
> - Programming language concepts, methodologies and tools;
>
> - Language and Grammars, design, formal specification and quality;
>
> - Domain Specific Languages design and implementation;
>
> - Programming, refactoring and debugging environments;
>
> - Dynamic and static analysis of programs;
>
> - Program Comprehension and program visualization;
>
> - Compilation and interpretation techniques;
>
> - Code generation and optimization;
>
> - Programming languages teaching methods and tools;
>
> - Cross-fertilization of different technological spaces (modelware, grammarware, ontologies, etc);
>
> - High level visual languages for Low-code development.
>
>
>
> * Computer-Computer Languages (CCL) track:
>
> - Semantic data description frameworks;
>
> - Semantic Web languages;
>
> - Ontology engineering;
>
> - IoT data protocols;
>
> - XML Databases and Big Data;
>
> - Publishing and document storage formats;
>
> - HTML5 and web formatting;
>
> - Industry-specific XML based standards;
>
> - Web APIs and service marketplaces;
>
> - Service-Oriented Architectures;
>
> - E-learning systems, standards, and interoperability;
>
> - Data and graph visualization languages.
>
> For more specific information regarding publication policy, committees, or how to get to the venue, please visit our website: https://slate-conf.org/2024/home
>
> Kindest Regards,
>
> SLATE'24 Organization Committee
https://gu-clasp.github.io/MILLing/
*Multimodality and Interaction in Language Learning (MILLing)* will
bring together researchers in linguistics and computational
linguistics to discuss learning through linguistic interaction, from
the perspectives of both human language acquisition and machine
learning. We encourage contributions from the fields of theoretical linguistics,
experimental linguistics, pragmatics, computational linguistics,
artificial intelligence, and cognitive science.
The conference is organised by the Centre for Linguistic Theory and
Studies in Probability (CLASP, <https://gu-clasp.github.io/>),
University of Gothenburg. The conference will be held on October
14 and 15, 2024, in Gothenburg, Sweden.
Important dates
----
- Submission deadline: May 31, 2024, anywhere on Earth
- Notification of acceptance: Aug 30, 2024, anywhere on Earth
- Camera ready: Sep 20, 2024, anywhere on Earth
- Conference: Oct 14--15, 2024, University of Gothenburg, Sweden
Topics of interest
----
We hope to see innovative work that
considers language learning from different perspectives, and we hope
to cultivate discussion that reaches across traditionally disparate
disciplines. Papers are invited on topics in these and closely related
areas, including (but not limited to) the following:
- Language acquisition: formal, statistical, experimental, and machine learning-based work
- Language learning through dialogue in humans and machines
- Multi-modality and figurativeness in language learning and dialogue
- Linguistic variation, adaptation, and audience design
- Low-resource and ecologically plausible language modelling (e.g., BabyLM)
- Cognitive architectures for language learning
- Information state update in humans and machines
- Cognitive approaches to second language acquisition
- Dialogue systems for language learning
- Online, reinforcement and curriculum learning in NLP
- Atypical development and language learning
- Ethical considerations in AI-assisted language learning
Submission Requirements
----
MILLing will feature two types of submissions: long papers and short
papers. Long papers must describe original research, and they must not
exceed 8 pages excluding references (position papers are also accepted
and should be formatted in the same way). Short papers present work in
progress, or they describe systems and/or projects. They must not
exceed 4 pages excluding references. All types of papers will be
published in the 2024 ACL Anthology as CLASP Conference Proceedings.
Papers should be electronically submitted via the softconf system at:
<https://softconf.com/n/MILLing2024/>. Submissions should be PDF files
and use the LaTeX or Word templates provided for ACL submissions
(<https://github.com/acl-org/acl-style-files>). Submissions have to be
anonymous. Please make sure that you select the right track when
submitting your paper. Contact the organisers if you have problems
using softconf.
Concurrent Submissions
----
Papers that have been or will be submitted to other conferences or
publications must indicate this at submission time using a footnote on
the title page of the submissions. Authors of papers accepted for
presentation at MILLing must notify the program chairs by the
camera-ready deadline as to whether the paper will be presented. All
accepted papers must be presented at the conference to appear in the
proceedings. We will not accept for publication or presentation papers
that overlap significantly in content or results with papers that will
be (or have been) published elsewhere.
Camera Ready Versions
----
Camera ready versions should follow the same guidelines with respect
to style and page numbers as the initial submission, i.e. there are no
additional pages allowed in the final submission. Please submit the
camera ready version by Sep 20, 2024.
About CLASP
----
MILLing is organised by the Centre for Linguistic Theory and Studies
in Probability (CLASP, <https://gu-clasp.github.io/>) at the Department
of Philosophy, Linguistics and Theory of Science (FLoV), University of
Gothenburg. CLASP focuses its research on the application of
probabilistic and information theoretic methods to the analysis of
natural language. CLASP is concerned both with understanding the
cognitive foundations of language and developing efficient language
technology. We work at the interface of computational
linguistics/natural language processing, theoretical linguistics, and
cognitive science.
=======2 PhD positions on NLP at CNRS@CREATE Singapore ===========
CNRS@CREATE Singapore, the first overseas subsidiary of CNRS, is offering
2 PhD positions in hybrid strategies for NLP. The candidates will work
within the DesCartes program
(https://www.cnrsatcreate.cnrs.fr/descartes/), a large research project
that aims to develop disruptive hybrid AI to serve the smart city and to
enable optimized decision-making in complex situations encountered in
critical urban systems.
We are looking for candidates with:
→ Master's degree in Computer Science or equivalent, with a solid background
in NLP, AI and/or machine learning. A very strong academic record is
highly recommended.
→ Good experience in deep learning approaches for NLP
→ Good programming skills in Python
→ Very good English skills (both writing and speaking)
→ Ability to work collaboratively with other researchers
The candidate will be registered at Paul Sabatier University-Toulouse
for 3 years and is expected to spend time in Singapore
(https://www.cnrsatcreate.cnrs.fr/about-us/). The thesis will be
supervised by Jian Su (A*STAR Institute for Infocomm Research), and
Farah Benamara (IRIT, Toulouse University and IPAL Singapore).
To apply, please send a detailed CV, your grades and a list of
publications, if any. The position is open until filled, but the
deadline to apply is April 15th, for a start in September/October 2024.
Feel free to contact us with any questions: farah.benamara(a)irit.fr
--
========================
Farah Benamara Zitoune
Professor in Computer Science, Université Paul Sabatier
IRIT-CNRS
118 Route de Narbonne, 31062, Toulouse.
Tel : +33 5 61 55 77 06
http://www.irit.fr/~Farah.Benamara
==================================
Processing of figurative language is a rapidly growing area in NLP, including computational modeling of metaphors, idioms, puns, irony, sarcasm, simile, and other figures. Characteristic of all areas of human activity (poetic, ordinary, scientific, and social media) and, thus, of all types of discourse, figurative language is an important problem for NLP systems. Its ubiquity in language has been established in a number of corpus studies and the role it plays in human reasoning has been confirmed in psychological experiments. This makes figurative language an important research area for computational and cognitive linguistics, and its automatic identification, interpretation and generation indispensable for any semantics-oriented NLP application.
The proposed workshop will be the fourth edition of the biennial Workshop on Figurative Language Processing, whose first three editions were held at NAACL 2018, ACL 2020 and EMNLP 2022, respectively. The workshop builds upon a long series of related workshops that the current organizers have been involved with: the “Metaphor in NLP” series (2013-2016) and the “Computational Approaches to Linguistic Creativity” series (2009-2010). We expand the scope to incorporate various types of figurative language, with the aim of maintaining and nourishing a community of NLP researchers interested in this topic. The main focus will be on computational modeling of figurative language; however, papers on cognitive, linguistic, social, rhetorical, and applied aspects are also of interest, provided that they are presented within a computational, formal, or quantitative framework. Recent advances in language models have led to several works on figurative language understanding (Chakrabarty et al 2022a; Chakrabarty et al 2022b; Liu et al 2022; Hu et al 2023) and generation (Stowe et al 2021; Chakrabarty et al 2021; Sun et al 2022; Tian et al 2021). At the same time, large language models have opened up opportunities to utilize figurative language in scientific (Kim et al 2023) as well as creative writing (Chakrabarty et al 2022c; Tian et al 2022). Additionally, there has been recent work on multimodal figurative language generation (Chakrabarty et al 2023; Akula et al 2023), understanding (Hessel et al 2023; Yosef et al 2023) and interpretation (Hwang et al 2023; Desai et al 2022; Kumar et al 2022). We encourage submissions along these axes.
Topics of Interest
The workshop will solicit both full papers and short papers for either oral or poster presentation. Topics will include, but will not be limited to, the following:
Identification and interpretation of different types of figurative language: Linguistic, conceptual and extended metaphor; irony, sarcasm, puns, simile, metonymy, personification, synecdoche, hyperbole
Generation of different types of figurative language: sarcasm, simile, metaphors, humor, hyperbole
Multilingual and multimodal figurative language processing
Resources and evaluation
Annotation of figurative language in corpora
Datasets for evaluation of tools
Evaluation methodologies
Figurative use in low-resource languages
Processing of figurative language for NLP applications
Figurative language in sentiment analysis; dialogue systems; computational social science; educational applications
Figurative language and mental health
Figurative language in digital humanities
Figurative language in creative writing
Figurative language and cognition
Cognitive models of processing of figurative language by the human brain
Human-AI collaboration for figurative language
Shared Tasks
Multilingual euphemisms detection: Euphemisms are a linguistic device used to soften or neutralize language that may otherwise be harsh or awkward to state directly (e.g. "between jobs" instead of "unemployed", "late" instead of "dead", "collateral damage" instead of "war-related civilian deaths"). By acting as alternative words or phrases, euphemisms are used in everyday language to maintain politeness, mitigate discomfort, or conceal the truth. While they are culturally-dependent, the need to discuss sensitive topics in a non-offensive way is universal, suggesting similarities in the way euphemisms are used across languages and cultures. We propose a shared task in which participants will need to disambiguate sentences in multiple languages as either euphemistic or not. The dataset will include English, Mandarin, Spanish, Yoruba, and possibly additional languages.
Understanding of Figurative Language through Visual Entailment: One important modality that has gained interest recently is vision, namely the interpretation of figurative language in media such as memes, art, or comics. This task is challenging because it involves reasoning abstractly about images, and also involves understanding social commonsense and cultural context. We will frame this as a visual entailment task where a model not only has to predict if a caption entails the content in the image but also provide free text explanations justifying the label prediction. These tasks have proved difficult for state-of-the-art multimodal models in the past. We will have a paper and a baseline for the same.
Important Dates
Long, Short & Demonstration Paper Submission: March 10th, 2024
Long, Short & Demonstration Paper Notification: April 14th, 2024
Final Paper Submission: April 24th, 2024
Workshop: June 21/22, 2024
For more information, please check https://sites.google.com/view/figlang2024
In this newsletter:
LDC membership discounts expire March 1
Spring 2024 data scholarship recipients
Four corpora withdrawn from the LDC Catalog
New publications:
Second Language University Speech Intelligibility Corpus<https://catalog.ldc.upenn.edu/LDC2024S02>
AIDA Scenario 1 Practice Topic Annotation<https://catalog.ldc.upenn.edu/LDC2024T02>
________________________________
LDC membership discounts expire March 1
Time is running out to save on 2024 membership fees. Renew your LDC membership, rejoin the Consortium, or become a new member by March 1 to receive a discount of up to 10%. For more information on membership benefits and options, visit Join LDC<https://www.ldc.upenn.edu/members/join-ldc>.
Spring 2024 data scholarship recipients
Congratulations to the recipients of LDC's Spring 2024 data scholarships:
Jordan Chandler: Université Rennes 2 (France): Master's student, English Studies. Jordan is awarded a copy of Penn Parsed Corpora of Historical English LDC2020T16 to continue his research on the historical development of adjective, quantifier, and article indefiniteness in the English language.
Nikhil Raghav: TCG Crest (India): PhD candidate, Institute for Advancing Intelligence. Nikhil is awarded copies of Third DIHARD Challenge Development LDC2022S12 and Third DIHARD Challenge Evaluation LDC2022S14 for his work in speaker diarization.
Abraham Sanders: Rensselaer Polytechnic Institute (USA): PhD candidate, Cognitive Science. Abraham is awarded copies of Fisher English Training Speech Part 1 Speech LDC2004S13, Fisher English Training Speech Part 1 Transcripts LDC2004T19, Fisher English Training Part 2 Speech LDC2005S13 and Fisher English Training Part 2 Transcripts LDC2005T19, for his work in spoken dialogue systems.
The next round of applications will be accepted in September 2024. For information about the program, visit the Data Scholarships page<https://www.ldc.upenn.edu/language-resources/data/data-scholarships>.
Four corpora withdrawn from the LDC Catalog
We regret to announce that The New York Times Annotated Corpus, LDC2008T19, has been withdrawn from the LDC Catalog by the data provider. Because they contain data from LDC2008T19, the following three corpora are also withdrawn from the Catalog: Benchmarks for Open Relation Extraction LDC2014T27, Concretely Annotated New York Times LDC2018T12, and News Sub-domain Named Entity Recognition LDC2023T12. Organizations and individuals who have previously licensed any of these data sets can continue to use them under the terms of their respective special license agreements.
________________________________
New publications:
Second Language University Speech Intelligibility Corpus<https://catalog.ldc.upenn.edu/LDC2024S02> was developed by Northern Arizona University, The Pennsylvania State University, and The University of Texas at Dallas. It contains 10.5 hours of English speech collected from 66 international faculty and university students representing 15 language backgrounds at 10 North American universities. This release also includes orthographic transcriptions for all recordings, intelligibility scores for 73% of the files, speaker metadata, and aligned Praat textgrids.
The speech data is comprised of presentations, descriptions, reflections, and microteaching tasks. Speakers were recruited from courses at intensive English programs and oral skills courses for international graduate students seeking to become international teaching assistants.
2024 members can access this corpus through their LDC accounts provided they have submitted a completed copy of the special license agreement. Non-members may license this data for a fee.
*
AIDA Scenario 1 Practice Topic Annotation<https://catalog.ldc.upenn.edu/LDC2024T02> was developed by LDC and is comprised of annotations for 212 English, Russian, and Ukrainian web documents (text, image, and video) from AIDA Scenario 1 Practice Topic Source Data (LDC2023T11)<https://catalog.ldc.upenn.edu/LDC2023T11>, specifically, the set of practice documents designated for annotation in Phase 1.
Annotations are presented as tab separated files in the following categories for each topic:
* Mentions: single references in source data to a real-world entity or filler, event, or relation.
* Slots: pre-defined roles in an event or relation filled by an argument (entity mention).
* Linking: entity mentions linked to entries in the knowledge base as a method of indicating the real-world entity to which an entity mention refers.
2024 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
[apologies if you receive multiple copies of this call]
Dear colleagues and friends,
*We are pleased to release the 1st Call for Participation - CLEF 2024
SimpleText Task4: SOTA?*
*Overview:* SOTA? is introduced as Task 4 in the SimpleText track of CLEF
2024. The goal of the SOTA? shared task is to develop systems which, given
the full text of an AI paper, are capable of recognizing whether the
paper reports model scores on benchmark datasets and, if so, of
extracting all pertinent (Task, Dataset, Metric, Score) quadruples
presented within the paper.
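To make the two-step goal above concrete, here is a minimal Python sketch of what a system's output for one paper might look like. It is only an illustration: the `Quadruple` class, the `format_prediction` helper, the JSON layout, and the example values ("ExampleQA", "88.5") are all hypothetical and are not taken from the task's official submission format, which is described on the task website.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Quadruple:
    """One (Task, Dataset, Metric, Score) result reported in a paper."""
    task: str
    dataset: str
    metric: str
    score: str  # kept as text, since papers format scores in varied ways


def format_prediction(quadruples: Optional[List[Quadruple]]) -> str:
    """Serialise a system's output for a single paper as JSON.

    None or an empty list means the system judged that the paper
    reports no model scores on benchmark datasets.
    (Hypothetical layout, not the official submission format.)
    """
    if not quadruples:
        return json.dumps({"reports_scores": False, "quadruples": []})
    return json.dumps(
        {"reports_scores": True, "quadruples": [asdict(q) for q in quadruples]}
    )


# Made-up example values, for illustration only.
example = Quadruple("Question Answering", "ExampleQA", "F1", "88.5")
print(format_prediction([example]))
print(format_prediction(None))
```

The score is stored as a string here on the assumption that reported values may include percent signs or other formatting; a real system would follow whatever normalisation the organisers specify.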
More info on the task website:
https://sites.google.com/view/simpletext-sota/home
SOTA? will be divided into two evaluation phases:
- Evaluation Phase 1: Few-shot Testing;
- Evaluation Phase 2: Zero-shot Testing
*To participate in SOTA? i.e. SimpleText Task 4 @ CLEF 2024, please
register your team*:
1. CLEF 2024 official registration page
https://clef2024.imag.fr/index.php?page=Pages/registration.html
2. Codalab competition site:
https://codalab.lisn.upsaclay.fr/competitions/16616
Note, SOTA? is organized as a new task this year under the "SimpleText -
Improving Access to Scientific Texts for Everyone" initiative
https://simpletext-project.com/. Please take a look at the other 3 tasks,
i.e. Task 1, 2, and 3, offered by SimpleText and select one or more of
those task options too if you are interested. Note that there is no
interdependence of the dataset between "Task 4 - SOTA?" and the other three
tasks of SimpleText.
*Dates*
Training and validation datasets available: Feb 1, 2024
Test data available/Evaluation starts: April 23, 2024
Evaluation ends: May 3, 2024
Participant paper submissions due: May 31, 2024
Notification to authors: June 24, 2024
Camera ready due: July 8, 2024
CLEF 2024 Workshop, Grenoble, France: 9-12 September 2024
*Task Organizers*
Jennifer D’Souza (TIB Leibniz Information Centre for Science and Technology
- Germany)
Salomon Kabongo (L3S Research Center, Germany)
Hamed Babaei Giglou (TIB Leibniz Information Centre for Science and
Technology - Germany)
Yue Zhang (Berlin Technical University, Germany)
Sören Auer (TIB Leibniz Information Centre for Science and Technology -
Germany)
*We look forward to having you on board!*
*Contact:* sota.task [at] gmail.com
The Institute of Translation Studies and Specialised Communication, Department of Language and Information Sciences, University of Hildesheim (Germany, https://www.uni-hildesheim.de/fb3/institute/institut-fuer-uebersetzungswiss…) is seeking to fill a lectureship position. (Near-)native command of English is a must, and a very good command of German is also required; see details in the official announcement (in German):
https://bewerbung.uni-hildesheim.de/jobposting/3653d72a8c32e0c740078c88667d…
Application *deadline*: 22nd of March 2024
--
Prof. Dr. Ekaterina Lapshinova-Koltunski
Geschäftsführende Direktorin
Institut für Übersetzungswissenschaft und Fachkommunikation
Fachbereich 3: Sprach und Informationswissenschaften
Stiftung Universität Hildesheim
Lübecker Straße 3
31141 Hildesheim
+49 5121 883-30934
I will start a new research group on natural language processing as part
of the Bamberg AI Center (https://www.uni-bamberg.de/en/bacai/).
We do fundamental NLP research at the intersection of computational
psychology, digital humanities, and computational social sciences.
We currently have four open positions (deadline February 28, 2024):
1. Postdoc, Open Topic (3 years)
2. PhD student in interactive prompt optimization (3 years)
3. Researcher in event-centered emotion analysis (1 year)
4. Researcher in multimodal emotion analysis (1 year)
Positions 3 and 4 can be combined into a 2-year position.
Please find more details at
https://www.bamnlp.de/openpositions/
Do not hesitate to contact me if you have questions!
Roman Klinger
Dear all,
Some of you might be interested in LancsLex, a new free online tool developed at Lancaster University for the analysis of English vocabulary. It is available at https://lancslex.lancs.ac.uk/
It is based on recent research (2024) that led to the publication of the Frequency Dictionary of British English: Core Vocabulary and Exercises for Learners https://cass.lancs.ac.uk/words-words-words-a-new-frequency-dictionary-of-br…
Best,
Vaclav
Professor Vaclav Brezina
Professor in Corpus Linguistics
Department of Linguistics and English Language
ESRC Centre for Corpus Approaches to Social Science
Faculty of Arts and Social Sciences, Lancaster University
Lancaster, LA1 4YD
Office: County South, room C05
T: +44 (0)1524 510828
@vaclavbrezina
<http://www.lancaster.ac.uk/arts-and-social-sciences/about-us/people/vaclav-…>