Apologies for cross-posting.
----------------------------------------
We invite proposals for tasks to be run as part of SemEval-2024
<https://semeval.github.io/SemEval2024/>. SemEval (the International
Workshop on Semantic Evaluation) <https://semeval.github.io/> is an ongoing
series of evaluations of computational semantics systems, organized under
the umbrella of SIGLEX <https://siglex.org/>, the Special Interest Group on
the Lexicon of the Association for Computational Linguistics.
SemEval tasks explore the nature of meaning in natural languages: how to
characterize meaning and how to compute it. This is achieved in practical
terms, using shared datasets and standardized evaluation metrics to
quantify the strengths and weaknesses of possible solutions. SemEval tasks
encompass a broad range of semantic topics from the lexical level to the
discourse level, including word sense identification, semantic parsing,
coreference resolution, and sentiment analysis, among others.
For SemEval-2024, we welcome any task that can test an automatic system for
the semantic analysis of text, which could be an intrinsic semantic
evaluation or an application-oriented evaluation. We especially encourage
tasks for languages other than English, cross-lingual tasks, and tasks that
develop novel applications of computational semantics. See the websites of
previous editions of SemEval to get an idea about the range of tasks
explored, SemEval-2022 <https://semeval.github.io/SemEval2022/> and
SemEval-2023 <https://semeval.github.io/SemEval2023/>.
We strongly encourage proposals based on pilot studies that have already
generated initial data, as this can provide concrete examples and can help
to foresee the challenges of preparing the full task. In the event of
receiving many proposals, preference will be given to proposals that have
already run a pilot study.
In case you are not sure whether a task is suitable for SemEval, please
feel free to get in touch with the SemEval organizers at
semevalorganizers(a)gmail.com to discuss your idea.
=== Task Selection ===
Task proposals will be reviewed by experts, and reviews will serve as the
basis for acceptance decisions. Everything else being equal, more
innovative new tasks will be given preference over task reruns. Task
proposals will be evaluated on:
- Novelty: Is the task on a compelling new problem that has not been
explored much in the community? Is the task a rerun, but covering
substantially new ground (new subtasks, new types of data, new languages,
etc.)?
- Interest: Is the proposed task likely to attract a sufficient number
of participants?
- Data: Are the plans for collecting data convincing? Will the resulting
data be of high quality? Will annotations have meaningfully high
inter-annotator agreements? Have all appropriate licenses for use and
re-use of the data after the evaluation been secured? Have all
international privacy concerns been addressed? Will the data annotation be
ready on time?
- Evaluation: Is the methodology for evaluation sound? Is the necessary
infrastructure available or can it be built in time for the shared task?
Will research inspired by this task be able to evaluate in the same manner
and on the same data after the initial task?
- Impact: What is the expected impact of the data in this task on future
research beyond the SemEval Workshop?
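As an illustration of the inter-annotator agreement question raised under Data, here is a minimal sketch of Cohen's kappa for two annotators. The choice of Cohen's kappa, the function name, and the example labels are assumptions made for illustration; the call itself does not prescribe an agreement measure.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's own label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations, for illustration only.
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg", "pos", "pos"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

With more than two annotators or with missing annotations, a different measure (for example Fleiss' kappa or Krippendorff's alpha) would likely be more appropriate.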
=== New Tasks vs. Task Reruns ===
We welcome both new tasks and task reruns. For a new task, the proposal
should address whether the task would be able to attract participants.
Preference will be given to novel tasks that have not received much
attention yet.
For reruns of previous shared tasks (whether or not the previous task was
part of SemEval), the proposal should address the need for another
iteration of the task. Valid reasons include: a new form of evaluation
(e.g. a new evaluation metric, a new application-oriented scenario), new
genres or domains (e.g. social media, domain-specific corpora), or a
significant expansion in scale. We further discourage carrying over a
previous task and just adding new subtasks, as this can lead to the
accumulation of too many subtasks. Evaluating on a different dataset with
the same task formulation, or evaluating on the same dataset with a
different evaluation metric, typically should not be considered a separate
subtask.
=== Task Organization ===
We welcome people who have never organized a SemEval task before, as well
as those who have. Apart from providing a dataset, task organizers are
expected to:
- Verify the data annotations have sufficient inter-annotator agreement
- Verify licenses for the data to allow its use in the competition and
afterwards. In particular, text that is publicly available online is not
necessarily in the public domain; unless a license has been provided, the
author retains all rights associated with their work, including copying,
sharing and publishing. For more information, see:
https://creativecommons.org/faq/#what-is-copyright-and-why-does-it-matter
- Resolve any potential security, privacy, or ethical concerns about the
data
- Make the data available in a long-term repository under an appropriate
license, preferably using Zenodo: https://zenodo.org/communities/semeval/
- Provide task participants with format checkers and standard scorers.
- Provide task participants with baseline systems to use as a starting
point (in order to lower the obstacles to participation). A baseline system
typically contains code that reads the data, creates a baseline response
(e.g. random guessing, majority class prediction), and outputs the
evaluation results. Whenever possible, baseline systems should be written
in widely used programming languages and/or should be implemented as a
component for standard NLP pipelines.
- Create a mailing list and website for the task and post all relevant
information there.
- Create a CodaLab or other similar competition for the task and upload the
evaluation script.
- Manage submissions on CodaLab or a similar competition site.
- Write a task description paper to be included in SemEval proceedings, and
present it at the workshop.
- Manage participants’ submissions of system description papers, manage
participants’ peer review of each other’s papers, and possibly shepherd
papers that need additional help in improving the writing.
- Review other task description papers.
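The baseline system described in the list above (read the data, produce a baseline response, output the evaluation result) could look roughly like the following majority-class sketch. The data format (a list of text/label pairs), the function names, and the use of accuracy as the metric are all assumptions for illustration; a real task would substitute its own data reader and official scorer.

```python
from collections import Counter


def train_majority_baseline(train_examples):
    """Return the most frequent label in the training data."""
    label_counts = Counter(label for _text, label in train_examples)
    return label_counts.most_common(1)[0][0]


def predict(majority_label, texts):
    """Predict the majority label for every input text."""
    return [majority_label for _ in texts]


def accuracy(gold, predicted):
    """Fraction of predictions that match the gold labels."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two equal-length, non-empty label lists")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Hypothetical toy data standing in for a task's training and test files.
train = [
    ("great film", "positive"),
    ("dull plot", "negative"),
    ("loved it", "positive"),
]
test = [("fine acting", "positive"), ("too long", "negative")]

majority = train_majority_baseline(train)
predictions = predict(majority, [text for text, _ in test])
score = accuracy([label for _, label in test], predictions)
print(f"majority label: {majority}, accuracy: {score:.2f}")
```

Keeping the reader, the predictor, and the scorer as separate functions lets participants replace only the predictor while reusing the same scoring code.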
=== Important dates ===
- Task proposals due April 17, 2023 (Anywhere on Earth)
- Task selection notification May 22, 2023
=== Preliminary timetable ===
- Sample data ready July 15, 2023
- Training data ready September 1, 2023
- Evaluation data ready December 1, 2023 (internal deadline; not for public
release)
- Evaluation starts January 10, 2024
- Evaluation ends by January 31, 2024 (latest date; task organizers may
choose an earlier date)
- Paper submission due February 2024
- Notification to authors March 2024
- Camera-ready due April 2024
- SemEval workshop Summer 2024 (co-located with a major NLP conference)
Tasks that fail to keep up with crucial deadlines (such as the dates for
having the task and CodaLab website up and dates for uploading samples,
training, and evaluation data) may be cancelled at the discretion of
SemEval organizers. While consideration will be given to extenuating
circumstances, our goal is to provide sufficient time for the participants
to develop strong and well-thought-out systems. Cancelled tasks will be
encouraged to submit proposals for the subsequent year’s SemEval. To reduce
the risk of tasks failing to meet the deadlines, we are unlikely to accept
multiple tasks with overlap in the task organizers.
=== Submission Details ===
The task proposal should be a self-contained document of no longer than 3
pages (plus additional pages for references). All submissions must be in
PDF format, following the ACL template
<https://github.com/acl-org/acl-style-files>.
Each proposal should contain the following:
- Overview
- Summary of the task
- Why this task is needed and which communities would be interested in
participating
- Expected impact of the task
- Data & Resources
- How the training/testing data will be produced. Please discuss whether
existing corpora will be re-used.
- Details of copyright, so that the data can be used by the research
community both during the SemEval evaluation and afterwards
- How much data will be produced
- How data quality will be ensured and evaluated
- An example of what the data would look like
- Resources required to produce the data and prepare the task for
participants (annotation cost, annotation time, computation time, etc.)
- Assessment of any concerns with respect to ethics, privacy, or security
(e.g. personally identifiable information of private individuals; potential
for systems to cause harm)
- Pilot Task (strongly recommended)
- Details of the pilot task
- What lessons were learned and how these will impact the task design
- Evaluation
- The evaluation methodology to be used, including clear evaluation
criteria
- For Task Reruns
- Justification for why a new iteration of the task is needed (see
criteria above)
- What will differ from the previous iteration
- Expected impact of the rerun compared with the previous iteration
- Task organizers
- Names, affiliations, email addresses
- (optional) brief description of relevant experience or expertise
- (if applicable) years and task numbers of any SemEval tasks you have
run in the past
Proposals will be reviewed by an independent group of area experts who may
not have familiarity with recent SemEval tasks, and therefore all proposals
should be written in a self-explanatory manner and contain sufficient
examples.
The submission webpage is:
https://openreview.net/group?id=aclweb.org/ACL/2023/Workshop/SemEval
=== Chairs ===
Atul Kr. Ojha, SFI Insight Centre for Data Analytics, DSI, University of
Galway
A. Seza Doğruöz, Ghent University
Giovanni Da San Martino, University of Padua
Harish Tayyar Madabushi, The University of Bath
Ritesh Kumar, Dr. Bhimrao Ambedkar University
Contact: semevalorganizers(a)gmail.com
*LREC-COLING 2024 Announcement*
LREC-COLING 2024 - The 2024 Joint International Conference on
Computational Linguistics, Language Resources and Evaluation
Lingotto Conference Centre - Turin (Italy)
20-25 May, 2024
Conference website: https://lrec-coling-2024.lrec-conf.org/
Twitter: @LrecColing2024
Two major international key players in the area of computational
linguistics, the ELRA Language Resources Association (ELRA) and the
International Committee on Computational Linguistics (ICCL), are joining
forces to organize the 2024 Joint International Conference on
Computational Linguistics, Language Resources and Evaluation
(LREC-COLING 2024) to be held in Turin (Italy) on 20-25 May, 2024.
The hybrid conference will bring together researchers and practitioners
in computational linguistics, speech, multimodality, and natural
language processing, with special attention to evaluation and the
development of resources that support work in these areas. Following in
the tradition of the well-established parent conferences COLING and
LREC, the joint conference will feature grand challenges and provide
ample opportunity for attendees to exchange information and ideas
through both oral presentations and extensive poster sessions,
complemented by a friendly social program.
The three-day main conference will be accompanied by a total of three
days of workshops and tutorials held in the days immediately before and
after.
*General Chairs*
Nicoletta Calzolari, CNR-ILC, Pisa
Min-Yen Kan, National University of Singapore
*Advisors to General Chairs*
Chu-Ren Huang, The Hong Kong Polytechnic University
Joseph Mariani, LISN-CNRS, Paris-Saclay University
*Programme Chairs*
Veronique Hoste, Ghent University
Alessandro Lenci, University of Pisa
Sakriani Sakti, Japan Advanced Institute of Science and Technology
Nianwen Xue, Brandeis University
*Management Chair*
Khalid Choukri, ELDA/ELRA, Paris
*Local Chairs*
Valerio Basile, University of Turin
Cristina Bosco, University of Turin
Viviana Patti, University of Turin
Job advertisement!
TurkuNLP (Natural Language Processing) is a multidisciplinary research group combining NLP and digital linguistics. We develop machine learning methods and tools to automatically process and understand text data and apply these to explore human interaction, communication and language use in very large digital text datasets such as those automatically crawled from the internet and historical text collections.
We invite applications for postdoctoral researcher positions. The postdocs recruited will work within our research projects on web-as-corpus research, corpus linguistics and NLP on topics such as human diversity, multilingual modeling of web genres (registers), and semantic search.
For more details and to submit an application, please see job ID 14647 at https://www.utu.fi/en/university/come-work-with-us/open-vacancies and visit our websites at turkunlp.org and https://sites.utu.fi/humandiversity/. I am also happy to answer any questions you might have, so please don't hesitate to contact me!
The postdocs are expected to begin their employment on 1 May 2023 or as soon as possible, by agreement.
Best regards,
Veronika Laippala
Dear list members,
I am delighted to announce the latest publication in the Elements in Corpus Linguistics series, published by Cambridge University Press. The title is "Corpus-Assisted Discourse Studies", and the authors are Mathew Gillings, Gerlinde Mautner and Paul Baker. This Element is now available FREE until 4 April 2023 at the following URL:
https://www.cambridge.org/core/search?q=9781009168151
Here is a summary of the Element:
"The breadth and spread of corpus-assisted discourse studies (CADS) indicate its usefulness for exploring language use within a social context. However, its theoretical foundations, limitations, and epistemological implications must be considered so that we can adjust our research designs accordingly. This Element offers a compact guide to which corpus linguistic tools are available and how they can contribute to finding out more about discourse. It will appeal to researchers both new and experienced, within the CADS community and beyond."
Best wishes
Susan Hunston (Series Editor)
Professor Susan Hunston (she/her)
Department of English Language and Linguistics
University of Birmingham
Birmingham B15 2TT
UK
(+44) 0121 414 5675
s.e.hunston(a)bham.ac.uk
Hi Luis,
How are you?
I just saw on Corpora-list that you are very busy with chatbot topics.
Perhaps the information has already reached you: we are organizing a task
that may be of interest to you.
I hope you will participate ;-)
Regards,
Paolo
-----
*Apologies for cross-posting*
Do you believe machine-generated text is becoming an issue? Are you
interested in boosting research to automatically detect
machine-generated text? 🤖👩🏻
We cordially invite all researchers and practitioners from all fields
to participate in the AuTexTification task. If interested, register
yourself in the shared task through this link: https://lnkd.in/dzBZsYiD
Once you have registered and the training phase has started, the datasets
will be sent to your email along with a password. For more information
on the task description, schedule, or submissions, see the
AuTexTification web page: https://sites.google.com/view/autextification
More information on the shared task
The new era of automatic content generation has surged through
powerful causal language models like GPT, PaLM, or BLOOM that can be
used to spread untruthful news, human-looking reviews, or opinions.
Thus, it is imperative to develop technology to automatically detect
generated text for content moderation and to attribute generated text
to specific models to protect intellectual property or to establish
accountability. In this context, we propose the “Automatic Text
Identification” (AuTexTification) shared task, to boost research and
development of automatic systems to detect automatically generated
text, obtained by state-of-the-art language models, in English and
Spanish.
We propose two subtasks: (i) Human or Generated, where, given a
text, participants will have to determine whether it has been
automatically generated or not; and (ii) Model Attribution, where
participants will have to determine which model generated a text. The
models used to generate the text increase in size, ranging from 2 to
175 billion parameters, meaning that
participants' systems should be versatile enough to detect a diverse
set of text generation models and writing styles.
In the training phase, participants will be provided with two
partitions for subtask 1, i.e., English and Spanish partitions, with
binary labels 👩🏻 and 🤖. Similarly, a partition per language will be
released for subtask 2. It will include six labels (A, B, C, D, E, and
F), each label representing a text generation model. Later, the
unlabeled test data will be released.
Important Dates
March 22, 2023: Release of training data
April 21, 2023: Release of test data
May 10, 2023: Participant system results submission
May 17, 2023: Results notification
June 3, 2023: Paper submission
June 16, 2023: Paper peer-reviewed
July 4, 2023: Camera-ready paper version
September 26, 2023: Conference
Task organizers
José Ángel González (Symanto) Contact Email: jose.gonzalez(a)symanto.com
Areg Sarvazyan (Symanto) Contact Email: areg.sarvazyan(a)symanto.com
Marc Franco-Salvador (Symanto)
Francisco Rangel (Symanto)
Berta Chulvi (Universitat Politècnica de València)
Paolo Rosso (Universitat Politècnica de València)
Please reach out to the organizers or join the Slack workspace to
connect with the other participants and organizers:
https://lnkd.in/di_zaMHf
The Digital Linguistics Lab at Bielefeld University (head: JProf. Dr.-Ing. Hendrik Buschmeier) is seeking to fill a research position (PhD-student, E13 TV-L, 100%, fixed-term) in the area of multimodal human-robot interaction in the research project “Hybrid Living”.
Join us to work in an interdisciplinary team on research questions in the intersection of human-robot interaction and computational linguistics. Specifically, you will work (1) on the use of multimodal communication (verbal and nonverbal) to situatively instruct a service robot, (2) on making the robot's behaviour transparent to its users, and (3) on models for solving human-robot interaction problems through communication.
The formal job advertisement, with information on how to apply, can be found here: https://uni-bielefeld.hr4you.org/job/view/2265/research-position-in-multimo…
Questions? Don't hesitate to get in touch: hbuschme(a)uni-bielefeld.de
Hendrik Buschmeier
--
JProf. Dr.-Ing. Hendrik Buschmeier
Digital Linguistics Lab
Faculty of Linguistics and Literary Studies, Bielefeld University
https://purl.org/net/hbuschme
The Universidad Politécnica de Madrid (UPM, Spain) is pleased to announce
three full Ph.D. scholarships to work on the following
topics:
- Personalization of DNN-based generative conversational systems
(chatbots) <https://euraxess.ec.europa.eu/jobs/82169>
- Acoustic environment awareness and automatic dialogue evaluation for
chatbots <https://euraxess.ec.europa.eu/jobs/82175>
- Multimodal task-oriented conversational systems and adaptation with
human feedback <https://euraxess.ec.europa.eu/jobs/82183>
The UPM is the largest Spanish technological university as well as a
renowned European institution. With two recognitions as Campus of
International Excellence, it is outstanding in its research activity
together with its training of highly-qualified professionals, competitive
at an international level. These three Ph.D. scholarships are supported by
the European Commission through Project ASTOUND
<https://astound-project.eu/> (101071191 -
HORIZON-EIC-2021-PATHFINDERCHALLENGES-01). For information about our school
<http://etsit.upm.es/> and research group <https://blogs.upm.es/gthau>, follow
the corresponding links.
*Prerequisites:*
- A successfully completed Master's degree in computational linguistics,
computer science, telecommunications or a similar field, with corresponding
knowledge of the field.
- Fluency in written and spoken English
(Spanish is a plus)
- Good communication and team work skills
- Good programming skills (preferably Python)
*Additional desired qualifications:*
- Knowledge of and/or experience in training deep neural networks using
different frameworks (PyTorch or TensorFlow).
- Knowledge of and/or experience in natural language processing, machine
learning or speech technologies.
- Experience in writing scientific papers and/or participating in
international challenges.
*What we offer:*
Three full scholarships for 3.5 years, including a €21k gross salary per annum
(2023 rate) and health coverage, the possibility of attending national and
international conferences, personal and professional advanced training
courses, management and career coaching, the possibility of working for a
European research project, and the opportunity to live and work in one of
the most attractive cities in the world (Madrid). The start date is
negotiable.
*Application:*
If you are interested and have a related background, please send the following
documents to luisfernando.dharo(a)upm.es:
- Your CV
- Academic transcripts (both undergraduate and master's)
- 2 paragraphs describing your research interests and background
- Most relevant publication, if any, or master's thesis (or equivalent)
Should you have questions or would like to discuss further details, please
get in touch.
The School of Informatics (https://www.ed.ac.uk/informatics) at the
University of Edinburgh is hiring a lecturer/reader in Computational
Social Science. The position is permanent. The successful candidate is
expected to take part in the social media analysis group
(SMASH, https://smash.inf.ed.ac.uk/) at Edinburgh, and to work closely
with the social and political science school. In addition, they are
expected to develop and teach a new undergraduate course on computational
social science.
The application deadline is 25th of April 2023.
More details and the application are available at the following link:
https://elxw.fa.em3.oraclecloud.com/hcmUI/CandidateExperience/en/sites/CX_1…
If you have any enquiries, feel free to get in touch with Walid Magdy
(wmagdy(a)inf.ed.ac.uk, https://homepages.inf.ed.ac.uk/wmagdy).
Dear all,
On behalf of the voting members of the European Summer University in
Digital Humanities 2023 Evaluative Committee, I am pleased to announce
that the Babes-Bolyai University in Cluj-Napoca, Romania has been
selected as the future host for ESU 2023-2025. We are thrilled that
Christian Schuster, the director of the Transylvania Digital
Humanities Center there, and Alexandra Cotoc, a former ESU community
member, will join an impressive team from across the university to
provide the ESU with a new intellectual home.
The Evaluative Committee had the difficult task of selecting from an
extremely competitive pool of candidates, a testament to the strength
of the ESU over the years and to Elisabeth Burr’s extraordinary
leadership. I thank each one of the committee members for their time,
their thoughtful insights, and the collegiality with which they
approached this process.
We hope that you will encourage your colleagues and your students to
join us in Cluj-Napoca this summer for the next chapter in the
European Summer University in Digital Humanities.
Carol
CAROL CHIODO, Ph.D.
Librarian for Collections and Digital Scholarship | Americas, Europe,
and Oceania Division (AEOD)
carol_chiodo(a)harvard.edu
ORCID: http://orcid.org/0000-0002-6424-3445
HARVARD LIBRARY
Champions of curiosity for the betterment of the world
library.harvard.edu
HARVARD UNIVERSITY
Acknowledgment of land and people of the Massachusett Tribe
[APOLOGIES FOR MULTIPLE POSTINGS]
SEBD 2023 Doctoral Consortium - Last Call for Papers
====================================================================
Important dates
Doctoral Consortium Submission Deadline: Friday, March 31, 2023 (AoE)
Papers Notification: Wednesday, April 26, 2023 (AoE)
Camera-Ready Submission Deadline: Thursday, June 01, 2023 (AoE)
Doctoral Consortium Day: Sunday, July 02, 2023
Submission Link: https://cmt3.research.microsoft.com/SEBD2023/
=================================================
The SEBD 2023 Doctoral Consortium will take place in a dedicated session
during the 31st Italian Symposium on Advanced Database Systems (SEBD
2023), Galzignano Terme, Padova (Italy), July 02-05, 2023,
http://sebd2023.dei.unipd.it/.
The goal is to provide a forum for PhD candidates to present their ongoing
research and receive feedback from renowned and experienced members of the
research community. The Consortium fosters a collaborative environment,
encouraging constructive discussions and sharing of ideas. It will be an
excellent opportunity for developing person-to-person networks to the
benefit of the PhD students in their future careers – as well as of the
community.
Submissions from students who are in the early stages of their research
should provide a clear description of the problem to be addressed and the
planned methodology. Submissions from students who are in the middle or
final stages of their PhD research should clearly indicate the
contributions made to date and future work directions.
Each doctoral symposium paper must be single-authored by a current PhD
student or a PhD student who submitted the thesis between September and
December 2022. The paper should be written in English and must be 6-7 pages
long, including selected references. Submissions must be formatted in PDF,
prepared in CEUR-ART Column 1 Style (http://ceur-ws.org/Vol-XXX/CEURART.zip),
and submitted electronically via the submission system:
https://cmt3.research.microsoft.com/SEBD2023/
Submissions will be reviewed by the Doctoral Consortium Program Committee
(appointed by the Doctoral Consortium Chairs). All papers will be reviewed
with respect to the overall presentation quality, the potential for the
future impact of the research on the field, and the expected benefit to the
other doctoral students attending the conference. The accepted papers will
be published as part of the SEBD 2023 proceedings on CEUR-WS.org and
indexed in Scopus, DBLP and Google Scholar.
=================================================
Topics
The SEBD Symposium and its Doctoral Consortium cover a broad range of
topics, including traditional database management, as well as new
challenges for data management in any possible domain. Suggested topics
include (but are not limited to) the following ones:
- Big Data and Smart Computing;
- Data integration, Heterogeneous and Federated DBMS;
- Data mining, knowledge discovery, information extraction, and machine
learning;
- Data visualization;
- Data warehousing;
- Distributed and parallel databases;
- Grid, peer-to-peer databases, and Cloud Computing;
- Incompleteness, inconsistency, and other aspects of data quality;
- Uncertainty in databases;
- Ethical problems posed by Big Data Analysis;
- Keyword-based and natural language access to structured, semistructured,
and unstructured data;
- Knowledge representation and reasoning;
- Ontology-based data management;
- Privacy, security and trust management;
- Query processing and optimization, approximate query answering;
- Real-time, embedded, sensor, and mobile databases;
- Scientific and Statistical Databases;
- Semantic Web and Open Linked data;
- Social networks and Graph databases;
- Transaction and workflow management, interoperability and Web services.
=================================================
Contact
For any questions regarding Doctoral Consortium submissions, please email
the Doctoral Consortium Chairs:
- Letizia Tanca (letizia.tanca(a)polimi.it)
- Stefano Marchesin (stefano.marchesin(a)unipd.it)
--
Stefano Marchesin, PhD
Postdoctoral Researcher
Information Management Systems (IMS) Group
Department of Information Engineering
University of Padua
Via Gradenigo 6/a, 35131 Padua, Italy
Home page: http://www.dei.unipd.it/~marches1/