> [Apologies for cross-posting]
>
> ======================================================================
> CALL FOR PAPERS - SIMBig 2023
> ======================================================================
>
> SIMBig 2023 - 10th International Conference on Information Management and Big Data
> Where: Tecnológico de Monterrey, Mexico City, Mexico
> When: August 30 - September 01, 2023
> Website: https://simbig.org/SIMBig2023/
>
> ======================================================================
>
> OVERVIEW
> ----------------------------------
>
> SIMBig 2023 seeks to present new methods of Artificial Intelligence (AI), Data Science, and related fields, for analyzing, managing, and extracting insights and patterns from large volumes of data.
>
>
> KEYNOTE SPEAKERS
> ----------------------------------
>
> Mona Diab, Meta AI, USA
> Carlos Coello, TEC Monterrey, Mexico
> Finale Doshi-Velez, Harvard University, USA
> Huan Liu, Arizona State University, USA
>
> IMPORTANT DATES
> ----------------------------------
>
> June 24, 2023 --> Full papers and short papers due
> July 28, 2023 --> Notification of acceptance
> August 11, 2023 --> Camera-ready versions
> August 30 - September 01, 2023 --> Conference held in Mexico DF, Mexico
>
> PUBLICATION
> ----------------------------------
>
> All accepted papers of SIMBig 2023 (including the special tracks) will be published in the Springer CCIS Series <https://www.springer.com/series/7899>.
> The best papers of SIMBig 2023 (including the special tracks) will be invited to submit an extended version for publication in the Springer SN Computer Science journal <https://www.springer.com/journal/42979>.
>
> TOPICS OF INTEREST
> ----------------------------------
>
> SIMBig 2023 has a broad scope. We invite contributions on theory and practice, including but not limited to the following technical areas:
>
> Artificial Intelligence
> Data Science
> Machine Learning
> Natural Language Processing
> Semantic Web
> Healthcare Informatics
> Biomedical Informatics
> Data Privacy and Security
> Information Retrieval
> Ontologies and Knowledge Representation
> Social Networks and Social Web
> Information Visualization
> OLAP and Business intelligence
> Data-driven Software Engineering
>
> SPECIAL TRACKS
> ----------------------------------
>
> SIMBig 2023 proposes 5 special tracks in addition to the main conference:
>
> SNMAM <https://simbig.org/SIMBig2023/en/snmam.html> - Social Network and Media Analysis and Mining
> ANLP - Applied Natural Language Processing
> CIIN - Cybersecurity and IoT for Intelligent Networks
> DISE - Data-drIven Software Engineering
> EE-AI-HPC - Efficiency Enhancement for AI and High-Performance Computing
>
> CONTACT
> ----------------------------------
>
> SIMBig 2023 General Chairs
>
> Juan Antonio Lossio-Ventura, National Institutes of Health, USA (juan.lossio(a)nih.gov <mailto:juan.lossio@nih.gov>)
> Hugo Alatrista-Salas, Pontificia Universidad Católica del Perú, Peru (halatrista(a)pucp.pe <mailto:halatrista@pucp.pe>)
>
International Conference on Human-Informed Translation and Interpreting Technology (HiT-IT 2023)
Naples, Italy, 7, 8 and 9 July 2023
*** SUBMISSION DEADLINE EXTENDED TO 30 APRIL 2023***
The International Conference on Human-Informed Translation and Interpreting Technology (HiT-IT 2023) will take place in Naples, Italy between 7 and 9 July 2023. The conference will be preceded by tutorials on 6 July 2023.
HiT-IT seeks to act as a meeting point for (and invites) researchers working in translation and interpreting technologies, practicing technology-minded translators and interpreters, companies and freelancers providing services in translation and interpreting as well as companies developing tools for translators and interpreters. In addition to the accepted papers for presentation, HiT-IT will feature invited talks by prominent experts as well as presentations and panels hosted by practitioners.
For more details and for the main conference topics please visit the conference website
http://hit-it-conference.org/
Submissions and publication
The conference invites the following types of submissions reporting original unpublished work.
User papers for industry and practitioners ranging between 2 and 4 pages (without references). References to related work are optional.
Academic submissions, in three different categories (have to follow formatting requirements, references to related work are required):
• (academic) full papers: describing original completed research. Allowed paper length: maximum 12 pages (without references).
• (academic) work-in-progress papers – describing work in progress, late breaking research, papers at a more conceptual stage, and other types of papers that do not fit in the ‘full’ papers category. Allowed paper length: maximum 7 pages (without references).
• (academic) demo papers – describing working systems. Allowed paper length: maximum 5 pages (without references). In addition to the papers, the authors will be expected to demonstrate the systems at the conference.
The conference will not consider the submission and evaluation of abstracts only.
The accepted papers will be published in the conference proceedings and made available online on the conference website. We plan to invite the authors of the best papers to submit extended versions to a special issue of a prestigious journal.
Important dates
Submission deadline: 30 April 2023
Notification of acceptance: 31 May 2023
Final version due: 10 June 2023
Early fee deadline: 20 June 2023
Conference dates: 7, 8 and 9 July 2023
Tutorials: 6 July 2023
Keynote speakers
Jochen Hummel (Coreon)
Tharindu Ranasinghe (Aston University)
Invited tutorials
Felix do Carmo (University of Surrey): Neural Machine Translation
Alina Karakanta (Leiden University Centre for Linguistics): Automatic subtitling
Conference Chairs
Gloria Corpas Pastor (University of Malaga)
Ruslan Mitkov (University of Wolverhampton)
Johanna Monti (University of Naples L’Orientale)
Constantin Orasan (University of Surrey)
Organising Committee
Dayana Abuin Rios (University of Malaga)
Khadija Ait Elqih (University of Naples L’Orientale)
Anastasia Bezobrazova (University of Malaga)
Meriem Boulekhoukh (University of Oran)
Rocío Caro Quintana (University of Wolverhampton)
Amal El Farhmat (University of Malaga)
Lilit Kharatian (University of Malaga)
Alfiya Khabibullina (University of Malaga)
Nikolai Nikolov (INCOMA Ltd.)
Daria Sokova (New Bulgarian University)
Giulia Speranza (University of Naples L’Orientale)
Sponsors
Pangeanic, El-Translations and Juremy are the official sponsors of the conference.
Venue
The conference will take place at the Palazzo del Mediterraneo, University of Naples
Further information and contact details
Registration for HiT-IT 2023 is now open. To register, please complete the registration form.
See the conference website (http://hit-it-conference.org/home ) for more details; you can also email 2023(a)hit-it-conference.org<mailto:2023@hit-it-conference.org>.
The deadline is approaching, April 25, 2023
*IACT’23: Human or AI? Calling for research papers on implicit authorship
disambiguation in IR *
*Call for Papers: The 1st International Workshop on Implicit Author
Characterization from Texts for Search and Retrieval (IACT’23) *
The workshop will be held in conjunction with the 46th International ACM
SIGIR Conference on Research and Development in Information Retrieval
Workshop website: https://en.sce.ac.il/news/iact23
July 27, 2023. Taipei, Taiwan.
*Paper submission deadline: April 25, 2023, AoE*
Submission link: https://easychair.org/conferences/?conf=iact23
To bring the research community's attention to the limitations of current
models in recognizing and characterizing AI vs. human authors, we organize
the first edition of IACT workshops under the umbrella of the SIGIR
conference. Research works submitted to the workshop should foster
scientific advances in all aspects of author characterization.
All papers must be original and not simultaneously submitted to another
journal or conference. The following paper categories are welcome:
- *Full research papers*: up to 8 pages. Original and high-quality
unpublished contributions to the theory and practical aspects of the
workshop topics.
- *Short research papers*: up to 5 pages. These can describe ongoing
research, resources, and demos.
- *Negative results papers*: up to 5 pages. Papers highlighting tested
hypotheses that did not yield the expected outcome are also welcome.
- *Position papers*: up to 5 pages. Discussing current and future
research directions.
The length constraints do not include references.
The submissions must be anonymous and will be peer-reviewed by at least two
program committee members.
The authors of accepted papers will be given 15 minutes for a short oral
presentation. The workshop will run as a hybrid event to allow virtual
attendance and meet the SIGIR format.
Research works submitted to the workshop should foster the scientific
advance on all aspects of implicit author information extraction from text,
including but not limited to the following:
- Differentiation between AI-generated content and human-generated
content and bot profiling
- Characterization of conversational agents
- Feature detection of authors for human vs. AI determination
- Prompt understanding and recognition in language models
- Personalized question answering and conversation generation
- Troll identification on social media
- Review authenticity estimation
- Multi-modal, multi-genre, and multilingual author analysis
- Character analysis, description, and representation in narrative texts
- Detecting implicit expressions of sentiment, emotion, opinion, and bias
- Transfer learning for implicit author characterization
- Implicit author characterization annotation schema
- Evaluation of implicit author characterization
- Author characterization in low-resource languages and under-studied
domains
- Accountability and regulation of AI-based information extraction,
retrieval, and content generation
- Copyright issues of AI-generated content
- Ethical and privacy implications of author characterization and
implicit information extraction
- Fairness and bias of AI-generated content
Organizing Committee:
- Marina Litvak - marinal(a)ac.sce.ac.il; Shamoon College of Engineering
Beer Sheva; Israel
- Irina Rabaev - irinar(a)ac.sce.ac.il; Shamoon College of Engineering
Beer Sheva; Israel
- Alípio Mário Jorge - amjorge(a)fc.up.pt; University of Porto; Porto,
Portugal
- Ricardo Campos - ricardo.campos(a)ipt.pt; Polytechnic Institute of Tomar
INESC TEC, Portugal; Porto, Portugal
- Adam Jatowt - adam.jatowt(a)uibk.ac.at; University of Innsbruck;
Innsbruck, Austria
Invited Speakers:
- Prof. Mark Last - Ben-Gurion University of the Negev, Israel
- Prof. Dr. Valia Kordoni - Humboldt-Universität Berlin, Germany
Contact:
- Dr. Marina Litvak: litvak.marina(a)gmail.com
- Dr. Irina Rabaev: irinar(a)ac.sce.ac.il
--
Best regards,
Marina Litvak
When it comes to corpus research, the response time to queries such as:
* what is the character at the nth offset of a file
* which other characters precede and follow that
one by m offsets, or up to a certain char or pattern ...
* what is the intra- and inter-textuality of a given segment of characters
. . .
and many other related ones, should be "zero comma nada" (they should
run instantly), but I think this is virtually impossible because texts
these days (say, PDF files) are, basically, visually appealing
containers of streams of data displayed by rendering engines; HTML
files contain all that javascript cr@p, google goo, ads, insufferably
idiotic "we care about your privacy" road blocks, ...
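For what it's worth, once the plain text has been pulled out of its container, the first two queries are cheap. A minimal sketch in Python, assuming the document is already available as a Unicode string (the extraction from PDF or HTML is the hard part and is not shown here); the function names char_at, context and context_until are made up for illustration:

```python
from typing import Tuple


def char_at(text: str, n: int) -> str:
    """Return the character at zero-based offset n."""
    return text[n]


def context(text: str, n: int, m: int) -> Tuple[str, str]:
    """Return up to m characters before and up to m characters after offset n."""
    start = max(0, n - m)  # clamp so we never wrap around to the end of the string
    return text[start:n], text[n + 1:n + 1 + m]


def context_until(text: str, n: int, stop: str) -> str:
    """Return the characters after offset n, up to (not including) the first `stop`."""
    end = text.find(stop, n + 1)
    return text[n + 1:] if end == -1 else text[n + 1:end]


sample = "the rose and the Nile"
print(char_at(sample, 4))
print(context(sample, 4, 3))
print(context_until(sample, 4, " "))
```

Offsets here count Unicode code points, not bytes, so the answers only match byte offsets in a file when the text is pure ASCII.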
I haven't found a convincing explanation as to why that is the case,
and I can't quite understand why the MVC pattern is well
understood when it comes to software design, yet people apparently
can't fully separate the text from its presentation when it comes
to documents.
"Web as corpus" folks:
https://www.researchgate.net/publication/276511711_Maristella_Gatto_Web_as_…
don't even attempt to address those issues. At the end of the day as
Borges said:
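" ... el nombre es arquetipo de la cosa en las letras de 'rosa' está
la rosa y todo el Nilo en la palabra 'Nilo'"
(" ... the name is the archetype of the thing; in the letters of 'rose'
is the rose, and the whole Nile in the word 'Nile'")
so, let's first manage to get one character in a text
after the other ...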
lbrtchx
International workshop NLP for translation and interpreting applications (NLP4TIA)
Varna, Bulgaria 7/8 September 2023
https://nlp4tia.web.uah.es/
First Call for Papers
In the last two decades we have witnessed a technological turn in translation and interpreting studies, with Natural Language Processing (NLP) and deep learning playing an increasingly prominent part. There is already a growing number of NLP applications which are used to support the work of translators and interpreters. In addition, the recent advances in (and latest models of) deep learning have powered the further development and success of high-performing Neural Machine Translation (NMT) systems.
Translation technology has revolutionised the translation profession and nowadays most professional translators employ tools such as translation memory (TM) systems in their daily work. The latest advances in NMT have resulted in NMT not only becoming an integral part of most state-of-the-art TM tools but also typical for the translation workflow of many companies, organisations and freelance translators.
Although translation has benefited more from technological advances, interpreting has also experienced a technological turn. However, it has not been until some years ago that soft technology has permeated interpreting practice and research. Computer assisted translation, MT and NLP tools have been adapted to be used by interpreters. In addition, corpus-based studies have also underpinned dialogue interpreting.
The increasing interest in NLP, MT and the automation of processes has led to multidisciplinary projects that deal with the development of models for automated oral communication. Machine interpreting has already been developed and is being improved, with a focus on speed and accuracy. Whether domain-specific (commercial, military, humanitarian) or general (Skype Translator), machine interpreting still has a long way to go to become more human-like.
Many of the above recent developments have to do with the employment of Natural Language Processing tools and resources to support the work of translators and interpreters. This workshop is expected to discuss the growing importance of NLP in different translation and interpreting scenarios.
Workshop topics
The workshop invites submissions reporting original unpublished work on topics including but not limited to:
- NLP and MT for under-resourced languages;
- Translation Memory systems;
- NLP and MT for translation memory systems;
- NLP for CAT and CAI tools;
- Integration of NLP tools in remote interpreting platforms;
- NLP for dialogue interpreting;
- Development of NLP based applications for communication in public service settings (healthcare, education, law, emergency services);
- Corpus-based studies applied to translation and interpreting;
- Machine translation and machine interpreting;
- Resources for translation and machine translation;
- Resources for interpreting and interpreting technology application;
- Quality estimation of human and machine translation;
- Post-editing strategies and tools;
- Automatic post-editing of MT;
- NLP and MT for subtitling;
- Technology acceptance by interpreters and translators;
- Machine Translation and translation tools for literary texts;
- Evaluation of machine translation and translation and interpreting tools in general;
- The impact of the technological turn in translation and interpreting;
- Cognitive effort and eye-tracking experiments in translation and interpreting;
- Development of models for research and practice of translation and interpreting;
- Multidisciplinary cooperation in NLP applied to translation and interpreting.
Submissions and publication
Submissions must consist of full-text papers of at least 5 and at most 7 pages, excluding references. The accepted papers will be published as NLP4TIA workshop e-proceedings with an ISBN, will be assigned a DOI and will be available at the time of the conference. The papers should be in English and should be submitted via the conference management system START using this link.
Authors of accepted papers will receive guidelines regarding how to produce camera-ready versions of their papers for inclusion in the proceedings.
Each submission will be reviewed by at least two programme committee members. Accepted papers will be presented orally as part of the programme of the workshop.
Submissions should comply with the templates below and should be uploaded as PDF files in START (START is configured to accept PDF files only). The following templates should be used: LaTeX at Overleaf, LaTeX, MS Office
Important dates
Deadline for paper submission: 10 July 2023
Acceptance notification: 5 August 2023
Final camera-ready version: 25 August 2023
Workshop camera-ready proceedings ready: 31 August 2023
NLP4TIA workshop: 7/8 September 2023
Workshop Chairs
Raquel Lázaro Gutiérrez (University of Alcala)
Antonio Pareja Lora (University of Alcala)
Ruslan Mitkov (University of Wolverhampton)
Programme Committee
Cristina Aranda (Big Onion)
Juanjo Arevalillo (Hermes Traducciones)
Silvia Bernardini (University of Bologna)
Gabriel Cabrera Méndez (Dualia Teletraducciones)
Matt Coler (University of Groningen)
Elena Davitti (University of Surrey)
Joanna Drugan (Heriot-Watt University)
Marie Escribe (LanguageWire)
Claudio Fantinuoli (Mainz University/KUDO Inc)
Antonio García Cabot (University of Alcala)
Adriana Jaime Pérez (Migralingua Voze)
Miguel Ángel Jiménez Crespo (Rutgers University)
Óscar Luis Jiménez Serrano (University of Granada)
Koen Kerremans (Free University Brussel)
Maria Kunilovskaya (Saarland University)
Els Lefever (Ghent University)
Pilar León Arauz (University of Granada)
Johanna Monti (University of Naples L'Orientale)
Elena Montiel Ponsoda (Polytechnic University of Madrid)
Helena Moriz (University of Lisbon)
Elena Murgolo (Orbital 14)
Dora Murgu (Interprefy)
Constantin Orasan (University of Surrey)
María Teresa Ortego Antón (University of Valladolid)
Tharindu Ranasinghe (Aston University)
Celia Rico (Universidad Complutense de Madrid)
Caroline Rossi (University Grenoble les Alpes)
María del Mar Sánchez Ramos (University of Alcala)
Miriam Seghiri (University of Malaga)
Vilelmini Sosoni (Ionian University)
Rui Manuel Sousa Silva (University of Porto)
Nicoletta Spinolo (University of Bologna)
Venue
The workshop will take place at hotel Cherno More in Varna.
Further information and contact details
Registration for NLP4TIA is now open and is done via the RANLP main conference page. To register, please complete the registration form.
The conference website (https://nlp4tia.web.uah.es/ ) will be updated on a regular basis. For further information, please email nlp4tia(a)uah.es<mailto:nlp4tia@uah.es>.
###################
########## Call for Papers
######## Special Session
###### EnGeoData'2023: Geospatial data analysis under the umbrella of One Health
#### https://simbig.org/engeodata/2023
###
## IEEE DSAA 2023
# The 10th IEEE International Conference on Data Science and Advanced Analytics
# October 9-13, 2023, Thessaloniki, Greece
### AIMS AND TOPICS
1. Abstract
The current context of urbanization, globalization, high mobility/trade, and climate change favors the (re-)emergence of known and unknown diseases. Thus, geospatial and environmental data analysis for One Health is crucial to provide insights into the connections between humans, animals, and the environment. This type of analysis allows us to identify and monitor health issues that arise from the interactions between these three areas. However, it is challenging due to: (1) the multi-modality of the data (e.g., unstructured, imaging, semantic, spatial, temporal, among others); and (2) the difficulty in choosing the "most appropriate" knowledge discovery process according to specific field needs (e.g., animal, plant or human health; crisis and disaster surveillance).
EnGeoData 2023 aims to provide high quality research facing the challenges mentioned above with theoretical and/or experimental approaches.
2. Topics
Topics of interest include (but are not limited to):
- Pre and post processing of environmental data
- Geographical information retrieval
- Spatial data mining, spatial data warehousing, and spatial data lake
- Knowledge discovery use-cases applied to environmental data
- Spatial text mining
- Spatial ontology
- Spatial recommendation and personalization
- Visual analytics for geo-spatial data
- Dedicated applications:
* Spatio-temporal analytics platform
* Agricultural decision support systems
* Urban traffic systems
* Trajectory analysis
* Land-use and urban policies
* Land-use and urban planning analysis
* Spatio-temporal analysis in ecology and agriculture
* Disease surveillance systems (One Health)
### SUBMISSION
All papers should be submitted electronically via EasyChair Submissions: https://easychair.org/my/conference?conf=dsaa2023 under the “Special Session” Track.
- Paper Submission Deadline: May 22, 2023
- Paper Notification: July 17, 2023
- Paper Camera Ready Due: August 7, 2023
The length of each paper submitted to the special session should be no more than ten (10) pages and should be formatted following the standard 2-column U.S. letter style of IEEE Conference template. For further information and instructions, see the IEEE Proceedings Author Guidelines.
All submissions will be blind reviewed by the Program Committee on the basis of technical quality, relevance to the session’s topics of interest, originality, significance, and clarity. Author names and affiliations must not appear in the submissions, and bibliographic references must be adjusted to preserve author anonymity. Submissions failing to comply with the paper formatting and author anonymity requirements will be rejected without review.
### CHAIRS
- Mathieu Roche, CIRAD, TETIS, France
- Antonio Lossio-Ventura, National Institutes of Health, USA
- Hamid Laga, Murdoch University, Australia
- Maguelonne Teisseire, INRAE, TETIS, France
For questions, please contact us at engeodata(a)teledetection.fr <mailto:engeodata@teledetection.fr>
AmericasNLP 2023 has extended its paper submission deadline. The new
deadline is: *April 22, 2023*. (The original deadline was April 15.) More
information below!
The Third Workshop on NLP for Indigenous Languages of the Americas (
AmericasNLP 2023)
The Third Workshop on NLP for Indigenous Languages of the Americas (
AmericasNLP) will be co-located with the 61st Annual Meeting of the
Association for Computational Linguistics (ACL 2023
<https://2023.aclweb.org/>), which is scheduled to be held in Toronto,
Canada, between July 9-14, 2023.
The goal of the workshop is to encourage and increase the visibility of
work on the Indigenous languages of the Americas. It aims to encourage
research on NLP, computational linguistics, corpus linguistics and speech
for Indigenous languages, to connect researchers and professionals from
underrepresented communities and native speakers of endangered languages
with the ACL community, and, more generally, to promote machine learning
approaches suitable for low-resource languages.
We invite the submission of
- Long papers (8 pages) and short papers (4 pages) on substantial,
original, and unpublished research
- Non-archival extended abstracts (2 pages), technical reports (8 pages),
and work which has been presented at other venues (in the format of the
original publication)
Submissions do not need to describe work on native languages directly, as
long as it is clear why those can benefit from the described approaches.
Areas of interest include but are not limited to:
- Creation of datasets for NLP applications
- Incorporation of external knowledge into neural systems
- Linguistic typology and the use of typological features for NLP
- Transfer learning, meta-learning, and active learning
- Weakly supervised, semi-supervised, and unsupervised learning
- Machine translation of low-resource languages
- Morphology and phonology of low-resource languages
- NLP applications for Indigenous languages of the Americas
Important dates:
- Start of the anonymity period: March 15, 2023
- Submission deadline: *April 22, 2023* (extended from April 15, 2023)
- Notification of acceptance: May 15, 2023
- Camera ready papers due: May 26, 2023
- Workshop: July 14, 2023
All deadlines are 11.59 pm UTC -12h (anywhere on earth).
Link to submission portal:
https://softconf.com/acl2023/AmericasNLP2023/
The workshop also includes:
- A machine translation shared task on truly low-resource languages
- A mentoring program to support students and newcomers from
underrepresented communities (application form:
https://forms.gle/afBWauDfDQijXHTy9)
We also have a diverse set of invited speakers, focused on bridging the gap
between linguists, NLP, and machine learning research!
- Steven Bird (linguistics; ethics)
- Angela Fan (NLP; machine translation)
- Kristine Stenzel (field linguistics; American Indigenous languages)
Organizing Committee
- Manuel Mager, AWS AI Labs
- Arturo Oncevay, University of Edinburgh
- Enora Rice, University of Colorado Boulder
- Abteen Ebrahimi, University of Colorado Boulder
- Shruti Rijhwani, Google Research
- Alexis Palmer, University of Colorado Boulder
- Katharina Kann, University of Colorado Boulder
More information and contact information can be found at
http://turing.iimas.unam.mx/americasnlp/.
--
Dr. Katharina Kann
Assistant Professor of Computer Science
University of Colorado Boulder
Personal page: https://kelina.github.io
Group page: https://nala-cub.github.io
Apologies for cross-posting.
The task proposals deadline has been extended to April 24, 2023,
anywhere on earth. We look forward to your proposals.
Important Dates:
- Task proposals due April 24, 2023 (Anywhere on Earth; extended from April 17, 2023)
- Task selection notification May 22, 2023
Contact: semevalorganizers(a)gmail.com
We invite proposals for tasks to be run as part of SemEval-2024. SemEval
(the International Workshop on Semantic Evaluation) is an ongoing series of
evaluations of computational semantics systems, organized under the
umbrella of SIGLEX, the Special Interest Group on the Lexicon of the
Association for Computational Linguistics.
SemEval tasks explore the nature of meaning in natural languages: how to
characterize meaning and how to compute it. This is achieved in practical
terms, using shared datasets and standardized evaluation metrics to
quantify the strengths and weaknesses and possible solutions. SemEval tasks
encompass a broad range of semantic topics from the lexical level to the
discourse level, including word sense identification, semantic parsing,
coreference resolution, and sentiment analysis, among others.
For SemEval-2024, we welcome any task that can test an automatic system for
the semantic analysis of text, which could be an intrinsic semantic
evaluation or an application-oriented evaluation. We especially encourage
tasks for languages other than English, cross-lingual tasks, and tasks that
develop novel applications of computational semantics. See the websites of
previous editions of SemEval to get an idea about the range of tasks
explored, SemEval-2022 and SemEval-2023.
We strongly encourage proposals based on pilot studies that have already
generated initial data, as this can provide concrete examples and can help
to foresee the challenges of preparing the full task. In the event of
receiving many proposals, preference will be given to proposals that have
already run a pilot study.
In case you are not sure whether a task is suitable for SemEval, please
feel free to get in touch with the SemEval organizers at
semevalorganizers(a)gmail.com to discuss your idea.
*=== Task Selection ===*
Task proposals will be reviewed by experts, and reviews will serve as the
basis for acceptance decisions. Everything else being equal, more
innovative new tasks will be given preference over task reruns. Task
proposals will be evaluated on:
Novelty: Is the task on a compelling new problem that has not been explored
much in the community? Is the task a rerun, but covering substantially new
ground (new subtasks, new types of data, new languages, etc.)?
Interest: Is the proposed task likely to attract a sufficient number of
participants?
Data: Are the plans for collecting data convincing? Will the resulting data
be of high quality? Will annotations have meaningfully high inter-annotator
agreements? Have all appropriate licenses for use and re-use of the data
after the evaluation been secured? Have all international privacy concerns
been addressed? Will the data annotation be ready on time?
Evaluation: Is the methodology for evaluation sound? Is the necessary
infrastructure available or can it be built in time for the shared task?
Will research inspired by this task be able to evaluate in the same manner
and on the same data after the initial task?
Impact: What is the expected impact of the data in this task on future
research beyond the SemEval Workshop?
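The inter-annotator agreement asked about under the Data criterion is often reported as a chance-corrected statistic. As a minimal illustration (not a SemEval requirement), the sketch below computes Cohen's kappa for two annotators; the function name cohens_kappa and the toy labels are made up for this example, and other measures such as Krippendorff's alpha may suit a given task better.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's own label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:  # both annotators used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
print(cohens_kappa(annotator_1, annotator_2))
```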
*=== New Tasks vs. Task Reruns ===*
We welcome both new tasks and task reruns. For a new task, the proposal
should address whether the task would be able to attract participants.
Preference will be given to novel tasks that have not received much
attention yet.
For reruns of previous shared tasks (whether or not the previous task was
part of SemEval), the proposal should address the need for another
iteration of the task. Valid reasons include: a new form of evaluation
(e.g. a new evaluation metric, a new application-oriented scenario), new
genres or domains (e.g. social media, domain-specific corpora), or a
significant expansion in scale. We further discourage carrying over a
previous task and just adding new subtasks, as this can lead to the
accumulation of too many subtasks. Evaluating on a different dataset with
the same task formulation, or evaluating on the same dataset with a
different evaluation metric, typically should not be considered a separate
subtask.
*=== Task Organization ===*
We welcome people who have never organized a SemEval task before, as well
as those who have. Apart from providing a dataset, task organizers are
expected to:
- Verify the data annotations have sufficient inter-annotator agreement
- Verify licenses for the data to allow its use in the competition and
afterwards. In particular, text that is publicly available online is not
necessarily in the public domain; unless a license has been provided, the
author retains all rights associated with their work, including copying,
sharing and publishing. For more information, see:
https://creativecommons.org/faq/#what-is-copyright-and-why-does-it-matter
- Resolve any potential security, privacy, or ethical concerns about the
data
- Make the data available in a long-term repository under an appropriate
license, preferably using Zenodo: https://zenodo.org/communities/semeval/
- Provide task participants with format checkers and standard scorers.
- Provide task participants with baseline systems to use as a starting
point (in order to lower the obstacles to participation). A baseline system
typically contains code that reads the data, creates a baseline response
(e.g. random guessing, majority class prediction), and outputs the
evaluation results. Whenever possible, baseline systems should be written
in widely used programming languages and/or should be implemented as a
component for standard NLP pipelines.
- Create a mailing list and website for the task and post all relevant
information there.
- Create a CodaLab or other similar competition for the task and upload the
evaluation script.
- Manage submissions on CodaLab or a similar competition site.
- Write a task description paper to be included in SemEval proceedings, and
present it at the workshop.
- Manage participants’ submissions of system description papers, manage
participants’ peer review of each other’s papers, and possibly shepherd
papers that need additional help in improving the writing.
- Review other task description papers.
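As an illustration of the baseline system described in the list above, here is a minimal sketch of a majority-class baseline in Python. It assumes a hypothetical tab-separated "text<TAB>label" data format and uses accuracy as a stand-in scorer; the function names and the toy data are invented for this example and are not part of any SemEval task.

```python
from collections import Counter


def read_data(lines):
    """Parse 'text<TAB>label' lines (hypothetical format) into (text, label) pairs."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        text, label = line.rsplit("\t", 1)
        pairs.append((text, label))
    return pairs


def majority_label(train_pairs):
    """Return the most frequent label in the training data."""
    counts = Counter(label for _, label in train_pairs)
    return counts.most_common(1)[0][0]


def accuracy(gold, predicted):
    """Fraction of predictions that match the gold labels."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Toy data, invented for illustration only.
train = read_data(["good movie\tpos", "great plot\tpos", "dull\tneg"])
test = read_data(["fine acting\tpos", "boring\tneg"])

label = majority_label(train)
predictions = [label] * len(test)  # predict the majority class everywhere
score = accuracy([gold for _, gold in test], predictions)
print(f"majority label: {label}, accuracy: {score:.2f}")
```

A real baseline would read the task's actual data files and call the task's official scorer instead of the accuracy function used here.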
*=== Important dates ===*
- Task proposals due April 17, 2023 (Anywhere on Earth)
- Task selection notification May 22, 2023
*=== Preliminary timetable ===*
- Sample data ready July 15, 2023
- Training data ready September 1, 2023
- Evaluation data ready December 1, 2023 (internal deadline; not for public
release)
- Evaluation starts January 10, 2024
- Evaluation ends by January 31, 2024 (latest date; task organizers may
choose an earlier date)
- Paper submission due February 2024
- Notification to authors March 2024
- Camera-ready due April 2024
- SemEval workshop Summer 2024 (co-located with a major NLP conference)
Tasks that fail to keep up with crucial deadlines (such as the dates for
having the task and CodaLab websites up and for uploading sample,
training, and evaluation data) may be cancelled at the discretion of
SemEval organizers. While consideration will be given to extenuating
circumstances, our goal is to provide sufficient time for the participants
to develop strong and well-thought-out systems. Cancelled tasks will be
encouraged to submit proposals for the subsequent year’s SemEval. To reduce
the risk of tasks failing to meet the deadlines, we are unlikely to accept
multiple tasks with overlap in the task organizers.
*=== Submission Details ===*
The task proposal should be a self-contained document of no longer than 3
pages (plus additional pages for references). All submissions must be in
PDF format, following the ACL template.
Each proposal should contain the following:
- Overview
  - Summary of the task
  - Why this task is needed and which communities would be interested in
    participating
  - Expected impact of the task
- Data & Resources
  - How the training/testing data will be produced. Please discuss whether
    existing corpora will be re-used.
  - Details of copyright, so that the data can be used by the research
    community both during the SemEval evaluation and afterwards
  - How much data will be produced
  - How data quality will be ensured and evaluated
  - An example of what the data would look like
  - Resources required to produce the data and prepare the task for
    participants (annotation cost, annotation time, computation time, etc.)
  - Assessment of any concerns with respect to ethics, privacy, or security
    (e.g. personally identifiable information of private individuals;
    potential for systems to cause harm)
- Pilot Task (strongly recommended)
  - Details of the pilot task
  - What lessons were learned and how these will impact the task design
- Evaluation
  - The evaluation methodology to be used, including clear evaluation
    criteria
- For Task Reruns
  - Justification for why a new iteration of the task is needed (see
    criteria above)
  - What will differ from the previous iteration
  - Expected impact of the rerun compared with the previous iteration
- Task organizers
  - Names, affiliations, email addresses
  - (optional) brief description of relevant experience or expertise
  - (if applicable) years and task numbers, of any SemEval tasks you have
    run in the past
Proposals will be reviewed by an independent group of area experts who may
not have familiarity with recent SemEval tasks, and therefore all proposals
should be written in a self-explanatory manner and contain sufficient
examples.
The submission webpage is:
https://openreview.net/group?id=aclweb.org/ACL/2023/Workshop/SemEval
*=== Chairs ===*
Atul Kr. Ojha, SFI Insight Centre for Data Analytics, DSI, University of
Galway
A. Seza Doğruöz, Ghent University
Giovanni Da San Martino, University of Padua
Harish Tayyar Madabushi, The University of Bath
Ritesh Kumar, Dr. Bhimrao Ambedkar University
Contact: semevalorganizers(a)gmail.com
Dear colleagues,
In the context of the Lexhnology ANR project (joint linguistic and NLP
discourse structure modelling of legal texts for language pedagogy),
which started in early 2023, we currently have one open position for a
doctoral candidate with a background in Natural Language Processing or
related fields.
# Thesis topics
Interest in the legal field has recently exploded in the NLP community.
International evaluation campaigns are proposed on several semantic
tasks such as legal information extraction, entailment, rhetorical role
recognition, judgement prediction (LegalEval@SemEval2023,
COLIEE-2023)... In addition, several conferences and workshops gathering
researchers have been recently organised (ASAIL@ICAIL2023,
JURISIN@IsAI-2023), showing the growing interest of the NLP community in
this specific domain. Numerous datasets have now been built and collected
(PileOfLaw20, LexGLUE22), allowing the community to create specialised
Large Language Models (LLMs) in the legal field (e.g. LegalBERT).
This surge of interest is due to the fact that legal texts have several
specific characteristics that make their automatic processing difficult
and require specific development: they are both language- and
domain-specific, and often longer than the input length LLMs can handle.
The role of the PhD student to be recruited will be to:
- propose a framework for probing Pretrained Language Models in terms of
the captured discourse information
- research effective methods to inject discourse knowledge into
Transformer-based language models (discourse-inspired self-learning
tasks, multi-task learning, Transformer architecture revision...)
- develop an argumentative structure recognition system which will be
used in an online platform by legal English users for supporting their
reading and understanding tasks
# Project context
The PhD fellowship is offered in the context of the Lexhnology (joint
linguistic and NLP discourse structure modelling of legal texts for
language pedagogy) project funded by the French National Research Agency
(https://lexhnology.hypotheses.org/). Partners include CRINI (Nantes
Université), LS2N (Nantes Université), ATILF (CNRS & Université de
Lorraine), and LAIRDIL (Université de Toulouse).
The successful candidate will join the NLP research group at the LS2N lab in
Nantes (https://taln-ls2n.github.io/). Nantes is located in the western
part of France, crossed by the Loire River, and situated just 50
kilometres away from the Atlantic coast
(https://www.levoyageanantes.fr/en/to-see/).
# Requirements
* Master's degree (completed or nearly completed) in Computer Science,
Computational Linguistics, Natural Language Processing, Machine
Learning, Data Science or a closely related field
* Excellent academic record
* Practical experience in Machine Learning (esp. Deep Learning) methods
* Good knowledge of experimental design methodology and statistics
* Some level of familiarity with discourse analysis would be a plus
* Excellent programming skills (esp. Python)
* English (at least B2) and French proficiency, both spoken and written
* Initiative and ability to work independently and as part of a team
# General information
* Supervisors: Prof. Richard Dufour, Dr Nicolas Hernandez, Dr Laura
Monceaux.
* Type of Contract : PhD Student contract / Thesis offer
* Contract Period : 36 months
* Start date of the thesis : 1 September 2023
* Proportion of work : Full time on site
* Remuneration : about 2175 € gross monthly (before taxes), partial
reimbursement of public transport costs
# Additional information and application
Application deadline : 8 May 2023
For further information and application, contact Nicolas Hernandez
(nicolas.hernandez(a)univ-nantes.fr) AND Laura Monceaux
(laura.monceaux(a)univ-nantes.fr) AND Richard Dufour
(richard.dufour(a)univ-nantes.fr).
Applications should contain all the documents indicated below:
a. Free-form cover letter outlining your interest in the PhD/ANR
project
b. Curriculum vitae
c. Transcripts of grades from first and second year of master's program
(or, if applicable, a document attesting to anticipated success)
d. Names and addresses of two references.
Shortlisted applicants will be interviewed online.
--
Dr. Nicolas Hernandez
Associate Professor (Maître de Conférences)
Nantes Université - LS2N UMR6004
https://nicolashernandez.github.io/
+33 (0)2 51 12 53 94
+33 (0)2 40 30 60 67
https://sciences-techniques.univ-nantes.fr/programme-du-m1-atal
Dear Members,
We are very excited to share a new tool with the community. Multi-Feature
Tagger of English (MFTE) is the Python version of the Perl MFTE (Le Foll
2021). This improved and extended Python version includes semantic tags
from Biber (2006) and Biber et al. (1999), as well as additional tags,
e.g., separate tags for third person singular male and female pronouns.
This tagger first uses the Python NLP library stanza for grammatical
part-of-speech tagging before applying rule-based regular expressions to
tag for a range of more complex lexico-grammatical and semantic features
typically used in multidimensional analysis (MDA; cf. Biber 1984; 1988).
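To illustrate the second, rule-based stage described above, here is a minimal sketch in Python. It is not MFTE's code: it assumes the text has already been part-of-speech tagged into a "word_TAG" format with Penn Treebank tags (the stanza step is skipped here), and the feature names PRON3F and PRON3M, the rules, and the sample sentence are hypothetical, chosen to mirror the separate male and female third person pronoun tags mentioned above.

```python
import re

# Hypothetical rules and tag names, for illustration only (not MFTE's own).
# Each rule matches a whole "word_TAG" token: \b anchors the start of the
# word and (?!\S) requires the token to end at whitespace or end of text.
RULES = [
    ("PRON3F", re.compile(r"\b(?:she|her|hers|herself)_PRP\$?(?!\S)", re.IGNORECASE)),
    ("PRON3M", re.compile(r"\b(?:he|him|his|himself)_PRP\$?(?!\S)", re.IGNORECASE)),
]


def add_feature_tags(tagged_text):
    """Append a feature tag to every token matched by a rule."""
    for name, pattern in RULES:
        tagged_text = pattern.sub(
            lambda match, name=name: f"{match.group(0)}|{name}", tagged_text
        )
    return tagged_text


def count_features(tagged_text):
    """Count how often each feature occurs in POS-tagged text."""
    return {name: len(pattern.findall(tagged_text)) for name, pattern in RULES}


# Sample input, assumed to come from an earlier POS-tagging step.
sample = "She_PRP gave_VBD him_PRP her_PRP$ book_NN ._."
print(add_feature_tags(sample))
print(count_features(sample))
```

Per-feature counts of this kind are the sort of input a multidimensional analysis works from; the actual tag set and rules are documented in the MFTE repository.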
The software is available as a free and open-source Python command-line
tool with a simple GUI. The current version is a pre-alpha release with
bugs and errors expected (& incomplete documentation). If you are
interested in doing MDA studies, you may want to give it a try. Please
feel free to report any errors or other glitches using the Issues tab on
the GitHub repo. The software is available at the link below, along with
instructions on how to install and use it.
https://github.com/mshakirDr/MFTE
Regards