*** Apologies for cross-posting ***
The 1st International Workshop on Implicit Author Characterization from Texts for Search and Retrieval (IACT’23)
The workshop will be held in conjunction with the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval.
Workshop website: https://en.sce.ac.il/news/iact23
Date: July 27, 2023
Location: Taipei, Taiwan.
Paper submission deadline: Extended to May 23, 2023, AoE
Submission link: https://easychair.org/conferences/?conf=iact23
To bring the research community's attention to the limitations of current models in recognizing and characterizing AI vs. human authors, we organize the first edition of IACT workshops under the umbrella of the SIGIR conference. Research works submitted to the workshop should foster scientific advances in all aspects of author characterization.
Submission Guidelines:
All papers must be original and not simultaneously submitted to another journal or conference. The following paper categories are welcome:
- Full research papers: up to 8 pages. Original and high-quality unpublished contributions to the theory and practical aspects of the workshop topics.
- Short research papers: up to 5 pages. These can describe ongoing research, resources, and demos.
- Negative results papers: up to 5 pages. Papers highlighting tested hypotheses that did not produce the expected outcome are also welcome.
- Position papers: up to 5 pages. Discussing current and future research directions.
The length constraints do not include references.
The submissions must be anonymous and will be peer-reviewed by at least two program committee members.
Workshop Format:
The authors of accepted papers will be given 15 minutes for a short oral presentation. The workshop will run as a hybrid event to allow virtual attendance and meet the SIGIR format.
Workshop Topics:
Research works submitted to the workshop should foster scientific advances in all aspects of implicit author information extraction from text, including but not limited to the following:
- Differentiation between AI-generated content and human-generated content and bot profiling
- Characterization of conversational agents
- Feature detection of authors for human vs. AI determination
- Prompt understanding and recognition in language models
- Personalized question-answering and conversation generation
- Troll identification on social media
- Review authenticity estimation
- Multi-modal, multi-genre, and multilingual author analysis
- Character analysis, description, and representation in narrative texts
- Detecting implicit expressions of sentiment, emotion, opinion, and bias
- Transfer learning for implicit author characterization
- Implicit author characterization annotation schema
- Evaluation of implicit author characterization
- Author characterization in low-resource languages and under-studied domains
- Accountability and regulation of AI-based information extraction, retrieval, and content generation
- Copyright issues of AI-generated content
- Ethical and privacy implications of author characterization and implicit information extraction
- Fairness and bias of AI-generated content
Organizing Committee:
Marina Litvak - marinal(a)ac.sce.ac.il; Shamoon College of Engineering Beer Sheva; Israel
Irina Rabaev - irinar(a)ac.sce.ac.il; Shamoon College of Engineering Beer Sheva; Israel
Alípio Mário Jorge - amjorge(a)fc.up.pt; University of Porto; Porto, Portugal
Ricardo Campos - ricardo.campos(a)ipt.pt; Polytechnic Institute of Tomar INESC TEC, Portugal; Porto, Portugal
Adam Jatowt - adam.jatowt(a)uibk.ac.at; University of Innsbruck; Innsbruck, Austria
Invited Speakers:
Prof. Mark Last - Ben-Gurion University of the Negev, Israel
Prof. Dr. Valia Kordoni - Humboldt-Universität Berlin, Germany
Dr. Marina Litvak: litvak.marina(a)gmail.com
Dr. Irina Rabaev: irinar(a)ac.sce.ac.il
All the best,
Hugo Sousa on behalf of the IACT'23 Organizers
**apologies for cross-postings**
===== Call for Participation - IWCS 2023 =====
Early Registration Extension: -> 28th May 2023
15th International Conference on Computational Semantics (IWCS)
Université de Lorraine, Nancy, France
20-23 June 2023
IWCS is the biennial meeting of SIGSEM [1], the ACL special interest
group on semantics [2]; this year's edition is organized in person by the
Loria [3] and IDMC [4] of the Université de Lorraine.
[1] http://sigsem.org/
[2] http://aclweb.org/
[3] https://www.loria.fr/fr/
[4] http://idmc.univ-lorraine.fr/
The aim of the IWCS conference is to bring together researchers
interested in any aspects of the computation, annotation, extraction,
representation and neuralisation of meaning in natural language,
whether this is from a lexical or structural semantic perspective.
IWCS embraces both symbolic and machine learning approaches to
computational semantics, and everything in between. The conference
and workshops will take place 20-23 June 2023.
We invite paper submissions in all areas of computational semantics, in
other words all computational aspects of meaning of natural language within
written, spoken, signed, or multi-modal communication.
Presentations will be oral and posters.
Submissions are invited on these closely related areas, including the
* design of meaning representations
* syntax-semantics interface
* representing and resolving semantic ambiguity
* shallow and deep semantic processing and reasoning
* hybrid symbolic and statistical approaches to semantics
* distributional semantics
* alternative approaches to compositional semantics
* inference methods for computational semantics
* recognising textual entailment
* learning by reading
* methodologies and practices for semantic annotation
* machine learning of semantic structures
* probabilistic computational semantics
* neural semantic parsing
* computational aspects of lexical semantics
* semantics and ontologies
* semantic web and natural language processing
* semantic aspects of language generation
* generating from meaning representations
* semantic relations in discourse and dialogue
* semantics and pragmatics of dialogue acts
* multimodal and grounded approaches to computing meaning
* semantics-pragmatics interface
* applications of computational semantics
Two types of submission are solicited: long papers and short papers. Both
types should be submitted not later than 3 March (anywhere on earth).
Long papers should describe original research and must not exceed 8 pages
(not counting acknowledgements and references).
Short papers (typically system or project descriptions, or ongoing research)
must not exceed 4 pages (not counting acknowledgements and references).
Both types will be published in the conference proceedings and in the ACL
Anthology. Accepted papers get an extra page in the camera-ready version.
IWCS papers should be formatted following the common two-column structure
as used by ACL. Please use our specific style-files or the Overleaf template, taken
from ACL 2021. Similar to ACL 2021, initial submissions should be fully anonymous
to ensure double-blind reviewing.
Papers should be submitted in PDF format via Softconf:
Please make sure that you select the right track when submitting your paper.
Contact the organisers if you have problems using Softconf.
No anonymity period
IWCS 2023 does not have an anonymity period. However, we ask you to be
reasonable and not publicly advertise your preprint during (or right before) review.
22 March 2023 (anywhere on earth) Paper submissions
17 April 2023 Decisions sent to authors
20-23 June 2023 IWCS conference
=== CONTACT ===
For questions, contact: iwcs2023-contact(a)univ-lorraine.fr
Maxime Amblard, Ellen Breitholtz (the IWCS 2023 organizers)
It is our pleasure to announce the publication of issue 10(2) of the
Journal of Language Modelling (JLM), a free open-access peer-reviewed
journal aiming to bridge the gap between theoretical, formal and
computational linguistics:
http://jlm.ipipan.waw.pl/ (see “CURRENT” or “ALL ISSUES”).
The direct persistent link to this issue is:
JLM is indexed by SCOPUS, ERIH PLUS, DBLP, DOAJ, etc., and it is a member
“Idiosyncratic frequency as a measure of derivation vs. inflection”
Maria Copot, Timothee Mickus, Olivier Bonami
“Simplicity and learning to distinguish arguments from modifiers”
Leon Bergen, Edward Gibson, Timothy J. O'Donnell
“Neural heuristics for scaling constructional language processing”
Paul Van Eecke, Jens Nevens, Katrien Beuls
“External Reviewers 2019–2022”
The current make-up of the JLM Editorial Board is enclosed below.
Best regards,
Adam Przepiórkowski (JLM Editor-in-Chief)
Steven Abney, University of Michigan, USA
Ash Asudeh, University of Rochester, USA
Chris Biemann, Universität Hamburg, GERMANY
Igor Boguslavsky, Technical University of Madrid, SPAIN; Institute for
Information Transmission Problems, Russian Academy of Sciences, Moscow,
António Branco, University of Lisbon, PORTUGAL
David Chiang, University of Southern California, Los Angeles, USA
Greville Corbett, University of Surrey, UNITED KINGDOM
Dan Cristea, University of Iași, ROMANIA
Jan Daciuk, Gdańsk University of Technology, POLAND
Mary Dalrymple, University of Oxford, UNITED KINGDOM
Darja Fišer, University of Ljubljana, SLOVENIA
Anette Frank, Universität Heidelberg, GERMANY
Claire Gardent, CNRS/LORIA, Nancy, FRANCE
Jonathan Ginzburg, Université Paris-Diderot, FRANCE
Stefan Th. Gries, University of California, Santa Barbara, USA
Heiki-Jaan Kaalep, University of Tartu, ESTONIA
Laura Kallmeyer, Heinrich-Heine-Universität Düsseldorf, GERMANY
Jong-Bok Kim, Kyung Hee University, Seoul, KOREA
Kimmo Koskenniemi, University of Helsinki, FINLAND
Jonas Kuhn, Universität Stuttgart, GERMANY
Alessandro Lenci, University of Pisa, ITALY
Ján Mačutek, Comenius University in Bratislava, SLOVAKIA
Igor Mel’čuk, University of Montreal, CANADA
Glyn Morrill, Technical University of Catalonia, Barcelona, SPAIN
Stefan Müller, Humboldt Universität zu Berlin, GERMANY
Mark-Jan Nederhof, University of St Andrews, UNITED KINGDOM
Petya Osenova, Sofia University, BULGARIA
David Pesetsky, Massachusetts Institute of Technology, USA
Maciej Piasecki, Wrocław University of Technology, POLAND
Christopher Potts, Stanford University, USA
Louisa Sadler, University of Essex, UNITED KINGDOM
Agata Savary, Université François Rabelais Tours, FRANCE
Sabine Schulte im Walde, Universität Stuttgart, GERMANY
Stuart M. Shieber, Harvard University, USA
Mark Steedman, University of Edinburgh, UNITED KINGDOM
Stan Szpakowicz, School of Electrical Engineering and Computer Science,
University of Ottawa, CANADA
Shravan Vasishth, Universität Potsdam, GERMANY
Zygmunt Vetulani, Adam Mickiewicz University, Poznań, POLAND
Aline Villavicencio, Federal University of Rio Grande do Sul, Porto Alegre,
Veronika Vincze, University of Szeged, HUNGARY
Yorick Wilks†, Florida Institute of Human and Machine Cognition, USA
Shuly Wintner, University of Haifa, ISRAEL
Zdeněk Žabokrtský, Charles University in Prague, CZECH REPUBLIC
Adam Przepiórkowski ˈadam ˌpʃɛpjurˈkɔfskʲi
http://clip.ipipan.waw.pl/ ____ Computational Linguistics in Poland
http://jlm.ipipan.waw.pl/ ___________ Journal of Language Modelling
http://zil.ipipan.waw.pl/ ____________ Linguistic Engineering Group
http://nkjp.pl/ _________________________ National Corpus of Polish
The Cambridge Institute for Automated Language Teaching and Assessment
(ALTA) is seeking two Research Assistants or Research Associates in Natural
Language Processing and Machine Learning to join their strong team of
researchers in the Department of Computer Science and Technology.
ALTA is a virtual institute which brings together researchers from Computer
Science, Engineering, Linguistics and Language Assessment to investigate
new ways of using technology to enhance language learning and to develop
cutting-edge approaches to assessment which will benefit learners and
teachers worldwide.
The successful applicant will be working in EdTech based on LLM technology
and will focus on at least one of the following areas: automated assessment
of language learners, explainable models of assessment, learning-content
generation, or adaptive learning. In all cases, the candidate must have a
directly relevant PhD (or must be close to completion). The candidate is
expected to have knowledge and experience of computational techniques
relevant to natural language processing and machine learning, including an
understanding and experience with pre-trained language models. The
candidate will need to be confident communicating in cross-disciplinary
Further information: https://www.jobs.cam.ac.uk/job/40995/
We are inviting applications for one PhD position (3 years) and a
postdoctoral position (funding available 09/2023-01/2026) of a computer
scientist, computational linguist or psycholinguist, who has experience
with or interest in cognitive modelling for language processing (e.g.,
Bayesian models, and/or models using cognitive architectures like ACT-R).
The positions will be funded as part of the ERC Starting Grant
"Individualized Interaction in Discourse" of Prof. Vera Demberg, at
Saarland University. The goal of the position is to develop models that
capture individual differences in discourse and pragmatic processing.
The candidate will conduct research on the design and implementation of
cognitive models of language processing at the level of discourse and/or
pragmatic processing. These models should capture individual differences
in cognition such as working memory, language experience, background
knowledge, theory-of-mind abilities etc.
The successful applicant must have excellent spoken and
written proficiency in English, and have a background in natural language
processing or cognitive modelling.
Applicants are requested to submit their application, including a cover
letter that specifies why you would like to work on this topic and what
qualifies you for it, an academic CV, a list of academic publications,
your BSc / MSc / PhD thesis (or a current draft), copies of academic degree
certificates and names of two potential references.
For application to the Postdoctoral position, please quote opening
number W2290, for the PhD position please quote opening number W2289.
The applications should be sent via email directly to Prof. Vera Demberg:
*The (extended) application deadline is June 4th, 2023.*
Saarland University is one of the leading centres for computer science
and computational linguistics in Europe, and offers a dynamic and
stimulating research environment. The group is affiliated with both
the Department of Computer Science and with the Department of Language
Science and Technology.
Both departments are part of the Saarland Informatics Campus, which
brings together 800 researchers and 2000 students from 81 countries. We
collaborate closely with the university's Department of Computer
Science, the Max Planck Institute for Informatics, the Max Planck
Institute for Software Systems, and the German Research Center for
Artificial Intelligence (DFKI).
Our researchers and students come from all over the world, and our
primary working language is English.
Saarland University is an equal opportunities employer. In
accordance with its policy of increasing the proportion of women in this
type of employment, the University actively encourages applications from
women. Women are given preference in cases of equal suitability, ability
and professional performance.
Applications from severely disabled persons will be given preferential
consideration in the event of equal suitability. Part-time employment is
generally possible.
We welcome applications regardless of gender, nationality, ethnic and
social origin, religion/belief, disability, age, and sexual orientation
and identity.
Pay grade classification is based on the particular details of the
position held and the extent to which the applicant meets the requirements
of the pay grade within the TV-L salary scale.
Unfortunately, costs for attending an interview at Saarland University
cannot be reimbursed in principle.
When you submit a job application to Saarland University you will be
transmitting personal data. Please refer to our privacy notice for
information on how we collect and process personal data in accordance
with Art. 13 of the Datenschutz-Grundverordnung. By submitting your
application you confirm that you have taken note of the information in
the Saarland University privacy notice.
Prof. Dr. Vera Demberg
Computer Science and Computational Linguistics
Saarland Informatics Campus
Saarland University
Campus C7.2 Room 3.02
D-66123 Saarbrücken
Phone: +49-681-302-70024
Sekretariat: +49-681-302-70025
Fax: +49-681-302-70026
***** apologies for multiple posting ****
Dear Colleagues,
We are running a special issue on "When Natural Language Processing
Meets Machine Learning— Opportunities, Challenges and Solutions" in
the journal Computers (ISSN 2073-431X,
https://www.mdpi.com/journal/computers ). Dr. Lu Bai from Ulster
University, Belfast BT15 1ED, UK, Prof. Dr. Huiru Zheng from Ulster
University, Belfast BT15 1ED, UK, and Dr. Zhibao Wang from Northeast
Petroleum University, Daqing 163318, China are editing this special
issue. We are writing to invite you to contribute a paper to be
published in *open access* form for this Special Issue.
The combination of Natural Language Processing (NLP) and Machine
Learning (ML) has led to many advancements in the field of artificial
intelligence, enabling computers to understand and analyse human
language. NLP focuses on the interactions between human language and
computers, while ML provides algorithms and techniques to make
predictions and automate tasks based on data. The opportunities
presented by this combination include improved text classification,
sentiment analysis, machine translation, and question-answering systems.
However, the integration of NLP and ML still faces several challenges,
such as the need for large amounts of annotated data for training,
handling the complexity and variability of human language, and ensuring
the ethical and fair use of AI systems. To overcome these challenges,
NLP and ML researchers are exploring innovative solutions such as
transfer learning, semi-supervised learning, and unsupervised learning
methods, as well as developing techniques to handle unstructured and
diverse data. Additionally, there is a growing emphasis on ensuring the
accountability, transparency, and ethical use of AI systems. For more
details, please visit the special issue website:
The manuscript submission deadline is *31 December 2023*.
You may submit your manuscript at any time up until the deadline. Papers
will be reviewed upon receipt and published on an ongoing basis.
Extended conference papers are also welcome. They should contain at
least 50% of new material, e.g., in the form of technical extensions,
more in-depth evaluations, or additional use cases.
Computers (http://www.mdpi.com/journal/computers) is a fully open access
journal of computer science, published quarterly online by MDPI,
Switzerland. It is covered by Scopus (Elsevier, 2018 CiteScore:
3.7), ESCI (Web of Science), INSPEC and DBLP. Manuscripts are
peer-reviewed, and a first decision is provided to authors approximately
*16.5* days after submission; acceptance to publication takes
*3.8* days.
For further details on the submission process, please see the
instructions for authors at the journal website
In the hope that this invitation receives your favorable consideration,
we look forward to our future collaboration.
Dr. Lu Bai from Ulster University, Belfast BT15 1ED, UK
Prof. Dr. Huiru Zheng from Ulster University, Belfast BT15 1ED, UK
Dr. Zhibao Wang from Northeast Petroleum University, Daqing 163318, China
Mr. Blink Yu
Managing Editor
E-Mail: blink.yu(a)mdpi.com
Skype: live:c91693ac8277e1f0
MDPI Wuhan Office
No.6 Jingan Road, 430064 Wuhan, China
The Tübingen AI Center is inviting applications for the following tenured positions:
- Full Professor of Machine Learning and Intelligent Systems (2 positions)
- Full Professor of ML Engineering and Technology Transfer
The Tübingen AI Center is a research institution hosted by the University of Tübingen in cooperation with the Max Planck Institute for Intelligent Systems whose core machine learning faculties work together to develop more robust, efficient and accountable learning systems. Embedded in Tübingen's rapidly growing science and technology campus, the Tübingen AI Center has close ties with the newly established ELLIS Institute Tübingen, and more generally cooperates closely with the pan-European ELLIS network as well as the Cyber Valley initiative, which connects researchers with start-ups and industry in the area.
Details about the positions and how to apply can be found at https://tuebingen.ai/careers. Applications should be submitted by June 14, 2023. Inquiries about the position may be directed to the Central Office of the Tübingen AI Center (lynn.anthonissen(a)uni-tuebingen.de).
The Second Workshop on Corpus Generation and Corpus Augmentation for
Machine Translation (CoCo4MT) @MT-SUMMIT XIX
The 19th Machine Translation Summit
Sep 4-8, 2023, Macau SAR, China
It is a well-known fact that machine translation systems, especially
those that use deep learning, require massive amounts of data. Several
language resources are not available in human-created form.
The types of resources that are available include monolingual and
multilingual corpora, translation memories, and lexicons. Parallel
resources are generally created for formal purposes, such as parliamentary
collections, while monolingual resources tend to come from more informal
situations. The quality and abundance of resources, including corpora,
created for formal purposes is generally higher than for those created
for informal purposes. Additionally, corpora for low-resource languages,
that is, languages with fewer digital resources available, tend to be
less abundant and of lower quality.
CoCo4MT is a workshop centered around research that focuses on manual
and automatic corpus creation, cleansing, and augmentation techniques
specifically for machine translation. We accept work that covers any
language (including sign language) but we are specifically interested in
those submissions that explicitly report on work with languages with
limited existing resources (low-resource languages). Since techniques
from high-resource languages are generally statistical in nature and
could be used as generic solutions for any language, we welcome
submissions on high-resource languages also.
CoCo4MT aims to encourage research on new and undiscovered techniques.
We hope that the methods presented at this workshop will lead to the
development of high-quality corpora that will in turn lead to
high-performing MT systems and to new dataset creation for multiple
languages. We also hope that submissions will provide high-quality
corpora that are publicly available for download and can be used to
increase machine translation performance, so that the workshop becomes
a general venue to consult for corpora needs in the future. The
workshop's success will be measured by the following key performance
indicators:
- Promotes the ongoing increase in quality of machine translation
systems when measured by standard measurements,
- Provides a meeting place for collaboration from several research areas
to increase the availability of commonly used corpora and new corpora,
- Drives innovation to address the need for higher quality and abundance
of low-resource language data.
Topics of interest include:
- Difficulties with using existing corpora (e.g., political
considerations or domain limitations) and their effects on final MT
- Strategies for collecting new MT datasets (e.g., via crowdsourcing),
- Data augmentation techniques,
- Data cleansing and denoising techniques,
- Quality control strategies for MT data,
- Exploration of datasets for pretraining or auxiliary tasks for
training MT systems.
To encourage research on corpus construction for low-resource machine
translation, we introduce a shared task focused on identifying
high-quality instances that should be translated into a target
low-resource language. Participants are provided access to multi-way
corpora in the high-resource languages of English, Spanish, German,
Korean, and Indonesian, and using these, are required to identify
beneficial instances, that when translated into the low-resource
languages of Cebuano, Gujarati, and Burmese, lead to high-performing MT
systems. More details on data, evaluation and submission can be found on
the website (https://sites.google.com/view/coco4mt) or by emailing
CoCo4MT will accept research, review, or position papers. The length of
each paper should be at least four (4) and not exceed ten (10) pages,
plus unlimited pages for references. Submissions should be formatted
according to the official MT Summit 2023 style templates
Accepted papers will be published in the MT Summit 2023 proceedings
which are included in the ACL Anthology and will be presented at the
conference either orally or as a poster.
Submissions must be anonymized and should be made to the workshop using
the Softconf conference management system
(https://softconf.com/mtsummit2023/CoCo4MT). Scientific papers that have
been or will be submitted to other venues must be declared as such, and
must be withdrawn from the other venues if accepted and published at
CoCo4MT. The review will be double-blind.
We would like to encourage authors to cite papers written in ANY
language that are related to the topics, as long as both original
bibliographic items and their corresponding English translations are
Registration will be handled by the main conference. (To be announced)
May 18, 2023 - Call for papers released
May 19, 2023 - Shared task release of train, dev and test data
May 25, 2023 - Shared task release of baselines
June 5, 2023 - Second call for papers
June 20, 2023 - Third and final call for papers
July 05, 2023 - Paper submissions due
July 05, 2023 - Shared task deadline to submit results
July 20, 2023 - Notification of acceptance
July 20, 2023 - Shared task system description papers due
July 31, 2023 - Camera-ready due
September 4-5, 2023 - CoCo4MT workshop
CoCo4MT Workshop Organizers:
CoCo4MT Shared Task Organizers:
ORGANIZING COMMITTEE (listed alphabetically)
Ananya Ganesh University of Colorado Boulder
Constantine Lignos Brandeis University
John E. Ortega Northeastern University
Jonne Sälevä Brandeis University
Katharina Kann University of Colorado Boulder
Marine Carpuat University of Maryland
Rodolfo Zevallos Universitat Pompeu Fabra
Shabnam Tafreshi University of Maryland
William Chen Carnegie Mellon University
PROGRAM COMMITTEE (listed alphabetically, tentative)
Abteen Ebrahimi University of Colorado Boulder
Adelani David Saarland University
Ananya Ganesh University of Colorado Boulder
Alberto Poncelas ADAPT Centre at Dublin City University
Anna Currey Amazon
Amirhossein Tebbifakhr University of Trento
Atul Kr. Ojha National University of Ireland Galway
Ayush Singh Northeastern University
Barry Haddow University of Edinburgh
Bharathi Raja Chakravarthi National University of Ireland Galway
Beatrice Savoldi University of Trento
Bogdan Babych Heidelberg University
Briakou Eleftheria University of Maryland
Constantine Lignos Brandeis University
Dossou Bonaventure Mila Quebec AI Institute
Duygu Ataman New York University
Eleftheria Briakou University of Maryland
Eleni Metheniti Université Toulouse - Paul Sabatier
Jasper Kyle Catapang University of Birmingham
John E. Ortega Northeastern University
Jonne Sälevä Brandeis University
Kalika Bali Microsoft
Katharina Kann University of Colorado Boulder
Kochiro Watanabe The University of Tokyo
Koel Dutta Chowdhury Saarland University
Liangyou Li Huawei
Manuel Mager University of Stuttgart
Maria Art Antonette Clariño University of the Philippines Los Baños
Marine Carpuat University of Maryland
Mathias Müller University of Zurich
Nathaniel Oco De La Salle University
Niu Xing Amazon
Patrick Simianer Lilt
Rico Sennrich University of Zurich
Rodolfo Zevallos Universitat Pompeu Fabra
Sangjee Dondrub Qinghai Normal University
Santanu Pal Saarland University
Sardana Ivanova University of Helsinki
Shantipriya Parida Silo AI
Shiran Dudy Northeastern University
Surafel Melaku Lakew Amazon
Tommi A Pirinen University of Tromsø
Valentin Malykh Moscow Institute of Physics and Technology
Xing Niu Amazon
Xu Weijia University of Maryland
Dear colleagues,
We are happy to invite you to join the *Arabic NER SharedTask 2023*
<https://dlnlp.ai/st/wojood/> which will be organized as part of the WANLP
2023. We will provide you with a large corpus and Google Colab notebooks to
help you reproduce the baseline results.
Invitation to participate in the competition on extracting named entities
from Arabic texts. We will provide participants with a corpus and software
for obtaining baseline results that they can build on.
Named Entity Recognition (NER) is integral to many NLP applications. It is
the task of identifying named entity mentions in unstructured text and
classifying them to predefined classes such as person, organization,
location, or date. Due to the scarcity of Arabic resources, most of the
research on Arabic NER focuses on flat entities and addresses a limited
number of entity types (person, organization, and location). The goal of
this shared task is to alleviate this bottleneck by providing Wojood, a
large and rich Arabic NER corpus. Wojood consists of about 550K tokens (MSA
and dialect, in multiple domains) that are manually annotated with 21
entity types.
Participants need to register via this form (
*https://forms.gle/UCCrVNZ2LaPviCZS6* <https://forms.gle/UCCrVNZ2LaPviCZS6>).
Participating teams will be provided with common training and development
datasets. No external manually labelled datasets are allowed. A blind test
set will be used to evaluate the output of the participating teams.
Each team is allowed a maximum of 3 submissions. All teams are required to
report on the development and test sets (after results are announced) in
their write-ups.
For any questions related to this task, please check our *Frequently Asked
- March 03, 2023: Registration available
- May 25, 2023: Data-sharing and evaluation on development set available
- June 10, 2023: Registration deadline
- July 20, 2023: Test set made available
- July 30, 2023: Evaluation on test set (TEST) deadline
- September 5, 2023: Shared task system paper submissions due
- October 12, 2023: Notification of acceptance
- October230, 2023: camera-ready papers due
** All deadlines are 11:59 PM UTC-12:00 (Anywhere On Earth).*
For any questions related to this task, please contact the organizers
directly using the following email address: *NERSharedtask2023(a)gmail.com
<NERSharedtask2023(a)gmail.com>* or join the google group:
As described, this shared task targets both flat and nested Arabic NER. The
subtasks are:
*Subtask 1:* *Flat NER*
In this subtask, we provide the Wojood-Flat train (70%) and development
(10%) datasets. The final evaluation will be on the test set (20%). The
flat NER dataset uses the same train/dev/test split as the nested NER
dataset, and each split contains the same content. The only difference is
that in flat NER each token is assigned a single tag, which is the first
high-level tag assigned to that token in the nested NER dataset.
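As a rough illustration of how the flat tags relate to the nested ones, the sketch below derives one tag per token by keeping the first (high-level) tag. The in-memory representation (a list of tag lists, outermost tag first), the function name nested_to_flat, and the example tags are assumptions made for illustration only; they are not the official Wojood file format or conversion script.

```python
from typing import List


def nested_to_flat(nested_tags: List[List[str]]) -> List[str]:
    """Keep only the first (high-level) tag of each token.

    Assumption: each inner list holds one token's tags, ordered from the
    outermost entity to the innermost. A token with no tags is treated as "O".
    """
    return [tags[0] if tags else "O" for tags in nested_tags]


# Hypothetical three-token example: an organization name spanning two tokens,
# whose second token is also a geopolitical entity, followed by a plain token.
nested = [
    ["B-ORG", "O"],
    ["I-ORG", "B-GPE"],
    ["O", "O"],
]

flat = nested_to_flat(nested)
print(flat)
```

In this sketch the inner B-GPE annotation is dropped in the flat view, which is the only way the two views differ.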
*Subtask 2:* *Nested NER*
In this subtask, we provide the Wojood-Nested train (70%) and development
(10%) datasets. The final evaluation will be on the test set (20%).
The evaluation metrics will include precision, recall, and F1-score. However,
our official metric will be the micro F1-score.
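For readers unfamiliar with micro-averaging, here is a minimal sketch of entity-level micro precision, recall, and F1: true positives, predicted entities, and gold entities are pooled across all entity types before the ratios are taken. The entity representation (sentence index, start token, end token, type), the function name micro_prf, and the sample data are assumptions for illustration; the official scorer used on CODALAB may differ in its details.

```python
from typing import Set, Tuple

# Assumed entity representation: (sentence index, start token, end token, type)
Entity = Tuple[int, int, int, str]


def micro_prf(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over exact-match entities."""
    tp = len(gold & pred)  # entities with matching span and type
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted entities; the second prediction has the
# right span but the wrong type, so it counts as an error.
gold = {(0, 0, 1, "ORG"), (0, 1, 1, "GPE"), (1, 3, 3, "PERS")}
pred = {(0, 0, 1, "ORG"), (1, 3, 3, "LOC")}

precision, recall, f1 = micro_prf(gold, pred)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

Because counts are pooled before averaging, frequent entity types weigh more heavily in the micro F1-score than rare ones.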
The evaluation of shared tasks will be hosted through CODALAB. Teams will
be provided with a CODALAB link for each shared task.
-*CODALAB link for NER Shared Task Subtask 1 (Flat NER)*
-*CODALAB link for NER Shared Task Subtask 2 (Nested NER)*
Two baseline models trained on Wojood (flat and nested) are provided:
*Nested NER baseline:* presented in this *article*
<https://aclanthology.org/2022.lrec-1.387/>, with code available on
*GitHub* <https://github.com/SinaLab/ArabicNER>. The model achieves a micro
F1-score of 0.9059 (note that this baseline does not handle nested entities
of the same type).
*Flat NER baseline:* the same code repository used for nested NER (*GitHub*
<https://github.com/SinaLab/ArabicNER>) can also be used to train a model
for the flat NER task. Our flat NER baseline achieved a micro F1-score of
0.8785.
To allow you to experiment with the baseline, we authored four Google Colab
notebooks that demonstrate how to train and evaluate our baseline models.
[1] *Train Flat NER*
This notebook can be used to train our ArabicNER model on the flat NER task
using the sample Wojood data found in our repository.
[2] *Evaluate Flat NER*
This notebook uses the trained model saved by the notebook above to
perform evaluation on an unseen dataset.
[3] *Train Nested NER*
This notebook can be used to train our ArabicNER model on the nested NER
task using the sample Wojood data found in our repository.
[4] *Evaluate Nested NER*
This notebook uses the trained model saved by the notebook above to
perform evaluation on an unseen dataset.
- Mustafa Jarrar, Birzeit University
- Muhammad Abdul-Mageed, University of British Columbia & MBZUAI
- Mohammed Khalilia, Birzeit University
- Bashar Talafha, University of British Columbia
- AbdelRahim Elmadany, University of British Columbia
- Nagham Hamad, Birzeit University
- Alaa Omer, Birzeit University
*** Combo Call for Workshop Papers ***
19th IEEE eScience Conference (eScience 2023)
October 9-13, 2023, St. Raphael Resort, Limassol, Cyprus
The 19th IEEE eScience Conference will be held in Limassol, Cyprus on October 9-13, 2023.
eScience 2023 will host a number of workshops which will be co-located with the main
conference on Monday, October 9 and Tuesday, October 10, 2023.
The eScience conference has a long history of hosting well-attended workshops. These
workshops share the goal of bringing together international and interdisciplinary research
communities, developers, and users of eScience applications and enabling IT technologies.
Workshops play a crucial role in the conference by providing an opportunity for researchers
and practitioners to present their work in a more focused way than the conference itself and
to have in-depth discussions of particular topics of interest to the community.
eScience 2023 will host the following workshops:
● 1st Workshop on cItizeN Science engagemenT based on Ict soLutions (INSTIL 2023)
● 3rd Workshop on E-science ReseaRch leading tO negative Results (ERROR 2023)
● 3rd Workshop on Reproducible Workflows, Data Management, and Security
(ReWorDS 2023)
● 4th Global Research Platform (4GRP) Workshop
● IEEE International Workshop on Artificial Intelligence for Health (AI4Health 2023)
● Research Software Engineers in eScience: Sustainable RSE Ecosystems within eScience
● Workshop paper submission deadline: Defined per workshop, not before the end of June 2023
● Notification of acceptance: Defined per workshop
● IEEE proceedings camera ready: July 21, 2023
● Workshop days: October 9-10, 2023
Deadlines refer to 23:59 in the AoE (Anywhere on Earth) time zone.
General Chair
• George Angelos Papadopoulos, University of Cyprus, Cyprus
Technical Program Co-Chairs
• Rafael Ferreira da Silva, Oak Ridge National Laboratory, USA
• Rosa Filgueira, University of St Andrews, UK
Organisation Committee
Steering Committee