== 12th NLP4CALL, Tórshavn, Faroe Islands==
The workshop series on Natural Language Processing (NLP) for Computer-Assisted Language Learning (NLP4CALL) is a meeting place for researchers working on the integration of Natural Language Processing and Speech Technologies in CALL systems and exploring the theoretical and methodological issues arising in this connection. These issues include, among others, the incorporation of insights from Second Language Acquisition (SLA) research, on the one hand, and the promotion of “Computational SLA” through setting up Second Language research infrastructure(s), on the other.
The intersection of Natural Language Processing (or Language Technology / Computational Linguistics) and Speech Technology with Computer-Assisted Language Learning (CALL) brings “understanding” of language to CALL tools, thus making CALL intelligent. This has given the area of research its name: Intelligent CALL, or ICALL. As the definition suggests, apart from having excellent knowledge of Natural Language Processing and/or Speech Technology, ICALL researchers need good insights into second language acquisition theories and practices, as well as knowledge of second language pedagogy and didactics. This workshop therefore invites a wide range of ICALL-relevant research, including studies where NLP-enriched tools are used for testing SLA and pedagogical theories, and vice versa, where SLA theories, pedagogical practices or empirical data are modeled in ICALL tools.
The NLP4CALL workshop series is aimed at bringing together competences from these areas for sharing experiences and brainstorming around the future of the field.
We welcome papers:
- that describe research directly aimed at ICALL;
- that demonstrate the actual use, or discuss the potential use, of existing Language and Speech Technologies or resources for language learning;
- that describe the ongoing development of resources and tools with potential usage in ICALL, either directly in interactive applications, or indirectly in materials, application or curriculum development, e.g. learning material generation, assessment of learner texts and responses, individualized learning solutions, provision of feedback;
- that discuss challenges and/or a research agenda for ICALL;
- that describe empirical studies on language learner data.
This year a special focus is given to work done on error detection/correction and feedback generation.
We encourage paper presentations and software demonstrations describing the above-mentioned themes primarily, but not exclusively, for the Nordic languages.
==Shared task==
NEW for this year is the MultiGED shared task on token-level error detection for L2 Czech, English, German, Italian and Swedish, organized by the Computational SLA working group.
For more information, please see the Shared Task website: https://github.com/spraakbanken/multiged-2023
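For illustration only (the authoritative data description and evaluation details are on the shared task website above), token-level error detection data of this kind is often distributed in a CoNLL-style tab-separated format, one token and one binary correct/incorrect label per line, with blank lines between sentences. The sketch below assumes such a two-column layout and shows how files like this could be read and scored with a token-level F0.5 over the error class, a common choice for this kind of task.

```python
# Minimal sketch, assuming a CoNLL-style TSV layout (token<TAB>label per line,
# blank line between sentences) with binary labels "c" (correct) / "i" (incorrect).
# The actual MultiGED-2023 format and metric are specified on the shared task website.

def read_ged_file(path):
    """Return a list of sentences, each a list of (token, label) pairs."""
    sentences, current = [], []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:                      # sentence boundary
                if current:
                    sentences.append(current)
                    current = []
                continue
            token, label = line.split("\t")[:2]
            current.append((token, label))
    if current:
        sentences.append(current)
    return sentences


def token_level_f05(gold, pred, positive="i"):
    """F0.5 over the 'incorrect' class, weighting precision over recall."""
    tp = fp = fn = 0
    for g_sent, p_sent in zip(gold, pred):
        for (_, g), (_, p) in zip(g_sent, p_sent):
            if p == positive and g == positive:
                tp += 1
            elif p == positive:
                fp += 1
            elif g == positive:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    beta2 = 0.25  # beta = 0.5
    return ((1 + beta2) * precision * recall / (beta2 * precision + recall)
            if precision + recall else 0.0)
```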
==Invited speakers==
This year, we have the pleasure of announcing two invited talks.
The first talk is given by Marije Michel from the University of Amsterdam.
The second talk is given by Pierre Lison from the Norwegian Computing Center.
==Submission information==
Authors are invited to submit long papers (8-12 pages) or short papers (4-7 pages); the page count does not include references.
We will be using the NLP4CALL template for the workshop this year. The author kit can be accessed below, or alternatively on Overleaf:
<https://spraakbanken.gu.se/sites/default/files/2023/NLP4CALL%20workshop%20t…>
<https://spraakbanken.gu.se/sites/default/files/2023/nlp4call%20template.doc>
<https://www.overleaf.com/latex/templates/nlp4call-workshop-template/qqqzqqy…>
Submissions will be managed through the electronic conference management system EasyChair <https://easychair.org/conferences/?conf=nlp4call2023>. Papers must be submitted digitally through the conference management system, in PDF format. Final camera-ready versions of accepted papers will be given an additional page to address reviewer comments.
Papers should describe original unpublished work or work-in-progress. Papers will be peer reviewed by at least two members of the program committee in a double-blind fashion. All accepted papers will be collected into a proceedings volume to be submitted for publication in the NEALT Proceedings Series (Linköping Electronic Conference Proceedings) and, additionally, published through the ACL Anthology, following the practice of previous NLP4CALL editions (<https://www.aclweb.org/anthology/venues/nlp4call/>).
==Important dates==
03 April 2023: paper submission deadline
21 April 2023: notification of acceptance
01 May 2023: camera-ready papers for publication
22 May 2023: workshop date
==Organizers==
David Alfter (1), Elena Volodina (2), Thomas François (3), Arne Jönsson (4), Evelina Rennes (4)
(1) Gothenburg Research Infrastructure for Digital Humanities, Department of Literature, History of Ideas, and Religion, University of Gothenburg, Sweden
(2) Språkbanken, Department of Swedish, Multilingualism, Language Technology, University of Gothenburg, Sweden
(3) CENTAL, Institute for Language and Communication, Université Catholique de Louvain, Belgium
(4) Department of Computer and Information Science, Linköping University, Sweden
==Contact==
For any questions, please contact David Alfter, david.alfter(a)gu.se
For further information, see the workshop website <https://spraakbanken.gu.se/en/research/themes/icall/nlp4call-workshop-serie…>
Follow us on Twitter @NLP4CALL <https://twitter.com/NLP4CALL/>
Hi there,
Could you please distribute the following job offer? Thanks.
Best,
Pascal
-------------------------------------------------------------------------------------
We invite applications for a 3-year PhD position co-funded by Inria,
the French national research institute in Computer Science and Applied
Mathematics, and LexisNexis France, leader of legal information in
France and subsidiary of the RELX Group.
The overall objective of this project is to develop an automated
system for detecting argumentation structures in French legal
decisions, using recent machine learning-based approaches (i.e. deep
learning approaches). In the general case, these structures take the
form of a directed labeled graph, whose nodes are the elements of the
text (propositions or groups of propositions, not necessarily
contiguous) which serve as components of the argument, and edges are
relations that signal the argumentative connection between them (e.g.,
support, attack). By revealing the argumentation structure behind
legal decisions, such a system will provide a crucial milestone
towards their detailed understanding, their use by legal
professionals, and, above all, will contribute to greater transparency of
justice.
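To make the target structure concrete, the following minimal sketch (illustrative only; the component types, relation labels and example texts are invented and not the project's actual schema) shows one way such a directed labeled argument graph could be represented in code.

```python
# Illustrative sketch of a directed labeled argument graph for a legal decision.
# Component types, relation labels and texts are examples, not the project's schema.
from dataclasses import dataclass, field

@dataclass
class ArgumentComponent:
    comp_id: str
    text: str        # proposition(s), possibly drawn from non-contiguous spans
    comp_type: str   # e.g. "premise" or "conclusion"

@dataclass
class ArgumentGraph:
    nodes: dict = field(default_factory=dict)   # comp_id -> ArgumentComponent
    edges: list = field(default_factory=list)   # (source_id, target_id, relation)

    def add_component(self, comp):
        self.nodes[comp.comp_id] = comp

    def add_relation(self, source_id, target_id, relation):
        # relation is a label such as "support" or "attack"
        self.edges.append((source_id, target_id, relation))

# Example usage
graph = ArgumentGraph()
graph.add_component(ArgumentComponent("c1", "The contract was signed under duress.", "premise"))
graph.add_component(ArgumentComponent("c2", "The contract is therefore void.", "conclusion"))
graph.add_relation("c1", "c2", "support")
```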
The main challenges and milestones of this project start with the
creation and release of a large-scale dataset of French legal
decisions annotated with argumentation structures. To minimize the
manual annotation effort, we will resort to semi-supervised and
transfer learning techniques to leverage existing argument mining
corpora, such as the European Court of Human Rights (ECHR) corpus, as
well as annotations already started by LexisNexis. Another promising
research direction, which is likely to improve over state-of-the-art
approaches, is to better model the dependencies between the different
sub-tasks (argument span detection, argument typing, etc.) instead of
learning these tasks independently. A third research avenue is to find
innovative ways to inject the domain knowledge (in particular the rich
legal ontology developed by LexisNexis) to enrich the
representations used in these models. Finally, we would like to take
advantage of other discourse structures, such as coreference and
rhetorical relations, conceived as auxiliary tasks in a multi-tasking
architecture.
The successful candidate will hold a Master's degree in computational
linguistics, natural language processing, or machine learning, ideally
with prior experience in legal document processing and discourse
processing. Furthermore, the candidate should have strong programming
skills and expertise in machine learning approaches, and be eager to work
at the interface between academia and industry.
The position is affiliated with MAGNET [1], a research group at
Inria, Lille, which has expertise in Machine Learning and Natural
Language Processing, in particular Discourse Processing. The PhD
student will also work in close collaboration with the R&D team at
LexisNexis France, who will provide their expertise in the legal
domain and the data they have collected.
Applications will be considered until the position is filled. However,
you are encouraged to apply early as we shall start processing the
applications as and when they are received. Applications, written in
English or French, should include a brief cover letter with research
interests and vision, a CV (including your contact address, work
experience, publications), and contact information for at least 2
referees. Applications (and questions) should be sent to Pascal Denis
(pascal.denis(a)inria.fr).
The starting date of the position is 1 November 2022 or soon
thereafter, for a total of 3 full years.
Best regards,
Pascal Denis
[1] https://team.inria.fr/magnet/
[2] https://www.lexisnexis.fr/
--
Pascal
----
For an independent, transparent and rigorous evaluation!
I support Inria's Evaluation Committee.
----
+++++++++++++++++++++++++++++++++++++++++++++++
Pascal Denis
Equipe MAGNET, INRIA Lille Nord Europe
Bâtiment B, Avenue Heloïse
Parc scientifique de la Haute Borne
59650 Villeneuve d'Ascq
Tel: +33 3 59 35 87 24
Url: http://researchers.lille.inria.fr/~pdenis/
+++++++++++++++++++++++++++++++++++++++++++++++
Dear colleagues,
Last month, we shared the result of our collaborative work on a core metadata scheme for learner corpora with LCR2022 participants. Our proposal builds on Granger and Paquot's (2017) first attempt to design such a scheme; during our presentation, we explained the rationale for expanding on the initial proposal and discussed selected aspects of the revised scheme.
Our proposal is available at https://docs.google.com/spreadsheets/d/1-RbX5iUCUtCBkZU9Rfk-kv-Vzc--F-eUW2O…
We firmly believe that our efforts to develop a core metadata scheme for learner corpora will only be successful to the extent that (1) the LCR community is given the opportunity to engage with our work in various ways (provide feedback on the general structure of the scheme, the list of variables that we identified as core and their operationalization; test the metadata on other learner corpora; use the scheme to start a new corpus compilation, etc.) and (2) the core metadata scheme is the result of truly collaborative work.
As mentioned at LCR2022, we will be collecting feedback on the metadata scheme until the end of October. The online feedback form is available at:
https://docs.google.com/document/d/1NeDUuxGJlPSJI9wHVA1xgGM-aV8jXTa8Qlb45K-…
We'd like to thank all the colleagues who already got back to us (at LCR2022, by email or via the online form). We also thank them for their appreciation and enthusiasm for our work! We'd also like to encourage more colleagues (and particularly those of you who have experience in learner corpus compilation) to provide feedback! We need help in finalizing the core metadata scheme to make sure that it can be applied in all learner corpus compilation contexts. In short, we need you to make sure the scheme meets the needs of the LCR community at large.
With very best wishes,
Magali Paquot (also on behalf of Alexander König, Jennifer-Carmen Frey, and Egon W. Stemle)
Reference
Granger, S. & M. Paquot (2017). Towards standardization of metadata for L2 corpora. Invited talk at the CLARIN workshop on Interoperability of Second Language Resources and Tools, 6-8 December 2017, University of Gothenburg, Sweden.
Dr. Magali Paquot
Centre for English Corpus Linguistics
Institut Langage et Communication
UCLouvain
https://perso.uclouvain.be/magali.paquot/
CoCo4MT has extended its paper submission deadline to July 16th!
The Second Workshop on Corpus Generation and Corpus Augmentation for
Machine Translation (CoCo4MT) @MT-SUMMIT XIX
The 19th Machine Translation Summit
Sep 4-8, 2023, Macau SAR, China
https://sites.google.com/view/coco4mt
SCOPE
It is a well-known fact that machine translation systems, especially
those that use deep learning, require massive amounts of data, and for
many languages such resources are not available in human-created form.
The types of resources that are available include monolingual and
multilingual corpora, translation memories, and lexicons. Parallel
resources are generally created for formal purposes, such as
parliamentary collections, whereas monolingual resources tend to come
from more informal settings. The quality and abundance of corpora used
for formal purposes are generally higher than of those used for informal
purposes. Additionally, corpora for low-resource languages (languages
with fewer digital resources available) tend to be less abundant and of
lower quality.
CoCo4MT is a workshop centered around research that focuses on manual
and automatic corpus creation, cleansing, and augmentation techniques
specifically for machine translation. We accept work that covers any
language (including sign language) but we are specifically interested in
those submissions that explicitly report on work with languages with
limited existing resources (low-resource languages). Since techniques
from high-resource languages are generally statistical in nature and
could be used as generic solutions for any language, we welcome
submissions on high-resource languages also.
CoCo4MT aims to encourage research on new and undiscovered techniques.
We hope that the methods presented at this workshop will lead to the
development of high-quality corpora that in turn lead to high-performing
MT systems and to new dataset creation for multiple languages. We also
hope that submissions will provide publicly downloadable, high-quality
corpora that can be used to increase machine translation performance,
making the workshop a general point of reference for corpora needs in
the future. The workshop’s success will be measured by the following key
performance indicators:
- Promotes the ongoing increase in the quality of machine translation
systems as measured by standard metrics,
- Provides a meeting place for collaboration from several research areas
to increase the availability of commonly used corpora and new corpora,
- Drives innovation to address the need for higher quality and abundance
of low-resource language data.
Topics of interest include:
- Difficulties with using existing corpora (e.g., political
considerations or domain limitations) and their effects on final MT
systems,
- Strategies for collecting new MT datasets (e.g., via crowdsourcing),
- Data augmentation techniques,
- Data cleansing and denoising techniques,
- Quality control strategies for MT data,
- Exploration of datasets for pretraining or auxiliary tasks for
training MT systems.
SHARED TASK
To encourage research on corpus construction for low-resource machine
translation, we introduce a shared task focused on identifying
high-quality instances that should be translated into a target
low-resource language. Participants are provided access to multi-way
corpora in the high-resource languages of English, Spanish, German,
Korean, and Indonesian, and, using these, are required to identify
beneficial instances that, when translated into the low-resource
languages of Cebuano, Gujarati, and Burmese, lead to high-performing MT
systems. More details on data, evaluation and submission can be found on
the website (https://sites.google.com/view/coco4mt/shared-task) or by
emailing coco4mt-shared-task(a)googlegroups.com.
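Purely as an illustration of the kind of selection heuristic a participant might start from (not an official baseline of the shared task, and ignoring the actual data format), the sketch below greedily picks source sentences that maximise vocabulary coverage, so that a limited translation budget covers as many word types as possible.

```python
# Hedged sketch: greedy vocabulary-coverage selection of instances to translate.
# A real submission would likely use stronger signals (difficulty, domain, length, etc.).

def select_instances(sentences, budget):
    """Pick up to `budget` sentence indices maximizing new-word-type coverage."""
    covered, selected = set(), []
    remaining = list(enumerate(sentences))
    for _ in range(min(budget, len(remaining))):
        best_pos, best_gain = None, -1
        for pos, (idx, sent) in enumerate(remaining):
            gain = len(set(sent.lower().split()) - covered)
            if gain > best_gain:
                best_pos, best_gain = pos, gain
        idx, sent = remaining.pop(best_pos)
        covered |= set(sent.lower().split())
        selected.append(idx)
    return selected

# Example usage
corpus = ["the cat sat on the mat", "a dog barked loudly", "the dog sat quietly"]
print(select_instances(corpus, budget=2))   # indices of the chosen sentences
```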
SUBMISSION INFORMATION
CoCo4MT will accept research, review, or position papers. The length of
each paper should be at least four (4) and not exceed ten (10) pages,
plus unlimited pages for references. Submissions should be formatted
according to the official MT Summit 2023 style templates
(https://www.overleaf.com/latex/templates/mt-summit-2023-template/knrrcnxhkq…).
Accepted papers will be published in the MT Summit 2023 proceedings
which are included in the ACL Anthology and will be presented at the
conference either orally or as a poster.
Submissions must be anonymized and should be made to the workshop using
the Softconf conference management system
(https://softconf.com/mtsummit2023/CoCo4MT). Scientific papers that have
been or will be submitted to other venues must be declared as such, and
must be withdrawn from the other venues if accepted and published at
CoCo4MT. The review will be double-blind.
We would like to encourage authors to cite papers written in ANY
language that are related to the topics, as long as both original
bibliographic items and their corresponding English translations are
provided.
Registration will be handled by the main conference. (To be announced)
IMPORTANT DATES
May 18, 2023 - Call for papers released
May 19, 2023 - Shared task release of train, dev and test data
May 25, 2023 - Shared task release of baselines
June 5, 2023 - Second call for papers
June 20, 2023 - Third and final call for papers
July 16, 2023 - Paper submissions due
July 16, 2023 - Shared task deadline to submit results
July 27, 2023 - Notification of acceptance
July 27, 2023 - Shared task system description papers due
August 03, 2023 - Camera-ready due
September 4-5, 2023 - CoCo4MT workshop
CONTACT
CoCo4MT Workshop Organizers:
coco4mt-2023-organizers(a)googlegroups.com
CoCo4MT Shared Task Organizers:
coco4mt-shared-task(a)googlegroups.com
ORGANIZING COMMITTEE (listed alphabetically)
Ananya Ganesh University of Colorado Boulder
Constantine Lignos Brandeis University
John E. Ortega Northeastern University
Jonne Sälevä Brandeis University
Katharina Kann University of Colorado Boulder
Marine Carpuat University of Maryland
Rodolfo Zevallos Universitat Pompeu Fabra
Shabnam Tafreshi University of Maryland
William Chen Carnegie Mellon University
PROGRAM COMMITTEE (listed alphabetically; tentative)
Abteen Ebrahimi University of Colorado Boulder
Adelani David Saarland University
Ananya Ganesh University of Colorado Boulder
Alberto Poncelas ADAPT Centre at Dublin City University
Anna Currey Amazon
Amirhossein Tebbifakhr University of Trento
Atul Kr. Ojha National University of Ireland Galway
Ayush Singh Northeastern University
Barry Haddow University of Edinburgh
Bharathi Raja Chakravarthi National University of Ireland Galway
Beatrice Savoldi University of Trento
Bogdan Babych Heidelberg University
Constantine Lignos Brandeis University
Dossou Bonaventure Mila Quebec AI Institute
Duygu Ataman New York University
Eleftheria Briakou University of Maryland
Eleni Metheniti Université Toulouse - Paul Sabatier
Jasper Kyle Catapang University of Birmingham
John E. Ortega Northeastern University
Jonne Sälevä Brandeis University
Kalika Bali Microsoft
Katharina Kann University of Colorado Boulder
Kochiro Watanabe The University of Tokyo
Koel Dutta Chowdhury Saarland University
Liangyou Li Huawei
Manuel Mager University of Stuttgart
Maria Art Antonette Clariño University of the Philippines Los Baños
Marine Carpuat University of Maryland
Mathias Müller University of Zurich
Nathaniel Oco De La Salle University
Patrick Simianer Lilt
Rico Sennrich University of Zurich
Rodolfo Zevallos Universitat Pompeu Fabra
Sangjee Dondrub Qinghai Normal University
Santanu Pal Saarland University
Sardana Ivanova University of Helsinki
Shantipriya Parida Silo AI
Shiran Dudy Northeastern University
Surafel Melaku Lakew Amazon
Tommi A Pirinen University of Tromsø
Valentin Malykh Moscow Institute of Physics and Technology
Xing Niu Amazon
Xu Weijia University of Maryland
2nd Call for Abstracts: 1st Workshop on Readability for Low Resourced Languages (RLRL 2023)
Free registration is now open https://bit.ly/3pwUwlG - a few tickets are still available.
Please join us for an exciting online workshop where experts in natural language processing will come together to discuss the latest research and innovative approaches to assessing the readability of low-resource languages. The workshop will take place as a free online event on September 5, 2023, and is being hosted jointly by Lancaster University, Sheffield Hallam University and King Saud University.
We welcome researchers and practitioners to submit presentation abstract proposals of up to 500 words for talks related to the development of a Readability Framework for low-resource languages.
The ultimate goal of the workshop is to discuss best practices and state-of-the-art AI-based approaches to create mathematical representations of expected readability levels at different school grade or cognitive ability levels. The workshop will also focus on utilising classifiers that are intuitive for humans to understand and adjust, enabling the analysis and improvement of the decision-making criteria. We welcome abstracts on work that is still in progress or that does not yet have conclusive results. We encourage authors to share their work at various stages of development to facilitate discussions and collaboration during the workshop.
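As one concrete example of such a mathematical representation (the classic, English-centric Flesch-Kincaid grade level, shown purely for illustration; low-resource languages generally require their own calibrated features and coefficients), a grade-level score can be computed from simple surface statistics:

```python
# Illustrative only: the Flesch-Kincaid grade level, a classic English readability
# formula. Low-resource languages typically need re-calibrated coefficients and
# language-specific syllable and sentence handling.
import re

def count_syllables(word):
    """Very rough vowel-group heuristic; adequate only for a demonstration."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    n_words = max(1, len(words))
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59

# Example usage (very simple text can yield a grade below zero)
print(round(flesch_kincaid_grade("The cat sat on the mat. It was happy."), 2))
```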
Important Dates:
- Due date for workshop abstract submission: August 1, 2023 (extended)
- Notification of abstract acceptance to authors: August 10, 2023
- Workshop date: September 5, 2023 (online event<https://bit.ly/3pwUwlG>)
Keynote speakers:
- Professor Laurence Anthony - Faculty of Science and Engineering at Waseda University, Japan.
- Dr Violetta Cavalli-Sforza - School of Science and Engineering at Al Akhawayn University, Morocco.
- Professor Hend Al-Khalifa - College of Computer and Information Sciences at King Saud University, KSA
- Dr Abdel-Karim Al Tamimi- Computer Science and Software Engineering at Sheffield Hallam University, UK
- Dr Mo El-Haj - School of Computing and Communications at Lancaster University, UK
For the list of speakers, talk titles and abstracts, please visit the workshop's website:
https://wp.lancs.ac.uk/acc/rlrl2023/
The main objectives of the workshop are three-fold:
1- Increase awareness of the importance of readability in low-resource languages and its impact on language learning and literacy.
2- Discuss the challenges of readability in low-resource languages, such as limited resources and lack of standardization, and brainstorm strategies for addressing these challenges.
3- Foster a community of practice among participants, allowing them to share their experiences and best practices for addressing readability issues in low-resource languages.
Abstract submission:
The abstract submission page is now open; please submit abstracts of no more than 500 words at https://easychair.org/conferences/?conf=rlrl2023
Alternatively, you can contact the organisers directly with presentation ideas on topics related to readability or low resourced languages.
Topics of interest include, but are not limited to:
- Machine learning for text readability
- Applications of readability assessment
- Readability in low-resource languages
- Comprehensibility measures
- Mathematical representations of readability levels
- Text simplification for low-resource languages
- Readability and comprehensibility in language learning
- The effects of text simplification on readability
- Readability frameworks for indigenous languages
- Updating readability representations
We look forward to your contributions and to a productive and enlightening workshop on September 5, 2023.
RLRL 2023 Organisers:
- Dr Mo El-Haj (SCC/DSI/UCREL, Lancaster University)
- Dr Abdel-Karim Al Tamimi (CSSE, Sheffield Hallam University)
- Prof. Hend Al Khalifa (iWAN, King Saud University)
https://wp.lancs.ac.uk/acc/rlrl2023/
Best wishes,
Mahmoud
---------------------
Dr Mo El-Haj
Senior Lecturer in NLP
Co-Director of UCREL NLP Group
Strategic Lead of Arabic and Financial NLP Research
Advisory Board of the Natural Language Processing Journal
https://benjamins.com/catalog/nlp
School of Computing and Communications, Lancaster University
https://www.lancaster.ac.uk/staff/elhaj
@DocElhaj<https://twitter.com/DocElhaj>
Call for postdoc applications in Natural Language Processing for the
automatic detection of gender stereotypes in the French media (Grenoble
Alps University, France)
Starting date: flexible, November 30, 2023, at the latest
Duration: full-time position for 12 months
Salary: according to experience (up to 4142€/ month)
Application Deadline: Open until filled
Location: The position will be based in Grenoble, France.
This is not a remote position.
Keywords: natural language processing, gender stereotypes
bias, corpus analysis, language models,
transfer learning, deep learning
*Context* The University of Grenoble Alps (UGA) has an open position for
a highly motivated postdoc researcher to join the multidisciplinary
GenderedNews project. Natural Language Processing models trained on
large amounts of online content have quickly opened new perspectives for
processing large amounts of online content to measure gender bias on a
daily basis (see our project https://gendered-news.imag.fr/
<https://gendered-news.imag.fr/> ). Regarding research on stereotypes,
most recent works have studied Language Models (LM) from a stereotype
perspective by providing specific corpora such as StereoSet (Nadeem et
al., 2020) or CrowS-Pairs (Nangia et al., 2020). However, these studies
focus on quantifying bias in the LM predictions rather than bias in the
original data (Choenni et al., 2021). Furthermore, most of these studies
ignore named entities (Deshpande et al., 2022), which account for an
important part of the referents and speakers in news. In this project,
we intend to build corpora, methods and NLP tools to qualify the
differences between the language used to describe groups of people in
French news.
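For illustration only (this is not the GenderedNews methodology, whose measures are documented on the project site), a very crude proxy for gender imbalance in a news text is the ratio of masculine to feminine third-person pronouns; the hypothetical sketch below counts French pronouns in that spirit.

```python
# Hedged sketch: a crude gender-mention proxy based on French third-person pronouns.
# This is NOT the GenderedNews methodology, only an illustration of the kind of
# daily, corpus-level measurement the project description refers to.
import re
from collections import Counter

MASCULINE = {"il", "ils"}
FEMININE = {"elle", "elles"}

def pronoun_gender_counts(text):
    tokens = re.findall(r"\b\w+\b", text.lower())
    counts = Counter(tokens)
    masc = sum(counts[t] for t in MASCULINE)
    fem = sum(counts[t] for t in FEMININE)
    total = masc + fem
    return {"masculine": masc, "feminine": fem,
            "share_feminine": fem / total if total else None}

# Example usage
print(pronoun_gender_counts(
    "Elle a déclaré que le ministre était absent ; il n'a pas commenté."))
```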
*Main Tasks*
The successful postdoc will be responsible for the day-to-day running of
the research project, under the supervision of François Portet (Prof.
UGA at LIG) and Gilles Bastin (Prof. UGA at PACTE). Regular meetings
will take place every two weeks.
- Defining the dimensions of stereotypes to be investigated and the
possible metrics that can be processed from a machine learning perspective.
- Exploring, managing and curating news corpora in French for
stereotypes investigation, with a view to making them widely available
to the community to favor reproducible research and comparison.
- Studying and developing new computational models to process large
numbers of texts to reveal stereotype bias in news, making use of
pretrained models for the task.
- Evaluating the methods on a curated, focused corpus, applying them to
the unseen real longitudinal corpus, and analyzing the results with the team.
- Preparing articles for submission to peer-reviewed conferences and
journals.
- Organizing progress meetings and liaising between members of the team.
The hired person will interact with PhD students, interns and
researchers who are part of the GenderedNews project. Depending on their
background and interests, and in accordance with the project's
objectives, the hired person will have the possibility to orient the
research in different directions.
*Scientific Environment*
The recruited person will be hosted within the GETALP team of the LIG
laboratory (https://lig-getalp.imag.fr/
<https://lig-getalp.imag.fr/>), which offers a dynamic, international,
and stimulating environment for conducting high-level
multidisciplinary research. The person will have access to large
datasets of French news, GPU servers, to support for missions as well as
to the scientific activities of the labs. The team is housed in a modern
building (IMAG) located in a 175-hectare landscaped campus that
was ranked as the eighth most beautiful campus in Europe by the Times
Higher Education magazine in 2018.
The person will also closely work with Gilles Bastin (PACTE, a Sociology
lab in Grenoble) and Ange Richard (PhD at LIG and PACTE). The project
also includes an informal collaboration with "Prenons la une"
(https://prenonslaune.fr/ <https://prenonslaune.fr/>) a journalists’
association which promotes a fair representation of women in the media.
*Requirements*
The candidate must have a PhD degree in Natural Language Processing or
computer science, or be in the process of acquiring one. The successful
candidate should have:
- Good knowledge of Natural Language Processing
- Experience in corpus collection/formatting and manipulation
- Good programming skills in Python
- A publication record in a closely related field of research
- Willingness to work in multidisciplinary and international teams
- Good communication skills
- A good command of French (required)
*Instructions for applying*
Applications will be considered as they arrive and must be addressed to
François Portet (Francois.Portet(a)imag.fr
<mailto:Francois.Portet@imag.fr>). It is therefore advisable to apply as
soon as possible. The application file should contain:
- A curriculum vitae
- References for potential letter(s) of recommendation
- A one-page summary of research background and interests for the position
- Publications demonstrating expertise in the aforementioned areas
- Pre-defense reports and defense minutes, or a summary of the thesis with
the date of defense for those currently in doctoral studies
*References*
Deshpande et al. (2022). StereoKG: Data-Driven Knowledge Graph
Construction for Cultural Knowledge and Stereotypes. arXiv preprint
arXiv:2205.14036.
Choenni et al. (2021). Stepmothers are mean and academics are
pretentious: What do pretrained language models learn about you? arXiv
preprint arXiv:2109.10052.
Nadeem et al. (2020) StereoSet: Measuring stereotypical bias in
pretrained language models. ArXiv.
Nangia et al. (2020) CrowS-Pairs: A Challenge Dataset for Measuring
Social Biases in Masked Language Models. In EMNLP2020.
--
François PORTET
Professeur - Univ Grenoble Alpes
Laboratoire d'Informatique de Grenoble - Équipe GETALP
Bâtiment IMAG - Office 333
700 avenue Centrale
Domaine Universitaire - 38401 St Martin d'Hères
FRANCE
Phone: +33 (0)4 57 42 15 44
Email:francois.portet@imag.fr
www:http://membres-liglab.imag.fr/portet/
*** Second Workshop on Information Extraction from Scientific Publications
(WIESP) at IJCNLP-AACL 2023 ***
*** Website: https://ui.adsabs.harvard.edu/WIESP/2023/ ***
*** Twitter: https://twitter.com/wiesp_nlp ***
Building on the success of the First WIESP at AACL-IJCNLP 2022, the Second
Workshop on Information Extraction from Scientific Publications (WIESP)
will provide a platform for researchers to foster discussion and research on
information extraction, mining, generation, and knowledge discovery from
scientific publications using Natural Language Processing and Machine
Learning techniques. A lot of technological change has happened in the year
since the 1st WIESP, especially in Generative Artificial Intelligence
research, and we are incorporating a few additional topics to stay abreast
of the latest developments and research in the community. The 2nd iteration
of WIESP will focus on the following topics (but is not limited to them):
- Large Language Models (LLMs) for Science
- Application of LLMs on information extraction, generation, mining and
knowledge discovery from scientific publications
- Probing LLMs for scientific fact checking and misinformation
- Scientific document parsing
- Scientific named-entity recognition
- Scientific article summarization
- Question-answering on scientific articles
- Citation context/span extraction
- Structured information extraction from full-text, tables, figures,
bibliography
- Novel datasets curated from scientific publications
- Argument extraction and mining
- Challenges in information extraction from scientific articles
- Building knowledge graphs via mining scientific literature; querying
scientific knowledge graphs
- Novel tools for IE on scientific literature and interaction with users
- Mathematical information extraction
- Scientific concepts, facts extraction
- Visualizing scientific knowledge
- Bibliometric and Altmetric studies via information extraction from
scientific articles and metadata
In addition to research paper presentations, WIESP will also feature
keynote talks, a panel discussion on “Large Language Models and Scientific
Literature Mining”, and shared tasks. We will update the details on our
website as and when they become available. We especially welcome
participation from academic and research institutions, government and
industry labs, publishers, and information service providers. Projects and
organizations using NLP/ML techniques in their text mining and enrichment
efforts are also welcome to participate. We strongly encourage
participation of students, researchers, and science practitioners from
diverse backgrounds, especially from underrepresented groups and
communities, to be a part of WIESP events, and pro-actively make the
workshop a diverse and inclusive one.
***Call for Papers***
We invite papers of the following categories:
***Long papers*** must describe substantial, original, completed, and
unpublished work. Wherever appropriate, concrete evaluation and analysis
should be included. Papers must not exceed eight (8) pages of content, plus
unlimited pages of references. The final versions of long papers will be
given one additional page of content (up to 9 pages) so that reviewers'
comments can be taken into account.
***Short papers*** must describe original and unpublished work. Please note
that a short paper is not a shortened long paper. Instead, short papers
should have a point that can be made in a few pages, such as a small,
focused contribution, a negative result, or an interesting application
nugget. Short papers must not exceed four (4) pages, plus unlimited pages
of references. The final versions of short papers will be given one
additional page of content (up to 5 pages) so that reviewers' comments can
be taken into account.
In addition to papers, WIESP will also host shared tasks. More details on
the WIESP shared tasks will be available on our website shortly. Also, we
will publish separate CfPs on the shared tasks. Shared task authors will be
invited to write their system descriptions and those will be subjected to
peer review.
***Shared Task: Function of Citation in Astrophysics Literature (FOCAL)***
The citation graph is an essential tool for helping researchers find
relevant literature. To further empower discovery, we aim to label the
edges of the graph with the function of the citation: e.g., is the cited
work necessary background knowledge for the citing work, or is it used as a
comparison to it? To start this process, we propose a shared task of
automatically labeling citations with a function based on the textual
context of the citation. A sample dataset and more instructions can be
found at: https://ui.adsabs.harvard.edu/WIESP/2023/SharedTasks
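As a purely illustrative starting point (not an official baseline, and using invented labels such as "background" and "comparison" rather than the task's actual label set or data format), citation-function labeling can be framed as text classification over the citation context, for example with a TF-IDF plus logistic-regression pipeline:

```python
# Hedged sketch of a citation-function classifier over citation contexts.
# Labels and example contexts are invented for illustration; the real label set
# and data format are defined by the FOCAL shared task.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_contexts = [
    "Following the method of [CIT], we adopt the same spectral fitting procedure.",
    "Our results are consistent with the luminosity function reported by [CIT].",
    "Star formation in dwarf galaxies has been studied extensively [CIT].",
    "We compare our photometric redshifts against those of [CIT].",
]
train_labels = ["uses", "comparison", "background", "comparison"]

# TF-IDF features over uni- and bigrams, then a multiclass logistic regression.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(train_contexts, train_labels)

print(clf.predict(["Previous surveys have catalogued these sources [CIT]."]))
```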
All accepted papers will be published in the WIESP proceedings as part of
IJCNLP-AACL 2023 and indexed in the ACL Anthology.
***Important Dates***
- Paper Submission Deadline: August 25, 2023
- Notification of workshop paper/abstract acceptance: October 2, 2023
- Camera-ready Submission Deadline: October 15, 2023
- Workshop: November 2-4, 2023 (online, final date TBD)
***All submission deadlines are 11.59 pm UTC -12h ("Anywhere on Earth")***
***Submission Website and Format***
Submission Link: TBD (please keep an eye on the website)
Submission will be via softconf. Submissions should follow the ACLPUB
formatting guidelines (https://acl-org.github.io/ACLPUB/formatting.html)
and template files (https://github.com/acl-org/acl-style-files/tree/master).
Submissions (Long and Short Papers) will be subject to a double-blind
peer-review process. We follow the same policies as IJCNLP-AACL 2023
regarding preprints and double submissions. The anonymity period for WIESP
2023 is from July 25 to August 25.
***Organizers***
- Tirthankar Ghosal, National Center for Computational Sciences, Oak Ridge
National Laboratory, USA
- Felix Grezes, Center for Astrophysics | Harvard & Smithsonian, USA
- Thomas Allen, Center for Astrophysics | Harvard & Smithsonian, USA
- Kelly Lockhart, Center for Astrophysics | Harvard & Smithsonian, USA
- Alberto Accomazzi, Center for Astrophysics | Harvard & Smithsonian, USA
--
+++++++++++++++++++++++++++++++++++
Tirthankar Ghosal
https://member.acm.org/~tghosal
++++++++++++++++++++++++++++++++++++
*FinCausal 2023: Financial Document Causality Detection*
We are glad to announce that the Training Dataset for both English and
Spanish has been released and is available on CodaLab at this link:
https://codalab.lisn.upsaclay.fr/competitions/14596
Please register on CodaLab and go to the FinCausal 2023 competition.
Under Participate, you will find the Training Datasets together with a
Starting Kit to guide you through the Task.
###### *Task Description and Important Links *#######
The *FinCausal-2023 Shared Task: “Financial Document Causality Detection”* is
organised within the *5th Financial Narrative Processing Workshop (FNP
2023)*, taking place at the 2023 IEEE International Conference on Big Data
(IEEE BigData 2023) <http://bigdataieee.org/BigData2023/>, Sorrento, Italy,
15-18 December 2023. It is a *one-day event*.
Workshop URL: https://wp.lancs.ac.uk/cfie/fincausal2023/
###### *Additional Information *#######
*Shared Task Description:*
Financial analysis requires not only factual data but also an explanation of
why these data vary: the data state facts, but more knowledge is needed
about how these facts materialised. Furthermore, understanding causality is
crucial in studying decision-making processes.
The *Financial Document Causality Detection Task* (FinCausal) aims at
identifying elements of cause and effect in causal sentences extracted from
financial documents. Its goal is to evaluate which events or chains of
events can cause a financial object to be modified or an event to occur,
given a particular context. In the financial landscape, identifying cause
and effect from external documents and sources is crucial to explain why a
transformation occurs.
Two subtasks are organised this year: the *English FinCausal subtask* and
the *Spanish FinCausal subtask*. This is the first year in which we
introduce a subtask in Spanish.
*Objective*: For both tasks, participants are asked to identify, given a
causal sentence, which elements of the sentence relate to the cause, and
which relate to the effect. Participants can use any method they see fit
(regex, corpus linguistics, entity relationship models, deep learning
methods) to identify the causes and effects.
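Since any method is allowed, from regular expressions to deep learning, here is a deliberately naive, hedged sketch of a connective-based baseline: it splits a sentence on a small set of causal connectives and treats the two sides as candidate cause and effect spans. It is illustrative only and makes no claim about the official data format or evaluation.

```python
# Hedged, deliberately naive baseline: split a causal sentence on a connective
# and treat the two sides as candidate cause / effect spans. Real systems would
# typically use sequence labeling or span extraction models instead.
import re

# A few English connectives; the list and the cause/effect orientation are
# simplifications chosen for illustration.
CONNECTIVES = [
    (r"\bbecause of\b", "effect_first"),
    (r"\bdue to\b", "effect_first"),
    (r"\bas a result of\b", "effect_first"),
    (r"\bled to\b", "cause_first"),
    (r"\bresulting in\b", "cause_first"),
]

def split_cause_effect(sentence):
    for pattern, order in CONNECTIVES:
        match = re.search(pattern, sentence, flags=re.IGNORECASE)
        if match:
            left = sentence[:match.start()].strip(" ,.")
            right = sentence[match.end():].strip(" ,.")
            if order == "effect_first":
                return {"cause": right, "effect": left}
            return {"cause": left, "effect": right}
    return {"cause": "", "effect": ""}

# Example usage
print(split_cause_effect("Revenue fell by 12% due to weaker demand in Europe."))
# -> {'cause': 'weaker demand in Europe', 'effect': 'Revenue fell by 12%'}
```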
*English FinCausal subtask*
- *Data Description: *The dataset has been sourced from various 2019
financial news articles provided by Qwam, along with additional SEC data
from the Edgar Database. Additionally, we have augmented the dataset from
FinCausal 2022, adding 500 new segments. Participants will be provided with
a sample of text blocks extracted from financial news and already labelled.
- *Scope:* The *English FinCausal subtask* focuses on detecting causes
and effects when the effects are quantified. The aim is to identify, in
a causal sentence or text block, the causal elements and the consequential
ones. Only one causal element and one effect are expected in each segment.
- *Length of Data fragments:* The *English FinCausal subtask* segments
are made up of up to three sentences.
- *Data format: *CSV files. Datasets for both the English and the
Spanish subtasks will be presented in the same format.
This shared task focuses on determining causality associated with a
quantified fact. An event is defined as the arising or emergence of a new
object or context regarding a previous situation. So, the task will
emphasise the detection of causality associated with the transformation of
financial objects embedded in quantified facts.
*Spanish FinCausal subtask*
- *Data Description: *The dataset has been sourced from a corpus of
Spanish financial annual reports from 2014 to 2018. Participants will be
provided with a sample of text blocks extracted from financial news,
labelled through inter-annotator agreement.
- *Scope:* The *Spanish FinCausal subtask* aims to detect all types of
causes and effects, not necessarily limited to quantified effects. The
aim is to identify, in a paragraph, the causal elements and the
consequential ones. Only one causal element and one effect are expected in
each paragraph.
- *Length of Data fragments:* The *Spanish FinCausal subtask* involves
complete paragraphs.
- *Data format: *CSV files. Datasets for both the English and the
Spanish subtasks will be presented in the same format.
This shared task focuses on determining causality associated with both
events and quantified facts. For this task, a cause can be the justification
for a statement or the reason that explains a result. This task is also a
relation detection task.
Best regards,
FinCausal 2023 Team
The Data Science Chair at JMU Würzburg as a member of the Center for AI and Data Science (CAIDAS) offers two positions for doctoral researchers (m/w/d) in the area of machine learning.
Both positions will work within the BigData@Geo2 project, the follow-up to the successful BigData@Geo project [1], which provides machine-learning-aided decision support for agricultural measures in the light of regional climate change. This includes prediction of crop yields and enabling proactive agricultural strategies.
In the first position, you will build climate models improved by machine and deep learning that provide a basis for the prediction of regional climate change and agricultural risk assessment, allowing agriculture to react in time by applying appropriate policies to deal with the challenge of changing climate-related conditions. This work focuses on the use and extension of state-of-the-art deep learning architectures such as transformers to solve important downstream tasks such as increasing climate model resolution, identifying relevant climate indicators, integrating additional ecosystem information, and transfer functions.
The second position focuses on natural language processing and will allow you to work on data from many small companies in the form of historical yearbooks, as well as general information from local newspapers or social media discussing local climate events. Using this data, you will develop new methods for discovering climate, ecosystem and agriculturally relevant events that assist in the overarching goal of BigData@Geo2 of assessing the economic viability of agricultural decisions, such as which crops to grow in future seasons, or predicting crop yield.
Payment is at the level of E13 according to the German federal wage agreement scheme (TV-L). Candidates are expected to have a strong background in computer science and mathematics, with a specialisation in machine learning and interest in the topic of one of the positions. Prior knowledge in the field of deep learning in one of the subject areas is advantageous.
Please send your application (letter of motivation, curriculum vitae, academic records) at your earliest convenience, but no later than August 25th, 2023, to Prof. Dr. Andreas Hotho (dmir-jobs(a)uni-wuerzburg.de). You are welcome to contact us at the same address for additional details.
[1] https://bigdata-at-geo.eu/