*Asia Pacific Journal of Corpus Research (APJCR) is now available online:*
http://icr.or.kr/ejournals-apjcr
*The Incredible Shrinking Noun Phrase: Ongoing Change in Japanese Word
Formation*
Kevin Heffernan (Kwansei Gakuin University), JAPAN; Yusuke Imanishi
(Kwansei Gakuin University), JAPAN
DOI: https://doi.org/10.22925/apjcr.2023.4.1.1
________________________________________
*Identifying Key Grammatical Errors of Japanese English as a Foreign
Language Learners in a Learner Corpus: Toward Focused Grammar Instruction
with Data-Driven Learning*
Atsushi Mizumoto (Kansai University), JAPAN; Yoichi Watari (Chukyo
University), JAPAN
DOI: https://doi.org/10.22925/apjcr.2023.4.1.25
________________________________________
*A Comparison of the Constructions Make / Take a Decision in Malaysian
English with the Supervarieties*
Christina Sook Beng Ong (Wawasan Open University), MALAYSIA
DOI: https://doi.org/10.22925/apjcr.2023.4.1.43
________________________________________
*Effects of Corpus Use on Error Identification in L2 Writing*
Yoshiho Satake (Aoyama Gakuin University), JAPAN
DOI: https://doi.org/10.22925/apjcr.2023.4.1.61
---
*CK Jung BEng(Hons) Birmingham MSc Warwick EdD Warwick Cert Oxford*
Associate Professor | Department of English Language and Literature,
Incheon National University, *South Korea*
President | The Korea Association of Secondary English Education, *South
Korea *(http://kasee.org)
Vice President | The Korea Association of Primary English Education, *South
Korea *(http://kapee.or.kr)
Director | Institute for Corpus Research, Incheon National University, *South
Korea* (http://icr.or.kr)
Editor-in-Chief | Asia Pacific Journal of Corpus Research, ICR,
*International* (http://icr.or.kr/apjcr)
Editorial Board | Corpora, Edinburgh University Press, *UK*
Editorial Board | English Today, Cambridge University Press, *UK*
E: ckjung(a)inu.ac.kr / T: +82 (0)32 835 8129
H(EN): http://ckjung.org
== 12th NLP4CALL, Tórshavn, Faroe Islands==
The workshop series on Natural Language Processing (NLP) for Computer-Assisted Language Learning (NLP4CALL) is a meeting place for researchers working on the integration of Natural Language Processing and Speech Technologies in CALL systems and exploring the theoretical and methodological issues arising in this connection. The latter includes, among others, integrating insights from Second Language Acquisition (SLA) research, on the one hand, and promoting the development of “Computational SLA” through setting up Second Language research infrastructure(s), on the other.
The intersection of Natural Language Processing (or Language Technology / Computational Linguistics) and Speech Technology with Computer-Assisted Language Learning (CALL) brings “understanding” of language to CALL tools, thus making CALL intelligent. This has given the area of research its name: Intelligent CALL, ICALL. As the definition suggests, apart from having excellent knowledge of Natural Language Processing and/or Speech Technology, ICALL researchers need good insights into second language acquisition theories and practices, as well as knowledge of second language pedagogy and didactics. This workshop therefore invites a wide range of ICALL-relevant research, including studies where NLP-enriched tools are used for testing SLA and pedagogical theories, and vice versa, where SLA theories, pedagogical practices or empirical data are modeled in ICALL tools.
The NLP4CALL workshop series is aimed at bringing together competences from these areas for sharing experiences and brainstorming around the future of the field.
We welcome papers:
- that describe research directly aimed at ICALL;
- that demonstrate actual or discuss the potential use of existing Language and Speech Technologies or resources for language learning;
- that describe the ongoing development of resources and tools with potential usage in ICALL, either directly in interactive applications, or indirectly in materials, application or curriculum development, e.g. learning material generation, assessment of learner texts and responses, individualized learning solutions, provision of feedback;
- that discuss challenges and/or research agendas for ICALL;
- that describe empirical studies on language learner data.
This year a special focus is given to work done on error detection/correction and feedback generation.
We encourage paper presentations and software demonstrations describing the above-mentioned themes primarily, but not exclusively, for the Nordic languages.
==Shared task==
NEW for this year is the MultiGED shared task on token-level error detection for L2 Czech, English, German, Italian and Swedish, organized by the Computational SLA working group.
For more information, please see the Shared Task website: https://github.com/spraakbanken/multiged-2023
==Invited speakers==
This year, we have the pleasure to announce two invited talks.
The first talk is given by Marije Michel from the University of Amsterdam.
The second talk is given by Pierre Lison from the Norwegian Computing Center.
==Submission information==
Authors are invited to submit long papers (8-12 pages) or short papers (4-7 pages); the page count does not include references.
We will be using the NLP4CALL template for the workshop this year. The author kit can be accessed at the links below or on Overleaf:
<https://spraakbanken.gu.se/sites/default/files/2023/NLP4CALL%20workshop%20t…>
<https://spraakbanken.gu.se/sites/default/files/2023/nlp4call%20template.doc>
<https://www.overleaf.com/latex/templates/nlp4call-workshop-template/qqqzqqy…>
Submissions will be managed through the electronic conference management system EasyChair <https://easychair.org/conferences/?conf=nlp4call2023>. Papers must be submitted digitally through the conference management system, in PDF format. Final camera-ready versions of accepted papers will be given an additional page to address reviewer comments.
Papers should describe original unpublished work or work-in-progress. Papers will be peer reviewed by at least two members of the program committee in a double-blind fashion. All accepted papers will be collected into a proceedings volume to be submitted for publication in the NEALT Proceedings Series (Linköping Electronic Conference Proceedings) and, additionally, double-published through the ACL anthology, following experiences from the previous NLP4CALL editions (<https://www.aclweb.org/anthology/venues/nlp4call/>).
==Important dates==
03 April 2023: paper submission deadline
21 April 2023: notification of acceptance
01 May 2023: camera-ready papers for publication
22 May 2023: workshop date
==Organizers==
David Alfter (1), Elena Volodina (2), Thomas François (3), Arne Jönsson (4), Evelina Rennes (4)
(1) Gothenburg Research Infrastructure for Digital Humanities, Department of Literature, History of Ideas, and Religion, University of Gothenburg, Sweden
(2) Språkbanken, Department of Swedish, Multilingualism, Language Technology, University of Gothenburg, Sweden
(3) CENTAL, Institute for Language and Communication, Université Catholique de Louvain, Belgium
(4) Department of Computer and Information Science, Linköping University, Sweden
==Contact==
For any questions, please contact David Alfter, david.alfter(a)gu.se
For further information, see the workshop website <https://spraakbanken.gu.se/en/research/themes/icall/nlp4call-workshop-serie…>
Follow us on Twitter @NLP4CALL <https://twitter.com/NLP4CALL/>
[Apologies for cross-posting]
Dear colleagues
We are inviting submissions for the next issue of Asia Pacific Journal of
Corpus Research, to appear on 31 December 2023.
*ABOUT*
The Asia Pacific Journal of Corpus Research (APJCR, e-ISSN
2733-8096, DOI: https://doi.org/10.22925/apjcr) is an international and
interdisciplinary peer-reviewed journal intended to explore corpus research
in the Asia Pacific region. APJCR addresses areas of methodological,
applied and theoretical work in the field of corpus research. Examples
include discourse analysis, lexical studies, grammatical studies,
language acquisition, language learning, language education, lexicography,
pragmatics, sociolinguistics, (machine) translation studies, (digital)
literary studies, computational linguistics, speech, phonetics, deep
learning and natural language understanding in conjunction with corpora.
*NO ARTICLE PROCESSING CHARGE*
APJCR does not charge authors an Article Processing Fee (APF).
*OPEN ACCESS POLICY*
APJCR provides open access to its content on the principle that making
research freely available to the public supports a greater global exchange
of knowledge.
*SUBMISSION*
Papers (in English or Korean) should be sent to apjcreditor(a)icr.or.kr
Full instructions can be found at http://icr.or.kr/apjcr
*IMPORTANT DATES*
- Manuscript submission: 15 October 2023
- First decision (articles assessed by editors): October 2023
- Final decision: November 2023
- Production: December 2023
- Online publication: 31 December 2023
*APJCR ARCHIVE*
- Google Scholar:
https://scholar.google.co.kr/scholar?hl=ko&as_sdt=0%2C5&q=apjcr&btnG=
- KoreaScience: http://koreascience.or.kr/journal/CPSOBX/v1n1.page
*ENQUIRIES*
help(a)icr.or.kr
*SECOND CALL FOR PAPERS: EACL 2024 STUDENT RESEARCH WORKSHOP*
Student Research Workshop co-located with EACL 2024 in St. Julians, Malta.
Workshop Dates: March 21-22, 2024
Paper Submission Deadline: December 18, 2023 (Direct) and January 17,
2024 (through ARR)
About the Student Research Workshop
The EACL 2024 Student Research Workshop (SRW) is a forum to bring
together students investigating various areas of Computational
Linguistics and Natural Language Processing. The workshop provides an
excellent opportunity for participants to present their work and to
receive mentorship and valuable feedback from the international research
community.
The workshop's goal is to aid students at multiple stages of their
education, including undergraduate, MSc/MA, junior and senior PhD
students, in getting familiar with conducting and presenting their
research.
General Invitation for Submission
We invite papers in two different categories:
* Thesis Proposals: This category is appropriate for PhD students who
  have decided on a thesis topic and wish to get feedback on their
  proposal and broader ideas for their continuing work.
* Research Papers: Papers in this category can describe completed
  work, or work in progress with preliminary results. For these
  papers, the first author MUST BE a current student (graduate or
  undergraduate). Topics of interest for the SRW are the same as for
  the main EACL 2024 conference: https://2024.eacl.org/calls/papers/
We are opening a unique opportunity for the submission of research
papers that, while not accepted to the EACL main conference, align well
with the themes of this workshop. To be eligible for submission, the
first author must be a current student. Additionally, submissions should
be complemented with the reviews from ARR to provide context and
insights for evaluation. The submission deadline for this will be
January 17, 2024.
Why Submit to EACL SRW?
* Mentorship program: EACL SRW provides a unique opportunity for
  students to receive constructive feedback and advice from more
  senior researchers through our on-site mentorship program.
* Improving your publication record: Publishing a paper as an
  undergraduate or as an MSc/MA student is beneficial when applying for
  a PhD program. Publishing a paper in the EACL SRW can be
  really helpful for improving students’ publication records.
* Negative results: We encourage the submission of studies with
  negative results providing insights on why and in which scenarios a
  particular method fails.
All accepted papers and thesis proposals will be presented in the main
conference poster sessions, which will give students an opportunity to
interact with and to present their work to a large and diverse audience,
including top researchers in the field and assigned mentors.
*Important Dates*
* Direct workshop paper submission: December 18, 2023
* Pre-reviewed ARR paper submission: January 17, 2024
* Notification of acceptance: January 20, 2024
* Camera-ready deadline: January 30, 2024
* Workshop dates: March 21-22, 2024
All deadlines are 11:59PM UTC-12:00 ("anywhere on Earth").
Submission Requirements
We accept both archival submissions (which will be included in the
conference proceedings) and non-archival submissions (which will be
presented at the workshop but will not be included in the proceedings).
The archival submissions must follow the anonymity period and the
restrictions of the main conference.
Short papers consist of up to four (4) pages of content, plus unlimited
references. Upon acceptance, they will be given five (5) content pages
in the proceedings.
Long papers consist of up to eight (8) pages of content, plus unlimited
references. Upon acceptance, they will be given nine (9) content pages
in the proceedings.
Thesis proposals consist of up to eight (8) pages of content, plus
unlimited references. The title must begin with “Thesis Proposal:”. Upon
acceptance, they will be given nine (9) content pages in the proceedings.
We strongly recommend the use of the official ARR style templates. The
paper templates are available as an Overleaf template and can also be
downloaded directly (LaTeX and Word) via
https://aclrollingreview.org/cfp under
'Paper Submission and Templates'.
All submissions must be in PDF format. Submissions that do not adhere to
the above author guidelines or ACL policies will be rejected without
review.
Submission is electronic, using the OpenReview conference management
system. The submission link is available here:
https://openreview.net/group?id=eacl.org/EACL/2024/Workshop/SRW
Grants
We expect to have grants to offset some portion of students' travel,
conference registration, and accommodation expenses. Further details
will be posted on the SRW website.
Website and Contact Information
For more information, please visit
https://sites.google.com/view/eacl2024srw and follow us on Twitter
@eacl_srw. To contact the organizers of the workshop, please email us at
eaclsrw(a)gmail.com
Third Call for Papers: CALD-pseudo workshop on Computational Approaches to Language Data Pseudonymization @ EACL 2024, March 21 or 22, 2024
Website:
https://mormor-karl.github.io/events/CALD-pseudo/
Submission website: https://softconf.com/eacl2024/CALD-pseudo-2024/
Submission Deadline: Monday, 18 December 2023 (anywhere on earth)
We invite submissions to the first edition of the CALD-pseudo workshop on Computational Approaches to Language Data Pseudonymization, to be held at EACL 2024 on March 21 or 22, 2024.
[Important Dates]
* December 18, 2023: paper submission deadline
* January 17, 2024: resubmission of already pre-reviewed ARR papers
* January 20, 2024: notification of acceptance
* January 30, 2024: camera-ready papers due
* March 21 or 22, 2024: workshop date (the date to be confirmed by the EACL)
[Introduction]
Accessibility of research data is critical for advances in many research fields, but textual data often cannot be shared due to the personal and sensitive information it contains, e.g. names, political opinions, sensitive personal information and medical data. The General Data Protection Regulation, GDPR (EU Commission, 2016), suggests pseudonymization as a solution to secure open access to research data, but we need to learn more about pseudonymization as an approach before adopting it for manipulation of research data (Volodina et al., 2023). The main challenge is how to effectively pseudonymize data so that individuals cannot be identified, while at the same time keeping the data usable for the research for which it was collected, in computational linguistics, linguistics and natural language processing, among other fields.
[Topics of Interest]
The CALD-pseudo workshop invites a broad community of researchers in all concerned cross-disciplinary fields to jointly discuss challenges within pseudonymization, such as
* automatic approaches to detection and labelling of personal information in unstructured language data, including events and other context-dependent cues revealing a person;
* developing context-sensitive algorithms for replacement of personal information in unstructured data;
* studies into the effects of pseudonymization on unstructured data, e.g. applicability of pseudonymised data for the intended research questions, readability of pseudonymised data or addition of unwelcome biases through pseudonymization;
* effectiveness of pseudonymization as a way of protecting writer identity;
* reidentification studies, e.g. adversarial learning techniques that attempt to breach the privacy protections of pseudonymized data;
* constructing datasets for automatic pseudonymization, including methodological and ethical aspects of those;
* approaches to the evaluation of automatic pseudonymization both in concealing the private information and preserving the semantics of the non-personal data;
* pseudonymization tools and software: evaluating the available tools and software for pseudonymization in different languages, and their ease of use, scalability, and performance;
* and numerous other open questions.
[Submission Guidelines]
Authors are invited to submit by December 18, 2023 original and unpublished research papers in the following categories:
* Full papers (up to 8 pages) for substantial contributions
* Short papers (up to 4 pages) for ongoing or preliminary work
All submissions must be in PDF format, must follow the EACL 2024 guidelines described in the ARR CfP (https://aclrollingreview.org/cfp), and use the official ACL style templates available here: https://github.com/acl-org/acl-style-files
Direct submission deadline: December 18, 2023 at https://softconf.com/eacl2024/CALD-pseudo-2024/
Deadline for registration of ARR reviewed papers: January 17, 2024. (Further instructions will follow.)
We also invite authors of papers on the topics of the workshop accepted to Findings to reach out to the organizing committee of CALD-pseudo to present them at the workshop.
[Invited speakers]
We are happy to announce that the workshop will host two invited speakers:
* Anders Søgaard, University of Copenhagen, Denmark
* Ildikó Pilán, the Norwegian Computing Center, Norway
[Workshop Organizers]
* Elena Volodina, University of Gothenburg, Sweden
* Therese Lindström Tiedemann, University of Helsinki, Finland
* Simon Dobnik, University of Gothenburg, Sweden
* Xuan-Son Vu, Umeå University, Sweden
[Program Committee]
A list of program committee members is available on the workshop website.
[Contact]
For inquiries, please contact mormor.karl(a)svenska.gu.se
ACL link to the call: https://www.aclweb.org/portal/content/computational-approaches-language-dat…
___________________
Elena Volodina, PhD, Docent
https://spraakbanken.gu.se/en/about/staff/elena
Life is like a mirror. Smile at it and it smiles back at you.
Peace Pilgrim
Apologies for cross-posting.
---------------------------------------------------------------------------
7th Workshop on Indian Language Data: Resources and Evaluation (WILDRE)
Venue: Lingotto Conference Centre - Torino, Italy (Organized under
LREC-COLING 2024 (20-25 May 2024))
Website: http://sanskrit.jnu.ac.in/conf/wildre7
WILDRE-7, the 7th Workshop on Indian Language Data: Resources and
Evaluation is proposed to be organised in Lingotto Conference Centre -
Torino, Italy under the LREC-COLING platform. India has a huge linguistic
diversity and has seen concerted efforts from the Indian government and
industry to develop language resources. European Language Resource
Association (ELRA) and its associate organizations have been very active
and successful in addressing the challenges and opportunities related to
language resource creation and evaluation. It is therefore a big
opportunity for resource creators of Indian languages to showcase their
work on this platform and also to interact and learn from those involved in
similar initiatives all over the world. The broader objectives of
WILDRE are:
- To map the status of Indian Language Resources
- To investigate challenges related to creating and sharing various levels of
  language resources
- To promote a dialogue between language resource developers and users
- To provide an opportunity for researchers from India to collaborate with
  researchers from other parts of the world
Dates for Short/Long papers and Posters and Demos (tentative)
February 28, 2024: Paper submissions due
March 28, 2024: Notification of acceptance
April 10, 2024: Camera-ready papers due
SUBMISSIONS
Papers must describe original, unpublished work, either completed or in progress.
Each submission will be reviewed by three program committee members.
Accepted papers will be given up to 10 pages (for full papers) or 5 pages (for
short papers and posters) in the workshop proceedings, and will be
presented as an oral paper or poster.
Papers should be formatted according to the LREC-COLING style sheet, which
is provided on the LREC-COLING 2024 website (
https://lrec-coling-2024.org/authors-kit/). Papers should be submitted in
PDF format to the LREC-COLING website (
https://softconf.com/lrec-coling2024/wildre-7/)
We are seeking submissions in the following categories:
- Full papers (10 pages)
- Short papers (work in progress: 5 pages)
- Posters (innovative ideas/proposals, research proposals of students)
- Demos (of working online/standalone systems)
WILDRE-7 will have a special focus on Demos of Indian Language Technology.
In the past few years, as more resources have been developed and made
available, there has been increased activity in developing usable
technology with them. WILDRE-7 would like to encourage and widen the Demo
track to allow the community to showcase their demos and have mutually
beneficial interactions with each other as well as resource developers.
WILDRE-7 is seeking full papers, short papers, posters and demos on the
following topics related to Indian Language Resources:
- Digital Humanities, heritage computing
- Corpora - text, speech, multimodal, methodologies, annotation and tools
- Lexicons and Machine-readable dictionaries
- Ontologies, Grammars
- Language resources for NLP/ IR/Speech tasks, tools and Infrastructure for
  language resources
- Standards or specifications for language resources application
- Licensing and copyright issues
- Data mining
- Text summarization
Both submission and review processes will be handled electronically. The
review process will be double-blind. The workshop website will provide the
submission guidelines and the link for the electronic submission.
When submitting a paper from the START page, authors will be asked to
provide essential information about resources (in a broad sense, i.e.
technologies, standards, evaluation kits, etc.) that have been used for the
work described in the paper or are a new result of your research. Moreover,
ELRA encourages all LREC-COLING authors to share the described LRs (data,
tools, services, etc.), to enable their reuse, and replicability of
experiments, including evaluation ones, etc.
For further information on this initiative, please refer to
https://lrec-coling-2024.org/
Shared Task
Following the success of the five WILDRE workshops, WILDRE-7 will include
Code-mixed Less-Resourced Sentiment Analysis (Code-mixed) and Discourse
Machine Translation (DiscoMT) Shared Tasks. The organizers of shared tasks
will provide datasets and evaluation platforms to evaluate systems
developed by the participants. For further information on this initiative,
please refer to http://sanskrit.jnu.ac.in/conf/wildre7
Workshop Organisers
Girish Nath Jha, Jawaharlal Nehru University, India
Kalika Bali, Microsoft Research India Lab, Bangalore, India
Sobha L, AU-KBC, Anna University, Chennai, India
Atul Kr. Ojha, University of Galway, Ireland & Panlingua Language
Processing LLP, India
Workshop contact:
Atul Kr. Ojha, University of Galway, Ireland & Panlingua Language
Processing LLP, India, shashwatup9k(a)gmail.com
Identify, Describe and Share your LRs
Describing your LRs in the LRE Map is now a normal practice in the
submission procedure of LREC (introduced in 2010 and adopted by other
conferences). To continue the efforts initiated at LREC 2014 about “Sharing
LRs” (data, tools, web services, etc.), authors will have the possibility,
when submitting a paper, to upload LRs in a special LREC repository. This
effort of sharing LRs, linked to the LRE Map for their description, may
become a new “regular” feature for conferences in our field, thus
contributing to creating a common repository where everyone can deposit and
share data.
As scientific work requires accurate citations of referenced work to allow
the community to understand the whole context and also replicate the
experiments conducted by other researchers, LREC-COLING 2024 endorses the
need to uniquely identify LRs through the use of the International Standard
Language Resource Number (ISLRN, www.islrn.org), a Persistent Unique
Identifier to be assigned to each Language Resource. The assignment of
ISLRNs to LRs cited in LREC-COLING papers will be offered at submission
time.
--
Thanks,
Atul
I will start a new research group on natural language processing as part
of the Bamberg AI Center (https://www.uni-bamberg.de/en/bacai/).
We do fundamental NLP research at the intersection with computational
psychology, digital humanities, and computational social sciences.
We currently have four open positions (deadline February 28, 2024):
1. Postdoc, Open Topic (3 years)
2. PhD student in interactive prompt optimization (3 years)
3. Researcher in event-centered emotion analysis (1 year)
4. Researcher in multimodal emotion analysis (1 year)
Positions 3 and 4 can be combined into a single 2-year position.
Please find more details at
https://www.bamnlp.de/openpositions/
Do not hesitate to contact me if you have questions!
Roman Klinger
Applications are invited for a Postdoctoral Researcher position within the
project “Polyglot Machines: Human-like Learning of Morphologically Rich
Languages”, financed by an NWO-VIDI Talent Grant and coordinated by
Principal Investigator (PI) dr. Arianna Bisazza. This is an
interdisciplinary project at the intersection of Computational
Linguistics/Natural Language Processing (NLP), Computational
Psycholinguistics and Language Acquisition.
Despite the impressive advances made possible by neural networks, current
NLP systems are still far from displaying the learning abilities of humans
in many languages. By contrast, children around the world acquire extremely
diverse languages in comparable time spans and from considerably less
linguistic input than that required by neural models.
This project aims to improve language modeling for low-resource
morphologically rich languages, taking inspiration from child language
acquisition insights. Among other methodologies, an artificial language
learning paradigm will be used to simulate the learning of typologically
diverse languages and evaluate the effect of known child-directed language
properties on the acquisition of morphology and other language aspects.
You will be carrying out your research in the context of the Computational
Linguistics group, which is part of the Centre for Language and Cognition
of the University of Groningen, The Netherlands.
An important part of your work will be conducted together with the PI and
the PhD student that will be hired for the same project. Collaboration is
also possible with other PhD students supervised by the PI, as well as
other members of CLCG.
Main requirement: A PhD degree in any area related to the tasks (such as
Computational Linguistics, Computational Psycholinguistics or Language
Acquisition).
Find more details and apply here by 15 February 2024:
https://www.rug.nl/about-ug/work-with-us/job-opportunities/?details=00347-0…
Starting date: Negotiable. Ideally 1 September 2024. The appointment will
be for a specified period of 1 year, renewable for up to 2 more years (so
up to 3 years in total) following positive evaluation.
For questions about the position, contact A. Bisazza at a.bisazza(a)rug.nl
(do not use email for applications).
--
Arianna Bisazza
Associate Professor
University of Groningen
http://www.cs.rug.nl/~bisazza
Dear all,
The 5th International Workshop on Computational Approaches to Historical
Language Change (https://www.changeiskey.org/event/2024-acl-lchange/,
co-located with ACL'24) is hosting a shared task on _explainable
semantic change modeling_: AXOLOTL-24.
AXOLOTL-24 stands for "Ascertain and eXplain Overhauls of the Lexicon
Over Time at LChange'24" and you are welcome to participate!
https://github.com/ltgoslo/axolotl24_shared_task will serve as the main
information hub for the shared task. Examples of the datasets, processing
and evaluation scripts, etc. will appear in this GitHub repository in due
time according to the timeline below.
If you are interested in AXOLOTL-24, please also join our Google Group:
https://groups.google.com/g/axolotl-24/
========
Timeline
========
- February 1 2024 - training data published
- March 25 2024 - test data published
- April 9 2024 - deadline for submission of the systems’ predictions
- April 10 2024 - AXOLOTL'24 test results published
- May 10 2024 - paper submission deadline (same procedure as with other
LChange'24 papers)
============
Introduction
============
This shared task builds on the existing tradition of competitions in
diachronic semantic change detection, like (Schlechtweg et al., 2020) and
many others. However, this time we focus on explaining diachronic
semantic changes, even if on a very basic level (for now).
In particular, we challenge the participants to implement a semantic
change modeling system which, given two historical corpora and a sense
inventory corresponding to one of the periods, is able to:
1. Find the target word usages associated with new, gained senses
2. Describe these senses in a way that facilitates understanding and
lexicographical research.
Thus, the task is to identify which exact senses were gained between two
time periods and generate reasonable descriptions (definitions) of these
senses.
To be able to use high-quality gold data, we use a simplified setup
where instead of asking the participants to retrieve and analyze all
target word usages in raw corpora, we provide two manually checked sets
of usage examples (still of considerable size). Below, we still call
them "corpora", for clarity.
The shared task will feature Finnish and Russian data,
but you do not have to speak these languages to participate. There will
also be a surprise language with a smaller dataset at the test stage. For all
these languages, we will use gold, manually annotated data to evaluate
the predictions of the participant systems.
The shared task will consist of two subtasks. The participants are
welcome to choose either one of them or both.
===============================
Subtask 1. Bridging diachronic word uses and a synchronic dictionary
===============================
The participants are offered two corpora, belonging to different time
periods. In addition to this, they are provided with a set of dictionary
entries (sense inventories) for the target words describing their senses
in the first time period (accompanied by definitions). The task is to
find all usages of the target words belonging to newly gained senses,
i.e., senses not covered by the provided sense inventory.
The assumption is that sense definitions from the dictionary, even
though not always covering all word senses even from the same time
period, may still be a useful additional source of information. The goal
is to map word usages to the dictionary senses. This is very similar to
Word Sense Disambiguation, with the difference being that the usages
corresponding to word senses absent from the dictionary should be
grouped into novel sense clusters (this is more similar to Word Sense
Induction). In a way, this subtask is a mixture of WSD and WSI.
- Inputs: a set of target words, two sets of usages for each target word
(a usage is a text fragment containing a target word); target word
dictionary entries with sense ids for the first of two time periods.
- Predictions: sense id for every word usage of the second time period
(either re-using an id from the provided dictionary or adding a novel one).
- Metrics: Adjusted Rand Index (ARI) for all usages and macro-F1 for
usages with existing senses
- Ground truth: manually annotated sense inventories
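For readers who want to sanity-check their Subtask 1 predictions locally, the two metrics named above can be sketched in plain Python. This is only an illustration, not the official scorer: the function names, the flat list-of-labels input and the way "usages with existing senses" are selected (by gold label) are assumptions made for the example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(gold, pred):
    """ARI between two labelings of the same usages; label names are arbitrary."""
    total = comb(len(gold), 2)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(gold, pred)).values())
    sum_gold = sum(comb(c, 2) for c in Counter(gold).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_gold * sum_pred / total if total else 0.0
    max_index = (sum_gold + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings form a single cluster
    return (sum_cells - expected) / (max_index - expected)


def macro_f1_known(gold, pred, known_senses):
    """Macro-F1 over dictionary senses, restricted to usages whose gold sense
    is in the dictionary (assumed reading of 'usages with existing senses')."""
    idx = [i for i, g in enumerate(gold) if g in known_senses]
    scores = []
    for sense in sorted({gold[i] for i in idx}):
        tp = sum(1 for i in idx if gold[i] == sense and pred[i] == sense)
        fp = sum(1 for i in idx if gold[i] != sense and pred[i] == sense)
        fn = sum(1 for i in idx if gold[i] == sense and pred[i] != sense)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy example: five usages of one target word; "s1"/"s2" come from the
# dictionary, "new" is a gold novel sense and "n1" a predicted novel cluster.
gold = ["s1", "s1", "s2", "new", "new"]
pred = ["s1", "s2", "s2", "n1", "n1"]
print(adjusted_rand_index(gold, pred))           # clustering quality, all usages
print(macro_f1_known(gold, pred, {"s1", "s2"}))  # labeling quality, known senses
```

ARI ignores label names, so a novel cluster can carry any id, whereas the F1 part only rewards exact re-use of the dictionary sense ids.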
==============================
Subtask 2. Definition generation for novel word senses
==============================
This subtask challenges the participants to submit good
descriptions/definitions for the novel senses they found in subtask 1.
The definitions can be generated from scratch or retrieved from existing
ontologies: this is completely up to the participants. The organizers
will map the predicted definitions to the gold standard ones and
evaluate their quality with the standard NLG metrics.
- Inputs: Same as subtask 1
- Predictions: Same as subtask 1 plus a dictionary-like definition for
every novel sense of the target word (a sense not present in the
dictionary entry from the first time period)
- Metrics: BLEU/ROUGE and BERTScore. The final score is averaged across
target words
- Ground truth: definitions from our gold standard sense inventories
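As a rough illustration of "averaged across target words", the sketch below scores each predicted definition against its gold definition and then averages per word before averaging over words. It uses a simple whitespace-tokenized ROUGE-L F1 as a stand-in for the BLEU/ROUGE/BERTScore metrics named above, and it assumes the predicted and gold senses are already aligned by sense id; the data layout and function names are hypothetical, and the organizers' own mapping and scoring scripts may differ.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 on lowercased whitespace tokens (simplified stand-in metric)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def average_over_target_words(gold, predicted):
    """gold / predicted: {target_word: {sense_id: definition}} (assumed layout).
    A missing predicted definition counts as an empty string, i.e. score 0."""
    per_word = []
    for word, gold_senses in gold.items():
        if not gold_senses:
            continue  # no novel senses to define for this word
        scores = [
            rouge_l_f1(definition, predicted.get(word, {}).get(sense_id, ""))
            for sense_id, definition in gold_senses.items()
        ]
        per_word.append(sum(scores) / len(scores))
    return sum(per_word) / len(per_word) if per_word else 0.0


# Toy example with made-up definitions.
gold = {"mouse": {"m1": "a small rodent"}, "web": {"w1": "a network of sites"}}
predicted = {"mouse": {"m1": "a small furry rodent"}}
print(average_over_target_words(gold, predicted))
```

Averaging per word first means that a target word with many novel senses does not outweigh a word with only one.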
==========
Organizers
==========
- Mariia Fedorova (University of Oslo)
- Andrey Kutuzov (University of Oslo)
- Timothee Mickus (University of Helsinki)
- Niko Partanen (University of Helsinki)
- Janine Siewert (University of Helsinki)
==========
References
==========
1. Diachronic word embeddings and semantic shifts: a survey (Kutuzov et
al., COLING 2018)
2. SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection
(Schlechtweg et al., SemEval 2020)
3. Computational approaches to semantic change (Tahmasebi et al.,
LangSci Press 2021)
4. SemEval-2022 Task 1: CODWOE – Comparing Dictionaries and Word
Embeddings (Mickus et al., SemEval 2022)
5. Interpretable Word Sense Representations via Definition Generation:
The Case of Semantic Change Analysis (Giulianelli et al., ACL 2023)
--
Andrey
Language Technology Group (LTG)
University of Oslo