*Asia Pacific Journal of Corpus Research (APJCR) is now available online:*
http://icr.or.kr/ejournals-apjcr
*The Incredible Shrinking Noun Phrase: Ongoing Change in Japanese Word
Formation*
Kevin Heffernan (Kwansei Gakuin University), JAPAN; Yusuke Imanishi
(Kwansei Gakuin University), JAPAN
DOI: https://doi.org/10.22925/apjcr.2023.4.1.1
________________________________________
*Identifying Key Grammatical Errors of Japanese English as a Foreign
Language Learners in a Learner Corpus: Toward Focused Grammar Instruction
with Data-Driven Learning*
Atsushi Mizumoto (Kansai University), JAPAN; Yoichi Watari (Chukyo
University), JAPAN
DOI: https://doi.org/10.22925/apjcr.2023.4.1.25
________________________________________
*A Comparison of the Constructions Make / Take a Decision in Malaysian
English with the Supervarieties*
Christina Sook Beng Ong (Wawasan Open University), MALAYSIA
DOI: https://doi.org/10.22925/apjcr.2023.4.1.43
________________________________________
*Effects of Corpus Use on Error Identification in L2 Writing*
Yoshiho Satake (Aoyama Gakuin University), JAPAN
DOI: https://doi.org/10.22925/apjcr.2023.4.1.61
---
*CK Jung BEng(Hons) Birmingham MSc Warwick EdD Warwick Cert Oxford*
Associate Professor | Department of English Language and Literature,
Incheon National University, *South Korea*
President | The Korea Association of Secondary English Education, *South
Korea* (http://kasee.org)
Vice President | The Korea Association of Primary English Education, *South
Korea* (http://kapee.or.kr)
Director | Institute for Corpus Research, Incheon National University, *South
Korea* (http://icr.or.kr)
Editor-in-Chief | Asia Pacific Journal of Corpus Research, ICR,
*International* (http://icr.or.kr/apjcr)
Editorial Board | Corpora, Edinburgh University Press, *UK*
Editorial Board | English Today, Cambridge University Press, *UK*
E: ckjung(a)inu.ac.kr / T: +82 (0)32 835 8129
H(EN): http://ckjung.org
==12th NLP4CALL, Tórshavn, Faroe Islands==
The workshop series on Natural Language Processing (NLP) for Computer-Assisted Language Learning (NLP4CALL) is a meeting place for researchers working on the integration of Natural Language Processing and Speech Technologies in CALL systems and exploring the theoretical and methodological issues arising in this connection. The latter include, among others, insights from Second Language Acquisition (SLA) research, on the one hand, and the promotion of “Computational SLA” through setting up Second Language research infrastructure(s), on the other.
The intersection of Natural Language Processing (or Language Technology / Computational Linguistics) and Speech Technology with Computer-Assisted Language Learning (CALL) brings “understanding” of language to CALL tools, thus making CALL intelligent. This has given the area of research its name: Intelligent CALL (ICALL). As the definition suggests, apart from having excellent knowledge of Natural Language Processing and/or Speech Technology, ICALL researchers need good insights into second language acquisition theories and practices, as well as knowledge of second language pedagogy and didactics. This workshop therefore invites a wide range of ICALL-relevant research, including studies where NLP-enriched tools are used for testing SLA and pedagogical theories and, vice versa, studies where SLA theories, pedagogical practices or empirical data are modeled in ICALL tools.
The NLP4CALL workshop series is aimed at bringing together competences from these areas for sharing experiences and brainstorming around the future of the field.
We welcome papers:
- that describe research directly aimed at ICALL;
- that demonstrate the actual use, or discuss the potential use, of existing Language and Speech Technologies or resources for language learning;
- that describe the ongoing development of resources and tools with potential usage in ICALL, either directly in interactive applications, or indirectly in materials, application or curriculum development, e.g. learning material generation, assessment of learner texts and responses, individualized learning solutions, provision of feedback;
- that discuss challenges and/or research agendas for ICALL;
- that describe empirical studies on language learner data.
This year a special focus is given to work done on error detection/correction and feedback generation.
We encourage paper presentations and software demonstrations describing the above-mentioned themes primarily, but not exclusively, for the Nordic languages.
==Shared task==
NEW for this year is the MultiGED shared task on token-level error detection for L2 Czech, English, German, Italian and Swedish, organized by the Computational SLA working group.
For more information, please see the Shared Task website: https://github.com/spraakbanken/multiged-2023
==Invited speakers==
This year, we are pleased to announce two invited talks.
The first talk will be given by Marije Michel from the University of Amsterdam.
The second talk will be given by Pierre Lison from the Norwegian Computing Center.
==Submission information==
Authors are invited to submit long papers (8-12 pages) or short papers (4-7 pages); page counts do not include references.
We will be using the NLP4CALL template for the workshop this year. The author kit can be accessed at the links below, including an Overleaf version:
<https://spraakbanken.gu.se/sites/default/files/2023/NLP4CALL%20workshop%20t…>
<https://spraakbanken.gu.se/sites/default/files/2023/nlp4call%20template.doc>
<https://www.overleaf.com/latex/templates/nlp4call-workshop-template/qqqzqqy…>
Submissions will be managed through the electronic conference management system EasyChair <https://easychair.org/conferences/?conf=nlp4call2023>. Papers must be submitted digitally through the conference management system, in PDF format. Final camera-ready versions of accepted papers will be given an additional page to address reviewer comments.
Papers should describe original unpublished work or work-in-progress. Papers will be peer reviewed by at least two members of the program committee in a double-blind fashion. All accepted papers will be collected into a proceedings volume to be submitted for publication in the NEALT Proceedings Series (Linköping Electronic Conference Proceedings) and, additionally, double-published through the ACL anthology, as in previous NLP4CALL editions (<https://www.aclweb.org/anthology/venues/nlp4call/>).
==Important dates==
03 April 2023: paper submission deadline
21 April 2023: notification of acceptance
01 May 2023: camera-ready papers for publication
22 May 2023: workshop date
==Organizers==
David Alfter (1), Elena Volodina (2), Thomas François (3), Arne Jönsson (4), Evelina Rennes (4)
(1) Gothenburg Research Infrastructure for Digital Humanities, Department of Literature, History of Ideas, and Religion, University of Gothenburg, Sweden
(2) Språkbanken, Department of Swedish, Multilingualism, Language Technology, University of Gothenburg, Sweden
(3) CENTAL, Institute for Language and Communication, Université Catholique de Louvain, Belgium
(4) Department of Computer and Information Science, Linköping University, Sweden
==Contact==
For any questions, please contact David Alfter, david.alfter(a)gu.se
For further information, see the workshop website <https://spraakbanken.gu.se/en/research/themes/icall/nlp4call-workshop-serie…>
Follow us on Twitter @NLP4CALL <https://twitter.com/NLP4CALL/>
[Apologies for cross-posting]
Dear colleagues
We are inviting submissions for the next issue of Asia Pacific Journal of
Corpus Research, to appear on 31 December 2023.
*ABOUT*
The Asia Pacific Journal of Corpus Research (APJCR, e-ISSN
2733-8096, DOI: https://doi.org/10.22925/apjcr) is an international and
interdisciplinary peer-reviewed journal intended to explore corpus research
in the Asia Pacific region. APJCR addresses areas of methodological,
applied and theoretical work in the field of corpus research. Examples
include discourse analysis, lexical studies, grammatical studies,
language acquisition, language learning, language education, lexicography,
pragmatics, sociolinguistics, (machine) translation studies, (digital)
literary studies, computational linguistics, speech, phonetics, deep
learning and natural language understanding in conjunction with corpora.
*NO ARTICLE PROCESSING CHARGE*
APJCR does not charge authors an Article Processing Fee (APF).
*OPEN ACCESS POLICY*
APJCR provides open access to its content under the
principle in the academic field that making research freely available to
the public supports a greater global exchange of knowledge.
*SUBMISSION*
Papers (in English or Korean) should be sent to apjcreditor(a)icr.or.kr
Full instructions can be found at http://icr.or.kr/apjcr
*IMPORTANT DATES*
- Manuscript submission: 15 October 2023
- First decision (articles assessed by editors): October 2023
- Final decision: November 2023
- Production: December 2023
- Online publication: 31 December 2023
*APJCR ARCHIVE*
- Google Scholar:
https://scholar.google.co.kr/scholar?hl=ko&as_sdt=0%2C5&q=apjcr&btnG=
- KoreaScience: http://koreascience.or.kr/journal/CPSOBX/v1n1.page
*ENQUIRIES*
help(a)icr.or.kr
*SECOND CALL FOR PAPERS: EACL 2024 STUDENT RESEARCH WORKSHOP*
Student Research Workshop co-located with EACL 2024 in St. Julians, Malta.
Workshop Dates: March 21-22, 2024
*Paper Submission Deadline: December 18, 2023 (Direct) and January 17,
2024 (through ARR)*
About the Student Research Workshop
The EACL 2024 Student Research Workshop (SRW) is a forum to bring
together students investigating various areas of Computational
Linguistics and Natural Language Processing. The workshop provides an
excellent opportunity for participants to present their work and to
receive mentorship and valuable feedback from the international research
community.
The workshop's goal is to aid students at multiple stages of their
education, including undergraduate, MSc/MA, junior and senior PhD
students, in getting familiar with conducting and presenting their
research.
General Invitation for Submission
We invite papers in two different categories:
* Thesis Proposals: This category is appropriate for PhD students who
  have decided on a thesis topic and wish to get feedback on their
  proposal and broader ideas for their continuing work.
* Research Papers: Papers in this category can describe completed
  work, or work in progress with preliminary results. For these
  papers, the first author **MUST BE** a current student (graduate or
  undergraduate). Topics of interest for the SRW are the same as for
  the main EACL 2024 conference: https://2024.eacl.org/calls/papers/
We are opening a unique opportunity for the submission of research
papers that, while not accepted to the EACL main conference, align well
with the themes of this workshop. To be eligible for submission, the
first author must be a current student. Additionally, submissions should
be complemented with the reviews from ARR to provide context and
insights for evaluation. The submission deadline for this will be
January 17, 2024.
Why Submit to EACL SRW?
* Mentorship program: EACL SRW provides a unique opportunity for
  students to receive constructive feedback and advice from more
  senior researchers through our on-site mentorship program.
* Improving your publication record: Publishing a paper as an
  undergraduate or as an MSc/MA student is beneficial when applying for
  a PhD program. Publishing a paper in an EACL SRW workshop can be
  really helpful for improving students’ publication records.
* Negative results: We encourage the submission of studies with
  negative results providing insights on why and in which scenarios a
  particular method fails.
All accepted papers and thesis proposals will be presented in the main
conference poster sessions, which will give students an opportunity to
interact with and to present their work to a large and diverse audience,
including top researchers in the field and assigned mentors.
*Important Dates*
* Direct Workshop paper submission: December 18, 2023
* Pre-reviewed ARR paper submission: January 17, 2024
* Notification of acceptance: January 20, 2024
* Camera-ready deadline: January 30, 2024
* Workshop dates: March 21-22, 2024
All deadlines are 11:59PM UTC-12:00 ("anywhere on Earth").
Submission Requirements
We accept both archival submissions (which will be included in the
conference proceedings) and non-archival submissions (which will be
presented at the workshop but will not be included in the proceedings).
The archival submissions must follow the anonymity period and the
restrictions of the main conference.
Short papers consist of up to four (4) pages of content, plus unlimited
references. Upon acceptance, they will be given five (5) content pages
in the proceedings.
Long papers consist of up to eight (8) pages of content, plus unlimited
references. Upon acceptance, they will be given nine (9) content pages
in the proceedings.
Thesis proposals consist of up to eight (8) pages of content, plus
unlimited references. The title must begin with “Thesis Proposal:”. Upon
acceptance, they will be given nine (9) content pages in the proceedings.
We strongly recommend the use of the official ARR style templates. The
paper templates are available as an Overleaf template and can also be
downloaded directly (LaTeX and Word) via
https://aclrollingreview.org/cfp under
'Paper Submission and Templates'.
All submissions must be in PDF format. Submissions that do not adhere to
the above author guidelines or ACL policies will be rejected without
review.
Submission is electronic, using the OpenReview conference management
system. The submission link is available here:
https://openreview.net/group?id=eacl.org/EACL/2024/Workshop/SRW
Grants
We expect to have grants to offset some portion of students' travel,
conference registration, and accommodation expenses. Further details
will be posted on the SRW website.
Website and Contact Information
For more information, please visit
https://sites.google.com/view/eacl2024srw and follow us on Twitter
@eacl_srw. To contact the organizers of the workshop, please email us at
eaclsrw(a)gmail.com
Third Call for Papers: CALD-pseudo workshop on Computational Approaches to Language Data Pseudonymization @ EACL 2024, March 21 or 22, 2024
Website:
https://mormor-karl.github.io/events/CALD-pseudo/
Submission website: https://softconf.com/eacl2024/CALD-pseudo-2024/
Submission Deadline: Monday, 18 December 2023 (anywhere on earth)
We invite submissions to the first edition of the CALD-pseudo workshop on Computational Approaches to Language Data Pseudonymization, to be held at EACL 2024 on March 21 or 22, 2024.
[Important Dates]
* December 18, 2023: paper submission deadline
* January 17, 2024: resubmission of already pre-reviewed ARR papers
* January 20, 2024: notification of acceptance
* January 30, 2024: camera-ready papers due
* March 21 or 22, 2024: workshop date (the date to be confirmed by the EACL)
[Introduction]
Accessibility of research data is critical for advances in many research fields, but textual data often cannot be shared due to the personal and sensitive information it contains, e.g. names, political opinions and medical data. The General Data Protection Regulation (GDPR; EU Commission, 2016) suggests pseudonymization as a solution to secure open access to research data, but we need to learn more about pseudonymization as an approach before adopting it for manipulation of research data (Volodina et al., 2023). The main challenge is how to pseudonymize data effectively so that individuals cannot be identified, while keeping the data usable for the research for which it was collected, in fields such as computational linguistics, linguistics and natural language processing.
[Topics of Interest]
The CALD-pseudo workshop invites a broad community of researchers in all concerned cross-disciplinary fields to jointly discuss challenges within pseudonymization, such as:
* automatic approaches to detection and labelling of personal information in unstructured language data, including events and other context-dependent cues revealing a person;
* developing context-sensitive algorithms for replacement of personal information in unstructured data;
* studies into the effects of pseudonymization on unstructured data, e.g. applicability of pseudonymised data for the intended research questions, readability of pseudonymised data or addition of unwelcome biases through pseudonymization;
* effectiveness of pseudonymization as a way of protecting writer identity;
* reidentification studies, e.g. adversarial learning techniques that attempt to breach the privacy protections of pseudonymized data;
* constructing datasets for automatic pseudonymization, including methodological and ethical aspects of those;
* approaches to the evaluation of automatic pseudonymization both in concealing the private information and preserving the semantics of the non-personal data;
* pseudonymization tools and software: evaluating the available tools and software for pseudonymization in different languages, and their ease of use, scalability, and performance;
* and numerous other open questions.
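As a rough illustration of the detection and replacement steps named in the topics above, the following minimal Python sketch swaps e-mail addresses and a caller-supplied list of names for consistent placeholders. It is not a method proposed by the workshop: the function name `pseudonymize`, the placeholder format, the e-mail pattern and the name list are all hypothetical, and a real system would need context-sensitive detection rather than fixed patterns.

```python
import re
from itertools import count

# Assumption: a deliberately simple e-mail pattern, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def pseudonymize(text, known_names):
    """Replace e-mail addresses and known names with stable placeholders.

    The same surface form always maps to the same placeholder, so
    references to one person stay linked after replacement.
    """
    mapping = {}
    counters = {"EMAIL": count(1), "PERSON": count(1)}

    def placeholder(kind, surface):
        key = (kind, surface.lower())
        if key not in mapping:
            mapping[key] = f"[{kind}_{next(counters[kind])}]"
        return mapping[key]

    # Step 1: detect and replace e-mail addresses.
    text = EMAIL_RE.sub(lambda m: placeholder("EMAIL", m.group(0)), text)

    # Step 2: replace names from a hypothetical, caller-supplied list.
    if known_names:
        alternatives = "|".join(
            re.escape(name)
            for name in sorted(known_names, key=len, reverse=True)
        )
        names_re = re.compile(r"\b(?:" + alternatives + r")\b")
        text = names_re.sub(lambda m: placeholder("PERSON", m.group(0)), text)

    return text, mapping


sample = "Anna wrote to bob@example.org. Later Anna called Bob."
masked, table = pseudonymize(sample, {"Anna", "Bob"})
print(masked)
```

Keeping the returned mapping separate from the released text is what makes this pseudonymization rather than anonymization: whoever holds the mapping can reverse the replacement, so the mapping itself has to be protected.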
[Submission Guidelines]
Authors are invited to submit by December 18, 2023 original and unpublished research papers in the following categories:
* Full papers (up to 8 pages) for substantial contributions
* Short papers (up to 4 pages) for ongoing or preliminary work
All submissions must be in PDF format, must follow the EACL 2024 guidelines described in the ARR CfP (https://aclrollingreview.org/cfp), and use the official ACL style templates available here: https://github.com/acl-org/acl-style-files
Direct submission deadline: December 18, 2023 at https://softconf.com/eacl2024/CALD-pseudo-2024/
Deadline for registration of ARR reviewed papers: January 17, 2024. (Further instructions will follow.)
We also invite authors of papers on the topics of the workshop accepted to Findings to reach out to the organizing committee of CALD-pseudo to present them at the workshop.
[Invited speakers]
We are happy to announce that the workshop will host two invited speakers:
* Anders Søgaard, University of Copenhagen, Denmark
* Ildikó Pilán, the Norwegian Computing Center, Norway
[Workshop Organizers]
* Elena Volodina, University of Gothenburg, Sweden
* Therese Lindström Tiedemann, University of Helsinki, Finland
* Simon Dobnik, University of Gothenburg, Sweden
* Xuan-Son Vu, Umeå university, Sweden
[Program Committee]
A list of program committee members is available on the workshop website.
[Contact]
For inquiries, please contact mormor.karl(a)svenska.gu.se
ACL link to the call: https://www.aclweb.org/portal/content/computational-approaches-language-dat…
___________________
Elena Volodina, PhD, Docent
https://spraakbanken.gu.se/en/about/staff/elena
Life is like a mirror. Smile at it and it smiles back at you.
Peace Pilgrim
I will start a new research group on natural language processing as part
of the Bamberg AI Center (https://www.uni-bamberg.de/en/bacai/).
We do fundamental NLP research at the intersection with computational
psychology, digital humanities, and computational social sciences.
We currently have four open positions (deadline February 28, 2024):
1. Postdoc, Open Topic (3 years)
2. PhD student in interactive prompt optimization (3 years)
3. Researcher in event-centered emotion analysis (1 year)
4. Researcher in multimodal emotion analysis (1 year)
Positions 3 and 4 can be combined into a 2-year position.
Please find more details at
https://www.bamnlp.de/openpositions/
Do not hesitate to contact me if you have questions!
Roman Klinger
Dear colleagues,
The 9th Workshop on Noisy and User-generated Text is welcoming paper commitments from ARR.
More info on the workshop: http://noisy-text.github.io
ARR commitment link: https://openreview.net/group?id=eacl.org/EACL/2024/Workshop/WNUT_ARR_C…
Our ARR commitment deadline is January 17th anywhere on earth, so you can also commit EACL submissions after rejection.
Best,
Rob
Apologies for cross-posting
------------------------------------------------------
Dear colleagues,
We invite you to submit to the special session on “Emergent Phenomena in
Deep Representations and Large Language Models” as a part of IJCNN 2024
and IEEE WCCI 2024, which will be located in Yokohama, Japan.
We are looking forward to your contributions.
Please find the CfP below.
Best wishes,
On behalf of the Organising Committee
Özge Alacam
------------------------------------------------------
First Call for Papers: Special Session on Emergent Phenomena in Deep
Representations and Large Language Models @IJCNN 2024 & IEEE WCCI 2024:
Deep learning models trained on large datasets have shown spectacular
performance in a wide range of tasks demonstrated by current
applications of Large Language Models. However, recent works have shown
that the abilities large machine learning models acquire often emerge
unpredictably with increasing model complexity or training dataset size.
These emergent phenomena include the unexpected appearance of abilities
for which the model was not explicitly trained, but they might also be
related to unexpected performance boosts due to the increased model
complexity. Emergent phenomena are not always beneficial: larger models
may pick up new biases from the training data or start hallucinating.
To move towards increasingly sustainable, reliable, and explainable
applications of AI systems, it is necessary to increase the
understanding of the mechanisms surrounding emergent phenomena.
Moreover, this effort provides increased insight into the learning
process behind the acquisition of abilities of large models to perform
specific tasks. Important research questions relate to the definition of
emergent phenomena, their causes (what controls which abilities are
acquired and when?), training efficiency and training data quality
(e.g., acquiring desired abilities with less computational effort),
prompting strategies to elicit or test for desired model behaviour (e.g.,
chain-of-thought prompting), and further methods for verifying model
abilities and properties.
The primary goal of this special session is (i) to discuss the emergent
abilities and risks in deep neural networks and representations from
very different angles and (ii) to facilitate networking and encourage
collaboration between various research fields that approach this issue
from different perspectives, like computational linguistics, ethics in
AI, computer science, physics, etc.
Topics of interest include, but are not limited to:
• The definition of emergence in the context of NLP and ML
• Prompting strategies
• Physics-based/inspired analyses (e.g. phase transitions in ML
models)
• Explainability and interpretability (XAI)
• Evaluation measures for model ability, monitoring strategies,
assessment of model abilities (e.g. technical or psychology-based)
• Knowledge distillation, model pruning, energy-efficient models.
• Mitigation strategies for emergent risks and model deterioration.
• Fine-tuning and Retrieval-augmented generation (RAG)
• Papers focusing on specific emergent phenomena (reasoning,
creativity, double descent phenomena etc.)
The website for the call for papers is accessible at
https://sites.google.com/view/emergenn/call-for-papers
Organising Committee:
------------------------------
• Dr. Özge Alacam (Ludwig-Maximilian University & Uni Bielefeld,
Germany)
• Dr. Michiel Straat (Uni Bielefeld, Germany)
• Prof. Dr. Hinrich Schütze (Ludwig-Maximilian University, Germany)
• Prof. Dr. Alessandro Sperduti (University of Padova, Italy)
Important Dates:
------------------------------
• January 15, 2024 - Paper Submission Deadline
• March 15, 2024 - Notification of Acceptance
• May 1, 2024 - Camera-ready Deadline & Early
Registration Deadline
• June 30 - July 5, 2024 - Main Conference (IEEE WCCI 2024,
Yokohama, Japan)
* All deadlines are 11:59 PM UTC-12:00 ("anywhere on Earth")
Submission Format and Platform:
------------------------------
• Submissions will be through the IEEE WCCI 2024 Submission page
<https://edas.info/login.php?rurl=aHR0cHM6Ly9lZGFzLmluZm8vTjMxNjE0P2M9MzE2MT…>.
• Each paper is limited to 8 pages, including figures, tables,
and references. Please refer to the author guidelines provided by IEEE
WCCI 2024
• Please specify during the submission that your paper is
intended for the Special Session: Emergent Phenomena in Deep
Representations and Large Language Models.
• Special session webpage:
https://sites.google.com/view/emergenn/call-for-papers
• IEEE WCCI 2024 webpage: https://2024.ieeewcci.org/
Contact information:
------------------------------
• Özge Alacam : oezge.alacam(a)uni-bielefeld.de
• Michiel Straat : mstraat(a)techfak.uni-bielefeld.de
Dear corpora-list members,
We are announcing the first SemEval shared task on Semantic Textual
Relatedness (STR): A shared task on automatically detecting the degree of
semantic relatedness (closeness in meaning) between pairs of sentences.
The semantic relatedness of two language units has long been considered
fundamental to understanding meaning (Halliday and Hasan, 1976; Miller and
Charles, 1991), and automatically determining relatedness has many
applications such as evaluating sentence representation methods, question
answering, and summarization.
Two sentences are considered semantically similar when they have a
paraphrasal or entailment relation. On the other hand, relatedness is a
much broader concept that accounts for all the commonalities between two
sentences: whether they are on the same topic, express the same view,
originate from the same time period, one elaborates on (or follows from)
the other, etc. For instance, for the following sentence pairs:
- Pair 1: a. There was a lemon tree next to the house. b. The boy enjoyed
  reading under the lemon tree.
- Pair 2: a. There was a lemon tree next to the house. b. The boy was an
  excellent football player.
Most people will agree that the sentences in pair 1 are more related than
the sentences in pair 2.
In this task, new textual datasets will be provided for Afrikaans
<https://en.wikipedia.org/wiki/Afrikaans>, Algerian Arabic
<https://en.wikipedia.org/wiki/Algerian_Arabic>, Amharic
<https://en.wikipedia.org/wiki/Amharic>, English, Hausa
<https://en.wikipedia.org/wiki/Hausa_language>, Hindi
<https://en.wikipedia.org/wiki/Hindi>, Indonesian
<https://en.wikipedia.org/wiki/Indonesian_language>, Kinyarwanda
<https://en.wikipedia.org/wiki/Kinyarwanda>, Marathi
<https://en.wikipedia.org/wiki/Marathi_language>, Moroccan Arabic
<https://en.wikipedia.org/wiki/Moroccan_Arabic>, Modern Standard Arabic
<https://en.wikipedia.org/wiki/Modern_Standard_Arabic>, Punjabi
<https://en.wikipedia.org/wiki/Punjabi_language>, Spanish
<https://en.wikipedia.org/wiki/Spanish_language>, and Telugu
<https://en.wikipedia.org/wiki/Telugu_language>.
Data
Each instance in the training, development, and test sets is a sentence
pair. The instance is labeled with a score representing the degree of
semantic textual relatedness between the two sentences. The scores can
range from 0 (maximally unrelated) to 1 (maximally related). These gold
label scores have been determined through manual annotation. Specifically,
a comparative annotation approach was used, which avoids several known
biases and limitations of traditional rating scale annotation methods and
led to high reliability of the final relatedness rankings.
Further details about the task, the method of data annotation, how STR is
different from semantic textual similarity, applications of semantic
textual relatedness, etc. can be found in this paper:
https://aclanthology.org/2023.eacl-main.55.pdf
Tracks
Each team can provide submissions for one, two or all of the tracks shown
below:
Track A: Supervised
Participants are to submit systems that have been trained using the labeled
training datasets provided. Participating teams are allowed to use any
publicly available datasets (e.g., other relatedness and similarity
datasets or datasets in any other languages). However, they must report
any additional data they used and, ideally, how much each resource
contributed to the final results.
Track B: Unsupervised
Participants are to submit systems that have been developed without the use
of any labeled datasets pertaining to semantic relatedness or semantic
similarity between units of text more than two words long in any language.
The use of unigram or bigram relatedness datasets (from any language) is
permitted.
Track C: Cross-lingual
Participants are to submit systems that have been developed without the use
of any labeled semantic similarity or semantic relatedness datasets in the
target language and with the use of labeled dataset(s) from at least one
other language. Note: Using labeled data from another track is mandatory
for submission to this track.
Deciding which track a submission should go to:
- If a submission uses labeled data in the target language: submit to
  Track A
- If a submission does not use labeled data in the target language but
  uses labeled data from another language: submit to Track C
- If a submission does not use labeled data in any language: submit to
  Track B
** Here ‘labeled data’ refers to labeled datasets pertaining to semantic
relatedness or semantic similarity between units of text more than two
words long.
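
The track rules above can be sketched as a small decision function. This is
only an illustration of the three rules as stated in this call, not an
official tool; the function name and its two boolean parameters are
hypothetical. "Labeled data" here means sentence-level relatedness or
similarity datasets (units longer than two words), as defined in the note
above.

```python
def choose_track(uses_target_language_labels: bool,
                 uses_other_language_labels: bool) -> str:
    """Return the track letter for a submission (hypothetical helper).

    Both flags refer only to labeled relatedness/similarity datasets for
    units of text longer than two words; unigram or bigram relatedness
    datasets do not count.
    """
    if uses_target_language_labels:
        # Any labeled data in the target language means Track A (supervised).
        return "A"
    if uses_other_language_labels:
        # No target-language labels, but labels from another language.
        return "C"
    # No labeled sentence-level data in any language.
    return "B"


if __name__ == "__main__":
    print(choose_track(False, True))  # cross-lingual case
```

The rules are checked in order, so a system that uses labeled data in both
the target language and other languages still falls under Track A.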
Evaluation
The official evaluation metric for this task is the Spearman rank
correlation coefficient, which captures how well the system-predicted
rankings of test instances align with human judgments. You can find the
evaluation script for this shared task on our GitHub page
<https://github.com/semantic-textual-relatedness/Semantic_Relatedness_SemEva…>.
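
For readers who want to see what the metric measures, here is a minimal
sketch of the Spearman rank correlation in plain Python. It is not the
official evaluation script (use the one on the GitHub page for actual
scoring); the function names are our own, and ties are handled by assigning
average ranks, which is the usual convention but an assumption with respect
to the official script.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(gold, predicted):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(gold) != len(predicted) or len(gold) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    rg, rp = average_ranks(gold), average_ranks(predicted)
    n = len(rg)
    mean_g, mean_p = sum(rg) / n, sum(rp) / n
    cov = sum((a - mean_g) * (b - mean_p) for a, b in zip(rg, rp))
    var_g = sum((a - mean_g) ** 2 for a in rg)
    var_p = sum((b - mean_p) ** 2 for b in rp)
    if var_g == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_g * var_p)


if __name__ == "__main__":
    # Toy numbers for illustration only, not task data.
    print(spearman([0.1, 0.5, 0.9], [0.2, 0.3, 0.8]))
```

Because only the ranks matter, a system does not need to reproduce the gold
scores themselves; it only needs to order the test pairs the same way the
annotators did.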
Helpful Links
- Competition Website: https://codalab.lisn.upsaclay.fr/competitions/15704
- Task Website: https://semantic-textual-relatedness.github.io
- Twitter/X: https://twitter.com/SemRel2024
- Contact organisers: semrel-semeval-organisers(a)googlegroups.com
- Google group for participants:
  semrel-semeval-participants(a)googlegroups.com
Important Dates
- Training data ready: 11 September 2023
- Evaluation starts: 10 January 2024
- Evaluation ends: 31 January 2024
- System description paper due: February 2024
- SemEval workshop: Summer 2024 (co-located with NAACL 2024)
NB. We will organise a mentorship session in January and a system
description writing tutorial in February for all participants, especially
students and junior researchers.
References
- Shima Asaadi, Saif Mohammad, and Svetlana Kiritchenko. 2019. Big BiRD: A
  Large, Fine-Grained, Bigram Relatedness Dataset for Examining Semantic
  Composition. In Proceedings of the 2019 Conference of the North American
  Chapter of the Association for Computational Linguistics: Human Language
  Technologies.
- M. A. K. Halliday and R. Hasan. 1976. Cohesion in English. London:
  Longman.
- George A. Miller and Walter G. Charles. 1991. Contextual Correlates of
  Semantic Similarity. Language and Cognitive Processes, 6(1):1–28.
- Mohamed Abdalla, Krishnapriya Vishnubhotla, and Saif Mohammad. 2023.
  What Makes Sentences Semantically Related? A Textual Relatedness Dataset
  and Empirical Study. In Proceedings of the 17th Conference of the European
  Chapter of the Association for Computational Linguistics, pages 782–796,
  Dubrovnik, Croatia. Association for Computational Linguistics.
Task Organizers
Nedjma Ousidhoum
Shamsuddeen Hassan Muhammad
Mohamed Abdalla
Krishnapriya Vishnubhotla
Vladimir Araujo
Meriem Beloucif
Idris Abdulmumin
Seid Muhie Yimam
Nirmal Surange
Christine De Kock
Sanchit Ahuja
Oumaima Hourrane
Manish Shrivastava
Alham Fikri Aji
Thamar Solorio
Saif M. Mohammad