*Asia Pacific Journal of Corpus Research (APJCR) is now available online:*
http://icr.or.kr/ejournals-apjcr
*The Incredible Shrinking Noun Phrase: Ongoing Change in Japanese Word
Formation*
Kevin Heffernan (Kwansei Gakuin University), JAPAN; Yusuke
Imanishi (Kwansei Gakuin University), JAPAN
DOI: https://doi.org/10.22925/apjcr.2023.4.1.1
________________________________________
*Identifying Key Grammatical Errors of Japanese English as a Foreign
Language Learners in a Learner Corpus: Toward Focused Grammar Instruction
with Data-Driven Learning*
Atsushi Mizumoto (Kansai University), JAPAN; Yoichi Watari (Chukyo
University), JAPAN
DOI: https://doi.org/10.22925/apjcr.2023.4.1.25
________________________________________
*A Comparison of the Constructions Make / Take a Decision in Malaysian
English with the Supervarieties*
Christina Sook Beng Ong (Wawasan Open University), MALAYSIA
DOI: https://doi.org/10.22925/apjcr.2023.4.1.43
________________________________________
*Effects of Corpus Use on Error Identification in L2 Writing*
Yoshiho Satake (Aoyama Gakuin University), JAPAN
DOI: https://doi.org/10.22925/apjcr.2023.4.1.61
---
*CK Jung BEng(Hons) Birmingham MSc Warwick EdD Warwick Cert Oxford*
Associate Professor | Department of English Language and Literature,
Incheon National University, *South Korea*
President | The Korea Association of Secondary English Education, *South
Korea* (http://kasee.org)
Vice President | The Korea Association of Primary English Education, *South
Korea* (http://kapee.or.kr)
Director | Institute for Corpus Research, Incheon National University, *South
Korea* (http://icr.or.kr)
Editor-in-Chief | Asia Pacific Journal of Corpus Research, ICR,
*International* (http://icr.or.kr/apjcr)
Editorial Board | Corpora, Edinburgh University Press, *UK*
Editorial Board | English Today, Cambridge University Press, *UK*
E: ckjung(a)inu.ac.kr / T: +82 (0)32 835 8129
H(EN): http://ckjung.org
== 12th NLP4CALL, Tórshavn, Faroe Islands==
The workshop series on Natural Language Processing (NLP) for Computer-Assisted Language Learning (NLP4CALL) is a meeting place for researchers working on the integration of Natural Language Processing and Speech Technologies in CALL systems and exploring the theoretical and methodological issues arising in this connection. The latter includes, among others, insights from Second Language Acquisition (SLA) research, on the one hand, and the promotion of “Computational SLA” through the setting up of second language research infrastructure(s), on the other.
The intersection of Natural Language Processing (or Language Technology / Computational Linguistics) and Speech Technology with Computer-Assisted Language Learning (CALL) brings “understanding” of language to CALL tools, thus making CALL intelligent. This has given the research area its name: Intelligent CALL (ICALL). As the definition suggests, apart from having excellent knowledge of Natural Language Processing and/or Speech Technology, ICALL researchers need good insights into second language acquisition theories and practices, as well as knowledge of second language pedagogy and didactics. This workshop therefore invites a wide range of ICALL-relevant research, including studies where NLP-enriched tools are used for testing SLA and pedagogical theories, and vice versa, where SLA theories, pedagogical practices or empirical data are modeled in ICALL tools.
The NLP4CALL workshop series is aimed at bringing together competences from these areas for sharing experiences and brainstorming around the future of the field.
We welcome papers:
- that describe research directly aimed at ICALL;
- that demonstrate actual or discuss the potential use of existing Language and Speech Technologies or resources for language learning;
- that describe the ongoing development of resources and tools with potential usage in ICALL, either directly in interactive applications, or indirectly in materials, application or curriculum development, e.g. learning material generation, assessment of learner texts and responses, individualized learning solutions, provision of feedback;
- that discuss challenges and/or research agenda for ICALL;
- that describe empirical studies on language learner data.
This year a special focus is given to work done on error detection/correction and feedback generation.
We encourage paper presentations and software demonstrations describing the above-mentioned themes primarily, but not exclusively, for the Nordic languages.
==Shared task==
NEW for this year is the MultiGED shared task on token-level error detection for L2 Czech, English, German, Italian and Swedish, organized by the Computational SLA working group.
For more information, please see the Shared Task website: https://github.com/spraakbanken/multiged-2023
==Invited speakers==
This year, we have the pleasure to announce two invited talks.
The first talk is given by Marije Michel from the University of Amsterdam.
The second talk is given by Pierre Lison from the Norwegian Computing Center.
==Submission information==
Authors are invited to submit long papers (8-12 pages) or short papers (4-7 pages); the page count does not include references.
We will be using the NLP4CALL template for the workshop this year. The author kit can be accessed here or, alternatively, on Overleaf:
<https://spraakbanken.gu.se/sites/default/files/2023/NLP4CALL%20workshop%20t…>
<https://spraakbanken.gu.se/sites/default/files/2023/nlp4call%20template.doc>
<https://www.overleaf.com/latex/templates/nlp4call-workshop-template/qqqzqqy…>
Submissions will be managed through the electronic conference management system EasyChair <https://easychair.org/conferences/?conf=nlp4call2023>. Papers must be submitted digitally through the conference management system, in PDF format. Final camera-ready versions of accepted papers will be given an additional page to address reviewer comments.
Papers should describe original unpublished work or work-in-progress. Papers will be peer reviewed by at least two members of the program committee in a double-blind fashion. All accepted papers will be collected into a proceedings volume to be submitted for publication in the NEALT Proceedings Series (Linköping Electronic Conference Proceedings) and, additionally, double-published through the ACL Anthology, following experiences from the previous NLP4CALL editions (<https://www.aclweb.org/anthology/venues/nlp4call/>).
==Important dates==
03 April 2023: paper submission deadline
21 April 2023: notification of acceptance
01 May 2023: camera-ready papers for publication
22 May 2023: workshop date
==Organizers==
David Alfter (1), Elena Volodina (2), Thomas François (3), Arne Jönsson (4), Evelina Rennes (4)
(1) Gothenburg Research Infrastructure for Digital Humanities, Department of Literature, History of Ideas, and Religion, University of Gothenburg, Sweden
(2) Språkbanken, Department of Swedish, Multilingualism, Language Technology, University of Gothenburg, Sweden
(3) CENTAL, Institute for Language and Communication, Université Catholique de Louvain, Belgium
(4) Department of Computer and Information Science, Linköping University, Sweden
==Contact==
For any questions, please contact David Alfter, david.alfter(a)gu.se
For further information, see the workshop website <https://spraakbanken.gu.se/en/research/themes/icall/nlp4call-workshop-serie…>
Follow us on Twitter @NLP4CALL <https://twitter.com/NLP4CALL/>
[Apologies for cross-posting]
Dear colleagues
We are inviting submissions for the next issue of Asia Pacific Journal of
Corpus Research, to appear on 31 December 2023.
*ABOUT*
The Asia Pacific Journal of Corpus Research (APJCR, e-ISSN
2733-8096, DOI: https://doi.org/10.22925/apjcr) is an international and
interdisciplinary peer-reviewed journal intended to explore corpus research
in the Asia Pacific region. APJCR addresses areas of methodological,
applied and theoretical work in the field of corpus research. Examples of
such include discourse analysis, lexical studies, grammatical studies,
language acquisition, language learning, language education, lexicography,
pragmatics, sociolinguistics, (machine) translation studies, (digital)
literary studies, computational linguistics, speech, phonetics, deep
learning and natural language understanding in conjunction with corpus.
*NO ARTICLE PROCESSING CHARGE*
APJCR does not charge authors an Article
Processing Fee (APF).
*OPEN ACCESS POLICY*
APJCR provides open access to its content under the
principle in the academic field that making research freely available to
the public supports a greater global exchange of knowledge.
*SUBMISSION*
Papers (in English or Korean) should be sent to apjcreditor(a)icr.or.kr
Full instructions can be found at http://icr.or.kr/apjcr
*IMPORTANT DATES*
- Manuscript submission: 15 October 2023
- First decision (articles assessed by editors): October 2023
- Final decision: November 2023
- Production: December 2023
- Online publication: 31 December 2023
*APJCR ARCHIVE*
- Google Scholar:
https://scholar.google.co.kr/scholar?hl=ko&as_sdt=0%2C5&q=apjcr&btnG=
- KoreaScience: http://koreascience.or.kr/journal/CPSOBX/v1n1.page
*ENQUIRIES*
help(a)icr.or.kr
*SECOND CALL FOR PAPERS: EACL 2024 STUDENT RESEARCH WORKSHOP*
Student Research Workshop co-located with EACL 2024 in St. Julians, Malta.
Workshop Dates: March 21-22, 2024
Paper Submission Deadline: December 18, 2023 (Direct) and January 17,
2024 (through ARR)
About the Student Research Workshop
The EACL 2024 Student Research Workshop (SRW) is a forum to bring
together students investigating various areas of Computational
Linguistics and Natural Language Processing. The workshop provides an
excellent opportunity for participants to present their work and to
receive mentorship and valuable feedback from the international research
community.
The workshop's goal is to aid students at multiple stages of their
education, including undergraduate, MSc/MA, junior and senior PhD
students, in getting familiar with conducting and presenting their
research.
General Invitation for Submission
We invite papers in two different categories:
* Thesis Proposals: This category is appropriate for PhD students who
have decided on a thesis topic and wish to get feedback on their
proposal and broader ideas for their continuing work.
* Research Papers: Papers in this category can describe completed
work, or work in progress with preliminary results. For these
papers, the first author **MUST BE** a current student (graduate or
undergraduate). Topics of interest for the SRW are the same as for
the main EACL 2024 conference: https://2024.eacl.org/calls/papers/
We are opening a unique opportunity for the submission of research
papers that, while not accepted to the EACL main conference, align well
with the themes of this workshop. To be eligible for submission, the
first author must be a current student. Additionally, submissions should
be complemented with the reviews from ARR to provide context and
insights for evaluation. The submission deadline for this will be
January 17, 2024.
Why Submit to EACL SRW?
* Mentorship program: EACL SRW provides a unique opportunity for
students to receive constructive feedback and advice from more
senior researchers through our on-site mentorship program.
* Improving your publication record: Publishing a paper as an
undergraduate or as an MSc/MA student is beneficial when applying for
a PhD program. Publishing a paper in an EACL SRW workshop can be
really helpful for improving students’ publication records.
* Negative results: We encourage the submission of studies with
negative results providing insights on why and in which scenarios a
particular method fails.
All accepted papers and thesis proposals will be presented in the main
conference poster sessions, which will give students an opportunity to
interact with and to present their work to a large and diverse audience,
including top researchers in the field and assigned mentors.
Important Dates
* Direct Workshop paper submission: December 18, 2023
* Pre-reviewed ARR paper submission: January 17, 2024
* Notification of acceptance: January 20, 2024
* Camera-ready deadline: January 30, 2024
* Workshop dates: March 21-22, 2024
All deadlines are 11:59PM UTC-12:00 ("anywhere on Earth").
Submission Requirements
We accept both archival submissions (which will be included in the
conference proceedings) and non-archival submissions (which will be
presented at the workshop but will not be included in the proceedings).
The archival submissions must follow the anonymity period and the
restrictions of the main conference.
Short papers consist of up to four (4) pages of content, plus unlimited
references. Upon acceptance, they will be given five (5) content pages
in the proceedings.
Long papers consist of up to eight (8) pages of content, plus unlimited
references. Upon acceptance, they will be given nine (9) content pages
in the proceedings.
Thesis proposals consist of up to eight (8) pages of content, plus
unlimited references. The title must begin with “Thesis Proposal:”. Upon
acceptance, they will be given nine (9) content pages in the proceedings.
We strongly recommend the use of the official ARR style templates. The
paper templates are available as an Overleaf template and can also be
downloaded directly (LaTeX and Word) via
https://aclrollingreview.org/cfp under
'Paper Submission and Templates'.
All submissions must be in PDF format. Submissions that do not adhere to
the above author guidelines or ACL policies will be rejected without
review.
Submission is electronic, using the OpenReview conference management
system. The submission link is available here:
https://openreview.net/group?id=eacl.org/EACL/2024/Workshop/SRW
Grants
We expect to have grants to offset some portion of students' travel,
conference registration, and accommodation expenses. Further details
will be posted on the SRW website.
Website and Contact Information
For more information, please visit
https://sites.google.com/view/eacl2024srw and follow us on Twitter
@eacl_srw. To contact the organizers of the workshop, please email us at
eaclsrw(a)gmail.com
Second Call for papers: CALD-pseudo workshop on Computational Approaches to Language Data Pseudonymization @ EACL 2024, March 21 or 22, 2024
Website:
https://mormor-karl.github.io/events/CALD-pseudo/
Submission website: https://softconf.com/eacl2024/CALD-pseudo-2024/
Submission Deadline: Monday, 18 December 2023
We invite submissions to the first edition of the CALD-pseudo workshop on Computational Approaches to Language Data Pseudonymization, to be held at EACL 2024 on March 21 or 22, 2024.
[Important Dates]
* December 18, 2023: paper submission deadline
* January 17, 2024: resubmission of already pre-reviewed ARR papers
* January 20, 2024: notification of acceptance
* January 30, 2024: camera-ready papers due
* March 21 or 22, 2024: workshop date (the date to be confirmed by the EACL)
[Introduction]
Accessibility of research data is critical for advances in many research fields, but textual data often cannot be shared because of the personal and sensitive information it contains, e.g. names, political opinions and medical data. The General Data Protection Regulation (GDPR; EU Commission, 2016) suggests pseudonymization as a solution to secure open access to research data, but we need to learn more about pseudonymization as an approach before adopting it for manipulation of research data (Volodina et al., 2023). The main challenge is how to pseudonymize data effectively so that individuals cannot be identified, while at the same time keeping the data usable for the research for which it was collected, in fields such as computational linguistics, linguistics and natural language processing.
[Topics of Interest]
The CALD-pseudo workshop invites a broad community of researchers in all concerned cross-disciplinary fields to jointly discuss challenges within pseudonymization, such as
* automatic approaches to detection and labelling of personal information in unstructured language data, including events and other context-dependent cues revealing a person;
* developing context-sensitive algorithms for replacement of personal information in unstructured data;
* studies into the effects of pseudonymization on unstructured data, e.g. applicability of pseudonymised data for the intended research questions, readability of pseudonymised data or addition of unwelcome biases through pseudonymization;
* effectiveness of pseudonymization as a way of protecting writer identity;
* reidentification studies, e.g. adversarial learning techniques that attempt to breach the privacy protections of pseudonymized data;
* constructing datasets for automatic pseudonymization, including methodological and ethical aspects of those;
* approaches to the evaluation of automatic pseudonymization both in concealing the private information and preserving the semantics of the non-personal data;
* pseudonymization tools and software: evaluating the available tools and software for pseudonymization in different languages, and their ease of use, scalability, and performance;
* and numerous other open questions.
[Submission Guidelines]
Authors are invited to submit by December 18, 2023 original and unpublished research papers in the following categories:
* Full papers (up to 8 pages) for substantial contributions
* Short papers (up to 4 pages) for ongoing or preliminary work
All submissions must be in PDF format, must follow the EACL 2024 guidelines described in the ARR CfP (https://aclrollingreview.org/cfp), and use the official ACL style templates available here: https://github.com/acl-org/acl-style-files
Direct submission deadline: December 18, 2023 at https://softconf.com/eacl2024/CALD-pseudo-2024/
Deadline for registration of ARR reviewed papers: January 17, 2024. (Further instructions will follow.)
We also invite authors of papers on the topics of the workshop accepted to Findings to reach out to the organizing committee of CALD-pseudo to present them at the workshop.
[Invited speakers]
We are happy to announce that the workshop will host two invited speakers:
* Anders Søgaard, University of Copenhagen, Denmark
* Ildikó Pilán, the Norwegian Computing Center, Norway
[Workshop Organizers]
* Elena Volodina, University of Gothenburg, Sweden
* Therese Lindström Tiedemann, University of Helsinki, Finland
* Simon Dobnik, University of Gothenburg, Sweden
* Xuan-Son Vu, Umeå University, Sweden
[Program Committee]
A list of program committee members is available on the workshop website.
[Contact]
For inquiries, please contact mormor.karl(a)svenska.gu.se
ACL link to the call: https://www.aclweb.org/portal/content/computational-approaches-language-dat…
___________________
Elena Volodina, PhD, Docent
https://spraakbanken.gu.se/en/about/staff/elena
Life is like a mirror. Smile at it and it smiles back at you.
Peace Pilgrim
Dear corpora-list members,
We are announcing the first SemEval shared task on Semantic Textual
Relatedness (STR): A shared task on automatically detecting the degree of
semantic relatedness (closeness in meaning) between pairs of sentences.
The semantic relatedness of two language units has long been considered
fundamental to understanding meaning (Halliday and Hasan, 1976; Miller and
Charles, 1991), and automatically determining relatedness has many
applications such as evaluating sentence representation methods, question
answering, and summarization.
Two sentences are considered semantically similar when they have a
paraphrasal or entailment relation. On the other hand, relatedness is a
much broader concept that accounts for all the commonalities between two
sentences: whether they are on the same topic, express the same view,
originate from the same time period, one elaborates on (or follows from)
the other, etc. For instance, for the following sentence pairs:
- Pair 1: a. There was a lemon tree next to the house. b. The boy enjoyed
reading under the lemon tree.
- Pair 2: a. There was a lemon tree next to the house. b. The boy was an
excellent football player.
Most people will agree that the sentences in pair 1 are more related than
the sentences in pair 2.
In this task, new textual datasets will be provided for Afrikaans
<https://en.wikipedia.org/wiki/Afrikaans>, Algerian Arabic
<https://en.wikipedia.org/wiki/Algerian_Arabic>, Amharic
<https://en.wikipedia.org/wiki/Amharic>, English, Hausa
<https://en.wikipedia.org/wiki/Hausa_language>, Hindi
<https://en.wikipedia.org/wiki/Hindi>, Indonesian
<https://en.wikipedia.org/wiki/Indonesian_language>, Kinyarwanda
<https://en.wikipedia.org/wiki/Kinyarwanda>, Marathi
<https://en.wikipedia.org/wiki/Marathi_language>, Moroccan Arabic
<https://en.wikipedia.org/wiki/Moroccan_Arabic>, Modern Standard Arabic
<https://en.wikipedia.org/wiki/Modern_Standard_Arabic>, Punjabi
<https://en.wikipedia.org/wiki/Punjabi_language>, Spanish
<https://en.wikipedia.org/wiki/Spanish_language>, and Telugu
<https://en.wikipedia.org/wiki/Telugu_language>.
Data
Each instance in the training, development, and test sets is a sentence
pair. The instance is labeled with a score representing the degree of
semantic textual relatedness between the two sentences. The scores can
range from 0 (maximally unrelated) to 1 (maximally related). These gold
label scores have been determined through manual annotation. Specifically,
a comparative annotation approach was used, which avoids several known
biases and limitations of traditional rating scale annotation methods and
led to a high reliability of the final relatedness rankings.
Further details about the task, the method of data annotation, how STR is
different from semantic textual similarity, applications of semantic
textual relatedness, etc. can be found in this paper:
https://aclanthology.org/2023.eacl-main.55.pdf
Tracks
Each team can provide submissions for one, two or all of the tracks shown
below:
Track A: Supervised
Participants are to submit systems that have been trained using the labeled
training datasets provided. Participating teams are allowed to use any
publicly available datasets (e.g., other relatedness and similarity
datasets or datasets in any other languages). However, they must report
additional data they used, and ideally report how impactful each resource
was on the final results.
Track B: Unsupervised
Participants are to submit systems that have been developed without the use
of any labeled datasets pertaining to semantic relatedness or semantic
similarity between units of text more than two words long in any language.
The use of unigram or bigram relatedness datasets (from any language) is
permitted.
Track C: Cross-lingual
Participants are to submit systems that have been developed without the use
of any labeled semantic similarity or semantic relatedness datasets in the
target language and with the use of labeled dataset(s) from at least one
other language. Note: Using labeled data from another track is mandatory
for submission to this track.
Deciding which track a submission should go to:
- If a submission uses labeled data in the target language: submit to
Track A
- If a submission does not use labeled data in the target language but
uses labeled data from another language: submit to Track C
- If a submission does not use labeled data in any language: submit to
Track B
** Here ‘labeled data’ refers to labeled datasets pertaining to semantic
relatedness or semantic similarity between units of text more than two
words long.
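The decision rule above can be read as a small function. The following is only an illustrative sketch: the function name `choose_track` and its two boolean parameters are hypothetical and are not part of any official task tooling, and "labeled data" is meant in the sense of the note above.

```python
def choose_track(uses_target_language_labels: bool,
                 uses_other_language_labels: bool) -> str:
    """Return the track letter suggested by the three rules in the call.

    Both flags refer to labeled relatedness/similarity data for text units
    longer than two words (hypothetical parameter names).
    """
    if uses_target_language_labels:
        # Any labeled data in the target language means Track A (supervised).
        return "A"
    if uses_other_language_labels:
        # No target-language labels, but labels from another language: Track C.
        return "C"
    # No labeled data in any language: Track B (unsupervised).
    return "B"


# Example: a system trained only on labeled data from a different language.
track = choose_track(uses_target_language_labels=False,
                     uses_other_language_labels=True)
```

Checking the target-language condition first mirrors the order of the rules: a system that uses target-language labels belongs in Track A even if it also uses data from other languages.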
Evaluation
The official evaluation metric for this task is the Spearman rank
correlation coefficient, which captures how well the system-predicted
rankings of test instances align with human judgments. You can find the
evaluation script for this shared task on our GitHub page
<https://github.com/semantic-textual-relatedness/Semantic_Relatedness_SemEva…>.
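For orientation, the metric can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the official evaluation script: the function names and the example scores are hypothetical, and ties are handled with average ranks, which is one common convention.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    if len(gold) != len(pred) or len(gold) < 2:
        raise ValueError("need two equally long lists with at least two items")
    rg, rp = average_ranks(gold), average_ranks(pred)
    mg, mp = sum(rg) / len(rg), sum(rp) / len(rp)
    cov = sum((a - mg) * (b - mp) for a, b in zip(rg, rp))
    var_g = sum((a - mg) ** 2 for a in rg)
    var_p = sum((b - mp) ** 2 for b in rp)
    if var_g == 0 or var_p == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / (var_g * var_p) ** 0.5


# Hypothetical gold and system scores for four sentence pairs (range 0 to 1).
gold_scores = [0.9, 0.1, 0.5, 0.7]
system_scores = [0.8, 0.2, 0.3, 0.6]
rho = spearman(gold_scores, system_scores)
```

Because only the ranking matters, a system does not need to reproduce the gold scores exactly; in the example the predicted values differ from the gold values but order the four pairs identically.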
Helpful Links
- Competition Website: https://codalab.lisn.upsaclay.fr/competitions/15704
- Task Website: https://semantic-textual-relatedness.github.io
- Twitter/X: https://twitter.com/SemRel2024
- Contact organisers: semrel-semeval-organisers(a)googlegroups.com
- Google group for participants: semrel-semeval-participants(a)googlegroups.com
Important Dates
- Training data ready: 11 September 2023
- Evaluation Starts: 10 January 2024
- Evaluation Ends: 31 January 2024
- System Description Paper Due: February 2024
- SemEval workshop: Summer 2024 (co-located with NAACL 2024)
NB. We will organise a mentorship session in January and a system
description writing tutorial in February for all participants, especially
students and junior researchers.
References
- Shima Asaadi, Saif Mohammad, Svetlana Kiritchenko. 2019. Big BiRD: A
Large, Fine-Grained, Bigram Relatedness Dataset for Examining Semantic
Composition. Proceedings of the 2019 Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language
Technologies.
- M. A. K. Halliday and R. Hasan. 1976. Cohesion in English. London:
Longman.
- George A. Miller and Walter G. Charles. 1991. Contextual Correlates of
Semantic Similarity. Language and Cognitive Processes, 6(1):1–28.
- Mohamed Abdalla, Krishnapriya Vishnubhotla, and Saif Mohammad. 2023.
What Makes Sentences Semantically Related? A Textual Relatedness Dataset
and Empirical Study. In Proceedings of the 17th Conference of the European
Chapter of the Association for Computational Linguistics, pages 782–796,
Dubrovnik, Croatia. Association for Computational Linguistics.
Task Organizers
Nedjma Ousidhoum
Shamsuddeen Hassan Muhammad
Mohamed Abdalla
Krishnapriya Vishnubhotla
Vladimir Araujo
Meriem Beloucif
Idris Abdulmumin
Seid Muhie Yimam
Nirmal Surange
Christine De Kock
Sanchit Ahuja
Oumaima Hourrane
Manish Shrivastava
Alham Fikri Aji
Thamar Solorio
Saif M. Mohammad
Call for Papers
2024 CORE Project Workshop Unpacking Efficient Communication: The Roles of
Cognitive Bias and Extralinguistic Context in Referring Expression Choice
When: April 18-19, 2024
Where: Universitat Pompeu Fabra, Barcelona
Language offers a rich set of lexical and syntactic options for reference,
reflecting the different ways we can choose to identify,
describe, categorize, and differentiate the entities and events we talk
about. For example, in any given context, a speaker can choose between a
more or less specific expression (the dog, the spotted dog, the Dalmatian),
or between expressions that convey complementary information about the
referent (the woman, the skier). A well-established line of research
highlights the role of efficiency in referring expression choice. But what
makes a referring expression “efficient”? Efficiency in communication has
been frequently characterized in terms of an informativity/effort
trade-off, with informativity operationalized in terms of inference, and
effort, in terms of cognitive or physical cost (Horn 1984, Levshina 2021).
However, there is also evidence that other factors such as the salience of
visual features (e.g., color, Rubio-Fernández 2016) or the prototypicality
of an entity as an exemplar of a category (see, e.g., Degen et al. 2020)
can lead speakers to use expressions that are, strictly speaking,
overinformative. Efficiency can also be
examined at the level of the whole system; for instance, Brochhagen and
Boleda (2022) argue that the informativity/effort trade-off helps explain
cross-linguistic patterns in colexification, or how meanings are organized
in the lexicon.
The goal of this workshop, supported by the Spanish AEI-funded CORE project
(“COntextual effects in the choice of Referring Expressions for visually
presented entities”, PID2020-112602GB-I00), is to dig deeper into what
makes a linguistic expression “efficient”, considering factors such as:
- Cognitive biases that influence the potential for rapid/efficient
discrimination.
- Potential for exploiting inferences due to choice of one expression vs.
another.
- Information load a referring expression has to bear given extralinguistic
sources of information in the context, especially visual information.
- Lexical/constructional frequency effects and association strength between
RE options and the referent in question.
The workshop aims to give a forum to new and especially exploratory
research in this area. The workshop will include a combination of invited
talks, presentations of ongoing research by project members, and
presentations and/or posters selected in this open call.
We invite submissions on topics including, but not limited to:
- The general principles that intervene in efficient communication,
especially alternatives to or refined definitions of notions such as
“efficiency”, “effort”, and “informativity”.
- Which features of entities or events are more likely to be used for
discrimination.
- The role of the visual context and/or distractor entities in influencing
RE choice; more generally, the role of multi-modal aspects.
- The role of the implicit semantic organization of RE alternatives and the
conventionalized division of labor between them, especially organization
based on implicative semantic relations (e.g. hyponymy, troponymy).
- The factors influencing the choice among alternative
cross-classifications of a target referent (e.g. the choice between
“taxonomic” descriptions such as woman vs. role-based descriptions such as
skier).
- The dynamics between reference and the linguistic system, that is, how
efficient communication is enabled by and at the same time transforms a
given language.
We take a methodologically pluralistic approach and thus welcome
presentations on experimental studies, analysis of corpus data,
computational modeling, critiques or analyses of published research, as
well as position papers.
Invited speakers:
Lilia Rissman, University of Wisconsin - Madison
Paula Rubio-Fernández, Max Planck Institute for Psycholinguistics
Sina Zarrieß, University of Bielefeld
Abstract guidelines: Abstracts should not exceed 2 pages in length (A4 or
letter-size), in 12 pt. font, with 1-inch/2.5-cm margins; a third page can
be used for references, data, and figures. Please indicate whether you want
the submission to be considered for a paper, a poster, or either. Abstracts
should be submitted to EasyChair at the following link:
https://easychair.org/conferences/?conf=core2024.
Important dates:
Deadline for abstract submission: December 20, 2023
Notification of acceptance: January 15, 2024
Workshop dates: April 18-19, 2024
Organizers: Louise McNally, Gemma Boleda, Jialing Liang, Marina Bolea.
References:
Degen, J., Hawkins, R. D., Graf, C., Kreiss, E., & Goodman, N. D. (2020).
When redundancy is useful: A Bayesian approach to “overinformative”
referring expressions. Psychological Review, 127(4), 591–621.
Gualdoni, E., T. Brochhagen, A. Mädebach, G. Boleda. 2023. What's in a
name? A large-scale computational study on how competition between names
affects naming variation. Journal of Memory and Language, 133, 104459.
Brochhagen, T., G. Boleda. 2022. When do languages use the same word for
different meanings? The Goldilocks Principle in colexification. Cognition,
226, 105179.
Horn, L.R. (1984). Towards a new taxonomy for pragmatic inference: Q-based
and R-based implicature. In Schiffrin, D. (ed.), Meaning, Form, and Use in
Context: Linguistic Applications, 11-42. Georgetown University Press,
Washington, DC.
Levshina, N. (2023). Communicative efficiency: Language structure and use.
Cambridge: Cambridge University Press.
Rissman, L., & Lupyan, G. (2022). A Dissociation Between Conceptual
Prominence and Explicit Category Learning: Evidence From Agent and Patient
Event Roles. Journal of Experimental Psychology: General, 151(7):1707-1732.
Rubio-Fernandez, P., Mollica, F., & Jara-Ettinger, J. (2021). Speakers and
listeners exploit word order for communicative efficiency: A
cross-linguistic investigation. Journal of Experimental Psychology:
General, 150(3), 583–594.
Schüz, S., Han, T., Zarrieß, S. (2021) Diversity as a By-Product:
Goal-oriented Language Generation Leads to Linguistic Variation.
Proceedings of the 22nd Annual SIGdial Meeting on Discourse and Dialogue.
Association for Computational Linguistics.
Second CFP: The 6th Workshop on Research in Computational Linguistic
Typology and Multilingual NLP (SIGTYP 2024)
To be held at EACL 2024 (March 21 or 22, 2024, Malta)
Website: https://sigtyp.github.io/
Submission website:
https://openreview.net/group?id=eacl.org/EACL/2024/Workshop/SIGTYP
Submission deadline: December 18, 2023
We invite submissions to the 6th edition of the SIGTYP workshop on
Research in Computational Linguistic Typology and Multilingual NLP, to
be held at EACL 2024 on March 21 or 22, 2024.
Workshop description
The aim of the 6th edition of the SIGTYP workshop is to act as a platform
and a forum for the exchange of information between typology-related
research, multilingual NLP, and other research areas that can lead to
the development of truly multilingual NLP methods. The workshop is
specifically aimed at raising awareness of linguistic typology and its
potential in supporting and widening the global reach of multilingual
NLP, as well as at introducing computational approaches to linguistic
typology. It will foster research and discussion on open problems, not
only within the active community working on cross- and multilingual NLP
but also inviting input from leading researchers in linguistic typology.
Our workshop will serve as a platform to enable fruitful discussions. In
2024, we additionally focus on bridging the gap between cross-linguistic
and universal annotation, models, and technology.
SIGTYP is the first dedicated venue for typology-related research and
its integration in multilingual NLP. Appropriate topics include (but are
not limited to) the following as they relate to the areas of the workshop:
* Integration of typological features in language transfer and joint
  multilingual learning. In addition to established techniques such as
  “selective sharing”, are there alternative ways of encoding
  heterogeneous external knowledge in machine learning algorithms?
* Development of unified taxonomy and resources. Building universal
  databases and models to facilitate understanding and processing of
  diverse languages.
* Automatic inference of typological features. The pros and cons of
  existing techniques (e.g. heuristics derived from morphosyntactic
  annotation, propagation from features of other languages, supervised
  Bayesian and neural models) and discussion of emerging ones.
* Typology and interpretability. The use of typological knowledge for
  interpretation of hidden representations of multilingual neural
  models, multilingual data generation and selection, and typological
  annotation of texts.
* Improvement and completion of typological databases. Combining
  linguistic knowledge and automatic data-driven methods towards the
  joint goal of improving the knowledge of cross-linguistic variation
  and universals.
* Linguistic diversity and universals. Challenges of cross-lingual
  annotation. Which linguistic phenomena or categories should be
  considered universal? How should they be annotated?
* Language-specific studies to support or contradict universals.
  Framing a study on 1-3 languages that would shed more light on common
  linguistic structures and properties.
* Extra topics also include: generation of constructed languages,
  universals in diachronic language change, information-theoretic
  approaches to typology, automated approaches to etymology.
Important Dates (all deadlines are 23:59 AoE)
— December 18, 2023: Paper submission deadline
— January 20, 2024: Notification of acceptance
— January 30, 2024: Camera-ready deadline
— March 21 or 22, 2024: Workshop
Submissions
We invite both extended abstract submissions (non-archival) and general
paper submissions (archival). The accepted submissions will be presented
at the workshop, providing new insights and ideas. Extended abstracts
should describe already published work or work in progress and should
not exceed two (2) pages. This way, we will not discourage researchers
from preferring main conference proceedings, at the same time ensuring
that interesting and thought-provoking research is presented at the
workshop. For general (archival) submissions we accept both long and
short papers. Short papers should not exceed four (4) pages; long papers
should not exceed eight (8) pages. Unlimited additional pages are
allowed for the references section in all submission types.
Submissions should be anonymous, without authors or an acknowledgement
section; self-citations should appear in third person.
Submissions must follow the EACL 2024 stylesheet
(https://github.com/acl-org/acl-style-files); both long and short paper
submissions must follow the two-column format of ACL proceedings. All
submissions must be in PDF format.
These should be submitted via OpenReview:
https://openreview.net/group?id=eacl.org/EACL/2024/Workshop/SIGTYP
ARR submissions that were rejected or withdrawn from EACL can be
submitted to SIGTYP by January 17, 2024. We will create a web form for
submitting, and announce it at
https://sigtyp.github.io/sigtyp-cfp2024.html by January 15, 2024.
Acceptance decisions will be made based on the existing ARR reviews.
Authors will be notified by January 20, 2024.
Shared Task
In 2024, SIGTYP is hosting a Word Embedding Evaluation for Ancient and
Historical Languages. More details can be found here:
https://sigtyp.github.io/st2024.html.
Organizing Committee
Michael Hahn, Rena Gao, Saliha Muradoglu, Yulia Otmakhova, Andreas
Shcherbakov, Oleg Serikov, Jinrui Yang, Alexey Sorokin, Priya Rani,
Ritesh Kumar, Ryan Cotterell, Edoardo M. Ponti, Kat Vylomova
Anti-harassment policy
The workshop follows the ACL anti-harassment policy:
https://www.aclweb.org/adminwiki/index.php?title=Anti-Harassment_Policy
Contact
For any inquiries regarding the workshop, please send an email to the
Organizing Committee at sigtyp(a)gmail.com
First Call for Papers: MOOMIN (the first workshop on Modular and Open Multilingual NLP), co-located with EACL 2024, March 21 or 22, 2024
Website: https://moomin-workshop.github.io/
Submission website: https://openreview.net/group?id=eacl.org/EACL/2024/Workshop/MOOMIN
We invite submissions to the first edition of the MOOMIN workshop on Modular and Open Multilingual NLP, to be held at EACL 2024 on March 21 or 22, 2024.
[Important Dates]
* Workshop paper due: December 18, 2023
* Resubmission deadline (for pre-reviewed ARR & main conference submissions): January 17, 2024
* Notification of acceptance: January 20, 2024
* Camera-ready papers due: January 30, 2024
* Workshop date: March 21 or 22, 2024
[Introduction]
NLP in the age of monolithic large language models is starting to hit limits in terms of the size and information that can be handled. The trend is toward modularization, a necessary step in the direction of designing smaller sub-networks and components with specialized functionality. This allows researchers to design scalable, wide-coverage, efficient and reusable models.
Multilingual NLP today faces a number of difficult challenges. Scaling a multilingual model to a large number of languages is prone to negative interference, also known as the curse of multilinguality, which degrades per-language performance, while earlier approaches to improving model capacity have hit a ceiling in terms of hardware, data and training algorithms. At the same time, we as a community wish to foster the development of open components that can be shared, deployed and widely integrated within the broader research community without incurring computational costs that add to the overall carbon footprint of NLP engineering. Modularity is a practical answer to all of these challenges and more, as it offers a very promising set of tools for increasing the multilinguality of larger foundation models, either during pretraining or post hoc, after pretraining.
[Topics of Interest]
With this in mind, the MOOMIN workshop invites contributions related but not limited to the following topics:
* mixture-of-experts models and gated routing
* modular pre-training of multilingual language and translation models
* effective transfer with modular architectures such as adapters and hypernetworks
* efficient parallelization and distribution of modular model training
* modular frameworks and architecture implementations
* massively multilingual models with large language coverage
* subnet selection and pruning
* modular distillation
* modular extensions of existing NLP models and systems, especially in low-resource settings and for low-resource languages
* evaluation of modular systems in terms of performance, efficiency, and computational costs
* platforms for distributing, sharing, and integrating NLP components
[Submission Guidelines]
Authors are invited to submit original and unpublished research papers in the following categories:
* Full papers (up to 8 pages) for substantial contributions.
* Short papers (up to 4 pages) for ongoing or preliminary work.
All submissions must be in PDF format, submitted electronically via OpenReview (https://openreview.net/group?id=eacl.org/EACL/2024/Workshop/MOOMIN) and should follow the EACL 2024 formatting guidelines (following the ARR CfP, https://aclrollingreview.org/cfp: use the official ACL style templates, which are available at https://github.com/acl-org/acl-style-files).
We also invite authors of papers accepted to Findings to contact the MOOMIN organizing committee about presenting their papers at the workshop, provided the papers are in line with the topics described above.
[Workshop Organizers]
* Timothee Mickus, University of Helsinki
* Jörg Tiedemann, University of Helsinki
* Ahmet Üstün, Cohere For AI
* Raúl Vázquez, University of Helsinki
* Ivan Vulić, University of Cambridge & PolyAI
[Program Committee]
A list of program committee members will be available on the workshop website.
[Contact]
For inquiries, please contact moomin.nlp.workshop(a)gmail.com
The next meeting of the Edge Hill Corpus Research Group will take place online (via MS Teams) on Thursday 14 December 2023, 2:00-3:30 pm (UK time).
Topics: Discourse-Oriented Corpus Studies, Collocation Networks
Speakers: Dan Malone (Edge Hill University, UK; https://independent.academia.edu/DanielMalone14) & Hanna Schmück (Lancaster University, UK; https://hannaschmueck.github.io/)
Title: A pack of lone wolves? Exploring the nexus between the lone-wolf terrorist, Al-Qaeda, and ISIS in the British Press
Abstract
Following recent events in Belgium and Israel, the lone-wolf terrorist re-emerged in media reportage, with President Joe Biden <https://edition.cnn.com/2011/09/11/tv/biden-does-not-rule-out-possibility-o…> and former GCHQ Director Sir David Omand <https://inews.co.uk/news/uk-facing-heightened-threat-from-lone-wolf-terror-…> expressing concerns over potential attacks in the USA and UK. Days later, Belgian Prime Minister Alexander De Croo described the neutralised Brussels shooter as "probably a lone wolf" <https://www.theguardian.com/world/2023/oct/17/killing-of-two-swedes-in-brus…>, thus aiming to downplay the risk of subsequent incidents. Together, these instances exemplify that, by shaping a "reality" (Entman, 2004), (in)security discourses can amplify or downplay a terrorist threat, in turn reflecting and/or influencing public perception and potentially guiding policy responses. Historically, the lone wolf has been associated with different movements, ranging from the propaganda of the deed in the 19th century to the leaderless resistance of white-supremacist groups in the 1980s and 90s. More recently, the lone wolf has become increasingly associated with Islamist terrorism, a domain often dominated by Al-Qaeda and ISIS, especially in the British press. In this joint presentation, we discuss the analytical approaches and results from our analysis of discourses surrounding the lone-wolf terrorist, Al-Qaeda, and ISIS in three diachronic sub-corpora of the Lone Wolf Corpus (Malone, 2020), a compilation of British press articles from 2000 to 2019. In a unique methodological combination, we employed large-scale collocation networks and topical clustering to examine shifting discourses through collocational clusters, and applied a corpus-based critical discourse analysis to examine representations of the Al-Qaeda-ISIS nexus.
Hanna introduces the methodology employed to generate topical clusters and discusses collocational changes and constants in emerging discourses surrounding the lone-wolf terrorist. The resulting patterns present a discursive shift from clusters related to causative factors (e.g., a mental health subcluster), towards the internationalisation and institutionalisation of lone-wolf terrorism, and finally to response management in the form of sentencing and punitive actions (e.g., a court proceedings/prison subcluster). Reporting on his corpus-based critical discourse analysis, Daniel presents the emergent representations surrounding co-occurrences of the node AL QAEDA with ISIS. These discourses were categorised into four modes of representation of presented relationship-types: Convergence, Association, Dissociation, and Divergence. These modes contributed to surrounding (in)security discourses that at times equate, promote and/or relegate different entities in a continual reshuffling of the threat hierarchy; a process termed here enmity reimagining.
References
Entman, R. (2004). Projections of Power: Framing News, Public Opinion, and U.S. Foreign Policy. The University of Chicago Press: London.
Malone, D. (2020). Developing a complex query to build a specialised corpus: Reducing the issue of polysemous query terms. Corpora and Discourse International Conference 2020.
You can register here:
https://store.edgehill.ac.uk/conferences-and-events/faculty-of-arts-and-sci…