17th Workshop on Building and Using Comparable Corpora --- Call for Papers
Co-located with LREC-COLING 2024
Torino, Italia, 20 May 2024
Workshop website: https://comparable.limsi.fr/bucc2024/
LREC-COLING website: https://lrec-coling-2024.org/
Workshop proceedings to be published in the ACL Anthology
MOTIVATION
In the language engineering and linguistics communities, research in comparable corpora has been motivated by two main reasons. In language engineering, on the one hand, it is chiefly motivated by the need to use comparable corpora as training data for statistical NLP applications such as statistical and neural machine translation or cross-lingual retrieval. In linguistics, on the other hand, comparable corpora are of interest because they enable cross-language discoveries and comparisons. It is generally accepted in both communities that comparable corpora consist of documents that are comparable in content and form to various degrees and along various dimensions across several languages. Parallel corpora are at one end of this spectrum, unrelated corpora at the other.
Comparable corpora have been used in a range of applications, including information retrieval, machine translation, and cross-lingual text classification. The linguistic definitions and observations related to comparable corpora can improve methods to mine such corpora for applications of neural NLP, for example, to extract parallel corpora from comparable corpora for neural machine translation. As such, it is of great interest to bring together builders and users of such corpora.
TOPICS
We solicit contributions on all topics related to comparable (and parallel) corpora, including but not limited to the following:
Building Comparable Corpora:
- Automatic and semi-automatic methods
- Methods to mine parallel and non-parallel corpora from the web
- Tools and criteria to evaluate the comparability of corpora
- Parallel vs non-parallel corpora, monolingual corpora
- Rare and minority languages, across language families
- Multi-media/multi-modal comparable corpora
Applications of comparable corpora:
- Human translation
- Language learning
- Cross-language information retrieval & document categorization
- Bilingual and multilingual projections
- (Unsupervised) Machine translation
- Writing assistance
- Machine learning techniques using comparable corpora
Mining from Comparable Corpora:
- Cross-language distributional semantics, word embeddings and pre-trained multilingual transformer models
- Extraction of parallel segments or paraphrases from comparable corpora
- Methods to derive parallel from non-parallel corpora (e.g. to provide for low-resource languages in neural machine translation)
- Extraction of bilingual and multilingual translations of single words, multi-word expressions, proper names, named entities, sentences, paraphrases etc. from comparable corpora
- Induction of morphological, grammatical, and translation rules from comparable corpora
- Induction of multilingual word classes from comparable corpora
Comparable Corpora in the Humanities:
- Comparing linguistic phenomena across languages in contrastive linguistics
- Analyzing properties of translated language in translation studies
- Studying language change over time in diachronic linguistics
- Assigning texts to authors via authors' corpora in forensic linguistics
- Comparing rhetorical features in discourse analysis
- Studying cultural differences in sociolinguistics
- Analyzing language universals in typological research
IMPORTANT DATES
21 Feb 2024: Paper submission deadline
24 Mar 2024: Notification of acceptance
7 Apr 2024: Camera-ready final papers
20 May 2024: Workshop date
For updates, please see the workshop website at https://comparable.limsi.fr/bucc2024/
PRACTICAL INFORMATION
The workshop is an in-person event. Workshop registration is via the main conference registration site, see https://lrec-coling-2024.org/
The workshop proceedings will be published in the ACL Anthology.
SUBMISSION GUIDELINES
Please follow the style sheet and templates (for LaTeX, Overleaf and MS-Word) provided for the main conference at https://lrec-coling-2024.org/authors-kit/
Papers should be submitted as a PDF file using the START conference manager at https://secure-web.cisco.com/1UUoVNXimK0Jzna4dQKSutgJlLRB94SkbvGnq5AUpyqLNT…
Submissions must describe original and unpublished work and range from 4 to 8 pages plus unlimited references.
Reviewing will be double blind, so the papers should not reveal the authors' identity. Accepted papers will be published in the workshop proceedings, which will be included in the ACL Anthology.
Double submission policy: Parallel submission to other meetings or publications is possible, but the workshop organizers must be notified by e-mail immediately (i.e. as soon as known to the authors).
For further information and updates, please see the BUCC 2024 website: https://comparable.limsi.fr/bucc2024/
WORKSHOP ORGANIZERS
- Pierre Zweigenbaum (Université Paris-Saclay, CNRS, LISN, Orsay, France)
- Reinhard Rapp (University of Mainz and Magdeburg-Stendal University of Applied Sciences, Germany)
- Serge Sharoff (University of Leeds, United Kingdom)
Contact: pz (at) lisn (dot) fr
PROGRAMME COMMITTEE
- Ebrahim Ansari (Institute for Advanced Studies in Basic Sciences, Iran)
- Thierry Etchegoyhen (Vicomtech, Spain)
- Kyo Kageura (University of Tokyo, Japan)
- Natalie Kübler (Université Paris Cité, France)
- Philippe Langlais (Université de Montréal, Canada)
- Yves Lepage (Waseda University, Japan)
- Shervin Malmasi (Amazon, USA)
- Michael Mohler (Language Computer Corporation, USA)
- Emmanuel Morin (Nantes Université, France)
- Dragos Stefan Munteanu (Language Weaver, Inc., USA)
- Ted Pedersen (University of Minnesota, Duluth, USA)
- Ayla Rigouts Terryn (KU Leuven, Belgium)
- Reinhard Rapp (University of Mainz and Magdeburg-Stendal University of Applied Sciences, Germany)
- Nasredine Semmar (CEA LIST, Paris, France)
- Silvia Severini (Leonardo Labs, Italy)
- Serge Sharoff (University of Leeds, UK)
- Richard Sproat (OGI School of Science & Technology, USA)
- Tim Van de Cruys (KU Leuven, Belgium)
- Pierre Zweigenbaum (Université Paris-Saclay, CNRS, LISN, Orsay, France)
*** First Call for Journal First Submissions ***
36th International Conference on Advanced Information Systems Engineering
(CAiSE'24)
June 3-7, 2024, 5* St. Raphael Resort and Marina, Limassol, Cyprus
https://cyprusconferences.org/caise2024/
(*** Submission Deadline: 31st March, 2024 AoE ***)
CAiSE 2024 is organising journal-first sessions as part of the scientific program. The aim of
these sessions is to disseminate recent important research contributions and spark
discussions between authors and researchers in the CAiSE community. Authors of selected
journal articles on CAiSE-related topics will be invited to present their work at the
conference.
SCOPE
For the journal-first sessions, we solicit submissions related to articles that have been
accepted for publication by a reputable journal and that meet the following criteria:
• The article relates to the topics of the CAiSE conference and the recent call for papers.
• The article is an original submission to the journal and not an extension of an earlier
conference or workshop paper.
• The article is an original research article; review articles or commentaries will not be
considered.
• The article was accepted for publication by a journal on or after 1 January 2023, the
acceptance must have been publicly announced, the article must be available at the
publisher’s website (e.g., as "articles in advance" or published on a journal’s website), and
the article must be written in English.
• The article has not been presented at, and is not under consideration for, journal-first
tracks of other conferences.
FORMAT
Accepted submissions will be presented as part of the CAiSE 2024 scientific programme.
SUBMISSION
Submissions must be made electronically via EasyChair
(https://easychair.org/my/conference?conf=caise2024) and include:
• Title and author information of the article.
• The original abstract and keywords.
• DOI of the original publication or, alternatively, a link to the publication at the journal’s
website.
EVALUATION
All submissions will be reviewed by the track chairs with the aim of accepting all qualifying
submissions, subject to the ability to accommodate them in the program. If needed, priority will
be given to submissions according to their topical fit with the scope of the conference, the
importance of the contribution, as well as the standing of the respective journal (including,
but not limited to, the journal's impact factor and ranking results).
ATTENDANCE AND PRESENTATION
At least one author of each submission accepted for the journal-first track must register
and attend the conference to present the work. The author needs a full registration to
present the journal article. As the articles of the journal-first track have been published
already, they will not be part of the CAiSE 2024 proceedings. The articles will be listed in
the conference program and CAiSE 2024 participants will have access to the respective
abstracts and a pointer to the original journal article.
IMPORTANT DATES
• Submission: 31st March, 2024 (AoE)
• Notification of Acceptance: 14th April, 2024
• Author Registration: 17th May, 2024
• Conference Dates: 3rd-7th June, 2024
JOURNAL FIRST CHAIRS
• Paolo Giorgini, University of Trento, Italy
• Jeffrey Parsons, Memorial University of Newfoundland, Canada
The UKP Lab at the Department of Computer Science, Technical University Darmstadt, Germany, is hiring several
*** Postdoc Research Fellows in the field of AI/Natural Language Processing. ***
Areas of work include Conversational AI, Multimodal fact-checking, Interactive Code Generation, NLP for mental health and privacy-aware NLP. It is also possible to propose a topic bottom-up.
https://www.informatik.tu-darmstadt.de/ukp/ukp_home/jobs_ukp/2023_postdoc_u…
Join our internationally recognized team at TU Darmstadt, enjoy diverse opportunities for professional development, and conduct cutting-edge research! Application deadline: January 30th, 2024.
Please submit your application via the following form: https://careers.ukp.informatik.tu-darmstadt.de/ukprecruitment
--------------------------------------------------------------------
Prof. Dr. Iryna Gurevych
UKP Lab
Technical University Darmstadt, Germany
http://www.ukp.tu-darmstadt.de/
EvaLatin, now in its third edition, is the campaign devoted to the evaluation of NLP tools for Latin. This year we invite all those interested in parsing and sentiment analysis to take up the challenge of working on Latin by participating in the following tasks:
- dependency parsing;
- emotion polarity detection.
Test sets for both tasks will be released on the EvaLatin 2024 web page in the first half of February.
Check all the important dates here: https://circse.github.io/LT4HALA/2024/EvaLatin
EvaLatin 2024 is organized as part of the "Workshop on Language Technologies for Historical and Ancient Languages" (LT4HALA) which will be held in Turin on May 25, 2024 in the context of the LREC-COLING 2024 conference.
Prof. Marco C. Passarotti
Computational Linguistics
Index Thomisticus Treebank https://itreebank.marginalia.it/
ERC Grantee, P.I. LiLa https://lila-erc.eu/ (Grant Agreement No. 769994)
CIRCSE Research Centre https://centridiricerca.unicatt.it/circse_index.html
Università Cattolica del Sacro Cuore
Largo Gemelli, 1
20123 Milan, Italy
marco.passarotti(a)unicatt.it
tel. +39-02-72342380
Dear list members (esp. those in the Japanese community),
for a cross-linguistic evaluation of co-reference annotations, I am
interested in looking into the NAIST Coreference Corpus, which is based
on the Kyoto Corpus. Luckily, both annotations are available, but not the
primary text. According to the documentation of both corpora, it is
necessary to acquire the Mainichi Shimbun CD-ROM (1995), first. I really
tried my best, and I followed several catalogues (incl.
https://www.jaist.ac.jp/project/NLP_Portal/doc/LR/lr-cat-e.html#jp:mainichi…),
but the URL it points to (
https://www.nichigai.co.jp/sales/mainichi/mainichi-data.html) isn't
operational any more. Does anyone know where and how to buy that CD-ROM? Is
there another way to get access to that data?
Thanks a lot,
Christian
Journal of Data Mining and Digital Humanities (JDMDH)
organizes a call for papers about the topic
Chinese Natural Language Processing for Digital Humanities (CNLP4DH)
As a reminder, JDMDH is an international journal managed by French
national research institutions; it is green open access (no charge for
readers or authors).
This special issue is dedicated to natural language processing for digital
humanities involving documents written in Chinese, including Modern,
Ancient and dialectal Chinese. Work on Mandarin, the official national and
main common language, is accepted, and research on texts written in other
languages, such as those of Tibet and Inner Mongolia, is also welcome.
Suitable topics include, but are not limited to:
- Text analysis and processing related to humanities using computational
methods
- Dataset creation and curation for NLP (e.g. digitization, datafication,
and data preservation).
- Research on cultural heritage collections such as national archives and
libraries using NLP
- NLP for error detection, correction, normalization and denoising data
- Generation and analysis of literary works such as poetry and novels
- Analysis and detection of text genres
- Word segmentation, part-of-speech tagging of Ancient Chinese
- Large Language Models (LLM) for Chinese in Digital Humanities
- Cross-modal models (text-speech-video-image) for Chinese in Digital
Humanities
- Visualization of text analytics
- Ontology models for natural language text
- Applications in Chinese Literature, Traditional Chinese medicine,
Learning Chinese language as second language, Sentiment Analysis in Chinese
Social Media, China Cultural Heritage, Chinese History, Ancient Chinese
language
Website and more details:
https://jdmdh.episciences.org/page/chinese-natural-language-processing-for-…
Submission guidelines: https://jdmdh.episciences.org/page/submissions
Paper submission: https://jdmdh.episciences.org/submit
Guest Editors:
Dr. Wenhe FENG (Guangdong University of Foreign Studies, Laboratory of
Language Engineering and Computing)
Dr. Bin LI (Nanjing Normal University, School of Chinese Language and
Literature, Center of Linguistic Big Data and Computational Humanities)
Dr. Nicolas TURENNE (Guangdong University of Foreign Studies, School of
Information Science and Technology)
Dr. Tong WEI (Beijing University, Digital Humanities Center)
************************************************************************************
Second Call for papers: CALD-pseudo workshop on Computational Approaches to Language Data Pseudonymization @ EACL 2024, March 21 or 22, 2024
Website:
https://mormor-karl.github.io/events/CALD-pseudo/
Submission website: https://softconf.com/eacl2024/CALD-pseudo-2024/
Submission Deadline: Monday, 18 December 2023
We invite submissions to the first edition of the CALD-pseudo workshop on Computational Approaches to Language Data Pseudonymization, to be held at EACL 2024 on March 21 or 22, 2024.
[Important Dates]
* December 18, 2023: paper submission deadline
* January 17, 2024: resubmission of already pre-reviewed ARR papers
* January 20, 2024: notification of acceptance
* January 30, 2024: camera-ready papers due
* March 21 or 22, 2024: workshop date (date to be confirmed by EACL)
[Introduction]
Accessibility of research data is critical for advances in many research fields, but textual data often cannot be shared because of the personal and sensitive information it contains, e.g. names, political opinions, sensitive personal information and medical data. The General Data Protection Regulation (GDPR; EU Commission, 2016) suggests pseudonymization as a solution to secure open access to research data, but we need to learn more about pseudonymization as an approach before adopting it for manipulation of research data (Volodina et al., 2023). The main challenge is how to pseudonymize data effectively so that individuals cannot be identified, while at the same time keeping the data usable for the research for which it was collected, in fields including computational linguistics, linguistics and natural language processing.
[Topics of Interest]
The CALD-pseudo workshop invites a broad community of researchers in all concerned cross-disciplinary fields to jointly discuss challenges within pseudonymization, such as
* automatic approaches to detection and labelling of personal information in unstructured language data, including events and other context-dependent cues revealing a person;
* developing context-sensitive algorithms for replacement of personal information in unstructured data;
* studies into the effects of pseudonymization on unstructured data, e.g. applicability of pseudonymised data for the intended research questions, readability of pseudonymised data or addition of unwelcome biases through pseudonymization;
* effectiveness of pseudonymization as a way of protecting writer identity;
* reidentification studies, e.g. adversarial learning techniques that attempt to breach the privacy protections of pseudonymized data;
* constructing datasets for automatic pseudonymization, including methodological and ethical aspects of those;
* approaches to the evaluation of automatic pseudonymization both in concealing the private information and preserving the semantics of the non-personal data;
* pseudonymization tools and software: evaluating the available tools and software for pseudonymization in different languages, and their ease of use, scalability, and performance;
* and numerous other open questions.
[Submission Guidelines]
Authors are invited to submit by December 18, 2023 original and unpublished research papers in the following categories:
* Full papers (up to 8 pages) for substantial contributions
* Short papers (up to 4 pages) for ongoing or preliminary work
All submissions must be in PDF format, must follow the EACL 2024 guidelines described in the ARR CfP (https://aclrollingreview.org/cfp), and use the official ACL style templates available here: https://github.com/acl-org/acl-style-files
Direct submission deadline: December 18, 2023 at https://softconf.com/eacl2024/CALD-pseudo-2024/
Deadline for registration of ARR-reviewed papers: January 17, 2024. (Further instructions will follow.)
We also invite authors of papers on the topics of the workshop accepted to Findings to reach out to the organizing committee of CALD-pseudo to present them at the workshop.
[Invited speakers]
We are happy to announce that the workshop will host two invited speakers:
* Anders Søgaard, University of Copenhagen, Denmark
* Ildikó Pilán, the Norwegian Computing Center, Norway
[Workshop Organizers]
* Elena Volodina, University of Gothenburg, Sweden
* Therese Lindström Tiedemann, University of Helsinki, Finland
* Simon Dobnik, University of Gothenburg, Sweden
* Xuan-Son Vu, Umeå University, Sweden
[Program Committee]
A list of program committee members is available on the workshop website.
[Contact]
For inquiries, please contact mormor.karl(a)svenska.gu.se
ACL link to the call: https://www.aclweb.org/portal/content/computational-approaches-language-dat…
___________________
Elena Volodina, PhD, Docent
https://spraakbanken.gu.se/en/about/staff/elena
[apologies for potential cross-posting]
==================================================================================================
Bridging Neurons and Symbols for Natural Language Processing and
Knowledge Graphs Reasoning @ LREC-COLING 2024
=====================================
Co-located with LREC-COLING in Turin, Italy
21st May 2024
Workshop webpage: https://neusymbridge.github.io/
Call for Papers
--------------------
The 1st Workshop on Bridging Neurons and Symbols for Natural Language
Processing and Knowledge Graphs Reasoning — to be held at LREC-COLING
2024 — will promote two directions for exploring neural reasoning:
starting from existing neural networks to enhance the reasoning
performance with the target of symbolic-level reasoning, and starting
from symbolic reasoning to explore its novel neural implementation.
These two directions will ideally meet somewhere in the middle and will
lead to representations that can act as a bridge for novel neural
computing, which qualitatively differs from traditional neural networks,
and for novel symbolic computing, which inherits the good features of
neural computing. Hence the name of our workshop, with a focus on
Natural Language Processing and Knowledge Graph reasoning.
Topics (include, but are not limited to)
--------------------------------------------------
• Proposing novel knowledge representations that are derived from
transdisciplinary research
• Using knowledge graphs or other types of symbolic knowledge to improve
the quality of LLMs
• Exploring the reasoning mechanism of LLMs
• Distilling symbolic knowledge from LLMs
• Proposing benchmark datasets and evaluation metrics for
neuro-symbolic approaches to NLP tasks
• Proposing novel NLP tasks for neuro-symbolic approaches
• NLP applications in classification, sense-disambiguation, sentiment
analysis, question-answering, knowledge graph reasoning
• Critical analysis of traditional deep learning or LLMs
• Analysing spatial reasoning of LLMs
• Proposing novel neural computing that may reach symbolic-level reasoning
• Proposing benchmark datasets and metrics to evaluate the gap between
neural reasoning and symbolic reasoning
• Addressing efficiency issues in neuro-symbolic systems
• Identifying challenges and opportunities of neuro-symbolic systems
• Developing retrieval augmented models for combining KG and LLMs
• Applying neuro-symbolic approaches to humor generation and other
real-life applications
Submissions:
------------------
• The papers should be submitted as a PDF document, conforming to the
formatting guidelines provided in the call for papers of LREC-COLING
conference (https://lrec-coling-2024.org/authors-kit/)
• Submissions via Softconf/START Conference Manager
at https://softconf.com/lrec-coling2024/neusymbridge2024/
Important Dates
---------------------
• Submission Deadline: March 3rd
• Notification of Acceptance: April 10th
• Camera Ready Deadline: April 21st
• Workshop: May 21st
Keynotes
--------------------------------
• Pascale Fung - The Hong Kong University of Science and Technology
• Alessandro Lenci - Università di Pisa
• Juanzi Li - Tsinghua University
• Volker Tresp - Ludwig Maximilian University of Munich
Organisation Committee
--------------------------------
• Tiansi Dong - Fraunhofer IAIS
• Erhard Hinrichs - University of Tübingen
• Zhen Han - Amazon Inc.
• Kang Liu - Chinese Academy of Sciences
• Yangqiu Song - The Hong Kong University of Science and Technology
• Yixin Cao - Singapore Management University
• Christian F. Hempelmann - Texas A&M-Commerce
• Rafet Sifa - University of Bonn
Programme Committee
-------------------------------
• Claire Bonial - U.S. Army DEVCOM Army Research Laboratory
• Meiqi Chen - Peking University
• Shuo Chen - Ludwig Maximilian University of Munich
• Hejie Cui - Emory University
• Xinyu Dai - Nanjing University
• Zifeng Ding - Ludwig Maximilian University of Munich
• Kathrin Erk - The University of Texas at Austin
• Irlan G Gonzalez - Bosch Center for Artificial Intelligence
• Shizhu He - Institute of Automation, Chinese Academy of Sciences
• Bailan He - Ludwig Maximilian University of Munich
• Jens U. Kreber - Saarland University
• Sandra Kübler - Indiana University
• Hang Li - Ludwig Maximilian University of Munich
• Honglei Li - Northumbria University
• Yong Liu - Plunk
• Xinze Liu - Nanyang Technological University
• Xin Liu - Amazon Inc.
• Tong Liu - Ludwig Maximilian University of Munich
• Yunfei Long - Essex University
• Yubo Ma - Nanyang Technological University
• Emanuele Marconato - University of Trento
• Petra Osenova - University of Sofia
• Parth Padalkar - University of Texas at Dallas
• Martha Palmer - University of Colorado
• Barbara Plank - Ludwig Maximilian University of Munich
• Julia Rayz - Purdue University
• Ryan Riegel - IBM Research
• Timo Schick - Meta AI
• Christoph Schommer - University of Luxembourg
• Wangtao Sun - Institute of Automation, Chinese Academy of Sciences
• Xun Wang - Microsoft Corporation
• Jingpei Wu - Ludwig Maximilian University of Munich
• Kai Xiong - Harare Institute of Technology
• Yuan Yang - Georgia Institute of Technology
• Michihiro Yasunaga - Stanford University
• Jiahao Ying - Singapore Management University
• Ziqian Zeng - South China University of Technology
• Hongming Zhang - Tencent AI Lab, Seattle
• Gengyuan Zhang - Ludwig Maximilian University of Munich
==================================================================================================
Call for papers: Second Workshop on Computation and Written Language (CAWL
2024)
CAWL 2024 will be held in conjunction with LREC-COLING 2024 on May 21 in
Torino, Italy. The workshop will feature an invited talk by Nizar Habash
(NYU Abu Dhabi), and has a special theme for workshop submissions: Writing
Systems of Africa. Annual CAWL workshops are organized under the guidance
of the newly formed ACL Special Interest Group on Writing Systems and
Written Language (SIGWrit). We welcome submissions of scientific papers to
be presented at the workshop and archived in the ACL Anthology. Please see
explicit submission guidelines below, including details on topics of
interest and the special workshop theme, and see the workshop webpage
https://sigwrit.org/workshops/cawl2024/ for additional relevant information.
Most work in NLP focuses on language in its canonical written form. This
has often led researchers to ignore the differences between written and
spoken language or, worse, to conflate the two. Instances of conflation are
statements like “Chinese is a logographic language” or “Persian is a
right-to-left language”, variants of which can be found frequently in the
ACL anthology. These statements confuse properties of the language with
properties of its writing system. Ignoring differences between written and
spoken language leads, among other things, to conflating different words
that are spelled the same (e.g., English bass), or treating as different,
words that have multiple spellings (e.g., Japanese umai ‘tasty’, which can
be written 旨い, うまい, ウマい, or 美味い).
Furthermore, methods for dealing with written language issues (e.g.,
various kinds of normalization or conversion) or for recognizing text input
(e.g. OCR & handwriting recognition or text entry methods) are often
regarded as precursors to NLP rather than as fundamental parts of the
enterprise, despite the fact that most NLP methods rely centrally on
representations derived from text rather than (spoken) language. This
general lack of consideration of writing has led much of the research on
such topics to appear largely outside of ACL venues, in conferences or
journals of neighboring fields such as speech technology (e.g., text
normalization) or human-computer interaction (e.g., text entry).
This workshop will bring together researchers who are interested in the
relationship between written and spoken language, the properties of written
language, the ways in which writing systems encode language, and
applications specifically focused on characteristics of writing systems.
Topics of interest include but are not limited to:
- Text entry
- Text tokenization
- Disambiguation of abbreviations and homographs
- Grapheme-to-phoneme conversion, transliteration, and diacritization
- Text normalization for speech and for processing "informal" genres of
text
- Computational study of literary devices involving writing systems,
such as eye dialect
- Information-theoretic and machine-learning approaches to decipherment
- Methods for specialized text genres, e.g., clinical notes
- Optical character (incl. handwriting) recognition and historical
document processing
- Orthographic representation for unwritten languages
- Spelling error detection and correction
- Script normalization and encoding
- Writing system typology and its relevance to speech and language
processing
We invite submissions on the relationship between written and spoken
language, the properties of written language, the ways in which writing
systems encode language, and applications specifically focused on
characteristics of writing systems.
Additionally, we particularly encourage, and will prioritize, papers on the
special theme of the workshop: Writing Systems of Africa. African languages make
use of a wide variety of writing systems, from those based on the
Perso-Arabic or Latin scripts throughout Africa, the Ge'ez script in the
Horn of Africa, or the Tifinagh script for Berber languages in North
Africa, to recently invented writing systems such as the Adlam alphabet
created for Fula. Issues arising from the adaptation of scripts to new
languages, such as Ajami or orthographies using the Latin script, would be
of interest. For example, the primary language of instruction in the
schools of Mali is French, so that speakers of Bambara, despite not
generally being taught to read that language in the schools, will often
make use of either the Latin script that they learned via French in school
or the Perso-Arabic (Ajami) script from religious instruction to write
their language. Bambara is also sometimes written with the modern N'Ko
script. Given this diversity of options, Bambara written language can be
extremely varied, presenting major challenges to corpus building and
automatic language processing methods.
Important dates:
Paper submission deadline: February 22, 2024 (anywhere in the world)
Notification of acceptance: March 25, 2024
Camera-ready paper due: April 5, 2024
Workshop date: May 21, 2024
Submission Guidelines
Please submit short (4 page) or long (8 page) submissions in PDF format to
https://softconf.com/lrec-coling2024/cawl2024/. Both short and long paper
submissions will be reviewed in the same process. Authors should follow the
formatting guidelines of LREC-COLING 2024, available in the authors kit (
https://lrec-coling-2024.org/authors-kit/), and we will follow the paper
submission and reviewing policies detailed in the LREC-COLING 2024 call for
papers (https://lrec-coling-2024.org/2nd-call-for-papers/). Note that, as
with the main conference, reviewing is double-anonymous, i.e., reviewers
will not know author identity and vice versa, hence no author information
should be included in the papers; self-reference that identifies the
authors should be avoided or anonymised. Accepted papers will appear in the
workshop proceedings in the ACL anthology.
For questions about the submission guidelines, please contact workshop
organizers at cawl.workshop.2024(a)gmail.com.
Organizers:
- Kyle Gorman <https://wellformedness.com/>, Graduate Center, City
University of New York & Google, USA
- Emily Prud’hommeaux <http://cs.bc.edu/~prudhome/>, Boston College, USA
- Brian Roark <https://lanzaroark.org/brian-roark/>, Google, USA
- Richard Sproat <https://rws.xoba.com/>, Google DeepMind, Japan
Program Committee:
- David Ifeoluwa Adelani <https://dadelani.github.io/>, University
College London, UK
- Manex Agirrezabal <https://manexagirrezabal.github.io/>, University of
Copenhagen, Denmark
- Sina Ahmadi <https://sinaahmadi.github.io/>, George Mason University,
USA
- Cecilia Alm <https://www.rit.edu/directory/coagla-cecilia-alm>,
Rochester Institute of Technology, USA
- Mark Aronoff <https://linguistics.stonybrook.edu/faculty/mark.aronoff/>,
Stony Brook University, USA
- Steven Bedrick
<https://www.ohsu.edu/school-of-medicine/csee/steven-bedrick>, Oregon
Health & Science University, USA
- Taylor Berg-Kirkpatrick <https://cseweb.ucsd.edu/~tberg/>, UC San
Diego, USA
- Amalia Gnanadesikan
<https://scholar.google.com/citations?user=HkNhAoAAAAAJ&hl=en>,
University of Maryland, USA
- Christian Gold
<https://www.fernuni-hagen.de/english/research/clusters/catalpa/about-catalp…>,
CATALPA, FernUniversität in Hagen, Germany
- Alexander Gutkin <https://research.google/people/AlexanderGutkin/>,
Google, UK
- Nizar Habash
<https://nyuad.nyu.edu/en/academics/divisions/science/faculty/nizar-habash.h…>,
NYU Abu Dhabi, United Arab Emirates
- Yannis Haralambous
<https://www.imt-atlantique.fr/en/person/yannis-haralambous>, IMT
Atlantique & CNRS Lab-STICC, France
- Cassandra Jacobs <https://www.acsu.buffalo.edu/~cxjacobs/>, University
at Buffalo, USA
- Martin Jansche
<https://scholar.google.com/citations?user=z8yPdQQAAAAJ&hl=en>, Amazon,
UK
- Kathryn Kelley
<https://www.unibo.it/sitoweb/kathrynerin.kelley/research>, Università
di Bologna, Italy
- George Kiraz <https://www.ias.edu/scholars/george-kiraz>, Princeton
University, USA
- Christo Kirov <https://ckirov.github.io/>, Google, USA
- Jordan Kodner <https://jkodner05.github.io/>, Stony Brook University,
USA
- Anoop Kunchukuttan <http://anoopk.in/>, Microsoft, India
- Yang Li <https://npuliyang.github.io/>, Northwestern Polytechnical
University, China
- Constantine Lignos <https://lignos.org/>, Brandeis University, USA
- Zoey Liu <https://zoeyliu18.github.io/>, University of Florida, USA
- Jalal Maleki <https://liu.se/en/employee/jalma87>, Linköping
University, Sweden
- M. Willis Monroe <https://www.willismonroe.com/>, University of New
Brunswick, Canada
- Gerald Penn <http://www.cs.toronto.edu/~gpenn/>, University of
Toronto, Canada
- Yuval Pinter <https://www.cs.bgu.ac.il/~pintery/>, Ben-Gurion
University of the Negev, Israel
- William Poser <https://billposer.org/>, independent scholar, Canada
- Shruti Rijhwani <https://shrutirij.github.io/>, Google, USA
- Maria Ryskina <https://ryskina.github.io/>, MIT, USA
- Anoop Sarkar
<https://www.sfu.ca/computing/people/faculty/anoopsarkar.html>, Simon
Fraser University, Canada
- Lane Schwartz <http://dowobeha.github.io/>, University of Alaska,
Fairbanks, USA
- Djamé Seddah <http://pauillac.inria.fr/~seddah/>, Sorbonne University
& Inria, France
- Shuming Shi
<https://scholar.google.com/citations?user=Lg31AKMAAAAJ&hl=en>, Tencent,
China
- Claytone Sikasote <https://csikasote.github.io/>, University of Zambia
(UNZA), Zambia
- Fabio Tamburini <https://corpora.ficlit.unibo.it/People/Tamburini/>,
University of Bologna, Italy
- Kumiko Tanaka-Ishii <https://www.cl.rcast.u-tokyo.ac.jp/Top.html>,
University of Tokyo, Japan
- Lawrence Wolf-Sonkin
<https://aclanthology.org/people/l/lawrence-wolf-sonkin/>, Google, USA
- Martha Yifiru Tachbelie
<https://scholar.google.com/citations?user=9N37SgoAAAAJ>, Addis Ababa
University, Ethiopia
Call for Participation
We are announcing the first BEA (2024) shared task on automated prediction of Difficulty And Response Time for Multiple Choice Questions (DART-MCQ).
Motivation
For standardized exams to be fair and valid, test questions, otherwise known as items, must meet certain criteria. One important criterion is that the items should cover a wide range of difficulty levels to gather information about the abilities of test takers effectively. Additionally, it is essential to allocate an appropriate amount of time for each item: too little time can make the exam speeded, while too much time can make it inefficient.
There is growing interest in predicting item characteristics such as difficulty and response time based on the item text. However, due to difficulties with sharing exam data, efforts to advance the state of the art in item parameter prediction have been fragmented and conducted in individual institutions, with no transparent evaluation on a publicly available dataset. In this shared task, we bridge this gap by sharing practice item content and characteristics from a high-stakes medical exam, the United States Medical Licensing Examination® (USMLE®), for the exploration of two topics: predicting item difficulty (Track 1) and item response time (Track 2) based on item text.
Participation
The shared task has two separate tracks:
• Track 1: Given the item text and metadata, predict the item difficulty variable.
• Track 2: Given the item text and metadata, predict the time intensity variable.
Important Dates
Training data release: January 15
Test data release: February 10
Results due: February 16
Announcement of winners: February 21
Paper submissions due: March 10
Camera-ready papers due: April 22
Links
For more information about the shared task, see: https://sig-edu.org/sharedtask/2024
Organizers
Victoria Yaneva, National Board of Medical Examiners
Peter Baldwin, National Board of Medical Examiners
Kai North, George Mason University
Brian Clauser, National Board of Medical Examiners
Saed Rezayi, National Board of Medical Examiners
Yiyun Zhou, National Board of Medical Examiners
Le An Ha, Ho Chi Minh City University of Foreign Languages - Information Technology (HUFLIT)
Polina Harik, National Board of Medical Examiners