Call for papers: Second Workshop on Computation and Written Language (CAWL 2024)
CAWL 2024 will be held in conjunction with LREC-COLING 2024 on May 21 in Torino, Italy. The workshop will feature an invited talk by Nizar Habash (NYU Abu Dhabi), and has a special theme for workshop submissions: Writing Systems of Africa. Annual CAWL workshops are organized under the guidance of the newly formed ACL Special Interest Group on Writing Systems and Written Language (SIGWrit). We welcome submissions of scientific papers to be presented at the workshop and archived in the ACL Anthology. Please see explicit submission guidelines below, including details on topics of interest and the special workshop theme, and see the workshop webpage https://sigwrit.org/workshops/cawl2024/ for additional relevant information.
Most work in NLP focuses on language in its canonical written form. This has often led researchers to ignore the differences between written and spoken language or, worse, to conflate the two. Instances of conflation are statements like “Chinese is a logographic language” or “Persian is a right-to-left language”, variants of which can be found frequently in the ACL Anthology. These statements confuse properties of the language with properties of its writing system. Ignoring differences between written and spoken language leads, among other things, to conflating different words that are spelled the same (e.g., English bass), or to treating words that have multiple spellings as different (e.g., Japanese umai ‘tasty’, which can be written 旨い, うまい, ウマい, or 美味い).
Furthermore, methods for dealing with written language issues (e.g., various kinds of normalization or conversion) or for recognizing text input (e.g., OCR and handwriting recognition, or text entry methods) are often regarded as precursors to NLP rather than as fundamental parts of the enterprise, despite the fact that most NLP methods rely centrally on representations derived from text rather than from (spoken) language. This general lack of consideration of writing has led much of the research on such topics to appear largely outside of ACL venues, in conferences or journals of neighboring fields such as speech technology (e.g., text normalization) or human-computer interaction (e.g., text entry).
This workshop will bring together researchers who are interested in the relationship between written and spoken language, the properties of written language, the ways in which writing systems encode language, and applications specifically focused on characteristics of writing systems. Topics of interest include but are not limited to:
- Text entry
- Text tokenization
- Disambiguation of abbreviations and homographs
- Grapheme-to-phoneme conversion, transliteration, and diacritization
- Text normalization for speech and for processing "informal" genres of text
- Computational study of literary devices involving writing systems, such as eye dialect
- Information-theoretic and machine-learning approaches to decipherment
- Methods for specialized text genres, e.g., clinical notes
- Optical character (incl. handwriting) recognition and historical document processing
- Orthographic representation for unwritten languages
- Spelling error detection and correction
- Script normalization and encoding
- Writing system typology and its relevance to speech and language processing
We invite submissions on the relationship between written and spoken language, the properties of written language, the ways in which writing systems encode language, and applications specifically focused on characteristics of writing systems.
Additionally, we particularly encourage, and will prioritize, papers on the special theme of the workshop: Writing Systems of Africa. African languages make use of a wide variety of writing systems, from those based on the Perso-Arabic or Latin scripts throughout Africa, the Ge'ez script in the Horn of Africa, or the Tifinagh script for Berber languages in North Africa, to recently invented writing systems such as the Adlam alphabet created for Fula. Issues arising from the adaptation of scripts to new languages, such as Ajami or orthographies using the Latin script, would be of interest. For example, the primary language of instruction in the schools of Mali is French, so that speakers of Bambara, despite not generally being taught to read that language in the schools, will often make use of either the Latin script that they learned via French in school or the Perso-Arabic (Ajami) script from religious instruction to write their language. Bambara is also sometimes written with the modern N'Ko script. Given this diversity of options, Bambara written language can be extremely varied, presenting major challenges to corpus building and automatic language processing methods.
Important dates:
- Paper submission deadline: February 22, 2024 (anywhere in the world)
- Notification of acceptance: March 25, 2024
- Camera-ready paper due: April 5, 2024
- Workshop date: May 21, 2024
Submission Guidelines
Please submit short (4-page) or long (8-page) papers in PDF format to https://softconf.com/lrec-coling2024/cawl2024/. Both short and long paper submissions will be reviewed in the same process. Authors should follow the formatting guidelines of LREC-COLING 2024, available in the authors kit (https://lrec-coling-2024.org/authors-kit/), and we will follow the paper submission and reviewing policies detailed in the LREC-COLING 2024 call for papers (https://lrec-coling-2024.org/2nd-call-for-papers/). Note that, as with the main conference, reviewing is double-anonymous, i.e., reviewers will not know the authors' identities and vice versa. No author information should therefore be included in the papers, and self-reference that identifies the authors should be avoided or anonymised. Accepted papers will appear in the workshop proceedings in the ACL Anthology.
For questions about the submission guidelines, please contact the workshop organizers at cawl.workshop.2024@gmail.com.
Organizers:
- Kyle Gorman https://wellformedness.com/, Graduate Center, City University of New York & Google, USA
- Emily Prud’hommeaux http://cs.bc.edu/~prudhome/, Boston College, USA
- Brian Roark https://lanzaroark.org/brian-roark/, Google, USA
- Richard Sproat https://rws.xoba.com/, Google DeepMind, Japan
Program Committee:
- David Ifeoluwa Adelani https://dadelani.github.io/, University College London, UK
- Manex Agirrezabal https://manexagirrezabal.github.io/, University of Copenhagen, Denmark
- Sina Ahmadi https://sinaahmadi.github.io/, George Mason University, USA
- Cecilia Alm https://www.rit.edu/directory/coagla-cecilia-alm, Rochester Institute of Technology, USA
- Mark Aronoff https://linguistics.stonybrook.edu/faculty/mark.aronoff/, Stony Brook University, USA
- Steven Bedrick https://www.ohsu.edu/school-of-medicine/csee/steven-bedrick, Oregon Health & Science University, USA
- Taylor Berg-Kirkpatrick https://cseweb.ucsd.edu/~tberg/, UC San Diego, USA
- Amalia Gnanadesikan https://scholar.google.com/citations?user=HkNhAoAAAAAJ&hl=en, University of Maryland, USA
- Christian Gold https://www.fernuni-hagen.de/english/research/clusters/catalpa/about-catalpa/members/christian.gold.shtml, CATALPA, FernUniversität in Hagen, Germany
- Alexander Gutkin https://research.google/people/AlexanderGutkin/, Google, UK
- Nizar Habash https://nyuad.nyu.edu/en/academics/divisions/science/faculty/nizar-habash.html, NYU Abu Dhabi, United Arab Emirates
- Yannis Haralambous https://www.imt-atlantique.fr/en/person/yannis-haralambous, IMT Atlantique & CNRS Lab-STICC, France
- Cassandra Jacobs https://www.acsu.buffalo.edu/~cxjacobs/, University at Buffalo, USA
- Martin Jansche https://scholar.google.com/citations?user=z8yPdQQAAAAJ&hl=en, Amazon, UK
- Kathryn Kelley https://www.unibo.it/sitoweb/kathrynerin.kelley/research, Università di Bologna, Italy
- George Kiraz https://www.ias.edu/scholars/george-kiraz, Princeton University, USA
- Christo Kirov https://ckirov.github.io/, Google, USA
- Jordan Kodner https://jkodner05.github.io/, Stony Brook University, USA
- Anoop Kunchukuttan http://anoopk.in/, Microsoft, India
- Yang Li https://npuliyang.github.io/, Northwestern Polytechnical University, China
- Constantine Lignos https://lignos.org/, Brandeis University, USA
- Zoey Liu https://zoeyliu18.github.io/, University of Florida, USA
- Jalal Maleki https://liu.se/en/employee/jalma87, Linköping University, Sweden
- M. Willis Monroe https://www.willismonroe.com/, University of New Brunswick, Canada
- Gerald Penn http://www.cs.toronto.edu/~gpenn/, University of Toronto, Canada
- Yuval Pinter https://www.cs.bgu.ac.il/~pintery/, Ben-Gurion University of the Negev, Israel
- William Poser https://billposer.org/, independent scholar, Canada
- Shruti Rijhwani https://shrutirij.github.io/, Google, USA
- Maria Ryskina https://ryskina.github.io/, MIT, USA
- Anoop Sarkar https://www.sfu.ca/computing/people/faculty/anoopsarkar.html, Simon Fraser University, Canada
- Lane Schwartz http://dowobeha.github.io/, University of Alaska, Fairbanks, USA
- Djamé Seddah http://pauillac.inria.fr/~seddah/, Sorbonne University & Inria, France
- Shuming Shi https://scholar.google.com/citations?user=Lg31AKMAAAAJ&hl=en, Tencent, China
- Claytone Sikasote https://csikasote.github.io/, University of Zambia (UNZA), Zambia
- Fabio Tamburini https://corpora.ficlit.unibo.it/People/Tamburini/, University of Bologna, Italy
- Kumiko Tanaka-Ishii https://www.cl.rcast.u-tokyo.ac.jp/Top.html, University of Tokyo, Japan
- Lawrence Wolf-Sonkin https://aclanthology.org/people/l/lawrence-wolf-sonkin/, Google, USA
- Martha Yifiru Tachbelie https://scholar.google.com/citations?user=9N37SgoAAAAJ, Addis Ababa University, Ethiopia