Language Technologies and Digital Humanities: Resources and Applications (LTaDH-RA)
CLaDA-BG 2026 Conference
Sofia, Bulgaria
Venue: tba
25-26 June 2026
CLaDA-BG is the Bulgarian national research infrastructure for resources and technologies for linguistic, cultural and historical heritage, integrated within CLARIN EU and DARIAH EU. Its mission is to provide access to the resources and technologies needed to support research in the Social Sciences and Humanities (SS&H). Modeling and linking various types of knowledge and their contexts is crucial for successful research in the interdisciplinary field of resources and technologies related to language, culture and history.
This is the fifth edition of the CLaDA-BG conference. It aims to bring together NLP developers, linguists, digital humanists, scholars and all parties interested in knowledge modeling and linking data for research.
Topics of Interest
The topics include, but are not limited to, the following:
• Problems in SS&H – research methods, technological support, applications
• Language technologies for sentiment analysis, semantic technologies, trustworthiness of knowledge graphs, ethical challenges in digital SS&H
• Knowledge Modeling and Elicitation for digital SS&H
• Specific Language Resources and Technologies for historical texts, parliamentary records, speech and multimodal corpora, social media data, etc.
• The role of digital libraries, archives and museums in digital SS&H research
• Language Interface to Knowledge Graphs in SS&H
• Knowledge-modeled and linked applications in SS&H
• Large Language Models for DH
• Best practices and new trends in Knowledge Modeling and Linking for language, culture and history
Invited Speakers
The invited speakers will be announced soon.
Important Dates
Submission deadline: 19.04.2026
Notification of acceptance: 24.05.2026
Final Submission: 20.06.2026
Conference: 25-26.06.2026
Submissions
We welcome oral presentations or posters (optionally with a demo). Submissions should follow the CEUR-WS.org proceedings format, although the proceedings will not be published there. The instructions for preparing submissions are here: https://ceur-ws.org/HOWTOSUBMIT.html#CEURART
We invite two types of papers: regular papers (between 10 and 12 "standard" pages) and short papers (5-9 "standard" pages) in accordance with CEURART, 2-column style. A "standard" page is 2500 characters.
We also accept extended abstract submissions (3-5 "standard" pages) in accordance with CEURART, 2-column style. They will be presented at the conference and will be published in a Book of Abstracts in electronic form.
Please submit your full paper or extended abstract in PDF to the following email: ltadh-ra(a)bultreebank.org
For contacting organizers, please use the following email: ltadh-ra(a)bultreebank.org
The CLaDA-BG Organizers
Joint Call for Tutorial Proposals (EMNLP/AACL-IJCNLP) 2026
The Conference on Empirical Methods in Natural Language Processing (EMNLP), the Asia-Pacific Chapter of the Association for Computational Linguistics (AACL), and the Asian Federation of Natural Language Processing (AFNLP) invite proposals for tutorials in conjunction with the EMNLP 2026 and AACL-IJCNLP 2026 conferences. We welcome submissions covering all areas of computational linguistics (CL) and natural language processing (NLP), broadly defined to include related disciplines. We are soliciting proposals for two types of tutorials:
Cutting-edge tutorials in CL/NLP: Covering recent advances in emerging areas not previously addressed in tutorials at EMNLP, AACL, IJCNLP, ACL, NAACL-HLT, or EACL.
Introductory tutorials in related fields: Offering overviews of disciplines potentially relevant to the CL/NLP community, such as linguistics, bioinformatics, machine learning, human-computer interaction, or applications of large language models in non-English languages.
In both cases, the primary goal is to help CL/NLP researchers understand key scientific challenges, their tractability, and their theoretical and practical implications. Presentations of specific technologies or systems are welcome when used to illustrate broader scientific insights.
Tutorials will be held at one of the following conference venues:
EMNLP 2026 (the 2026 Conference on Empirical Methods in Natural Language Processing: https://2026.emnlp.org/), which will be held as a hybrid conference, and physically held in Budapest, Hungary from October 24th to October 29th, 2026.
AACL-IJCNLP 2026 (the 5th Asia-Pacific Chapter of the Association for Computational Linguistics & the 15th International Joint Conference on Natural Language Processing: https://2026.aaclnet.org/), which will be held as a hybrid conference and physically held in Hengqin, China from November 6th to November 10th, 2026.
Important Dates
EMNLP/AACL-IJCNLP 2026 shared dates:
Proposal submission deadline: June 1, 2026
Notification of acceptance: July 15, 2026
Tutorial slides + abstract + bibliography + any other materials: one month prior to the date of the tutorial
All deadlines are 11:59 PM UTC-12:00 ("anywhere on Earth").
Fee Waivers
Up to 3 instructors per tutorial can have their registration fees waived for the main conference and any subset of co-located tutorials and workshops.
Diversity & Inclusion
To foster an inclusive culture in our field, we particularly encourage submissions from members of underrepresented groups in CL/NLP, i.e., researchers from any demographic or geographic minority, researchers with disabilities, among others. The overall diversity of the tutorial organizers and potential audience will be taken into account to ensure that the conference program is varied and balanced.
Tutorial proposals should describe and will be evaluated according to how the tutorial contributes to topics promoting diversity (e.g., working on minority languages or groups), participation diversity (e.g., coordinating with social affinity groups, providing subsidies, making a promotional plan for the tutorial), and representation diversity among tutorial presenters. For more information or advice, organizers may consult resources such as the BIG directory (http://www.winlp.org/big-directory/), Black in AI (https://blackinai.github.io/#/membership), Disability in AI (https://elesa.github.io/ability_in_AI/), Indigenous AI (https://www.indigenous-ai.net/), LatinX in AI (https://lxai.app/PUBLIC-DIRECTORY), Masakhane (https://www.masakhane.io/), 500 Queer Scientists (https://500queerscientists.com/), and Women-in-ML's directory (https://www.wiml.org/directory).
Submission Details
Proposals should use the ACL paper submission format. Authors can download the LaTeX template (https://github.com/acl-org/acl-style-files) or use the Overleaf template (https://www.overleaf.com/latex/templates/association-for-computational-ling…). Proposals should not exceed 4 pages of content (plus one page for tutor biographies and unlimited pages for references), should be submitted as PDF documents, and should contain the following:
A title and authors, affiliations, and contact information.
A brief description of the tutorial content and its relevance to the CL/NLP community.
Type of the tutorial: "cutting-edge in CL/NLP" vs "introductory to fields related to CL/NLP".
Briefly describe the target audience and any expected prerequisites for the attendees, for example:
Math: e.g., "Understand derivatives and integrals as found in introductory calculus"
Linguistics: e.g., "Be able to parse and generate text with dependency grammars"
Machine Learning: e.g., "Understand 'classical' supervised methods such as SVM and perceptron"
Neural Network: e.g., "Familiarity with Transformers"
Programming or other tools: e.g., "Knowledge of PyTorch and Unix command line tools"
An outline of the tutorial structure and content, and how it will be covered in a three-hour slot. In exceptional cases, six-hour tutorial slots are available. These time limits do not include coffee breaks, e.g., a three-hour tutorial in fact occupies a 3.5-hour slot, and a six-hour tutorial occupies a 7-hour slot.
Explain how the tutorial includes other people's work. We recommend that the tutorial cover work by the presenters as well as by other researchers. The submission should explain how this breadth is ensured. Tutorials should not be "self-invited talks".
Diversity considerations, e.g., use of multilingual data, indications of how the described methods scale up to various languages or domains, participation of both senior and junior instructors, demographic and geographical diversity of the instructors, plans for how to diversify audience participation, etc.
Reading list. Work that you expect the audience to read before the tutorial can be indicated by an asterisk. Recommended papers should provide breadth of authorship and include work by other authors, as well as work from other disciplines, if relevant.
For each tutorial presenter, a one-paragraph statement of their research interests and areas of expertise for the tutorial topic, as well as experience in instructing an international audience.
An estimate of the audience size for the tutorial. If the same or a similar tutorial (or workshops, talks, etc.) has been given before, include information on where any previous version of the tutorial was given and how many attendees the tutorial attracted.
A description of special requirements for technical equipment.
We intend to make tutorial presentation materials publicly available (e.g., tutorial slides, captioned video recording, as well as software, data, or other resources as applicable) in the ACL Anthology. If any of your tutorial materials cannot be shared, please explain why this is the case.
An ethics statement that discusses the ethical considerations related to the topics of the tutorial.
A description of any limitations that would restrict the tutorial to a specific venue (EMNLP or AACL-IJCNLP). For example, if the tutorial is compatible with only one of these events, logistically, thematically, or otherwise, or if the tutorial cannot be held at a venue for logistical reasons.
OPTIONAL: We welcome proposals on the special conference themes. If your tutorial proposal aligns with the special themes of EMNLP (theme "New Missions for NLP Research"), then please explain why this is the case.
OPTIONAL: We invite tutorial instructors to include pedagogical material that the audience can bring into classrooms or similar spaces of discussion, to bring attention to the tutorial topic (e.g., a hands-on exercise, discussion questions, a demo, or an assignment). If you would like to provide this, then please explain why this is the case.
Tutorial proposals should be submitted online on OpenReview at the following link: https://openreview.net/group?id=EMNLP/2026/Tutorials. Proposals will be reviewed jointly by the Tutorial Co-Chairs of the conferences and, optionally, by a group of external experts.
Evaluation Criteria
Each tutorial proposal will be evaluated according to its clarity and preparedness, novelty or timely character of the topic, instructors' experience, target audience, open access of the tutorial instructional material, and diversity and inclusion.
Instructor Responsibilities
Tutorial decisions along with reviews will be released by July 15, 2026. Accepted tutorial proposers must then provide abstracts for inclusion in the conference registration material by the specific conference deadlines. The description should be in two formats: (a) an ASCII version that can be included in email announcements and published on the conference website, and (b) a PDF version for inclusion in the electronic proceedings (detailed instructions will be provided). Tutorial speakers must provide tutorial materials (e.g., slides, a relevant list of papers) at least one month prior to the start date of the hosting conference. The final submitted tutorial materials must minimally include copies of the course slides and a bibliography for the material covered in the tutorial. After the conference, the presenters will be invited to update their slides in the ACL Anthology (if needed).
Tutorial Chairs
EMNLP
Goran Glavaš, University of Würzburg, Germany
Ana Marasović, University of Utah, USA
AACL-IJCNLP
Zhongqing Wang, Soochow University, China
Naoaki Okazaki, Institute of Science Tokyo, Japan
If you have any questions related to tutorial proposals, you can reach us at emnlp-aacl-ijcnlp-2026-tutorial(a)googlegroups.com.
Deadline: 31-Jul-2026
The Institute of Cognitive Science (ICS) at the University of Colorado Boulder invites applications for a tenure-track Assistant Professor position in the area of AI and the Mind, with an anticipated start date of Fall 2027. Highly qualified candidates who have been recently promoted to Associate Professor level may also be considered.
We seek applicants advancing research at the intersection of artificial intelligence, broadly construed, and human cognition. Candidates may bridge artificial and natural intelligence through modeling, experimentation, or theory; candidates pursuing interdisciplinary and innovative approaches are encouraged to apply. The ideal candidate will have an outstanding track record of research, as evidenced by a notable publication history and a demonstrated record of seeking and securing funding as a Principal Investigator, as appropriate to their current academic rank.
Areas of interest include, but are not limited to:
- Computational Models of Cognition: using AI methods to model perception, memory, learning, decision-making, or language, including their neural bases.
- AI Tools for Cognitive Neuroscience: leveraging machine learning to analyze brain/behavioral data or to explore neural mechanisms.
- Human-AI Interaction: understanding how humans perceive, collaborate with, and may be enhanced or augmented by AI systems.
- Brain-Computer Interfaces: connecting neural processes to computational systems for communication, control, or rehabilitation.
- Algorithmic Bias & Cognitive Bias: studying parallels between human biases and AI decision-making; ensuring fair and ethical systems.
- Philosophy of Mind & AI: investigating conceptual questions about consciousness, representation, and machine intelligence.
Apply: https://jobs.colorado.edu/jobs/JobDetail/?jobId=71179
Dear all,
starting January 2027, 8 doctoral positions are available within
RTG KEMAI (Knowledge Infusion and Extraction for Explainable Medical AI)
at Ulm University, funded by DFG.
The KEMAI team aims at combining the benefits of knowledge- and
learning-based systems, to not only allow for state-of-the-art accuracy
in medical diagnosis, but to also clearly communicate the obtained
predictions to physicians, considering ethical implications within the
medical decision process.
KEMAI’s main purpose is to provide interdisciplinary training to doctoral students
from computer science, medicine, and ethics in the area of explainable
medical AI. The RTG offers a structured doctoral program that creates an
environment in which young scientists can conduct research at the
highest level in the field of medical AI.
We invite highly motivated candidates with a passion for research and a
desire to contribute to an interdisciplinary academic environment to
apply for these positions. (The positions are fully funded for 3+1 years
and come with an E13 salary.)
Applications are now being accepted, with a deadline at the end of each
month until all positions have been filled.
For further information and to apply, please visit the RTG's website:
https://kemai.uni-ulm.de/
Best regards
Christiane Boehm
Coordinator
Ulm University
*KEMAI*
*Knowledge Infusion and Extraction for Explainable Medical AI*
Research Training Group funded by the German Research Foundation (DFG)
https://kemai.uni-ulm.de
James-Franck-Ring 1 | room: O27/3217 | 89081 Ulm | Germany
Phone: +49 (0)731 50-31321 | e-mail: christiane.boehm(a)uni-ulm.de
*contact hours:*
Monday & Thursday morning
Tuesday & Wednesday afternoon
*Summary:*
* Subject: CookBot - Assistive robot for cooking
* Keywords: Robotics manipulation, Tasks Planning, Assistive Technologies
* Research Unit: Lab-STICC (UMR CNRS 6285)
* Team: RAMBO - Robot interaction, Ambient system, Machine learning,
Behaviour, Optimization
* Location: IMT Atlantique, Brest
* Start: September/October 2026
* Duration: 3 years
* Supervision: Christophe Lohr, Mihai Andries
*Full subject description and Application instructions:*
https://www.imt-atlantique.fr/sites/default/files/recherche/Offres%20de%20t…
*Application*
The candidate must hold (or be about to obtain) a Master's degree in
Computer Science with theoretical and practical skills in AI algorithms
and associated deep-learning tools, and a solid background in robotics.
The candidate should be fluent in English (the main working and
publishing language).
A detailed application should be addressed to Christophe Lohr and Mihai
Andries, including a cover letter, an up-to-date CV, transcripts of
grades (last two years), and a list of referees.
*Deadline:* 15 May 2026
*** apologies for cross-posting ***
Registration for the 8th edition of the Translation in Transition
Conference, taking place 9-11 September 2026 at RWTH Aachen University, is
now open!
Please see our website for the registration link and more useful information
[1]. Early-bird registration ends on June 30. Under Accommodation [2], you
can find a selection of different hotel rooms in Aachen that have been
reserved for conference participants.
The conference makes room for discussion of all strands of empirical
research in translation and interpreting studies (TIS), including at the
intersection of various multilingual text production contexts. Our invited
keynote speakers are Gaëtanelle Gilquin (UCLouvain, Louvain-la-Neuve), Marta
Kajzer-Wietrzny (Adam Mickiewicz University, Poznan) and Jean Nitzke
(University of Agder, Kristiansand).
In addition to this, the 2026 edition will host a workshop on the topic of
transfer from Translation and Interpreting Studies into other fields both
within and outside of academia in cooperation with the Institute of
Translatology [3].
For further information, please visit the conference website or contact the
conference organisers at events(a)ifaar.rwth-aachen.de.
We hope to welcome you in Aachen this September!
Best wishes,
The TT8 organising committee
[1] https://www.anglistik.rwth-aachen.de/cms/Anglistik/Forschung/Konferenzen-Veranstaltungen/Translation-in-Transition-Conference/~bofdjv/Registration-Submission/
[2] https://www.anglistik.rwth-aachen.de/cms/Anglistik/Forschung/Konferenzen-Veranstaltungen/Translation-in-Transition-Conference/~bofdtn/Venue/
[3] https://institut-translatologie.de/
Prof. Dr. Stella Neumann
Anglistische Sprachwissenschaft
RWTH Aachen University
Institut für Anglistik
Zi. 101
Kármánstr. 17/19
D-52062 Aachen
Tel. +49 (0)241 80-96105
Dear all,
WOCHAT 2026 (Workshop on Chatbots and Agentic Technologies) is calling for papers.
The workshop is co-located with SIGDIAL 2026 at Emory University, Atlanta, Georgia, and takes place on August 2nd, 2026.
We're looking for original research on:
→ Agentic & goal-driven dialogue systems
→ Multi-agent coordination & negotiation
→ Multimodal grounding (text, speech, vision)
→ Commonsense reasoning & theory of mind in dialogue
→ Emotion modeling beyond basic categories
→ Robustness, safety & trustworthiness in conversational AI
→ Evaluation beyond surface fluency
→ Dialogue in finance, healthcare, legal, cybersecurity & more
Important Dates:
📅 Submission Deadline: June 1st, 2026 (AoE)
📅 Notification of acceptance: June 22nd, 2026
Format:
📄 Long papers (max 8 pages) | Short papers (max 4 pages)
🔒 Double-blind | ACL two-column format | Original & unpublished work only
If your work pushes dialogue systems beyond surface-level responses, this is your venue.
📋 Full CFP: https://sites.google.com/view/wochat2026/call-for-papers
See you in Atlanta!
(s.) Mahed Mousavi, Ph.D.
Assistant Professor (RTD-A)
Dept. of Information Engineering & Computer Science
University of Trento
Dear all,
I would like to invite you to join our Open Research Group organised by the ESRC Centre for Corpus Approaches to Social Science, Lancaster University, beginning on Wednesday 22 April at 12:00 (UK time).
This term's theme is Collocations and how to compute collocation measures, alongside a gentle introduction to R for beginners. The sessions are designed to be accessible and practical, with opportunities to explore key concepts in corpus linguistics and gain hands-on experience with basic statistical analysis in R.
Schedule (all sessions 12:00–12:50 UK time, online or County South B089):
* Wednesday, 22/04/2026
* Wednesday, 06/05/2026
* Wednesday, 20/05/2026
* Wednesday, 03/06/2026
FREE registration: https://forms.office.com/e/YT5md2fjka
Suggested readings:
* Brezina, V. & Gablasova, D. (2026). A Frequency Dictionary of Multi-Word Expressions in British English: Core Phrases and Exercises for Learners. Routledge.
* Brezina, V. (2018). Statistics in Corpus Linguistics. CUP, Chapter 3 (collocation section): https://www.google.co.uk/books/edition/Statistics_in_Corpus_Linguistics/zLB…
* Evert, S. (2008). "Corpora and collocations." In Corpus Linguistics: An International Handbook: https://stephanie-evert.de/PUB/Evert2007HSK_extended_manuscript.pdf
* Gablasova, D., Brezina, V., & McEnery, T. (2017). "Collocations in corpus-based language learning research." Language Learning, 67(S1), 155–179: https://onlinelibrary.wiley.com/doi/pdfdirect/10.1111/lang.12225
* Sinclair, J., Jones, S., & Daley, R. (2004). English Collocation Studies: The OSTI Report. Continuum: https://www.google.co.uk/books/edition/English_Collocation_Studies/1kTkHAXe…
Professor Vaclav Brezina
Professor in Corpus Linguistics
Co-Director of the ESRC Centre for Corpus Approaches to Social Science
Faculty of Humanities, Arts and Social Sciences, Lancaster University
Lancaster, LA1 4YD
Office: County South, room B46
T: +44 (0)1524 510828
@vaclavbrezina
Dear Colleagues,
I am writing to share a brief reminder regarding the call for chapter proposals for our upcoming Springer edited volume, "Data-Driven Language Teaching and Learning: Theory, Research, and Practice."
We have already received a range of fascinating proposals, and we would love to see your work represented in this collection. We are specifically seeking contributions that span the full spectrum of Data-Driven Learning, from theoretical frameworks offering new perspectives on data-informed pedagogy to conceptual models for innovative instructional design and practical applications rooted in classroom-based research. Whether your research focuses on AI-assisted feedback, corpus-based materials development, or the unique challenges of implementing DDL in multilingual settings, we warmly invite you to share your insights.
Key Deadlines & Details:
Abstract Submission (400-600 words): 1 May 2026
Notification of Acceptance: June 2026
Full Chapters (7,000-9,000 words): October 2026
Review Process: Double-blind peer review
For full details on submission guidelines and the volume's scope, please refer to the Call for Chapters: https://tinyurl.com/3dbhk97f
Please feel free to reach out if you have any questions or would like to discuss a potential topic. We also encourage you to share this invitation with any colleagues or researchers who may be interested.
We look forward to receiving your abstracts!
Best,
Cansu Akan
Dear Professor Wan,
Thank you for the pointer to your work — I'll read through the site and
your posts. I appreciate the quickness of your response and look forward to
reading more of it.
I think our concerns may be partly orthogonal. The paper's headline
measure is phone-level (schwa proportion in CMUdict transcriptions), not
word-level; "word" enters only as an operational unit for the
Flesch-Kincaid baseline we compare against, and the scope is bounded to
English prose register classification on four named corpora. We don't claim
cross-linguistic generality or make prescriptive claims
about "language."
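
For readers who want to see the phone-level measure concretely, here is a minimal sketch in Python. It is an illustration only, not the analyser from the repository: the toy lexicon, the function names, and the choice to skip out-of-vocabulary tokens are all assumptions made for this example (the paper describes a G2P fallback instead). The entries follow CMUdict's ARPAbet convention, in which vowel phones carry a stress digit and AH0 is the unstressed schwa.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical toy lexicon in CMUdict/ARPAbet style; a real run would load
# the full CMUdict instead of these four hand-copied entries.
TOY_LEXICON: Dict[str, List[str]] = {
    "the": ["DH", "AH0"],
    "banana": ["B", "AH0", "N", "AE1", "N", "AH0"],
    "is": ["IH1", "Z"],
    "ripe": ["R", "AY1", "P"],
}


def is_vowel(phone: str) -> bool:
    """In CMUdict, vowel phones end in a stress digit (0, 1, or 2)."""
    return phone[-1] in "012"


def schwa_density(
    tokens: Iterable[str],
    lexicon: Dict[str, List[str]],
    masked: Optional[Set[str]] = None,
) -> Optional[float]:
    """Proportion of vowel phones that are AH0 (unstressed schwa).

    `masked` is an optional set of lower-cased tokens to drop before
    counting, mirroring the idea of a function-word ablation.
    Returns None when no vowel phones are found.
    """
    vowels = 0
    schwas = 0
    for token in tokens:
        word = token.lower()
        if masked and word in masked:
            continue
        phones = lexicon.get(word)
        if phones is None:
            # Assumption for this sketch: skip out-of-vocabulary tokens.
            continue
        for phone in phones:
            if is_vowel(phone):
                vowels += 1
                if phone == "AH0":
                    schwas += 1
    return schwas / vowels if vowels else None


if __name__ == "__main__":
    text = "The banana is ripe".split()
    print(schwa_density(text, TOY_LEXICON))                  # all tokens
    print(schwa_density(text, TOY_LEXICON, masked={"the"}))  # "the" masked
```

Note that "word" appears here only as a lookup key into the pronunciation lexicon; the quantity being measured is a ratio over phones.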
That said, your point about the undefined status of "word" across writing
systems is well-taken, and I'll add a scope/limitations note making the
English-and-CMUdict dependency explicit rather than implicit. It does seem
important to identify and accommodate implicit anglophone bias in
scientific contexts.
Best,
Kyle
On Thu, Apr 16, 2026 at 3:22 PM Ada Wan <adawan919(a)gmail.com> wrote:
> Dear Kyle
>
> Please be notified of my findings from 2019 on (see
> sites.google.com/view/adawan) as well as all my posts and
> replies/comments on X.com since 2021 (@adawan919) and on LinkedIn.com.
> Please note that working on/with "word(s)" can be considered a violation
> of research integrity and/or of the law.
>
> Feedback:
> While I understand that most of my findings might seem distant to those
> from "linguistics proper", the most important takeaway is the same: "word"
> is not a reliable unit for scientific work. And working on "grammar" and
> "language" is / can be unethical. If you can transition to research without
> working on or leveraging "w/s/ls/g/l", that'd be optimal. Otherwise, an
> immediate feedback to your work would be just do it without "word(s)".
> At the first sight, evaluating schwa density of a particular dataset is
> not wrong in itself --- in fact, from the perspective of linguistics or
> "'language' science" (except my findings show that there cannot be any more
> science with "language"), it could even be avant-garde work to estimate,
> without "words", schwa density of a given document and/or compare it with
> another document. But when one considers how in the context of "language"
> (or "w/s/ls/g/l") being un- and under-defined, and often defaulting
> subliminally to a prescriptive grammarian perspective, it'd be better and
> safer to simply refrain from publishing this kind of philological
> contributions (and yes, I understand that within linguistics, this work
> would/could be considered scientific/rigorous already, but that is not
> enough).
>
> Feel free to let me know if you should have any questions.
>
> Best regards
> Ada Wan
> https://sites.google.com/view/adawan
>
>
> On Thu, Apr 16, 2026 at 8:17 PM Kyle Townsend via Corpora <
> corpora(a)list.elra.info> wrote:
>
>>
>> Dear colleagues,
>>
>> I'd like to share a new preprint on single-feature register
>> classification in English text:
>>
>> "Schwa Density as a Phonological Stylistic Classifier: Primary
>> Stylistic, Secondary Modality -- A Four-Corpus Pre-Registered
>> Replication"
>>
>> Preprint:
>> https://ling.auf.net/lingbuzz/009926/current.pdf?_s=WPGovroKhmABLC0P
>> Materials/code:
>> https://github.com/kylegtownsend-collab/schwa-density-spgc
>> Paper site: https://papers.letsharkness.com/schwa-density/
>>
>> The paper tests whether schwa density -- the proportion of vowel
>> phones in a text that are unstressed schwa (CMUdict AH0) -- can
>> serve as a phonologically motivated single-feature register
>> classifier. A pre-registered confirmatory plan was applied to NLTK
>> multi-source (N=164) and the Standardized Project Gutenberg Corpus
>> (N=2,767), with sensitivity analyses on Brown (N=313) and OANC
>> (N=4,375).
>>
>> Headline findings:
>>
>> - Schwa density matches or exceeds Flesch-Kincaid on all
>> pre-registered corpora.
>>
>> - A function-word ablation (masking the 198 NLTK English stopwords
>> before computing schwa density) preserves or amplifies register
>> discrimination on all four corpora (eta^2 retention 0.93-1.27),
>> ruling out stopword frequency as a confound.
>>
>> - The ablation operationalises a two-regime finding: schwa density
>> functions as a Primary Stylistic Feature on within-prose
>> variation (NLTK, SPGC, Brown) and a Secondary Modality Feature on
>> speech-versus-writing variation (OANC).
>>
>> - Joint partial-eta^2 retains 46-53% of the register signal on the
>> pre-registered corpora after controlling jointly for syllables
>> per word, mean word length, and Latinate ratio.
>>
>> The pre-registration, deviation log, analyser, ablation and
>> G2P-fallback scripts, per-corpus feature tables, and
>> figure-generation code are all openly available in the repository
>> (MIT / CC-BY-4.0).
>>
>> Comments and criticisms welcome.
>>
>> Thanks,
>> Kyle Townsend
>> Independent
>> ktownsend(a)spfk12.org
>>
>>
>> _______________________________________________
>> Corpora mailing list -- corpora(a)list.elra.info
>> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
>> To unsubscribe send an email to corpora-leave(a)list.elra.info
>>
>
--
Thanks,
Kyle Townsend
Instructor, English IIA, Humanities, Yearbook I/II
Scotch Plains-Fanwood High School
Pronouns: he/him/his (What's This? <https://www.mypronouns.org/>)