Call for abstracts
Workshop: Computational models of diachronic language change
at the International Conference on Historical Linguistics (ICHL26)
Organizers:
Stefania Degaetano-Ortlieb*, Lauren Fonteyn+, Marie-Pauline Krielke*, Elke
Teich*
*Saarland University, +Leiden University
Submission deadline: January 1st 2023
Submission format: One-page abstract (plus references) to be sent to
s.degaetano@mx.uni-saarland.de
Notification of acceptance: January 12th 2023
Additional information:
We envisage a full-day, in-person workshop with presentations (20 min + 5-10
min discussion). After the workshop, we aim to publish a special issue in an
open-access journal.
Workshop description:
While the study of diachronic language change has long been firmly grounded
in corpus data analysis, it seems fair to state that the field has undergone
a computational turn over the last decade or so: computational models are
increasingly adopted across several research communities, including corpus
and computational linguistics, computational social science, digital
humanities, and historical linguistics.
The core tools for the investigation of diachronic change are
distributional models (DMs). DMs rely on the fact that related meanings
occur in similar contexts and allow us to study lexical-semantic change in a
data-driven way (e.g. as argued by Sagi et al. 2011), and on a larger scale
(e.g. as shown on the Google NGram corpus by Gulordava & Baroni 2011).
Besides count-based models (e.g. Hilpert & Saavedra 2020), contextualized
word embeddings are increasingly employed for diachronic modeling, as such
models are able to encode rich, context-sensitive information on word usage
(see Lenci 2018 or Fonteyn et al. 2022 for discussion).
In previous work, DMs have been used to determine laws of semantic change
(e.g. Hamilton et al. 2016b, Dubossarsky et al. 2017) as well as to develop
statistical measures that help detect different types of change (e.g.
specification vs. broadening; cultural change vs. linguistic change;
Hamilton et al. 2016a, Del Tredici et al. 2019). DMs have also been used to
map change in specific (groups of) concepts (e.g. racism, knowledge; see
Sommerauer & Fokkens 2019 for a discussion). Further studies have suggested
ways of improving the models that generate (diachronic) word embeddings to
attain these goals (e.g. Rudolph & Blei 2018).
Existing studies and projects focus on capturing and quantifying aspects of
semantic change. Yet, over the past decade, DMs have also been shown to be
useful to investigate other types of change in language use, including
grammatical change. Within the computational and corpus linguistic
communities, for example, Bizzoni et al. (2019, 2020) have shown an
interdependency between lexical and grammatical changes and Teich et al.
(2021) use embeddings to detect (lexico-) grammatical conventionalization
(which may lead to grammaticalization). Within diachronic linguistics, the
use of distributional models is focused on examining the underlying
functions of grammatical structures across time (e.g. Perek 2016, Hilpert
& Perek 2015, Gries & Hilpert 2008, Fonteyn 2020, Budts 2020).
Specifically targeting historical linguistic questions, Rodda et al. (2019)
and Sprugnoli et al. (2020) have shown that computational models are
promising for analyzing ancient languages, and McGillivray et al. (2022)
highlight the advantages of word embeddings (vs. count-based methods) while
also pointing to the challenges and the limitations of these models.
A common concern across these different communities is to better understand
the general principles or laws of language change and the underlying
mechanisms (analogy, priming, processing efficiency, contextual
predictability as measured by surprisal, etc.). In the proposed workshop,
we want to bring together researchers from relevant communities to talk
about the unique promises that computational models hold when applied to
diachronic data as well as the specific challenges they involve. In doing
so, we will identify common ground and explore the most pressing problems
and possible solutions.
Specific questions will concern:
Model utility: How can we capture change in language use beyond
lexical-semantic change, e.g. change in grammatical constructions,
collocations, phraseology?
Model quality: How can we evaluate computational models of historical
language stages in the absence of native-speaker gold standards? To what
extent does the quality of historical and diachronic corpora affect the
performance of models?
Model analytics: How do we transition from testing the reliability of models
to employing them to address previously unanswered research questions on
language change? How can we detect and measure change? What are suitable
analytic procedures to interpret the output of models?
References:
Bizzoni, Y., Degaetano-Ortlieb, S., Menzel, K., Krielke, P., and Teich, E.
(2019). Grammar and meaning: analysing the topology of diachronic word
embeddings. In Proceedings of the 1st International Workshop on
Computational Approaches to Historical Language Change, ACL, Florence,
Italy, pp. 175-185.
Bizzoni, Y., Degaetano-Ortlieb, S., Fankhauser, P., and Teich, E. (2020).
Linguistic variation and change in 250 years of English scientific writing:
a data-driven approach. Frontiers in Artificial Intelligence, 3.
Budts, S. (2020). "A connectionist approach to analogy. On the modal meaning
of periphrastic do in Early Modern English". Corpus Linguistics and
Linguistic Theory, 18(2), pp. 337-364.
Del Tredici, M., Fernández, R., and Boleda, G. (2019). Short-term meaning
shift: A distributional exploration. In Proceedings of the Conference of
the North American Chapter of the Association for Computational Linguistics
(NAACL): Human Language Technologies, Minneapolis, Minnesota, USA, pp.
2069-2075.
Dubossarsky, H., Weinshall, D., and Grossman, E. (2017). Outta control:
laws of semantic change and inherent biases in word representation models.
In Proceedings of the Conference on Empirical Methods in Natural Language
Processing (EMNLP), Copenhagen, Denmark, pp. 1136-1145.
Fonteyn, L. (2020). "What about grammar? Using BERT embeddings to explore
functional-semantic shifts of semi-lexical and grammatical constructions."
Computational Humanities Research CEUR-WS, pp. 257-268.
Fonteyn, L., Manjavacas, E., and Budts, S. (2022). Exploring
Morphosyntactic Variation & Change with Distributional Semantic Models.
Journal of Historical Syntax, 7(12), pp. 1-41.
Gries, S. T., and Hilpert, M. (2008). The identification of stages in
diachronic data: variability-based Neighbor Clustering. Corpora, 3(1), pp.
59-81.
Gulordava, K., and Baroni, M. (2011). A distributional similarity approach
to the detection of semantic change in the Google Books Ngram corpus. In
Proceedings of Geometrical Models for Natural Language Semantics (GEMS),
EMNLP, Edinburgh, United Kingdom, pp. 67-71.
Hamilton, W. L., Leskovec, J., and Jurafsky, D. (2016a). Cultural shift or
linguistic drift? comparing two computational models of semantic change. In
Proceedings of the Conference on Empirical Methods in Natural Language
Processing, EMNLP, Austin, Texas, USA, pp. 2116-2121.
Hamilton, W. L., Leskovec, J., and Jurafsky, D. (2016b). Diachronic word
embeddings reveal statistical laws of semantic change. In Proceedings of
the 54th Annual Meeting of the Association for Computational Linguistics,
ACL, Berlin, Germany, pp. 1489-1501.
Hilpert, M., and Saavedra, D.C. (2020). "Using token-based semantic vector
spaces for corpus-linguistic analyses: From practical applications to tests
of theoretical claims". Corpus Linguistics and Linguistic Theory, 16(2), pp.
393-424.
Hilpert, M. and Perek, F. (2015). Meaning change in a petri dish:
constructions, semantic vector spaces, and motion charts. Linguistics
Vanguard, 1(1), pp. 339-350.
Lenci, A. (2018). Distributional Models of Word Meaning. Annual Review of
Linguistics, 4, pp. 151-171.
Perek, F. (2016). Using distributional semantics to study syntactic
productivity in diachrony: a case study. Linguistics, 54(1), pp. 149-188.
Rodda, M.A., Probert, P., and McGillivray, B. (2019). Vector space models
of Ancient Greek word meaning, and a case study on Homer. TAL Traitement
Automatique des Langues, 60(3), pp. 63-87.
Rudolph, M., and Blei, D. (2018). Dynamic embeddings for language
evolution. In Proceedings of the 2018 World Wide Web Conference (WWW '18),
Lyon, France, pp. 1003-1011.
Sagi, E., Kaufmann, S., and Clark, B. (2011). Tracing semantic change with
Latent Semantic Analysis. Current Methods in Historical Semantics, 73, pp.
161-183.
Sommerauer, P., and Fokkens, A. (2019). Conceptual Change and
Distributional Semantic Models: An Exploratory Study on Pitfalls and
Possibilities. In Proceedings of the 1st International Workshop on
Computational Approaches to Historical Language Change, Florence, Italy, pp.
223-233.
Sprugnoli, R., Moretti, G., and Passarotti, M. (2020). Building and
Comparing Lemma Embeddings for Latin. Classical Latin versus Thomas
Aquinas. IJCoL. Italian Journal of Computational Linguistics, 6(6-1), pp.
29-45.
Teich, E., Fankhauser, P., Degaetano-Ortlieb, S., and Bizzoni, Y. (2021).
Less is More/More Diverse: On the Communicative Utility of Linguistic
Conventionalization. Frontiers in Communication, 5.