Hi Archna, hi Kilian, hi all
Thanks for your replies.
TL;DR on my part: I'd be fine going with "expressions" (instead of "fixed/idiomatic expressions"). Neither "word" nor "morphology/syntax" (apart from the ordering of elements and/or sequential patterns) is necessary for analysing them.
-----
More specifically:
[@Archna] Re "fixed/idiomatic expressions": I don't think it matters much whether they are "fixed" or "idiomatic". A "fixed expression" is one that is usually more resistant to (lexical) change. One can measure this property in a longitudinal study, e.g. in relation to other aspects of language change. As for how "fixed" is "fixed": it's relative, as are many other things in language studies. By "idiomatic", one could mean that there is an element of idiosyncrasy (cf. "idiom"/"idioma").
The message I am trying to get across is that "word" is a superfluous category in the study of language. Would you mind justifying why you need "words"?
The same goes for morphology, actually. In essence, morphological analyses involve selective decomposition, not the decomposition of all decomposable units. Hence if one only accounts for variation within an expression as a ((sub-)character) sequence involving "morphemes" (assuming these can be rigorously defined) and disregards changes in other parts of the sequence, the analysis of the expression is incomplete. Instead, one can simply refer to expressions as "expressions", e.g. as sequences/strings of (sub-)characters of various lengths and vocabularies. Such an account is also more flexible and better accommodates diverse languages, registers and modalities.
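To make the point about whole-sequence variation concrete, here is a minimal sketch. It is an illustrative assumption, not a method proposed anywhere in this thread. Two variants of an expression are compared as plain character strings using Python's standard-library difflib, and every changed span is reported. The helper name char_diff and the example variants are hypothetical.

```python
from difflib import SequenceMatcher


def char_diff(a: str, b: str) -> list[tuple[str, str, str]]:
    """Return every non-matching span between two variants of an expression.

    Hypothetical helper: both variants are treated as plain character
    sequences, so changes are reported wherever they occur, not only at
    assumed morpheme boundaries.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    # Illustrative variants of one idiom; not drawn from any corpus.
    print(char_diff("kick the bucket", "kicked the bucket"))
    # [('insert', '', 'ed')]
    print(char_diff("kick the bucket", "kicked the big bucket"))
    # [('insert', '', 'ed'), ('insert', '', ' big')]
```

In the second pair, a suffix-oriented (morphological) view would register the "-ed" but has nothing to say about the inserted " big". The character-level view reports both changes without deciding in advance which units count.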
A study of "expressions" can cover all other aspects, not just lexical but also functional ones. One doesn't need to incorporate or impose any ad hoc notion of "wordhood" in these studies.
Suggestion: I believe there are many more interesting tasks in this area than trying to find/define "words" within expressions, or to "parse" them according to some structuralist assumptions (i.e. morphologically/syntactically). For example, the community could start a (multi-year) project building an international, multilingual, parallel database (note: not everything would be parallelizable) of all expressions and terminologies that have ever existed, with contextual (historical/cultural/social) information, and start verifying their sources and current status of use. (One should be careful, though, not to reinforce values that shouldn't be further emphasized or transferred to posterity; this is an ethical consideration. So if something is in a grey area now, clearly document the current attitudes towards it, so that posterity can look back and evaluate it from their own point of view.)
Counter-questions to Archna: What are the motivations behind your suggestion to access/interpret language using "words"? How do you define "words", and how do you justify the sufficiency/necessity of morphology/syntax for the study of these expressions, especially when their morphological decomposition is arbitrary and helps little (or not at all) with explanation or prediction?
Re "complex lexical terms", @Kilian: I'm just wondering what kinds of "terms" would not be considered lexical (I was tempted to add "lexical" to "expressions" as well, but thought that might be a bit redundant). It depends on how one defines "terms", of course. And how "complex" are expressions really? Aren't they just more calcified units, after all? (Why do we, or some of us, always seem to want to add "complex" to everything? Things that aren't "complex" are also worth studying!)
Curious what you think...
Thanks and best,
Ada
Why I'm advocating #noWords: "Fairness in Representation for Multilingual NLP: Insights from Controlled Experiments on Conditional Language Modeling", https://openreview.net/forum?id=-llS6TiOew and https://drive.google.com/file/d/1eKbhdZkPJ0HgU1RsGXGFBPGameWIVdt9/view (It took me a while for everything to sink in.)
On Thu, Feb 9, 2023 at 3:27 PM Mike Scott via Corpora <corpora@list.elra.info> wrote:
I must say I'm perfectly happy with "multi-word expression", or "multi-word unit".
I feel sympathy with Archna's post (and incidentally wish Archna didn't have to go through a friend!)
Cheers
Mike
--
Mike Scott
lexically.net Lexical Analysis Software and Aston University
Corpora mailing list -- corpora@list.elra.info https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/ To unsubscribe send an email to corpora-leave@list.elra.info