On Tue, Jun 21, 2022 at 12:14 AM Flor, Michael <MFlor@ets.org> wrote:
The notion of 'word' has difficulties in linguistics. But not enough to abandon it.
Except we don't need it at all, for either human or machine processing.
The argument from the paper "Fairness in Representation for Multilingual NLP" is not convincing at all.
Even if the early findings are correct for transformers,
their applicability to the human language faculty is not yet supported.
Right, this version of the paper does not yet tell the whole story, and I
still have to continue that work. But one can get the gist from conditional probability, context, and finer granularity.
On the other hand, it is not even needed. Developmental linguists have noted long ago that babies acquire all natural languages at approximately the same rate (under some 'standard conditions'), despite vast morphological and other differences between languages. Thus, in some sense, all natural human languages are already deemed 'equal' vis-a-vis acquisition complexity.
Well, tell that to the NLP crowd, or to those who expect LM/MT results to
differ across languages even when all else is equal. (I remember how hard I had to work, and through how many rounds, on my rebuttals...)
For language learning later in life, if one's native language is morphologically rich, learning (some types of) morphologically rich languages (as an adult) is a bit easier than learning a language that is very different, etc.
That's the thing about this paper: my personal take on L_n learning
is that, no, it's actually also just a matter of length and vocabulary relative to whatever one is used to (e.g. one's L1), plus the environment/support available and one's personal propensity (or lack of it) towards a new language.
Complexity of words in a language for non-native speakers/learners is actually a big issue and a field of research in EFL (and now in NLP as well).
See above.
Finally, word complexity is often defined within the same language (e.g. able-ability, function-dysfunctional), and so a notion of cross-linguistic hegemony or malice is not even applicable here.
What would it take for me to convince you that such "complexity" really
boils down to just length and vocab (think of the examples you gave, viewed from, say, a character-level perspective)? E.g. is 'Xjfewijpiweoheymqaweopaf'h' more or less complex than 'multiple-dysfunction-prone' to you?
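
To make the character-level view concrete, here is a minimal Python sketch that measures the two strings above purely on their surface. The helper name `surface_stats` is hypothetical, and I am assuming the inner apostrophe is part of the first string. It is only an illustration of the "length" half of the argument, not a proposed complexity metric.

```python
def surface_stats(s: str) -> dict:
    """Return simple surface measurements of a string (hypothetical helper)."""
    return {
        "chars": len(s),                       # length in Unicode code points
        "utf8_bytes": len(s.encode("utf-8")),  # length in UTF-8 bytes
        "distinct_chars": len(set(s)),         # size of the character inventory used
    }


# The two strings from the question above; the inner apostrophe is assumed
# to belong to the first string.
a = "Xjfewijpiweoheymqaweopaf'h"
b = "multiple-dysfunction-prone"

for s in (a, b):
    print(s, surface_stats(s))
```

Measured this way, the two strings have the same length in characters. What differs is whether their pieces are already in the reader's vocabulary.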