[Corpora-List] Re: [External] Re: NIF: NLP Interchange Format

17 Oct 2023


Dear Ada,
I agree that lemmatisation is a construct and is not a universal method for linguistic analyses, but I don't understand why it is imperative that I wean myself from using lemmas.
What is it that restricts my freedom to invent the lemma (a non-universal construct) AĞAÇ-, for example, to refer to the one and only "meaningful thing" that is common to the very many (theoretically infinite, practically probably around 10,000) strings including ağaç, ağacı, ağaca, ağaçlar, ağacımızdaki, ağaçlandırılabilmesinden, ağaçsızlaşmasını, etc. etc.? How (and why) am I supposed to talk about that very large set without using a label for it?
Best,
Orhan Bilgin
On 17 Oct 2023 18:36, Ada Wan via Corpora <corpora@list.elra.info> wrote:
Dear Christian
Re your PS:
one doesn't need to debate the use/future of lemmatization, though I'd welcome such a debate as part of scholarship. For those experienced in matters in/of Linguistics, it should be clear that lemmatization was simply a construct, an entry-level philological exercise (esp. for those from Computer Science with less of a background in Linguistics and language(s)). It is sad that some have picked up the habit of using lemmatization as a heuristic (though for what, specifically?) and might have become, apparently, too addicted to it to let it go. It is imperative that one wean oneself from such a habit.
Methods for linguistic morphology, e.g. (morphological) parsing or stemming, are not a universal decomposition scheme, nor a universal method for language/linguistic analyses. It is also important to bear in mind that neither linguistic morphology nor lemmas/lemmata have that long a history.
Thanks for being open-minded enough to read this far.
Best
Ada