[To those who do not have shared interests on issues that pertain to Corpora-List matters, such as data/corpora and their handling which includes but is not limited to linguistic/NLP theories/methods (and the validity thereof): please disregard.]
Dear Orhan
Thanks for your interest in this discussion. I think it is high time that our community came to a critical (re-)examination of (linguistic) morphology (and addressed issues concerning reinterpretation and transition).
First of all, allow me to put my traditional grammarian hat on to get to your question more directly. You brought up an example of a morphological paradigm. Now, as linguists or language professionals, we know that language is (re-)productive in nature. So, if you don't mind, we can do a thought experiment and go through this dialectically (please note, however, that I only check my emails about once a day on weekdays).
1. Let's think of a verb that does not yet exist (in any particular language(s) that you can think of or that you are used to). Would you mind conjugating it for me? How many patterns would you have? And what would the forms be like?
2. Where did you get the patterns/paradigm from? Suppose you were able to come up with a "full paradigm" (whatever that should refer to), say 6 forms, as per some textbook paradigms from some "Indo-European languages" (1st/2nd/3rd person in sg/pl). You surely haven't seen any of these forms combined with the verb before, have you? So where is your evidence that these forms exist in reality beyond that of your mind? And if such a "perfect/ideal paradigm" exists only in your mind (and the minds of some of your friends as well), how do you justify that morphological paradigms (the form/"structure"/pattern) are a necessary or intrinsic part of language (be it any particular language (which "one"?) or language in general)? Wouldn't morphology, as well as the perpetual construction and reconstruction of morphological patterns, be a self-fulfilling prophecy only? And how often do we impose our conceptual/perceptual habits/categories upon whatever "new" we encounter?
3. If, however, you were not able to construct a "full paradigm" or any part thereof at all, or you claim you were not able to think of a hypothetical verb either, because to you morphology is solely based on what has been written and analyzed beforehand/historically, then what is there to claim about morphological analyses? Not only does such a practice not generalize, but it would also just apply to calcified segments analyzed/interpreted in a certain way as part of philological pursuits in the past. One should bear in mind that philological methods can progress and update as well.
There are no limits as to how one can *use* (or some might even claim *define* here) "language", including how various modalities can combine/fuse with each other. Meaning has no fixed boundaries. When it comes to language or meaning, there is no "completeness" to "speak of" or to serve as basis of any science/study. And there are no fixed demarcations between any "particular languages" either.
Other perspectives on (the shortcomings of) morphology and "words" can be found on my rebuttal page here: https://openreview.net/forum?id=-llS6TiOew. Please also read the references cited therein.
I look forward to your reply, comments/remarks, or questions. (Actually, the floor can also be opened to anyone who would like to join.)
Thank you and best Ada
On Tue, Oct 17, 2023 at 7:52 PM Bilgin, Orhan (Postgraduate Researcher) via Corpora corpora@list.elra.info wrote:
I am sorry, I forgot to mention that the strings in my example are Turkish words, and the lemma AĞAÇ- corresponds to the lemma TREE- in English, which is a label that allows me to talk about the only meaningful thing that is common to the set {tree, trees}.
Best,
Orhan
On 17 Oct 2023 20:44, "Bilgin, Orhan (Postgraduate Researcher)" <o.bilgin@lancaster.ac.uk> wrote:
Dear Ada,
I agree that lemmatisation is a construct and is not a universal method for linguistic analyses, but I don't understand why it is imperative that I wean myself from using lemmas.
What is it that restricts my freedom to invent the lemma (a non-universal construct) AĞAÇ-, for example, to refer to the one and only "meaningful thing" that is common to the very many (theoretically infinite, practically probably around 10,000) strings including ağaç, ağacı, ağaca, ağaçlar, ağacımızdaki, ağaçlandırılabilmesinden, ağaçsızlaşmasını, etc. etc.? How (and why) am I supposed to talk about that very large set without using a label for it?
Best,
Orhan Bilgin
On 17 Oct 2023 18:36, Ada Wan via Corpora corpora@list.elra.info wrote:
Dear Christian
Re your PS: one doesn't need to debate the use/future of lemmatization, though I'd welcome such as part of scholarship. For those experienced in matters in/of Linguistics, it should be clear that lemmatization was simply a construct, an entry-level philological exercise (esp. for those from Computer Science with less of a background in Linguistics and language(s)). It has been sad that some have picked up the habit of using lemmatization as a heuristic (though for what, specifically?) and might have become, apparently, too addicted to it to let it go. It is imperative that one weans oneself from such a habit. Methods for linguistic morphology, e.g. (morphological) parsing or stemming, are neither a universal decomposition scheme nor a universal method for language/linguistic analyses. Also important to bear in mind is that neither linguistic morphology nor lemmas/lemmata have that long a history.
Thanks for being open-minded enough to read this far.
Best Ada