[Corpora-List] Re: [External] Re: NIF: NLP Interchange Format

18 Oct 2023


      [To those who do not have shared interests on issues that pertain to
Corpora-List matters, such as data/corpora and their handling which
includes but is not limited to linguistic/NLP theories/methods (and the
validity thereof): please disregard.]
Dear Orhan
Thanks for your interest in this discussion. I think it is high time that
our community came to a critical (re-)examination of (linguistic)
morphology (and addressed issues concerning reinterpretation and
transition).
First of all, allow me to put my traditional grammarian hat on to get to
your question more directly. You brought up an example of a morphological
paradigm.
Now, as linguists or language professionals, we know that language is
(re-)productive in nature. So, if you don't mind, we can do a thought
experiment and go through this dialectically (please note, however, that I
only check my emails about once a day on weekdays).
1. Let's think of a verb that does not yet exist (in any particular
language(s) that you can think of or that you are used to). Would you mind
conjugating it for me? How many patterns would you have? And what would the
forms be like?
2. Where did you get the patterns/paradigm from? Suppose you were able to
come up with a "full paradigm" (whatever that should refer to), say 6 forms
(as per some textbook paradigms from some "Indo-European languages":
1st/2nd/3rd person in sg/pl). You surely haven't seen any of these forms
combined with the verb before, have you? So where is your evidence that
these forms exist in reality beyond your mind? And if such a
"perfect/ideal paradigm" exists only in your mind (and the minds of some of
your friends as well), how do you justify that morphological paradigms (the
form/"structure"/pattern) are a necessary or intrinsic part of language (be
it any particular language (which "one"?) or language in general)? Wouldn't
morphology, as well as the perpetual construction and reconstruction of
morphological patterns, be only a self-fulfilling prophecy? And how often do
we impose our conceptual/perceptual habits/categories upon whatever "new"
thing we encounter?
3. If, however, you were not able to construct a "full paradigm" or any
part thereof at all, or you claim you were not able to think of a
hypothetical verb either, because to you morphology is solely based on what
has been written and analyzed beforehand/historically, then what is there
to claim about morphological analyses? Not only does such practice not
generalize, but it would also just apply to calcified segments
analyzed/interpreted in a certain way as part of philological pursuits in
the past. One should bear in mind that philological methods can progress
and update as well.
There are no limits as to how one can *use* (or some might even claim
*define* here) "language", including how various modalities can
combine/fuse with each other. Meaning has no fixed boundaries. When it
comes to language or meaning, there is no "completeness" to "speak of" or
to serve as the basis of any science/study. And there are no fixed demarcations
between any "particular languages" either.
Other perspectives on (the shortcomings of) morphology and "words" can be
found on my rebuttal page here: https://openreview.net/forum?id=-llS6TiOew.
Please also read the references cited therein.
I look forward to your reply, comments/remarks, or questions. (Actually,
the floor can also be opened to anyone who would like to join.)
Thank you and best
Ada
On Tue, Oct 17, 2023 at 7:52 PM Bilgin, Orhan (Postgraduate Researcher) via
Corpora corpora@list.elra.info wrote:
...
I am sorry, I forgot to mention that the strings in my example are Turkish
words, and the lemma AĞAÇ- corresponds to the lemma TREE- in English, which
is a label that allows me to talk about the only meaningful thing that is
common to the set {tree, trees}.
Best,
Orhan
On 17 Oct 2023 20:44, "Bilgin, Orhan (Postgraduate Researcher)" <
o.bilgin@lancaster.ac.uk> wrote:
Dear Ada,
I agree that lemmatisation is a construct and is not a universal method
for linguistic analyses, but I don't understand why it is imperative that I
wean myself from using lemmas.
What is it that restricts my freedom to invent the lemma (a non-universal
construct) AĞAÇ-, for example, to refer to the one and only "meaningful
thing" that is common to the very many (theoretically infinite, practically
probably around 10,000) strings including ağaç, ağacı, ağaca, ağaçlar,
ağacımızdaki, ağaçlandırılabilmesinden, ağaçsızlaşmasını, etc. etc.? How
(and why) am I supposed to talk about that very large set without using a
label for it?
Best,
Orhan Bilgin
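
[Editorial illustration, not part of the original message.] The idea in the
message above, a lemma used purely as a label for a set of strings, can be
sketched as a plain lookup table. This is a minimal sketch under stated
assumptions: the strings and the labels AĞAÇ- and TREE- are taken from the
thread, while the hand-built mapping and the names FORM_TO_LEMMA,
group_by_lemma and LEMMA_TO_FORMS are hypothetical. No morphological
analysis is performed, so the sketch takes no side on whether such a
labelling is linguistically justified.

```python
from collections import defaultdict

# Hypothetical hand-built mapping from surface strings to a lemma label.
# The strings and labels come from the thread; the table itself is an
# illustrative assumption, not the output of any analyser.
FORM_TO_LEMMA = {
    "ağaç": "AĞAÇ-",
    "ağacı": "AĞAÇ-",
    "ağaca": "AĞAÇ-",
    "ağaçlar": "AĞAÇ-",
    "ağacımızdaki": "AĞAÇ-",
    "tree": "TREE-",
    "trees": "TREE-",
}


def group_by_lemma(form_to_lemma):
    """Invert a form -> label table into label -> set of forms."""
    groups = defaultdict(set)
    for form, lemma in form_to_lemma.items():
        groups[lemma].add(form)
    return dict(groups)


# The label is nothing more than a key that names a set of strings.
LEMMA_TO_FORMS = group_by_lemma(FORM_TO_LEMMA)
```

Here the label does only what the message asks of it: it lets one refer to
the whole set at once, e.g. LEMMA_TO_FORMS["AĞAÇ-"]. Any string missing from
the table simply has no label.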
On 17 Oct 2023 18:36, Ada Wan via Corpora corpora@list.elra.info wrote:
Dear Christian
Re your PS:
One doesn't need to debate the use/future of lemmatization, though I'd
welcome such a debate as part of scholarship. For those experienced in
matters in/of Linguistics, it should be clear that lemmatization was simply
a construct, an entry-level philological exercise (esp. for those from
Computer Science with less of a background in Linguistics and language(s)).
It has been sad that some have picked up the habit of using lemmatization as
a heuristic (though for what, specifically?) and might have become,
apparently, too addicted to it to let it go. It is imperative that one wean
oneself from such a habit.
Methods for linguistic morphology, e.g. (morphological) parsing or
stemming, are not a universal decomposition scheme, nor a universal method
for language/linguistic analyses. It is also important to bear in mind that
neither linguistic morphology nor lemmas/lemmata have that long a history.
Thanks for being open-minded enough to read this far.
Best
Ada

Corpora mailing list -- corpora@list.elra.info
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to corpora-leave@list.elra.info
