At times Google showers you with senseless links.
A tensor is just a generalization of vectors and matrices, so that might be the distracting search term?
Though I've just put "tensors-based corpora NLP research" in DuckDuckGo and all the first page of (non-Ad) hits look to be on-topic.
Can you be more specific about what you are hoping to find, or why you are searching? You mention homework, so are you after a textbook? Does "examples" mean coding examples? Searching for "corpora NLP" on oreilly.com gets 993 hits, 944 of them books.
Darren
On 7/23/23, Darren Cook via Corpora corpora@list.elra.info wrote:
A tensor is just a generalization of vectors and matrices, so that might be the distracting search term?
Perhaps my doubts relate to the fact that, as a theoretical physicist myself, the kind of "mathematical purity" I was trained into can't digest how you can use vector/tensor algebra with texts if, from my way of seeing this type of matter, the concepts of space, vector, and consequently the product of two vectors have not been properly defined.
How do they define "space" and "vector" when it comes to corpora?
I haven't found a convincing definition, yet. The concepts of metric space and measurement are well-defined in Mathematics:
https://en.wikipedia.org/wiki/Metric_space
https://en.wikipedia.org/wiki/Measure_(mathematics)
but you don't see references to applications in textual processing ... even though those cultivating the "AI" techne can't stop talking about "deep learning", "information", "the semantic web", ...
lbrtchx
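For what it's worth, perfectly well-defined metrics on raw text do exist: Levenshtein edit distance, for instance, satisfies all the metric-space axioms (non-negativity, identity of indiscernibles, symmetry, triangle inequality) on the set of strings. A stdlib-only sketch, with the function name being my own choice:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings: a genuine metric on text.

    Computed with the classic dynamic-programming recurrence, keeping
    only the previous row of the DP table.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

print(levenshtein("corpus", "corpora"))   # -> 3
print(levenshtein("kitten", "sitting"))   # -> 3
```

Whether such a metric captures anything *semantic* is, of course, exactly the open question being discussed here; it only measures surface form.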
I can't possibly be the only one who has noticed such things. Do you know of any paper going over such foundational issues, or comparing different definitions of text "similarity" on actual corpora? lbrtchx
It’s simply something called “distributional semantics” if you want to know.
I just chimed in since I have a few pubs on this tensor thing.
On 24 Jul BE 2566, at 05:01, Albretch Mueller via Corpora corpora@list.elra.info wrote:
I can't possibly be the only one who has noticed such things. Do you know of any paper going over such foundational issues? Comparing in actual corpora different definitions of text "similarity" ...? lbrtchx
_______________________________________________
Corpora mailing list -- corpora@list.elra.info
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to corpora-leave@list.elra.info
Feel free to email experts in the field for answers.
Since you are from Germany, as I guess from your name: Prof. Volker Tresp and Dr. Maximilian Nickel are experts from Germany whose names I know.
On 7/23/23, Peratham Wiriyathammabhum peratham.bkk@gmail.com wrote:
It’s simply something called “distributional semantics” if you want to know.
https://en.wikipedia.org/wiki/Distributional_semantics
I know about such things but, quite honestly, I understand neither their mathematical grounding nor the possible reach of their pragmatism. Protagonism aside, I have had personal experiences with such things which I found abysmally stupid. I took two poems I wrote, one in German when I was in my early 20s and one in English later in life:
https://hsymbolicus.wordpress.com/category/gedichter/ (Pyramiden)
https://hsymbolicus.wordpress.com/category/poems/ (lies ...)
The reaction of ChatGPT was hilarious (so cunningly weird that I thought "the government" was once again messing with me). It couldn't even identify the language of the German poem (which I forced a bit for poetic purposes, and "hey! it might have even 'learned' its share by now, no?" ;-)). It could not detect that both poems were written by the same writer, or what kind of person would author such poems (it was just "stochastically parroting" stanzas from "lies ..."). Someone who told me he worked on Latent Semantic Analysis said that my poem "had broken his algorithm". I had shared my poems with him. I don't know what he meant.
I can certainly share the kinds of anxieties I had in mind when I wrote "lies ...", but I would love to hear first what Frau "Distributional Semantics", Herr "Latent Semantic Analysis", ... have to say about them.
lbrtchx
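To make the distributional-semantics answer concrete: the "space" is usually just R^|V|, where each word is mapped to its vector of co-occurrence counts with every vocabulary word, and "similarity" is the cosine of the angle between two such vectors. A minimal stdlib sketch (the toy corpus and the whole-sentence context window are my own illustrative choices):

```python
import math
from collections import Counter

# Toy corpus; the "context" of a word is every other word in its sentence.
sentences = ["the cat drinks milk", "the dog drinks water",
             "the cat chases the dog", "milk and water are liquids"]
tokens = [s.split() for s in sentences]
vocab = sorted({w for s in tokens for w in s})

# Each word's "vector" is its row of co-occurrence counts over the vocabulary.
cooc = {w: Counter() for w in vocab}
for sent in tokens:
    for i, w in enumerate(sent):
        for j, c in enumerate(sent):
            if i != j:
                cooc[w][c] += 1

def cosine(u, v):
    """Cosine similarity between the co-occurrence vectors of two words."""
    dot = sum(cooc[u][w] * cooc[v][w] for w in vocab)
    nu = math.sqrt(sum(x * x for x in cooc[u].values()))
    nv = math.sqrt(sum(x * x for x in cooc[v].values()))
    return dot / (nu * nv)

print(round(cosine("cat", "dog"), 2))      # words in similar contexts
print(round(cosine("cat", "liquids"), 2))  # words in dissimilar contexts
```

Note that nothing here *defines* meaning; it only operationalizes the distributional hypothesis that words in similar contexts get similar vectors, which is precisely the foundational assumption being questioned above.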
Tensor-based models are not robustly trained. One can simply tamper with their condition numbers, without the typical memory-address attacks that hackers use.
In these ways, we can interact without adding another mode :)
Que sera sera
(When I was working on a paper draft on this tensor topic, I put some equations in the abstract, all of which were removed by my advisor at the time. Another professor in the lab placed more emphasis on mathematical abstraction. In fact, this topic has a very brief formulation but unusual operations, like Kronecker products or flattening, whose semantics are not very clear. It is probably smarter to sparsify some entries or constrain their structures.)
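The two operations mentioned are mechanically simple even if their linguistic semantics are murky. A stdlib-only sketch: the Kronecker product places a copy of B, scaled by a[i][j], into block (i, j), and flattening just reads the result row by row.

```python
def kron(A, B):
    """Kronecker product of two matrices given as nested lists.

    Entry (i, j) of the result is A[i // p][j // q] * B[i % p][j % q],
    i.e. block (i, j) of the output is a[i][j] times a full copy of B.
    """
    p, q = len(B), len(B[0])
    return [[A[i // p][j // q] * B[i % p][j % q]
             for j in range(len(A[0]) * q)]
            for i in range(len(A) * p)]

A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [1, 0]]
K = kron(A, B)
print(K)  # -> [[0, 1, 0, 2], [1, 0, 2, 0], [0, 3, 0, 4], [3, 0, 4, 0]]

# "Flattening" (row-major vectorisation) of the 4x4 result into a length-16 vector:
flat = [x for row in K for x in row]
print(flat)
```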
"mathematical purity" ... how you can use vector/tensor algebra with texts
I'd suggest using the search word "embeddings" instead of "tensor".
The concept is used in other fields, even physics, but (sticking with linguistics) if you've not looked into Word2Vec yet, that is a good place to appreciate how human language and linear algebra come together.
It is normally introduced as a ready-made model of dimension 300, trained on millions of words. Like you, I wanted to understand what it was actually doing, so a few years ago I did a presentation using just two dimensions and a handful of words and sentences, then plotted the embeddings found for each word. You can add or remove a sentence at a time to see what it is learning from each.
You can see how each dimension is being given some meaning, even if they are not the way a human linguist would have structured it.
It is also a good test bed for finding the limits, such as playing around with ambiguous words and proper nouns, increasing the amount of training data without increasing dimension, etc.
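The two-dimensional exercise can be reproduced in plain Python. What follows is a stdlib-only sketch of the skip-gram idea with negative sampling; real word2vec uses a smoothed unigram distribution for negatives, far more data, and tuned hyperparameters, and the tiny corpus and all constants here are made-up illustrations:

```python
import math
import random

sentences = ["the cat sat on the mat", "the dog sat on the rug",
             "the cat chased the dog", "a king rules the land",
             "a queen rules the land"]
tokens = [s.split() for s in sentences]
vocab = sorted({w for s in tokens for w in s})
rng = random.Random(0)
DIM, LR, EPOCHS, NEG = 2, 0.1, 200, 3

# Two small random vectors per word (input and output roles), as in word2vec.
vec_in  = {w: [rng.uniform(-0.5, 0.5) for _ in range(DIM)] for w in vocab}
vec_out = {w: [rng.uniform(-0.5, 0.5) for _ in range(DIM)] for w in vocab}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, x))))

def train_pair(center, context, label):
    """One SGD step on a (center, context) pair; label 1 = real, 0 = negative."""
    v, u = vec_in[center], vec_out[context]
    g = (sigmoid(sum(a * b for a, b in zip(v, u))) - label) * LR
    for i in range(DIM):
        v[i], u[i] = v[i] - g * u[i], u[i] - g * v[i]

for _ in range(EPOCHS):
    for sent in tokens:
        for i, center in enumerate(sent):
            for j in range(max(0, i - 2), min(len(sent), i + 3)):
                if i == j:
                    continue
                train_pair(center, sent[j], 1)        # observed pair
                for _ in range(NEG):                   # random negative samples
                    train_pair(center, rng.choice(vocab), 0)

def cos(a, b):
    dot = sum(x * y for x, y in zip(vec_in[a], vec_in[b]))
    na = math.sqrt(sum(x * x for x in vec_in[a]))
    nb = math.sqrt(sum(x * x for x in vec_in[b]))
    return dot / (na * nb)

# With only two dimensions, these vectors can be plotted directly on paper.
print({w: [round(x, 2) for x in vec_in[w]] for w in ("cat", "dog", "king", "queen")})
print("cat~dog:", round(cos("cat", "dog"), 2), "cat~king:", round(cos("cat", "king"), 2))
```

With a corpus this tiny the geometry is noisy, which is itself instructive: you can watch what each added or removed sentence does to the layout.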
Darren
P.S. The embedding layer is the first layer in transformers: the layer where tokens ("words") are turned into numbers, typically of dimension 512 or higher. But note that they are randomly initialized, not initialized from word2vec or similar. And any modification to their initial randomness is there to please the layers above, not humans trying to peer inside the box.
P.P.S. I think you might also enjoy https://transformer-circuits.pub/2021/framework/index.html which is exploring how transformers work at a very low-level.
The gap between their minimalist models and something like ChatGPT is huge, though, and reading their work isn't going to help you appreciate why ChatGPT says stupid things to you.
I think there might be some kind of terminological confusion, where the same word, "tensor", is used in physics and in machine learning with two different but related meanings. See: https://en.wikipedia.org/wiki/Tensor_(machine_learning)
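Indeed, in machine-learning libraries a "tensor" is just an n-dimensional array of numbers, with none of the physicist's transformation laws attached. A stdlib-only probe makes the point (the helper name is my own):

```python
def shape(x):
    """Return the shape of a regularly nested list, like NumPy's .shape."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)

vector = [1.0, 2.0, 3.0]                                  # rank-1 "tensor"
matrix = [[1, 2], [3, 4]]                                 # rank-2
batch  = [[[0] * 4 for _ in range(3)] for _ in range(2)]  # rank-3, e.g. batch x seq x dim

print(shape(vector), shape(matrix), shape(batch))  # (3,) (2, 2) (2, 3, 4)
```

Nothing in the ML usage requires covariance/contravariance or a change-of-basis law; "tensor" there really is shorthand for "multidimensional array".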
On 7/24/23, Andrea Nini via Corpora corpora@list.elra.info wrote:
... See: https://en.wikipedia.org/wiki/Tensor_(machine_learning)
Oh! Am I silly! ;-) That is why I was noticing such a strident mismatch between what they were saying and what we mathematicians mean by, and have been taught to understand as:
https://en.wikipedia.org/wiki/Tensor
I was fancying self-describing, decentralized hyper-forests of text segments out of which a language's grammar could be derived ... and based on such totally-off-the-mark, fanciful ideations I was trying to somehow figure out how to describe the inner intersubjective aspects of valuation through tensor planes ... there I went. ~
On 7/24/23, Darren Cook darren@dcook.org wrote:
Perhaps my doubts relate to the fact that as a theoretical physicist myself, the kind of "mathematical purity" I was trained into...
By the way, this is probably veering off-topic for corpora-l.
datascience.stackexchange.com is quite a good place for questions about transformers, embeddings, NLP, etc.
As a TI I can't use stackoverflow, stackexchange ... (they start road-blocking you in really obnoxious ways). I can't even visit public libraries in "'the' 'land' of 'the' free ...", "because" they blacklisted me in the FBI criminal index (believe me, you would laugh about it, if you could, if you knew me).
lbrtchx
Not talking to any medical doctors for another sense :)
From WordNet (r) 3.0 (2006) [wn]:
tensor
    n 1: a generalization of the concept of a vector
    2: any of several muscles that cause an attached structure to become tense or firm
Dear lbrtchx
Yes, indeed, it is possible for a string (or an expression, or a lexical item, etc.) to refer to different things in different contexts. One could refer to this as polysemy (or not). Many fields have shared vocabulary items. The same character or character strings can be used in ways that show differences "in nature"/"by definition" (i.e. different due to discipline-specific, historical reasons) or differences in practice (which can be more general/generalized). Especially in an engineering field nowadays, a term used in practice is likely to gradually take over the one favored historically.
Then again, is your inquiry more about vocabulary use, or for what reason are you asking your question(s)?
Best Ada
In fact, tensors should ideally capture both the paradigmatic and syntagmatic properties of a word in a sentence, given that they are usually made up of matrices, that is, at least pairs of vectors whose rows are embeddings. The question is: do matrices represent all the semantic and syntactic properties of a sentence that are needed? I doubt it; in fact, when it comes to deep implicit content they certainly fail. And with out-of-vocabulary or simply rare words, no reasonable outcome is obtained either. Rodolfo
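The "rows are embeddings" picture, and the out-of-vocabulary failure mode, can both be shown in a few lines. A minimal sketch with made-up toy vectors (a zero-vector fallback is just one common convention for OOV words, not the only one):

```python
# Tiny, purely illustrative embedding table of dimension 2.
emb = {"the": [0.1, 0.2], "cat": [0.9, 0.1], "sat": [0.4, 0.7]}

def sentence_matrix(sentence, oov=(0.0, 0.0)):
    """Stack per-word embeddings as rows; unknown words become a zero row."""
    return [list(emb.get(w, oov)) for w in sentence.split()]

print(sentence_matrix("the cat sat"))  # 3 x 2 matrix, one row per word
print(sentence_matrix("the dog sat"))  # "dog" is OOV -> an uninformative zero row
```

The zero row carries no paradigmatic or syntagmatic information at all, which is one concrete way of seeing why rare and unseen words yield no reasonable outcome.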
My pleasure! RD
On Tue, 25 Jul 2023 at 18:46, Peratham Wiriyathammabhum peratham.bkk@gmail.com wrote:
It is a pleasure for me to be cc'ed by Prof. Delmonte.
@Rodolfo: I think it is imperative to point out that it is one thing as to how tensors and matrices can be applied to compute certain (statistical) values, and it is possible that sometimes certain statistical values/patterns correspond to certain values/patterns in data (in this case, text data); but it is another thing to claim that "tensors should capture ... both paradigmatic and syntagmatic properties of a word in a sentence..." etc. [1] without addressing what statistical patterns could/would be sufficient to describe the paradigmatic/syntagmatic properties (of anything defined). As "words" are often undefined (and undefinable computationally), one should be careful about jumping to this kind of interpretation/conclusion, which then often only speaks to a particular way in which the data was segmented and processed, and/or to a specific dataset. There has been, unfortunately, a tradition of over-generalizing in this regard in Computational Linguistics and NLP (and in some other computational sciences too, I reckon). Re "[t]he question is do matrices represent all needed semantic and syntactic properties of a sentence?": it depends on your data (and one's interpretation thereof), even if/when there is a "sentence" to speak of. [1] Again, note that "word" should not be left underspecified here.
@Peratham: Re "[w]e could continue this conversation or definition and pursue another topic of how to define these symbolic/scientific/computational systems": or one could just look at the numerical values and try things out with some carefully controlled experiments (though I doubt one would achieve much with "words"; at most one gets whatever is similar to the shape of a "word"). I remain open-minded, though, as to what one can achieve with characters in the computational setting!
On Tue, Jul 25, 2023 at 6:56 PM Rodolfo Delmonte delmont@unive.it wrote:
My pleasure! RD
Il mar 25 lug 2023, 18:46 Peratham Wiriyathammabhum < peratham.bkk@gmail.com> ha scritto:
A pleasure to me to be cc’ed by Prof. Delmonte.
On 25 Jul BE 2566, at 23:01, Rodolfo Delmonte delmont@unive.it wrote:
In fact tensors should capture ideally both paradigmatic and synthagmatic properties of a word in a sentence given the fact that they are usually made up of matrices, that is at least couples of vectors where the rows are represented by embeddings. The question is do matrices represent all needed semantic and syntactic properties of a sentence? I doubt it and in fact when it comes to deep implicit content they certainly fail. But also with OOVWs or simply rare words no reasonable outcome is obtained. Rodolfo
Il mar 25 lug 2023, 17:28 Ada Wan via Corpora corpora@list.elra.info ha scritto:
Dear lbrtchx
Yes, indeed, it is possible for a string (or an expression or a lexical item... etc.) to refer to different things based on different contexts. One could refer to it as polysemy (or not). Many fields have shared vocabulary items. Same character or character strings can be used in ways that show differences "in nature"/"by definition" (i.e. different due to discipline-specific, historical reasons) or differences in practice (which could be more general/generalized). Esp. in an engineering field nowadays, a term used for/in practice is likely to gradually take over the one favored historically over time.
Then again, Is your inquiry more about vocabulary use, or for what reason are you asking your question(s)?
Best Ada
On Tue, Jul 25, 2023 at 10:40 AM Peratham Wiriyathammabhum via Corpora < corpora@list.elra.info> wrote:
Not talking to any medical doctors for another sense :)
From WordNet (r) 3.0 (2006) [wn]:
tensor n
1: a generalization of the concept of a vector
2: any of several muscles that cause an attached structure to become tense or firm
On 25 Jul BE 2566, at 06:13, Albretch Mueller via Corpora < corpora@list.elra.info> wrote:
On 7/24/23, Andrea Nini via Corpora corpora@list.elra.info wrote:
... See:
https://en.wikipedia.org/wiki/Tensor_(machine_learning)
Oh! Am I silly! ;-) That is why I was noticing such a strident impedance mismatch between what they were saying and what we mathematicians mean by, and have been taught to understand as:
https://en.wikipedia.org/wiki/Tensor
I was fancying self-describing, decentralized hyper-forests of text segments out of which a language's grammar could be derived ... and based on such totally off-the-mark, fanciful ideations I was trying to somehow figure out how to describe the inner intersubjective aspects of valuation through tensor planes ... there I went. ~ On 7/24/23, Darren Cook darren@dcook.org wrote:
Perhaps my doubts relate to the fact that as a theoretical physicist
myself, the kind of "mathematical purity" I was trained into...
By the way, this is probably veering off-topic for corpora-l.
datascience.stackexchange.com is quite a good place for questions about
transformers, embeddings, NLP, etc.
As a TI I can't use stackoverflow, stackexchange ... (they start road-blocking you in really obnoxious ways). I can't even visit public libraries in "'the' 'land' of 'the' free ...", "because" they blacklisted me in the FBI criminal index (believe me, you would laugh about it if you knew me).
lbrtchx _______________________________________________ Corpora mailing list -- corpora@list.elra.info https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/ To unsubscribe send an email to corpora-leave@list.elra.info
Automatic note added by the mail system
*Support the future* Donate your 5x1000 to the Collegio Internazionale Ca' Foscari *FUNDING OF SCIENTIFIC RESEARCH AND OF THE UNIVERSITY | TAX CODE: 80007720271*
On 7/24/23, Paula .* wrote
They (the PyTorch and TensorFlow authors) generally would not discuss the tensor operations used to implement the NNs.
adawan919@gmail.com Tue, Jul 25, 2023 at 3:27 PM:
Yes, indeed, it is possible for a string (or an expression or a lexical item... etc.) to refer to different things based on different contexts ...
Rodolfo Delmonte via Corporacorpora@list.elra.info Tue, Jul 25, 2023 at 4:01 PM
In fact tensors should capture ideally both paradigmatic and syntagmatic properties of a word in a sentence given the fact that they are usually made up of matrices, that is at least couples of vectors where the rows are represented by ...
At the risk of being considered a "purist" or an "elitist", and if I am following your comments correctly (hopefully making some sense of them): tensors and matrices are definitely more than a visual, table-like arrangement, which is also the case for vectors, vector operations, vector spaces, points in a space, the distance between two points (as an order-zero tensor, an invariant) ...
Take a bunch of texts and first show me how you define "space", then "vector", ... in a thoroughgoing, character-by-character way. For example, how could you then use vector-addition parallelograms to explain paraphrasing, or go about summarization in a corpus ...
Such concepts have been very profitably cultivated for millennia by generation after generation of mathematicians and empirical scientists, precisely enough to land rovers on the Moon and to coordinate the work of the robots used to make transistors.
If those PyTorch and TensorFlow yahoos (behaving more like politicians and magicians than true-to-the-matter tech monkeys) will not even show what they mean, how are you so sure about what you mean when you speak of "tensors", "vectors", ...?
How does the concept of a vector in a space translate to whatever you mean by "vectors" in a text bank/corpus? What would its magnitude and direction be? How would you calculate a dot product between two vectors? ...
Here is a very basic introduction to what a dot product and a tensor mean:
// __ Tensors for Beginners 9: The Metric Tensor
https://www.youtube.com/watch?v=C76lWSOTqnc ~ lbrtchx
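For what it is worth, the questions above do have a standard, if mundane, answer in NLP practice: a text is embedded as a count vector over some vocabulary (here, character trigrams, which also satisfies the "character-by-character" demand), the "space" is then the non-negative orthant of R^V for that vocabulary V, and the dot product and cosine similarity follow from the usual definitions. The following is a toy sketch of my own, not taken from any of the books or libraries mentioned in this thread.

```python
import math
from collections import Counter

def trigram_vector(text):
    """Embed a text as a sparse count vector of its character trigrams."""
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cosine(u, v):
    """Ordinary dot product / norms, applied to sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = lambda w: math.sqrt(sum(c * c for c in w.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

a = trigram_vector("the cat sat on the mat")
b = trigram_vector("a cat sat on a mat")
c = trigram_vector("tensor calculus for physicists")
print(round(cosine(a, b), 3))  # near-paraphrases -> high cosine
print(round(cosine(a, c), 3))  # unrelated texts -> low cosine
</antml>```

Whether this constitutes a mathematically satisfying definition of "space" for texts is of course exactly what is being debated here; the construction is well-defined, but nothing in it guarantees that geometric facts (e.g. the vector-addition parallelogram) correspond to anything linguistic such as paraphrase or summarization.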
On 7/25/23, Albretch Mueller lbrtchx@gmail.com wrote:
Take a bunch of texts and first show to me how do you define "space", then "vector", ... in a thoroughgoing "character-by-character" way. For example, how could you then use vector addition parallelograms to explain paraphrasing and go about summarizations in a corpus ...
You were telling me about "books". None of the ones I have checked even attempts to explain any of it.
lbrtch
A brief reply: may I invite you to take a look at my work from recent years, e.g. "Fairness in Representation" and "Representation and Bias" (all versions are linked here: https://sites.google.com/view/adawan)? There may be a lot to abstract from my findings, and I can imagine you obtaining some insights from them, which may indirectly answer some of your questions.
On Tue, Jul 25, 2023 at 9:01 PM Albretch Mueller via Corpora < corpora@list.elra.info> wrote:
A dot product is undefined in many tensor decompositions.
Well. And tensor methods do not protect lots of people living under illegal and criminal circumstances. This is probably off-topic, but it is possible for many people not to be protected by laws and policies. As you may know.
On 26 Jul BE 2566, at 01:56, Albretch Mueller via Corpora corpora@list.elra.info wrote:
On 7/25/23, Peratham Wiriyathammabhum peratham.bkk@gmail.com wrote:
Luckily, words are often relational. Nice having some dialogue with you.
characters, words, phrases, sentences, ... all the way to whole books are always intra- and intertextually relational and, once again, being "relational" has a measurably tractable meaning brought about by the dot product in a vector space ;-)
Other people stumbling onto this thread will certainly notice the context in which it was framed.
lbrtchx
Dear all, The primary reason I got onto this thread has to do with what I sensed might be an attempt to promote a certain methodology, one that direly needs some re-evaluation, much like many others in the space of language and computing (and/or CL/NLP, digital humanities, etc.). I know that many practitioners in this space have computed using "words" as a representation and therefore may have had many hypotheses as to how various kinds of textual relations are supposed to behave in the vector space, etc.; many may even have related grammatical relations to certain spatial relations --- but what is one to make of, e.g., different grammatical relations having the same statistical representations, or different statistical representations having the same grammatical relations? And as any trained linguist could honestly inform one, there is really no "grammar". There are no "grammatical relations" that are "intrinsic" to language.
@Peratham: many of the statements that you made don't really make sense or lack clarity, if you think about them, e.g. "[t]ensor arrays are just ER diagrams most of the time" --- this depends on the data and how it is being represented. (I assume "ER" here refers to "entity relationship".) Re "I don’t feel them as a very powerful framework for every system.": the matter is not about having "a very powerful framework for every system" but to understand the limit (and the lack and irrelevance) of "words" (esp. in computing). Re "And tensor methods do not protect lots of people living under illegal and crime circumstances. This is probably off-topic but it is possible for many people to be not protected by laws and polices. As you may know.": I don't understand this statement of yours. Would you please clarify?
@lbrtchx: Re "characters, words, phrases, sentences, ... all the way to whole books are always intra- and intertextually relational" --- I agree, except for the inclusion of "words" and "sentences", as these are, at least, obsolete, unreliable, and non-universal. We can do better in this regard. Anything we examine can be relational, assuming we have established or understood the connection. But note that the connection may be in us instead. Re "being 'relational' has a measurably tractable meaning brought about by the dot product in a vector space ;-)": this depends.
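The earlier point that different grammatical relations can end up with the same statistical representation is easy to make concrete (this tiny example is my own construction, not from anyone in the thread): under a bag-of-words count vector, two sentences with swapped subject and object are literally indistinguishable.

```python
from collections import Counter

# Subject and object are swapped, so the grammatical relations differ,
# yet the bag-of-words count vectors are identical.
a = Counter("man bites dog".split())
b = Counter("dog bites man".split())
print(a == b)  # True: one statistical representation, two meanings
```

Order-sensitive representations (n-grams, positional encodings, contextual embeddings) are of course designed to break exactly this kind of tie, which is part of why "this depends" on the representation chosen.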
On Wed, Jul 26, 2023 at 4:00 AM Albretch Mueller lbrtchx@gmail.com wrote:
Re there is no grammar: this has been a perennial issue in CL/NLP. Different people have grown up with different relations to language. Some take some habits more seriously than others. Some give some habits more value/authority/status than they deserve, so they become rules. And some obey rules more than others, so rules get internalized and one ends up believing that there is something "magical" about language. (Some also try to exploit rules enough to make others' lives miserable --- this is how language/grammar can be used as a weapon [language attitudes]!). But all in all, we just communicate in whichever way we end up doing so.
On Wed, Jul 26, 2023 at 5:15 PM Ada Wan adawan919@gmail.com wrote:
If Ada allows me, I will take this as an ending quote in every email I write. :-D
[image: Universidad de Jaén] https://www.ujaen.es/ Arturo Montejo Ráez Profesor Titular de Universidad | Associate Professor (Tenured) amontejo@ujaen.es
Universidad de Jaén Departamento de Informática, A3-114 Las Lagunillas s/n, 23071 - Jaén (Spain) +34 953 212 882 https://www.ujaen.es/servicios/sinformatica/sites/servicio_sinformatica/files/piefirmacorreo4/index.html ORCID: http://orcid.org/0000-0002-8643-2714 Researcher ID: D-3387-2009 SINAI Research Group https://sinai.ujaen.es
*Before printing this message, consider whether it is necessary. Protecting the environment is everyone's responsibility.* *** CONFIDENTIALITY CLAUSE *** This message is intended exclusively for its recipient and may contain information that is CONFIDENTIAL. If you are not the intended recipient you are hereby notified that any dissemination, copy or disclosure of this communication is strictly prohibited by law. If this message has been received by mistake, please let us know immediately via e-mail and delete it.
El mié, 26 jul 2023 a las 17:39, Ada Wan via Corpora (< corpora@list.elra.info>) escribió:
:-D
On Wed, Jul 26, 2023 at 5:59 PM Arturo Montejo-Ráez amontejo@ujaen.es wrote:
If Ada allows me, I will take this as an ending quote in every email I wrote. :-D
On 7/26/23, Ada Wan adawan919@gmail.com wrote:
Re there is no grammar: this has been a perennial issue in CL/NLP.
What do you mean when you say "there is no grammar"? Do you mean in those kinds of "tensors" they use, or factually, in general, "in reality"? ~
Different people have grown up with different relations to language.
I would have said: "different people have grown up giving more or less importance to different aspects of their lives, about which they talk to one another in particular ways ..."
Most importantly, people do change their views towards what they had believed, in some eras, to be their very essence! ~
Some take some habits more seriously than others. Some give some habits more value/authority/status than they deserve, so they become rules. And some obey rules more than others, so rules get internalized and one ends up believing that there is something "magical" about language.
Yes, there definitely is. Hegel explained it to us ;-)
https://en.wikipedia.org/wiki/Zeitgeist
https://en.wikipedia.org/wiki/The_Phenomenology_of_Spirit
https://plato.stanford.edu/entries/hegel-dialectics/
https://www.dw.com/en/hegel-the-philosopher-who-viewed-history-as-inevitable... ~
(Some also try to exploit rules enough to make others' lives miserable --- this is how language/grammar can be used as a weapon [language attitudes]!).
Thanks to Hollywood, when most people think of "'the' land of 'the' 'free' ..." they think of the Apollo landing, skyscrapers, computers, Beyoncé's rear end, ... To me the most amazing thing I have learned in the U.S. (even if cosmically hopeless) is that thing of using lies not just as "tools" but as industries! What the preposterous eff does USG care about "freedom", "democracy", "the rule of law", ...?!?
I lived in NYC for 25 years and, though I don't watch TV and don't even own a TV set, you couldn't live there without being aware of the Seinfeld show (created by Larry David and Jerry Seinfeld); one of the memes they repeated was: "remember, it is not a lie anymore if you believe it". Silly me would have thought of it as some sort of odd joke (even if you believe a lie, it doesn't make it true, right?), but then I realized that there was more to it and started interpreting it as a sort of anthropology of the U.S. I once noticed gringos arguing with British people online, and their argument went like: "don't make fun of our media and we won't make fun of your royalty" ...
I was once telling a European lady who minded such issues how they keep people in the U.S. I noticed she was getting anxious but chose to keep parsing me. At some point she couldn't take it anymore and told me: "but what you are telling me is so stupid that it can't possibly work with people". I just smiled. I saw myself: I wouldn't have believed any of what I was telling her had I not experienced it myself. USG manages, very easily and cheaply, to do that fine job with "language" and some "social control", as good Christians call -repression- (-torture- being "enhanced interrogation techniques"). The other day I noticed CNN folks talking as if they cared about people in "Afghanistan and Yemen, Ethiopia, Somalia, Sudan, and so on, countries that desperately need that food assistance", and they were saying such a thing quite naturally, with a straight face (youtube.com/watch?v=MorNgyUyV10&t=85).
But then I wondered about those who have a hard time digesting USG's lies. One of my girlfriend's best friends, who was an eyewitness of the incident in which the IDF ran a bulldozer over Rachel Corrie, told us how she was murdered.
What I have related may not exactly relate to corpora research, but IMO it does definitely relate to "language" among other things.
lbrtchx
On 7/26/23, Ada Wan adawan919@gmail.com wrote:
Re "being 'relational' has a measurably tractable meaning brought about by the dot product in a vector space ;-)": this depends.
I meant, and should have stated: "in Mathematics" (again, Mathematics and science are my background).
Re "characters, words, phrases, sentences, ... all the way to whole books are always intra- and intertextually relational" --- I agree, except for the inclusion of "words" and "sentences" as these are, at least, obsolete, unreliable, and non-universal. We can do better in this regard. Anything we examine can be relational, assuming we have established or understood the connection. But note that the connection may be in us, instead.
Hmm! What exactly do you mean when you say that "'words' and 'sentences' as these are, at least, obsolete, unreliable, and non-universal"? Do you mean "as I understand them in the kind of research I do", or in general? If the second, I think Gogol would disagree: https://en.wikipedia.org/wiki/The_Tale_of_How_Ivan_Ivanovich_Quarreled_with_... https://www.gutenberg.org/ebooks/author/531 ~ Aristotle as well. Eco pointed out that the only true-to-the-matter statements trying to discern the essence of poetry were written by Aristotle. The ancient Greeks were obsessed with like ratios, and Aristotle explained poetry through a sort of parallel comparison of like ratios, such as: "as sight is to the body, so mind is to the soul", out of which you would algebraically phrase "the eyes of your soul", meaning "your mind". Now, if "words don't matter", how could we understand poetry? Figurative meaning? ~
... But all in all, we just communicate in whichever way we end up doing so.
Yes, yes, yes! I totally agree, but we do still use language, especially words, as some sort of brokering device "to communicate" (which comes from Latin communicare, literally meaning "to make common", "to share" in an intersubjective way, not, as some "post-modern" folks would imply today, to "transmit 'information'" (whatever they mean)). In a sense you have ~retaken~ a theme that tormented Greek sages in a documented way going back to 6th-century Athens: the relationship between techne and arete, to which Plato dedicated 5 of his dialogues. After three millennia we still haven't been able to shed some light on, let alone figure out, such issues, which I think matters, because we have come to believe that being smart means having the fastest computer, ... morally, however, we have apparently totally lost our senses. I think in order to understand language (in general, semiotics) we need to make sense of the inner and outer intersubjective dialectic we employ when we "communicate", and (as I see things) there are various conceptual constellations essentially interlocked when it comes to making sense of such matters:
a) techne (functions)
b) the concept of the general (das Hegelsche Allgemeine)
c) figuring out how intersubjectivity works (AFAIK, IMHO, Karl Marx's theory of valuation is the best in town, even though it generated an industry of criticism, mostly with positivistic leanings)
d) the mind-body link (which, taking into consideration all other aspects above, I think could be proved to simply be of a semiological nature (no pineal gland, no microtubules, no ...))
lbrtchx
Once again, I found wikipedia lacking:
https://en.wikipedia.org/wiki/Techne
doesn't mean "art" (Latin translation which meant something different to them, closer to the Greek concept) or "craft" (at least not in the mundane sense of doing things manually, more like a "skill").
I think (quite forcefully, in the least amount of words) technê pertains to: "the functionally intersubjective aspects of productive knowledge".
In addition to the recommended book "Productive Knowledge in Ancient Philosophy: The Concept of Technê" by Thomas Kjeller Johansen, I would suggest "Of Art and Wisdom" by David Roochnik (which I recommend not only as a necessary complementary reading to Johansen's, but as one I found much better at explaining the concept and its very interesting historical grounding from pre-Socratic times to Plato).
You will also need to understand well the mathematical concept of function, which has been cannibalized by all other scientific endeavors; not in the "post-modern" way in which it is explained on wikipedia:
https://en.wikipedia.org/wiki/Function_(mathematics)
but in the "Geometric" (which in those times didn't mean "visual" but more like -logical-) way Ancient Greeks understood the concept as they used it in the best corpus ever built:
https://en.wikipedia.org/wiki/Euclid%27s_Elements
all the way to Descartes.
When I have had to teach that concept to high school students I explained it "the old way":
https://ergosumus.wordpress.com/2021/11/09/nerds-gang-math-functions/
showing my students how even month-old ravens understand that concept without having to sit years in school ;-), also proving that our mind-body link is semiological (supervening on the negentropy brought about by our quite Saussurean neurons), not anatomical or physiological. As Kant explained to us, even when we dream, we dream "functionally".
Sorry for my latest mini rants. I decided to be more explicit about what I meant by technê, functions, ... because to my understanding it is not only more enlightening but downright profitable when it comes to corpora research. I don't want other people to be carried adrift the way it happened to silly me with "tensors". I promise I won't say a word for the next five minutes or so ;-)
lbrtchx
Dear lbrtchx
Re your 1st email (dated Jul 27, 2023, 5:01 AM, UTC+2):
i. Re "no grammar": in reality, it's "made up" of (post-hoc) analyses and normative values from language judgment based on (more/less) well-formed data. [Of course, most of us who entered the language space didn't see it as such in the beginning. Many just took/take it for granted, as some necessary part of language. For me, at least, I've always had my reservations about e.g. syntax or much from syntactic theories, but it was not until I reflected further on my results that things became clearer to me (or that I realized I just had a computational proof for dissolving "words").] (There can be a weaker formulation of "no grammar" --- that its existence is in the mind of the beholder, subject to each person's "belief" in the matter. There is also the interpretation of "no grammar" as an imperative/request: that we shouldn't use/endorse grammar (esp. to judge ourselves and others). For ethical reasons, one may benefit from the "grammar as style guide / mnemonics" interpretation in communication. That is, take it easy with "grammar"; in a way, it's just "recycled peer pressure", a "2nd/3rd/n-th hand emotion" :).)
ii. Re "something 'magical' about language": it depends on one's def of "magical" too, I suppose. I just used it in the sense that there isn't that much that can't be explained away wrt language. Of course, there is a limit to human knowledge and one has to be(come) at peace with some things being just the way they are, e.g. our "initial conditions". But, sure, some people may find some things to be more "magical"/extraordinary than others. I don't see reason for disagreement here.
iii. Are the Hegel links supposed to inform me of the concept of "Zeitgeist"? Just checking here. :p
iv. I don't quite understand your point(s)/opinion(s) re the US or your experiences described in the last few paragraphs of this email.
A disclaimer: my views and opinions here on this forum / mailing list are not politically driven or oriented ("politics" here in the sense of government-related). When I mention "language politics", it usually has to do with language ideology and identity politics ("politics", as in, e.g. [from https://www.thefreedictionary.com/politics]: "[t]he often internally conflicting interrelationships among people in a society" (American Heritage Dictionary), "any activity concerned with the acquisition of power, gaining one's own ends" (Collins), or "the use of strategy or intrigue in obtaining power, control, or status" (Random House); that is, a more general, vanilla, "stateless" interpretation of "politics", similar to a more general interpretation of "language" on which I'd prefer to theorize). Note that "language" does not have to relate to "nation". That having been expressed, sure, there can be all kinds of "propaganda" everywhere and anywhere, I can imagine. The intent behind my interaction with you all on this thread/forum, however, is to get people to do better science.
Re your 2nd email (dated Jul 27, 2023, 5:34 AM, UTC+2): Re "Now, if "words don't matter", how could we understand poetry? figurative meaning?": do you really think that if you understand all character strings in a poem, you'd understand it?
Re your 3rd email (Jul 27, 2023, 12:17 PM, UTC+2), aka your "latest mini rants": No prob. (Yes, I prefer a more comprehensive, holistic view as well.)
Best Ada
On Thu, Jul 27, 2023 at 12:17 PM Albretch Mueller lbrtchx@gmail.com wrote:
On 7/28/23, Ada Wan adawan919@gmail.com wrote:
Re your 1st email (dated Jul 27, 2023, 5:01 AM, UTC+2): i. Re "no grammar": in reality, it's "made up" of (post-hoc) analyses and normative values from language judgment based on (more/less) well-formed data. [Of course, most of us who entered the language space didn't see it as such in the beginning. Many just took/take it for granted, as some necessary part of language. For me, at least, I've always had my reservations about e.g. syntax or much from syntactic theories, but it was not until I reflected further on my results that things became clearer to me (or that I realized I just had a computational proof for dissolving "words").]
(There can be a weaker formulation of "no grammar" --- that its existence is in the mind of the beholder, subject to each person's "belief" in the matter. There is also the interpretation of "no grammar" as an imperative/request: that we shouldn't use/endorse grammar (esp. to judge ourselves and others). For ethical reasons, one may benefit from the "grammar as style guide / mnemonics" interpretation in communication. That is, take it easy with "grammar"; in a way, it's just "recycled peer pressure", a "2nd/3rd/n-th hand emotion" :).)
In the character-by-character way in which I see texts/corpora, you have clusters of referents, modifiers and links: an RML grammar, which serves as a way to organize links and frame the sense of a phrase a bit better. It is some form of graphical user interface, but without it there are plenty of sentences you couldn't make sense of.
ii. Re "something 'magical' about language": it depends on one's def of "magical" too, I suppose. I just used it in the sense that there isn't that much that can't be explained away wrt language. Of course, there is a limit to human knowledge and one has to be(come) at peace with some things being just the way they are, e.g. our "initial conditions".
But, sure, some people may find some things to be more "magical"/extraordinary than others. I don't see reason for disagreement here.
"Magical" in the sense that when we go about our intersubjective business (I am saying something to you, which you can’t help reading in your own ways; other people may read, and mind, as well ...; Alice bought some veggies from Bob, …), we see more in money ("words", ...) than just a piece of paper or some transactional electronic ("air" ...) excitations. Another aspect of that "magic" which I think hasn't been studied enough is that even though your "magic" and mine are different, we are still able to "communicate". How on earth do such things happen?
iii. Are the Hegel links supposed to inform me of the concept of "Zeitgeist"? Just checking here. :p
No, not just about "Zeitgeist". Sorry! More about "how 'Zeitgeist' happens" which is what becomes interesting from a corpora research, semiological point of view, which I actually learned from a Russian philosopher:
// __ Evald Ilyenkov's philosophy revisited: I was really glad (even confused in this "post modernistic" age of nonsense and "alternate facts") ...
https://www.amazon.com/gp/customer-reviews/R115QLRWYD52M8/ ~ who (re)explained to me what Hegel meant by his Begriff des „Allgemeinen”. Let me forcefully try it with the least amount of words:
Think of society at large as a corpus (not just texts, but all kinds of techne/functions as well (the Mayan culture had a God of "the Verb")). As we all go about our daily business we do so functionally, step-by-step through engineering and societal devices ("words") we have created (in a sense we are kind of "reading" as we "mind our business"). Yet, when someone uses a cell phone she doesn't have to understand sh!t about the technicalities of such devices, nor does she need to have a clue about the UVEL physics of the machines used to make phones ... all she needs is a graphical user interface which is also another device comprising engineering and societal aspects. In that sense I don't understand what consciousness studies folks mean when they talk about the "physical closure of reality" kind of writing off consciousness. It is not like our semiosis is puncturing consciousness to any extent.
Something that I find very interesting is that sleep researchers have ways to gauge when someone goes into REM and its cycles. However, when they interrupt the subject half way through, s/he always gives a whole functional narrative of "their dream" (part of which researchers "objectively" see from their devices). I think that may relate to what Kant meant when he said that "we are framed by our minds" (my way of putting it).
Each entity, either object or conscious subject, "lives" to a large extent conditioned by and conditioning/(determined by and determining?) the confluence and affluence of various aspects relating to its very self. The degree of interconnectedness of all such paths is what enriches „das Allgemeine”. This is something I noticed myself, before I understood Hegel, from my "best known hells". I was born and raised in Cuba, where most people tend not to mind the screw holding the handrail to the stair. You would step into the same bus and notice the screw getting looser ... until you notice no handrail ... In Germany (in a sense the opposite of Cuba within Western culture), as part of their "deutsche Ordnung" thing, you would walk into a bathroom and notice next to a toilet a sign like: "object USH!T2:69~47:(long-lat-height):201508 administered by MIN47 and the product of your act will be USH!T:201508XX:XX:XX:XX", the "XX:XX:XX:XX" part being updated every time you walked in ;-). This is how you felt.
iv. I don't quite understand your point(s)/opinion(s) re the US or your experiences described in the last few paragraphs of this email.
A disclaimer: my views and opinions here on this forum / mailing list are not politically driven or oriented ("politics" here in the sense of government-related). When I mention "language politics", it usually has to do with language ideology and identity politics ("politics", as in, e.g. [from https://www.thefreedictionary.com/politics]: "[t]he often internally conflicting interrelationships among people in a society" (American Heritage Dictionary), "any activity concerned with the acquisition of power, gaining one's own ends" (Collins), or "the use of strategy or intrigue in obtaining power, control, or status" (Random House); that is, a more general, vanilla, "stateless" interpretation of "politics", similar to a more general interpretation of "language" on which I'd prefer to theorize). Note that "language" does not have to relate to "nation". That having been expressed, sure, there can be all kinds of "propaganda" everywhere and anywhere, I can imagine. The intent behind my interaction with you all on this thread/forum, however, is to get people to do better science.
We were talking about "language" and you mentioned how "some people" use it to abuse others. I wasn't trying to persuade anyone one way or the other. Not me! I mentioned it because this is the most amazing thing I have learned about "language": how easily it can be used in the way you mentioned (the theme of my "lies ..."). I also think that we, scientists and tech monkeys, think of ourselves as some sort of aristocracy, "because we can" and that we should talk about our problems (I'd rather) instead of "Ancient Greece"'s. They minded -their- problems in thoroughgoing argumentative ways, right? Just my opinion.
Re your 2nd email (dated Jul 27, 2023, 5:34 AM, UTC+2): Re "Now, if "words don't matter", how could we understand poetry? figurative meaning?": do you really think that if you understand all character strings in a poem, you'd understand it?
What do you mean when you say "you understand all character strings"? BTW, I think perhaps we have a hard time with our back-and-forths because you are thinking in terms of "character strings" and I am thinking like a "human", as you guys call us ;-) You don’t read "strings of characters". Once you learn how to read a language, you can’t help but parse, try to make sense of, those "strings of characters".
The other day we (all multi-lingual, multi-kulti people) were talking about languages and cultures. My priest boasts of knowing well, being somewhat fluent in, 10 languages! I was thoroughly amazed when they confided to me that they were multi-lingual from their mouth out. My three languages, English, Spanish and German, feel very different to me (quite schizophrenically, some may say). Probably because, I could say, I grew up in a music school, I read as if I were reading music; even the white space doesn’t "feel" the same to me in the three different languages, not even within the same sentence. When I read, it feels quite a bit as if I were reading music (including the harmony). I have noticed other people saying similar things.
lbrtchx
On 7/30/23, Albretch Mueller lbrtchx@gmail.com wrote:
... We see more in money ("words", ...) than just a piece of paper or some transactional electronic ("air" ...) excitations. Another aspect of that "magic" which I think hasn't been studied enough is that even though your "magic" and mine are different we are still able to "communicate". How on earth do such things happen?
It is kind of like we are all constantly lying to one another in some controlled "category mistake" kind of way and that works just fine, but HOW?!
lbrtchx
"It is not like our semiosis is puncturing 'the closure of physical reality' to any extent". I meant to say. Sorry, that happens when you type fast. lbrtchx
It seems to me that in some way the character-by-character analysis of language over-specifies the input and at the same time misses the meaning of the term "language". It makes the assumption that language is bound up in character strings, and at the same time these character strings represent all of the communicative message. Such assumptions hardly work with a corpus of signed languages.
in the character-by-character way in which I see texts/corpora, you
have clusters of referent, modifiers and links: an rml grammar, which happens as a way to organize links and frame a bit better the sense of a phrase. ....
On Sun, Jul 30, 2023 at 3:42 AM Albretch Mueller via Corpora < corpora@list.elra.info> wrote:
"It is not like our semiosis is puncturing 'the closure of physical reality' to any extent". I meant to say. Sorry, that happens when you type fast. lbrtchx
_______________________________________________
Corpora mailing list -- corpora@list.elra.info
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to corpora-leave@list.elra.info
Dear lbrtchx
I. The "grammar" I was referring to is not exactly the heuristic you wrote about. One can certainly read and analyze texts/corpora/literature; I don't disagree with that.
That having been expressed, here are a couple of points re RML that one should pay heed to: i. to what extent and in what context is this technology relevant? ii. one can certainly dissect/decompose texts (e.g. for GUI in HCI --- categories for which depend on the task, not on the "form" of language**), so it'd be a misnomer to call it "grammar"*. * Note that "grammar" (in the context of much of what's been passed down to us in Computational Linguistics (CL), Linguistics, Text Technologies and/or Natural Language Processing (NLP)) has been intermingled with many philological/historical pursuits and leveraged many less scientific terms (e.g. parts of speech). ** Btw, this is a difference not clear to many.
Re "without it there are plenty of sentences you couldn't make sense of": this is an ethical aspect in pedagogy that I am hoping will improve with the lessening of one's dependence on "grammar", i.e. prototypical wellformedness. (Also, do try this at home yourself: many lexical items we see in dictionaries are in their "canonical form", yet many of us are often able to survive what many others will consider to be "misspelled" strings or typographical "errors" (in scare quotes here because these are rather biased/punitive formulations from our less accommodating practices in our pedagogical and research perspectives).)
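[Editorial aside, not from the thread: the point about readers surviving "misspelled" strings, and the earlier question of what a well-defined metric on texts might look like, can both be illustrated with Levenshtein (edit) distance, which is a genuine metric on character strings: non-negative, symmetric, zero iff equal, and satisfying the triangle inequality. The function name and examples below are illustrative assumptions.]

```python
def levenshtein(s, t):
    """Edit distance between strings s and t, computed row by row (O(len(s)*len(t)))."""
    prev = list(range(len(t) + 1))  # distances from s[:0] to every prefix of t
    for i, cs in enumerate(s, 1):
        cur = [i]  # distance from s[:i] to the empty prefix of t
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                  # delete cs
                           cur[j - 1] + 1,               # insert ct
                           prev[j - 1] + (cs != ct)))    # substitute (free if equal)
        prev = cur
    return prev[-1]

print(levenshtein("grammar", "grammer"))  # 1: a single substitution
```

Under such a metric a typo is literally "near" its canonical form, which gives one precise (if narrow) sense in which readers can survive misspelled strings.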
II. Re ""magical" in the sense that when we go about our intersubjective business": some intersubjectivity can be further clarified. I don't see much of your examples as being "magical". E.g. i. "I am saying something to you, which you can’t help reading in your own ways": right, classic. We never know if we really understand each other. You being you, me being me. It's an approximation. Most of the time, we agree by agreement/confirmation on intent, most of the time implicitly, sometimes explicitly. ii. "other people may read, mind, as well ...;": so? iii. "Alice bought some veggies from Bob, …)": this I don't understand. iv. "We see more in money ("words", ...) than just a piece of paper": "[l]egal tender is a form of money that courts of law are required to recognize as satisfactory payment for any monetary debt" (from https://en.wikipedia.org/wiki/Legal_tender and https://www.royalmint.com/aboutus/policies-and-guidelines/legal-tender-guide... ). v. "some transactional electronic ("air"...) excitations": I don't get this. vi. "your 'magic' and mine are different we are still able to 'communicate'. How on earth do such things happen?": a disclaimer: I am not using any magic in my attempts to communicate with you here. I try my best to place myself in your shoes to guesstimate the points that you are trying to get across. But many (as you can see above) didn't quite reach me.
III. Re the Hegel links: right, I forgot the Ding an sich debate etc., from "The Phenomenology of Spirit". That is also relevant for the "no 'word' etc."-initiative. (Please pardon my not entertaining much of the/your p-language/culture discussion here, for the sake of time and priority. I think there are more urgent issues to solve atm.)
IV. Re language politics/ideology: btw, the fact that you could describe to me a meta-view about a discrepancy between "it is not a lie anymore if you believe it" and potential dissenting perspectives --- "it is / can still be a lie if you believe it" (as in, e.g. one could just be or want to be fooled) or "it is / can still be a lie regardless of whether you believe it" etc. --- already suffices as an argument against "it is not a lie anymore if you believe it" (and that there is some truth possible, so long as one is smart enough not to inappropriately go into an infinite regress about things).
Re "I also think that we, scientists and tech monkeys, think of ourselves as some sort of aristocracy, "because we can" and that we should talk about our problems (I'd rather) instead of "Ancient Greece"'s. They minded -their- problems in thoroughgoing argumentative ways, right? Just my opinion.": historically, it has not been uncommon for scholars or learned persons to demonstrate their intellectual prowess by "acquiring" useless skills. (That's not to state that I am for/against such practice, here.)
V. Re poetry and others: "What do you mean when you say "you understand all character strings"": right, it's all a terminology issue, as with much of academic debates/misunderstandings. "You don’t read "strings of characters"": yes, you do. E.g. in our email exchanges here, all you have been reading/seeing are strings of characters. What you take in from these may not be what you'd continue to regard as "strings of characters", but then again, it's all a matter of naming/terminology. We can continue our de dicto conversation, but I think I understand your position (and I might even surmise that you understand mine as well). "Once you learn how to read a language, you can’t help but parse": uh, no, that's just a habit of some. "even the white space doesn’t "feel" the same to me in the three different languages, not even within the same sentence": congrats! I am glad! De-pedanticization has not been an easy task for many.
@Hugh: Please adapt "character in text" to other context/modality accordingly (could be signs, segments of text passages, documents etc.). My formulation should NOT be read by the "word".
Best Ada https://sites.google.com/view/adawan (Follow me on Twitter @adawan919 for daily rants on language-no-language, my journey to out bad research, and on the "no 'word'"-initiative :P. Tons of cyber intimidation might follow you ;), do so only if you're not 🐣.)
On Sun, Jul 30, 2023 at 11:41 PM Hugh Paterson III sil.linguist@gmail.com wrote: