@Rodolfo: I think it is imperative to point out that it is one thing to apply tensors and matrices to compute certain (statistical) values, and it is possible that certain statistical values/patterns sometimes correspond to certain values/patterns in data (in this case, text data); but it is another thing to claim that "tensors should capture ... both paradigmatic and syntagmatic properties of a word in a sentence..." etc. [1] without addressing which statistical patterns could/would be sufficient to describe the paradigmatic/syntagmatic properties (of anything defined). As "words" are often undefined (and computationally undefinable), one should be careful about jumping to this kind of interpretation/conclusion, which often speaks only to a particular way in which data is segmented and processed, and/or to a specific dataset. There has been, unfortunately, a tradition of over-generalizing in this regard in Computational Linguistics and NLP (and in some other computational sciences too, I reckon). Re "[t]he question is do matrices represent all needed semantic and syntactic properties of a sentence?": it depends on your data (and one's interpretation thereof), even if/when there is a "sentence" to speak of.

[1] Again, note that "word" should not be left underspecified here.
@Peratham: Re "[w]e could continue this conversation or definition and pursue another topic of how to define these symbolic/scientific/computational systems": or one could just look at the numerical values and try things out with some carefully controlled experiments (but I doubt one would achieve much with "words"; at most, one gets whatever is similar to the shape of a "word"). I remain open-minded, though, as to what one can achieve with characters in the computational setting!
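To make the point about segmentation concrete, here is a minimal sketch (not anyone's actual experimental setup) of how the "statistical patterns" one observes depend on how the same string is cut up. The sample text, the choice of adjacent-pair counts, and the helper name `bigram_counts` are all assumptions made for illustration only.

```python
from collections import Counter

def bigram_counts(units):
    """Count adjacent pairs in a sequence of units (hypothetical helper)."""
    return Counter(zip(units, units[1:]))

# Made-up sample text, chosen only for illustration.
text = "the cat sat on the mat"

# Segmentation 1: split on whitespace (a common choice, but not a neutral one).
by_space = text.split()
# Segmentation 2: individual characters, spaces included.
by_char = list(text)

space_bigrams = bigram_counts(by_space)
char_bigrams = bigram_counts(by_char)

# Same data, two segmentations, two different sets of "patterns".
print(len(by_space), "units,", len(space_bigrams), "distinct adjacent pairs")
print(len(by_char), "units,", len(char_bigrams), "distinct adjacent pairs")
print(char_bigrams.most_common(3))
```

Neither set of counts is "the" statistics of the text; each reflects one segmentation decision, which is why claims built on them should name that decision.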
On Tue, Jul 25, 2023 at 6:56 PM Rodolfo Delmonte delmont@unive.it wrote:
My pleasure! RD
On Tue, Jul 25, 2023, 18:46 Peratham Wiriyathammabhum peratham.bkk@gmail.com wrote:
It is a pleasure for me to be cc'ed by Prof. Delmonte.
On 25 Jul BE 2566, at 23:01, Rodolfo Delmonte delmont@unive.it wrote:
In fact, tensors should ideally capture both paradigmatic and syntagmatic properties of a word in a sentence, given that they are usually made up of matrices, that is, at least pairs of vectors, where the rows are represented by embeddings. The question is do matrices represent all needed semantic and syntactic properties of a sentence? I doubt it, and in fact when it comes to deep implicit content they certainly fail. But also with OOVWs (out-of-vocabulary words) or simply rare words, no reasonable outcome is obtained. Rodolfo
On Tue, Jul 25, 2023, 17:28 Ada Wan via Corpora corpora@list.elra.info wrote:
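As a rough illustration of the setup described above, the sketch below builds a sentence as a matrix with one embedding row per token, stacks two such matrices into an order-3 array, and shows one way out-of-vocabulary items can end up carrying no information of their own. Everything here is an assumption for illustration: the vectors are made-up numbers rather than trained values, and a single shared `UNK` row is only one of several possible fallbacks (subword models handle this differently).

```python
# Toy embedding table; the numbers are invented, not trained values.
EMBEDDINGS = {
    "the": [0.1, 0.3, 0.0, 0.2],
    "cat": [0.7, 0.2, 0.5, 0.1],
    "sat": [0.4, 0.9, 0.1, 0.6],
}
# Assumed fallback row shared by every out-of-vocabulary token.
UNK = [0.0, 0.0, 0.0, 0.0]

def sentence_matrix(tokens):
    """Return one row per token; unknown tokens all receive the UNK row."""
    return [EMBEDDINGS.get(tok, UNK) for tok in tokens]

def shape(nested):
    """Shape of a regular nested list, read from its first elements."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)

# Two sentences that differ only in a word missing from the table.
batch = [
    sentence_matrix(["the", "ocelot", "sat"]),
    sentence_matrix(["the", "axolotl", "sat"]),
]

print(shape(batch))          # sentences x tokens x embedding dimensions
print(batch[0] == batch[1])  # whether the two sentences are distinguishable
```

Under this assumed fallback the two sentences become numerically identical, so nothing computed from the array downstream can tell the two rare words apart.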
Dear lbrtchx
Yes, indeed, it is possible for a string (or an expression or a lexical item... etc.) to refer to different things in different contexts. One could refer to it as polysemy (or not). Many fields have shared vocabulary items. The same character or character string can be used in ways that show differences "in nature"/"by definition" (i.e. different for discipline-specific, historical reasons) or differences in practice (which could be more general/generalized). Especially in an engineering field nowadays, a term used in practice is likely to gradually displace the historically favored one.
Then again, is your inquiry more about vocabulary use, or for what reason are you asking your question(s)?
Best Ada
On Tue, Jul 25, 2023 at 10:40 AM Peratham Wiriyathammabhum via Corpora < corpora@list.elra.info> wrote:
Not talking to any medical doctors for another sense :)
From WordNet (r) 3.0 (2006) [wn]:
tensor
    n 1: a generalization of the concept of a vector
      2: any of several muscles that cause an attached structure to become tense or firm
On 25 Jul BE 2566, at 06:13, Albretch Mueller via Corpora < corpora@list.elra.info> wrote:
On 7/24/23, Andrea Nini via Corpora corpora@list.elra.info wrote:
... See:
https://en.wikipedia.org/wiki/Tensor_(machine_learning)
Oh! Am I silly! ;-) That is why I was noticing a really strident impedance between what they were saying and what we, Mathematicians, mean by, have been taught to understand as:
https://en.wikipedia.org/wiki/Tensor
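One way to see the gap between the two usages is the small sketch below. In the machine-learning sense, a 2 x 2 grid of numbers is already a "tensor". In the mathematical sense, the same grid only stands for a tensor once one says how its components change under a change of basis: a linear map transforms as P^-1 A P, while a bilinear form transforms as P^T A P. The matrices `A` and `P` are arbitrary values picked for illustration, and the helper functions are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

# One grid of numbers (arbitrary illustrative values).
A = [[2, 0], [0, 3]]
# An arbitrary change of basis and its inverse.
P = [[1, 1], [0, 1]]
P_INV = [[1, -1], [0, 1]]

# Read A as a linear map: components transform as P^-1 A P.
as_linear_map = matmul(P_INV, matmul(A, P))
# Read A as a bilinear form: components transform as P^T A P.
as_bilinear_form = matmul(transpose(P), matmul(A, P))

print(as_linear_map)
print(as_bilinear_form)
print(as_linear_map == as_bilinear_form)
```

The array alone does not say which rule applies; that extra structure is what the mathematical definition adds and what the array-based usage leaves out.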
I was fancying self-describing decentralized hyper-forests of text segments out of which a Language's grammar could be derived ... and based on such totally off the mark, fanciful ideations I was trying to somehow figure out how to describe the inner intersubjective aspects of valuation through tensor planes ... there I went.
~
On 7/24/23, Darren Cook darren@dcook.org wrote:
Perhaps my doubts relate to the fact that as a theoretical physicist
myself, the kind of "mathematical purity" I was trained into...
By the way, this is probably veering off-topic for corpora-l.
datascience.stackexchange.com is quite a good place for questions about
transformers, embeddings, NLP, etc.
As a TI I can't use stackoverflow, stackexchange ... (they start road blocking you in really obnoxious ways) I can't even visit public libraries in "'the' 'land' of 'the' free ...", "because" they blacklisted me in the FBI criminal index (believe me, you would laugh about it if you could if you knew me)
lbrtchx
_______________________________________________
Corpora mailing list -- corpora@list.elra.info
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to corpora-leave@list.elra.info
Automatic note added by the mail system
*Support the future* Donate your 5x1000 to the Collegio Internazionale Ca' Foscari *FUNDING OF SCIENTIFIC RESEARCH AND OF THE UNIVERSITY | TAX CODE: 80007720271*