"mathematical purity" ... how you can use vector/tensor algebra with texts
I'd suggest using the search word "embeddings" instead of "tensor".
The concept is being used in other fields, even physics, but (sticking with linguistics) if you've not looked into Word2Vec yet, that is a good place to appreciate how human language and linear algebra come together.
It is normally introduced as a ready-made model of dim 300, trained on millions of words. Like you, I wanted to understand what it was actually doing, so a few years ago I did a presentation using just two dimensions and a handful of words and sentences, then plotted the embeddings found for each word. You can add or remove a sentence at a time to see what it is learning from each.
You can see how each dimension is being given some meaning, even if it is not the way a human linguist would have structured things.
It is also a good test bed for finding the limits, such as playing around with ambiguous words and proper nouns, increasing the amount of training data without increasing the number of dimensions, and so on.
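If you want to try something like this yourself, here is a rough sketch of the kind of toy setup I mean (not the exact code from my presentation), using gensim 4.x and matplotlib; the sentences are just placeholders:

from gensim.models import Word2Vec
import matplotlib.pyplot as plt

# A handful of short sentences, already split into tokens.
sentences = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "the cat chased the dog".split(),
    "the king ruled the land".split(),
    "the queen ruled the land".split(),
]

# Just two dimensions, so every word's embedding can be plotted directly.
model = Word2Vec(
    sentences,
    vector_size=2,   # dim 2 instead of the usual 300
    window=2,
    min_count=1,     # keep every word, however rare
    sg=1,            # skip-gram
    epochs=500,      # tiny corpus, so make lots of passes over it
    seed=1,
)

# Plot each word at the point given by its two-dimensional embedding.
for word in model.wv.index_to_key:
    x, y = model.wv[word]
    plt.scatter(x, y)
    plt.annotate(word, (x, y))
plt.show()

Add or remove a sentence, retrain, and watch how the points move; that is where most of the intuition comes from.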
Darren
P.S. The embedding layer is the first layer in transformers: it is where tokens ("words") are turned into numbers, typically of dim 512 or higher. But note that those embeddings start out randomly generated, not initialized from Word2Vec or similar. And any modification to that initial randomness during training is to please the layers above, not humans trying to peer inside the box.
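In case that feels abstract, here is a minimal sketch of what an embedding layer does, using PyTorch (my choice of framework for illustration; the sizes and token ids are just placeholders):

import torch
import torch.nn as nn

vocab_size = 50_000   # illustrative tokenizer vocabulary size
d_model = 512         # a typical embedding dimension

# The lookup table starts out as random numbers (PyTorch draws them from a
# standard normal by default); nothing here comes from Word2Vec.
embedding = nn.Embedding(vocab_size, d_model)

# After tokenization, a "sentence" is just a sequence of token ids.
token_ids = torch.tensor([[17, 2045, 991, 4]])   # batch of 1, four tokens

vectors = embedding(token_ids)
print(vectors.shape)   # torch.Size([1, 4, 512]): one 512-dim vector per token

# Training then nudges those random vectors towards whatever values suit the
# layers above, not towards values a human would find meaningful.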
P.P.S. I think you might also enjoy https://transformer-circuits.pub/2021/framework/index.html which explores how transformers work at a very low level.
The gap between their minimalist models and something like ChatGPT is huge, though, and reading their work isn't going to help you appreciate why ChatGPT says stupid things to you.