Dear Hugh
Thanks. I think it is important to have the right units of measurement (not just the right way to refer to them) --- both for processing and evaluation. While "strings" could be a more general reference, units of finer granularity (e.g. characters, bytes, character-/byte-n-grams) would be more precise, and units of larger span (e.g. documents) would be more suitable for many kinds of computational multilingual processing (e.g. in the case of parallel corpora). But, right, I don't disagree with using the term "strings" either, when used appropriately.
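To make the difference between these units concrete, here is a minimal sketch in Python. The function names char_ngrams and byte_ngrams and the sample string are illustrative assumptions, not taken from any paper or tool mentioned in this thread; "characters" here means Unicode code points, which is itself only one possible definition.

```python
def char_ngrams(text, n):
    """Return all contiguous character n-grams (Unicode code points) of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def byte_ngrams(text, n, encoding="utf-8"):
    """Return all contiguous byte n-grams of text under the given encoding."""
    data = text.encode(encoding)
    return [data[i:i + n] for i in range(len(data) - n + 1)]


# Hypothetical sample: "naive" written with a precomposed i-diaeresis (U+00EF).
sample = "na\u00efve"

print(len(sample))                  # number of characters (code points): 5
print(len(sample.encode("utf-8")))  # number of UTF-8 bytes: 6
print(char_ngrams(sample, 2))       # 4 character bigrams
print(byte_ngrams(sample, 2))       # 5 byte bigrams
```

The same "string" thus yields different counts depending on the unit chosen, which is why the unit matters for both processing and evaluation.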
Best Ada
On Wed, Aug 2, 2023 at 10:39 PM Hugh Paterson III sil.linguist@gmail.com wrote:
Dear Ada,
I think I am agreeing with you in terms of finding the right labels for the scientific units of reference. I have always wondered why computational linguists have not just simply called these units "strings".
Kind regards, Hugh
On Wed, Aug 2, 2023 at 11:12 AM Ada Wan via Corpora < corpora@list.elra.info> wrote:
Re RML or any "text technologies" leveraging "grammar" (misnomer or not): this is not the right time to be "campy" about (as in, to be arguing/protesting for) "grammar", esp. if you do not have a background in Linguistics. There has been quite some abuse/misconduct with concepts/units/assumptions such as "words", "sentences", and "grammar" in the language space (with or without computational implementation).
The priority of my communications here is to clarify the scientific front, so that anyone who happens to have gotten involved in this space can come to more clarity on the status quo, esp. given my results. There is a lot that needs to be re-evaluated and re-interpreted. Simply stating that something might have been useful in the past is not going to help with going forward.
If one is working in technologies with language/text data (e.g., in a user-based format/framework, and not working on "grammar" as a "linguistic"/philological pursuit), it is recommended that the name(s) of such technologies get updated --- if "grammar" [1] does not have to be mentioned or be involved, don't. [1] or, including but not limited to any of the following: "word", "sentence", "linguistic structure(s)", "meaning", "morphology", "syntax", "parsing", various terms related to parts of speech (e.g. "nouns", "verbs")....
Re "BTW, regarding that "parsing" aspect, what is the term used to describe the gradual process of "terminological inception"?": conceptualization? Coining of terms? According to me, "lexical priming" is different from "terminological inception".
Re "How could you clarified intersubjectivity?": https://en.wikipedia.org/wiki/Intersubjectivity :) Your question is too broad, or requires an equally broad answer, which I cannot entertain at the moment.
Thanks for sharing your perspectives. I must admit I have not had time to digest all of your points. But one impression kept recurring as I was reading them: sometimes, I sense that when one claims some concepts are not universal (e.g. the ones mentioned in [1] above), others take it to mean that all concepts are categorically invalid. That is not what I intended to communicate (with all my papers, scientific work, and my comments here). It is an expert opinion/finding that I shared, upon some careful evaluation.
On Tue, Aug 1, 2023 at 10:26 PM Albretch Mueller via Corpora < corpora@list.elra.info> wrote:
On 7/31/23, Ada Wan adawan919@gmail.com wrote:
That having been expressed, here are a couple of points re RML that
one should pay heed to:
i. to what extent and in what context is this a technology relevant?
If you were able to devise an algorithm which, taking as input only NL texts (composed of: a) a start (semantic end); b) a sequence of characters from a relatively large and representative text bank; c) an end (a semantic start)), is able to exhaustively "deduce" the grammar of such texts, in addition to being usable with any language, you would then:
1) have defined a "space"/"coordinate system" for those texts, to frame (pretty much) all possible "meaningful 'points'"/"phrases" in terms of such grammar, which would also;
2) be a 0-search structure describing the text bank/corpus (every text segment would also become a pointer to every single actualization of that very segment in all texts, no more "n-grams" necessary!), which could;
3) be used with minimal turking/supervision to:
3.1) clean up all automatic translations from youtube;
3.2) keep multilingual corpora;
3.3) use it for automatic translations (demonstrably, in an almost foolproof, perfect way, since you always have the words/phrases with their context);
3.4) "cosmic/tree reading": instead of reading books/sequences of characters, you would read that text as it relates to all other texts on the same topic;
3.5) parsing: you would keep a corpus of what you know so you won't have to reread about certain topics and aspects you already know (great Lord! how I hate reading a whole book only to find a few, at times marginal, sentences worth reading! or that "youthful" thing of thinking that they just discovered/created an idea because they are just verbalizing it or made a movie about it!) BTW, regarding that "parsing" aspect, what is the term used to describe the gradual process of "terminological inception"? I have heard the term "Adamization", but, even though that word doesn't really rub me the wrong way, I could imagine it is "too sexist" to some people. I wouldn't really mind calling it Eveization or "pussyfication" or whatever. I just don't want to use the term that the government uses: "lexical priming"; and "terminological inception" sounds too cumbersome as a verb: "terminologically incept"? doesn't sound OK in English;
3.6) of course, an easy application of that contextual parsing would be removing all that js crap and ads before they reach your awareness;
...
3.n) not last and definitely not least, I am thinking hard about how to make sure police and politicians at least have a hard time using what I have described to "freedom love" people (I know, I know, ... "3.n" doesn't "technically" pertain to quality of implementation issues ..., but I, for one, disagree. Given the "all tangible things" (tm) panopticon in which we are all living these days, each of us in one's own "virtual prison cell", to call it somehow, we should also think about, and be openly honest about, such matters).
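One way to picture the "0-search structure" of item 2) above, in which every text segment points to all of its occurrences across the corpus, is a positional index. The sketch below is only an illustration under stated assumptions: the function name build_segment_index, the fixed segment length n, and the two-text toy corpus are all hypothetical, and the structure described in the list is meant to go beyond fixed-length n-grams (a suffix array or suffix tree would be the usual way to cover segments of arbitrary length).

```python
from collections import defaultdict


def build_segment_index(texts, n):
    """Map every length-n character segment to the (text_id, offset) pairs where it occurs.

    Assumption: segments have a fixed length n, purely to keep the sketch short.
    """
    index = defaultdict(list)
    for text_id, text in texts.items():
        for offset in range(len(text) - n + 1):
            index[text[offset:offset + n]].append((text_id, offset))
    return index


# Hypothetical toy corpus.
corpus = {
    "t1": "the rose is a rose",
    "t2": "a rose by any name",
}

index = build_segment_index(corpus, 4)

# Looking up a segment returns every place it occurs, with no scan of the texts.
print(index.get("rose", []))  # [('t1', 4), ('t1', 14), ('t2', 2)]
```

Once the index is built, finding all "actualizations" of a segment is a single dictionary lookup, at the cost of the memory needed to store every offset.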
I am working right now on such Leibnizian "characteristica universalis" kind of thing. First cleansing approx. 1.2 million texts mostly from archive.org, *.pub and the NYS Regents exams (nysedregents.org + nysl.ptfs.com) which they have, at least partially, translated to more than 10 languages. Is that relevant enough to you? ;-) I am also being quite selfish about it because I have always dreamed of being able to "read"/mind all texts which have ever been written in the same way that teens think they have to have sex with everybody in town to make sense of things.
ii. one can certainly dissect/decompose texts ...
Computing power has become insanely cheap, but it has also enabled too much "cleverhansing" out there. The Delphic phrase: "you can make sense or money" these times translates as some sort of corollary to: "using computers and then thinking about it makes you smart"; but, does it really?
It amazes me how easily you can "dissect"/"decompose texts", talk about "tensors", "vectors", ... (I am not trying to police language usage, it just amazes me); let alone all the insufferable bsing claims by the "Artificial Intelligentsia".
I would go with one character after the other and an open attempt to use the minimal amount of principles to then see what I get. IMO, when you start getting too smart about what you do, of course, you will "see" how smart you are. The poet in me likes Borges' stanzas: "... el nombre es arquetipo de la cosa, en las letras de 'rosa' está la rosa y todo el Nilo en la palabra 'Nilo'" ("its name is a thing's archetype, in the letters of 'rose' is the rose and the whole of the Nile (river) in the word 'Nile'")
II. Re ""magical" in the sense that when we go about our
intersubjective business": some intersubjectivity can be further clarified. I don't see much of your examples as being "magical".
I actually do! How could you clarify intersubjectivity? I am trying to do so (somewhat) Mathematically (to the extent you could). Could you share any papers, "prior art" on such matters?
ii. "other people may read, mind, as well ...;": so?
Which is a good thing; it is alright, fine and dandy in the hippie way, I meant.
iii. "Alice bought some veggies from Bob, ...)": this I don't
understand.
iv. "We see more in money ("words", ...) than just a piece of paper"
iii. and iv. overlap to some extent so I will try to explain them both quickly (which is impossible since you can write philosophies about each line, but there I'll go). To understand what Marx (may have) meant by „gesellschaftlich notwendige Arbeit” ("socially necessary labour time", wording which has made quite a few go berserk ever since):
https://en.wikipedia.org/wiki/Socially_necessary_labour_time
https://en.wikipedia.org/wiki/Transformation_problem
you have to understand the basic mathematical concepts of:
a) combined rates, and b) intratextual systems of linear equations
Based on my teaching experience §b is easier to understand. Sorry I couldn't find an "easier" explanation on youtube of that type of SLEs than the one I used with my students preparing for the Regents:
https://ergosumus.files.wordpress.com/2018/10/sle04-en.pdf
The intratextuality of those problems matters to corpora research because different strata of "like terms" ("verbs", "adjectives", ...) are what create grammar. "Crazy me" thinks you could to some extent describe the "likeness of terms" underlying grammar! ~ I also have a guideline about combined rates which I successfully used with my students:
https://ergosumus.files.wordpress.com/2018/06/word_problems12-en00.pdf ~ What the eff do combined rates and SLEs have to do with Marx' transformation problem? ;-)
Well, notice that the -equitable aspect- used to solve combined rates problems is the time (regardless of how fast one "works" in comparison with others). There is also another type of combined rates problem: you drive to some place with a friend who doesn't care about driving fast, but you need to rest so she drives for a while ... That problem is different from two people meeting at a place, each driving "in their own cars" (at their own average speed).
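The three situations just described can be written down in a few lines. This is a minimal sketch, not taken from the linked guideline: the function names and the numbers are illustrative assumptions, and the "meeting" case assumes the two cars start at opposite ends of the same road and drive toward each other, which is only one reading of the example.

```python
from fractions import Fraction


def time_working_together(*solo_times):
    """Time to finish one job when everyone works at once.

    Time is the shared ("equitable") quantity, so the individual rates add.
    """
    total_rate = sum(1 / Fraction(t) for t in solo_times)
    return 1 / total_rate


def relay_trip_time(distance, legs):
    """Trip time when drivers take turns in one car.

    legs is a list of (speed, share_of_distance); the car is shared,
    so the times of the individual legs add.
    """
    return sum(Fraction(distance) * Fraction(share) / Fraction(speed)
               for speed, share in legs)


def meeting_time(distance, speed_a, speed_b):
    """Time until two cars meet when driving toward each other (assumed reading).

    Time is shared, so the speeds add, as in the working-together case.
    """
    return Fraction(distance) / (Fraction(speed_a) + Fraction(speed_b))


print(time_working_together(3, 6))  # 2 (hours, if alone they need 3 h and 6 h)
print(relay_trip_time(120, [(60, Fraction(1, 2)), (40, Fraction(1, 2))]))  # 5/2
print(meeting_time(120, 60, 40))    # 6/5
```

In the first and third cases the participants share the time and their rates add; in the relay case they share the car and their times add, which is why the two kinds of problem need different equations.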
Serge Heiden shared a paper about presidential debates which could also be studied Mathematically as a CR kind of problem (even if politicians, as the crowd management clowns they all are, don't have to make sense anyway), but as happens with any dialogue, there are parts of the conversation in which both the cars and the time are shared and other parts in which only (or more of) the time is. I don't know of a general Mathematical formulation of CR kinds of problems which could be used for corpora research. On my "to do" list I have writing papers studying Euclid's Elements and Plato's Dialogues in that way.
Karl Marx, as part of his „Wertgesetz der Waren” (reChristened in English as "labor theory of value"), somewhat metaphorically stated that the exchange value of a commodity is a function of "society's labour-time". He also rendered his ideas as equations (in more of a verbally descriptive, metaphorical way), but that phrase, "society's labour-time", was and still is regarded as anything from questionable to unfalsifiably wild. I don't claim to have mind reading powers, but I think that in his letter to his friend Ludwig Kugelmann, thoroughgoing Hegelian that Marx was, he clearly explained what he meant (page: 222 in file, 208 in book):
https://archive.org/download/marxengelsselectedcorrespondence/Marx%20%26%20E...
Marx To Ludwig Kugelmann In Hanover London, July 11, 1868: All that palaver about the necessity of proving the concept of value comes from complete ignorance both of the subject dealt with and of scientific method. Every child knows that a nation which ceased to work, I will not say for a year, but even for a few weeks, would perish. Every child knows, too, that the masses of products corresponding to the different needs require different and quantitatively determined masses of the total labour of society. That this necessity of the distribution of social labour in definite proportions cannot possibly be done away with by a particular form of social production but can only change the mode of its appearance, is self-evident. No natural laws can be done away with. What can change in historically different circumstances is only the form in which these laws assert themselves. And the form in which this proportional distribution of labour asserts itself, in a state of society where the interconnection of social labour is manifested in the private exchange of the individual products of labour, is precisely the exchange value of these products. ~ So, as I see it, in a Hegelian way, Marx was seeing the whole of society as a corpus (in which we all live through our own texts/narratives) talking about "socially necessary labour time" in the way that "time" becomes the equitable aspect shared when people/(-society as a whole-) work together as described by combined rates kinds of problems.
When "Alice buys some veggies from Bob, ..." she uses money as the "equitable aspect" to get Bob's veggies (in the Marxian way they are both part of a combined rates problem), and you tell me this is not magical!
v. "some transactional electronic ("air"...) excitations": I don't get
this.
you may pay with cash using coins or bills or using your debit card which at the end of the day become transactional electronic excitations on some hard drives. When you speak there is more to it than vibrations/fluctuations of air. (I am referring to the medium which Saussurean signifiers use)
vi. "your 'magic' and mine are different we are still able to
'communicate'. How on earth do such things happen?": a disclaimer: I am not using any magic in my attempts to communicate with you here. I try my best to place myself in your shoes to guesstimate the points that you are trying to get across. But many (as you can see above) didn't quite reach me.
"I try my best to place myself in your shoes" ... ;-) Ha, ha, ha! that is just a functional illusion. What do you know about "my shoes"? I work as a gardener (which I love to do) so they are dirty and smelly, ... I also love to eat garlic ... As I see things standing on "my dirty and smelly shoes and voicing it from my garlicky mouth" being honest and true to matters is good enough.
lbrtchx _______________________________________________ Corpora mailing list -- corpora@list.elra.info https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/ To unsubscribe send an email to corpora-leave@list.elra.info