Toms
No, not my arrogance, but my expertise is outstanding. To my background: before my graduate-level theoretical linguistics curriculum (Chomskyan lineage) in the 1990s [1], I'd spent about 1-2 decades in multilingual, international environments, making keen observations and reflections on various language, cultural/social phenomena. After graduation, I traveled to all 7 continents to continue with my linguistic and philosophical observations and learning (aka fieldwork). I have studied on 3 continents and had about 5 rounds of graduate training [2]. I have learned about 10+(?) languages/varieties (EN,ZH,FR,ES,RU,DE,LA,NL,IT,JA,ASL), dabbled in a few more (Sanskrit, Ancient Greek, AR...), and did fieldwork on/with a couple more (Zapotec, various varieties of PNG, Tok Pisin...) --- I mean, I don't remember much/any of these, it's been quite a while... now that I am/was (?) [3] about to retire. In the 2000s, I revisited "Linguistics proper" from another perspective, including but not limited to what you may know as "NLP" nowadays. I did not start publishing until the 2010s, so I assume that's when one might have become familiar with my work (assuming that they have read it).
[1] doing original research, including but not limited to something similar to G2P work, and on what one could consider as writing pseudo code for computational systems (so, Computational Linguistics)
[2] So to have finally come up with the experimental results I did and to have figured out what "language complexity" (in both the context of computing and not) as well as various other DL/NN phenomena were all about, I surely think it deserves a celebration!
[3] Recent happenings seem to suggest that I should stay in the arena to keep an eye on things.
Indeed, as you might have liked to suggest, there are plenty of "NLP practitioners" out there nowadays who think they are qualified to work on "language" just because they can speak one.
But if you need to pick a battle about that with me, I'm afraid you might have picked the wrong person.
So I hope you could consider me as "not-a-noob". Being a woman in STEM/tech can be hard, but I didn't realize how little benefit of the doubt some choose to afford. I write this not only because of the tone of your inquiry, but also because it is hard not to take offense at what you actually wrote, including this: "yet that did not put you off from writing bogus papers on machine translation". Which part of my work do you regard as "bogus papers on MT"?
***
Re ""The priority of my communications here is to clarify the part on the scientific front, to make sure that if one happens to have gotten oneself involved in this space, how one can come to more clarity on the status quo, esp. given my results." Is this about your "results" in that one paper...": No, it is not only about:
i. my results in "Fairness in representation for multilingual NLP" ( https://openreview.net/forum?id=-llS6TiOew or https://drive.google.com/file/d/1eKbhdZkPJ0HgU1RsGXGFBPGameWIVdt9/view?pli=1), or
ii. those in "Representation and Bias in Multilingual NLP" ( https://openreview.net/forum?id=dKwmCtp6YI), or
iii. some of the remarks on language matters and current practices in the "language space" I've tweeted --- some of which further explain my solution to language complexity, some advance arguments from traditional academic debates, some on meta-theoretical and transdisciplinary development, some on possible future directions... etc., but also
iv. some of the dependencies or potential impact of my results, if one were to take their work with responsibility and integrity.
To better understand the *impact* of my results (as in, why they are important), it'd be helpful for one to have knowledge/experience of the language space (or the "language enterprise" as Noam sometimes refers/referred to it) --- to understand the development and the dependencies (and the lack thereof) between Linguistics, Computational Linguistics, Natural Language Processing, Computing or Computer Science and/or Computational Sciences, Statistics/Mathematics, Social Sciences, Information Theory, Philosophy... etc. [4] (both as academic disciplines and as sciences on their own).
[4] because *language*! (But if I had to opine on which areas are to be impacted most directly by my findings, as in paradigm-shifting type of impact, I'd probably say/write the first 3 or 4, i.e. Lx, CL, NLP, CS/CS-related.)
Re "For anyone wanting to continue this discussion, I strongly recommend reading Ada's work, so you have an informed opinion about what evidence she is referring to.": Thanks for your help in promoting my work. Yes, I think it'd be helpful for everyone to read my work, whether they'd like to partake in a conversation on it or not. Please feel free to send me any questions you may have --- it's been a few really intense years for me. I don't always know if/how my writing has been understood. I'd be grateful for your feedback.
Best,
Ada
On Thu, Aug 3, 2023 at 5:42 PM Toms Bergmanis toms.bergmanis@tilde.lv wrote:
Ada,
"it is not the right time right now to be "campy" about (as in, to be arguing/protesting for) "grammar", at the moment, esp. if you do not have a background in Linguistics."
Your arrogance is outstanding. I will ask again, as I have asked before - what background do you have? Last time I checked, I could not find any evidence of your background in NLP, yet that did not put you off from writing bogus papers on machine translation.
"The priority of my communications here is to clarify the part on the scientific front, to make sure that if one happens to have gotten oneself involved in this space, how one can come to more clarity on the status quo, esp. given my results."
Is this about your "results" in that one paper evaluating which data representation is better in machine translation without actually considering machine translation quality? It sounds like something that everyone should read before engaging in a debate with you.
For anyone wanting to continue this discussion, I strongly recommend reading Ada's work, so you have an informed opinion about what evidence she is referring to.
Sincerely,
Toms Bergmanis
*From:* Ada Wan via Corpora corpora@list.elra.info
*Sent:* Wednesday, August 2, 2023 7:17:42 PM
*To:* Albretch Mueller lbrtchx@gmail.com
*Cc:* corpora corpora@list.elra.info
*Subject:* [Corpora-List] Re: Any literature about tensors-based corpora NLP research with actual examples (and homework ;-)) you would suggest? ...
Re RML or any "text technologies" leveraging "grammar" (misnomer or not):
it is not the right time right now to be "campy" about (as in, to be arguing/protesting for) "grammar", at the moment, esp. if you do not have a background in Linguistics.
There has been quite some abuse/misconduct with concepts/units/assumptions such as "words", "sentences", and "grammar" in the language space (with or without computational implementation).
The priority of my communications here is to clarify the part on the scientific front, to make sure that if one happens to have gotten oneself involved in this space, how one can come to more clarity on the status quo, esp. given my results. There is a lot that needs to be re-evaluated and re-interpreted. Simply stating that something might have been useful in the past is not going to be helpful with going forward.
If one is working in technologies with language/text data (e.g., in a user-based format/framework, and not working on "grammar" as a "linguistic"/philological pursuit), it is recommended that the name(s) of such technologies get updated --- if "grammar" [1] does not have to be mentioned or be involved, don't.
[1] or, including but not limited to any of the following: "word", "sentence", "linguistic structure(s)", "meaning", "morphology", "syntax", "parsing", various terms related to parts of speech (e.g. "nouns", "verbs")....
Re "BTW, regarding that "parsing" aspect, what is the term used to describe the gradual process of "terminological inception"?":
conceptualization? Coining of terms?
According to me, "lexical priming" is different from "terminological inception".
Re "How could you clarify intersubjectivity?":
https://en.wikipedia.org/wiki/Intersubjectivity :)
Your question is far too broad, or requires an equally broad answer, which I cannot take on at the moment.
Thanks for sharing your perspectives. I must admit I have not had time to digest all of your points. But this impression recurred in me as I was reading them:
sometimes, I sense that when one claims some concepts are not universal (e.g. the ones mentioned in [1] above), others take it to mean that all concepts are categorically invalid. That is not what I intended to communicate (with all my papers, scientific work, and my comments here). It is an expert opinion/finding that I shared, after careful evaluation.
On Tue, Aug 1, 2023 at 10:26 PM Albretch Mueller via Corpora < corpora@list.elra.info> wrote:
On 7/31/23, Ada Wan adawan919@gmail.com wrote:
That having been expressed, here are a couple of points re RML that one should pay heed to:
i. to what extent and in what context is this a technology relevant?
If you were able to devise an algorithm which, taking as input only NL texts (composed of: a) a start (semantic end); b) a sequence of characters from a relatively large and representative text bank; c) an end (a semantic start)), is able to exhaustively "deduce" the grammar of such texts, in addition to being usable with any language, you would then:
1) have defined a "space"/"coordinate system" for those texts, to frame (pretty much) all possible "meaningful 'points'"/"phrases" in terms of such grammar, which would also;
2) be a 0-search structure describing the text bank/corpus (every text segment would also become a pointer to every single actualization of that very segment in all texts, no more "n-grams" necessary!), which could;
3) be used with minimal turking/supervision to:
3.1) clean up all automatic translations from youtube;
3.2) keep multilingual corpora;
3.3) use it for automatic translations (demonstrably, in an almost foolproof, perfect way, since you always have the words/phrases with their context);
3.4) "cosmic/tree reading": instead of reading books/sequences of characters, you would read that text as it relates to all other texts from the same topic;
3.5) parsing: you would keep a corpus of what you know so you won't have to reread about certain topics and aspects you already know (great Lord! how I hate reading a whole book to only find a few, at times marginal, sentences worth reading! or that "youthful" thing of thinking that they just discovered/created an idea because they are just verbalizing it or made a movie about it!) BTW, regarding that "parsing" aspect, what is the term used to describe the gradual process of "terminological inception"? I have heard the term "Adamization", but, even though that word doesn't really rub me the wrong way, I could imagine it is "too sexist" to some people. I wouldn't really mind calling it Eveization or "pussyfication" or whatever. I just don't want to use the term that the government uses: "lexical priming"; and "terminological inception" sounds too cumbersome as a verb: "terminologically incept"? doesn't sound OK in English;
3.6) of course, an easy application of that contextual parsing would be removing all that js crap and ads before they reach your awareness; ...
3.n) not last and definitely not least, I am thinking hard about how to make sure police and politicians at least have a hard time while using what I have described to "freedom love" people (I know, I know, ... "3.n" doesn't "technically" pertain to quality of implementation issues ..., but I, for one, disagree. Given the "all tangible things" (tm) panopticon in which we are all living these days, each of us in one's own "virtual prison cell", to call it somehow, we should also think about, and be openly honest about, such matters)
I am working right now on such Leibnizian "characteristica universalis" kind of thing. First cleansing approx. 1.2 million texts mostly from archive.org, *.pub and the NYS Regents exams (nysedregents.org + nysl.ptfs.com) which they have, at least partially, translated to more than 10 languages. Is that relevant enough to you? ;-) I am also being quite selfish about it because I have always dreamed of being able to "read"/mind all texts which have ever been written in the same way that teens think they have to have sex with everybody in town to make sense of things.
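To make the "0-search structure" idea in point 2) above a bit more concrete, here is a minimal toy sketch in Python. It is only an illustration under my own assumptions, not the system described above: it uses a plain sorted list of suffixes (a naive suffix array), and the names build_suffix_index and occurrences are hypothetical. It shows how any text segment can act as a pointer to every place it occurs across all texts, without fixing an n-gram length in advance.

```python
from bisect import bisect_left


def build_suffix_index(texts):
    """Return a sorted list of (suffix, doc_id, offset), one entry per character position.

    Toy version: it stores every suffix explicitly, so memory grows quadratically.
    A real corpus would need a proper suffix array or a similar structure.
    """
    entries = []
    for doc_id, text in enumerate(texts):
        for offset in range(len(text)):
            entries.append((text[offset:], doc_id, offset))
    entries.sort()
    return entries


def occurrences(index, segment):
    """Return every (doc_id, offset) at which `segment` occurs in the indexed texts."""
    # All suffixes that start with `segment` sit next to each other in sorted order,
    # so a binary search finds the first one and a short scan collects the rest.
    start = bisect_left(index, (segment,))
    hits = []
    for suffix, doc_id, offset in index[start:]:
        if not suffix.startswith(segment):
            break
        hits.append((doc_id, offset))
    return sorted(hits)


if __name__ == "__main__":
    corpus = ["the rose is a rose", "a rose"]
    index = build_suffix_index(corpus)
    print(occurrences(index, "rose"))
```

The segment can be of any length, from a single character to a whole phrase, which is the sense in which fixed-size n-grams are not needed in this sketch.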
ii. one can certainly dissect/decompose texts ...
Computing power has become insanely cheap, but it has also enabled too much "cleverhansing" out there. The Delphic phrase: "you can make sense or money" these days translates into some sort of corollary of: "using computers and then thinking about it makes you smart"; but, does it really?
It amazes me how easily you can "dissect"/"decompose texts", talk about "tensors", "vectors", ... (I am not trying to police language usage, it just amazes me); let alone all the insufferable bsing claims by the "Artificial Intelligentsia".
I would go with one character after the other and an open attempt to use the minimal amount of principles to then see what I get. IMO, when you start getting too smart about what you do, of course, you will "see" how smart you are. The poet in me likes Borges' stanzas: "... el nombre es arquetipo de la cosa, en las letras de 'rosa' está la rosa y todo el Nilo en la palabra 'Nilo'" ("its name is a thing's archetype, in the letters of 'rose' is the rose and the whole of the Nile (river) in the word 'Nile'")
II. Re ""magical" in the sense that when we go about our intersubjective
business": some intersubjectivity can be further clarified. I don't see much of your examples as being "magical".
I actually do! How could you clarify intersubjectivity? I am trying to do so (somewhat) Mathematically (to the extent you could). Could you share any papers, "prior art" on such matters?
ii. "other people may read, mind, as well ...;": so?
Which is a good thing; it is alright, fine and dandy in the hippie way, I meant.
iii. "Alice bought some veggies from Bob, ...)": this I don't understand. iv. "We see more in money ("words", ...) than just a piece of paper"
iii. and iv. overlap to some extent so I will try to explain them both quickly (which is impossible since you can write philosophies about each line, but here I go). To understand what Marx (may have) meant by „gesellschaftlich notwendige Arbeitszeit” ("socially necessary labour time", wording which has made quite a few go berserk ever since):
https://en.wikipedia.org/wiki/Socially_necessary_labour_time
https://en.wikipedia.org/wiki/Transformation_problem
you have to understand the basic mathematical concepts of:
a) combined rates, and b) intratextual systems of linear equations
Based on my teaching experience §b is easier to understand. Sorry I couldn't find an "easier" explanation on youtube of that type of SLEs than the one I used with my students preparing for the Regents:
https://ergosumus.files.wordpress.com/2018/10/sle04-en.pdf
The intratextuality of those problems matters to corpora research because different strata of "like terms" ("verbs", "adjectives", ...) are what create grammar. "Crazy me" thinks you could to some extent describe the "likeness of terms" underlying grammar! ~ I also have a guideline about combined rates which I successfully used with my students:
https://ergosumus.files.wordpress.com/2018/06/word_problems12-en00.pdf ~ What the eff do combined rates and SLEs have to do with Marx's transformation problem? ;-)
Well, notice that the -equitable aspect- used to solve combined rates problems is the time (regardless of how differently fast one "works" in comparison with others). There is also another type of combined rates problem: you drive to some place with a friend who doesn't care about driving fast, but you need to rest so she drives for a while ... that problem is different from two people meeting at a place, each driving in their own car (at their own average speed).
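For readers who have not met combined rates problems, the three situations just mentioned can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the numbers in the usage part are made up, and constant average speeds/rates are assumed. In each case time is the shared quantity: workers add their rates over the same time, drivers heading towards each other add their speeds over the same time, and drivers sharing one car add the distances of their separate time slots.

```python
from fractions import Fraction


def time_working_together(*solo_times):
    """Time to finish one job when everyone works at once.

    Each worker's rate is 1/solo_time jobs per unit of time; the rates add up
    because all workers share the same stretch of time.
    """
    combined_rate = sum(Fraction(1) / Fraction(t) for t in solo_times)
    return 1 / combined_rate


def time_until_meeting(distance, speed_a, speed_b):
    """Two drivers start at opposite ends, each in their own car, and drive
    towards each other: they share the time, so their speeds add up."""
    return Fraction(distance) / (Fraction(speed_a) + Fraction(speed_b))


def distance_in_shared_car(legs):
    """One car, drivers taking turns: `legs` is a list of (speed, hours) pairs.

    The car is shared but the time is split, so the distances add up instead.
    """
    return sum(Fraction(speed) * Fraction(hours) for speed, hours in legs)


if __name__ == "__main__":
    # Illustrative numbers only.
    print(time_working_together(3, 6))                 # solo times of 3 h and 6 h
    print(time_until_meeting(300, 60, 40))             # 300 km apart, 60 and 40 km/h
    print(distance_in_shared_car([(60, 2), (40, 1)]))  # 2 h at 60 km/h, then 1 h at 40 km/h
```

Fractions are used instead of floats so that the results stay exact, as they would in a hand-worked word problem.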
Serge Heiden shared a paper about presidential debates which could also be studied Mathematically as a CR kind of problem (even if politicians, as the crowd management clowns they all are, don't have to make sense, anyway), but as it happens with any dialogue there are parts of the conversation in which both the cars and the time are shared and other times when only (or more of) the time is. I don't know of a general Mathematical formulation of CR kinds of problems which could be used for corpora research. On my "to do" list I have writing papers studying Euclid's Elements and Plato's Dialogues in that way.
Karl Marx, as part of his „Wertgesetz der Waren” (reChristened in English as "labor theory of value"), somewhat metaphorically stated that the exchange value of a commodity is a function of "society's labour-time". He also rendered his ideas as equations (in more of a verbally descriptive, metaphorical way), but that phrase, "society's labour-time", was and still is found to be anything from questionable to unfalsifiably wild. I don't claim to have mind reading powers, but I think in his letter to his friend Ludwig Kugelmann, thoroughgoing Hegelian that Marx was, he clearly explained what he meant (page: 222 in file, 208 in book):
https://archive.org/download/marxengelsselectedcorrespondence/Marx%20%26%20E...
Marx To Ludwig Kugelmann In Hanover London, July 11, 1868: All that palaver about the necessity of proving the concept of value comes from complete ignorance both of the subject dealt with and of scientific method. Every child knows that a nation which ceased to work, I will not say for a year, but even for a few weeks, would perish. Every child knows, too, that the masses of products corresponding to the different needs require different and quantitatively determined masses of the total labour of society. That this necessity of the distribution of social labour in definite proportions cannot possibly be done away with by a particular form of social production but can only change the mode of its appearance, is self-evident. No natural laws can be done away with. What can change in historically different circumstances is only the form in which these laws assert themselves. And the form in which this proportional distribution of labour asserts itself, in a state of society where the interconnection of social labour is manifested in the private exchange of the individual products of labour, is precisely the exchange value of these products. ~ So, as I see it, in a Hegelian way, Marx was seeing the whole of society as a corpus (in which we all live through our own texts/narratives) talking about "socially necessary labour time" in the way that "time" becomes the equitable aspect shared when people/(-society as a whole-) work together as described by combined rates kinds of problems.
When "Alice buys some veggies from Bob, ..." she uses money as the "equitable aspect" to get Bob's veggies (in the Marxian way they are both part of a combined rates problem) and you tell me this is not magical!
v. "some transactional electronic ("air"...) excitations": I don't get
this.
you may pay with cash using coins or bills or using your debit card which at the end of the day become transactional electronic excitations on some hard drives. When you speak there is more to it than vibrations/fluctuations of air. (I am referring to the medium which Saussurean signifiers use)
vi. "your 'magic' and mine are different we are still able to
'communicate'. How on earth do such things happen?": a disclaimer: I am not using any magic in my attempts to communicate with you here. I try my best to place myself in your shoes to guesstimate the points that you are trying to get across. But many (as you can see above) didn't quite reach me.
"I try my best to place myself in your shoes" ... ;-) Ha, ha, ha! that is just a functional illusion. What do you know about "my shoes"? I work as a gardener (which I love to do) so they are dirty and smelly, ... I also love to eat garlic ... As I see things standing on "my dirty and smelly shoes and voicing it from my garlicky mouth" being honest and true to matters is good enough.
lbrtchx