Dear all
Thank you for all your feedback.
As George wrote in his reply: "Please, let us always remember that there is a *person with feelings* on the other side of a communication. We need to gently and respectfully handle cases where we have objections." I cannot agree more.
There is a certain degree of empathy that one needs to exercise in reading, writing, and in research (even in technical research, esp. if one has only been educated in one discipline). If one does not understand why other disciplines might have different assumptions and developmental histories or (perceived) narratives, it is best to check/verify that first before "attacking" others or their arguments. Interdisciplinary/transdisciplinary work is also difficult for that reason (e.g. in translating/addressing/aligning assumptions/expectations). As Jonas noted, "(And everyone on the list -- I am sending this to the whole list in case I am wrong about some things, so others can add their thoughts)." --- I agree! This practice can also lead to better transdisciplinary understanding and exchange. It can be hard to believe our research/tech space has come to this, but if you'd allow me to explain ---
First of all, I think most of you might know me from my public rebuttals for my ICLR2021 & 2022 submissions. For the latter, in which I decomposed "words" more explicitly, I had to really "fight" hard to convince the reviewers. That also has to do with the fact that the concept of "word" (and also "morphology") and the decades-long assumption and adoption of these in CL/NLP/Linguistics might have been too casual/imprecise/negligent of a choice and practice. As my work has shown, a mistake therein was / might have been made. Some students have been miseducated --- myself partly included, but since CL/NLP/Linguistics were not the only subject(s) that I have studied, it might have been easier for me to abandon these assumptions, but for many others, this may not have been the case. If we continue with these practices in the research space through conferences or research activities, such malpractice would be exacerbated.
Textual data can be *processed* without word tokenization or sentence segmentation [1]. One can process data in full --- in character/byte representations, depending on the task and computational resources (e.g. for pattern matching on strings, one would work with characters; for other tasks, with bytes). Depending on the nature of the tasks and methods, our *evaluation and interpretation* strategies may differ. Computational neural network models are statistical models and need to be evaluated and interpreted statistically --- this is the perspective of many computer scientists and statisticians and it is correct. In the tradition of CL/NLP/Linguistics (or even for many in data analytics or in digital humanities), there has been an erroneous assumption and practice that one could evaluate statistical models based on textual output only.
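To make the character/byte point concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not the method from the submissions mentioned above: the sample string and the helper name `ngram_counts` are made up for this example. It counts contiguous n-grams over the raw text, once as characters and once as UTF-8 bytes, with no word tokenization or sentence segmentation step.

```python
from collections import Counter


def ngram_counts(units, n):
    """Count contiguous n-grams over any sliceable sequence (str or bytes)."""
    return Counter(units[i:i + n] for i in range(len(units) - n + 1))


# Hypothetical sample input; spaces and the line break are kept as data.
text = "naïve café\nnaïve idea"

# Character view: operate on the string as it is.
char_bigrams = ngram_counts(text, 2)

# Byte view: operate on the UTF-8 encoding of the same string.
data = text.encode("utf-8")
byte_bigrams = ngram_counts(data, 2)

# The two views differ in length because "ï" and "é" take two bytes each.
print(len(text), len(data))
print(char_bigrams.most_common(3))
print(byte_bigrams.most_common(3))
```

The same function serves both views because `str` and `bytes` are both sliceable sequences; which view is appropriate depends on the task and resources, as noted above.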
As with areas related to "language" outside the context of computing, e.g. Linguistics (without the use of computational tools), there are certain structural assumptions (from the past decades) that need to be refined. I have been trying to advocate the broadening of one's perspectives/interpretations of "language" to ones that are without "words", "sentences", "linguistic structure(s)", "grammar", and "p-language(s)". These concepts denote nothing universal (or determinate --- not without circularity) and the amplification of these through technology/computing can lead to unethical/unhealthy consequences. I have the impression our understanding of this (whether one is a linguist, CL/NLPer, computing professional or AI-practitioner) may not be aligned.
As many disciplines/sectors are now leveraging similar/same methods, I feel that there is a responsibility to clarify this.
Last but not least, please note/notice that I have only been *responsive* to announcements with potential concerns (e.g. those scientific or ethical in nature). I did/do not proactively advertise my own work or have the intent to do so on this mailing list just for fun or to offend others.
As always, I remain open for your feedback.
Thank you for your attention.
Best regards Ada
[1] @Jonas: re "sentences": i. "sentence" is not a universal concept crosslinguistically or cross-stylistically (e.g. across genres) or across modalities (speech/signing does not occur in the form of "sentences", esp. natural speech/signing); ii. even if "sentence" were defined "x-centrically" (if definable at all), where x denotes a certain style, for example, stylistic hegemony would occur, not to mention that overfitting to any one style is likely to lead to bad generalizations; iii. re "I don't think conference organizers usually make hard prescriptions on what constitutes a sentence" --- that is a problem, isn't it? There is no standardization possible either. "Sentences" are also indeterminate, esp. in the context of computing. We wouldn't want to encourage "sentence"-hacking, would we? iv. in many NLP toolkits, "sentence" often refers to "line" (as delimited by linebreaks); v. for those who have worked on data collection and curation before, esp. for parallel data, content is often aligned by line (and that can already be difficult). Thanks for your content-rich comments, btw!
On Fri, Aug 25, 2023 at 12:59 PM George Giannakopoulos <ggianna@iit.demokritos.gr> wrote:
Dear All,
I would like to warmly suggest/remind the following to all of us (as a friendly suggestion, on which I will not follow up):
- One can find online good examples for the *"netiquette"* of mailing lists to reduce problems (see here https://www.snort.org/faq/what-is-the-mailing-list-etiquette, here https://en.opensuse.org/openSUSE:Mailing_list_netiquette and here https://sites.ualberta.ca/~pletendr/list-net.html for examples, which can be useful for all of us).
- Please, let us always remember that there is a *person with feelings* on the other side of a communication. We need to gently and respectfully handle cases where we have objections.
- If you feel that a conversation grows too big or is somehow problematic, address a *personal* e-mail to a main contributor *suggesting nicely an alternative* you consider more appropriate. If this fails systematically, then scale it up through a list moderator (or the list itself) politely.
- Specific *suggestions for appropriate digital spaces* that can hold e.g. long discussions may allow all such discussions to find their own nest after a given point, so that we all have a common additional resource connected to the list, for topics that do need the added interaction.
- If you feel that a topic you contribute to really ignites interesting conversation, or if you simply receive an e-mail suggesting that you move a long conversation elsewhere due to its size, *consider an alternative* (or even ask the list for one), to facilitate the use of the mailing list itself.
- Let us remember that what is *uninteresting to us may be interesting to others*.
As a final comment, before best practices come *common understanding* and *good will*. Let us primarily build on these, as we have done in this list for many years.
Having said the above, I would like to thank Ada (and all the others) for the contributions (past, current and future) and discussions that keep this list alive.
Best regards, George G.
P.S. I would also like to thank Gully for trying to keep the list humane.
On 23/8/23 00:53, Gully Burns via Corpora wrote:
Dear all,
I was shocked to see a vitriolic ad-hominem attack on a colleague posted to this mailing list. It is entirely inappropriate to post this type of diatribe against an individual even though someone might disagree with either the tone or the content of an individual's messages or arguments. The fact that other members of the community chimed in to reinforce the attack is also appalling and entirely inappropriate.
Sincerely,
Gully Burns
On Tue, Aug 22, 2023 at 1:23 PM Ada Wan via Corpora <corpora@list.elra.info> wrote:
Dear all on the Corpora-List
I understand it is possible that some of you may harbor some negative sentiments towards me and/or my recent replies on the list. That said, I would like to remind everyone on this list that many subjects such as computational [x, where x can be e.g. linguistics, biology, physics, modeling...], digital humanities, data analytics, data science, and many of their dependencies have been / are in the public domain, much of which is academic and scientific in nature. Science is in the public domain.
What we are experiencing here is sort of a computational and statistical turn in the computational sciences and studies --- anything that involves data (computational and otherwise). Previously (or even currently in many disciplines/practices), one has modeled / has been modeling many symbolic concepts and values computationally, directly inheriting these from "traditional sciences" (i.e. sciences from a time when all was done without any computational machinery), assuming that these values and the relationship between such would not only hold but also hold as the only ground truth. But as e.g. my results have shown, many of these scientific concepts, values, and relationships deserve to be re-evaluated and re-interpreted.
What I have been trying to do is to communicate this, as without any updates and/or self-correction, we could be experiencing many discrepancies in our experimental results. Good scientific practice (including good assumptions therefor) is fundamental to everyone. This includes but is not limited to having good assumptions, leveraging appropriate methods, being responsible in evaluation as well as addressing ethical concerns, e.g. in the case of my findings: a combination of false assumptions and miseducation. (Sorry to re-iterate this but it is just such an important lesson for many on this list... it may be painful for some too.)
Corpora-list might have changed more or less like how the field of CL/NLP has in the past decades. While these areas might have become more generalized and thus the audience more "diverse" in terms of background and areas of familiarity, there are certainly some on this list who are concerned about some of the "bad" science/values that could get propagated through the use of data/corpora. That is one of the reasons behind my many replies of late.
*If you should find my comments/replies an issue of concern, please let me know what specifically you disagree with. I'd be happy to modify my formulations or discuss further. If you think I have been wrong somewhere, please do let me know. I'd be happy to update.*
Thanks and best Ada
On Mon, Aug 21, 2023 at 5:39 PM Ada Wan adawan919@gmail.com wrote:
Amendment: In short, there are no symbolic concepts relevant in computing / computational processing except for those which also align with statistics. (There are various levels of assumptions/abstractions that could be relevant depending on the goals/tasks. But much of what one might have been doing in "symbolic computing" surely deserves a critical re-examination.)
On Mon, Aug 21, 2023 at 4:48 PM Ada Wan adawan919@gmail.com wrote:
Dear Ben, Rodolfo, and Toms
Please accept that there is a responsibility to science, technology, engineering, and education (or anything that we undertake).
If you could point out the specific arguments as to which of what I wrote may be problematic to you, perhaps we can have a constructive exchange. The way in which you three expressed your sentiments on this thread can be interpreted as mobbing.
Please note the intent behind my statement and give me the benefit of the doubt as to why I would have invested my time and energy to write the reply that I did to the list: "As language sciences (e.g. Linguistics) and NLP are still taught at some universities, i.e. part of publicly accessible education, there is a general responsibility that one should bear when promoting/hosting events that would be explicitly/implicitly supporting biases and/or in violation of scientific integrity." This applies to the whole area of computing, including digital humanities and the computational social sciences. *In short, there are no symbolic concepts relevant in computing / computational processing.* I am sorry if that has not been clear.
I understand that there are members in the CL/NLP community/communities who might be interested in (or used/addicted to) "word" hacking. But it is now high time to stop.
@Ben: Please note that I am not doing this "for fun". I am not trying to ridicule anyone. My remarks are not ad personam. For each of the research directions/practices that I commented on, there are opportunities for all practitioners to do a better job, to refine our analyses.
Thanks and best Ada
On Mon, Aug 21, 2023 at 9:45 AM Toms Bergmanis via Corpora <corpora@list.elra.info> wrote:
Can’t agree more.
Toms
*From:* Rodolfo Delmonte via Corpora corpora@list.elra.info
*Sent:* Monday, August 21, 2023 10:06 AM
*To:* Ben Sir benoit.siroit@gmail.com
*Cc:* corpora corpora@list.elra.info
*Subject:* [Corpora-List] Re: RANLP 2023 Call for Participation
Fully agree with you Ben.
Rodolfo
On Mon, 21 Aug 2023, 01:00 Ben Sir via Corpora corpora@list.elra.info wrote:
Hi Ada,
It's understandable that enthusiasm can sometimes lead to excessive engagement, but your disruptive posting on the mailing list has reached an intolerable level. Please keep your conversations private instead of spamming everyone and curb your enthusiasm. Your obnoxious behavior reflects poorly on you.
Thanks.
_______________________________________________
Corpora mailing list -- corpora@list.elra.info
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to corpora-leave@list.elra.info
--
*George Giannakopoulos, PhD*
*Researcher* Home page http://www.iit.demokritos.gr/~ggianna SKEL Lab - NCSR Demokritos http://www.iit.demokritos.gr and
*Scientific Officer* ahedd DIH - NCSR "Demokritos" https://ahedd.demokritos.gr and
*Co-founder, Chief Executive Officer* SciFY Not-for-Profit Company http://www.scify.org