[Corpora-List] Re: RANLP 2023 Call for Participation

27 Aug 2023

Dear all,
Thank you for all your feedback.
As George wrote in his reply, "Please, let us always remember that there
is a *person with feelings* on the other side of a communication. We need
to gently and respectfully handle cases where we have objections." I
could not agree more.
There is a certain degree of empathy that one needs to exercise in reading,
writing, and in research (even for technical research), esp. if one has only
been educated in one discipline. If one does not understand why other
disciplines might have different assumptions and developmental histories or
(perceived) narratives, it is best to check/verify that first before
"attacking" others or their arguments. Interdisciplinary/Transdisciplinary
work is also difficult for that reason (e.g. in
translating/addressing/aligning assumptions/expectations).
As Jonas noted, "(And everyone on the list -- I am sending this to the
whole list in case I am wrong about some things, so others can add their
thoughts)." --- I agree! This practice can also lead to better
transdisciplinary understanding and exchange.
It can also be hard to believe our research/tech space has come to this,
but if you'd allow me to explain ---
First of all, I think most of you might know me from my public rebuttals
for my ICLR2021 & 2022 submissions. For the latter, in which I decomposed
"words" more explicitly, I had to really "fight" hard to convince the
reviewers. That also has to do with the fact that the concept of "word"
(and also "morphology") and the decades-long assumption and adoption of
these in CL/NLP/Linguistics might have been too casual/imprecise/negligent
of a choice and practice. As my work has shown, a mistake therein was /
might have been made. Some students have been miseducated --- myself partly
included, but since CL/NLP/Linguistics were not the only subject(s) that I
have studied, it might have been easier for me to abandon these
assumptions, but for many others, this may not have been the case. If we
continue with these practices in the research space through conferences or
research activities, such malpractice would be exacerbated.
Textual data can be *processed* without word tokenization or sentence
segmentation [1]. One can process data in full --- in character/byte
representations (depending on the task and computational resources, e.g.
for pattern matching for strings, one would work with characters, for other
tasks, bytes). Depending on the nature of the tasks and methods, our
*evaluation and interpretation* strategies may differ. Computational neural network
models are statistical models and need to be evaluated and interpreted
statistically --- this is the perspective of many computer scientists and
statisticians and it is correct. In the tradition of CL/NLP/Linguistics (or
even many in data analytics or in digital humanities), there has been an
erroneous assumption and practice that one could evaluate statistical
models based on textual output only.
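
To illustrate the point about working with character/byte representations, here is a minimal Python sketch. The sample string and the variable names are made up for illustration and are not taken from any of the work referred to in this thread. It only shows that a text can be handled in full as a sequence of characters or as a sequence of UTF-8 bytes, with no word tokenization or sentence segmentation step.

```python
# Hypothetical sample text (an assumption for illustration); any string would do.
text = "Zürich ist schön.\nNo segmentation is applied here."

# Character-level view: every Unicode code point, including spaces and line breaks.
chars = list(text)

# Byte-level view: the UTF-8 encoding as a sequence of integers in the range 0..255.
byte_ids = list(text.encode("utf-8"))

# Character-level pattern matching operates on the raw string directly.
position = text.find("segmentation")

# The byte sequence is lossless: decoding it restores the original text.
restored = bytes(byte_ids).decode("utf-8")

print(len(chars), len(byte_ids), position, restored == text)
```

The two views differ in length whenever the text contains characters outside ASCII, which is one reason the choice between characters and bytes depends on the task.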
As with areas related to "language" outside the context of computing, e.g.
Linguistics (without the use of computational tools), there are certain
structural assumptions (from the past decades) that need to be refined. I
have been trying to advocate the broadening of one's
perspectives/interpretations of "language" to ones that are without
"words", "sentences", "linguistic structure(s)", "grammar", and
"p-language(s)". These concepts denote nothing universal (or determinate
--- not without circularity) and the amplification of these through
technology/computing can lead to unethical/unhealthy consequences. I have
the impression that our understanding of this (whether one is a linguist,
CL/NLPer, computing professional or AI practitioner) may not be aligned.
As many disciplines/sectors are now leveraging similar/same methods, I feel
that there is a responsibility to clarify this.
Last but not least, please note/notice that I have only been *responsive*
to announcements with potential concerns (e.g. those scientific or ethical
in nature). I did/do not proactively advertise my own work or have the
intent to do so on this mailing list just for fun or to offend others.
As always, I remain open for your feedback.
Thank you for your attention.
Best regards
Ada
[1] @Jonas:
re "sentences":
i. "sentence" is not a universal concept crosslinguistically or
cross-stylistically (e.g. across genres) or across modalities
(speech/signing does not occur in form of "sentences", esp. natural
speech/signing);
ii. even if "sentence" were defined "x-centrically" (if definable at all),
where x denotes a certain style, for example; stylistic hegemony would
occur, not to mention that overfitting to any one style is likely to lead
to bad generalizations;
iii. re "I don't think conference organizers usually make hard
prescriptions on what constitutes a sentence" --- that is a problem, isn't
it? There is no standardization possible either. "Sentences" are also
indeterminate, esp. in the context of computing. We wouldn't want to
encourage "sentence"-hacking, would we?
iv. in many NLP toolkits, "sentence" often refers to "line" (as delimited
by linebreaks);
v. for those who have worked on data collection and curation before, esp.
for parallel data, content is often aligned by line (and that can already
be difficult).
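
Points iv and v can be sketched in a few lines of Python. The two sample texts and all names below are hypothetical. The sketch treats the line, as delimited by linebreaks, as the unit and pairs up parallel content by line number, without deciding what a "sentence" is.

```python
# Hypothetical parallel data (an assumption for illustration): the lines of the
# two texts are meant to correspond one to one.
source_text = "Guten Morgen.\nWie geht es dir? Gut, danke.\n"
target_text = "Good morning.\nHow are you? Fine, thanks.\n"

# The unit here is the line, as delimited by linebreaks.
source_lines = source_text.splitlines()
target_lines = target_text.splitlines()

# Alignment by line only makes sense if both sides have the same number of lines.
if len(source_lines) != len(target_lines):
    raise ValueError("line counts differ; the texts cannot be aligned by line")

# Pair up the lines by position.
aligned = list(zip(source_lines, target_lines))

for src, tgt in aligned:
    print(f"{src!r} <-> {tgt!r}")
```

Note that the second line on each side would count as two "sentences" under a punctuation-based definition, yet it is a single unit for alignment purposes.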
Thanks for your content-rich comments, btw!
On Fri, Aug 25, 2023 at 12:59 PM George Giannakopoulos <
ggianna@iit.demokritos.gr> wrote:
...
Dear All,
I would like to warmly suggest/remind the following to all of us (as a
friendly suggestion, on which I will not follow up):

- One can find good examples online of the *"netiquette"* of mailing
  lists to reduce problems (see
  https://www.snort.org/faq/what-is-the-mailing-list-etiquette,
  https://en.opensuse.org/openSUSE:Mailing_list_netiquette and
  https://sites.ualberta.ca/~pletendr/list-net.html for examples
  that can be useful for all of us).

- Please, let us always remember that there is a *person with feelings*
  on the other side of a communication. We need to gently and
  respectfully handle cases where we have objections.

- If you feel that a conversation grows too big or is somehow
  problematic, address a *personal e-mail to a main contributor, nicely
  suggesting an alternative* you consider more appropriate. If
  this fails systematically, then scale it up through a list moderator (or
  the list itself) politely.

- Specific *suggestions for appropriate digital spaces* that can hold
  e.g. long discussions may allow all such discussions to find their own nest
  after a given point, so that we all have a common additional resource
  connected to the list, for topics that do need the added interaction.

- If you feel that a topic you contribute to really ignites
  interesting conversation, or if you simply receive an e-mail suggesting that
  you move a long conversation elsewhere due to its size, *consider an
  alternative* (or even ask the list for one), to facilitate the use of
  the mailing list itself.

- Let us remember that what is *uninteresting to us may be interesting
  to others*.
As a final comment, before best practices comes *common understanding*
and *good will*. Let us primarily build on these, as we have done in this
list for many years.
Having said the above, I would like to thank Ada (and all the others) for
the contributions (past, current and future) and discussions that keep this
list alive.
Best regards,
George G.
P.S. I would also like to thank Gully for trying to keep the list humane.
On 23/8/23 00:53, Gully Burns via Corpora wrote:
Dear all,
I was shocked to see a vitriolic ad-hominem attack on a colleague posted
to this mailing list. It is entirely inappropriate to post this type of
diatribe against an individual even though someone might disagree with
either the tone or the content of an individual's messages or arguments.
The fact that other members of the community chimed in to reinforce the
attack is also appalling and entirely inappropriate.
Sincerely,
Gully Burns
On Tue, Aug 22, 2023 at 1:23 PM Ada Wan via Corpora <
corpora@list.elra.info> wrote:
...
Dear all on the Corpora-List
I understand it is possible that some of you may harbor some negative
sentiments towards me and/or my recent replies on the list.
That said, I would like to remind everyone on this list
that many subjects such as computational [x,
where x can be e.g. linguistics, biology, physics, modeling...], digital
humanities, data analytics, data science, and many of their dependencies
have been / are in the public domain, and much of this is academic and
scientific in nature. Science is in the public domain.
What we are experiencing here is sort of a computational and statistical
turn in the computational sciences and studies --- anything that involves
data (computational and otherwise). Previously (or even currently in many
disciplines/practices), one has modeled / has been modeling many symbolic
concepts and values computationally, directly inheriting these from
"traditional sciences" (i.e. sciences from a time when all was done without
any computational machinery), assuming that these values and the
relationship between such would not only hold but also hold as the only
ground truth. But as e.g. my results have shown, many of these scientific
concepts, values, and relationships deserve to be re-evaluated and
re-interpreted.
What I have been trying to do is to communicate this, as without any
updates and/or self-correction, we could be experiencing many discrepancies
in our experimental results. Good scientific practice (including good
assumptions therefor) is fundamental to everyone. This includes but is not
limited to having good assumptions, leveraging appropriate methods, being
responsible in evaluation as well as addressing ethical concerns, e.g. in
the case of my findings: a combination of false assumptions and
miseducation. (Sorry to re-iterate this but it is just such an important
lesson for many on this list... it may be painful for some too.)
Corpora-list might have changed more or less like how the field of CL/NLP
has in the past decades. While these areas might have become more
generalized and thus the audience more "diverse" in terms of background and
areas of familiarity, there are certainly some on this list who are
concerned about some of the "bad" science/values that could get propagated
through the use of data/corpora. That is one of the reasons behind my many
replies of late.
*If you find my comments/replies a cause for concern, please let
me know what specifically you disagree with. I'd be happy to modify my
formulations or discuss further. If you think I have been wrong somewhere,
please do let me know. I'd be happy to update.*
Thanks and best
Ada
On Mon, Aug 21, 2023 at 5:39 PM Ada Wan adawan919@gmail.com wrote:
...
Amendment:
In short, there are no symbolic concepts relevant in computing /
computational processing except for those which also align with statistics.
(There are various levels of assumptions/abstractions that could be
relevant depending on the goals/tasks. But much of what one might have been
doing in "symbolic computing" surely deserves a critical re-examination.)
On Mon, Aug 21, 2023 at 4:48 PM Ada Wan adawan919@gmail.com wrote:
...
Dear Ben, Rodolfo, and Toms
Please accept that there is a responsibility to science, technology,
engineering, and education (or anything that we undertake).
If you could point out which specific arguments in what I
wrote are problematic to you, perhaps we can have a constructive
exchange. The way in which you three expressed your sentiments on this
thread can be interpreted as mobbing.
Please note the intent behind my statement and give me the benefit of the
doubt as to why I would have invested my time and energy to write the reply
that I did to the list:
"As language sciences (e.g. Linguistics) and NLP are still taught at
some universities, i.e. part of publicly accessible education, there is a
general responsibility that one should bear when promoting/hosting events
that would be explicitly/implicitly supporting biases and/or in violation
of scientific integrity."
This applies to the whole area of computing, including digital
humanities and the computational social sciences. *In short, there are
no symbolic concepts relevant in computing / computational processing.*
I am sorry if that has not been clear.
I understand that there are members in the CL/NLP community/communities
who might be interested in (or used/addicted to) "word" hacking. But it is
now high time to stop.
@Ben: Please note that I am not doing this "for fun". I am not trying
to ridicule anyone. My remarks are not ad personam. For each of the
research directions/practices that I commented on, there are opportunities
for all practitioners to do a better job, to refine our analyses.
Thanks and best
Ada
On Mon, Aug 21, 2023 at 9:45 AM Toms Bergmanis via Corpora <
corpora@list.elra.info> wrote:
...
Can’t agree more.
Toms
*From:* Rodolfo Delmonte via Corpora corpora@list.elra.info
*Sent:* Monday, August 21, 2023 10:06 AM
*To:* Ben Sir benoit.siroit@gmail.com
*Cc:* corpora corpora@list.elra.info
*Subject:* [Corpora-List] Re: RANLP 2023 Call for Participation
Fully agree with you Ben.
Rodolfo
On Mon, 21 Aug 2023, 01:00 Ben Sir via Corpora corpora@list.elra.info
wrote:
Hi Ada,
It's understandable that enthusiasm can sometimes lead to excessive
engagement, but your disruptive posting on the mailing list has reached an
intolerable level. Please keep your conversations private instead of
spamming everyone and curb your enthusiasm. Your obnoxious behavior
reflects poorly on you.
Thanks.
_______________________________________________
Corpora mailing list -- corpora@list.elra.info
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to corpora-leave@list.elra.info
--
*George Giannakopoulos, PhD*
*Researcher*
Home page http://www.iit.demokritos.gr/~ggianna
SKEL Lab - NCSR Demokritos http://www.iit.demokritos.gr
and
*Scientific Officer*
ahedd DIH - NCSR "Demokritos" https://ahedd.demokritos.gr
and
*Co-founder, Chief Executive Officer*
SciFY Not-for-Profit Company http://www.scify.org
