Dear RANLP organizers,
I looked through your program and have a few questions:
i. I noticed there is a parallel session on "Sentence-level Representation and Analysis". Would you please let me (or all of us on this list) know why "sentence(s)" would be relevant and necessary in computing? Does the term "sentence" refer to a line (as delimited by line breaks)? Or are you segmenting via heuristics, e.g. punctuation indicators, for each dataset? If so, would there not be concerns on grounds of fairness and diversity, as well as of robustness, sufficiency, and applicability? (A minimal sketch of such a heuristic, and of how it misfires, appears after point vi below.) [I also understand that in NLP there has been an (undue) grammarian influence, leading to the false assumption that processing/evaluation based on "sentences" is necessary, when one can instead work with a wider span of text. As neither machines nor humans need the concept of a "sentence" to produce or understand language, I wonder: wouldn't restricting analyses to the "sentence" level lead to overfitting?]
ii. Re "multilinguality": would that be a session in which work would focus on computationally relevant topics such as character encoding (see https://openreview.net/forum?id=-llS6TiOew)? Or would it further perpetuate the adverse effects of grammatical teachings and concepts?
iii. Re Isabelle's keynote: I hope her social-scientific studies are ones that leverage comprehensive data (i.e. data free of selection bias) and rigorous statistical testing. There are many ethical aspects of experimentation in the social sciences, including but not limited to hypothesis formulation, that one should be very careful about. Otherwise, a study could be interpreted as sentiment/identity manipulation (e.g. your formulation with "origin" [1]). There has been some work in the CL/NLP space that touches on identity politics in ways that may be unnecessary or inappropriate, hence my remark here.
iv. Re Ed's keynote on "neuro-symbolic approaches": I previously replied to Alexander Koller's call on this topic for his DFG project, on this mailing list on 16 July 2023, as follows: "As we know, neural models are statistical models in nature. Symbolic representations could create/reinforce unnecessary circularity. The symbolic representations could obfuscate the precision needed. The findings of Mielke et al. (2019) https://arxiv.org/abs/1906.04726 and Wan (2022) https://openreview.net/forum?id=-llS6TiOew were a painful/bitter lesson to many. I'd hate to see another generation of students being misled."
(To Ed: I suspect you are already familiar with these works, so I wonder what "symbolic approaches" refers to in your case, and whether they are applied as a post-processing (e.g. post-ML) strategy. If so, and if they are based on "grammar" etc., please be careful, as grammar is not necessary for processing. One can post-edit ML-generated text according to stylistic preferences as part of post-processing heuristics (a toy example appears after point vi below), but I have concerns about how easily such heuristics could be abused. As you may know, many CL/NLPers may already be too "hooked" on grammar and textual representations. There are many dependencies on grammar teaching, so the ethical concerns in its pedagogy also need to be considered.)
v. Re Sandra's keynote: Please see the literature mentioned in point iv above. Re "[u]sing transformers for hate speech detection tends to give good results ...": how do these results and domain effects reconcile with data statistics, even if/when one does not segment text into "words" or "sentences" (or any other categories that grammarians like and that many CL/NLPers used to be "addicted" to)? It may be a higher bar, but there is work to be done to see whether such correspondences (between language phenomena and data statistics, for example) exist, and if so, where (one hypothetical way to probe this appears after point vi below). (And if not, negative results are also results! And with any data processing/interpretation, it is information, not "meaning", that matters.)
vi. Re Efstathios' keynote: Please see notes above.
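On point i: a minimal sketch (Python, purely illustrative; the function name and the splitting rule are my own, not any RANLP system's actual pipeline) of a punctuation-based "sentence" splitter and two inputs on which such a heuristic misfires:

    # Naive punctuation-based "sentence" splitter -- the kind of heuristic
    # asked about in point i. Hypothetical and illustrative only.
    import re

    def naive_sentence_split(text):
        """Split on '.', '!' or '?' followed by whitespace."""
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    print(naive_sentence_split("Dr. Smith arrived at 5 p.m. She left early."))
    # -> ['Dr.', 'Smith arrived at 5 p.m.', 'She left early.']  (over-splits on the abbreviation)
    print(naive_sentence_split("see https://ranlp.org/ranlp2023/ for the programme details"))
    # -> the whole string comes back as one "sentence"          (nothing to split on)

Any analysis keyed to these "sentences" inherits whatever arbitrariness the heuristic introduces across datasets, which is the fairness/robustness concern raised above.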
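On the note to Ed in point iv: a toy example of the kind of surface-level stylistic post-edit I mean (the rules here are hypothetical, chosen only for illustration):

    # Toy "stylistic post-edit" applied to model output as a post-processing step.
    # Hypothetical rules, for illustration only.
    import re

    def stylistic_post_edit(text):
        """Apply purely surface-level preferences to model output."""
        text = re.sub(r"\s+", " ", text).strip()           # collapse whitespace
        text = text.replace(" ,", ",").replace(" .", ".")   # remove space before punctuation
        if text and not text.endswith((".", "!", "?")):
            text += "."                                     # enforce final punctuation
        return text

    print(stylistic_post_edit("the model produced  this ,  unevenly spaced output"))
    # -> 'the model produced this, unevenly spaced output.'

Rules of this kind operate on form only; my concern above is with how far beyond form such heuristics end up being stretched.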
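On point v: one hypothetical way to look at "data statistics without segmentation" is to compare corpora on raw character-level distributions. The toy strings and the Jensen-Shannon comparison below are stand-ins of my own, not anyone's actual experimental setup:

    # Compare two corpora on raw character-bigram statistics -- no "word" or
    # "sentence" segmentation anywhere. Toy data, for illustration only.
    from collections import Counter
    import math

    def char_bigram_dist(text):
        """Relative frequencies of character bigrams over the raw string."""
        bigrams = Counter(text[i:i + 2] for i in range(len(text) - 1))
        total = sum(bigrams.values())
        return {bg: count / total for bg, count in bigrams.items()}

    def jensen_shannon(p, q):
        """Jensen-Shannon divergence (base 2) between two frequency dicts."""
        keys = set(p) | set(q)
        m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
        def kl(a):
            return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)
        return 0.5 * kl(p) + 0.5 * kl(q)

    corpus_a = "toy stand-in for text from the training domain, left unsegmented"
    corpus_b = "toy stand-in for text from the deployment domain, also one raw stream"
    print(jensen_shannon(char_bigram_dist(corpus_a), char_bigram_dist(corpus_b)))

If detection results track (or fail to track) divergences like this across domains, that would be one place to look for the correspondences mentioned above.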
As the language sciences (e.g. linguistics) and NLP are still taught at some universities, i.e. as part of publicly accessible education, there is a general responsibility one should bear when promoting/hosting events that explicitly or implicitly support biases and/or violate scientific integrity.
Thank you for reading, and for tolerating my rant here. There has been some "bad research" (and some miseducation) in the area of CL/NLP; hence, I thought I would send a reminder (and a call for action/correction).
Thanks and best,
Ada
[1] "origin": (mother's womb? [Jest... but yes and no.]) How current is this analysis from a globalized perspective? Do people categorically use language in one way or another based on some "types" related to "origin" (whatever that refers to), or more based on contexts and/or habits (personal and/or group-based, if the latter, what "group identity" is assumed in the data and in the experiment)?
On Thu, Aug 17, 2023 at 11:17 AM amalhaddad--- via Corpora <corpora@list.elra.info> wrote:
RANLP 2023 Call for Participation
We are pleased to share the programme of the international conference ‘Recent Advances in Natural Language Processing’ (RANLP’2023). To view the programme, please click here https://ranlp.org/ranlp2023/index.php/main-conference-programme/
To register, please visit https://ranlp.org/ranlp2023/index.php/fees-registration/
We very much hope to welcome you at RANLP’2023 in Varna!