Dear Gabriella
Thanks for your clarification.
One does not necessarily have to be "pro" any approach/method, e.g. "pro-ML" or "pro-statistics". One doesn't have to be "anti" either. What is important and imperative is to keep a scientific mindset (don't just "believe"), be neutral and fair, and work conscientiously (be honest and transparent in reporting findings, and be willing to self-correct when a particular direction/method seems wrong, whether the reasons are scientific or ethical).
There are opportunities in re-evaluating much of what has been practiced in the area of language and computing (NLP, digital humanities, or in fact any applied ML area) in the past decades, including but not limited to the interaction of ML systems and data statistics. This may apply to your project/initiative as well. One needs to be especially careful with "textual computing". (But sure, annotators' perspectives can be tested/examined more explicitly. The issue is how that is tested, how the data is annotated, and what is measured and claimed. Consider also whether it is ethical to carry out the human testing and what kind of testing would be involved, i.e. the experiment operationalization.)
There is a need for more clarity in computational tasks. Please report statistics transparently and explicitly.
Thanks and best Ada
On Tue, Aug 15, 2023 at 3:11 PM Gabriella Lapesa <gabriella.lapesa@ims.uni-stuttgart.de> wrote:
Dear Ada,
Thanks a lot for giving me the opportunity to clarify these points, which are very important, and to do so on the public list!
On 15. Aug 2023, at 13:44, Ada Wan <adawan919@gmail.com> wrote:
Dear Gabriella
I have 2 concerns about your post/project:
i. I noticed your formulation here in your call "as machine learning approaches which rely on gold standards which average annotators’ perspectives are particularly unsuitable for the highly subjective phenomena tackled in CSS research (e.g., persuasion in online discussions; harmful communication online; polarization)". I find that a bit unnecessarily antagonistic towards machine learning (ML). As we know, the driver of textual processing is data statistics. Statistics (may it refer to data statistics or statistical methods) is also the science that underlies much of our computational research in the sciences, including the social sciences. Are you trying to work on smaller data problems --- nothing wrong with that, btw, but it could be clearer in the announcement if that's what you are trying to do? What are you using as gold standard(s) (may it involve ML or not)? Will you be using ML/statistical approaches/methods, if not, what methods will you be using for data science?
[Why I am replying to all on this list here:] I understand that there is sometimes an "anti-ML" and "anti-statistics" sentiment in the tradition of Computational Linguistics --- it probably started when grammar/grammarian values were found to not be portrayable/essential in language data. I just wanted to make sure that this project does not and would not steer students and practitioners into an erroneous path of thinking/practice.
I absolutely don’t have an anti-ML or anti-statistics sentiment, on the contrary! I have been using ML/statistical approaches and methods and will continue to do so. The line of research I have in mind goes in the perspectivist direction (see https://pdai.info/ for an overview), which aims at:
1. Developing better data collection/distribution strategies which give credit to annotator perspectives, and
2. Developing ML strategies that can help us make better generalizations out of these data.
So I hope this makes it clear that we are all pro-ML and pro-statistics.
ii. Perhaps I misunderstood, but how should "persuasion in online discussions; harmful communication online; polarization" be treated as "highly subjective phenomena" in the context of statistical computing? Where / in which direction are you trying to go with "high subjectivity"? And what are the ethical consequences of naming and modeling "highly subjective phenomena"?
I think that acknowledging the high subjectivity of these phenomena (and therefore of the annotations we would use to tackle them in a data-driven approach) gives full credit to the multiple perspectives involved in dealing with them. I think acknowledging this challenge is a very important step towards avoiding negative ethical consequences.
Again, thanks for pointing this out and giving me/us the occasion to think about these points!
Best Gabriella
Thanks in advance for your clarification.
Best Ada
On Tue, Aug 15, 2023 at 9:17 AM Gabriella Lapesa via Corpora <corpora@list.elra.info> wrote:
Postdoc and PhD position in NLP/CL/CSS at GESIS (Cologne)
The newly established Data Science Methods team led by Gabriella Lapesa [2,3] (Leibniz Institute for the Social Sciences GESIS, Cologne [1], Computational Social Science department [4]) has two positions available from November 2023:
- one postdoctoral researcher (100%, 4 years, with possibility of tenure)
- one doctoral researcher (75%, 4 years). The PhD project will be pursued at the Heinrich Heine University of Düsseldorf (where Gabriella Lapesa is a junior professor in Responsible Data Science and Machine Learning).
** The team **
The Data Science Methods team will contribute to building and maintaining the GESIS infrastructure for Computational Social Science (CSS) research by developing novel methods and making them available, documented, and accessible through the GESIS services. The team will focus on fostering the interaction between Natural Language Processing and Social Science by developing solutions that allow for the integration of multiple information sources (e.g., different textual sources for the same debate; socio-demographic features of speakers and audiences; integration of textual and multimodal data) and address recent challenges in NLP (modeling subjective phenomena; low-resource scenarios; identifying and mitigating bias).
The team will tackle research questions at the interface between computational argumentation and CSS, and target political communication from a very broad perspective involving different types of actors (citizens, politicians, parties) and discourse contexts (e.g., online discussions vs. newspapers). From a methodological perspective, at the core of the team's research agenda will be the “learning from disagreements” challenge, as machine learning approaches which rely on gold standards which average annotators’ perspectives are particularly unsuitable for the highly subjective phenomena tackled in CSS research (e.g., persuasion in online discussions; harmful communication online; polarization).
** How to apply **
The official job announcement with more details about the requirements/tasks and the application procedure can be found at the following links:
- Postdoctoral researcher (deadline: September 5th): https://www.hidden-professionals.de//HPv3.Jobs/Gesis//stellenangebot/33073/1
- Doctoral researcher (deadline: September 6th): https://www.hidden-professionals.de//HPv3.Jobs/Gesis//stellenangebot/33084/1
[1] https://www.gesis.org/en/home
[2] https://www.gesis.org/institut/mitarbeitendenverzeichnis/person/Gabriella.La...
[3] https://www.ims.uni-stuttgart.de/institut/team/Lapesa/
[4] https://www.gesis.org/en/institute/departments/computational-social-science