I second Edyta's points too.
I have been on this list since 2015, and since then the mailing list's standout feature has been its capacity to circulate calls for papers and job opportunities. While occasional "discussions" have also been a breath of fresh air, the current discourse doesn't quite align with this sentiment.
It would be more beneficial if the list could enhance its utility by keeping intense discussions private rather than disseminating them widely.
Thanks.
Best regards,
Sina Ahmadi
Postdoctoral Researcher & Adjunct Lecturer
George Mason University
http://sinaahmadi.github.io/
**On the job market! I'm seeking out new opportunities to collaborate and innovate as a researcher and lecturer (in Europe).**
________________________________
From: Daniela Cesiri via Corpora corpora@list.elra.info
Sent: Wednesday, 30 August 2023 11:32
To: Edyta Jurkiewicz-Rohrbacher edytaj@gmail.com
Cc: corpora@list.elra.info
Subject: [Corpora-List] Re: RANLP 2023 Call for Participation
Dear All,
I agree with Edyta's polite remarks.
I find the discussions below purely informative posts quite confusing, and I am "losing track" of the original posts to the point that I fear I might miss calls that could be relevant for my work, or miss discussions that are worth joining. Before Edyta's remarks I was even considering leaving the list because of the current situation on it.
So, I join Edyta's kind request to keep discussions as separate threads and leave calls for papers/abstracts or job calls as purely informative posts. Perhaps opening a new, separate discussion thread might be an alternative option that would allow us to filter the different kinds of communications we receive from the list.
Best wishes to everyone, Daniela Cesiri
On Wed, 30 Aug 2023, 17:15, Edyta Jurkiewicz-Rohrbacher via Corpora <corpora@list.elra.info> wrote:
Dear Ada, dear all,
I'm a bit concerned about what has been going on with the list recently. The list, as far as I understand, serves several purposes. One of them is purely informative, where one informs the community about potentially interesting jobs, conferences etc. If I open a reply to a job advertisement, I expect it to be a question useful to potential applicants or a correction about, for example, deadlines.
Another thing is to ask questions or start discussions on various topics, either theoretical or purely practical. There I would expect people to share their experience and opinions.
What I do not find OK is giving feedback on purely informational posts in the way Ada does. In my opinion, discussions of whether words or sentences are up-to-date concepts in any (general) linguistic or computational linguistic framework should be held in separate threads. (Note also that the problem of text segmentation has been a topic for a long time already.) Summing up, I wouldn't mind if Ada's comments were presented privately to the authors of the posts, or discussed in separate list mails. Otherwise, we are facing chaos here.
In short, I would be more than happy to participate if discussions about the relation between linguistics and NLP took place, but not mixed with advertisements.
I hope I did not offend anybody with this message. Best, Edyta Jurkiewicz-Rohrbacher
On Wed, 30 Aug 2023 at 16:35, Gilles Sérasset via Corpora <corpora@list.elra.info> wrote:
Dear Ada, dear all,
I am not a linguist but a computational scientist who is quite used to talking with (and trying to understand) linguists. I must say that I usually read your mails as thoroughly as my schedule and patience allow me to, but, to be honest, I also have a rather negative feeling when reading your “discourse”.
In this discourse, I see facts + interpretation + rhetoric.
[Here I take the risk of caricaturing for the sake of brevity. I hope you will understand that I have neither the time nor the intention to go deeply into all the intricacies of your different claims, as I am more a witness than an actor in this scientific dispute.]
My understanding of your facts: Neural models do not use the concept of word in any of their tasks, but achieve very interesting results in their modelling of the language.
My understanding of your interpretation: this is the proof that there is no such thing as a word.
My understanding of your rhetoric: linguists are still using “words”, so they are wrong or dishonest or miseducated or dumb; we should wipe out entirely any occurrence of this concept and start over with another modelling of the language.
Please understand that I am just presenting the way I interpret your different messages. And even if I am wrong here, this interpretation is to be taken into account, as we are all people with feelings. This feeling is a fact, even if I do not particularly feel targeted by your different criticisms. I hope this will help you ponder the terms you use in your next messages.
This being said, I was not particularly surprised to see some “passionate” replies to your different messages. And I agree with everyone here that we should not give in to such passion and use ad hominem attacks on a mailing list, AND you should also understand that most of your rhetoric does contain such passion and attacks.
Concerning the facts:
You are right: neural models do not use any notion of word (or word morphology) as it is usually thought of in linguistics. A model usually first decides the granularity at which it will aggregate its input (a sequence of characters) into tokens, to each of which it attaches an “interpretation” (modelled as a multi-dimensional vector).
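As a rough illustration of the aggregation step described above, here is a minimal Python sketch. It is only a toy: the vocabulary `TOY_VOCAB` is invented for this example, and the greedy longest-match rule is a simple stand-in for the data-driven segmentation schemes that real models learn. It shows that the resulting tokens need not coincide with words, and that each token is simply an index into a table (in a real model, a row of vectors).

```python
def greedy_tokenize(text, vocab):
    """Split text left to right into the longest pieces found in vocab.

    Characters that no vocabulary entry covers become single-character tokens.
    """
    tokens = []
    max_len = max(len(piece) for piece in vocab)
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical vocabulary, chosen by hand for this example only.
TOY_VOCAB = {"the", " un", "break", "able", " word", "s"}

text = "the unbreakable words"
tokens = greedy_tokenize(text, TOY_VOCAB)

# Each token is mapped to an integer id; a model would use this id
# to look up a multi-dimensional vector.
vocab_index = {piece: idx for idx, piece in enumerate(sorted(TOY_VOCAB))}
token_ids = [vocab_index[tok] for tok in tokens]

print(tokens)
print(token_ids)
```

Note that the token boundaries cut through "unbreakable" and "words", and that one token even starts with the space that precedes the word.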
Concerning the interpretation:
You want to wipe out the notion of word based on such a fact. I would agree somehow if we were dealing with a universal modelling of language, but this is not the case. Humans model language in a certain way and neural models in another way (even if neural networks are claimed to be inspired by the biological neurones in our brains). The fact that a concept does not exist in one model does not entail that it does not exist in another model.
Also, you make the very same mistake in the way you look at the facts, i.e. there is no such thing as a character…, which means that the input of an NN is already flawed by the bias with which we look at language. Indeed, characters are a very recent invention that builds on different concerns:
- the usual graphical elements that are traditionally used in writing and that have been interpreted as atomic,
- their interpretation by the encoding authorities (see the differences and debates about code points vs characters),
- arbitrary decisions (e.g. why model A and a as 2 different characters?)
Moreover, corpora are usually badly encoded, with one character used for another (a quote instead of an apostrophe, a non-breaking space instead of a regular space, …), and this only concerns languages with a writing system or transcription, i.e. not the majority of them.
The conclusion is that even neural networks carry an artificial bias in the way they model language, which means that the conclusions we draw from them are as flawed as the ones we draw from the classical way linguists look at languages.
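The encoding issues mentioned above can be made concrete with a minimal Python sketch using only the standard `unicodedata` module. The example strings are made up for illustration; the point is merely that what looks like "one character" on screen may be one or several code points, and that look-alike characters compare as different.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) for every code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


# The same visible "character" as one code point or as two.
composed = "\u00e9"      # é as a single code point
decomposed = "e\u0301"   # e followed by a combining acute accent
print(describe(composed))
print(describe(decomposed))
print(composed == decomposed)                                 # differ as raw strings
print(unicodedata.normalize("NFC", decomposed) == composed)   # equal after normalisation

# Quote vs apostrophe: two code points that are often used interchangeably.
typewriter = "l'arbre"
typographic = "l\u2019arbre"
print(describe("'"), describe("\u2019"))
print(typewriter == typographic)

# Regular space vs non-breaking space: splitting on " " misses the latter.
regular = "10 km"
no_break = "10\u00a0km"
print(regular.split(" "), no_break.split(" "))

# Upper and lower case are modelled as two distinct code points.
print(describe("A"), describe("a"))
```

Whether such variants should be normalised before training or counting is itself a modelling decision, which is the kind of bias discussed above.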
Most serious linguists never defined “words” lightly, and most of them know that this concept is an “approximation” of something that is very difficult to apprehend and that seems to be more grounded in linguistics based on human introspection than in linguistics based on corpora. It somehow represents the way our human brain aggregates the atoms of the language (characters/phonemes) into something with which we associate an interpretation. In this sense, it is somehow the “tokens” of our biological neural network (and certainly far more).
As producing an utterance is not a bijection between whatever we have in our head and the sequential signal we use to communicate, I agree with you that “words” are certainly not present in a corpus (but I do think that our inner “tokens” may somehow be observed there).
Concerning the rhetoric:
I do not think any linguist or computational linguist is naive enough to think that any of the models we deal with is a “truth”, and I doubt any of them is miseducated enough to think that “words” are clearly defined and undoubtedly present in corpora. I do think, though, that they are usually right to observe occurrences (or hints) of non-atomic constructs that we associate with some interpretation. I also think that this way of looking at a corpus has some advantages that are not really present in NNs (for instance, it can reveal regularities that help humans produce new utterances without being shown a large number of examples).
I also think that even if you were totally right in your facts and interpretations, asking for a denial of current/past ways of looking at texts would be a mistake. Even in physics, since the general theory of relativity, we know that classical mechanics is wrong; however, it is still in use, and this is not a problem as long as everybody knows under which hypotheses it is a good enough approximation and under which hypotheses it no longer works.
I know this message will certainly not make you think differently, but if it allows you to communicate differently with people who still use the terms “words” or “sentences” as a simple shortcut to position their work within a shared/common understanding of the state of the art, in contexts where there is no room for better explanation (e.g. in summaries of their keynote speeches), then I will have achieved something.
Hoping this scientific debate will continue in a calmer manner,
Regards,
Gilles Sérasset,
Corpora mailing list -- corpora@list.elra.info https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/ To unsubscribe send an email to corpora-leave@list.elra.info
For me personally, calls for papers and job opportunities are uninteresting aspects of the list. I filter out messages containing phrases like "deadline extension" and wade through the residue in order to be informed by the occasional presentation of linguistic insights (preferably expressed in civilised and reasonably concise terms), or by news of linguistic software or data. Contributions which do not bear on career enhancement are scarce enough as it is, and without them, the list loses its remaining relevance for me. Ciarán Ó Duibhín.
[Please ignore it if you are not interested.]
I have thought several times before writing this email and I apologise if someone finds it irrelevant and irritating. I just want to express *my personal opinion*, nothing more than that. I am not making any claims about anything. I have decided to write this mail as I have been on this list in some way since 2007 and have participated in some of the rare discussions that have taken place here.
On Ada's comments:
- I have been trying to understand what her findings are and what implications they have. So far, I have not been able to understand how these findings can prove the non-existence of words or sentences or (p)-language. As far as I know, you can't prove a negative, particularly by empirical research. In any case, modern physics tells us that almost everything is just metaphysics and at the bottom there are only fields, nothing else. Even so, as someone said, Newtonian physics is still extremely relevant for most of our daily purposes, and most of the time it is totally unnecessary to think in terms of relativity or quantum mechanics, even in scientific and engineering work.
- Also, as someone pointed out, even characters are not clearly defined. I would go further and say that a byte is also an arbitrary unit, having to do with the way our computers work and the history of their development. Perhaps then we should only talk of bits, because they are the only real units? Ada is --- in my opinion (which may be wrong) --- looking at the issues from a purely, I could say purist, information-theoretic point of view, where information is taken in the Shannon sense and is emphatically defined as having no meaning. Well, in that sense, life is just a random increase in entropy, nothing else, but that is neither here nor there for practical purposes. There may be some philosophical or spiritual relevance to this, but in actual life it is almost always a non sequitur. (Please note again that this is simply my personal opinion).
- What her claims --- as contrasted with her findings --- basically mean is that she is denying the existence of metaphysics. In my opinion, humans live and breathe metaphysics and they live and die with it. It doesn't really matter if it "doesn't exist", whatever that can mean. We might as well deny the existence of species or of colours or of [__FILL_IN_THE_BLANKS__].
- About words, as pointed out in this thread, almost every linguist and even most of the experienced NLP practitioners know about the problems with the concept of 'word'. The same with the concept of 'sentence'. As for the concept of 'language', every book on sociolinguistics explains why the concept of (p)-language is unscientific. That is why the term 'variety' is preferred in sociolinguistics. I have been teaching this fact to students *in CL/NLP courses* for years without fail.
- Finally, in case she really is right (although so far I don't think so), well, then there is a need to be patient with the world. As I am sure she knows, paradigm shifts even in the world of science and technology take time to happen.
- I don't have anything more to say about this matter. I will try not to send any more mails on this thread.
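On the remark above that information in the Shannon sense is defined without reference to meaning, a minimal Python sketch may help. It assumes the simplest possible setting, a character-level (unigram) entropy estimated from one short string; the example sentence is made up. Because the measure depends only on symbol frequencies, a meaningful sentence and the same characters sorted into gibberish receive exactly the same value.

```python
import math
from collections import Counter


def char_entropy(text):
    """Shannon entropy, in bits per character, of the character frequencies in text."""
    counts = Counter(text)
    total = len(text)
    # Sum in a fixed (sorted) order so that texts with the same character
    # counts give bit-for-bit identical results.
    return -sum((n / total) * math.log2(n / total) for n in sorted(counts.values()))


sentence = "words are not in the corpus"
scrambled = "".join(sorted(sentence))  # same characters, no meaning left

print(char_entropy(sentence))
print(char_entropy(scrambled))
print(char_entropy(sentence) == char_entropy(scrambled))
```

Richer models (conditioning on context, for instance) would give different numbers, but they would still be computed from symbol statistics alone.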
About matters related to the use of this list:
- My opinion of this matter may be biased due to the fact that I use a different email id for most mailing lists, including this one, so perhaps I have much less reason to be irritated with unnecessary emails.
- Having said that, I find this list to be lifeless or inert during the very long stretches since 2007 when there is no discussion or argument going on. Most of the announcements I get on this email id are irrelevant for me, but I can simply ignore them as I use a different email id here. Still, sometimes the discussions can become stressful in some sense.
- The discussions on this list are --- for me --- mostly interesting breaks from the commodified world of science and technology, as of everything else now.
- I have never understood why people get irritated in today's world by a few emails, few and far between, which may be irrelevant for them. Almost everyone is on one or --- usually --- more social media, where there is a deluge of such messages and posts and whatnot. BTW, I have never been on any social media, except having a personal blog for some years. This is one of the rare forums where I have participated. And again, BTW, I do regret some of the rash mails I sent on this and other lists, mostly when I was doing my PhD.
- I personally think that people, including Ada, can be more tolerant.
- As for *mails should not be advertisement*, I am puzzled by this. Seriously? In today's commodified science and technology, where you *have to advertise* as part of your work. You are supposed to advertise, even offensively. I think everyone will understand what I mean. If researchers, on a research forum, do not talk about their own research, what do you expect them to talk about? I think "advertising" one's research work here is more democratic than advertising in other formal or official forums. Researchers --- at least most of them --- don't earn anything from their research, apart from their salary. They publish papers, from which other people make a lot of money, but they don't get anything. They, in many places, don't even get to read a lot of the research papers by others for free, which they simply *need to read* for their same work!
- And then there is social media. People like me, who are not on any social media, have no other forum to express their opinion. Why can't we simply ignore some *thread* if we don't like the discussion going on there, just as we ignore most of the announcement emails?
To sum up, I think we can all be more tolerant, perhaps including myself.
[Please ignore if not interested]
Hi Anil
Thanks for your comments.
Just one, perhaps most important, clarification for now: I was/am not denying the existence of metaphysics. If anything, I sometimes think of my work as an instance of "computational phenomenology".
Thanks and best Ada
On Thu, Aug 31, 2023 at 1:00 PM Anil Singh via Corpora < corpora@list.elra.info> wrote:
[Please ignore it if you are not interested.]
I agree. And this intense discussion is not new in 2023: we had it some years ago, and I found it something of a dialogue of the deaf then. Time to move that particular discussion permanently elsewhere, please!
Thanks in advance
Mike
On 30/08/2023 17:14, Sina Ahmadi via Corpora wrote:
I second Edyta's points too.
I have been on this list since 2015, and since then the mailing list's standout feature has been its capacity to circulate calls for papers and job opportunities. While occasional "discussions" have also been a breath of fresh air, the current discourse doesn't quite align with this sentiment.
It would be more beneficial if the list could enhance its utility by keeping intense discussions private rather than disseminating them widely.
Thanks.