I used "term" because it makes room for a little bit of (mental) shifting for some people... Everyone (non-specialists included) uses "w*rd". Nothing is 100% --- when it comes to "language" or abstract concepts (or everything in the empirical world?), but 99% is better than 98% or 60%. (E.g. we may have character encoding settled for 99% of known languages, vs. a very shaky never-ending story with w-segmentation, which is not settled even for one language.)
On Mon, Jun 20, 2022 at 11:18 PM Daniel HENKEL daniel.henkel@univ-paris8.fr wrote:
Just to clarify my position, I don't actually think that the En. lexeme “w*rd” is easy to define, precise or theoretically well-founded (I prefer “lexeme” here, as Ada's previous use of “term” is improper from a Wüsterian point of view, given that “w*rd” lacks distinctive traits due to its notorious ambiguity).
The situation is similar in mathematics where “number” is used to denote a variety of concepts such as natural numbers, integers, fractions, real numbers, irrational numbers, imaginary numbers … which may be inclusive or exclusive of each other. There are thus numerous contexts in which colloquial use of the w*rd “number” would be imprecise, inappropriate and might even lead to confusion. Nonetheless, I'm not aware of any mathematicians who advocate censorship of the w*rd “number”.
If “w*rd” lacks a clear definition and a clear theoretical foundation (which I actually agree with), then it can't really be used as a “term” until the concept has been given an adequate definition in relation to other terms within the relevant domain or theoretical framework.
On the other hand, though precise terminology is always preferable whenever and wherever precision is necessary, there's nothing ever to be gained scientifically through censorship (sorry to use an ungood w*rd, but, in all earnestness, when I see a spade I call it a “spade”).
DH
On 20/06/2022 22:13, Daniel HENKEL wrote:
Not to mention all these shamefully unscientific posts on Corporalist:
*12th International Global W*rdnet Conference Donostia / San Sebastian, Basque Country 23-27, 2023 Global W*rdnet Association: www.globalw*rdnet.org* *Conference website: https://hitz.eus/gwc2023*
*18th Workshop on Multiw*rd Expressions (MWE 2022) Organized and sponsored by SIGLEX, the Special Interest Group on the Lexicon of the ACL*
*The 5th Workshop on Multi-w*rd Units in Machine Translation and Translation Technology (MUMTTT 2022) Malaga, 30th September 2022*
...
Definitely time for some lexical/terminological restrictions/updates, for the sake of goodthink/processing, and science!
(actually "science" is heretical/redundant, "goodthink/processing" will do the job:
*"As we have already seen in the case of the word FREE, w*rds which had once borne a heretical meaning were sometimes retained for the sake of convenience, but only with the undesirable meanings purged out of them. Countless other w*rds such as HONOUR, JUSTICE, MORALITY, INTERNATIONALISM, DEMOCRACY, SCIENCE, and RELIGION had simply ceased to exist."*)
DH
On 20/06/2022 21:47, Daniel HENKEL wrote:
Looks as if Linguistlist is in need of some scientific enlightenment as well :
http://linguistlist.org/issues/33/33-2063.html
*In the new, thoroughly revised second edition of W*rds of Wonder: Endangered Languages and What They Tell Us, Second Edition (formerly called Dying W*rds: Endangered Languages and What They Have to Tell Us), renowned scholar Nicholas Evans delivers an accessible and incisive text covering the impact of mass language endangerment. The distinguished author explores issues surrounding the preservation of indigenous languages, ...*
(ungood w*rds unw*rded to protect the faint of mind against ungood thinking/processing).
Best,
DH
On 20/06/2022 20:27, Ada Wan wrote:
(I just expounded on a point as a twitter reply today re the granularity of one's thinking/processing. Pls feel free to read that also.)
One can think of it in a less binary manner --- not "good" vs "bad", not "words" then "sentences", but to think of an utterance/sequence with all the finer connections in between... That is the beauty of language --- from a "philological" point of view.
I am not sure, though, if you were speaking from a scientific perspective, because I have a paper to back my argument in that regard.
On Mon, Jun 20, 2022 at 6:06 PM Sylvain Kahane sylvain@kahane.fr wrote:
“We’re destroying words–scores of them, hundreds of them, every day. We’re cutting the language down to the bone.” […]
“It’s a beautiful thing, the destruction of words. Of course the great advantage is in the verbs and adjectives, but there are hundreds of nouns that can be got rid of as well. It isn’t only the synonyms; there are also the antonyms. After all, what justification is there for a word which is simply the opposite of some other words? A word contains its opposite in itself. Take ‘good,’ for instance. If you have a word like ‘good,’ what need is there for a word like ‘bad’? ‘Ungood’ will do just as well–better, because it’s an exact opposite, which the other is not. Or again, if you want a stronger version of ‘good,’ what sense is there in having a whole string of vague useless words like ‘excellent’ and ‘splendid’ and all the rest of them? ‘Plusgood’ covers the meaning, or ‘doubleplusgood’ if you want something stronger still. Of course we use those forms already, but in the final version of Newspeak there’ll be nothing else. In the end the whole notion of goodness and badness will be covered by only six words–in reality, only one word. Don’t you see the beauty of that, Ada?…”
George Orwell, 1984
On 20 June 2022 at 17:33, Ada Wan adawan919@gmail.com wrote:
Hi Christopher,
It is in the best interest of the community to discontinue the usage of "word". The term is not only very shaky in its foundation (if any), but it can also effect disparity in performance in computational processing and robustness when human evaluation is involved.
Although the term has been casually adopted by many in the past, like many un-PC terms that may have an inappropriate undertone, it needs to be discouraged and abandoned.
Last but not least, I noticed that you are located in Canada. In the event that you were to work with any indigenous communities, one MUST be advised to be careful with the usage of such a term --- you could be imposing your own (EN- / FR- / dominant language-centric) view onto another individual/community. There is an element of cultural and linguistic hegemony in the usage of such a term (including but not limited to making applications with it).
Please also consult recent work in this area:
https://openreview.net/forum?id=-llS6TiOew.
Feel free to get in touch if you should have any questions.
Best, Ada
On Mon, Jun 20, 2022 at 4:53 PM Christopher Collins <Christopher.Collins@ontariotechu.ca> wrote:
Hello,
I’m looking for any open source or cloud-hosted solution for complex
word identification or word difficulty rating in French for a reading application.
As a backup plan we can use measures like corpus frequency, length,
number of senses, but we’re hoping someone has already made a tool available.
We found this but that’s it: https://github.com/sheffieldnlp/cwi
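For illustration only, the backup measures mentioned above (corpus frequency, length, number of senses) could be combined roughly as in the sketch below. This is not taken from the linked repository or from any existing tool: the function name `difficulty_scores`, the `sense_counts` mapping, and the weights are all hypothetical, and whether more senses should raise or lower the score is an assumption left to the caller.

```python
import math
from collections import Counter


def difficulty_scores(tokens, sense_counts=None, w_len=0.1, w_sense=0.1):
    """Return a rough difficulty score per distinct token (higher = harder).

    tokens       -- list of token strings from a reference corpus
    sense_counts -- hypothetical mapping token -> number of dictionary senses
    w_len        -- illustrative weight for token length
    w_sense      -- illustrative weight for log(number of senses); the sign
                    convention (more senses = harder) is an assumption
    """
    sense_counts = sense_counts or {}
    freq = Counter(t.lower() for t in tokens)
    total = sum(freq.values())
    scores = {}
    for tok, count in freq.items():
        rarity = -math.log(count / total)   # rarer in the corpus => larger
        length = len(tok)                   # longer => larger
        senses = max(1, sense_counts.get(tok, 1))
        scores[tok] = rarity + w_len * length + w_sense * math.log(senses)
    return scores


corpus = "le chat mange le poisson anticonstitutionnellement".split()
scores = difficulty_scores(corpus)
hardest = max(scores, key=scores.get)
```

With a realistic French frequency list in place of the toy `corpus`, the same function would rank rare, long items above frequent, short ones; the weights would need tuning against human difficulty judgments.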
Would appreciate any tips!
Thanks,
Chris
Christopher Collins [he/him] Associate Professor - Faculty of Science Canada Research Chair in Linguistic Information Visualization Ontario Tech University vialab.ca
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora Corpora mailing list -- corpora@list.elra.info To unsubscribe send an email to corpora-leave@list.elra.info
-- Daniel HENKEL https://univ-paris8.academia.edu/DanielHENKEL
*Maître de Conférences (Linguistique et Traduction) UFR5 LLCE-LEA • EA1569 TransCrit* Université Paris 8 Vincennes-St-Denis
*“one cannot draw up a typology of translations, but at most a typology of different ways of translating, negotiating each time the goal one sets oneself, and discovering each time that the ways of translating are more numerous than we suspect.”* U. Eco