[Corpora-List] Re: Fwd: Deadline extension: 19th Workshop on Multiword Expressions (MWE 2023)

15 Feb 2023

Hi Ada,
Of course what counts as a morpheme or as a lexical expression, and what
inventory of compositional rules one assumes, is subject to one's theory,
to language change, and also to ad-hoc playful reinterpretation (that's
what I would see your "s or p?" example as). But these are notions that
need not be 100% precise in order to delineate a research area such as MWE.
There can be a gray area as to what counts as an MWE and what doesn't. For
example, many MWE researchers would probably not count a prefix verb like
*beguile* as an MWE, simply because it fulfills all the criteria of
wordhood in traditional Western NLP. But if we assume a wide definition
such as "lexical expression consisting of more than one morpheme", it too would
fall under the MWE label. In fact, it exhibits the same competition between
a lexical/idiomatic reading and a compositional reading that is typical of
more complex MWEs: "kick the bucket" could mean to kick the bucket or to
die, "beguile" could mean to affect with guile or to deceive.
I would support a name change from MWE to CLE or similar, because I agree
that "word" is not a very useful notion cross-linguistically. (Then again,
the notion of MWE might still work okay if we assume Martin Haspelmath's
retro-definition https://dlc.hypotheses.org/2621 of "word".)
Cheers,
Kilian
On Sat, 11 Feb 2023 at 18:29, Ada Wan adawan919@gmail.com wrote:
...
Hi Archna
Thanks for your reply.
Your justification of the continued use of "MWEs"/"words" is based on
history and shared understanding (from 09Feb2023: "since the term has been
used for a long while, there is a bit of a shared understanding of this
term, including about these stipulations"). Both of these criteria are
achievable with alternate formulations.
Re "the category of items, of which idioms is a subset, has been referred
to as multiwords for a long time": "MWE" does not have that long of a
history --- what is the earliest use of "MWEs" that you have in your
records? And even if terms have been used for a long while, it doesn't mean
that we cannot change them for the better, esp. when they have been
inappropriately adopted or found outdated. What objections do you have to
"lexical expressions", for example?
The issue/problem with "word" is that, aside from it not being necessary
or sufficient in the study of language or in computing, there is also an
implicit, shared understanding that it is arbitrary, redundant, and
indeterminate. (This applies also to the notion of wordhood within one
language.) The indeterminacy part is evident in your not having provided me
with a definition of "words" thus far as well. Furthermore, as you
confirmed earlier: "the notion of wordhood may not be applicable to every
single language and in the same way", then how should "words" be robust
enough for computational processing?
Re emojis: here are some examples of emoji combinations that show a sense
of idiosyncrasy when they (co-)occur:
🤩 for "star-struck" (from
https://unicode.org/emoji/charts/full-emoji-list.html)
Or from
https://www.elitedaily.com/lifestyle/funny-emoji-combinations-tiktok:
👉 👈 (feeling shy/simping)
🚪🏃‍♀️💨 (time to leave)
🍿🤏😯 (when drama is happening/when something is going down)
👁👄👁 (blank stare)
🕳👨‍🦯 (I didn't see anything)
👩🤏👩‍🦲 (wig snatched)
🐂💩 (bullsh*t)
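A minimal sketch of how such a combination looks as a character sequence, assuming Python 3 and the usual Unicode encoding of the woman-running emoji (written with escapes so the code points are explicit). The helper name code_points is hypothetical and only for illustration.

```python
import unicodedata

def code_points(text):
    """List the Unicode code points of `text` in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]

# Woman running: base emoji + zero-width joiner + female sign + variation selector.
runner = "\U0001F3C3\u200D\u2640\uFE0F"

# One glyph on screen, but several code points in the underlying string.
print(len(runner), code_points(runner))

# Look up the standard names, with a fallback for unnamed code points.
for ch in runner:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
```

So even a single emoji can already be a sequence of more than one element when written.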
My concern is on "wordhood" in the "language space"
(science/engineering/technology) in general, not just on lexical
expressions. I do think, however, that SIGLEX could help play an important
role in effecting some positive changes in this regard.

Hi Kilian
Let's suppose that what we have thus far known as "grammar" (the one that
has been based on or related to "words" or "sentences", i.e.
morphology/syntax (and some phonology)) can be decomposed into (sequential)
ordering and linguistic attitudes/normativity [1]. I do think
judgments/attitudes play a role in language as it exists in the social
world and can affect, or even determine, how registers/styles etc. are
defined, but I also think that there is a more rigorous science of (the
remaining aspects of) language possible if we were to separate such
attitudes/prescriptivism from a more descriptive stance (e.g. in the
direction of information sciences).
Once we remove the attitudes/normativity part from the science of
language, lexical and contextual information as well as function/use
remain.
The reason why I hesitated in referring to MWEs as "complex" is that
(lexical) "complexity" can be broken down into vocabulary and length, with
use/frequency accounting for the pragmatic/functional part. Hence every
expression (or any character string) is lexical.
The element of idiosyncrasy/idiomaticity is really a pragmatic one (e.g.
in the rarity/archaic-ness/uniqueness of the use of the
expressions/segment/span or character n-grams).
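A minimal sketch of that breakdown, assuming Python 3, a toy three-line corpus, and character bigrams as the unit of "use/frequency". The names char_ngrams and describe are hypothetical, and the corpus is invented for illustration only.

```python
from collections import Counter

def char_ngrams(text, n):
    """Return all contiguous character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def describe(expression, corpus, n=2):
    """Summarize an expression by length, vocabulary, and n-gram frequency."""
    # Count every character n-gram across the whole corpus.
    corpus_counts = Counter(g for line in corpus for g in char_ngrams(line, n))
    return {
        "length": len(expression),           # number of code points
        "vocabulary": len(set(expression)),  # number of distinct characters
        "ngram_frequencies": {
            g: corpus_counts[g] for g in char_ngrams(expression, n)
        },
    }

corpus = ["sing me a song", "ping me", "bing"]  # hypothetical toy corpus
result = describe("ping me", corpus)
print(result)
```

On this view, rarity shows up as low n-gram counts rather than as a property of "words" or "morphemes".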
So "sing" can be seen as a lexical expression, just like "bing" or "ping".
Let's not forget that (even according to traditional grammatical analyses)
various linguistic effects can happen to expressions when they undergo
frequent use over an extended period of time. E.g. "ping me" may be seen
thus far as relatively more idiomatic than "sing me a song", but that's due
to the former expression being more specialized, less general, or rarer in
use. Also, e.g. in a conversation, if one said "sing me" and the other
didn't quite catch the first bit of the phrase, they might ask "[s] or
[p]?" or "'s' or 'p'?". And one can well imagine that if this comes into
more frequent use, "s" and "p" can be regarded as what we'd now interpret
as "idiomatic". Hence "sing" does not have to be seen as a "single
morpheme".
[1] I have tweeted this before on 28Jan2023:
https://twitter.com/adawan919/status/1619401653962297344?cxt=HHwWgMDS0a3Oovk...
In a way, I am reinterpreting "(non)-compositionality" as
normalization/frequency effects via the decomposed view of "grammar" above.

*Hence, my proposal (not just for MWE workshop folks but perhaps for all
who might be interested) would be:
https://docs.google.com/document/d/1n4QRn0CxbVMj6kbLWo-byT3S26ODJOjicU-ZYw7jRG0/edit?usp=sharing*
*Comments welcome.*
Thanks and best
Ada
On Sat, Feb 11, 2023 at 2:01 PM Kilian Evang kilian.evang@gmail.com
wrote:
...
Hi Ada,
The problem I have with the term "expression" without further
qualification is that to my mind it includes any kind of linguistic sign,
including ones like "to pay a visit to my dear aunt Ruth" which can clearly
be interpreted compositionally. So I think we do have to specify "lexical"
to delineate what we are studying in the MWE community. "Lexical item" or,
sure, "lexical expression". Either would also include signs, of course. I
do also feel we have to add "complex" or similar, because otherwise it
includes single-morpheme lexical expressions like "sing".
Cheers,
Kilian
On Fri, 10 Feb 2023 at 23:32, Ada Wan adawan919@gmail.com wrote:
...
Hi Archna
"Idioms"/"Idiomatic expressions" are established terms in the study of
language [1], with a longer history than MWE [2]. "Fixed", e.g. in "fixed
phrases", is mentioned in, inter alia, [3], which was the earliest cite
from the earliest work on MWEs in the ACL Anthology [4]. If I understand
correctly, "MWEs" was a term so coined in order to establish a practice
based on "words" (if anyone should view this differently, please do correct
me here).
You're right, the task I suggested can be seen as orthogonal to
distinguishing between lexical expressions or non-lexical expressions. I
think it's important to document also the contexts surrounding expressions,
instead of just picking expressions out and studying them in an isolated
manner. It was just a suggestion for those who might be interested in
building a multilingual parallel lexical database as well as those who
might want to get a more holistic understanding of language while weaning
themselves off "words" --- now that it's become even more obvious how
superfluous the term/concept is.
[1] See e.g. https://en.wikipedia.org/wiki/Phraseme
[2] "Idiomatic expression" is just another formulation of "idiom" (see
https://www.thefreedictionary.com/idiomatic+expression).
According to Collins English Dictionary (accessed via
https://www.thefreedictionary.com/idiom), "idiom" stems from the 16th
century Latin idiōma, denoting "peculiarity of language".
[3] Nunberg, Geoffrey, Ivan A. Sag, and Thomas Wasow. 1994. Idioms.
Language, 70:491–538. https://doi.org/10.2307/416483
(Many older references on "idioms" by linguists can be found therein.)
[4] Ann Copestake, Fabre Lambeau, Aline Villavicencio, Francis Bond,
Timothy Baldwin, Ivan A. Sag, and Dan Flickinger. 2002. Multiword
expressions: linguistic precision and reusability. In Proceedings of the
Third International Conference on Language Resources and Evaluation
(LREC’02), Las Palmas, Canary Islands - Spain. European Language Resources
Association (ELRA).

Hi Kilian
Sorry about my oversight on "item". I do think "item" could be better
than "term" in this case, but it does carry a sense of "a single element",
a more discrete "singleton". It's ok to combine it with "complex" to
mitigate the sense of "singleton", but then "complex" as you suggested is
dependent on morphology, which can be problematic.
Re "lexical": sure. (I think there have been so many different
views/traditions/conventions among linguists and computational linguists in
the past, we don't necessarily have to agree on how we or our
definitions/methods might differ or might have differed, as long as we have
the same goal now?)
One argument for "expressions" would be that they could include a sign
(e.g. hand sign in motion).
So how about updating "MWEs" to:
i. "lexical expressions", or
ii. "lexical expressions (of one character or more when written)*", or
iii. [i] or [ii] without "lexical", or
iv. others?

I'm trying to incorporate how expressions with emojis would/should be
treated too.

What do you all think?
Thanks and best
Ada
On Fri, Feb 10, 2023 at 10:58 AM Kilian Evang via Corpora <
corpora@list.elra.info> wrote:
...
Forwarded message from Archna below
---------- Forwarded message ---------
From: Archna Bhatia abhatia@ihmc.org
Date: Thu, 9 Feb 2023 at 19:58
Subject: Re: [Corpora-List] Deadline extension: 19th Workshop on
Multiword Expressions (MWE 2023)
To: Ada Wan adawan919@gmail.com, kilian Evang <kilian.evang@gmail.com
...
Cc: Mike Scott mike@lexically.net, mweworkshop2023@googlegroups.com <
mweworkshop2023@googlegroups.com>, corpora@list.elra.info <
corpora@list.elra.info>
Thanks, Ada. I think using the terms “fixed” and “idiomatic” makes the
category appear more restrictive, and would need qualifications such as
“fixed” is a relative term here, etc. With “multiwords/multiword
expressions” also, there are stipulations (the notion of wordhood may not
be applicable to every single language and in the same way) but since the
term has been used for a long while, there is a bit of a shared
understanding of this term, including about these stipulations. I am open
to better terminology. Using just “expressions”, however, seems too vague
and loses some generalizations about the idiosyncrasies that “multiword
expressions” demonstrate. Not every expression is the same: “multiword
expressions” show characteristics different from other expressions. I
understand there is also some fluidity when trying to distinguish
between multiword and non-multiword expressions.
There are so many angles that one could look at language from. I don’t
see anything wrong with the view that studies expressions covering all
aspects as you suggest without distinguishing between expressions based on
notions of wordhood. The task you suggest will help in developing
understanding about language and how languages are similar or different and
how they are used. I don’t think it disqualifies efforts that distinguish
between “multiword expressions” and non-multiword expressions, though. The
idiosyncrasies are not limited to morphology/syntax; they are also found in
other linguistic aspects when characterizing “multiword expressions”.
~ Archna
On Feb 9, 2023, at 11:17 AM, Ada Wan adawan919@gmail.com wrote:
Hi Archna, hi Kilian, hi all
Thanks for your replies.
TLDR on my part: I'd be fine going with "expressions" (instead of
"fixed/idiomatic expressions"). Neither "word" nor "morphology/syntax"
(apart from the ordering of elements and/or sequential patterns) is
necessary in the analyses of such.

More specifically:
[@Archna] Re "fixed/idiomatic expressions": I don't think it matters
much whether they are "fixed" or "idiomatic". A "fixed expression" is one
that is usually more impervious to (lexical) change. One can measure this
quality in a longitudinal study, e.g. in relation to other aspects of
language change etc. Re how "fixed" is "fixed": it's relative, much like
many other aspects of language studies. By "idiomatic", one could mean that
there is an element of idiosyncrasy (as "idiom"/"idioma").
The message that I am trying to get across is that "word" is a
superfluous category in the study of language. Would you mind please
justifying why you need "words"?
The same goes for morphology, actually. In essence, morphological
analyses involve selective decomposition, not decomposition of all
decomposable units. Hence if one is only accounting for variations within
an expression as a ((sub-)character) sequence involving "morphemes"
(assuming definable rigorously) and discounting the changes in other parts
of the sequence, that would be an incomplete analysis of the expression.
Instead, one can just refer to expressions as "expressions", as e.g.
sequences/strings of various lengths/vocabs in (sub-)characters --- such an
account is also more flexible and accommodating to diverse
languages/registers/modalities.
A study of "expressions" can cover all other aspects --- not just
lexical but also functional ones. One doesn't need to incorporate/impose
any ad hoc notions of "wordhood" in these studies.
Suggestion: I believe there are many more interesting tasks in this
area, instead of trying to find/define "words" within expressions, or to
"parse" them according to some structuralist assumptions (i.e.
morphologically/syntactically). For example, the community could start
(some multi-year project) building an international multilingual parallel
(note: not everything would be parallelizable) database of all expressions
and terminologies that have ever existed, with contextual
(historical/cultural/social) information, and start verifying their sources
and status of current use. (Just take care, though, not to reinforce values
that shouldn't be further emphasized / transferred to posterity --- as an ethical
consideration. So if something is in the grey area now, document clearly
what the current attitudes towards a certain value are, so posterity can
look back and evaluate with respect to their point of view.)
Counter questions to Archna:
What are the motivations behind your suggestion to access/interpret
language using "words"? How do you define "words" and justify the
sufficiency/necessity of morphology/syntax in relation to the study of
these expressions, esp. when the morphological decomposition of these
expressions is arbitrary and helps little (or not at all) with explanation
or prediction?
Re "complex lexical terms", @Kilian: I'm just wondering what kind of
terms would be considered "terms" but wouldn't be considered lexical
(I was tempted to add "lexical" to "expressions" as well, but thought that
might be a bit redundant)? It depends on how one defines "terms", of
course. And how "complex" are expressions really? They are just more
calcified units after all, aren't they? (Why do we/some always seem to want
to add the term "complex" to everything? Things that aren't "complex" are
also worthy of studying!)
Curious what you think...
Thanks and best
Ada
Why I'm advocating #noWords:
Fairness in Representation for Multilingual NLP: Insights from
Controlled Experiments on Conditional Language Modeling
https://openreview.net/forum?id=-llS6TiOew
https://drive.google.com/file/d/1eKbhdZkPJ0HgU1RsGXGFBPGameWIVdt9/view
(It took me a while for everything to sink in.)
On Thu, Feb 9, 2023 at 3:27 PM Mike Scott via Corpora <
corpora@list.elra.info> wrote:
...
I must say I'm perfectly happy with "multi-word expression", or
"multi-word unit".
I feel sympathy with Archna's post (and incidentally wish Archna
didn't have to go through a friend!)
Cheers -- Mike
--
Mike Scott
lexically.net
Lexical Analysis Software and Aston University

Corpora mailing list -- corpora@list.elra.info
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to corpora-leave@list.elra.info
--
You received this message because you are subscribed to the Google
Groups "MWE Workshop 2023 Organizers" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to mweworkshop2023+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/mweworkshop2023/CAB7Mis_GSyFjZOVw_XWp431VM...
.
For more options, visit https://groups.google.com/d/optout
.
--
Archna Bhatia, Ph.D.
Research Scientist, Institute for Human & Machine Cognition
15 SE Osceola Ave, Ocala, FL 34471
(352) 387-3061

