FYI.
---------- Forwarded message --------- From: 'Archna Bhatia' via MWE Workshop 2023 Organizers < mweworkshop2023@googlegroups.com> Date: Wed, Feb 8, 2023 at 8:09 PM Subject: Fwd: [Corpora-List] Deadline extension: 19th Workshop on Multiword Expressions (MWE 2023) To: MWE Workshop 2023 Organizers mweworkshop2023@googlegroups.com
I have no idea why my emails seem like they need to be moderated recently (I must have sent something inappropriate, just kidding!), but even to my response below I received a notification that it is awaiting moderation and that either this would post or I would hear back of the moderator’s decision. From the past two CFPs (small sample), my experience is that posts awaiting moderation do not get posted nor do I hear of the moderator’s decision. So could someone else forward my response?
thanks, Archna
Begin forwarded message:
*From: *Archna Bhatia abhatia@ihmc.org *Subject: **Re: [Corpora-List] Deadline extension: 19th Workshop on Multiword Expressions (MWE 2023)* *Date: *February 8, 2023 at 2:59:35 PM EST *To: *Ada Wan adawan919@gmail.com *Cc: *Ken Litkowski ken@clres.com, corpora@list.elra.info
Hi Ada,
While appropriate space is found for this discussion, let me respond to just your first suggestion (for now): Why do you think they should be renamed “fixed/idiomatic expressions”? What would your definitions of “fixed” and of “idiomatic” be? How fixed would you say these expressions would be? Is morphological variation allowed? Is variation in any of the other linguistic aspects allowed? From my point of view, “fixed/idiomatic expressions” results in a much more restricted category than everything we consider could be treated as multiwords.
Thanks, Archna
On Feb 8, 2023, at 2:38 PM, Ada Wan via Corpora corpora@list.elra.info wrote:
Hi Ken
Thanks for the message. Unfortunately, it looks like there have been no prior discussions on any of the topics I suggested, and the earliest post I can access dates back only to 22Nov2020. I can surely start a discussion, but that might look to be the first/only discussion on the list? (I went through all the conversations accessible thus far and only saw announcements.)
Perhaps more importantly: as this seems to be an issue that could also affect other areas of concern to the general audience of the Corpora-List (*not just for MWEs/SIGLEX*), is there a way that we all can make some changes in the "language space" across the board?
Thanks and best Ada
On Wed, Feb 8, 2023 at 5:57 PM Ken Litkowski ken@clres.com wrote:
Dear Ada,
When I added the SIGLEX discussion code back in 2010, I did so with the idea that we would have discussions of topics just like yours. The current form of the discussion is now located on the Google group, via https://groups.google.com/g/siglex-members. There, you will find a "Search conversations ..." box where you can add your topic so that it will be sent to all, rather than just the announcements that are mainly the topics.
Ken (webmaster retiree)
On 2/8/2023 10:18 AM, Ada Wan via Corpora wrote:
Hi Kilian
Hope all has been well.
I'm surprised that people are still "wording around" nowadays. Some suggestions:
1. Can't we rename "MWEs" to "fixed/idiomatic expressions" instead? One can reformulate these as sequences/strings/expressions of various lengths/vocabs in characters.
2. Also, one can interpret these without information/association with any syntactic categories, nouns or verbs etc.
3. They do just represent lexical info (some reflecting/encoding historico-social habits, though one also should be aware of the ethical aspects of reinforcing some "traditional values"). Perhaps a more sophisticated view of language could help wean practitioners from a mindframe that relies on "linguistic structure(s)" as we've had it thus far (i.e. based on "words" and "sentences")?
4. Re "their meaning often does not result from the direct combination of the meanings of their parts": non-compositionality may be a better description of a more realistic view of language; it should prob be our default expectation (instead of the cherry-picked compositional counterparts).
I think efforts towards mitigating a mental dependency on "words" would be a good direction to pursue, what do you think? Can we get SIGLEX to update in this regard?
Best Ada
On Wed, Feb 8, 2023 at 11:12 AM Kilian Evang via Corpora < corpora@list.elra.info> wrote:
[Apologies for cross-postings]
Call for Papers: Deadline extended
19th Workshop on Multiword Expressions (MWE 2023)
Organized and sponsored by SIGLEX, the Special Interest Group on the Lexicon of the ACL
Full-day workshop collocated with EACL 2023, Dubrovnik, Croatia, May 5 or 6, 2023
Hybrid (on-site & on-line)
NEW: Submission deadline: February 20, 2023
NEW: Invited speakers announced (see below)
NEW: Best paper award (see below)
MWE 2023 website: https://multiword.org/mwe2023/
Multiword expressions (MWEs) are word combinations that exhibit lexical, syntactic, semantic, pragmatic, and/or statistical idiosyncrasies (Baldwin & Kim 2010), such as by and large, hot dog, pay a visit and pull one's leg. The notion encompasses closely related phenomena: idioms, compounds, light-verb constructions, phrasal verbs, rhetorical figures, collocations, institutionalised phrases, etc. Their behaviour is often unpredictable; for example, their meaning often does not result from the direct combination of the meanings of their parts. Given their irregular nature, MWEs often pose complex problems in linguistic modelling (e.g. annotation), NLP tasks (e.g. parsing), and end-user applications (e.g. natural language understanding and MT), hence still representing an open issue for computational linguistics (Constant et al. 2017).
For almost two decades, modelling and processing MWEs for NLP has been the topic of the MWE workshop organised by the MWE section of SIGLEX in conjunction with major NLP conferences since 2003. Impressive progress has been made in the field, but our understanding of MWEs still requires much research considering their need and usefulness in NLP applications. This is also relevant to domain-specific NLP pipelines that need to tackle terminologies most often realised as MWEs. Following previous years, for this 19th edition of the workshop, we identified the following topics on which contributions are particularly encouraged:
MWE processing and identification in specialized languages and domains: Multiword terminology extraction from domain-specific corpora (Bonin et al. 2010) is of particular importance to various applications, such as MT (Semmar & Laib, 2017), or for the identification and monitoring of neologisms and technical jargon (Chatzitheodorou et al, 2021). We expect approaches that deal with the processing of MWEs as well as the processing of terminology in specialised domains can benefit from each other.
MWE processing to enhance end-user applications: MWEs have gained particular attention in end-user applications, including MT (Zaninello & Birch 2020; Han et al. 2021, 2022), simplification (Kochmar et al. 2020), language learning and assessment (Paquot et al. 2019; Christiansen & Arnon 2017), social media mining (Maisto et al. 2017), and abusive language detection (Zampieri et al. 2020; Caselli et al. 2020). We believe that it is crucial to extend and deepen these first attempts to integrate and evaluate MWE technology in these and further end-user applications.
MWE identification and interpretation in pre-trained language models: Most current MWE processing is limited to their identification and detection using pre-trained language models, but we still lack understanding about how MWEs are represented and dealt with therein (Nedumpozhimana & Kelleher 2021; Garcia et al. 2021; Fakharian & Cook 2021), and how to better model the compositionality of MWEs from semantics (Moreau et al. 2018). Now that NLP has shifted towards end-to-end neural models like BERT, capable of solving complex tasks with little or no intermediary linguistic symbols, questions arise about the extent to which MWEs should be implicitly or explicitly modelled (Shwartz & Dagan, 2019).
MWE processing in low-resource languages: The PARSEME shared tasks (Ramisch et al. 2020; 2018; Savary et al. 2017), among others, have fostered significant progress in MWE identification, providing datasets that include low-resource languages, evaluation measures, and tools that now allow fully integrating MWE identification into end-user applications. A few efforts have recently explored methods for the automatic interpretation of MWEs (Bhatia, et al. 2018; 2017), and their processing in low-resource languages (Liu & Wang 2020; Kumar et al. 2017). Resource creation and sharing should be pursued in parallel with the development of methods able to capitalize on small datasets (Han et al. 2020).
Through this workshop, we would like to bring together and encourage researchers in various NLP subfields to submit MWE-related research, so that approaches that deal with processing of MWEs including processing for low-resource languages and for various applications can benefit from each other. We also intend to consolidate the converging effects of previous joint workshops LAW-MWE-CxG 2018, MWE-WN 2019 and MWE-LEX 2020, the joint MWE-WOAH panel in 2021, and the MWE-SIGUL 2022 joint session, extending our scope to MWEs in e-lexicons and WordNets, MWE annotation, as well as grammatical constructions. Correspondingly, we call for papers on research related (but not limited) to MWEs and constructions in:
Computationally-applicable theoretical work in psycholinguistics and corpus linguistics;
Annotation (expert, crowdsourcing, automatic) and representation in resources such as corpora, treebanks, e-lexicons, and WordNets (also for low-resource languages);
Processing in syntactic and semantic frameworks (e.g. CCG, CxG, HPSG, LFG, TAG, UD, etc.);
Discovery and identification methods, including for specialized languages and domains such as clinical or biomedical NLP;
Interpretation of MWEs and understanding of text containing them;
Language acquisition, language learning, and non-standard language (e.g. tweets, speech);
Evaluation of annotation and processing techniques;
Retrospective comparative analyses from the PARSEME shared tasks;
Processing for end-user applications (e.g. MT, NLU, summarisation, language learning, etc.);
Implicit and explicit representation in pre-trained language models and end-user applications;
Evaluation and probing of pre-trained language models;
Resources and tools (e.g. lexicons, identifiers) and their integration into end-user applications;
Multiword terminology extraction;
Adaptation and transfer of annotations and related resources to new languages and domains including low-resource ones.
Shared Task
We do not have a shared task this year, but a new release of the PARSEME corpus of verbal MWEs is currently underway. We encourage submission of research papers that include analyses of the new edition of the PARSEME data and improvements over the results for PARSEME 2020 shared task as well as SemEval 2022 task 2 on idiomaticity prediction.
*** Special Track on MWEs in Clinical NLP ***
Pursuing the MWE Section’s tradition of synergies with other communities, this year, we are organizing a joint session with the Clinical NLP workshop for shared papers/poster presentations. Since clinical texts contain an important amount of multiword expressions (e.g. medical terms or domain-specific collocations), a joint session is deemed beneficial for both communities. The goal is to foster future synergies that could address scientific challenges in the creation of resources, models and applications to deal with multiword expressions and related phenomena in the specialised domain of ClinicalNLP. Submissions describing research on MWEs in the specialized domain of ClinicalNLP, especially introducing new datasets or new tools and resources, are welcome. Papers accepted in this track will have the option to present their work in the Clinical NLP workshop at ACL 2023 as well, after being presented at MWE 2023.
Invited Speakers
We are looking forward to invited talks by two amazing speakers:
Leo Wanner, Universitat Pompeu Fabra
TBD
Best paper award
All full papers in the workshop will be considered by the program committee for a best paper award. The decision will be announced in the closing session.
Submission formats
The workshop invites two types of submissions:
archival submissions that present substantially original research in both long paper format (8 pages + references) and short paper format (4 pages + references).
non-archival submissions of abstracts describing relevant research presented/published elsewhere which will not be included in the MWE proceedings.
Paper submission and templates
Papers should be submitted via the workshop's START submission page (https://softconf.com/eacl2023/mwe2023/). Please choose the appropriate submission format (archival/non-archival). Archival papers with existing reviews will also be accepted through the ACL Rolling Review. Submissions must follow the ACL 2023 stylesheet.
Archival papers with existing reviews from ACL Rolling Review will also be considered. A paper may not be simultaneously under review through ARR and MWE. A paper that has received or will receive reviews through ARR may not be submitted for review to MWE.
Important Dates
Paper submission: February 20, 2023
ARR paper commitment: March 6, 2023
Notification of acceptance: March 13, 2023
Camera-ready papers due: March 27, 2023
Workshop: May 5 or 6, 2023
All deadlines are at 23:59 UTC-12 (Anywhere on Earth).
Organizing Committee
Program chairs: Marcos Garcia, Voula Giouli, Lifeng Han, Shiva Taslimipoor
Publication chair: Archna Bhatia
Publicity chair: Kilian Evang
Anti-harassment policy
The workshop follows the ACL anti-harassment policy.
Contact
For any inquiries regarding the workshop, please send an email to the Organizing Committee at mweworkshop2023@googlegroups.com.
_______________________________________________
Corpora mailing list -- corpora@list.elra.info https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/ To unsubscribe send an email to corpora-leave@list.elra.info
--
Ken Litkowski, CL Research
9208 Gue Road, Damascus, MD 20872-1025 USA
TEL.: 301-482-0237
EMAIL: ken@clres.com
Home Page: http://www.clres.com
Blog: http://www.clres.com/blog
-- Archna Bhatia, Ph.D. Research Scientist, Institute for Human & Machine Cognition 15 SE Osceola Ave, Ocala, FL 34471 (352) 387-3061
I must say I'm perfectly happy with "multi-word expression", or "multi-word unit".
I feel sympathy with Archna's post (and incidentally wish Archna didn't have to go through a friend!)
Cheers -- Mike
Hi Archna, hi Kilian, hi all
Thanks for your replies.
TLDR on my part: I'd be fine going with "expressions" (instead of "fixed/idiomatic expressions"). Neither "word" nor "morphology/syntax" (apart from the ordering of elements and/or sequential patterns) is necessary in the analyses of such.
-----
More specifically:
[@Archna] Re "fixed/idiomatic expressions": I don't think it matters much whether they are "fixed" or "idiomatic". A "fixed expression" is one that is usually more impervious to (lexical) change. One can measure this quality in a longitudinal study, e.g. in relation to other aspects of language change etc.. Re how "fixed" is "fixed": it's relative, much like many other aspects of language studies. By "idiomatic", one could mean that there is an element of idiosyncrasy (as "idiom"/"idioma").
The message that I am trying to get across is that "word" is a superfluous category in the study of language. Would you mind please justifying why you need "words"?
The same goes for morphology, actually. In essence, morphological analyses involve selective decomposition, not decomposition of all decomposable units. Hence if one is only accounting for variations within an expression as a ((sub-)character) sequence involving "morphemes" (assuming definable rigorously) and discounting the changes in other parts of the sequence, that would be an incomplete analysis of the expression. Instead, one can just refer to expressions as "expressions", as e.g. sequences/strings of various lengths/vocabs in (sub-)characters --- such an account is also more flexible and accommodating to diverse languages/registers/modalities.
A study of "expressions" can cover all other aspects --- not just lexical but also functional ones. One doesn't need to incorporate/impose any ad hoc notions of "wordhood" in these studies.
Suggestion: I believe there are many more interesting tasks in this area, instead of trying to find/define "words" within expressions, or to "parse" them according to some structuralist assumptions (i.e. morphologically/syntactically). For example, the community could start (some multi-year project) building an international multilingual parallel (note: not everything would be parallelizable) database of all expressions and terminologies that have ever existed, with contextual (historical/cultural/social) information, and start verifying their sources and status of current use. (Just be aware, though, that one is not reinforcing values that shouldn't be further emphasized / transferred to posterity --- as an ethical consideration. So if something is in the grey area now, document clearly what the current attitudes towards a certain value are, so posterity can look back and evaluate with respect to their point of view.)
Counter questions to Archna: What are the motivations behind your suggestion to access/interpret language using "words"? How do you define "words" and justify the sufficiency/necessity of morphology/syntax in relation to the study of these expressions, esp. when the morphological decomposition of these expressions is arbitrary and helps little (or not at all) with explanation or prediction?
Re "complex lexical terms", @Kilian: I'm just wondering what kind of terms that would be considered "terms" that wouldn't be considered lexical (I was tempted to add "lexical" to "expressions" as well, but thought that might be a bit redundant)? It depends on how one defines "terms", of course. And how "complex" are expressions really? They are just more calcified units after all, aren't they? (Why do we/some always seem to want to add the term "complex" to everything? Things that aren't "complex" are also worthy of studying!)
Curious what you think...
Thanks and best Ada
Why I'm advocating #noWords: Fairness in Representation for Multilingual NLP: Insights from Controlled Experiments on Conditional Language Modeling https://openreview.net/forum?id=-llS6TiOew https://drive.google.com/file/d/1eKbhdZkPJ0HgU1RsGXGFBPGameWIVdt9/view (It took me a while for everything to sink in.)
On Thu, Feb 9, 2023 at 3:27 PM Mike Scott via Corpora < corpora@list.elra.info> wrote:
I must say I'm perfectly happy with "multi-word expression", or "multi-word unit".
I feel sympathy with Archna's post (and incidentally wish Archna didn't have to go through a friend!) Cheers -- Mike
--
Mike Scott lexically.net Lexical Analysis Software and Aston University
Thanks, Ada. I think using the terms “fixed” and “idiomatic” makes the category appear more restrictive, and would need qualifications such as “fixed” is a relative term here, etc. With “multiwords/multiword expressions” also, there are stipulations (the notion of wordhood may not be applicable to every single language and in the same way), but since the term has been used for a long while, there is a bit of a shared understanding of this term, including about these stipulations. I am open to better terminology. Using just “expressions”, however, seems too vague and loses some generalizations about the idiosyncrasies that “multiword expressions” demonstrate. Not every expression is the same; “multiword expressions” show characteristics different from other expressions. I understand there is some fluidity also there when trying to distinguish between multiword and non-multiword expressions.
There are so many angles that one could look at language from. I don’t see anything wrong with the view that studies expressions covering all aspects as you suggest without distinguishing between expressions based on notions of wordhood. The task you suggest will help in developing understanding about language and how languages are similar or different and how they are used. I don’t think it disqualifies efforts that distinguish between “multiword expressions” and non-multiword expressions though. The idiosyncrasies are not limited to morphology/syntax; they are found in other linguistic aspects too when characterizing “multiword expressions”.
~ Archna
On Feb 9, 2023, at 11:17 AM, Ada Wan <adawan919@gmail.com> wrote:
Hi Archna, hi Kilian, hi all
Thanks for your replies.
TLDR on my part: I'd be fine going with "expressions" (instead of "fixed/idiomatic expressions"). Neither "word" nor "morphology/syntax" (apart from the ordering of elements and/or sequential patterns) is necessary in the analyses of such.
-----
More specifically:
[@Archna] Re "fixed/idiomatic expressions": I don't think it matters much whether they are "fixed" or "idiomatic". A "fixed expression" is one that is usually more impervious to (lexical) change. One can measure this quality in a longitudinal study, e.g. in relation to other aspects of language change etc.. Re how "fixed" is "fixed": it's relative, much like many other aspects of language studies. By "idiomatic", one could mean that there is an element of idiosyncrasy (as "idiom"/"idioma").
The message that I am trying to get across is that "word" is a superfluous category in the study of language. Would you mind please justifying why you need "words"?
The same goes for morphology, actually. In essence, morphological analyses involve selective decomposition, not decomposition of all decomposable units. Hence if one is only accounting for variations within an expression as a ((sub-)character) sequence involving "morphemes" (assuming definable rigorously) and discounting the changes in other parts of the sequence, that would be an incomplete analysis of the expression. Instead, one can just refer to expressions as "expressions", as e.g. sequences/strings of various lengths/vocabs in (sub-)characters --- such an account is also more flexible and accommodating to diverse languages/registers/modalities.
A study of "expressions" can cover all other aspects --- not just lexical but also functional ones. One doesn't need to incorporate/impose any ad hoc notions of "wordhood" in these studies.
Suggestion: I believe there are many more interesting tasks in this area, instead of trying to find/define "words" within expressions, or to "parse" them according to some structuralist assumptions (i.e. morphologically/syntactically). For example, the community could start (some multi-year project) building an international multilingual parallel (note: not everything would be parallelizable) database of all expressions and terminologies that have ever existed, with contextual (historical/cultural/social) information, and start verifying their sources and status of current use. (Just be aware, though, that one is not reinforcing values that shouldn't be further emphasized / transferred to posterity --- as an ethical consideration. So if something is in the grey area now, document clearly what the current attitudes towards a certain value are, so posterity can look back and evaluate with respect to their point of view.)
Counter questions to Archna: What are the motivations behind your suggestion to access/interpret language using "words"? How do you define "words" and justify the sufficiency/necessity of morphology/syntax in relation to the study of these expressions, esp. when the morphological decomposition of these expressions is arbitrary and helps little (or not at all) with explanation or prediction?
Re "complex lexical terms", @Kilian: I'm just wondering what kind of terms that would be considered "terms" that wouldn't be considered lexical (I was tempted to add "lexical" to "expressions" as well, but thought that might be a bit redundant)? It depends on how one defines "terms", of course. And how "complex" are expressions really? They are just more calcified units after all, aren't they? (Why do we/some always seem to want to add the term "complex" to everything? Things that aren't "complex" are also worthy of studying!)
Curious what you think...
Thanks and best Ada
Why I'm advocating #noWords: Fairness in Representation for Multilingual NLP: Insights from Controlled Experiments on Conditional Language Modeling https://openreview.net/forum?id=-llS6TiOew https://drive.google.com/file/d/1eKbhdZkPJ0HgU1RsGXGFBPGameWIVdt9/view (It took me a while for everything to sink in.)
On Thu, Feb 9, 2023 at 3:27 PM Mike Scott via Corpora <corpora@list.elra.info> wrote:
I must say I'm perfectly happy with "multi-word expression", or "multi-word unit".
I feel sympathy with Archna's post (and incidentally wish Archna didn't have to go through a friend!)
Cheers -- Mike
--
Mike Scott lexically.net Lexical Analysis Software and Aston University
-- You received this message because you are subscribed to the Google Groups "MWE Workshop 2023 Organizers" group. To unsubscribe from this group and stop receiving emails from it, send an email to mweworkshop2023+unsubscribe@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/mweworkshop2023/CAB7Mis_GSyFjZOVw_XWp431VM... For more options, visit https://groups.google.com/d/optout.
-- Archna Bhatia, Ph.D. Research Scientist, Institute for Human & Machine Cognition 15 SE Osceola Ave, Ocala, FL 34471 (352) 387-3061
Forwarded message from Archna below
---------- Forwarded message --------- From: Archna Bhatia abhatia@ihmc.org Date: Thu, Feb 9, 2023 at 7:58 PM Subject: Re: [Corpora-List] Deadline extension: 19th Workshop on Multiword Expressions (MWE 2023) To: Ada Wan adawan919@gmail.com, Kilian Evang kilian.evang@gmail.com Cc: Mike Scott mike@lexically.net, mweworkshop2023@googlegroups.com < mweworkshop2023@googlegroups.com>, corpora@list.elra.info < corpora@list.elra.info>
Thanks, Ada. I think using the terms “fixed” and “idiomatic” makes the category appear more restrictive, and would need qualifications such as “fixed” is a relative term here, etc. With “multiwords/multiword expressions” also, there are stipulations (the notion of wordhood may not be applicable to every single language and in the same way) but since the term has been used for a long while, there is a bit of a shared understanding of this term, including about these stipulations. I am open to better terminology. Using just “expressions”, however, seems too vague and loses some generalizations about the idiosyncrasies that “multiword expressions” demonstrate. Not every expression is the same; “multiword expressions” show characteristics different from other expressions. I understand there is some fluidity also there when trying to distinguish between multiword and non-multiword expressions.
There are so many angles that one could look at language from. I don’t see anything wrong with the view that studies expressions covering all aspects as you suggest without distinguishing between expressions based on notions of wordhood. The task you suggest will help in developing understanding about language and how languages are similar or different and how they are used. I don’t think it disqualifies efforts that distinguish between “multiword expressions” and non-multiword expressions though, and the idiosyncrasies are not limited to morphology/syntax; idiosyncrasies are found in other linguistic aspects too when characterizing “multiword expressions”.
~ Archna
On Feb 9, 2023, at 11:17 AM, Ada Wan adawan919@gmail.com wrote:
Hi Archna, hi Kilian, hi all
Thanks for your replies.
TLDR on my part: I'd be fine going with "expressions" (instead of "fixed/idiomatic expressions"). Neither "word" nor "morphology/syntax" (apart from the ordering of elements and/or sequential patterns) is necessary in the analyses of such.
-----
More specifically:
[@Archna] Re "fixed/idiomatic expressions": I don't think it matters much whether they are "fixed" or "idiomatic". A "fixed expression" is one that is usually more impervious to (lexical) change. One can measure this quality in a longitudinal study, e.g. in relation to other aspects of language change etc. Re how "fixed" is "fixed": it's relative, much like many other aspects of language studies. By "idiomatic", one could mean that there is an element of idiosyncrasy (as "idiom"/"idioma").
The message that I am trying to get across is that "word" is a superfluous category in the study of language. Would you mind please justifying why you need "words"?
The same goes for morphology, actually. In essence, morphological analyses involve selective decomposition, not decomposition of all decomposable units. Hence if one is only accounting for variations within an expression as a ((sub-)character) sequence involving "morphemes" (assuming definable rigorously) and discounting the changes in other parts of the sequence, that would be an incomplete analysis of the expression. Instead, one can just refer to expressions as "expressions", as e.g. sequences/strings of various lengths/vocabs in (sub-)characters --- such an account is also more flexible and accommodating to diverse languages/registers/modalities.
A study of "expressions" can cover all other aspects --- not just lexical but also functional ones. One doesn't need to incorporate/impose any ad hoc notions of "wordhood" in these studies.
Suggestion: I believe there are many more interesting tasks in this area, instead of trying to find/define "words" within expressions, or to "parse" them according to some structuralist assumptions (i.e. morphologically/syntactically). For example, the community could start (some multi-year project) building an international multilingual parallel (note: not everything would be parallelizable) database of all expressions and terminologies that have ever existed, with contextual (historical/cultural/social) information, and start verifying their sources and status of current use. (Just make sure, though, that one is not reinforcing values that shouldn't be further emphasized / transferred to posterity --- as an ethical consideration. So if something is in the grey area now, document clearly what the current attitudes towards a certain value are, so posterity can look back and evaluate with respect to their point of view.)
Counter questions to Archna: What are the motivations behind your suggestion to access/interpret language using "words"? How do you define "words" and justify the sufficiency/necessity of morphology/syntax in relation to the study of these expressions, esp. when the morphological decomposition of these expressions is arbitrary and helps little (or not at all) with explanation or prediction?
Re "complex lexical terms", @Kilian: I'm just wondering what kind of terms would be considered "terms" but wouldn't be considered lexical (I was tempted to add "lexical" to "expressions" as well, but thought that might be a bit redundant)? It depends on how one defines "terms", of course. And how "complex" are expressions really? They are just more calcified units after all, aren't they? (Why do we/some always seem to want to add the term "complex" to everything? Things that aren't "complex" are also worthy of studying!)
Curious what you think...
Thanks and best Ada
Why I'm advocating #noWords: Fairness in Representation for Multilingual NLP: Insights from Controlled Experiments on Conditional Language Modeling https://openreview.net/forum?id=-llS6TiOew https://drive.google.com/file/d/1eKbhdZkPJ0HgU1RsGXGFBPGameWIVdt9/view (It took me a while for everything to sink in.)
On Thu, Feb 9, 2023 at 3:27 PM Mike Scott via Corpora < corpora@list.elra.info> wrote:
I must say I'm perfectly happy with "multi-word expression", or "multi-word unit".
I feel sympathy with Archna's post (and incidentally wish Archna didn't have to go through a friend!) Cheers -- Mike
--
Mike Scott lexically.net http://lexically.net/ Lexical Analysis Software and Aston University
Hi Archna
"Idioms"/"Idiomatic expressions" are established terms in the study of language [1], with a longer history than MWE [2]. "Fixed", e.g. in "fixed phrases", is mentioned in, inter alia, [3], which was the earliest citation in the earliest work on MWEs in the ACL Anthology [4]. If I understand correctly, "MWEs" was a term so coined in order to establish a practice based on "words" (if anyone should view this differently, please do correct me here).
You're right, the task I suggested can be seen as orthogonal to distinguishing between lexical expressions and non-lexical expressions. I think it's important to document also the contexts surrounding expressions, instead of just picking expressions out and studying them in an isolated manner. It was just a suggestion for those who might be interested in building a multilingual parallel lexical database as well as those who might want to get a more holistic understanding of language while weaning themselves off "words" --- now that it's become even more obvious how superfluous the term/concept is.
[1] See e.g. https://en.wikipedia.org/wiki/Phraseme
[2] "Idiomatic expression" is just another formulation of "idiom" (see https://www.thefreedictionary.com/idiomatic+expression). According to Collins English Dictionary (accessed via https://www.thefreedictionary.com/idiom), "idiom" stems from the 16th-century Latin idiōma, denoting "peculiarity of language".
[3] Nunberg, Geoffrey, Ivan A. Sag, and Thomas Wasow. 1994. Idioms. Language, 70:491–538. https://doi.org/10.2307/416483 (Many older references on "idioms" by linguists can be found therein.)
[4] Ann Copestake, Fabre Lambeau, Aline Villavicencio, Francis Bond, Timothy Baldwin, Ivan A. Sag, and Dan Flickinger. 2002. Multiword expressions: linguistic precision and reusability. In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02), Las Palmas, Canary Islands - Spain. European Language Resources Association (ELRA).
------------------------------
Hi Kilian
Sorry about my oversight on "item". I do think "item" could be better than "term" in this case, but it does carry a sense of "a single element", a more discrete "singleton". It's ok to combine it with "complex" to mitigate the sense of "singleton", but then "complex" as you suggested is dependent on morphology, which can be problematic.
Re "lexical": sure. (I think there have been so many different views/traditions/conventions among linguists and computational linguists in the past, we don't necessarily have to agree on how we or our definitions/methods might differ or might have differed, as long as we have the same goal now?)
One argument for "expressions" would be that they could include a sign (e.g. hand sign in motion).
So how about updating "MWEs" to:
i. "lexical expressions", or
ii. "lexical expressions (of one character or more when written)*", or
iii. [i] or [ii] without "lexical", or
iv. others?
* I'm trying to incorporate how expressions with emojis would/should be treated too.
------------------------------
What do you all think?
Thanks and best Ada
On Fri, Feb 10, 2023 at 10:58 AM Kilian Evang via Corpora < corpora@list.elra.info> wrote:
Forwarded message from Archna below
Hi Ada,
The problem I have with the term "expression" without further qualification is that to my mind it includes any kind of linguistic sign, including ones like "to pay a visit to my dear aunt Ruth" which can clearly be interpreted compositionally. So I think we do have to specify "lexical" to delineate what we are studying in the MWE community. "Lexical item" or, sure, "lexical expression". Either would also include signs, of course. I do also feel we have to add "complex" or similar, because otherwise it includes single-morpheme lexical expressions like "sing".
Cheers, Kilian
On Fri, Feb 10, 2023 at 11:32 PM Ada Wan adawan919@gmail.com wrote:
Hi Archna
"Idioms"/"Idiomatic expressions" are established terms in the study of language [1], with a longer history than MWE [2]. "Fixed", e.g. in "fixed phrases", is mentioned in, inter alia, [3], which was the earliest cite from the earliest work on MWEs in the ACL Anthology [4]. If I understand correctly, "MWEs" was a term so coined in order to establish a practice based on "words" (if anyone should view this differently, please do correct me here).
You're right, the task I suggested can be seen as orthogonal to distinguishing between lexical expressions or non-lexical expressions. I think it's important to document also the contexts surrounding expressions, instead of just picking expressions out and studying them in an isolated manner. It was just a suggestion for those who might be interested in building a multilingual parallel lexical database as well as those who might want to get a more holistic understanding of language while weaning oneself of "words" --- now that it's become even more obvious how superfluous the term/concept is.
[1] See e.g. https://en.wikipedia.org/wiki/Phraseme [2] "Idiomatic expression" is just another formulation of "idiom" (see https://www.thefreedictionary.com/idiomatic+expression). According to Collins English Dictionary (accessed via https://www.thefreedictionary.com/idiom), "idiom" stems from the 16th century Latin idiōma, denoting "pecularity of language". [3] Nunberg, Geoffrey, Ivan A. Sag, and Thomas Wasow. 1994. Idioms. Language, 70:491–538. https://doi.org/10.2307/416483 (Many older references on "idioms" by linguists can be found therein.) [4] Ann Copestake, Fabre Lambeau, Aline Villavicencio, Francis Bond, Timothy Baldwin, Ivan A. Sag, and Dan Flickinger. 2002. Multiword expressions: linguistic precision and reusability. In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02), Las Palmas, Canary Islands - Spain. European Language Resources Association (ELRA).
Hi Kilian
Sorry about my oversight on "item". I do think "item" could be better than "term" in this case, but it does carry a sense of "a single element", a more discrete "singleton". It's ok to combine it with "complex" to mitigate the sense of "singleton", but then "complex" as you suggested is dependent on morphology, which can be problematic.
Re "lexical": sure. (I think there have been so many different views/traditions/conventions among linguists and computational linguists in the past, we don't necessarily have to agree on how we or our definitions/methods might differ or might have differed, as long as we have the same goal now?)
One argument for "expressions" would be that they could include a sign (e.g. hand sign in motion).
So how about updating "MWEs" to: i. "lexical expressions", or ii. "lexical expressions (of one character or more when written)*", or iii. [i] or [ii] without "lexical", or iv. others?
- I'm trying to incorporate how expressions with emojis would/should be
treated too.
What do you all think?
Thanks and best Ada
On Fri, Feb 10, 2023 at 10:58 AM Kilian Evang via Corpora < corpora@list.elra.info> wrote:
Forwarded message from Archna below
---------- Forwarded message --------- Von: Archna Bhatia abhatia@ihmc.org Date: Do., 9. Feb. 2023 um 19:58 Uhr Subject: Re: [Corpora-List] Deadline extension: 19th Workshop on Multiword Expressions (MWE 2023) To: Ada Wan adawan919@gmail.com, kilian Evang kilian.evang@gmail.com Cc: Mike Scott mike@lexically.net, mweworkshop2023@googlegroups.com < mweworkshop2023@googlegroups.com>, corpora@list.elra.info < corpora@list.elra.info>
Thanks, Ada. I think using the terms “fixed” and “idiomatic” make the category appear more restrictive, and would need qualifications such as “fixed” is a relative term here, etc. With “multiwords/multiword expressions” also, there are stipulations (the notion of wordhood may not be applicable to every single language and in the same way) but since the term has been used for a long while, there is a bit of a shared understanding of this term, including about these stipulations. I am open to better terminology. Using just “expressions”, however, seems too vague and loses some generalizations about the idiosyncrasies that "multiword expressions” demonstrate. Every expression in not the same, “multiword expressions” show characteristics different from other expressions. I understand there is some fluidity also there when trying to distinguish between multiwords and non multiword expressions.
There are so many angles that one could look at language from. I don’t see anything wrong with the view that studies expressions covering all aspects as you suggest without distinguishing between expressions based on notions of wordhood. The task you suggest will help in developing understanding about language and how languages are similar or different and how they are used. I don’t think it disqualifies efforts that distinguish between “multiword expressions” and non-multiword expressions though, and the idiosyncrasies are not limited to morphology/syntax, idiosyncrasies are found in other linguistic aspects too when characterizing "multiword expressions”.
~ Archna
On Feb 9, 2023, at 11:17 AM, Ada Wan adawan919@gmail.com wrote:
Hi Archna, hi Kilian, hi all
Thanks for your replies.
TLDR on my part: I'd be fine going with "expressions" (instead of "fixed/idiomatic expressions"). Neither "word" nor "morphology/syntax" (apart from the ordering of elements and/or sequential patterns) is necessary in the analyses of such.
More specifically:
[@Archna] Re "fixed/idiomatic expressions": I don't think it matters much whether they are "fixed" or "idiomatic". A "fixed expression" is one that is usually more impervious to (lexical) change. One can measure this quality in a longitudinal study, e.g. in relation to other aspects of language change etc.. Re how "fixed" is "fixed": it's relative, much like many other aspects of language studies. By "idiomatic", one could mean that there is an element of idiosyncrasy (as "idiom"/"idioma").
The message that I am trying to get across is that "word" is a superflous category in the study of language. Would you mind please justifying why you need "words"?
The same goes for morphology, actually. In essence, morphological analyses involve selective decomposition, not decomposition of all decomposable units. Hence if one is only accounting for variations within an expression as a ((sub-)character) sequence involving "morphemes" (assuming definable rigorously) and discounting the changes in other parts of the sequence, that would be an incomplete analysis of the expression. Instead, one can just refer to expressions as "expressions", as e.g. sequences/strings of various lengths/vocabs in (sub-)characters --- such an account is also more flexible and accommodating to diverse languages/registers/modalities.
A study of "expressions" can cover all other aspects --- not just lexical but also functional ones. One doesn't need to incorporate/impose any ad hoc notions of "wordhood" in these studies.
Suggestion: I believe there are many more interesting tasks in this area, instead of trying to find/define "words" within expressions, or to "parse" them according to some structuralist assumptions (i.e. morphologically/syntactically). For example, the community could start (some multi-year project) building an international multilingual parallel (note: not everything would be parallelizable) database of all expressions and terminologies ever existed with contextual (historical/cultural/social) information and start verifying their sources and status of current use. (Just be aware, though, that one is not reinforcing values that shouldn't be further emphasized / transfered to posterity --- as an ethical consideration. So if something is in the grey area now, document clearly what the current attitudes towards a certain value are, so posterity can look back and evaluate with respect to their point of view.)
Counter questions to Archna: What are the motivations behind your suggestion to access/interpret language using "words"? How do you define "words" and justify the sufficiency/necessity of morphology/syntax in relation to the study of these expressions, esp. when the morphological decomposition of these expressions is arbitrary and helps little (or not at all) with explanation or prediction?
Re "complex lexical terms", @Kilian: I'm just wondering what kind of terms that would be considered "terms" that wouldn't be considered lexical (I was tempted to add "lexical" to "expressions" as well, but thought that might be a bit redundant)? It depends on how one defines "terms", of course. And how "complex" are expressions really? They are just more calcified units after all, aren't they? (Why do we/some always seem to want to add the term "complex" to everything? Things that aren't "complex" are also worthy of studying!)
Curious what you think...
Thanks and best Ada
Why I'm advocating #noWords: Fairness in Representation for Multilingual NLP: Insights from Controlled Experiments on Conditional Language Modeling https://openreview.net/forum?id=-llS6TiOew
https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fopenreview.net%2Fforum%3Fid%3D-llS6TiOew&data=05%7C01%7Cabhatia%40ihmc.org%7C3d437044e42f42c2c61408db0ab92ccb%7C2b38115bebad4aba9ea3b3779d8f4f43%7C1%7C0%7C638115562691707319%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=jea7YNI7295cJ2CY0jwxrsjID7DcDqerqI3IQxj9hUc%3D&reserved=0 https://drive.google.com/file/d/1eKbhdZkPJ0HgU1RsGXGFBPGameWIVdt9/view
On Thu, Feb 9, 2023 at 3:27 PM Mike Scott via Corpora < corpora@list.elra.info> wrote:
I must say I'm perfectly happy with "multi-word expression", or "multi-word unit".
I feel sympathy with Archna's post (and incidentally wish Archna didn't have to go through a friend!) Cheers -- Mike
--
Mike Scott lexically.net http://lexically.net/ Lexical Analysis Software and Aston University
Corpora mailing list -- corpora@list.elra.info https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/ To unsubscribe send an email to corpora-leave@list.elra.info
-- You received this message because you are subscribed to the Google Groups "MWE Workshop 2023 Organizers" group. To unsubscribe from this group and stop receiving emails from it, send an email to mweworkshop2023+unsubscribe@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/mweworkshop2023/CAB7Mis_GSyFjZOVw_XWp431VM... . For more options, visit https://groups.google.com/d/optout .
-- Archna Bhatia, Ph.D. Research Scientist, Institute for Human & Machine Cognition 15 SE Osceola Ave, Ocala, FL 34471 (352) 387-3061
Hi Archna
Thanks for your reply.
Your justification for the continued use of "MWEs"/"words" is based on history and shared understanding (from 09Feb2023: "since the term has been used for a long while, there is a bit of a shared understanding of this term, including about these stipulations"), but both of these criteria are achievable with alternative formulations.
Re "the category of items, of which idioms is a subset, has been referred to as multiwords for a long time": "MWE" does not have that long a history --- what is the earliest use of "MWEs" that you have in your records? And even if terms have been used for a long while, it doesn't mean that we cannot change them for the better, esp. when they have been inappropriately adopted or found outdated. What objections do you have to "lexical expressions", for example?
The issue/problem with "word" is that, aside from it not being necessary or sufficient in the study of language or in computing, there is also an implicit, shared understanding that it is arbitrary, redundant, and indeterminate. (This applies also to the notion of wordhood within one language.) The indeterminacy part is also evident in your not having provided me with a definition of "words" thus far. Furthermore, if, as you confirmed earlier, "the notion of wordhood may not be applicable to every single language and in the same way", then how should "words" be robust enough for computational processing?
Re emojis: here are some examples of emoji combinations that show a sense of idiosyncrasy when they (co-)occur:
🤩 for "star-struck" (from https://unicode.org/emoji/charts/full-emoji-list.html)
Or from https://www.elitedaily.com/lifestyle/funny-emoji-combinations-tiktok:
👉 👈 (feeling shy/simping)
🚪🏃‍♀️💨 (time to leave)
🍿🤏😯 (when drama is happening/when something is going down)
👁👄👁 (blank stare)
🕳👨‍🦯 (I didn't see anything)
👩🤏👩‍🦲 (wig snatched)
🐂💩 (bullsh*t)
My concern is about "wordhood" in the "language space" (science/engineering/technology) in general, not just about lexical expressions. I do think, however, that SIGLEX could help play an important role in effecting some positive changes in this regard.
----------
Hi Kilian
Let's suppose that what we have thus far known as "grammar" (the one that has been based on or related to "words" or "sentences", i.e. morphology/syntax (and some phonology)) can be decomposed into (sequential) ordering and linguistic attitudes/normativity [1]. I do think judgments/attitudes play a role in language as it exists in the social world and can affect, or even determine, how registers/styles etc. are defined, but I also think that a more rigorous science of (the remaining aspects of) language is possible if we were to separate such attitudes/prescriptivism from a more descriptive stance (e.g. in the direction of information sciences).
Once we remove the attitudes/normativity part from the science of language, lexical and contextual information as well as function/use remain.
The reason why I hesitated in referring to MWEs as "complex" is that (lexical) "complexity" can be broken down into vocabulary and length, with use/frequency accounting for the pragmatic/functional part. Hence every expression (or any character string) is lexical. The element of idiosyncrasy/idiomaticity is really a pragmatic one (e.g. in the rarity/archaic-ness/uniqueness of the use of the expressions/segment/span or character n-grams). So "sing" can be seen as a lexical expression, just like "bing" or "ping". Let's not forget that (even according to traditional grammatical analyses) various linguistic effects can happen to expressions when they undergo frequent use over an extended period of time. E.g. "ping me" may be seen thus far as relatively more idiomatic than "sing me a song", but that's due to the former expression being more specialized, less general, or rarer in use. Also, e.g. in a conversation, if one said "sing me" and the other didn't quite catch the first bit of the phrase, they might ask "[s] or [p]?" or "'s' or 'p'?". And one can well imagine that if this comes into use more frequently, "s" and "p" could be regarded as what we'd now interpret as "idiomatic". Hence "sing" does not have to be seen as a "single morpheme".
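To make the decomposition above concrete, here is a minimal sketch in Python of one possible reading of it, treating any expression as a plain character string described by its length, its vocabulary (distinct characters), and its frequency of use. The function names `describe` and `char_ngrams` and the toy corpus are assumptions invented for illustration, not anything proposed in this thread or taken from an existing tool.

```python
from collections import Counter


def describe(expression: str, corpus: list[str]) -> dict[str, int]:
    """Summarize a character string by length, vocabulary, and frequency of use."""
    return {
        "length": len(expression),           # number of code points
        "vocabulary": len(set(expression)),  # number of distinct code points
        # non-overlapping occurrences summed over all texts in the corpus
        "frequency": sum(text.count(expression) for text in corpus),
    }


def char_ngrams(text: str, n: int) -> Counter:
    """Count every character n-gram of size n in a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Hypothetical toy corpus, made up purely for this illustration.
toy_corpus = [
    "ping me later",
    "sing me a song",
    "please ping me",
    "sing me a song again",
]

# No notion of "word" or "morpheme" is needed: "s" is described the same way as "sing me a song".
for expression in ("ping me", "sing me a song", "sing", "s"):
    print(expression, describe(expression, toy_corpus))
```

Nothing here segments the strings into "words"; relative idiomaticity would, under this reading, have to be estimated from frequency and context in a real corpus rather than from this toy data.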
[1] I have tweeted this before on 28Jan2023: https://twitter.com/adawan919/status/1619401653962297344?cxt=HHwWgMDS0a3Oovk... In a way, I am reinterpreting "(non)-compositionality" as normalization/frequency effects via the decomposed view of "grammar" above.
-----------------------------------------------------------------------------------------
*Hence, my proposal (not just for MWE workshop folks but perhaps for all who might be interested) would be: https://docs.google.com/document/d/1n4QRn0CxbVMj6kbLWo-byT3S26ODJOjicU-ZYw7jRG0/edit?usp=sharing*
*Comments welcome.* Thanks and best Ada
On Sat, Feb 11, 2023 at 2:01 PM Kilian Evang kilian.evang@gmail.com wrote:
Hi Ada,
The problem I have with the term "expression" without further qualification is that to my mind it includes any kind of linguistic sign, including ones like "to pay a visit to my dear aunt Ruth" which can clearly be interpreted compositionally. So I think we do have to specify "lexical" to delineate what we are studying in the MWE community. "Lexical item" or, sure, "lexical expression". Either would also include signs, of course. I do also feel we have to add "complex" or similar, because otherwise it includes single-morpheme lexical expressions like "sing".
Cheers, Kilian
On Fri, Feb 10, 2023 at 11:32 PM Ada Wan adawan919@gmail.com wrote:
Hi Archna
"Idioms"/"Idiomatic expressions" are established terms in the study of language [1], with a longer history than MWE [2]. "Fixed", e.g. in "fixed phrases", is mentioned in, inter alia, [3], which was the earliest cite from the earliest work on MWEs in the ACL Anthology [4]. If I understand correctly, "MWEs" was a term so coined in order to establish a practice based on "words" (if anyone should view this differently, please do correct me here).
You're right, the task I suggested can be seen as orthogonal to distinguishing between lexical expressions and non-lexical expressions. I think it's important to document also the contexts surrounding expressions, instead of just picking expressions out and studying them in an isolated manner. It was just a suggestion for those who might be interested in building a multilingual parallel lexical database as well as those who might want to get a more holistic understanding of language while weaning oneself off "words" --- now that it's become even more obvious how superfluous the term/concept is.
[1] See e.g. https://en.wikipedia.org/wiki/Phraseme
[2] "Idiomatic expression" is just another formulation of "idiom" (see https://www.thefreedictionary.com/idiomatic+expression). According to Collins English Dictionary (accessed via https://www.thefreedictionary.com/idiom), "idiom" stems from the 16th century Latin idiōma, denoting "peculiarity of language".
[3] Nunberg, Geoffrey, Ivan A. Sag, and Thomas Wasow. 1994. Idioms. Language, 70:491–538. https://doi.org/10.2307/416483 (Many older references on "idioms" by linguists can be found therein.)
[4] Ann Copestake, Fabre Lambeau, Aline Villavicencio, Francis Bond, Timothy Baldwin, Ivan A. Sag, and Dan Flickinger. 2002. Multiword expressions: linguistic precision and reusability. In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02), Las Palmas, Canary Islands - Spain. European Language Resources Association (ELRA).
Hi Kilian
Sorry about my oversight on "item". I do think "item" could be better than "term" in this case, but it does carry a sense of "a single element", a more discrete "singleton". It's ok to combine it with "complex" to mitigate the sense of "singleton", but then "complex" as you suggested is dependent on morphology, which can be problematic.
Re "lexical": sure. (I think there have been so many different views/traditions/conventions among linguists and computational linguists in the past, we don't necessarily have to agree on how we or our definitions/methods might differ or might have differed, as long as we have the same goal now?)
One argument for "expressions" would be that they could include a sign (e.g. hand sign in motion).
So how about updating "MWEs" to:
i. "lexical expressions", or
ii. "lexical expressions (of one character or more when written)*", or
iii. [i] or [ii] without "lexical", or
iv. others?
* I'm trying to incorporate how expressions with emojis would/should be treated too.
What do you all think?
Thanks and best Ada
On Fri, Feb 10, 2023 at 10:58 AM Kilian Evang via Corpora <corpora@list.elra.info> wrote:
Forwarded message from Archna below
---------- Forwarded message --------- From: Archna Bhatia abhatia@ihmc.org Date: Thu, Feb 9, 2023 at 7:58 PM Subject: Re: [Corpora-List] Deadline extension: 19th Workshop on Multiword Expressions (MWE 2023) To: Ada Wan adawan919@gmail.com, Kilian Evang kilian.evang@gmail.com Cc: Mike Scott mike@lexically.net, mweworkshop2023@googlegroups.com, corpora@list.elra.info
Thanks, Ada. I think using the terms “fixed” and “idiomatic” makes the category appear more restrictive, and they would need qualifications, such as that “fixed” is a relative term here, etc. With “multiwords/multiword expressions” also, there are stipulations (the notion of wordhood may not be applicable to every single language and in the same way), but since the term has been used for a long while, there is a bit of a shared understanding of this term, including about these stipulations. I am open to better terminology. Using just “expressions”, however, seems too vague and loses some generalizations about the idiosyncrasies that “multiword expressions” demonstrate. Every expression is not the same; “multiword expressions” show characteristics different from other expressions. I understand there is also some fluidity when trying to distinguish between multiword and non-multiword expressions.
There are so many angles that one could look at language from. I don’t see anything wrong with the view that studies expressions covering all aspects, as you suggest, without distinguishing between expressions based on notions of wordhood. The task you suggest will help in developing understanding about language and how languages are similar or different and how they are used. I don’t think it disqualifies efforts that distinguish between “multiword expressions” and non-multiword expressions, though. The idiosyncrasies are not limited to morphology/syntax; they are found in other linguistic aspects too when characterizing “multiword expressions”.
~ Archna
On Feb 9, 2023, at 11:17 AM, Ada Wan adawan919@gmail.com wrote:
Hi Archna, hi Kilian, hi all
Thanks for your replies.
TLDR on my part: I'd be fine going with "expressions" (instead of "fixed/idiomatic expressions"). Neither "word" nor "morphology/syntax" (apart from the ordering of elements and/or sequential patterns) is necessary in the analyses of such.
More specifically:
[@Archna] Re "fixed/idiomatic expressions": I don't think it matters much whether they are "fixed" or "idiomatic". A "fixed expression" is one that is usually more impervious to (lexical) change. One can measure this quality in a longitudinal study, e.g. in relation to other aspects of language change etc. Re how "fixed" is "fixed": it's relative, much like many other aspects of language studies. By "idiomatic", one could mean that there is an element of idiosyncrasy (as "idiom"/"idioma").
The message that I am trying to get across is that "word" is a superfluous category in the study of language. Would you mind please justifying why you need "words"?
The same goes for morphology, actually. In essence, morphological analyses involve selective decomposition, not decomposition of all decomposable units. Hence if one is only accounting for variations within an expression as a ((sub-)character) sequence involving "morphemes" (assuming these can be rigorously defined) and discounting the changes in other parts of the sequence, that would be an incomplete analysis of the expression. Instead, one can just refer to expressions as "expressions", as e.g. sequences/strings of various lengths/vocabs in (sub-)characters --- such an account is also more flexible and accommodating to diverse languages/registers/modalities.
A study of "expressions" can cover all other aspects --- not just lexical but also functional ones. One doesn't need to incorporate/impose any ad hoc notions of "wordhood" in these studies.
Suggestion: I believe there are many more interesting tasks in this area, instead of trying to find/define "words" within expressions, or to "parse" them according to some structuralist assumptions (i.e. morphologically/syntactically). For example, the community could start (some multi-year project) building an international multilingual parallel (note: not everything would be parallelizable) database of all expressions and terminologies that have ever existed, with contextual (historical/cultural/social) information, and start verifying their sources and status of current use. (Just take care, though, that one is not reinforcing values that shouldn't be further emphasized / transferred to posterity --- as an ethical consideration. So if something is in the grey area now, document clearly what the current attitudes towards a certain value are, so posterity can look back and evaluate with respect to their point of view.)
Counter questions to Archna: What are the motivations behind your suggestion to access/interpret language using "words"? How do you define "words" and justify the sufficiency/necessity of morphology/syntax in relation to the study of these expressions, esp. when the morphological decomposition of these expressions is arbitrary and helps little (or not at all) with explanation or prediction?
Re "complex lexical terms", @Kilian: I'm just wondering what kind of "terms" there would be that wouldn't be considered lexical (I was tempted to add "lexical" to "expressions" as well, but thought that might be a bit redundant). It depends on how one defines "terms", of course. And how "complex" are expressions really? They are just more calcified units after all, aren't they? (Why do we/some always seem to want to add the term "complex" to everything? Things that aren't "complex" are also worthy of study!)
Curious what you think...
Thanks and best Ada
Hi Ada,
Of course what counts as a morpheme or as a lexical expression, and what inventory of compositional rules one assumes, is subject to one's theory, to language change, and also to ad-hoc playful reinterpretation (that's what I would see your "s or p?" example as). But these are notions that need not be 100% precise in order to delineate a research area such as MWE. There can be a gray area as to what counts as an MWE and what doesn't. For example, many MWE researchers would probably not count a prefix verb like *beguile* as an MWE, simply because it fulfills all the criteria of wordhood in traditional Western NLP. But if we assume a wide definition such as "lexical expression consisting of more than one morpheme", it too would fall under the MWE label. In fact, it exhibits the same competition between a lexical/idiomatic reading and a compositional reading that is typical of more complex MWEs: "kick the bucket" could mean to kick the bucket or to die, "beguile" could mean to affect with guile or to deceive.
I would support a name change from MWE to CLE or similar, because I agree that "word" is not a very useful notion cross-linguistically. (Then again, the notion of MWE might still work okay if we assume Martin Haspelmath's retro-definition https://dlc.hypotheses.org/2621 of "word".)
Cheers, Kilian
On Sat, Feb 11, 2023 at 6:29 PM Ada Wan adawan919@gmail.com wrote:
Hi Archna
Thanks for your reply.
Your justification of the continual usage of "MWEs"/"words" is based on history and shared understanding (from 09Feb2023: "since the term has been used for a long while, there is a bit of a shared understanding of this term, including about these stipulations"), both of these criteria are achievable with alternate formulations.
Re "the category of items, of which idioms is a subset, has been referred to as multiwords for a long time": "MWE" does not have that long of a history --- what is the earliest use of "MWEs" that you have in your records? And even if terms have been used for a long while, it doesn't mean that we cannot change them for the better, esp. when they have been inappropriately adopted or found outdated. What objections do you have with "lexical expressions", for example?
The issue/problem with "word" is that, aside from it not being necessary or sufficient in the study of language or in computing, there is also an implicit, shared understanding that it is arbitrary, redundant, and indeterminate. (This applies also to the notion of wordhood within one language.) The indeterminacy part is evident in your not having provided me with a definition of "words" thus far as well. Furthermore, as you confirmed earlier: "the notion of wordhood may not be applicable to every single language and in the same way", then how should "words" be robust enough for computational processing?
Re emojis: here are some examples of emoji combinations that show a sense of idiosyncrasy when they (co-)occur: 🤩 for "star-struck" (from https://unicode.org/emoji/charts/full-emoji-list.html) Or from from https://www.elitedaily.com/lifestyle/funny-emoji-combinations-tiktok: 👉 👈 (feeling shy/simping) 🚪🏃♀️💨 (time to leave) 🍿🤏😯 (when drama is happening/when something is going down) 👁👄👁 (blank stare) 🕳👨🦯 (I didn't see anything) 👩🤏👩🦲 (wig snatched) 🐂💩 (bullsh*t)
My concern is on "wordhood" in the "language space" (science/engineering/technology) in general, not just on lexical expressions. I do think, however, that SIGLEX could help play an important role in effecting some positive changes in this regard.
Hi Kilian
Let's suppose that what we have thus far known as "grammar" (the one that has been based on or related to "words" or "sentences", i.e. morphology/syntax (and some phonology)) can be decomposed into (sequential) ordering and linguistic attitudes/normativity [1]. I do think judgments/attitudes play a role in language as it exists in the social world and can affect, or even determine, how registers/styles etc. are defined, but I also think that there is more rigorous science of (the remaining aspects of) language possible if we were to separate such attitudes/prescriptivism from a more descriptive stance (e.g. in the direction of information sciences).
Once we remove the attitudes/normativity part from the science of language, lexical and contextual information as well as function/use remain.
The reason why I hesitated in referring to MWEs as "complex" is because (lexical) "complexity" can be broken down into vocabulary and length, with use/frequency accounting for pragmatic/functional one. Hence every expression (or any character string) is lexical. The element of idiosyncrasy/idiomaticity is really a pragmatic one (e.g. in the rarity/archaic-ness/uniqueness of the use of the expressions/segment/span or character n-grams). So "sing" can be seen as a lexical expression, just like "bing" or "ping". Let's not forget that (even according to traditional grammatical analyses) various linguistic effects can happen to expressions when they undergo frequent use over an extended period of time. E.g. "ping me" may be seen thus far as relatively more idiomatic than "sing me a song", but that's due to the former expression being more specialized, less general, or rarer in use. Also, e.g. in a conversation, if one said "sing me" and the other didn't quite catch the first bit of the phrase, they might ask "[s] or [p]?" or "'s' or 'p'?". And one can well imagine that if this becomes in use more frequently, "s" and "p" can be regarded as what we'd now interpret as "idiomatic". Hence "sing" does not have to be seen as a "single morpheme".
[1] I have tweeted this before on 28Jan2023: https://twitter.com/adawan919/status/1619401653962297344?cxt=HHwWgMDS0a3Oovk... In a way, I am reinterpreting "(non)-compositionality" as normalization/frequency effects via the decomposed view of "grammar" above.
*Hence, my proposal (not just for MWE workshop folks but perhaps for all who might be interested) would be: https://docs.google.com/document/d/1n4QRn0CxbVMj6kbLWo-byT3S26ODJOjicU-ZYw7j... https://docs.google.com/document/d/1n4QRn0CxbVMj6kbLWo-byT3S26ODJOjicU-ZYw7jRG0/edit?usp=sharing*
*Comments welcome. * Thanks and best Ada
On Sat, Feb 11, 2023 at 2:01 PM Kilian Evang kilian.evang@gmail.com wrote:
Hi Ada,
The problem I have with the term "expression" without further qualification is that to my mind it includes any kind of linguistic sign, including ones like "to pay a visit to my dear aunt Ruth" which can clearly be interpreted compositionally. So I think we do have to specify "lexical" to delineate what we are studying in the MWE community. "Lexical item" or, sure, "lexical expression". Either would also include signs, of course. I do also feel we have to add "complex" or similar, because otherwise it includes single-morpheme lexical expressions like "sing".
Cheers, Kilian
Am Fr., 10. Feb. 2023 um 23:32 Uhr schrieb Ada Wan adawan919@gmail.com:
Hi Archna
"Idioms"/"Idiomatic expressions" are established terms in the study of language [1], with a longer history than MWE [2]. "Fixed", e.g. in "fixed phrases", is mentioned in, inter alia, [3], which was the earliest cite from the earliest work on MWEs in the ACL Anthology [4]. If I understand correctly, "MWEs" was a term so coined in order to establish a practice based on "words" (if anyone should view this differently, please do correct me here).
You're right, the task I suggested can be seen as orthogonal to distinguishing between lexical expressions or non-lexical expressions. I think it's important to document also the contexts surrounding expressions, instead of just picking expressions out and studying them in an isolated manner. It was just a suggestion for those who might be interested in building a multilingual parallel lexical database as well as those who might want to get a more holistic understanding of language while weaning oneself of "words" --- now that it's become even more obvious how superfluous the term/concept is.
[1] See e.g. https://en.wikipedia.org/wiki/Phraseme [2] "Idiomatic expression" is just another formulation of "idiom" (see https://www.thefreedictionary.com/idiomatic+expression). According to Collins English Dictionary (accessed via https://www.thefreedictionary.com/idiom), "idiom" stems from the 16th century Latin idiōma, denoting "pecularity of language". [3] Nunberg, Geoffrey, Ivan A. Sag, and Thomas Wasow. 1994. Idioms. Language, 70:491–538. https://doi.org/10.2307/416483 (Many older references on "idioms" by linguists can be found therein.) [4] Ann Copestake, Fabre Lambeau, Aline Villavicencio, Francis Bond, Timothy Baldwin, Ivan A. Sag, and Dan Flickinger. 2002. Multiword expressions: linguistic precision and reusability. In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02), Las Palmas, Canary Islands - Spain. European Language Resources Association (ELRA).
Hi Kilian
Sorry about my oversight on "item". I do think "item" could be better than "term" in this case, but it does carry a sense of "a single element", a more discrete "singleton". It's ok to combine it with "complex" to mitigate the sense of "singleton", but then "complex" as you suggested is dependent on morphology, which can be problematic.
Re "lexical": sure. (I think there have been so many different views/traditions/conventions among linguists and computational linguists in the past, we don't necessarily have to agree on how we or our definitions/methods might differ or might have differed, as long as we have the same goal now?)
One argument for "expressions" would be that they could include a sign (e.g. hand sign in motion).
So how about updating "MWEs" to: i. "lexical expressions", or ii. "lexical expressions (of one character or more when written)*", or iii. [i] or [ii] without "lexical", or iv. others?
- I'm trying to incorporate how expressions with emojis would/should be
treated too.
What do you all think?
Thanks and best Ada
On Fri, Feb 10, 2023 at 10:58 AM Kilian Evang via Corpora < corpora@list.elra.info> wrote:
Forwarded message from Archna below
---------- Forwarded message --------- Von: Archna Bhatia abhatia@ihmc.org Date: Do., 9. Feb. 2023 um 19:58 Uhr Subject: Re: [Corpora-List] Deadline extension: 19th Workshop on Multiword Expressions (MWE 2023) To: Ada Wan adawan919@gmail.com, kilian Evang <kilian.evang@gmail.com
Cc: Mike Scott mike@lexically.net, mweworkshop2023@googlegroups.com < mweworkshop2023@googlegroups.com>, corpora@list.elra.info < corpora@list.elra.info>
Thanks, Ada. I think using the terms “fixed” and “idiomatic” makes the category appear more restrictive, and would need qualifications such as that “fixed” is a relative term here, etc. With “multiwords/multiword expressions” also, there are stipulations (the notion of wordhood may not be applicable to every single language and in the same way) but since the term has been used for a long while, there is a bit of a shared understanding of this term, including about these stipulations. I am open to better terminology. Using just “expressions”, however, seems too vague and loses some generalizations about the idiosyncrasies that “multiword expressions” demonstrate. Not every expression is the same; “multiword expressions” show characteristics different from other expressions. I understand there is also some fluidity when trying to distinguish between multiword and non-multiword expressions.
There are so many angles that one could look at language from. I don’t see anything wrong with the view that studies expressions covering all aspects as you suggest without distinguishing between expressions based on notions of wordhood. The task you suggest will help in developing understanding about language and how languages are similar or different and how they are used. I don’t think it disqualifies efforts that distinguish between “multiword expressions” and non-multiword expressions though, and the idiosyncrasies are not limited to morphology/syntax; idiosyncrasies are found in other linguistic aspects too when characterizing “multiword expressions”.
~ Archna
On Feb 9, 2023, at 11:17 AM, Ada Wan adawan919@gmail.com wrote:
Hi Archna, hi Kilian, hi all
Thanks for your replies.
TLDR on my part: I'd be fine going with "expressions" (instead of "fixed/idiomatic expressions"). Neither "word" nor "morphology/syntax" (apart from the ordering of elements and/or sequential patterns) is necessary in the analyses of such.
More specifically:
[@Archna] Re "fixed/idiomatic expressions": I don't think it matters much whether they are "fixed" or "idiomatic". A "fixed expression" is one that is usually more impervious to (lexical) change. One can measure this quality in a longitudinal study, e.g. in relation to other aspects of language change etc. Re how "fixed" is "fixed": it's relative, much like many other aspects of language studies. By "idiomatic", one could mean that there is an element of idiosyncrasy (as "idiom"/"idioma").
The message that I am trying to get across is that "word" is a superfluous category in the study of language. Would you mind please justifying why you need "words"?
The same goes for morphology, actually. In essence, morphological analyses involve selective decomposition, not decomposition of all decomposable units. Hence if one is only accounting for variations within an expression as a ((sub-)character) sequence involving "morphemes" (assuming these can be rigorously defined) and discounting the changes in other parts of the sequence, that would be an incomplete analysis of the expression. Instead, one can just refer to expressions as "expressions", as e.g. sequences/strings of various lengths/vocabs in (sub-)characters --- such an account is also more flexible and accommodating to diverse languages/registers/modalities.
A study of "expressions" can cover all other aspects --- not just lexical but also functional ones. One doesn't need to incorporate/impose any ad hoc notions of "wordhood" in these studies.
Suggestion: I believe there are many more interesting tasks in this area, instead of trying to find/define "words" within expressions, or to "parse" them according to some structuralist assumptions (i.e. morphologically/syntactically). For example, the community could start (some multi-year project) building an international multilingual parallel (note: not everything would be parallelizable) database of all expressions and terminologies that have ever existed, with contextual (historical/cultural/social) information, and start verifying their sources and status of current use. (Just take care, though, not to reinforce values that shouldn't be further emphasized / transferred to posterity --- as an ethical consideration. So if something is in the grey area now, document clearly what the current attitudes towards a certain value are, so posterity can look back and evaluate with respect to their point of view.)
Counter questions to Archna: What are the motivations behind your suggestion to access/interpret language using "words"? How do you define "words" and justify the sufficiency/necessity of morphology/syntax in relation to the study of these expressions, esp. when the morphological decomposition of these expressions is arbitrary and helps little (or not at all) with explanation or prediction?
Re "complex lexical terms", @Kilian: I'm just wondering what kind of terms would be considered "terms" but wouldn't be considered lexical (I was tempted to add "lexical" to "expressions" as well, but thought that might be a bit redundant)? It depends on how one defines "terms", of course. And how "complex" are expressions really? They are just more calcified units after all, aren't they? (Why do we/some always seem to want to add the term "complex" to everything? Things that aren't "complex" are also worthy of studying!)
Curious what you think...
Thanks and best Ada
Why I'm advocating #noWords: Fairness in Representation for Multilingual NLP: Insights from Controlled Experiments on Conditional Language Modeling https://openreview.net/forum?id=-llS6TiOew
https://drive.google.com/file/d/1eKbhdZkPJ0HgU1RsGXGFBPGameWIVdt9/view
On Thu, Feb 9, 2023 at 3:27 PM Mike Scott via Corpora < corpora@list.elra.info> wrote:
I must say I'm perfectly happy with "multi-word expression", or "multi-word unit".
I feel sympathy with Archna's post (and incidentally wish Archna didn't have to go through a friend!) Cheers -- Mike
--
Mike Scott, lexically.net, Lexical Analysis Software and Aston University
Corpora mailing list -- corpora@list.elra.info https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/ To unsubscribe send an email to corpora-leave@list.elra.info
-- You received this message because you are subscribed to the Google Groups "MWE Workshop 2023 Organizers" group. To unsubscribe from this group and stop receiving emails from it, send an email to mweworkshop2023+unsubscribe@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/mweworkshop2023/CAB7Mis_GSyFjZOVw_XWp431VM... https://groups.google.com/d/msgid/mweworkshop2023/CAB7Mis_GSyFjZOVw_XWp431VMJJBo0BnPqjFsqqTP_sEE58Ezw%40mail.gmail.com?utm_medium=email&utm_source=footer . For more options, visit https://groups.google.com/d/optout.
-- Archna Bhatia, Ph.D. Research Scientist, Institute for Human & Machine Cognition 15 SE Osceola Ave, Ocala, FL 34471 (352) 387-3061
Hi Ada,
On Thu, Feb 9, 2023 at 5:17 PM Ada Wan adawan919@gmail.com wrote:
Re "complex lexical terms", @Kilian: I'm just wondering what kind of terms would be considered "terms" but wouldn't be considered lexical (I was tempted to add "lexical" to "expressions" as well, but thought that might be a bit redundant)? It depends on how one defines "terms", of course. And how "complex" are expressions really? They are just more calcified units after all, aren't they? (Why do we/some always seem to want to add the term "complex" to everything? Things that aren't "complex" are also worthy of studying!)
I suggested "complex lexical items", not "complex lexical terms".
"Lexical item" as in "meaning does not follow compositionally, so must be assumed to be in the lexicon".
"Complex" as in "consists of more than one morpheme".
Cheers, Kilian
Curious what you think...
Thanks and best Ada
Why I'm advocating #noWords: Fairness in Representation for Multilingual NLP: Insights from Controlled Experiments on Conditional Language Modeling https://openreview.net/forum?id=-llS6TiOew https://drive.google.com/file/d/1eKbhdZkPJ0HgU1RsGXGFBPGameWIVdt9/view (It took me a while for everything to sink in.)