Greetings,
InqBnB4 workshop: Inquisitiveness Below and Beyond the Sentence Boundary
Nancy (France), 20 June 2023, hosted by IWCS 2023
https://iwcs2023.loria.fr/inqbnb4-inquisitiveness-below-and-beyond-the-sent…
InqBnB is a workshop series bringing together researchers interested in the semantics and
pragmatics of interrogatives (questions or embedded interrogative clauses). This series was
originally organized by the Inquisitive Semantics Group of the Institute for Logic, Language
and Computation (ILLC) at the University of Amsterdam. As such, the workshop focuses mainly
on analyses using or related to inquisitive semantics.
After three successful editions in the Netherlands, we hope to open the inquisitive community
to a wider audience. The 4th edition is planned for 20 June 2023, just before IWCS 2023
(International Conference on Computational Semantics). As invited speakers we are welcoming
Wataru Uegaki (University of Edinburgh) and one other to be announced.
InqBnB4 invites submissions on original and unpublished research focussed on the properties
of inquisitive content. We are mainly interested in theoretical questions, formal models and
empirical work. But we are also welcoming papers based on statistical or neural models,
provided their main goal is to bring new insights regarding inquisitiveness.
Here are some examples of questions of interest:
* Which operators (connectives, quantifiers, modals, conditionals) generate inquisitiveness?
* How do these operators project the inquisitive content of their arguments?
* e.g. what triggers maximality, exhaustivity or uniqueness of readings?
* How does inquisitive content interact with informative content in compositional semantics?
* e.g. how do interrogative words interact with negative polarity items, free choice items,
indefinites or plurality?
* How do conventions of use interact with inquisitive content?
* e.g. how can non-answering responses (e.g. clarification questions) be handled?
* In which ways is pragmatics sensitive to inquisitive content?
* e.g. how do answer bias and ignorance inferences arise?
* What kind of discourse anaphora are licensed by inquisitive expressions?
* e.g. does dynamic inquisitive semantics manage to correctly derive donkey anaphora?
*Submission:*
Submission link on SoftConf:
https://softconf.com/iwcs2023/inqbnb4/
Submitted papers must not exceed eight (8) pages (not counting acknowledgements,
references and appendices). Accepted papers get an extra page in the camera-ready version.
Submitted papers should be formatted following the common two-column structure as used by
ACL. Please use the specific style-files or the Overleaf template for IWCS 2023, taken from
ACL 2021. Initial submissions should be fully anonymous to ensure double-blind reviewing.
The proceedings will be published in the ACL anthology.
*Important dates:*
* Submission deadline: 14 April
* Author notification: 12 May
* Camera ready: 9 June
* Workshop day: 20 June
*Organizers:*
* Valentin D. Richard [1], Loria, Université de Lorraine
* Philippe de Groote [2], Loria, INRIA Nancy - Grand Est
* Floris Roelofsen [3], ILLC, Universiteit van Amsterdam
*Programme committee:*
* Local chair: Valentin D. Richard, Université de Lorraine
* Chair: Floris Roelofsen, Universiteit van Amsterdam
* Lucas Champollion [4], New York University (NYU)
* Jonathan Ginzburg [5], Université Paris Cité
* Philippe de Groote [2], INRIA Nancy - Grand Est
* Jakub Dotlačil [6], Universiteit Utrecht
* Reinhard Muskens [7], Universiteit van Amsterdam
* Maribel Romero [8], Universität Konstanz
* Wataru Uegaki [9], University of Edinburgh
* Yimei Xiang [10], Rutgers Linguistics
[1] https://valentin-d-richard.fr/
[2] https://members.loria.fr/PdeGroote/
[3] https://www.florisroelofsen.com/
[4] https://champollion.com/
[5] http://www.llf.cnrs.fr/fr/Gens/Ginzburg
[6] http://www.jakubdotlacil.com/
[7] http://freevariable.nl/
[8] https://ling.sprachwiss.uni-konstanz.de/pages/home/romero/
[9] https://www.wataruuegaki.com/
[10] https://yimeixiang.wordpress.com/
CALL FOR PARTICIPATION
EVALITA 2023 Task - PoliticIT: Political ideology detection in Italian texts
Held as part of EVALITA 2023
<https://www.evalita.it/campaigns/evalita-2023/>, a periodic evaluation
campaign of Natural Language Processing and speech tools for the Italian
language
September 7th-8th 2023, Parma
Codalab link: https://codalab.lisn.upsaclay.fr/competitions/8507
Dear All,
We are inviting researchers and students to participate in the
shared-task PoliticIT:
Political ideology detection in Italian texts held as part of EVALITA 2023,
the evaluation campaign of Natural Language Processing and speech tools for
the Italian language.
The goal of this task is to extract political ideology information from
Italian texts. For this, an automatic document classification task on
clusters of texts is proposed. It consists of extracting the self-assigned
gender as a demographic trait, and the political ideology as a psychographic
trait from a set of texts written in Italian from several authors that
share those traits. Political ideology is considered as a binary and as a
multiclass problem. The PoliticIT shared task is based on a previous task
named PoliticES presented at IberLEF 2022 (García-Díaz et al., 2022b) where
the dataset was an extension of the PoliCorpus 2020 dataset (García-Díaz et
al., 2022a).
The participants will be provided with development, development_test, training
and test datasets in Italian. The corpus was collected between 2020 and
2022 from the Twitter accounts of politicians in Italy using the
UMUCorpusClassifier (García-Díaz et al., 2020). We created clusters of
texts mixing some of these extracted tweets in order to prevent ethical and
privacy issues about author profiling in Twitter. Consequently, all the
clusters are composed of texts written by different users that share all
the traits under evaluation. We labelled each cluster with its authors'
self-assigned gender (male, female) and political spectrum on two axes:
binary (left, right) and multiclass (left, moderate_left, moderate_right,
right). Moreover, the Twitter mentions of the politicians were anonymised
by replacing them with the token @user. In addition, mentions of other
Twitter accounts were also encoded as @user. Consequently, the traits cannot
be guessed trivially by reading politicians' names and searching for
information about them on the Internet. The dataset is composed of different
clusters with around 80-100 tweets.
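
The mention anonymisation described above can be sketched in a few lines of Python. This is only an illustration, not the organisers' actual preprocessing: the regular expression and the function name `anonymise_mentions` are assumptions, and the pattern simply treats any "@" followed by letters, digits or underscores as a mention.

```python
import re

# Assumed pattern (not from the announcement): "@" followed by
# letters, digits or underscores counts as a Twitter mention.
MENTION = re.compile(r"@\w+")


def anonymise_mentions(text: str) -> str:
    """Replace every @mention in the text with the placeholder token @user."""
    return MENTION.sub("@user", text)


print(anonymise_mentions("Grazie @mario_rossi e @Lega2023!"))
# prints: Grazie @user e @user!
```

Because politicians and other accounts are mapped to the same token, the output no longer distinguishes between the two kinds of mention.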
Moreover, in order to facilitate participation in the competition, a
Google Colab notebook will be provided. In this notebook, it is shown how
to load the development dataset and how to train three baseline models based
on logistic regression with a simple Bag-of-Words (BoW) model for each
trait (self_assigned_gender, ideology_binary and ideology_multiclass). In
addition, it is shown how to calculate the final F1-score of each model and
how to generate the final submission file. To download the data, the
notebook and participate, go to
https://codalab.lisn.upsaclay.fr/competitions/8507.
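
For readers who want to check their scores locally before opening the notebook, here is a minimal, dependency-free sketch of an F1 computation. It assumes macro averaging over the class labels, which this announcement does not specify; the function name `macro_f1` and the toy labels are illustrative only, and the official scorer on CodaLab remains the reference.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but the gold label was different
            fn[g] += 1  # gold label g was missed
    labels = set(gold) | set(pred)
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example on the binary ideology axis (left, right).
gold = ["left", "left", "right", "right"]
pred = ["left", "right", "right", "right"]
print(round(macro_f1(gold, pred), 3))
# prints: 0.733
```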
Best regards,
The PoliticIT 2023 organizing committee
References
- García-Díaz, J. A., Almela, Á., Alcaraz-Mármol, G., & Valencia-García,
  R. (2020). UMUCorpusClassifier: Compilation and evaluation of linguistic
  corpus for Natural Language Processing tasks. Procesamiento del Lenguaje
  Natural, 65, 139-142.
- García-Díaz, J. A., Colomo-Palacios, R., & Valencia-García, R. (2022a).
  Psychographic traits identification based on political ideology: An author
  analysis study on Spanish politicians’ tweets posted in 2020. Future
  Generation Computer Systems, 130(1), 59-74.
- García-Díaz, J. A., Jiménez Zafra, S. M., Martín Valdivia, M. T.,
  García-Sánchez, F., Ureña López, L. A., & Valencia García, R. (2022b).
  Overview of PoliticEs 2022: Spanish Author Profiling for Political
  Ideology. Procesamiento del Lenguaje Natural, 69, 265-272.
Important dates
- Release of development corpora: Jan 31, 2023
- Release of training corpora: Feb 7, 2023
- Release of test corpora and start of evaluation campaign: May 2, 2023
- End of evaluation campaign (deadline for runs submission): May 19, 2023
- Publication of official results: May 30, 2023
- Paper submission: Jun 14, 2023
- Review notification: Jul 10, 2023
- Camera ready submission: Jul 25, 2023
- EVALITA Workshop: Parma, Sep 7th-8th, 2023
- Publication of proceedings: Sep ??, 2023
Organizing committee
- Daniel Russo (Language and Dialogue Technologies group at Fondazione
  Bruno Kessler (FBK), UniTn)
- Salud María Jiménez-Zafra (SINAI research group, Universidad de Jaén,
  Spain)
- José Antonio García-Díaz (UMUTeam research group, Universidad de Murcia,
  Spain)
- Tommaso Caselli (Faculty of Arts, Rijksuniversiteit Groningen)
- Marco Guerini (Language and Dialogue Technologies group at Fondazione
  Bruno Kessler (FBK), UniTn)
- L. Alfonso Ureña-López (SINAI research group, Universidad de Jaén, Spain)
- Rafael Valencia-García (UMUTeam research group, Universidad de Murcia,
  Spain)
*Salud María Jiménez Zafra*
sjzafra(a)ujaen.es
Universidad de Jaén
Grupo de Investigación SINAI <http://sinai.ujaen.es/> | Departamento de
Informática
EPS Jaén, Edificio A3, Despacho 219
Campus Las Lagunillas s/n 23071 - Jaén | +34 953212992
*SpkAtt-2023: Shared Task on Speaker Attribution in German News Articles
and Parliamentary Debates*
We are happy to announce a new shared task on Speaker Attribution in
German, as part of the GermEval Campaign, co-located with the Conference
for Natural Language Processing (KONVENS 2023) in Ingolstadt, Germany,
in Sep 2023.
The goal of this shared task is the identification of speakers in
political debates and in news articles, and the attribution of speech
events to their respective speakers. Being able to identify this
information automatically, i.e., identifying who says what to whom, is a
necessary prerequisite for a deep semantic analysis of unstructured text.
For more information about the shared task, including the task settings,
datasets, evaluation metrics and link to the registration form, please
visit the shared task website on CodaLab
(https://codalab.lisn.upsaclay.fr/competitions/10431).
Important dates:
+ April 1, 2023 - Training and development data release
+ June 15, 2023 - Test data release (blind)
+ July 1, 2023 - Submissions open
+ July 31, 2023 - Submissions close
+ August 14, 2023 - System descriptions due
+ September 7, 2023 - Camera-ready system paper deadline
+ September 18-22, 2023 - Workshop at KONVENS 2023
Organising Team:
Ines Rehbein, Simone Ponzetto (U-Mannheim)
Fynn Petersen-Frey, Chris Biemann (U-Hamburg)
Josef Ruppenhofer, Annelen Brunner (IDS Mannheim)
Contacts: fynn.petersen-frey(a)uni-hamburg.de, rehbein(a)uni-mannheim.de
--
Fynn Petersen-Frey
Universität Hamburg
Language Technology Group (LT)
House of Computing and Data Science (HCDS)
* Apologies for cross-posting *
Call for papers: SemDial 2023
Submissions date: June 16th, 2023
Submissions website:
https://easychair.org/conferences/?conf=semdial2023marilogue
# CALL FOR PAPERS
SemDial 2023 -- MariLogue
The 27th Workshop on the Semantics and Pragmatics of Dialogue
16--17 August, University of Maribor, Slovenia
https://mezzanine.um.si/en/conference/semdial-2023-marilogue/
MariLogue will be the 27th edition of the SemDial workshop series,
which aims to bring together researchers working on the semantics and
pragmatics of dialogue in fields such as formal semantics and
pragmatics, computational linguistics, artificial intelligence,
philosophy, psychology, and neuroscience.
We welcome submissions with formal, computational and empirical
approaches to the semantics and pragmatics of dialogue, including, but
not limited to:
* the dynamics of agents' information states in dialogue
* common ground/mutual belief
* goals, intentions and commitments in communication
* turn-taking and interaction control
* semantic/pragmatic interpretation in dialogue
* dialogue and discourse structure
* categorisation of dialogue phenomena in corpora
* child-adult interaction
* language learning through dialogue
* gesture, gaze, and intonational meaning in communication
* multimodal dialogue
* interpretation and reasoning in spoken dialogue systems
* dialogue management
* designing and evaluating dialogue systems
* modelling miscommunication, disfluency and repair
* dialogue/interaction studies from a psychological perspective
* neuroscience of dialogue
* interactivist approaches to dialogue
* animal communication
# SUBMISSION INSTRUCTIONS:
Long papers: Authors should submit an anonymous paper of at most 8
pages of content (up to 2 additional pages are allowed for references).
Short papers: Authors should submit a non-anonymized paper of at most
2 pages of content (up to 1 additional page allowed for references).
Submission to this track can be non-archival on request.
Submissions should be pdf files and use the LaTeX
(https://2023.aclweb.org/downloads/acl2023.zip) or Word
(https://2023.aclweb.org/downloads/acl2023.docx) templates provided
for ACL 2023 submissions. The LaTeX template is readily available on
Overleaf
(https://www.overleaf.com/latex/templates/acl-2023-proceedings-template/qjdg…).
Concurrent submission policy: Papers that have been or will be
submitted to other meetings or publications must provide this
information, using a footnote on the title page of the submissions.
SemDial 2023 cannot accept work for publication or presentation that
will be (or has been) published elsewhere.
Submission is electronic, using the EasyChair conference management
system at our Easychair submission site
(https://easychair.org/conferences/?conf=semdial2023marilogue).
# IMPORTANT DATES:
Note: All deadlines are 11:59PM UTC-12:00 ("anywhere on Earth").
* Long paper submissions due: June 16th, 2023
* Notification: July 14th, 2023
* Short paper submissions due: July 21st, 2023
* Final versions due: August 4th, 2023
* Registration Deadline for presenters/refund deadline: August 4th, 2023
* MariLogue Conference: August 16th--17th, 2023
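
Since the deadlines are given in "anywhere on Earth" time, it can help to convert them to UTC before planning a submission. The following is a small sketch using only Python's standard library; the helper name `aoe_deadline_in_utc` is illustrative, and the example date is the long paper deadline listed above.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" (AoE) is the fixed offset UTC-12:00.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_in_utc(year: int, month: int, day: int) -> datetime:
    """Return 11:59 PM AoE on the given date, expressed in UTC."""
    local = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local.astimezone(timezone.utc)


# Long paper deadline: June 16th, 2023, 11:59 PM AoE.
print(aoe_deadline_in_utc(2023, 6, 16).isoformat())
# prints: 2023-06-17T11:59:00+00:00
```

In other words, a deadline of June 16th AoE closes just before noon UTC on June 17th.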
PhD student position in Neuro-Symbolic Synthesis and Reinforcement Learning
In this project we study a new approach to the synthesis of efficient communication schemes in multi-agent systems, trained via reinforcement learning. We combine symbolic methods with machine learning, in what is referred to as a neuro-symbolic system, where a neural network learns to produce programs in a symbolic language to solve a task at hand. We believe this combination of neural and symbolic methods will be an important next step in the development of AI beyond today's capabilities.
This project is a collaboration between researchers at the division of Data Science and AI <https://www.chalmers.se/en/departments/cse/research/dsai/Pages/default.aspx> at Chalmers and the Center for Linguistic Theory and Studies in Probability (CLASP) at Gothenburg University.
Deadline for applications: 28 February:
https://www.chalmers.se/en/about-chalmers/work-with-us/vacancies/
BRIEF DESCRIPTION
The University of Melbourne's Department of General Practice, in collaboration with the University of Melbourne’s School of Computing and Information Systems, invites applications for a 3.5-year funded PhD scholarship focused on the development and validation of NLP algorithms for the automatic extraction of adolescent risk-taking behaviours (e.g. substance use, risky sexual behaviours) and health conditions (e.g. depression, anxiety) from clinical notes.
SELECTION CRITERIA
Eligibility will be assessed on the following criteria:
* Applicants must be outstanding candidates with an Australian first-class undergraduate degree (or international equivalent) or a master's degree in Computer Science or a related discipline
* Applicants must demonstrate strong Python skills
* Applicants must have some experience with Natural Language Processing methods
* Applicants must demonstrate an interest in medical and health research with a focus on data analysis skills
* Applicants must demonstrate an openness to learn new things, versatility, creativity, problem solving skills, and attention to detail
* Applicants must demonstrate effective verbal and written English language communication skills
It is also desirable, but not necessary, that applicants have experience working with data derived from electronic health records.
FURTHER DETAILS AND HOW TO APPLY
For more information regarding this opportunity, see:
https://www.statsoc.org.au/forum-job-vacancies/13086448
To apply for this position, please email PCTU-enquiries(a)unimelb.edu.au with a cover letter addressing the selection criteria, a CV, and academic transcripts.
SCHOLARSHIP VALUE & DURATION
The scholarship consists of a living allowance of AU$33,000 per year (2023, indexed annually) for up to a maximum of 3.5 years. Additional "top-up" funding may be available.
Clinical text is growing rapidly as electronic health records become
pervasive. Much of the information recorded in a clinical encounter is
located exclusively in provider narrative notes, which makes them
indispensable for supplementing structured clinical data in order to better
understand patient state and care provided. The methods and tools developed
for the clinical domain have historically lagged behind the scientific
advances in the general-domain NLP. Despite the substantial recent strides
in clinical NLP, a substantial gap remains. The goal of this workshop is to
address this gap by establishing a regular event in CL conferences that
brings together researchers interested in developing state-of-the-art
methods for the clinical domain. The focus is on improving NLP technology
to enable clinical applications, and specifically, information extraction
and modeling of narrative provider notes from electronic health records,
patient encounter transcripts, and other clinical narratives.
Relevant topics for the workshop include, but are not limited to:
- Modeling clinical text in standard NLP tasks (tagging, chunking,
parsing, entity identification, entity linking/normalization, relation
extraction, coreference, summarization, etc.)
- De-identification and other handling of protected health information
- Structure of clinical documents (e.g., section identification)
- Information extraction from clinical text
- Integration of structured and textual data for clinical tasks
- Domain adaptation and transfer learning techniques for clinical data
- Generation of clinical notes: summarization, image-to-text, generation
of notes from clinical conversations, etc.
- Annotation schemes and annotation methodology for clinical data
- Evaluation techniques for the clinical domain
- Bias and fairness in clinical text
In 2023, Clinical NLP will encourage submissions from the following special
tracks:
- Clinical NLP in languages other than English
- Clinical NLP in low-resource settings
- Clinical NLP for clinical conversations (e.g., doctor-patient)
The 5th Clinical NLP Workshop will be co-located with ACL 2023
<https://2023.aclweb.org/> in Toronto, Canada - July 13 or 14, 2023.
Joint Track with MWE
Clinical NLP 2023 is also co-organizing a special track with the 19th
Workshop on Multiword Expressions (MWE 2023)
<https://multiword.org/mwe2023/>. The goal is to foster future synergies
that could address scientific challenges in the creation of resources,
models and applications to deal with multiword expressions and related
phenomena in the specialised domain of Clinical NLP. Submissions describing
research on multi-word expressions in the specialized domain of Clinical
NLP, especially introducing new datasets or new tools and resources, are
welcome.
Note that submissions to this track must be submitted to MWE, not Clinical
NLP by their earlier submission deadline, 20 Feb 2023. Please visit the MWE
2023 website <https://multiword.org/mwe2023/> for more details. Submissions
accepted to this “Multi-word expressions in Clinical NLP” special track
will have the opportunity to present their work first at MWE 2023 at EACL
and then also at Clinical NLP 2023 at ACL.
Shared Task
Clinical NLP 2023 is hosting the MEDIQA-Chat Shared Tasks
<https://sites.google.com/view/mediqa2023/clinicalnlp-mediqa-chat-2023> on
doctor-patient conversations, which focuses on the following tasks:
1. Dialogue2Note Summarization: Given a conversation between a doctor
and patient, participants are tasked with producing a clinical note
summarizing the conversation with one or multiple note sections (e.g.
Assessment, Past Medical History, Past Surgical History).
2. Note2Dialogue Generation: Given a clinical note, participants are
tasked with generating a synthetic doctor-patient conversation related to
the information described in the clinical note section(s).
Please visit the shared task website
<https://sites.google.com/view/mediqa2023/clinicalnlp-mediqa-chat-2023> to
register to participate and for additional information about the shared
tasks.
Submissions
Submissions may have a maximum length of eight (8) pages for long papers
and four (4) pages for short papers and shared task participant papers,
with unlimited pages for references and appendices. All submissions must be
made through OpenReview <https://openreview.net/> and follow ACL formatting
guidelines <https://acl-org.github.io/ACLPUB/formatting.html>.
The OpenReview submission site can be found here: OpenReview-ClinicalNLP
<https://openreview.net/group?id=aclweb.org/ACL/2023/Workshop/Clinical_NLP>
We encourage submissions of papers submitted to but not accepted by EACL
2023 <https://2023.eacl.org/>, ACL 2023 <https://2023.aclweb.org/>, or ACL
Rolling Review <https://aclrollingreview.org/>, as long as the topics are
relevant to Clinical NLP.
Important Dates
All deadlines are 11:59PM UTC-12:00 (anywhere on Earth
<https://www.timeanddate.com/time/zones/aoe>)
- Shared task registration opens: Tuesday, January 10, 2023
- Shared task release of training and validation sets: Friday, February 10, 2023
- Shared task release of the test sets: Wednesday, March 15, 2023
- Shared task run submission deadline: Friday, March 17, 2023
- Shared task release of official results: Friday, March 31, 2023
- Submission deadline (both general and shared task): Tuesday, May 2, 2023
- Notification of acceptance: Monday, May 22, 2023
- Final versions of papers due: Tuesday, June 6, 2023
- Pre-recorded video due: Monday, June 12, 2023
- Workshop: Thursday or Friday, July 13 or 14, 2023
Workshop Organizers
- Anna Rumshisky (UMass Lowell)
- Asma Ben Abacha (Microsoft)
- Kirk Roberts (University of Texas Health Science Center at Houston)
- Steven Bethard (University of Arizona)
- Tristan Naumann (Microsoft Research)
For inquiries, please contact:
clinical-nlp-workshop-organizers(a)googlegroups.com.
There is an opening for a funded PhD position at CIS LMU Munich to work on
NLP for climate change within the KLIMA-MEMES project at the MaiNLP lab,
with ample opportunities for collaboration. The deadline is today, but the
position remains open until filled. Interested? Check out the details and
apply as soon as possible:
https://mainlp.github.io/jobs/
Forwarded message from Archna below
---------- Forwarded message ---------
Von: Archna Bhatia <abhatia(a)ihmc.org>
Date: Sa., 11. Feb. 2023 um 01:57 Uhr
Subject: Re: [Corpora-List] Fwd: Deadline extension: 19th Workshop on
Multiword Expressions (MWE 2023)
To: Ada Wan <adawan919(a)gmail.com>
Cc: Kilian Evang <kilian.evang(a)gmail.com>, corpora(a)list.elra.info <
corpora(a)list.elra.info>
Thanks, Ada. My point was not that the term “multiword expressions”
predates the term “idioms/idiomatic expressions” but that the category of
items, of which idioms is a subset, has been referred to as multiwords for
a long time. It may not be the perfect terminology but there’s some shared
understanding what kinds of expressions or constructions constitute this
category labeled as multiwords/multiword expressions/multiword units. I
would like to see strong evidence of better suitability of a new term to
refer to this category before I adopt it.
BTW, I’m curious: do you have examples in mind which belong to this category
but show that wordhood might be a problematic notion? How frequent is this
phenomenon?
Also regarding emojis etc, I’m curious: are there combinations of emojis
that show some sort of idiosyncrasy when they cooccur? Or even combinations
of emojis and textual words or other utterances which show such
idiosyncratic behavior as we generally associate with “MWEs”? (It’s
possible but I had not thought of it until now and it would be interesting
to see that.)
BTW, are you planning on attending the MWE 2023 workshop? There would be a
lot of opportunity to discuss this with researchers currently working on
multiword expressions.
Thanks,
Archna
On Feb 11, 2023, at 4:02 AM, Ada Wan <adawan919(a)gmail.com> wrote:
Hi Archna
"Idioms"/"Idiomatic expressions" are established terms in the study of
language [1], with a longer history than MWE [2]. "Fixed", e.g. in "fixed
phrases", is mentioned in, inter alia, [3], which was the earliest cite
from the earliest work on MWEs in the ACL Anthology [4]. If I understand
correctly, "MWEs" was a term so coined in order to establish a practice
based on "words" (if anyone should view this differently, please do correct
me here).
You're right, the task I suggested can be seen as orthogonal to
distinguishing between lexical expressions or non-lexical expressions. I
think it's important to document also the contexts surrounding expressions,
instead of just picking expressions out and studying them in an isolated
manner. It was just a suggestion for those who might be interested in
building a multilingual parallel lexical database as well as those who
might want to get a more holistic understanding of language while weaning
oneself off "words" --- now that it's become even more obvious how
superfluous the term/concept is.
[1] See e.g. https://en.wikipedia.org/wiki/Phraseme
[2] "Idiomatic expression" is just another formulation of "idiom" (see
https://www.thefreedictionary.com/idiomatic+expression).
According to Collins English Dictionary (accessed via
https://www.thefreedictionary.com/idiom),
"idiom" stems from the 16th century Latin idiōma, denoting "peculiarity of
language".
[3] Nunberg, Geoffrey, Ivan A. Sag, and Thomas Wasow. 1994. Idioms.
Language, 70:491–538. https://doi.org/10.2307/416483
(Many older references on "idioms" by linguists can be found therein.)
[4] Ann Copestake, Fabre Lambeau, Aline Villavicencio, Francis Bond,
Timothy Baldwin, Ivan A. Sag, and Dan Flickinger. 2002. Multiword
expressions: linguistic precision and reusability. In Proceedings of the
Third International Conference on Language Resources and Evaluation
(LREC’02), Las Palmas, Canary Islands - Spain. European Language Resources
Association (ELRA).
------------------------------
Hi Kilian
Sorry about my oversight on "item". I do think "item" could be better than
"term" in this case, but it does carry a sense of "a single element", a
more discrete "singleton". It's ok to combine it with "complex" to mitigate
the sense of "singleton", but then "complex" as you suggested is dependent
on morphology, which can be problematic.
Re "lexical": sure. (I think there have been so many different
views/traditions/conventions among linguists and computational linguists in
the past, we don't necessarily have to agree on how we or our
definitions/methods might differ or might have differed, as long as we have
the same goal now?)
One argument for "expressions" would be that they could include a sign
(e.g. hand sign in motion).
So how about updating "MWEs" to:
i. "lexical expressions", or
ii. "lexical expressions (of one character or more when written)*", or
iii. [i] or [ii] without "lexical", or
iv. others?
* I'm trying to incorporate how expressions with emojis would/should be
treated too.
------------------------------
What do you all think?
Thanks and best
Ada
On Fri, Feb 10, 2023 at 10:58 AM Kilian Evang via Corpora <
corpora(a)list.elra.info> wrote:
> Forwarded message from Archna below
>
> ---------- Forwarded message ---------
> Von: Archna Bhatia <abhatia(a)ihmc.org>
> Date: Do., 9. Feb. 2023 um 19:58 Uhr
> Subject: Re: [Corpora-List] Deadline extension: 19th Workshop on Multiword
> Expressions (MWE 2023)
> To: Ada Wan <adawan919(a)gmail.com>, kilian Evang <kilian.evang(a)gmail.com>
> Cc: Mike Scott <mike(a)lexically.net>, mweworkshop2023(a)googlegroups.com <
> mweworkshop2023(a)googlegroups.com>, corpora(a)list.elra.info <
> corpora(a)list.elra.info>
>
>
> Thanks, Ada. I think using the terms “fixed” and “idiomatic” makes the
> category appear more restrictive, and would need qualifications such as
> “fixed” is a relative term here, etc. With “multiwords/multiword
> expressions” also, there are stipulations (the notion of wordhood may not
> be applicable to every single language and in the same way) but since the
> term has been used for a long while, there is a bit of a shared
> understanding of this term, including about these stipulations. I am open
> to better terminology. Using just “expressions”, however, seems too vague
> and loses some generalizations about the idiosyncrasies that "multiword
> expressions” demonstrate. Every expression is not the same, “multiword
> expressions” show characteristics different from other expressions. I
> understand there is some fluidity also there when trying to distinguish
> between multiwords and non multiword expressions.
>
> There are so many angles that one could look at language from. I don’t see
> anything wrong with the view that studies expressions covering all aspects
> as you suggest without distinguishing between expressions based on notions
> of wordhood. The task you suggest will help in developing understanding
> about language and how languages are similar or different and how they are
> used. I don’t think it disqualifies efforts that distinguish between
> “multiword expressions” and non-multiword expressions though, and the
> idiosyncrasies are not limited to morphology/syntax, idiosyncrasies are
> found in other linguistic aspects too when characterizing "multiword
> expressions”.
>
> ~ Archna
>
> On Feb 9, 2023, at 11:17 AM, Ada Wan <adawan919(a)gmail.com> wrote:
>
> Hi Archna, hi Kilian, hi all
>
> Thanks for your replies.
>
> TLDR on my part: I'd be fine going with "expressions" (instead of
> "fixed/idiomatic expressions"). Neither "word" nor "morphology/syntax"
> (apart from the ordering of elements and/or sequential patterns) is
> necessary in the analyses of such.
>
> -----
>
> More specifically:
>
> [@Archna] Re "fixed/idiomatic expressions": I don't think it matters much
> whether they are "fixed" or "idiomatic". A "fixed expression" is one that
> is usually more impervious to (lexical) change. One can measure this
> quality in a longitudinal study, e.g. in relation to other aspects of
> language change etc.. Re how "fixed" is "fixed": it's relative, much like
> many other aspects of language studies. By "idiomatic", one could mean that
> there is an element of idiosyncrasy (as "idiom"/"idioma").
>
> The message that I am trying to get across is that "word" is a superfluous
> category in the study of language. Would you mind please justifying why you
> need "words"?
>
> The same goes for morphology, actually. In essence, morphological analyses
> involve selective decomposition, not decomposition of all decomposable
> units. Hence if one is only accounting for variations within an expression
> as a ((sub-)character) sequence involving "morphemes" (assuming these can be
> defined rigorously) and discounting the changes in other parts of the sequence,
> that would be an incomplete analysis of the expression. Instead, one can
> just refer to expressions as "expressions", as e.g. sequences/strings of
> various lengths/vocabs in (sub-)characters --- such an account is also more
> flexible and accommodating to diverse languages/registers/modalities.
>
> A study of "expressions" can cover all other aspects --- not just lexical
> but also functional ones. One doesn't need to incorporate/impose any ad hoc
> notions of "wordhood" in these studies.
>
> Suggestion: I believe there are many more interesting tasks in this area,
> instead of trying to find/define "words" within expressions, or to "parse"
> them according to some structuralist assumptions (i.e.
> morphologically/syntactically). For example, the community could start
> (some multi-year project) building an international multilingual parallel
> (note: not everything would be parallelizable) database of all expressions
> and terminologies that have ever existed, with contextual (historical/cultural/social)
> information and start verifying their sources and status of current use.
> (Just take care, though, that one is not reinforcing values that shouldn't
> be further emphasized / transferred to posterity --- as an ethical
> consideration. So if something is in the grey area now, document clearly
> what the current attitudes towards a certain value are, so posterity can
> look back and evaluate with respect to their point of view.)
>
> Counter questions to Archna:
> What are the motivations behind your suggestion to access/interpret
> language using "words"? How do you define "words" and justify the
> sufficiency/necessity of morphology/syntax in relation to the study of
> these expressions, esp. when the morphological decomposition of these
> expressions is arbitrary and helps little (or not at all) with explanation
> or prediction?
>
> Re "complex lexical terms", @Kilian: I'm just wondering what kind of terms
> would be considered "terms" but wouldn't be considered lexical (I was
> tempted to add "lexical" to "expressions" as well, but thought that might
> be a bit redundant)? It depends on how one defines "terms", of course. And
> how "complex" are expressions really? They are just more calcified units
> after all, aren't they? (Why do we/some always seem to want to add the term
> "complex" to everything? Things that aren't "complex" are also worthy of
> study!)
>
> Curious what you think...
>
> Thanks and best
> Ada
>
> Why I'm advocating #noWords:
> Fairness in Representation for Multilingual NLP: Insights from Controlled
> Experiments on Conditional Language Modeling
> https://openreview.net/forum?id=-llS6TiOew
>
> https://drive.google.com/file/d/1eKbhdZkPJ0HgU1RsGXGFBPGameWIVdt9/view
>
> (It took me a while for everything to sink in.)
>
>
> On Thu, Feb 9, 2023 at 3:27 PM Mike Scott via Corpora <
> corpora(a)list.elra.info> wrote:
>
>> I must say I'm perfectly happy with "multi-word expression", or
>> "multi-word unit".
>>
>> I feel sympathy with Archna's post (and incidentally wish Archna didn't
>> have to go through a friend!)
>> Cheers -- Mike
>>
>> --
>>
>> Mike Scott
>> lexically.net
>> Lexical Analysis Software and Aston University
>>
>> _______________________________________________
>> Corpora mailing list -- corpora(a)list.elra.info
>> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
>> To unsubscribe send an email to corpora-leave(a)list.elra.info
>>
>
> --
> You received this message because you are subscribed to the Google Groups
> "MWE Workshop 2023 Organizers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to mweworkshop2023+unsubscribe(a)googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/mweworkshop2023/CAB7Mis_GSyFjZOVw_XWp431V…
> .
> For more options, visit https://groups.google.com/d/optout
> .
>
> --
> Archna Bhatia, Ph.D.
> Research Scientist, Institute for Human & Machine Cognition
> 15 SE Osceola Ave, Ocala, FL 34471
> (352) 387-3061
>
The Natural Language Learning Group (NLLG) at Bielefeld University, Germany, is looking for a PhD student in NLP to work on understanding the limitations of large language models (such as BERT, ChatGPT, etc.) in social/political science contexts. The position is jointly supervised by NLLG and the Semantic Computing Group at Bielefeld. More information and application details are here:
https://nl2g.github.io/positions
The application deadline is 23.02.2023.
For any questions, please contact steffen.eger(a)uni-bielefeld.de