2nd Call for Abstracts: 1st Workshop on Readability for Low Resourced Languages (RLRL 2023)
Free registration is now open at https://bit.ly/3pwUwlG; a few tickets are still available.
Please join us for an exciting online workshop where experts in natural language processing will come together to discuss the latest research and innovative approaches to assessing the readability of texts in low-resource languages. The workshop will take place as a free online event on September 5, 2023, and is hosted jointly by Lancaster University, Sheffield Hallam University and King Saud University.
We welcome researchers and practitioners to submit presentation abstract proposals of up to 500 words for talks related to the development of a Readability Framework for low-resource languages.
The ultimate goal of the workshop is to discuss best practices and state-of-the-art AI-based approaches to create mathematical representations of expected readability levels at different school grade or cognitive ability levels. The workshop will also focus on utilising classifiers that are intuitive for humans to understand and adjust, enabling the analysis and improvement of the decision-making criteria. We welcome abstracts on work that is still in progress or that does not yet have conclusive results. We encourage authors to share their work at various stages of development to facilitate discussions and collaboration during the workshop.
Important Dates:
- Due date for workshop abstract submission: August 1, 2023 (extended)
- Notification of abstract acceptance to authors: August 10, 2023
- Workshop date: September 5, 2023 (online event: https://bit.ly/3pwUwlG)
Keynote speakers:
- Professor Laurence Anthony - Faculty of Science and Engineering at Waseda University, Japan.
- Dr Violetta Cavalli-Sforza - School of Science and Engineering at Al Akhawayn University, Morocco.
- Professor Hend Al-Khalifa - College of Computer and Information Sciences at King Saud University, KSA.
- Dr Abdel-Karim Al Tamimi - Computer Science and Software Engineering at Sheffield Hallam University, UK.
- Dr Mo El-Haj - School of Computing and Communications at Lancaster University, UK.
For the list of speakers, talk titles and abstracts, please visit the workshop's website:
https://wp.lancs.ac.uk/acc/rlrl2023/
The main objectives of the workshop are three-fold:
1- Increase awareness of the importance of readability in low-resource languages and its impact on language learning and literacy.
2- Discuss the challenges of readability in low-resource languages, such as limited resources and lack of standardization, and brainstorm strategies for addressing these challenges.
3- Foster a community of practice among participants, allowing them to share their experiences and best practices for addressing readability issues in low-resource languages.
Abstract submission:
The abstract submission page is now open. Please submit abstracts of no more than 500 words at https://easychair.org/conferences/?conf=rlrl2023
Alternatively, you can contact the organisers directly with presentation ideas on topics related to readability or low resourced languages.
Topics of interest include, but are not limited to:
- Machine learning for text readability
- Applications of readability assessment
- Readability in low-resource languages
- Comprehensibility measures
- Mathematical representations of readability levels
- Text simplification for low-resource languages
- Readability and comprehensibility in language learning
- The effects of text simplification on readability
- Readability frameworks for indigenous languages
- Updating readability representations
We look forward to your contributions and to a productive and enlightening workshop on September 5, 2023.
RLRL 2023 Organisers:
- Dr Mo El-Haj (SCC/DSI/UCREL, Lancaster University)
- Dr Abdel-Karim Al Tamimi (CSSE, Sheffield Hallam University)
- Prof. Hend Al Khalifa (iWAN, King Saud University)
https://wp.lancs.ac.uk/acc/rlrl2023/
Best wishes,
Mahmoud
---------------------
Dr Mo El-Haj
Senior Lecturer in NLP
Co-Director of UCREL NLP Group
Strategic Lead of Arabic and Financial NLP Research
Advisory Board of the Natural Language Processing Journal
https://benjamins.com/catalog/nlp
School of Computing and Communications, Lancaster University
https://www.lancaster.ac.uk/staff/elhaj
@DocElhaj (https://twitter.com/DocElhaj)
*** First Call for Workshop Proposals ***
36th International Conference on Advanced Information Systems Engineering
(CAiSE 2024)
June 3-7, 2024, 5* St. Raphael Resort and Marina, Limassol, Cyprus
https://cyprusconferences.org/caise2024/
(*** Submission Deadline: October 13, 2023 AoE ***)
CAiSE is a well-established, highly visible conference series on Advanced Information Systems
(IS) Engineering. It covers all relevant topics in the area, including methodologies and
approaches for IS engineering, innovative platforms, architectures and technologies, and
engineering of specific kinds of IS. CAiSE conferences also have the tradition of hosting
workshops in related fields. Workshops are intended to focus on particular topics and provide
ample room for discussions of new ideas and developments.
CAiSE 2024, the 36th edition of the CAiSE series, invites proposals for workshops to be held in
conjunction with the main conference, related to the CAiSE topics, covering new emerging
topics and targeting innovative papers in special focus areas.
Prospective workshop organisers should specify whether they plan an event with a
presentation-oriented track, a discussion-oriented track, or both.
Presentation-oriented track
This track focuses on accepted papers with presentations followed by Q&A sessions, akin to
conferences. The proceedings of these workshops are intended to be published in a joint
volume in the Springer LNBIP series. Submissions must conform to the Springer LNCS/LNBIP
format and should not exceed 12 pages. According to the Springer standards, the overall
acceptance rate cannot exceed 45%-50%.
Discussion-oriented track
This track emphasizes discussions facilitated by paper presentations revolving around novel
ideas and early-stage research. Since the main criterion for paper acceptance in such
workshops is relevance and potential for raising discussion, they are not expected to have
their proceedings in the Springer LNBIP volume.
A joint proceedings volume for the discussion-oriented track, separate from the LNBIP
volume, is being prepared. These proceedings will be submitted to CEUR-WS.org for online
publication. Details on this aspect will be provided separately.
Proposal submission
Workshop proposals should be submitted via EasyChair at the following address: https://easychair.org/conferences/?conf=caise2024
Please select the Author role and the CAiSE 2024 Workshops track.
Prior contact with the workshop chairs (caise2024-workshops(a)easychair.org) is encouraged.
The organizer(s) of approved workshops will be responsible for advertising their workshop,
eliciting high-quality submissions, organizing the reviewing process of their workshop’s
papers according to the principles and guidelines of CAiSE, and collecting camera-ready
copies of accepted papers (verifying that they comply with the formatting rules). Organizers
(including co-organizers) are expected to attend their entire workshop.
Detailed instructions for workshop proposers
The proposal (up to 1000 words) should cover the following points:
• Workshop title, duration (1 day or 2 days), preferred date (3-4 June 2024)
• Workshop type (presentation-oriented, discussion-oriented, or both).
• Information on the organizers (PC chairs, other organizers who will be present at the workshop or are otherwise involved, including the person responsible for web presence and communication). Please include names, addresses, and affiliations, indicating the main responsible person. The submission should include a one-paragraph biographical sketch for each organizer, describing relevant qualifications and experience. Please specify at least one PC chair. PC chairs will not be allowed to submit papers to the workshop, but other organizers (who will have no oversight over the review process) are encouraged to do so.
• Purpose: What are the main goals of the workshop? Please list the workshop topics. How does the focus of the workshop differ from the main conference? How does the focus of the workshop differ from other potential CAiSE events? (Proponents are advised to look at the workshops and working conferences held at CAiSE 2022 and CAiSE 2023 and to differentiate their scope from theirs.)
• Organization of the workshop: Specify the type of contributions, distribution into sessions, type of sessions, etc. Mention if you plan to have any keynote speaker (please note that the conference organization will not cover fees, travel expenses, accommodation and registration costs of keynote speakers). Include any special requirements regarding infrastructure and room layout.
• Tentative list of PC members.
• An estimate of the number of papers to be accepted, and the number of attendees. If applicable, short information on previous editions of the workshop series (this should include submission, acceptance, and attendance information). Short information on your plans for advertising your workshop and making it highly visible.
Services provided by CAiSE
• EasyChair installation for the management of the workshop submissions (each organizer will be made chair of their own workshop).
• Publication of papers in an LNBIP volume for presentation-oriented tracks, and in a CEUR-WS.org online volume for discussion-oriented tracks.
• One free workshop-only registration if more than 10 people are registered for the workshop. Organizers willing to attend the whole event (main conference) will have to register for the conference at their own expense.
• Local organizational infrastructure and administrative support (registration, badges, refreshments, beamers, screens, etc.). In particular, all venue issues (rooms, meals and catering, social dinner, etc.) as well as the management of the registrations and the financing/administrative issues will be handled by the CAiSE Organization Board and are not under the responsibility of the workshop organizers.
• Advertisement of the workshop on the CAiSE 2024 homepage and mailings.
Please note that the workshop may be canceled if the number of registrations is less than 10.
Also, if two or more workshops have similar topics, they may be invited to
merge.
Key Dates
• Submission of workshop proposals (via EasyChair): October 13th, 2023
• Workshop notifications: November 3rd, 2023
• Workshop paper submission (tentative, recommended for presentation-oriented tracks): March 6th, 2024
• Workshop paper decision (recommended for presentation-oriented tracks): April 3rd, 2024
• Camera-ready due (recommended for presentation-oriented tracks): April 22nd, 2024
• Workshops: June 3rd-4th, 2024
Contact
For more information and inquiries, please feel free to contact the Workshop
Chairs at the following address: caise2024-workshops(a)easychair.org
**** Call for Papers ****
*31st Irish Conference on Artificial Intelligence and Cognitive Science
(AICS23)*
December 7th – 8th 2023 - Atlantic Technological University, Letterkenny,
Co. Donegal
Conference Website: https://www.aics.ie/
Submission Deadline: 24th September 2023
The 31st Irish Conference on Artificial Intelligence and Cognitive Science
(AICS 2023) will be hosted by Atlantic Technological University (ATU) in
collaboration with Ulster University. The conference will be held in-person
at the Letterkenny campus of ATU Donegal. We are also honoured that IEEE UK
and Ireland Computational Intelligence Chapter will provide technical
sponsorship of the conference. Accepted papers will be submitted for
inclusion into IEEE Xplore subject to meeting IEEE Xplore’s scope and
quality requirements.
With regular conferences dating back to 1988, the AICS Conference is
Ireland’s primary forum bringing together researchers in the fields of
Artificial Intelligence and Cognitive Science. The fields of Artificial
Intelligence and Cognitive Science, encompassing areas such as Data
Analytics, Natural Language Processing, Information Retrieval, and Machine
Learning, are now at the forefront of Irish computing research and
industry. The AICS 2023 program will include presentations of high-quality
theoretical and applied scientific papers and tutorials. Invited talks will
describe important topics relevant to the field.
*Topics of Interest*
We invite submissions in the broad areas of Artificial Intelligence and
Cognitive Science to communicate the advances and achievements in these
fields. Areas of interest include, but are not restricted to:
- Machine Learning
- Natural Language Processing
- Computer Vision
- Automated Reasoning
- Robotics
- Deep Learning
- Ethics of AI
- Cognitive Psychology
- Explainability / Interpretability
- Cyberpsychology
- Intelligent Systems
- Opinion Mining
- Recommender Systems
- Machine Translation
For a full list of areas associated with this conference please visit the
conference website at: https://www.aics.ie/
*Submission*
We invite two types of submissions:
- Full Paper Track: Full paper submissions should consist of original
contributions (describing either fundamental research, interesting
applications, in-use experiences, or reviews of the field) not published in
other forums. Papers should not exceed 8 pages (including references), and
accepted full papers will be presented orally at the conference. Papers
over 6 pages will be charged €90 per extra page, up to a maximum of two
extra pages (8 pages maximum).
- Short Paper Track: This track is designed for students who have
recently completed a Master’s program and for PhD students. Short paper
submissions should consist of original contributions (describing
interesting applications or theoretical research), and the first author must
be a student. Papers should be a maximum of 4 pages (including references),
and accepted short papers will be presented as posters at the conference.
The paper submission system used is EasyChair and the submission link is:
https://easychair.org/conferences/?conf=aics231
All papers will be reviewed by the Program Committee on the basis of
technical quality, relevance to AICS23, originality, significance and
clarity.
For full submission instructions please visit:
https://www.aics.ie/submission.html
*Organisation*
*Conference Chairs*
Dr Kevin Meehan (Atlantic Technological University) - General Chair
Dr Muskaan Singh (Ulster University) - Co-chair
Dr Karla Muñoz Esquivel (Atlantic Technological University) - Co-chair
Dr Fatemeh Golpayegani (University College Dublin) - Co-chair
*Organising Committee*
Dr Jennifer Hyndman (Atlantic Technological University)
Professor Michaela Black (Ulster University)
Dr Gerry McWilliams (Atlantic Technological University)
Dr Bryan Gardiner (Ulster University)
Professor Hujun Yin (University of Manchester)
Thank you on behalf of the chairs and organising committee.
The Natural Language Processing Chair at JMU Würzburg (WüNLP,
https://www.informatik.uni-wuerzburg.de/nlp/wuenlp/), a member of the Center
for AI and Data Science (CAIDAS, https://www.uni-wuerzburg.de/caidas/),
offers one research position in the area of natural language processing
(NLP).
The position is bound to the project EQUIFAIR (Equitably Fair and
Trustworthy Language Technology), funded by the Alcatel-Lucent Stiftung as
part of the “Responsible AI” program. The project targets two major issues
related to *alignment of large language models* (LLMs): hallucinations and
societal biases, to be addressed in a (massively) multilingual context
(i.e., across a wide variety of natural languages) and in an
explainable/interpretable manner.
Applicants should have a doctoral degree in Computer Science,
Mathematics, or a related discipline. Substantial experience in modern NLP
(i.e., LLM-based) research is expected, as well as prior publications at
top-tier NLP/ML venues (e.g., ACL, EMNLP, NAACL, NeurIPS, ICLR, ICML)
demonstrating the ability to conduct research at the highest level. Prior
experience in any of the subareas relevant to the project, namely fair and
trustworthy NLP (hallucinations, biases), multilinguality, and/or
explainable/interpretable AI, is a plus.
The position is available for *two years*; the project will start as soon
as a suitable candidate for this position is found. The remuneration will
be at the level *E14* of the German federal wage agreement scheme (TV-L).
Please send your application with standard documents (letter of motivation,
curriculum vitae, academic records, optionally letter(s) of recommendation;
please concatenate all documents into a single PDF) at your earliest
convenience, but no later than *30.09.2023*, to Prof. Dr. Goran Glavaš
(goran.glavas(a)uni-wuerzburg.de). Potential candidates are also welcome to
contact Prof. Glavaš informally for additional information. We take
diversity seriously and *especially encourage female and diverse
candidates to apply*.
WüNLP is one of the leading German NLP research labs. We are a young and
energetic research group, part of CAIDAS, the ambitious AI research center
of the University of Würzburg. Würzburg is one of the most livable cities in
Germany.
More information:
https://www.informatik.uni-wuerzburg.de/nlp/news/single/news/open-position-…
(apologies for cross-posting)
The submission deadline for CRAC 2023 (https://sites.google.com/view/crac2023/),
the Sixth Workshop on Computational Models of Reference, Anaphora and
Coreference, held at EMNLP 2023 (https://2023.emnlp.org/), has been extended
to September 15, 2023.
The website has all the information, but here are a few highlights:
● we welcome papers on any aspect of coreference, anaphora or bridging,
whether it’s annotation, analysis or resolution, intra- or cross-document
● we have various categories of papers (research papers, survey papers,
position papers and more)
● we ran a new edition of the shared task on Multilingual Coreference
Resolution (https://ufal.mff.cuni.cz/corefud/crac23) using a harmonized
CorefUD dataset (https://ufal.mff.cuni.cz/corefud), extended this year; the
task is already over
(https://codalab.lisn.upsaclay.fr/competitions/11800#results), but the
resource is there, ready to be played with
Please spread the news to everyone who might still be interested.
Best regards,
Maciej Ogrodniczuk (also on behalf of Vincent Ng, Sameer Pradhan, and
Massimo Poesio)
*** Apologies for Cross-Posting ***
The ArabicNLP 2023 organization committee is pleased to announce a one-week extension of the paper submission deadline, to Tuesday, 12 September 2023, 23:59 AoE.
The First Arabic Natural Language Processing Conference (ArabicNLP 2023) is co-located with EMNLP 2023 in Singapore.
Conference URL: https://arabicnlp2023.sigarab.org/
Submission URL: https://openreview.net/group?id=SIGARAB.org/ArabicNLP/2023/Conference
ArabicNLP 2023 invites the submission of original long, short, or demo papers in the area of Arabic Natural Language Processing. ArabicNLP 2023 builds on seven previous workshop editions, which have been extremely successful, drawing large and active participation in various capacities. This conference is timely given the continued rise in research projects focusing on Arabic NLP. ArabicNLP 2023 will also feature shared tasks, allowing participants to work on specific NLP challenges related to Arabic language processing. The conference is organized by SIGARAB, the Association for Computational Linguistics Special Interest Group on Arabic Natural Language Processing.
Important Dates:
* (Extended) September 12, 2023: conference papers due date
* October 12, 2023: notification of acceptance
* October 20, 2023: camera-ready papers due
* December 7, 2023: conference day
All deadlines are 11:59 pm UTC -12h (“Anywhere on Earth”).
We accept long (up to 8 pages), short (up to 4 pages), and demo paper (up to 4 pages) submissions. Long and short papers will be presented orally or as posters as determined by the program committee.
Submissions are invited on topics that include, but are not limited to, the following:
Enabling core technologies: language models and large language models, morphological analysis, disambiguation, tokenization, POS tagging, named entity detection, chunking, parsing, semantic role labeling, sentiment analysis, Arabic dialect modeling, etc.
Applications: dialog modeling, machine translation, speech recognition, speech synthesis, optical character recognition, pedagogy, assistive technologies, social media, etc.
Resources: dictionaries, annotated data, corpora, etc.
Submissions may include work in progress as well as finished work. Submissions must have a clear focus on specific issues pertaining to the Arabic language whether it is standard Arabic, dialectal, classical, or mixed. Papers on other languages sharing problems faced by Arabic NLP researchers, such as Semitic languages or languages using Arabic script, are welcome provided that they propose techniques or approaches that would be of interest to Arabic NLP, and they explain why this is the case. Additionally, papers on efforts using Arabic resources but targeting other languages are also welcome. Descriptions of commercial systems are welcome, but authors should be willing to discuss the details of their work.
If you have any questions, please contact us at: arabicnlp-pc-chairs(a)sigarab.org
What's in a name? To mark our move from a workshop to a conference, we changed our acronym from WANLP to ArabicNLP.
The ArabicNLP 2023 Publicity Chairs,
Amr Keleg and Salam Khalifa
*The NEW submission DEADLINE is: 09 September 2023*
6th International Conference on Natural Language and Speech Processing
<http://icnlsp.org/2023welcome>
We are delighted to invite you to ICNLSP 2023, which will be held virtually
from December 16th to 17th, 2023.
ICNLSP 2023 offers attendees (researchers, academics, students, and
industry practitioners) the opportunity to share their ideas, connect with
each other, and keep up to date with ongoing research in the field.
ICNLSP 2023 aims to attract contributions related to natural language and
speech processing. Authors are invited to present their work relevant to
the topics of the conference.
Topics of ICNLSP 2023 include, but are not limited to:
Signal processing, acoustic modeling.
Architecture of speech recognition system.
Deep learning for speech recognition.
Analysis of speech.
Paralinguistics in Speech and Language.
Pathological speech and language.
Speech coding.
Speech comprehension.
Summarization.
Speech Translation.
Speech synthesis.
Speaker and language identification.
Phonetics, phonology and prosody.
Cognition and natural language processing.
Text categorization.
Sentiment analysis and opinion mining.
Computational Social Web.
Arabic dialects processing.
Under-resourced languages: tools and corpora.
New language models.
Arabic OCR.
Lexical semantics and knowledge representation.
Requirements engineering and NLP.
NLP tools for software requirements and engineering.
Knowledge fundamentals.
Knowledge management systems.
Information extraction.
Data mining and information retrieval.
Machine translation.
NLP for Arabic heritage documents.
*IMPORTANT DATES*
Submission deadline: *09 September 2023* (extended from 31 August 2023)
Notification of acceptance: *31 October 2023*
Camera-ready paper due: *20 November 2023*
Conference dates: *16, 17 December 2023*
*PUBLICATION*
1- All accepted papers will be published in the ACL Anthology
(https://aclanthology.org/venues/icnlsp/).
2- Selected papers will be published in Signals and Communication
Technology (Springer) (https://www.springer.com/series/4748), indexed by
Scopus and zbMATH.
*KEYNOTE SPEAKERS*
Alex Waibel, Carnegie Mellon University, USA
Najim Dehak, Johns Hopkins University, USA
For more details, visit the conference website:
https://www.icnlsp.org/2023welcome
*CONTACT*
icnlsp(at)gmail(dot)com
Best regards,
Mourad Abbas
All (with apologies for cross-posting)
The U.S. Copyright Office is conducting a study regarding the copyright
issues raised by generative artificial intelligence (AI). This study will
collect factual information and policy views relevant to copyright law and
policy. The Office will use this information to analyze the current state
of the law, identify unresolved issues, and evaluate potential areas for
congressional action.
Please go here to submit your comments:
https://www.copyright.gov/policy/artificial-intelligence/?fbclid=IwAR33KxMI…
There will be two rounds of comments:
Initial written comments are due by 11:59 p.m. Eastern time on October 18,
2023.
Reply comments are due by 11:59 p.m. Eastern time on November 15, 2023.
Here is a quick and accessible write-up by The Verge as to why this is
relevant/important:
https://www.theverge.com/2023/8/29/23851126/us-copyright-office-ai-public-c…
Please feel welcome to share this with your networks, as my friend who
works for the copyright office says they want as much feedback as they can
get. I do not think you have to be a US citizen to participate.
Very best wishes
Heather Froehlich
--
Dr Heather Froehlich
w // http://hfroehli.ch
t // @heatherfro
I second Edyta's points too.
I have been on this list since 2015, and since then the mailing list's standout feature has been its role in circulating calls for papers and job opportunities. While occasional "discussions" have also been a breath of fresh air, the current discourse doesn't quite fit that description.
The list would be more useful if intense discussions were held privately rather than circulated widely.
Thanks.
Best regards,
Sina Ahmadi
Postdoctoral Researcher & Adjunct Lecturer
George Mason University
http://sinaahmadi.github.io/
**On the job market! I'm seeking out new opportunities to collaborate and innovate as a researcher and lecturer (in Europe).**
________________________________
From: Daniela Cesiri via Corpora <corpora(a)list.elra.info>
Sent: Wednesday, 30 August 2023 11:32
To: Edyta Jurkiewicz-Rohrbacher <edytaj(a)gmail.com>
Cc: corpora(a)list.elra.info <corpora(a)list.elra.info>
Subject: [Corpora-List] Re: RANLP 2023 Call for Participation
Dear All,
I agree with Edyta's polite remarks.
I find discussions posted as replies to purely informative posts quite confusing, and I am "losing track" of the original posts to the point that I fear I might miss calls that could be relevant to my work, or miss discussions that are worth joining. Before Edyta's remarks, I was even considering leaving the list because of the current situation.
So, I join Edyta's kind request to keep discussions in separate threads and to leave calls for papers/abstracts and job calls as purely informative posts.
Perhaps opening a new, separate discussion thread might be an alternative that would allow us to filter the different kinds of communications we receive from the list.
Best wishes to everyone,
Daniela Cesiri
On Wed, 30 Aug 2023, 17:15 Edyta Jurkiewicz-Rohrbacher via Corpora <corpora(a)list.elra.info> wrote:
Dear Ada, dear all,
I'm a bit concerned about what has been going on with the list recently.
The list, as far as I understand, serves several purposes. One of them
is purely informative, where one informs the community about
potentially interesting jobs, conferences, etc. If I open a reply to
a job advertisement, I expect it to be a question useful for
potential applicants or a correction about, for example, deadlines.
Another purpose is to ask questions or start discussions on
various topics, either theoretical or purely practical. There I
expect people to share their experience and opinions.
What I do not find OK is giving feedback on purely informational
posts in the way Ada does. In my opinion, discussions about whether words
or sentences are up-to-date concepts in any (general) linguistic or
computational linguistic framework should be held in separate threads.
(Notice also that the problem of text segmentation has been a topic for
a long time already.) Summing up, I wouldn't mind if Ada's comments were
sent privately to the authors of posts, or discussed in
separate list emails. Otherwise, we are facing chaos here.
In short, I would be more than happy to participate if discussions
about the relation between linguistics and NLP took place, but not
mixed with advertisements.
I hope I did not offend anybody with this message.
Best,
Edyta Jurkiewicz-Rohrbacher
On Wed, 30 Aug 2023 at 16:35, Gilles Sérasset via Corpora
<corpora(a)list.elra.info> wrote:
>
> Dear Ada, dear all,
>
> I am not a linguist but a computational scientist who is quite used to talking with (and trying to understand) linguists. I must say that I usually read your emails as thoroughly as my schedule and patience allow, but, to be honest, I also have a rather negative feeling when reading your “discourse”.
>
> In this discourse, I see facts + interpretation + rhetoric.
>
> [Here I risk caricaturing for the sake of brevity; I hope you will understand that I have neither the time nor the intention to go deeply into all the intricacies of your different claims, as I am more a witness than an actor in this scientific dispute]
>
> My understanding of your facts: Neural models do not use the concept of word in any of their tasks, but achieve very interesting results in their modelling of the language.
>
> My understanding of your interpretation: this is the proof that there is no such thing as a word.
>
> My understanding of your rhetoric: linguists are still using “words”, so they are wrong or dishonest or miseducated or dumb, and we should entirely wipe out any occurrence of this concept and start over with another modelling of the language.
>
> Please understand that I am just presenting the way I interpret your different messages. And even if I am wrong here, this interpretation should be taken into account, as we are all persons with feelings. This feeling is a fact, even if I do not particularly feel targeted by your different criticisms. I hope this will help you weigh the terms you use in your next messages.
>
> This being said, I was not particularly surprised to see some “passionate” replies to your different messages. And I agree with everyone here: we should not give in to such passion and use ad hominem attacks on a mailing list, AND you should also understand that most of your rhetoric does contain such passion and attacks.
>
> Concerning the facts:
>
> You are right: neural models do not use any notion of word (or word morphology) as it is usually conceived in linguistics. Instead, they first decide on the granularity at which they aggregate their input (a sequence of characters) into tokens, to each of which they attach an “interpretation” (modelled as a multi-dimensional vector).
>
> Concerning the interpretation:
>
> 1. You want to wipe out the notion of word based on such a fact. I would somewhat agree if we were dealing with a universal modelling of language, but this is not the case. Humans model language in one way and neural models in another (even if neural networks are claimed to be inspired by the biological neurons in our brains). The fact that a concept does not exist in one model does not entail that it does not exist in another.
>
>
> 2. Also, you do make the very same mistake concerning the way you look at the facts: i.e. there is no such thing as a character…, which means that the input of NN is already flown with a bias with which we look at language. Indeed characters are a very recent invention that builds on different concerns:
> - the usual graphical elements that are traditionally used in writing and that have been interpreted as atomic,
> - their interpretation by the encoding authorities (see the differences and debates about code points vs characters),
> - arbitrary decisions (e.g. why model A and a as two different characters?).
> Moreover, most corpora are badly encoded, using one character for another (a quote instead of an apostrophe, a non-breaking space instead of a space, …), and all this only applies to languages that have a writing system or a transcription, i.e. not the majority of them.
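The code point vs character distinction and the look-alike substitutions mentioned above are easy to observe with Python's standard `unicodedata` module. The sketch below is illustrative only: `LOOKALIKES`, `describe` and `normalize_text` are hypothetical names, and which substitutions one chooses to undo (or whether to fold `A` and `a`) is itself one of the arbitrary decisions listed above.

```python
import unicodedata

# Hypothetical clean-up table: deciding which look-alikes to merge is a modelling choice.
LOOKALIKES = str.maketrans({
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK -> APOSTROPHE
    "\u00a0": " ",  # NO-BREAK SPACE -> SPACE
})


def describe(text: str) -> list[str]:
    """List the code points that make up a string, with their Unicode names."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


def normalize_text(text: str) -> str:
    """Compose accents (NFC), then fold the look-alike characters in LOOKALIKES."""
    return unicodedata.normalize("NFC", text).translate(LOOKALIKES)


# One perceived character, two encodings: precomposed vs base letter + combining accent.
precomposed, decomposed = "\u00e9", "e\u0301"
print(precomposed == decomposed)  # False
print(describe(decomposed))
print(normalize_text(decomposed) == precomposed)  # True

# A typical corpus string mixing a typographic quote and a no-break space.
print(normalize_text("l\u2019an\u00a02000"))  # l'an 2000

# Whether "A" and "a" count as one unit depends on the case folding you choose.
print("A" == "a", "A".casefold() == "a".casefold())  # False True
```

Note that the two spellings of "é" render identically yet compare unequal until normalized, so any model fed raw code points inherits whichever encoding the corpus happened to use.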
>
> The conclusion is that even neural networks carry an artificial bias in the way they model language, which means that the conclusions we draw from them are as flawed as those we draw from the classical way linguists look at languages.
>
>
> 3. Most serious linguists have never defined “words” lightly, and most of them know that this concept is an “approximation” of something that is very difficult to apprehend and that seems more grounded in linguistics based on human introspection than in corpus linguistics. It roughly represents the way our brain aggregates the atoms of language (characters/phonemes) into units with which we associate an interpretation. In this sense, words are somewhat like the “tokens” of our biological neural network (and certainly far more).
>
> As producing an utterance is not a bijection between whatever we have in our head and the sequential signal we use to communicate, I agree with you that “words” are certainly not present in a corpus (but I do think that our inner “tokens” can be observed there in some way).
>
>
> Concerning the rhetoric:
>
> I do not think any linguist or computational linguist is naive enough to think that any of the models we deal with is the “truth”, and I doubt any of them is miseducated enough to think that “words” are clearly defined and undoubtedly present in corpora. I do think, though, that they are usually right to observe occurrences (or hints) of non-atomic constructs with which we associate some interpretation. I also think that this way of looking at a corpus has some advantages that are not really present in neural networks (for instance, it can reveal regularities that help humans produce new utterances without being shown a large number of examples).
>
> I also think that even if you were entirely right in your facts and interpretations, asking us to reject current and past ways of looking at texts would be a mistake. Even in physics, since the general theory of relativity, we know that classical mechanics is wrong; however, it is still in use, and that is not a problem as long as everybody knows under which assumptions it is a good enough approximation and under which it no longer works.
>
> I know this message will certainly not make you think differently, but if it leads you to communicate differently with people who still use the terms “words” or “sentences” as a simple shortcut to situate their work within a shared understanding of the state of the art, in contexts where there is no room for a fuller explanation (e.g. in the summary of a keynote speech), then I will have achieved something.
>
> Hoping this scientific debate will continue in a calmer manner,
>
> Regards,
>
> Gilles Sérasset,
>
> _______________________________________________
> Corpora mailing list -- corpora(a)list.elra.info<mailto:corpora@list.elra.info>
> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
> To unsubscribe send an email to corpora-leave(a)list.elra.info<mailto:corpora-leave@list.elra.info>