June 2023 - Corpora - ELRA lists

CFP for NLP-OSS @ EMNLP2023
by Elijah Rippeth 09 Jun '23

(apologies for cross-posting)

----------------------------------------------------------------
*Workshop for NLP Open Source Software (NLP-OSS)*
06 Dec 2023, Co-located with EMNLP 2023
https://nlposs.github.io/
Deadline for Long and Short Paper submission: 09 August, 2023 (23:59, GMT-11)
----------------------------------------------------------------

You have tried to use the latest, bestest, fastest LLMs and ran into problems, but found the solution after hours of coffee and staring at the computer. Share that at NLP-OSS and suggest how open source could change for the better (e.g. best practices, documentation, API design).

You came across an awesome SOTA system for NLP task X, and no LLM has beaten its F1 score. However, the code is now stale and it takes a dinosaur to understand it. Share your experience at NLP-OSS and propose how to "replicate" these forgotten systems.

You saw a shiny GPT in a blog post and tried to reproduce similar results on a different task, but it just didn't work on your dataset. You did some magic to the code and now it works. Show us how you did it! Small tweaks, if well motivated and empirically tested, are valid submissions to NLP-OSS.

You have tried 101 NLP tools and none of them really does what you want. So you wrote your own shiny new package and made it open source. Tell us why your package is better than the existing tools. How did you design the code? Is it going to be a one-time thing, or would you like to see thousands of people using it?

You have heard enough about open-source LLMs and pseudo-open-source GPT, but not enough about how they can be used for your use case or your commercial product at scale. So you contacted your legal department and they explained to you how data, model and code licenses work. Share that knowledge with the NLP-OSS community.
You have a position or opinion to share about free vs. open vs. closed-source LLMs, with valid arguments, references or survey data to support it. We want to hear about it.

At last, you've found an avenue to air these issues on an academic platform: the NLP-OSS workshop!!! Share your experiences, suggestions and analysis from/of NLP-OSS.

P/S: 1st CALL FOR PAPERS
====

The Third Workshop for NLP Open Source Software (NLP-OSS) will be co-located with EMNLP 2023 on 06 Dec 2023.

Focusing more on the social and engineering aspects of NLP software and less on scientific novelty or state-of-the-art models, the Workshop for NLP-OSS is an academic forum to advance open source developments for NLP research, teaching and application. NLP-OSS also provides an academic venue to announce new software/features and to promote the collaborative culture and best practices that go beyond the conferences.

We invite full papers (8 pages) or short papers (4 pages) on topics related to NLP-OSS, broadly categorized into (i) software development, (ii) scientific contribution and (iii) NLP-OSS case studies.
- **Software Development**
  - Designing and developing NLP-OSS
  - Licensing issues in NLP-OSS
  - Backwards compatibility and stale code in NLP-OSS
  - Growing, maintaining and motivating an NLP-OSS community
  - Best practices for NLP-OSS documentation and testing
  - Contribution to NLP-OSS without coding
  - Incentivizing OSS contributions in NLP
  - Commercialization and Intellectual Property of NLP-OSS
  - Defining and managing NLP-OSS project scope
  - Issues in API design for NLP
  - NLP-OSS software interoperability
  - Analysis of the NLP-OSS community
- **Scientific Contribution**
  - Surveying OSS for specific NLP task(s)
  - Demonstrations, introductions and/or tutorials of NLP-OSS
  - Small but useful NLP-OSS
  - NLP components in ML OSS
  - Citations and references for NLP-OSS
  - OSS and experiment replicability
  - Gaps between existing NLP-OSS
  - Task-generic vs task-specific software
- **Case studies**
  - Case studies of how a specific bug is fixed or feature is added
  - Writing wrappers for other NLP-OSS
  - Writing open-source APIs for open data
  - Teaching NLP with OSS
  - NLP-OSS in the industry

Submissions should be formatted according to the [EMNLP 2023 templates](https://2023.emnlp.org/call-for-papers) and submitted to [OpenReview](https://openreview.net/group?id=EMNLP/2023/Workshop/NLP-OSS).

ORGANIZERS
- Geeticka Chauhan, Massachusetts Institute of Technology
- Dmitrijs Milajevs, Grayscale AI
- Elijah Rippeth, University of Maryland
- Jeremy Gwinnup, Air Force Research Laboratory
- Liling Tan, Amazon

Call for participation - Arabic NER Shared Task 2023
by nagham ghanim 09 Jun '23

Dear colleagues,

We are happy to invite you to join the *Arabic NER Shared Task 2023* <https://dlnlp.ai/st/wojood/>, which will be organized as part of WANLP 2023. We will provide you with a large corpus and Google Colab notebooks to help you reproduce the baseline results.

Invitation to participate in the shared task on extracting named entities from Arabic text. We will provide participants with a corpus and software for obtaining baseline results that they can build on.

*INTRODUCTION*
Named Entity Recognition (NER) is integral to many NLP applications. It is the task of identifying named entity mentions in unstructured text and classifying them into predefined classes such as person, organization, location, or date. Due to the scarcity of Arabic resources, most of the research on Arabic NER focuses on flat entities and addresses a limited number of entity types (person, organization, and location). The goal of this shared task is to alleviate this bottleneck by providing Wojood, a large and rich Arabic NER corpus. Wojood consists of about 550K tokens (MSA and dialect, in multiple domains) manually annotated with 21 entity types.

*REGISTRATION*
Participants need to register via this form: *https://forms.gle/UCCrVNZ2LaPviCZS6* <https://forms.gle/UCCrVNZ2LaPviCZS6>. Participating teams will be provided with common training and development datasets. No external manually labelled datasets are allowed. A blind test set will be used to evaluate the output of the participating teams. Each team is allowed a maximum of 3 submissions. All teams are required to report results on the development and test sets (after results are announced) in their write-ups.
*FAQ*
For any questions related to this task, please check our *Frequently Asked Questions* <https://docs.google.com/document/d/1XE2n89mFLic2P9DO_sAD51vy734BOt0kgtZ6bFf…>

*IMPORTANT DATES*
- March 03, 2023: Registration available
- May 25, 2023: Data sharing and evaluation on development set available
- June 10, 2023: Registration deadline
- July 20, 2023: Test set made available
- July 30, 2023: Evaluation on test set (TEST) deadline
- August 29, 2023: Shared task system paper submissions due
- October 12, 2023: Notification of acceptance
- October 30, 2023: Camera-ready version
- TBA: WANLP 2023 Conference

*All deadlines are 11:59 PM UTC-12:00 (Anywhere on Earth).*

*CONTACT*
For any questions related to this task, please contact the organizers directly using the following email address: *NERShare...(a)gmail.com <https://groups.google.com/>* or join the Google group: *https://groups.google.com/g/ner_sharedtask2023* <https://groups.google.com/g/ner_sharedtask2023>.

*SHARED TASK*
As described, this shared task targets both flat and nested Arabic NER. The subtasks are:

*Subtask 1: Flat NER*
In this subtask, we provide the Wojood-Flat train (70%) and development (10%) datasets. The final evaluation will be on the test set (20%). The flat NER dataset is the same as the nested NER dataset in terms of the train/dev/test split, and each split contains the same content. The only difference is that in flat NER each token is assigned a single tag: the first high-level tag assigned to that token in the nested NER dataset.

*Subtask 2: Nested NER*
In this subtask, we provide the Wojood-Nested train (70%) and development (10%) datasets. The final evaluation will be on the test set (20%).

*METRICS*
The evaluation metrics will include precision, recall and F1-score; the official metric will be the micro F1-score. The evaluation will be hosted on CODALAB, and teams will be provided with a CODALAB link for each subtask.
- *CODALAB link for NER Shared Task Subtask 1 (Flat NER)* <https://codalab.lisn.upsaclay.fr/competitions/11594>
- *CODALAB link for NER Shared Task Subtask 2 (Nested NER)* <https://dlnlp.ai/st/wojood/>

*BASELINES*
Two baseline models trained on Wojood (flat and nested) are provided:

*Nested NER baseline:* presented in this *article* <https://aclanthology.org/2022.lrec-1.387/>, with code available on *GitHub* <https://github.com/SinaLab/ArabicNER>. The model achieves a micro F1-score of 0.9059 (note that this baseline does not handle nested entities of the same type).

*Flat NER baseline:* the same code repository used for nested NER (*GitHub* <https://github.com/SinaLab/ArabicNER>) can also be used to train a flat NER model. Our flat NER baseline achieved a micro F1-score of 0.8785.

*GOOGLE COLAB NOTEBOOKS*
To let you experiment with the baselines, we authored four Google Colab notebooks that demonstrate how to train and evaluate our baseline models.

[1] *Train Flat NER* <https://gist.github.com/mohammedkhalilia/72c3261734d7715094089bdf4de74b4a>: This notebook can be used to train our ArabicNER model on the flat NER task using the sample Wojood data found in our repository.
[2] *Evaluate Flat NER* <https://gist.github.com/mohammedkhalilia/c807eb1ccb15416b187c32a362001665>: This notebook uses the model trained and saved by notebook [1] to perform evaluation on an unseen dataset.
[3] *Train Nested NER* <https://gist.github.com/mohammedkhalilia/a4d83d4e43682d1efcdf299d41beb3da>: This notebook can be used to train our ArabicNER model on the nested NER task using the sample Wojood data found in our repository.
[4] *Evaluate Nested NER* <https://gist.github.com/mohammedkhalilia/9134510aa2684464f57de7934c97138b>: This notebook uses the model trained and saved by notebook [3] to perform evaluation on an unseen dataset.
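The official metric is the micro F1-score, and both baselines report their results with it. As a rough way to sanity-check predictions on the development set before uploading to CODALAB, the sketch below computes micro-averaged precision, recall and F1 over entity spans. It assumes exact matching on hypothetical (start, end, type) triples; the official CODALAB scorer may match entities differently (for example, on token-level tags), so treat the numbers as an approximation. The entity labels in the example are illustrative only.

```python
from collections import Counter


def micro_f1(gold_docs, pred_docs):
    """Micro-averaged precision, recall and F1 over exact-match entity spans.

    Each document is an iterable of (start, end, entity_type) tuples.
    Counts are pooled across all documents before computing the scores,
    which is what makes the average "micro".
    """
    if len(gold_docs) != len(pred_docs):
        raise ValueError("gold and predicted document counts differ")
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        g, p = Counter(gold), Counter(pred)
        overlap = sum((g & p).values())  # spans that match in both boundaries and type
        tp += overlap
        fp += sum(p.values()) - overlap
        fn += sum(g.values()) - overlap
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [[(0, 2, "PERS"), (5, 6, "GPE")]]
    pred = [[(0, 2, "PERS"), (5, 6, "ORG")]]  # second span has the wrong type
    print(micro_f1(gold, pred))  # → (0.5, 0.5, 0.5)
```

Because counts are pooled before dividing, frequent entity types weigh more heavily than rare ones, which is the usual behaviour of micro averaging.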
*ORGANIZERS*
- Mustafa Jarrar, Birzeit University
- Muhammad Abdul-Mageed, University of British Columbia & MBZUAI
- Mohammed Khalilia, Birzeit University
- Bashar Talafha, University of British Columbia
- AbdelRahim Elmadany, University of British Columbia
- Nagham Hamad, Birzeit University
- Alaa Omer, Birzeit University
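The Subtask 1 description states that each flat tag is the first high-level tag assigned to the token in the nested data. A minimal sketch of that conversion, assuming a hypothetical whitespace-separated layout in which each line holds a token followed by its nested tags (outermost first) and blank lines separate sentences. The actual Wojood files may be laid out differently, so check the sample data in the ArabicNER repository before relying on this; `nested_to_flat` is an illustrative name, not part of that code base.

```python
def nested_to_flat(lines):
    """Convert hypothetical 'token TAG1 TAG2 ...' lines into flat sentences.

    Tags on each line are assumed to be ordered from the outermost
    (highest-level) entity inward, so the flat tag is simply the first one.
    Tokens with no tag at all are labelled 'O'.
    Returns a list of sentences, each a list of (token, flat_tag) pairs.
    """
    sentences, current = [], []
    for line in lines:
        fields = line.split()
        if not fields:  # blank line marks a sentence boundary
            if current:
                sentences.append(current)
                current = []
            continue
        token, tags = fields[0], fields[1:]
        current.append((token, tags[0] if tags else "O"))
    if current:  # flush the last sentence if the input has no trailing blank line
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    sample = ["جامعة B-ORG", "بيرزيت I-ORG B-GPE", "", "في O"]
    for sentence in nested_to_flat(sample):
        print(sentence)
```

Here "بيرزيت" belongs both to the organisation "جامعة بيرزيت" and to the nested location, and the flat version keeps only the outer I-ORG tag.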

Journal of Open Humanities Data: 2023 call for papers (second call, with a twist)
by Barbara McGillivray 09 Jun '23

Dear all,

With apologies for cross-posting, I'm sharing again the 2023 call for papers of the Journal of Open Humanities Data. This time we've added an explicit mention of large language model prompts and prompt engineering strategies among the language resources of interest to the journal, plus a reminder that our Covid-19 special collection is still accepting submissions. We've also explicitly included Library Science and Media Studies in the scope.

Kind regards,
Barbara

Call for Papers for 2023

The Journal of Open Humanities Data (JOHD)<https://openhumanitiesdata.metajnl.com/> features peer-reviewed publications describing humanities research objects with high potential for reuse. These might include curated resources like (annotated) linguistic corpora, ontologies, and lexicons, as well as databases, maps, atlases, linked data objects, and other data sets created with qualitative, quantitative, or computational methods, including large language model prompts and prompt engineering strategies.

We are currently inviting submissions of two varieties:

1. Short data papers contain a concise description of a humanities research object with high reuse potential. These are short (1,000 words), highly structured narratives. A data paper does not replace a traditional research article, but rather complements it.

2. Full-length research papers discuss and illustrate methods, challenges, and limitations in humanities research data creation, collection, management, access, processing, or analysis. These are intended to be longer narratives (3,000-5,000 words), which give authors the ability to contribute to a broader discussion regarding the creation of research objects or methods.

Humanities subjects of interest to JOHD include, but are not limited to: Art History, Classics, History, Library Science, Linguistics, Literature, Media Studies, Modern Languages, Music and Musicology, Philosophy, Religious Studies, etc.
Research that crosses one or more of these traditional disciplinary boundaries is highly encouraged. Authors are encouraged to publish their data in recommended repositories<https://openhumanitiesdata.metajnl.com/about/#repo>.

More information about the submission process<https://openhumanitiesdata.metajnl.com/about/submissions>, editorial policies<https://openhumanitiesdata.metajnl.com/about/editorialpolicies/> and archiving<https://openhumanitiesdata.metajnl.com/about/> is available on the journal’s web pages.

Submissions are still open for our special collection, Humanities Data in the Time of COVID-19<https://openhumanitiesdata.metajnl.com/collections/humanities-data-in-the-t…>. This collection includes data papers that span various areas of enquiry about the COVID-19 pandemic through the lens of the Humanities. Data from this period have far-reaching and impactful reuse potential, so we encourage you to share your data by submitting to this growing collection.

JOHD provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge. We accept online submissions via our journal website. See Author Guidelines <https://openhumanitiesdata.metajnl.com/about/submissions/> for further information. Alternatively, please contact the editor<https://openhumanitiesdata.metajnl.com/contact/> if you are unsure whether your research is suitable for submission to the journal. Authors remain the copyright holders and grant third parties the right to use, reproduce, and share the article according to the Creative Commons<http://creativecommons.org/licenses/by/4.0/> licence agreement.

Re: Visualizing graphs of words as nodes from text lines (not just adjacency pairs) ...
by Albretch Mueller 09 Jun '23

Thank you! igraph seems to be more Linux/Debian-friendly. There is a "GNU R network analysis and visualization" package: r-cran-igraph

So far I have found:
https://cran.r-project.org/web/packages/igraph/
https://cran.r-project.org/web/packages/igraph/igraph.pdf
and a bunch of videos/tutorials, which I will have a better opinion about after I watch them. I will keep publicly posting my experiences to help those running into the same kinds of problems.

$ time apt-cache search gephi

real 0m0.267s
user 0m0.255s
sys 0m0.012s
$ time apt-cache search igraph
karbon - vector graphics application for the Calligra Suite
cl-graph - simple graph data structure and algorithms
libdirgra-java - Java library providing a simple directed graph implementation
libdirgra-java-doc - Documentation for dirgra
fonts-bajaderka - Warsaw's sign painters styled font
fonts-gfs-neohellenic - modern Greek font family with matching Latin
fonts-gfs-solomos - ancient Greek oblique font
fonts-isabella - Isabella free TrueType font
fonts-sil-annapurna - smart font for languages using Devanagari script
fonts-uralic - Truetype fonts for Cyrillic-based Uralic languages
golang-github-guptarohit-asciigraph-dev - Make lightweight ASCII line graph in CLI apps with no other dependencies
golang-github-jesseduffield-asciigraph-dev - Go package to make lightweight ASCII line graph without dependencies
golang-github-steveyen-gtreap-dev - gtreap is an immutable treap implementation in the Go Language
gpw - Trigraph Password Generator
libigraph-dev - library for creating and manipulating graphs - development files
libigraph-examples - library for creating and manipulating graphs - example files
libigraph1 - library for creating and manipulating graphs
libjgrapht0.6-java - mathematical graph theory library for Java
libjgrapht0.8-java - mathematical graph theory library for Java
libtext-password-pronounceable-perl - Perl module to generate pronounceable passwords
liwc - Tools for manipulating C source code
msort - utility for sorting records in complex ways
libnauty2 - library for graph automorphisms -- library package
libnauty2-dev - library for graph automorphisms -- development package
nauty - library for graph automorphisms -- interface and tools
nauty-doc - library for graph automorphisms -- user guide
otp - Generator for One Time Pads or Passwords
perl-tk - Perl module providing the Tk graphics library
python3-igraph - High performance graph data structures and algorithms (Python 3)
r-cran-graphlayouts - GNU R additional layout algorithms for network visualizations
r-cran-gwidgets - gWidgets API for Toolkit-Independent, Interactive GUIs
r-cran-igraph - GNU R network analysis and visualization
r-cran-propclust - Propensity Clustering and Decomposition
scalable-cyrfonts-tex - Scalable Cyrillic fonts for TeX
texlive-pictures - TeX Live: Graphics, pictures, diagrams
texlive-fonts-extra - TeX Live: Additional fonts
texlive-latex-extra - TeX Live: LaTeX additional packages
tran - transcribe between character scripts (alphabets)
vis - Modern, legacy free, simple yet efficient vim-like editor

real 0m0.303s
user 0m0.283s
sys 0m0.020s
$

On 6/9/23, David Chartash <dchartas(a)ieee.org> wrote:
> Hi Albretch,
> I would start off with Gephi <https://gephi.org/> or try the R/C/Python...
> package igraph <https://igraph.org/>.
> Cheers,
>
> David
> ---
> Please forgive any spelling errors, sent from a poorly implemented software
> keyer
>
> On Fri, Jun 9, 2023, 02:40 Albretch Mueller via Corpora <
> corpora(a)list.elra.info> wrote:
>
>> I could imagine, as John Lennon used to sing, that "I am not the only
>> one" in need of such an application.
>>
>> At times you get ten of thousand lines which you would like to
>> quickly “visually parse” to gain a general sense of what you've got.
>> Ideally, you should be able to play with it to select the records you
>> need.
>>
>> Think for example, of the many links to texts you would get from
>> archive.org (which also includes some metadata) or *.pub (each site
>> using their own quirkiness)
>>
>> Based on some sort of GUI, you would see weighted terms (coloured or
>> not based on a user's preference) with all other terms preceding (as
>> some sort of tree-like structure confluent on that term) and following
>> it ( ... branching off of it).
>>
>> Which kind of applications people use to do such thing?
>>
>> lbrtchx
>> _______________________________________________
>> Corpora mailing list -- corpora(a)list.elra.info
>> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
>> To unsubscribe send an email to corpora-leave(a)list.elra.info
>>
>
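Both tools suggested in the reply, Gephi and igraph, work from a list of weighted edges. A minimal standard-library sketch that builds such a list from text lines, linking each word to the next `window` words (not only its immediate neighbour), and serialises it as CSV with `Source`, `Target` and `Weight` columns, a layout that Gephi's spreadsheet import is generally able to read. The function names and the window size are illustrative assumptions, and the lowercasing and whitespace tokenisation are deliberately naive.

```python
import csv
import io
from collections import Counter


def cooccurrence_edges(lines, window=2):
    """Count directed edges (w_i -> w_j) for every j in i+1 .. i+window
    within each line, so the graph captures more than adjacent pairs."""
    edges = Counter()
    for line in lines:
        words = line.lower().split()
        for i, w in enumerate(words):
            for v in words[i + 1 : i + 1 + window]:
                edges[(w, v)] += 1
    return edges


def edges_to_csv(edges):
    """Serialise edges as Source,Target,Weight rows, sorted for stable output."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (src, dst), weight in sorted(edges.items()):
        writer.writerow([src, dst, weight])
    return buf.getvalue()


if __name__ == "__main__":
    text = ["the old man and the sea", "the sea was calm"]
    print(edges_to_csv(cooccurrence_edges(text, window=2)), end="")
```

The resulting CSV can be saved to a file and loaded as an edge table; the weights then drive edge thickness or filtering in the visualisation tool.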

Visualizing graphs of words as nodes from text lines (not just adjacency pairs) ...
by Albretch Mueller 09 Jun '23

I could imagine, as John Lennon used to sing, that "I am not the only one" in need of such an application.

At times you get tens of thousands of lines which you would like to quickly “visually parse” to gain a general sense of what you've got. Ideally, you should be able to play with it to select the records you need.

Think, for example, of the many links to texts you would get from archive.org (which also includes some metadata) or *.pub (each site with its own quirks).

Based on some sort of GUI, you would see weighted terms (coloured or not, based on the user's preference), with all the terms preceding each one (as some sort of tree-like structure converging on that term) and following it (branching off from it).

What kind of applications do people use to do such a thing?

lbrtchx
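The view described above resembles a word tree: for a chosen term, the preceding words converge on it and the following words branch off from it. A minimal standard-library sketch of the underlying data structure, assuming whitespace tokenisation and hypothetical helper names (`word_tree`, `build_trie`, `render`): it merges the left contexts (read outward from the pivot) and the right contexts into two count-annotated tries, which a GUI could then draw, colour by weight and filter.

```python
def build_trie(sequences):
    """Merge word sequences into a nested dict of the form {word: [count, children]}."""
    root = {}
    for seq in sequences:
        node = root
        for word in seq:
            entry = node.setdefault(word, [0, {}])
            entry[0] += 1
            node = entry[1]
    return root


def word_tree(lines, pivot, depth=3):
    """Return (left, right) tries of up to `depth` words around each `pivot`.

    Left contexts are stored nearest-word-first, so both tries grow
    outward from the pivot, as in a word-tree display.
    """
    lefts, rights = [], []
    for line in lines:
        words = line.lower().split()
        for i, w in enumerate(words):
            if w == pivot:
                lefts.append(list(reversed(words[max(0, i - depth):i])))
                rights.append(words[i + 1:i + 1 + depth])
    return build_trie(lefts), build_trie(rights)


def render(trie, indent=0):
    """Indented text rendering, most frequent branches first."""
    out = []
    for word, (count, children) in sorted(trie.items(), key=lambda kv: (-kv[1][0], kv[0])):
        out.append("  " * indent + f"{word} ({count})")
        out.extend(render(children, indent + 1))
    return out


if __name__ == "__main__":
    corpus = ["the old man and the sea", "the sea was calm", "a calm sea"]
    left, right = word_tree(corpus, "sea", depth=2)
    print("\n".join(render(left)))
    print("\n".join(render(right)))
```

Selecting the records behind a branch, as the post asks, would only require storing line indices alongside the counts in each trie node.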

FinCausal 2023 Shared Task: Call for Participation (English & Spanish)
by FinCausal 2023 08 Jun '23

Call for Participation

The *FinCausal-2023 Shared Task: "Financial Document Causality Detection"* is organised within the *5th Financial Narrative Processing Workshop (FNP 2023)*, taking place at the 2023 IEEE International Conference on Big Data (IEEE BigData 2023) <http://bigdataieee.org/BigData2023/>, Sorrento, Italy, 15-18 December 2023. The workshop is a *one-day event*; the exact date is to be announced.

Important Dates:
- Call for participation and registration: 3rd June 2023
- Registration deadline: 28 June
- Training set release: 29 June 2023
- Test set release: 5 September 2023
- Systems submission deadline: 15 September 2023
- Release of results: 20 September 2023
- Paper submission deadline: 20 October 2023
- Notification of acceptance: November 12, 2023
- Camera-ready of accepted papers: November 20, 2023
- FNP Workshop: December 2023

Workshop URL: https://wp.lancs.ac.uk/cfie/fincausal2023/
Registration Form: https://forms.gle/29E161a8RmMosBLU8. After completing the registration form, participants will be sent the practice set.

*Shared Task Description:*
Financial analysis needs factual data and an explanation of the variability of these data. Data state facts, but more knowledge is needed about how these facts materialised. Furthermore, understanding causality is crucial in studying decision-making processes. The *Financial Document Causality Detection Task* (FinCausal) aims at identifying elements of cause and effect in causal sentences extracted from financial documents. Its goal is to evaluate which events or chains of events can cause a financial object to be modified or an event to occur, in a given context. In the financial landscape, identifying cause and effect from external documents and sources is crucial to explain why a transformation occurs.

Two subtasks are organised this year: the *English FinCausal subtask* and the *Spanish FinCausal subtask*. This is the first year in which we introduce a subtask in Spanish.
*Objective*: For both subtasks, participants are asked to identify, given a causal sentence, which elements of the sentence relate to the cause and which relate to the effect. Participants can use any method they see fit (regex, corpus linguistics, entity relationship models, deep learning methods) to identify the causes and effects.

*English FinCausal subtask*
- *Data Description:* The dataset has been sourced from various 2019 financial news articles provided by Qwam, along with additional SEC data from the Edgar Database. Additionally, we have augmented the dataset from FinCausal 2022 by adding 500 new segments. Participants will be provided with a sample of already-labelled text blocks extracted from financial news.
- *Scope:* The *English FinCausal subtask* focuses on detecting causes and effects when the effects are quantified. The aim is to identify, in a causal sentence or text block, the causal elements and the consequential ones. Only one causal element and one effect are expected in each segment.
- *Length of Data fragments:* The *English FinCausal subtask* segments are made up of up to three sentences.
- *Data format:* CSV files. Datasets for both the English and the Spanish subtasks will be presented in the same format.

This subtask focuses on determining causality associated with a quantified fact. An event is defined as the arising or emergence of a new object or context with respect to a previous situation. The task will therefore emphasise the detection of causality associated with the transformation of financial objects embedded in quantified facts.

*Spanish FinCausal subtask*
- *Data Description:* The dataset has been sourced from a corpus of Spanish financial annual reports from 2014 to 2018. Participants will be provided with a sample of text blocks extracted from financial news, labelled through inter-annotator agreement.
- *Scope:* The *Spanish FinCausal subtask* aims to detect all types of causes and effects, not necessarily limited to quantified effects. The aim is to identify, in a paragraph, the causal elements and the consequential ones. Only one causal element and one effect are expected in each paragraph.
- *Length of Data fragments:* The *Spanish FinCausal subtask* involves complete paragraphs.
- *Data format:* CSV files. Datasets for both the English and the Spanish subtasks will be presented in the same format.

This subtask focuses on determining causality associated with either events or quantified facts. Here, a cause can be the justification for a statement or the reason that explains a result. This task is also a relation detection task.

*FinCausal Shared Task Organisers:*
- Antonio Moreno-Sandoval (UAM, Spain)
- Blanca Carbajo Coronado (UAM, Spain)
- Doaa Samy (UCM, Spain)
- Jordi Porta (UAM, Spain)
- Dominique Mariko (Yseop, France)

For any questions, please contact the organisers at *fincausal.2023(a)gmail.com <fincausal.2023(a)gmail.com>*
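Since the call explicitly allows regex-based methods, here is a deliberately simple baseline sketch for the English subtask: it splits a sentence at a causal connective and labels one side as cause and the other as effect. The connective lists are a small hypothetical sample, sentence-initial connectives such as "Because of X, Y" are not handled, and the quantified-effect constraint is ignored. It only illustrates the shape of the cause/effect output; it is not the organisers' method and not a competitive system.

```python
import re

# Hypothetical connective lists; real coverage would need to be much broader.
# Connectives after which the CAUSE follows ("<effect> due to <cause>").
CAUSE_AFTER = re.compile(
    r"\b(?:due to|because of|as a result of|owing to|because)\b", re.IGNORECASE
)
# Connectives after which the EFFECT follows ("<cause>, leading to <effect>").
EFFECT_AFTER = re.compile(r"\b(?:resulting in|leading to|which led to)\b", re.IGNORECASE)


def extract_cause_effect(sentence):
    """Return (cause, effect) strings, or None if no connective matches."""
    m = CAUSE_AFTER.search(sentence)
    if m:
        effect = sentence[:m.start()].strip(" ,.")
        cause = sentence[m.end():].strip(" ,.")
        return cause, effect
    m = EFFECT_AFTER.search(sentence)
    if m:
        cause = sentence[:m.start()].strip(" ,.")
        effect = sentence[m.end():].strip(" ,.")
        return cause, effect
    return None


if __name__ == "__main__":
    for s in ["Revenue fell 12% due to weaker demand.",
              "Costs rose sharply, leading to a 5% drop in margins."]:
        print(extract_cause_effect(s))
```

"because of" is listed before "because" so that, at the same position, the longer connective is consumed whole.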

Call for Participation in Shared Tasks (WANLP2023): The 1st Arabic Natural Language Processing Conference
by Salam Khalifa 08 Jun '23

*** Apologies for Cross-Posting ***

We are pleased to announce five exciting shared tasks <https://wanlp2023.sigarab.org/shared-tasks> as part of the 1st Conference on Arabic NLP (WANLP2023 <https://wanlp2023.sigarab.org/>), co-located with EMNLP 2023 <https://2023.emnlp.org/> in Singapore. Please refer to the tasks’ websites for more information on participation and internal deadlines. General shared task deadlines are listed at the end of this call.

Shared Task 1: NADI 2023
Description: Dialect identification is the task of automatically detecting the source variety of a given text or speech segment. In addition to nuanced dialect identification at the country level, NADI 2022 offered a new subtask focused on country-level sentiment analysis. NADI 2023 continues this tradition of extending tasks beyond dialect identification: we propose a new open-track subtask focused on machine translation (MT) in two directions, (i) into Modern Standard Arabic (MSA) from five Arabic dialects and (ii) into any of the five dialects from MSA.
Organizers: Muhammad Abdul-Mageed, Chiyu Zhang, El Moatez Billah Nagoudi, AbdelRahim Elmadany (The University of British Columbia, Canada), Houda Bouamor (Carnegie Mellon University, Qatar), and Nizar Habash (New York University Abu Dhabi).
For more information, please visit the shared task’s website: https://nadi.dlnlp.ai/
––––––––––––––––––––––––––––––
Shared Task 2: ArAIEval - Persuasion Techniques and Disinformation Detection in Arabic Text
Description: The ArAIEval shared task includes two tasks: (i) persuasion techniques detection, and (ii) disinformation detection.
Organizers: Firoj Alam, Hamdy Mubarak, Maram Hasanain, Wajdi Zaghouani, Giovanni Da San Martino, and Preslav Nakov.
For more information, please visit the shared task’s website: https://araieval.gitlab.io/
––––––––––––––––––––––––––––––
Shared Task 3: Qur'an QA 2023
Description: This is a shared task on Arabic reading comprehension over the Holy Qur’an, aiming to trigger state-of-the-art question-answering and reading comprehension research on a book held sacred by more than 1.8 billion people across the world. The shared task entails two subtasks: (A) Passage Retrieval (PR) and (B) Reading Comprehension (RC).
Organizers: Tamer Elsayed, Rana Malhas, Watheq Mansour (Qatar University).
For more information, please visit the shared task’s website: https://sites.google.com/view/quran-qa-2023
––––––––––––––––––––––––––––––
Shared Task 4: WojoodNER
Description: Due to the scarcity of Arabic resources, most of the research on Arabic NER focuses on flat entities and addresses a limited number of entity types (person, organization, and location). The goal of this shared task is to alleviate this bottleneck by providing Wojood, a large and rich Arabic NER corpus.
Organizers: Muhammad Abdul-Mageed, Mohammed Khalilia, Nagham Hamad, Bashar Talafha, AbdelRahim Elmadany, Alaa’ Omar, Mustafa Jarrar.
For more information, please visit the shared task’s website: https://dlnlp.ai/st/wojood/
––––––––––––––––––––––––––––––
Shared Task 5: Arabic Reverse Dictionary
Description: This shared task aims to address the Tip-of-the-Tongue (TOT) problem by developing a Reverse Dictionary (RD) system specifically for the Arabic language. Reverse dictionaries allow users to find words based on their meanings or definitions and can be useful for writers, crossword puzzle enthusiasts, non-native language learners, and anyone looking to expand their vocabulary.
This shared task includes two subtasks: Arabic RD and Cross-lingual Reverse Dictionary (CLRD).
Organizers: Rawan Al-Matham, Waad Alshammari, Abdulrahman AlOsaimy, Sarah Alhumoud, Afrah Altamimi, Abdullah Alfaifi.
For more information, please visit the shared task’s website: https://samai.ksaa.gov.sa/sharedTask.html
––––––––––––––––––––––––––––––
Important dates:
- August 29, 2023: shared task papers due
- October 12, 2023: notification of acceptance
- October 20, 2023: camera-ready papers due
- December 7, 2023: Conference Day

All deadlines are 11:59 pm UTC-12h <https://www.timeanddate.com/time/zone/timezone/utc-12> (“Anywhere on Earth”).

The WANLP 2023 Publicity Chairs,
Salam Khalifa and Amr Keleg

--
Salam Khalifa
PhD Student at Stony Brook Linguistics <https://www.linguistics.stonybrook.edu/>.
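To make the reverse-dictionary idea concrete: given a description, the system ranks the headwords whose definitions best match it. A toy sketch using word-overlap (Jaccard) similarity over a tiny hypothetical English glossary; a real Arabic RD system would need proper tokenisation and normalisation, and would more likely compare learned embeddings, so this only illustrates the input/output shape of the task, not the shared task's expected approach.

```python
def tokenize(text):
    """Naive lowercase whitespace tokenisation into a set of words."""
    return set(text.lower().split())


def reverse_lookup(query, glossary, top_k=3):
    """Rank headwords by Jaccard overlap between query and definition tokens.

    Headwords with zero overlap are dropped; ties are broken alphabetically.
    """
    q = tokenize(query)
    scored = []
    for word, definition in glossary.items():
        d = tokenize(definition)
        union = q | d
        score = len(q & d) / len(union) if union else 0.0
        scored.append((score, word))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for score, word in scored[:top_k] if score > 0]


# Hypothetical glossary used only for illustration.
EXAMPLE_GLOSSARY = {
    "telescope": "instrument for viewing distant objects",
    "microscope": "instrument for viewing very small objects",
    "kettle": "vessel for boiling water",
}

if __name__ == "__main__":
    print(reverse_lookup("instrument to see distant stars", EXAMPLE_GLOSSARY))
```

The query shares "instrument" and "distant" with the telescope definition but only "instrument" with the microscope one, so telescope ranks first and kettle is filtered out.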

NLDB 2023 - Call for Participation
by Farid Meziane 08 Jun '23

NLDB 2023
28th International Conference on Natural Language & Information Systems
21-23 June 2023, University of Derby, United Kingdom
https://www.derby.ac.uk/events/latest-events/nldb-2023/

The 28th International Conference on Natural Language & Information Systems will be held at the University of Derby, United Kingdom, as a face-to-face event. This is a full three-day event, and the conference programme is now available.

CFP - Second Workshop on Text Simplification, Accessibility and Readability - TSAR 2023 @ RANLP
by Horacio Saggion 08 Jun '23

Second Workshop on Text Simplification, Accessibility and Readability - TSAR 2023 @ RANLP
Jointly with the Recent Advances in Natural Language Processing Conference RANLP 2023
https://tsar-workshop.github.io/
http://ranlp.org/ranlp2023/

First Call for Papers

Important Dates
- Submission deadline: 10 July 2023
- Notification of acceptance: 5 August 2023
- Camera-ready papers due: 25 August 2023
- Workshop: 7 or 8 September 2023

The Web provides an abundance of knowledge and information that can reach large populations. However, the way in which a text is written (vocabulary, syntax, or text organization/structure) or presented can make it inaccessible to many people, especially non-native speakers, people with low literacy, and people with some type of cognitive or linguistic impairment. The results of the Adult Literacy Survey (OECD, 2013) indicate that approximately 16.7% of the adult population (averaged over 24 highly developed countries) requires lexical, 50% syntactic, and 89.4% conceptual simplification of everyday texts (Štajner, 2021). Research on automatic text simplification (TS), textual accessibility, and readability thus has the potential to improve the social inclusion of marginalised populations.

These related research areas have attracted increasing attention over the past ten years, as evidenced by the growing number of publications in NLP conferences. While only about 300 articles in Google Scholar mentioned TS in 2010, this number increased to about 600 in 2015 and exceeded 1000 in 2020 (Štajner, 2021). Recent research in automatic text simplification has mostly focused on methods derived from the deep learning paradigm (Glavaš and Štajner, 2015; Paetzold and Specia, 2016; Nisioi et al., 2017; Zhang and Lapata, 2017; Martin et al., 2020; Maddela et al., 2021; Sheang and Saggion, 2021).
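One long-standing, English-specific way to quantify readability is the Flesch Reading Ease score, 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words), where higher scores indicate easier text. The sketch below is offered only as background for the readability topics of the workshop, not as a method the workshop endorses; its sentence and syllable counting are crude regex heuristics (assumptions) and will miscount many words.

```python
import re


def count_syllables(word):
    """Crude English heuristic: count groups of consecutive vowels (min 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch Reading Ease with heuristic sentence, word and syllable counts."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))  # runs of end punctuation
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))


if __name__ == "__main__":
    print(round(flesch_reading_ease("The cat sat on the mat."), 3))  # → 116.145
```

With 6 one-syllable words in one sentence, the score is 206.835 − 1.015 × 6 − 84.6 × 1 = 116.145; longer sentences and more polysyllabic words push the score down.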
However, many important aspects of automatic text simplification still need the attention of our community: the design of appropriate evaluation metrics, the development of context-aware simplification solutions, the creation of appropriate language resources to support research and evaluation, the deployment of simplification in real environments for real users, the study of discourse factors in text simplification, the identification of factors affecting the readability of a text, etc. To address these issues, collaboration is needed among CL/NLP researchers, machine learning and deep learning researchers, UI/UX and accessibility professionals, and representatives of public organisations (Štajner, 2021).

The proposed TSAR workshop builds upon the recent success of several workshops that covered a subset of our topics of interest, including the Current Trends in Text Simplification (CTTS) workshop at SEPLN 2021, the SimpleText workshop at CLEF 2021, and TSAR-2022 at EMNLP 2022, as well as the recent Special Issue on Text Simplification, Accessibility, and Readability in Frontiers in AI and the birds-of-a-feather event on Text Simplification at NAACL 2021 (over 50 participants). The TSAR workshop aims to foster collaboration among all parties interested in making information more accessible to all people. We will discuss recent trends and developments in automatic text simplification, text accessibility, automatic readability assessment, language resources and evaluation for text simplification, etc.

Topics
We invite contributions on the following topics (among others):
- Lexical simplification;
- Syntactic simplification;
- Modular and end-to-end TS;
- Sequence-to-sequence and zero-shot TS;
- Controllable TS;
- Text complexity assessment;
- Complex word identification and lexical complexity prediction;
- Corpora, lexical resources, and benchmarks for TS;
- Evaluation of TS systems;
- Domain-specific TS (e.g. health, legal);
- Other related topics (e.g. empirical and eye-tracking studies);
- Assistive technologies for improving readability and comprehension, including those going beyond text.

Submissions
We welcome three types of papers: long papers, short papers, and demo papers. Submissions should be made to: https://softconf.com/ranlp23/TSAR/
The papers should present novel research. Reviewing will be double-blind, and thus all submissions should be anonymized.

Format: Paper submissions must use the official RANLP 2023 templates <http://ranlp.org/ranlp2023/index.php/submissions/>, which are available as an Overleaf <https://www.overleaf.com/latex/templates/instructions-for-ranlp-2023-procee…> template and can also be downloaded directly (LaTeX <http://ranlp.org/ranlp2023/Templates/ranlp2023-LaTeX.zip> and Word <http://ranlp.org/ranlp2023/Templates/ranlp2023-word.docx>). Authors may not modify these style files or use templates designed for other conferences. Submissions that do not conform to the required styles, including paper size, margin width, and font size restrictions, will be rejected without review.

Long Papers: Long papers must describe substantial, original, completed, and unpublished work. Wherever appropriate, concrete evaluation and analysis should be included. Long papers may consist of up to eight (8) pages of content, plus unlimited pages of references. Final versions of long papers will be given one additional page of content (up to 9 pages), so that reviewers' comments can be taken into account. Long papers will be presented orally or as posters, as determined by the program committee. The decision as to which papers will be presented orally and which as posters will be based on the nature rather than the quality of the work. There will be no distinction in the proceedings between long papers presented orally and long papers presented as posters.

Short Papers: Short paper submissions must describe original and unpublished work.
Please note that a short paper is not a shortened long paper. Instead, a short paper should make a point that can be made in a few pages. Examples of short papers include: a small, focused contribution; a negative result; an opinion piece; an interesting application nugget. Short papers may consist of up to four (4) pages of content, plus unlimited pages of references. Final versions of short papers will be given one additional page of content (up to 5 pages), so that reviewers' comments can be taken into account. Short papers will be presented orally or as posters, as determined by the program committee. While short papers will be distinguished from long papers in the proceedings, there will be no distinction in the proceedings between short papers presented orally and short papers presented as posters.

Demo Papers: Demo papers should be no more than two (2) pages, including references, and should describe implemented systems related to the topics of interest of the workshop. They should also include a link to a short screencast of the working software. In addition, authors of demo papers must be willing to present a demo of their system during TSAR 2023.

--
Professor Horacio Saggion
Head of the Large Scale Text Understanding Systems Lab
Full Professor / Chair in Computer Science and Artificial Intelligence
TALN / DTIC
Deputy Director for Recruitment
Universitat Pompeu Fabra
https://twitter.com/h_saggion
https://www.linkedin.com/in/horacio-saggion-1749b916

Extended deadline: PhD on generation of Norwegian news texts at U. of Bergen
by Koenraad de Smedt 08 Jun '23

Deadline extended to June 30! The University of Bergen invites applications for a PhD position at MediaFutures: Research Centre for Responsible Media Technology & Innovation. The position will be on Norwegian Language Technology, in particular the generation and adaptation (including summarization) of Norwegian news texts. The methodology could include generative neural encoder-decoder architectures using large Norwegian language models. Announcement in Norwegian and English: https://www.jobbnorge.no/en/available-jobs/job/244992/phd-research-fellowsh…