(apologies for cross-posting)
----------------------------------------------------------------
*Workshop for NLP Open Source Software (NLP-OSS)*
06 Dec 2023, Co-located with EMNLP 2023
https://nlposs.github.io/
Deadline for Long and Short Paper submission: 09 August, 2023
(23:59, GMT-11)
----------------------------------------------------------------
You have tried the latest, bestest, fastest LLMs, aired your
grievances, and found the solution after hours of coffee and staring
at the screen. Share that at NLP-OSS and suggest how open source could
change for the better (e.g. best practices, documentation, API design,
etc.).
You came across an awesome SOTA system on NLP task X and no LLM has
beaten its F1 score. However, the code is now stale and it takes a
dinosaur to understand it. Share your experience at NLP-OSS and
propose how to "replicate" these forgotten systems.
You see this shiny GPT from a blog post, tried it to reproduce similar
results on a different task and it just doesn't work on your dataset.
You did some magic to the code and now it works. Show us how you did
it! Even small tweaks, when well motivated and empirically tested,
are valid submissions to NLP-OSS.
You have tried 101 NLP tools and none of them really does what you
want. So you wrote your own shiny new package and made it open source.
Tell us why your package is better than the existing tools. How did
you design the code? Is it going to be a one-time thing? Or would you
like to see thousands of people using it?
You have heard enough about open-source LLMs and pseudo-open-source GPTs,
but not enough about how they can be used for your use case or your
commercial product at scale. So you contacted your legal department
and they explained to you how data, model and code licenses
work. Share that knowledge with the NLP-OSS community.
You have a position/opinion to share about free vs open vs closed
source LLMs and have valid arguments, references or survey/data to
support your position. We would love to hear more about it.
At last, you've found the avenue to air these issues on an academic
platform: share your experiences, suggestions and analyses from/of NLP-OSS
at the NLP-OSS workshop!
P/S:
1st CALL FOR PAPERS
====
The Third Workshop for NLP Open Source Software (NLP-OSS) will be co-located
with EMNLP 2023 on 06 Dec 2023.
Focusing more on the social and engineering aspects of NLP software,
and less on scientific novelty or state-of-the-art models, the Workshop for NLP-OSS
is an academic forum to advance open source development for NLP research,
teaching and application.
NLP-OSS also provides an academic venue to announce new software/features and to
promote the collaborative culture and best practices that go beyond
the conferences.
We invite full papers (8 pages) or short papers (4 pages) on topics related to
NLP-OSS broadly categorized into (i) software development, (ii) scientific
contribution and (iii) NLP-OSS case studies.
- **Software Development**
- Designing and developing NLP-OSS
- Licensing issues in NLP-OSS
- Backwards compatibility and stale code in NLP-OSS
- Growing, maintaining and motivating an NLP-OSS community
- Best practices for NLP-OSS documentation and testing
- Contribution to NLP-OSS without coding
- Incentivizing OSS contributions in NLP
- Commercialization and Intellectual Property of NLP-OSS
- Defining and managing NLP-OSS project scope
- Issues in API design for NLP
- NLP-OSS software interoperability
- Analysis of the NLP-OSS community
- **Scientific Contribution**
- Surveying OSS for specific NLP task(s)
- Demonstration, introductions and/or tutorial of NLP-OSS
- Small but useful NLP-OSS
- NLP components in ML OSS
- Citations and references for NLP-OSS
- OSS and experiment replicability
- Gaps between existing NLP-OSS
- Task-generic vs task-specific software
- **Case studies**
- Case studies of how a specific bug is fixed or feature is added
- Writing wrappers for other NLP-OSS
- Writing open-source APIs for open data
- Teaching NLP with OSS
- NLP-OSS in the industry
Submissions should be formatted according to the [EMNLP 2023
templates](https://2023.emnlp.org/call-for-papers) and submitted via
[OpenReview](https://openreview.net/group?id=EMNLP/2023/Workshop/NLP-OSS).
ORGANIZERS
Geeticka Chauhan, Massachusetts Institute of Technology
Dmitrijs Milajevs, Grayscale AI
Elijah Rippeth, University of Maryland
Jeremy Gwinnup, Air Force Research Laboratory
Liling Tan, Amazon
Dear colleagues,
We are happy to invite you to join the *Arabic NER Shared Task 2023*
<https://dlnlp.ai/st/wojood/>, which will be organized as part of WANLP
2023. We will provide you with a large corpus and Google Colab notebooks to
help you reproduce the baseline results.
An invitation to participate in the shared task on extracting named entities
from Arabic texts. We will provide participants with a corpus and software
for producing baseline results that they can build upon.
*INTRODUCTION*
Named Entity Recognition (NER) is integral to many NLP applications. It is
the task of identifying named entity mentions in unstructured text and
classifying them into predefined classes such as person, organization,
location, or date. Due to the scarcity of Arabic resources, most of the
research on Arabic NER focuses on flat entities and addresses a limited
number of entity types (person, organization, and location). The goal of
this shared task is to alleviate this bottleneck by providing Wojood, a
large and rich Arabic NER corpus. Wojood consists of about 550K tokens (MSA
and dialect, in multiple domains) that are manually annotated with 21
entity types.
*REGISTRATION*
Participants need to register via this form (
*https://forms.gle/UCCrVNZ2LaPviCZS6* <https://forms.gle/UCCrVNZ2LaPviCZS6>).
Participating teams will be provided with common training and development
datasets. No external manually labelled datasets are allowed. A blind test
set will be used to evaluate the output of the participating teams.
Each team is allowed a maximum of 3 submissions. All teams are required to
report on the development and test sets (after results are announced) in
their write-ups.
*FAQ*
For any questions related to this task, please check our *Frequently Asked
Questions*
<https://docs.google.com/document/d/1XE2n89mFLic2P9DO_sAD51vy734BOt0kgtZ6bFf…>
*IMPORTANT DATES*
- March 03, 2023: Registration available
- May 25, 2023: Data sharing and evaluation on development set available
- June 10, 2023: Registration deadline
- July 20, 2023: Test set made available
- July 30, 2023: Evaluation on test set (TEST) deadline
- August 29, 2023: Shared task system paper submissions due
- October 12, 2023: Notification of acceptance
- October 30, 2023: Camera-ready version
- TBA: WANLP 2023 Conference.
* All deadlines are 11:59 PM UTC-12:00 (Anywhere on Earth).
*CONTACT*
For any questions related to this task, please contact the organizers
directly using the following email address: *NERShare...(a)gmail.com*
or join the Google group:
*https://groups.google.com/g/ner_sharedtask2023*
<https://groups.google.com/g/ner_sharedtask2023>.
*SHARED TASK*
As described, this shared task targets both flat and nested Arabic NER. The
subtasks are:
*Subtask 1:* *Flat NER*
In this subtask, we provide the Wojood-Flat train (70%) and development
(10%) datasets. The final evaluation will be on the test set (20%). The
flat NER dataset is the same as the nested NER dataset in terms of the
train/dev/test split, and each split contains the same content. The only
difference is that in flat NER each token is assigned a single tag: the
first high-level tag assigned to that token in the nested NER dataset.
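As a rough illustration of that projection, the sketch below derives flat tags from nested ones. The list-of-lists nested representation and the example tokens are assumptions for illustration only, not the official Wojood release format:

```python
# Illustrative sketch: derive flat NER tags from nested annotations by
# keeping only the first (outermost) high-level tag of each token.
# The nested format below is an assumption, not the official Wojood format.

def nested_to_flat(nested_tags):
    """For each token, keep only the first high-level tag;
    tokens with no entity tags become 'O'."""
    return [tags[0] if tags else "O" for tags in nested_tags]

# Each token carries zero or more nested tags, outermost first.
tokens = ["Birzeit", "University", "in", "Palestine"]
nested = [["B-ORG"], ["I-ORG"], [], ["B-GPE"]]

print(nested_to_flat(nested))  # ['B-ORG', 'I-ORG', 'O', 'B-GPE']
```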
*Subtask 2:* *Nested NER*
In this subtask, we provide the Wojood-Nested train (70%) and development
(10%) datasets. The final evaluation will be on the test set (20%).
*METRICS*
The evaluation metrics will include precision, recall and F1-score. However,
our official metric will be the micro F1-score.
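For reference, micro-averaging pools true positives, false positives and false negatives over all entity types before computing precision and recall. A minimal sketch, with made-up per-type counts (not actual task data):

```python
def micro_f1(counts):
    """counts: dict mapping entity type -> (tp, fp, fn).
    Micro-averaging sums the counts over all types before
    computing precision, recall and F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical per-type counts: (true positives, false positives, false negatives)
counts = {"PERS": (90, 10, 5), "ORG": (40, 5, 10), "GPE": (70, 10, 10)}
print(round(micro_f1(counts), 4))  # 0.8889
```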
The evaluation of shared tasks will be hosted through CODALAB. Teams will
be provided with a CODALAB link for each shared task.
-*CODALAB link for NER Shared Task Subtask 1 (Flat NER)*
<https://codalab.lisn.upsaclay.fr/competitions/11594>
-*CODALAB link for NER Shared Task Subtask 2 (Nested NER)*
<https://dlnlp.ai/st/wojood/>
*BASELINES*
Two baseline models trained on Wojood (flat and nested) are provided:
*Nested NER baseline:* presented in this *article*
<https://aclanthology.org/2022.lrec-1.387/>, with code available on
*GitHub* <https://github.com/SinaLab/ArabicNER>. The model achieves a micro
F1-score of 0.9059 (note that this baseline does not handle nested entities
of the same type).
*Flat NER baseline:* the same code repository used for nested NER (*GitHub*
<https://github.com/SinaLab/ArabicNER>) can also be used to train the flat
NER task. Our flat NER baseline achieved a micro F1-score of 0.8785.
*GOOGLE COLAB NOTEBOOKS*
To allow you to experiment with the baseline, we authored four Google Colab
notebooks that demonstrate how to train and evaluate our baseline models.
[1] *Train Flat NER*
<https://gist.github.com/mohammedkhalilia/72c3261734d7715094089bdf4de74b4a>:
This notebook can be used to train our ArabicNER model on the flat NER task
using the sample Wojood data found in our repository.
[2] *Evaluate Flat NER*
<https://gist.github.com/mohammedkhalilia/c807eb1ccb15416b187c32a362001665>:
This notebook uses the trained model saved by the notebook above to
perform evaluation on an unseen dataset.
[3] *Train Nested NER*
<https://gist.github.com/mohammedkhalilia/a4d83d4e43682d1efcdf299d41beb3da>:
This notebook can be used to train our ArabicNER model on the nested NER
task using the sample Wojood data found in our repository.
[4] *Evaluate Nested NER*
<https://gist.github.com/mohammedkhalilia/9134510aa2684464f57de7934c97138b>:
This notebook uses the trained model saved by the notebook above to
perform evaluation on an unseen dataset.
*ORGANIZERS*
- Mustafa Jarrar, Birzeit University
- Muhammad Abdul-Mageed, University of British Columbia & MBZUAI
- Mohammed Khalilia, Birzeit University
- Bashar Talafha, University of British Columbia
- AbdelRahim Elmadany, University of British Columbia
- Nagham Hamad, Birzeit University
- Alaa Omer, Birzeit University
Dear all,
With apologies for cross-posting, I'm sharing again the 2023 call for papers of the Journal of Open Humanities Data. This time we've added an explicit mention of large language model prompts and prompt engineering strategies among the language resources of interest to the journal, plus a reminder that our COVID-19 special collection is still accepting submissions. We've also explicitly included Library Science and Media Studies in the scope.
Kind regards,
Barbara
Call for Papers for 2023
The Journal of Open Humanities Data (JOHD)<https://openhumanitiesdata.metajnl.com/> features peer-reviewed publications describing humanities research objects with high potential for reuse. These might include curated resources like (annotated) linguistic corpora, ontologies, and lexicons, as well as databases, maps, atlases, linked data objects, and other data sets created with qualitative, quantitative, or computational methods, including large language model prompts and prompt engineering strategies.
We are currently inviting submissions of two varieties:
1. Short data papers contain a concise description of a humanities research object with high reuse potential. These are short (1,000 words) highly structured narratives. A data paper does not replace a traditional research article, but rather complements it.
2. Full length research papers discuss and illustrate methods, challenges, and limitations in humanities research data creation, collection, management, access, processing, or analysis. These are intended to be longer narratives (3,000 - 5,000 words), which give authors the ability to contribute to a broader discussion regarding the creation of research objects or methods.
Humanities subjects of interest to JOHD include, but are not limited to, Art History, Classics, History, Library Science, Linguistics, Literature, Media Studies, Modern Languages, Music and Musicology, Philosophy, and Religious Studies. Research that crosses one or more of these traditional disciplinary boundaries is highly encouraged. Authors are encouraged to publish their data in recommended repositories<https://openhumanitiesdata.metajnl.com/about/#repo>. More information about the submission process<https://openhumanitiesdata.metajnl.com/about/submissions>, editorial policies<https://openhumanitiesdata.metajnl.com/about/editorialpolicies/> and archiving<https://openhumanitiesdata.metajnl.com/about/> is available on the journal’s web pages.
Submissions are still open for our special collection, Humanities Data in the Time of COVID-19<https://openhumanitiesdata.metajnl.com/collections/humanities-data-in-the-t…>. This collection includes data papers that span various areas of enquiry about the COVID-19 pandemic through the lens of the Humanities. Data from this period have far-reaching and impactful reuse potential, so we encourage you to share your data by submitting to this growing collection.
JOHD provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge.
We accept online submissions via our journal website. See Author Guidelines <https://openhumanitiesdata.metajnl.com/about/submissions/> for further information. Alternatively, please contact the editor<https://openhumanitiesdata.metajnl.com/contact/> if you are unsure as to whether your research is suitable for submission to the journal.
Authors remain the copyright holders and grant third parties the right to use, reproduce, and share the article according to the Creative Commons<http://creativecommons.org/licenses/by/4.0/> licence agreement.
Thank you! igraph seems to be more Linux/Debian friendly. There is a
"GNU R network analysis and visualization" package: r-cran-igraph
So far I have found:
https://cran.r-project.org/web/packages/igraph/
https://cran.r-project.org/web/packages/igraph/igraph.pdf
and a bunch of videos/tutorials, which I will have a better opinion
about after I watch them.
I will keep publicly posting my experiences to help those running
against the same kinds of problems.
$ time apt-cache search gephi
real 0m0.267s
user 0m0.255s
sys 0m0.012s
$ time apt-cache search igraph
karbon - vector graphics application for the Calligra Suite
cl-graph - simple graph data structure and algorithms
libdirgra-java - Java library providing a simple directed graph implementation
libdirgra-java-doc - Documentation for dirgra
fonts-bajaderka - Warsaw's sign painters styled font
fonts-gfs-neohellenic - modern Greek font family with matching Latin
fonts-gfs-solomos - ancient Greek oblique font
fonts-isabella - Isabella free TrueType font
fonts-sil-annapurna - smart font for languages using Devanagari script
fonts-uralic - Truetype fonts for Cyrillic-based Uralic languages
golang-github-guptarohit-asciigraph-dev - Make lightweight ASCII line
graph in CLI apps with no other dependencies
golang-github-jesseduffield-asciigraph-dev - Go package to make
lightweight ASCII line graph without dependencies
golang-github-steveyen-gtreap-dev - gtreap is an immutable treap
implementation in the Go Language
gpw - Trigraph Password Generator
libigraph-dev - library for creating and manipulating graphs - development files
libigraph-examples - library for creating and manipulating graphs -
example files
libigraph1 - library for creating and manipulating graphs
libjgrapht0.6-java - mathematical graph theory library for Java
libjgrapht0.8-java - mathematical graph theory library for Java
libtext-password-pronounceable-perl - Perl module to generate
pronounceable passwords
liwc - Tools for manipulating C source code
msort - utility for sorting records in complex ways
libnauty2 - library for graph automorphisms -- library package
libnauty2-dev - library for graph automorphisms -- development package
nauty - library for graph automorphisms -- interface and tools
nauty-doc - library for graph automorphisms -- user guide
otp - Generator for One Time Pads or Passwords
perl-tk - Perl module providing the Tk graphics library
python3-igraph - High performance graph data structures and algorithms
(Python 3)
r-cran-graphlayouts - GNU R additional layout algorithms for network
visualizations
r-cran-gwidgets - gWidgets API for Toolkit-Independent, Interactive GUIs
r-cran-igraph - GNU R network analysis and visualization
r-cran-propclust - Propensity Clustering and Decomposition
scalable-cyrfonts-tex - Scalable Cyrillic fonts for TeX
texlive-pictures - TeX Live: Graphics, pictures, diagrams
texlive-fonts-extra - TeX Live: Additional fonts
texlive-latex-extra - TeX Live: LaTeX additional packages
tran - transcribe between character scripts (alphabets)
vis - Modern, legacy free, simple yet efficient vim-like editor
real 0m0.303s
user 0m0.283s
sys 0m0.020s
$
On 6/9/23, David Chartash <dchartas(a)ieee.org> wrote:
> Hi Albretch,
> I would start off with Gephi <https://gephi.org/> or try the R/C/Python...
> package igraph <https://igraph.org/>.
> Cheers,
>
> David
> ---
> Please forgive any spelling errors, sent from a poorly implemented software
> keyer
>
> On Fri, Jun 9, 2023, 02:40 Albretch Mueller via Corpora <
> corpora(a)list.elra.info> wrote:
>
>> I could imagine, as John Lennon used to sing, that "I am not the only
>> one" in need of such an application.
>>
>> At times you get ten of thousand lines which you would like to
>> quickly “visually parse” to gain a general sense of what you've got.
>> Ideally, you should be able to play with it to select the records you
>> need.
>>
>> Think for example, of the many links to texts you would get from
>> archive.org (which also includes some metadata) or *.pub (each site
>> using their own quirkiness)
>>
>> Based on some sort of GUI, you would see weighted terms (coloured or
>> not based on a user's preference) with all other terms preceding (as
>> some sort of tree-like structure confluent on that term) and following
>> it ( ... branching off of it).
>>
>> Which kind of applications people use to do such thing?
>>
>> lbrtchx
>> _______________________________________________
>> Corpora mailing list -- corpora(a)list.elra.info
>> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
>> To unsubscribe send an email to corpora-leave(a)list.elra.info
>>
>
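For what it's worth, the raw data behind the weighted-term view Albretch describes (a term with the words that precede and follow it, as a confluent tree) can be prototyped with the standard library before reaching for a GUI tool. The function name and sample lines below are my own illustration, not from igraph or any package mentioned in the thread:

```python
from collections import Counter

def term_contexts(lines, term):
    """Count the words immediately preceding and following `term`
    across a list of text lines -- the weights for a simple
    'confluent tree' view centred on that term."""
    before, after = Counter(), Counter()
    for line in lines:
        words = line.lower().split()
        for i, w in enumerate(words):
            if w == term:
                if i > 0:
                    before[words[i - 1]] += 1
                if i + 1 < len(words):
                    after[words[i + 1]] += 1
    return before, after

lines = [
    "open source software for corpus analysis",
    "free software tools and open software licences",
]
before, after = term_contexts(lines, "software")
print(before.most_common())
print(after.most_common())
```

The two counters could then be fed to a graph layout (e.g. igraph) with edge weights taken from the counts.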
Call for Participation:
*FinCausal-2023 Shared Task: “Financial Document Causality Detection”* is
organised within the *5th Financial Narrative Processing Workshop (FNP
2023)*, taking place at the 2023 IEEE International Conference on Big Data
(IEEE BigData 2023) <http://bigdataieee.org/BigData2023/>, Sorrento, Italy,
15-18 December 2023. It is a *one-day event*; the exact date is to be
announced.
Important Dates:
- Call for participation and registration: 3rd June 2023
- Registration deadline: 28 June
- Training set release: 29 June 2023
- Test set release: 5 September 2023
- Systems submission deadline: 15 September 2023
- Release of results: 20 September 2023
- Paper submission deadline: 20 October 2023
- Notification of acceptance: November 12, 2023
- Camera-ready of accepted papers: November 20, 2023
- FNP Workshop: December 2023
Workshop URL: https://wp.lancs.ac.uk/cfie/fincausal2023/
Registration Form: https://forms.gle/29E161a8RmMosBLU8. After completing
the registration form, the practice set will be sent to participants.
*Shared Task Description:*
Financial analysis needs factual data and an explanation of the variability
of those data. Data state facts, but offer little insight into how those
facts materialised. Furthermore, understanding causality is crucial in
studying decision-making processes.
The *Financial Document Causality Detection Task* (FinCausal) aims at
identifying elements of cause and effect in causal sentences extracted from
financial documents. Its goal is to evaluate which events or chain of
events can cause a financial object to be modified or an event to occur,
regarding a given context. In the financial landscape, identifying cause
and effect from external documents and sources is crucial to explain why a
transformation occurs.
Two subtasks are organised this year: the *English FinCausal subtask* and
the *Spanish FinCausal subtask*. This is the first year we have introduced
a subtask in Spanish.
*Objective*: For both tasks, participants are asked to identify, given a
causal sentence, which elements of the sentence relate to the cause, and
which relate to the effect. Participants can use any method they see fit
(regex, corpus linguistics, entity relationship models, deep learning
methods) to identify the causes and effects.
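As a trivial illustration of the regex end of that spectrum (not a competitive baseline; the connective list and the expected answer format are my own assumptions):

```python
import re

# A naive cause/effect splitter: if a sentence contains one of a few
# explicit causal connectives, treat the text after the connective as
# the cause and the text before it as the effect. Real systems are far
# more sophisticated; this only sketches the shape of the task.
CONNECTIVES = ["because of", "due to", "owing to", "as a result of"]
PATTERN = re.compile("|".join(re.escape(c) for c in CONNECTIVES), re.IGNORECASE)

def split_cause_effect(sentence):
    m = PATTERN.search(sentence)
    if not m:
        return None  # no explicit connective found
    effect = sentence[:m.start()].strip(" ,.")
    cause = sentence[m.end():].strip(" ,.")
    return {"cause": cause, "effect": effect}

print(split_cause_effect(
    "Net profit fell 12% due to higher raw material costs."))
# {'cause': 'higher raw material costs', 'effect': 'Net profit fell 12%'}
```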
*English FinCausal subtask*
- *Data Description: *The dataset has been sourced from various 2019
financial news articles provided by Qwam, along with additional SEC data
from the Edgar Database. Additionally, we have augmented the dataset from
FinCausal 2022, adding 500 new segments. Participants will be provided with
a sample of text blocks extracted from financial news and already labelled.
- *Scope: *The* English FinCausal subtask* focuses on detecting causes
and effects when the effects are quantified. The aim is to identify, in
a causal sentence or text block, the causal elements and the consequential
ones. Only one causal element and one effect are expected in each segment.
- *Length of Data fragments: *The* English FinCausal subtask* segments
are made up of up to three sentences.
- *Data format: *CSV files. Datasets for both the English and the
Spanish subtasks will be presented in the same format.
This shared task focuses on determining causality associated with a
quantified fact. An event is defined as the arising or emergence of a new
object or context with respect to a previous situation. So, the task will
emphasise the detection of causality associated with the transformation of
financial objects embedded in quantified facts.
*Spanish FinCausal subtask*
- *Data Description: *The dataset has been sourced from a corpus of
Spanish financial annual reports from 2014 to 2018. Participants will be
provided with a sample of text blocks extracted from financial news,
labelled through inter-annotator agreement.
- *Scope: *The *Spanish FinCausal subtask* aims to detect all types of
causes and effects, not necessarily limited to quantified effects. The
aim is to identify, in a paragraph, the causal elements and the
consequential ones. Only one causal element and one effect are expected in
each paragraph.
- *Length of Data fragments: *The *Spanish FinCausal subtask* involves
complete paragraphs.
- *Data format: *CSV files. Datasets for both the English and the
Spanish subtasks will be presented in the same format.
This shared task focuses on determining causality associated with both
events and quantified facts. For this task, a cause can be the justification
for a statement or the reason that explains a result. This task is also a
relation detection task.
*FinCausal Shared Task Organisers:*
- Antonio Moreno-Sandoval (UAM, Spain)
- Blanca Carbajo Coronado (UAM, Spain)
- Doaa Samy (UCM, Spain)
- Jordi Porta (UAM, Spain)
- Dominique Mariko (Yseop, France)
For any questions, please contact the organisers at *fincausal.2023(a)gmail.com
<fincausal.2023(a)gmail.com>*
*** Apologies for Cross-Posting ***
We are pleased to announce five exciting shared tasks
<https://wanlp2023.sigarab.org/shared-tasks> as part of the 1st Conference
on Arabic NLP (WANLP2023 <https://wanlp2023.sigarab.org/>) - co-located
with EMNLP 2023 <https://2023.emnlp.org/> in Singapore.
Please refer to the tasks’ websites for more information on participation
and internal deadlines. General shared tasks deadlines are at the end of
this call.
Shared Task 1: NADI 2023
Description: Dialect identification is the task of automatically detecting
the source variety of a given text or speech segment. In addition to
nuanced dialect identification at the country level, NADI 2022 offered a
new subtask focused on country-level sentiment analysis. NADI 2023
continues this tradition of extending tasks beyond dialect identification.
Namely, we propose a new open track subtask focused on machine translation
(MT) in two directions (i) into Modern Standard Arabic (MSA) from five
Arabic dialects and (ii) into any of the five dialects from MSA.
Organizers: Muhammad Abdul-Mageed, Chiyu Zhang, El Moatez Billah Nagoudi,
AbdelRahim Elmadany (The University of British Columbia, Canada), Houda
Bouamor (Carnegie Mellon University, Qatar), and Nizar Habash (New York
University Abu Dhabi).
For more information, please visit the shared task’s website:
https://nadi.dlnlp.ai/
––––––––––––––––––––––––––––––
Shared Task 2: ArAIEval - Persuasion Techniques and Disinformation
Detection in Arabic Text
Description: The ArAIEval shared tasks include two tasks: (i) persuasion
techniques detection, and (ii) disinformation detection.
Organizers: Firoj Alam, Hamdy Mubarak, Maram Hasanain, Wajdi Zaghouani,
Giovanni Da San Martino, and Preslav Nakov.
For more information, please visit the shared task’s website:
https://araieval.gitlab.io/
––––––––––––––––––––––––––––––
Shared Task 3: Qur'an QA 2023
Description: This is a shared task of Arabic Reading Comprehension over the
Holy Qur’an, aiming to trigger state-of-the-art question-answering and
reading comprehension research on a book that is held sacred by more than
1.8 billion people across the world. The shared task entails two subtasks:
(A) Passage Retrieval (PR) task, and (B) Reading Comprehension (RC) task.
Organizers: Tamer Elsayed, Rana Malhas, Watheq Mansour (Qatar University).
For more information, please visit the shared task’s website:
https://sites.google.com/view/quran-qa-2023
––––––––––––––––––––––––––––––
Shared Task 4: WojoodNER
Description: Due to the scarcity of Arabic resources, most of the research
on Arabic NER focuses on flat entities and addresses a limited number of
entity types (person, organization, and location). The goal of this shared
task is to alleviate this bottleneck by providing Wojood, a large and rich
Arabic NER corpus.
Organizers: Muhammad Abdul-Mageed, Mohammed Khalilia, Nagham Hamad, Bashar
Talafha, AbdelRahim Elmadany, Alaa’ Omar, Mustafa Jarrar.
For more information, please visit the shared task’s website:
https://dlnlp.ai/st/wojood/
––––––––––––––––––––––––––––––
Shared Task 5: Arabic Reverse Dictionary
Description: This shared task aims to address the Tip-of-Tongue (TOT)
problem by developing a Reverse Dictionary (RD) system specifically for the
Arabic language. Reverse dictionaries allow users to find words based on
their meanings or definitions and can be useful for writers, crossword
puzzle enthusiasts, non-native language learners, and anyone looking to
expand their vocabulary. This shared task includes two subtasks: Arabic RD
and Cross-lingual Reverse Dictionary (CLRD).
Organizers: Rawan Al-Matham, Waad Alshammari, Abdulrahman AlOsaimy, Sarah
Alhumoud, Afrah Altamimi, Abdullah Alfaifi.
For more information, please visit the shared task’s website:
https://samai.ksaa.gov.sa/sharedTask.html
––––––––––––––––––––––––––––––
Important dates:
- August 29, 2023: shared task papers due
- October 12, 2023: notification of acceptance
- October 20, 2023: camera-ready papers due
- December 7, 2023: conference day
All deadlines are 11:59 pm UTC-12
<https://www.timeanddate.com/time/zone/timezone/utc-12> (“Anywhere on
Earth”).
The WANLP 2023 Publicity Chairs,
Salam Khalifa and Amr Keleg
--
Salam Khalifa
PhD Student at Stony Brook Linguistics
<https://www.linguistics.stonybrook.edu/>.
NLDB 2023
28th International Conference on Natural Language & Information Systems
21-23 June 2023, University of Derby, United Kingdom
https://www.derby.ac.uk/events/latest-events/nldb-2023/
The 28th International Conference on Natural Language & Information Systems will be held at the University of Derby, United Kingdom, and will be a face-to-face event.
This is a full three-day event and the conference programme is now available.
Second Workshop on Text Simplification, Accessibility and Readability -
TSAR 2023 @ RANLP
Jointly with the Recent Advances in Natural Language Processing Conference
RANLP 2023
https://tsar-workshop.github.io/
http://ranlp.org/ranlp2023/
First Call for Papers
Important Dates
Submission deadline: 10 July 2023
Notification of acceptance: 5 August 2023
Camera-ready papers due: 25 August 2023
Workshop: 7 or 8 September 2023
The Web provides an abundance of knowledge and information that can reach
large populations. However, the way in which a text is written (vocabulary,
syntax, or text organization/structure), or presented, can make it
inaccessible to many people, especially non-native speakers, people with
low literacy, and people with some type of cognitive or linguistic
impairment. The results of the Adult Literacy Survey (OECD, 2023) indicate
that approximately 16.7% of the adult population (averaged over 24
highly-developed countries) requires lexical, 50% syntactic, and 89.4%
conceptual simplification of everyday texts (Štajner, 2021).
Research on automatic text simplification (TS), textual accessibility, and
readability thus have the potential to improve social inclusion of
marginalised populations. These related research areas have attracted
increasing attention in the past ten years, as evidenced by the
growing number of publications in NLP conferences. While only about 300
articles in Google Scholar mentioned TS in 2010, this number has increased
to about 600 in 2015 and is greater than 1000 in 2020 (Štajner, 2021).
Recent research in automatic text simplification has mostly focused on
methods derived from the deep learning paradigm (Glavaš and Štajner, 2015;
Paetzold and Specia, 2016; Nisioi et al., 2017; Zhang and Lapata, 2017;
Martin et al., 2020; Maddela et al., 2021; Sheang and Saggion, 2021).
However, many important aspects of automatic text simplification still
need the attention of our community: the design of appropriate evaluation
metrics, the development of context-aware simplification solutions, the
creation of appropriate language resources to support research and
evaluation, the deployment of simplification in real environments for real
users, the study of discourse factors in text simplification, the
identification of factors affecting the readability of a text, etc. To
address these issues, there is a need for collaboration among CL/NLP
researchers, machine learning and deep learning researchers, UI/UX and
accessibility professionals, as well as representatives of public
organisations (Štajner, 2021).
The proposed TSAR workshop builds upon the recent success of several
workshops that covered a subset of our topics of interest, including the
SEPLN 2021 Current Trends in Text Simplification (CTTS) and the SimpleText
workshop at CLEF 2021, the TSAR-2022 at EMNLP 2022, the recent Special
Issue on Text Simplification, Accessibility, and Readability at Frontiers
in AI, as well as the birds-of-a-feather event on Text Simplification at
NAACL 2021 (over 50 participants).
The TSAR workshop aims to foster collaboration among all parties
interested in making information accessible to all people. We will discuss
recent trends and developments in automatic text simplification, text
accessibility, automatic readability assessment, and language resources
and evaluation for text simplification.
Topics
We invite contributions on the following topics (among others):
- Lexical simplification;
- Syntactic simplification;
- Modular and end-to-end TS;
- Sequence-to-sequence and zero-shot TS;
- Controllable TS;
- Text complexity assessment;
- Complex word identification and lexical complexity prediction;
- Corpora, lexical resources, and benchmarks for TS;
- Evaluation of TS systems;
- Domain-specific TS (e.g. health, legal);
- Other related topics (e.g. empirical and eye-tracking studies);
- Assistive technologies for improving readability and comprehension,
including those going beyond text.
Submissions
We welcome two types of papers: long papers and short papers. Submissions
should be made to: https://softconf.com/ranlp23/TSAR/
The papers should present novel research. Reviewing will be double-blind,
so all submissions must be anonymized.
Format: Paper submissions must use the official RANLP 2023 Templates
<http://ranlp.org/ranlp2023/index.php/submissions/>, which are available as
an Overleaf
<https://www.overleaf.com/latex/templates/instructions-for-ranlp-2023-procee…>
template and also downloadable directly (LaTeX
<http://ranlp.org/ranlp2023/Templates/ranlp2023-LaTeX.zip> and Word
<http://ranlp.org/ranlp2023/Templates/ranlp2023-word.docx>). Authors may
not modify these style files or use templates designed for other
conferences.
Submissions that do not conform to the required styles, including paper
size, margin width, and font size restrictions, will be rejected without
review.
Long Papers: Long papers must describe substantial, original, completed,
and unpublished work. Wherever appropriate, concrete evaluation and
analysis should be included. Long papers may consist of up to eight (8)
pages of content, plus unlimited pages of references. Final versions of
long papers will be given one additional page of content (up to 9 pages),
so that reviewers’ comments can be taken into account. Long papers will be
presented orally or as posters as determined by the program committee. The
decisions as to which papers will be presented orally and which as poster
presentations will be based on the nature rather than the quality of the
work. There will be no distinction in the proceedings between long papers
presented orally and long papers presented as posters.
Short Papers: Short paper submissions must describe original and
unpublished work. Please note that a short paper is not a shortened long
paper. Instead, short papers should have a point that can be made in a few
pages. Some kinds of short papers include: a small, focused contribution; a
negative result; an opinion piece; an interesting application nugget. Short
papers may consist of up to four (4) pages of content, plus unlimited pages
of references. Final versions of short papers will be given one additional
page of content (up to 5 pages), so that reviewers' comments can be taken
into account. Short papers will be presented orally or as posters as
determined by the program committee. While short papers will be
distinguished from long papers in the proceedings, there will be no
distinction in the proceedings between short papers presented orally and
short papers presented as posters.
Demo Papers: Demo paper submissions should be no more than two (2) pages,
including references, and should describe implemented systems related to
the topics of interest of the workshop. Submissions should also include a
link to a short screencast of the working software. In addition, authors
of demo papers must be willing to present a demo of their system during
TSAR 2023.
--
Professor Horacio Saggion
Head of the Large Scale Text Understanding Systems Lab
Full Professor / Chair in Computer Science and Artificial Intelligence
TALN / DTIC
Deputy Director for Recruitment
Universitat Pompeu Fabra
Twitter: https://twitter.com/h_saggion
LinkedIn: https://www.linkedin.com/in/horacio-saggion-1749b916
Deadline extended to June 30!
The University of Bergen invites applications for a PhD position at MediaFutures: Research Centre for Responsible Media Technology & Innovation. The position will focus on Norwegian language technology, in particular the generation and adaptation (including summarization) of Norwegian news texts. The methodology may include generative neural encoder-decoder architectures built on large Norwegian language models.
Announcement in Norwegian and English:
https://www.jobbnorge.no/en/available-jobs/job/244992/phd-research-fellowsh…