June 2023 - Corpora - ELRA lists

Call for Participation: FinCausal 2023 (Financial Document Causality Detection)
by Doaa Samy 05 Jun '23

Call for Participation: the *FinCausal-2023 Shared Task: “Financial Document Causality Detection”* is organised within the *5th Financial Narrative Processing Workshop (FNP 2023)*, taking place at the 2023 IEEE International Conference on Big Data (IEEE BigData 2023) <http://bigdataieee.org/BigData2023/>, Sorrento, Italy, 15-18 December 2023. The workshop is a *one-day event*; the exact date is to be announced.

Important Dates:
- Call for participation and registration: 3 June 2023
- Registration deadline: 28 June 2023
- Training set release: 29 June 2023
- Test set release: 5 September 2023
- Systems submission deadline: 15 September 2023
- Release of results: 20 September 2023
- Paper submission deadline: 20 October 2023
- Notification of acceptance: 12 November 2023
- Camera-ready of accepted papers: 20 November 2023
- FNP Workshop: December 2023

Workshop URL: https://wp.lancs.ac.uk/cfie/fincausal2023/
Registration Form: https://forms.gle/29E161a8RmMosBLU8
The practice set will be sent to participants once they have completed the registration form.

*Shared Task Description:*
Financial analysis needs factual data and an explanation of the variability of these data. Data state facts but say little about how these facts materialised. Furthermore, understanding causality is crucial in studying decision-making processes. The *Financial Document Causality Detection Task* (FinCausal) aims at identifying elements of cause and effect in causal sentences extracted from financial documents. Its goal is to evaluate which events or chains of events can cause a financial object to be modified or an event to occur, in a given context. In the financial landscape, identifying cause and effect from external documents and sources is crucial to explain why a transformation occurs.

Two subtasks are organised this year: the *English FinCausal subtask* and the *Spanish FinCausal subtask*. This is the first year in which we introduce a subtask in Spanish.

*Objective*: For both subtasks, participants are asked to identify, given a causal sentence, which elements of the sentence relate to the cause and which relate to the effect. Participants can use any method they see fit (regex, corpus linguistics, entity relationship models, deep learning methods) to identify the causes and effects.

*English FinCausal subtask*
- *Data Description:* The dataset has been sourced from various 2019 financial news articles provided by Qwam, along with additional SEC data from the Edgar Database. Additionally, we have augmented the dataset from FinCausal 2022, adding 500 new segments. Participants will be provided with a sample of text blocks extracted from financial news and already labelled.
- *Scope:* The *English FinCausal subtask* focuses on detecting causes and effects when the effects are quantified. The aim is to identify, in a causal sentence or text block, the causal elements and the consequential ones. Only one causal element and one effect are expected in each segment.
- *Length of Data fragments:* The *English FinCausal subtask* segments are made up of up to three sentences.
- *Data format:* CSV files. Datasets for both the English and the Spanish subtasks will be presented in the same format.

This subtask focuses on determining causality associated with a quantified fact. An event is defined as the arising or emergence of a new object or context with respect to a previous situation. So, the task will emphasise the detection of causality associated with the transformation of financial objects embedded in quantified facts.

*Spanish FinCausal subtask*
- *Data Description:* The dataset has been sourced from a corpus of Spanish financial annual reports from 2014 to 2018. Participants will be provided with a sample of text blocks extracted from financial news, labelled through inter-annotator agreement.
- *Scope:* The *Spanish FinCausal subtask* aims to detect all types of causes and effects, not necessarily limited to quantified effects. The aim is to identify, in a paragraph, the causal elements and the consequential ones. Only one causal element and one effect are expected in each paragraph.
- *Length of Data fragments:* The *Spanish FinCausal subtask* involves complete paragraphs.
- *Data format:* CSV files. Datasets for both the English and the Spanish subtasks will be presented in the same format.

This subtask focuses on determining causality associated with either events or quantified facts. For this task, a cause can be the justification for a statement or the reason that explains a result. This task is also a relation detection task.

*FinCausal Shared Task Organisers:*
- Antonio Moreno-Sandoval (UAM, Spain)
- Blanca Carbajo Coronado (UAM, Spain)
- Doaa Samy (UCM, Spain)
- Jordi Porta (UAM, Spain)
- Dominique Mariko (Yseop, France)

For any questions, please contact the organisers at *fincausal.2023(a)gmail.com <fincausal.2023(a)gmail.com>*
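
As a rough illustration of the task described above (one cause span and one effect span per segment), the sketch below shows what a very simple regex baseline might look like. The connective lists and the function name `split_cause_effect` are assumptions made for this example only; they are not taken from the task guidelines, and no CSV column names are assumed because the announcement does not give them.

```python
import re
from typing import Optional, Tuple

# Hypothetical connective patterns (illustrative only, not from the task kit).
# In EFFECT_FIRST the effect precedes the marker: "<effect> due to <cause>".
EFFECT_FIRST = re.compile(
    r"^(?P<left>.+?)\s+(?:because of|because|due to|owing to|as a result of)\s+(?P<right>.+)$",
    re.IGNORECASE,
)
# In CAUSE_FIRST the cause precedes the marker: "<cause> led to <effect>".
CAUSE_FIRST = re.compile(
    r"^(?P<left>.+?)\s+(?:led to|resulted in|caused)\s+(?P<right>.+)$",
    re.IGNORECASE,
)


def split_cause_effect(segment: str) -> Optional[Tuple[str, str]]:
    """Return (cause, effect) for a segment, or None if no marker is found."""
    text = segment.strip().rstrip(".")
    m = EFFECT_FIRST.match(text)
    if m:
        # Effect is on the left of the marker, cause on the right.
        return m.group("right").strip(), m.group("left").strip()
    m = CAUSE_FIRST.match(text)
    if m:
        # Cause is on the left of the marker, effect on the right.
        return m.group("left").strip(), m.group("right").strip()
    return None


if __name__ == "__main__":
    print(split_cause_effect("Revenue fell 12% due to weaker demand in Europe."))
    print(split_cause_effect("Higher interest rates led to a 3% drop in lending."))
```

A pattern list like this only covers explicit connectives and will miss implicit causality, which is presumably why the call also mentions corpus-linguistic and deep learning approaches.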

LxGr2023: Programme and Call for Participation
by Costas Gabrielatos 03 Jun '23

8th Symposium on Corpus Approaches to Lexicogrammar (LxGr2023)

The symposium will take place online on 6-8 July 2023. The programme and registration details are here: https://sites.edgehill.ac.uk/lxgr/lxgr2023

For more information, contact lxgr(a)edgehill.ac.uk.

Call for participation - Arabic NER Shared Task 2023
by nagham ghanim 02 Jun '23

Dear colleagues,

We are happy to invite you to join the *Arabic NER SharedTask 2023* <https://dlnlp.ai/st/wojood/>, which will be organized as part of WANLP 2023. We will provide you with a large corpus and Google Colab notebooks to help you reproduce the baseline results.

Invitation to participate in the shared task on extracting named entities from Arabic texts. We will provide participants with a corpus and software for obtaining baseline results that they can build on.

*INTRODUCTION*
Named Entity Recognition (NER) is integral to many NLP applications. It is the task of identifying named entity mentions in unstructured text and classifying them into predefined classes such as person, organization, location, or date. Due to the scarcity of Arabic resources, most of the research on Arabic NER focuses on flat entities and addresses a limited number of entity types (person, organization, and location). The goal of this shared task is to alleviate this bottleneck by providing Wojood, a large and rich Arabic NER corpus. Wojood consists of about 550K tokens (MSA and dialect, in multiple domains) that are manually annotated with 21 entity types.

*REGISTRATION*
Participants need to register via this form (*https://forms.gle/UCCrVNZ2LaPviCZS6* <https://forms.gle/UCCrVNZ2LaPviCZS6>). Participating teams will be provided with common training and development datasets. No external manually labelled datasets are allowed. A blind test set will be used to evaluate the output of the participating teams. Each team is allowed a maximum of 3 submissions. All teams are required to report on the development and test sets (after results are announced) in their write-ups.

*FAQ*
For any questions related to this task, please check our *Frequently Asked Questions* <https://docs.google.com/document/d/1XE2n89mFLic2P9DO_sAD51vy734BOt0kgtZ6bFf…>

*IMPORTANT DATES*
- March 03, 2023: Registration available
- May 25, 2023: Data-sharing and evaluation on development set available
- June 10, 2023: Registration deadline
- July 20, 2023: Test set made available
- July 30, 2023: Evaluation on test set (TEST) deadline
- August 29, 2023: Shared task system paper submissions due
- October 12, 2023: Notification of acceptance
- October 30, 2023: Camera-ready version
- TBA: WANLP 2023 Conference

*All deadlines are 11:59 PM UTC-12:00 (Anywhere On Earth).*

*CONTACT*
For any questions related to this task, please contact the organizers directly using the following email address: *NERShare...(a)gmail.com <https://groups.google.com/>* or join the Google group: *https://groups.google.com/g/ner_sharedtask2023* <https://groups.google.com/g/ner_sharedtask2023>.

*SHARED TASK*
As described, this shared task targets both flat and nested Arabic NER. The subtasks are:

*Subtask 1:* *Flat NER*
In this subtask, we provide the Wojood-Flat train (70%) and development (10%) datasets. The final evaluation will be on the test set (20%). The flat NER dataset is the same as the nested NER dataset in terms of train/test/dev split, and each split contains the same content. The only difference is that in the flat NER dataset each token is assigned a single tag, which is the first high-level tag assigned to that token in the nested NER dataset.

*Subtask 2:* *Nested NER*
In this subtask, we provide the Wojood-Nested train (70%) and development (10%) datasets. The final evaluation will be on the test set (20%).

*METRICS*
The evaluation metrics will include precision, recall, and F1-score. However, our official metric will be the micro F1-score. The evaluation of shared tasks will be hosted through CODALAB. Teams will be provided with a CODALAB link for each shared task.
- *CODALAB link for NER Shared Task Subtask 1 (Flat NER)* <https://codalab.lisn.upsaclay.fr/competitions/11594>
- *CODALAB link for NER Shared Task Subtask 2 (Nested NER)* <https://dlnlp.ai/st/wojood/>

*BASELINES*
Two baseline models trained on Wojood (flat and nested) are provided:

*Nested NER baseline:* presented in this *article* <https://aclanthology.org/2022.lrec-1.387/>, with code available on *GitHub* <https://github.com/SinaLab/ArabicNER>. The model achieves a micro F1-score of 0.9059 (note that this baseline does not handle nested entities of the same type).

*Flat NER baseline:* the same code repository used for nested NER (*GitHub* <https://github.com/SinaLab/ArabicNER>) can also be used to train the flat NER task. Our flat NER baseline achieved a micro F1-score of 0.8785.

*GOOGLE COLAB NOTEBOOKS*
To allow you to experiment with the baseline, we authored four Google Colab notebooks that demonstrate how to train and evaluate our baseline models.

[1] *Train Flat NER* <https://gist.github.com/mohammedkhalilia/72c3261734d7715094089bdf4de74b4a>: This notebook can be used to train our ArabicNER model on the flat NER task using the sample Wojood data found in our repository.
[2] *Evaluate Flat NER* <https://gist.github.com/mohammedkhalilia/c807eb1ccb15416b187c32a362001665>: This notebook uses the trained model saved by the notebook above to perform evaluation on an unseen dataset.
[3] *Train Nested NER* <https://gist.github.com/mohammedkhalilia/a4d83d4e43682d1efcdf299d41beb3da>: This notebook can be used to train our ArabicNER model on the nested NER task using the sample Wojood data found in our repository.
[4] *Evaluate Nested NER* <https://gist.github.com/mohammedkhalilia/9134510aa2684464f57de7934c97138b>: This notebook uses the trained model saved by the notebook above to perform evaluation on an unseen dataset.

*ORGANIZERS*
- Mustafa Jarrar, Birzeit University
- Muhammad Abdul-Mageed, University of British Columbia & MBZUAI
- Mohammed Khalilia, Birzeit University
- Bashar Talafha, University of British Columbia
- AbdelRahim Elmadany, University of British Columbia
- Nagham Hamad, Birzeit University
- Alaa Omer, Birzeit University
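
The relationship between the flat and nested tags, and the micro-averaged F1 metric mentioned in this announcement, can be sketched as follows. This is a minimal illustration under stated assumptions, not the organisers' scorer: the in-memory representation (a list of tags per token, outermost tag first, and entity spans as `(start, end, type)` tuples) and the function names `flat_tags` and `micro_prf` are hypothetical, and the tag names in the example are illustrative.

```python
from typing import List, Set, Tuple

# Assumed span representation: (start token index, end token index, entity type).
Span = Tuple[int, int, str]


def flat_tags(nested: List[List[str]]) -> List[str]:
    """Keep only the first (highest-level) tag of each token; 'O' if it has none."""
    return [tags[0] if tags else "O" for tags in nested]


def micro_prf(gold: List[Set[Span]], pred: List[Set[Span]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exactly matching entity spans.

    Counts are pooled over all sentences before the ratios are computed,
    which is what distinguishes micro from macro averaging.
    """
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Token 0 carries two nested tags; the flat view keeps only the first one.
    print(flat_tags([["B-ORG", "B-GPE"], ["I-ORG"], []]))

    gold = [{(0, 1, "ORG"), (0, 0, "GPE")}, {(2, 2, "PERS")}]
    pred = [{(0, 1, "ORG")}, {(2, 2, "PERS"), (4, 4, "DATE")}]
    print(micro_prf(gold, pred))
```

The official evaluation on CODALAB may treat boundaries or nested spans differently, so the numbers from a sketch like this should only be used as a sanity check.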

The 5th Financial Narrative Processing Workshop (FNP 2023)
by Marina Litvak 02 Jun '23

*The 5th Financial Narrative Processing Workshop (FNP 2023)*

To be held at the 2023 IEEE International Conference on Big Data (IEEE BigData 2023), Sorrento, Italy, from 15 to 18 December 2023.

FNP 2023: http://wp.lancs.ac.uk/cfie/fnp2023/
*Submission page:* https://wi-lab.com/cyberchair/2023/bigdata23/scripts/submit.php?subarea=S14…

*Important Dates:*
1st Call for workshop papers: June 1, 2023
2nd Call for workshop papers: August 15, 2023
Final Call for workshop papers: October 1, 2023
Due date for workshop papers submission: October 30, 2023 (anywhere in the world)
Notification of paper acceptance to authors: November 12, 2023
Camera-ready of accepted papers: November 20, 2023
Workshop date: one-day event within December 15-18, 2023 (exact date to be announced)
Other dates for shared tasks will be advertised separately.

*Workshop Description:*
Financial narrative processing is an emerging field that combines natural language processing (NLP) and machine learning (ML) techniques to extract, summarise, and analyse both qualitative and quantitative financial data. As the amount of financial data continues to grow exponentially, this data is increasingly considered as big data, which presents challenges and opportunities for data scientists. The 5th Financial Narrative Processing Workshop (FNP 2023) aims to bring together researchers and industry practitioners to share their latest research results and practical experiences in financial narrative processing, which is a key aspect of big data. In particular, the workshop will focus on three shared tasks: Financial Narrative Summarization, Financial Table of Content Extraction, and Financial Causality Detection. These tasks will challenge participants to apply state-of-the-art techniques in NLP and ML to extract meaningful insights from financial documents.

The workshop will provide an informal and vibrant forum for discussion and collaboration, with the goal of advancing the field of financial narrative processing within the context of big data. We welcome submissions from researchers and practitioners in academia and industry.

The FNP 2023 workshop is organised by a team of experts who have been at the forefront of financial NLP research for the past five years. We have organised more than seven international events, introduced NLP and AI shared tasks, and provided the big datasets and methodologies needed to push forward the emerging field of financial NLP. Our workshop series has contributed significantly to the field of financial NLP, as evidenced by our proceedings on the ACL Anthology and citations in Google Scholar.

*Previous Proceedings:*
All FNP proceedings across the years are on the ACL Anthology: https://aclanthology.org/venues/fnp/
The 1st FNP was associated with LREC 2018: http://lrec-conf.org/workshops/lrec2018/W27/pdf/book_of_proceedings.pdf
FNP Google Scholar: https://scholar.google.com/citations?hl=en&user=8Qn7yJ8AAAAJ

*Motivation:*
Financial narrative disclosures represent a significant portion of firms’ overall financial communications with investors. Textual commentaries help to clarify issues that may be obscured by complex accounting methods and footnote disclosures. In addition, narratives summarise corporate strategy, contextualise results, explain governance arrangements, describe corporate social responsibility policy, and provide forward-looking information for investors. However, financial narratives may also provide management with an opportunity to obfuscate accounting results and manipulate readers’ perceptions of underlying economic performance.

In a previous FNP workshop, we organised a panel of experts and data leaders from AI firms in France and London to discuss the future of Financial NLP. The consensus was that financial data has increased exponentially in recent years due to the increase in regulations. This has led to an increase in the volume of financial news surrounding the release of such disclosures. Therefore, state-of-the-art methodologies are necessary to understand and analyse huge and sensitive financial data in a short amount of time.

We believe that the FNP 2023 workshop will continue to contribute to the field of Financial NLP by providing a platform for researchers and industry practitioners to share their research results and practical development experiences in Big Data research, development, and practice. In addition, our workshop will help participants gain a better understanding of the challenges posed by big data and its 5 V’s (velocity, volume, value, variety, and veracity) in financial text analysis.

*Topics of Interest in relation to Financial NLP:*
We encourage research on topics related to analysing financial narratives using state-of-the-art NLP techniques, including but not limited to morphological analysis, disambiguation, tokenization, part-of-speech tagging, named entity recognition, chunking, parsing, semantic role labelling, sentiment analysis, document quality, and advanced readability metrics. The use of NLP and machine learning in the financial domain has encouraged studies around gender and ethnicity imbalance, as well as mental health and well-being research.

Given the focus of the IEEE Big Data 2023 conference, we also encourage research on under-resourced languages and under-represented financial markets. In recent years, FNP has included research on Arabic, Spanish, and Portuguese financial markets.

Our collaboration with the MultiLing workshop (http://multiling.iit.demokritos.gr) has highlighted the importance of summarization across domains and sources that are related to finance (e.g., company blogs, product reviews, market briefs, etc.). This includes financial multilingual and cross-lingual summarization using single-document summarization, multi-document summarization, summarization evaluation, headline generation, and cross-domain/cross-topic summarization.

Given the international nature of the event, we particularly welcome FNP papers reporting non-English and multilingual research, describing the different regulatory regimes within which companies operate internationally.

*The FNP2023 shared tasks* will be announced separately and are expected to be:
Financial Narrative Summarisation (FNS 2023)
Financial Table of Content Extraction (FinTOC 2023)
Financial Causality Detection (FinCausal 2023)
For the latest details about the shared tasks please visit: http://wp.lancs.ac.uk/cfie/shared-tasks/

*Call For Papers for the Main Workshop:*
We invite papers describing original, completed or ongoing, unpublished research in Financial Natural Language Processing and Financial Text Analysis. As financial data is increasingly considered as big data, we encourage submissions that address the five main and innate characteristics of big data (velocity, volume, value, variety, and veracity) in the context of financial narrative processing. We encourage submissions on topics that include, but are not limited to, the following:
- Applying core technologies on financial narratives within the context of big data: morphological analysis, disambiguation, tokenization, part-of-speech tagging, named entity recognition, chunking, parsing, semantic role labelling, sentiment analysis, document quality and advanced readability metrics, etc.
- Using NLP to detect misreporting in relation to diversity and wellbeing on issues related to gender, ethnicity, women at work as well as employee mental health and stability, in the context of big data.
- Financial narrative resources and tools for managing and analysing large-scale financial data.
- Summarization techniques across domains and sources that are related to finance (e.g. company blogs, product reviews, market briefs, etc.); this includes financial multilingual and cross-lingual summarization using single-document summarization, multi-document summarization, summarization evaluation, headline generation, and cross-domain/cross-topic summarization.
- Analysis of Online Social Networks for detection of public opinions towards financial events.
- Multilingual analysis, describing the different regulatory regimes within which companies operate internationally.
- Ongoing research and preliminary results that explore the intersection of financial narrative processing and big data.
- Negative results, for example techniques and methodologies that work for certain languages but not for others. Another avenue is showing that state-of-the-art technologies such as BERT can fail on certain tasks or languages.

All accepted papers will be included in the conference proceedings published by the IEEE Computer Society Press. We follow the IEEE submission format. Please submit a full paper (up to 10 pages, IEEE 2-column format) or a short paper (up to 4 pages, IEEE 2-column format) through the online submission system.

*Organising Committee:*
Dr Mo El-Haj, Lancaster University, UK (General Chair)
Dr Houda Bouamor, CMU, Qatar (FNP Program Chair)
Prof Paul Rayson, Lancaster University, UK (FNP Program Chair)
Blanca Carbajo Coronado, UAM, Madrid, Spain (FNP coordinator, Publication Chair)
Nikiforos Pittaras, NCSR Demokritos (Publicity Chair)
Dr George Giannakopoulos, NCSR Demokritos (FNS Shared Task Organiser)
Dr Marina Litvak, Shamoon Academic College of Engineering (FNS Shared Task Organiser)
Prof Antonio Moreno Sandoval, UAM, Madrid, Spain (FinCausal Shared Task Organizer)
Dr Doaa Samy, UAM, Madrid, Spain (FinCausal Shared Task Organizer)
Dr Juyeon KANG, Fortia Financial Solution (FinTOC Shared Task Organiser)
Dr Ismail El Maarouf, Imprevicible (FinTOC Shared Task Organiser)

--
Best regards,
Marina Litvak

2nd CfP: The 1st Workshop on Counter Speech for Online Abuse
by Abercrombie, Gavin 02 Jun '23

2nd Call for Papers

The 1st Workshop on Counter Speech for Online Abuse: A workshop for creating, investigating and improving tools for producing and evaluating counter speech.

Hate speech and abusive and toxic language are prevalent in online spaces. For example, a 2019 survey shows that in the UK 30-40% of people have experienced online abuse, and platforms like Facebook take down millions of harmful posts every year with the help of AI tools. While removal of such content can immediately reduce the quantity of harmful messages, it can bring about accusations of censorship and may not be effective at curbing hate in the long term. An alternative approach is to reply with counter speech, i.e. targeted responses aimed at refuting the hateful language using thoughtful and cogent reasons, and fact-bound arguments. This has been shown to be effective in influencing the behaviour of both the perpetrators of abuse and bystanders that witness the interactions, as well as providing support to victims.

The sheer amount of social media data shared online on a daily basis means that hate mitigation using counter speech requires reliable, efficient and scalable tools. Recently, efforts have been made to curate hate countering datasets and automate the production of counter speech. However, this research field is still in its infancy, and many questions remain open regarding the most effective approaches and methods to take, as well as how to evaluate them.

This first multidisciplinary workshop aims to bring together researchers from diverse backgrounds such as computer science and the social sciences, as well as policy makers and other stakeholders, to attempt to understand how counter speech is currently used to tackle abuse by individuals, activists and organisations, how Natural Language Processing (NLP) and Generation (NLG) can be applied to produce counter narratives, and the implications of using large language models for this task. It will also address, but not be limited to, the questions of how to evaluate and measure the impacts of counter speech, the importance of expert knowledge from civil society in the development of counter speech datasets and taxonomies, and how to ensure fairness and mitigate the biases present in language models when generating counter speech.

Topics
We invite papers (long and short) on a wide range of topics, including but not limited to:
• Models and methods for generating counter speech;
• Dialogue agents employing counter speech to address hateful inputs, directed towards other people or the AI itself;
• Human and automatic evaluation methods of counter speech tools;
• Multidisciplinary studies including different perspectives on the topic such as from computer science, social science, NGOs and stakeholders;
• Development of datasets and taxonomy for counter speech;
• Potentials and limitations (e.g., fairness, biases) of using large language models for generating counter speech;
• Social impact and empirical studies of counter speech on social media, including investigating the effectiveness and consequences on users of employing counter speech to fight online hate;
• Proposals for future research on counter speech, and/or preliminary results of studies in this field.

We accept three types of submissions:
* Regular research papers – long (8 pages) or short (4 pages);
* Non-archival submissions: like research papers, but will not be included in the proceedings;
* Research communications: 2-4 page abstracts summarising relevant research published elsewhere.

Submission link: https://softconf.com/n/cs4oa2023
Location: co-located with SIGdialxINLG, Prague, Czechia

Important dates
All deadlines are Anywhere on Earth (UTC-12)
* Submission deadline: Jun 26, 2023
* Notification of acceptance: Jul 17, 2023
* Camera-ready deadline: Aug 11, 2023
* Workshop date: September 11/12 2023

Format and Styling
Submissions should follow ACL Author Guidelines<https://www.aclweb.org/adminwiki/index.php?title=ACL_Author_Guidelines> and policies for submission, review and citation, and be anonymised for double blind reviewing. Please use ACL 2023 style files; LaTeX style files and Microsoft Word templates are available at https://2023.aclweb.org/calls/style_and_formatting/<https://2021.aclweb.org/downloads/acl-ijcnlp2021-templates.zip>.

Organising Committee:
* Yi-Ling Chung, The Alan Turing Institute
* Gavin Abercrombie, Heriot-Watt University
* Helena Bonaldi, Fondazione Bruno Kessler
* Marco Guerini, Fondazione Bruno Kessler

Contact
If you have any questions, please let us know at cs4oa(a)googlegroups.com
Website: https://sites.google.com/view/cs4oa
Twitter: @cs4oa_workshop<https://twitter.com/cs4oa_workshop>

Birmingham Corpus Linguistics Summer School and Sinclair lecture 11 - 14 Sept 2023 (CCRSS23)
by Michaela Mahlberg 02 Jun '23

The 7th summer school hosted by the Centre for Corpus Research at the University of Birmingham will take place from 11th to 14th September 2023.
https://www.birmingham.ac.uk/schools/edacs/departments/englishlanguage/even…

Who is it for?
The summer school is open to undergraduate, postgraduate, and doctoral students, as well as researchers who want to improve their skills to apply corpus methods in their own research. Our summer school aims to equip participants with critical expertise in both the theory and practice of corpus-supported linguistic research. Building on the strengths of our Centre for Corpus Research (CCR)<https://www.birmingham.ac.uk/research/activity/corpus/index.aspx> and our guest speakers, we strive to offer participants a learning experience that benefits their own specific research needs and enriches their experience as researchers more widely. There will be the opportunity for participants to present their own work and receive feedback from our expert team. Given the specialised nature of the programme, a basic understanding of corpus linguistics is highly recommended.

What’s the format?
The summer school will be online and consist of synchronous and asynchronous elements. Prior to the synchronous part, participants will be expected to complete asynchronous activities consisting of self-study video lectures and hands-on materials. The synchronous part will take place from 11 to 14 September 2023. Over the course of four days, participants will be actively involved in two kinds of sessions. First, hands-on sessions will put the emphasis on the learning of practical skills for the purpose of extracting and analysing corpus data of various kinds, and the application of this knowledge to specific research projects. Second, participants will learn about current corpus research and theoretical foundations from our team.

What is it about?
Topics typically covered in the programme include (non-exhaustive list and subject to change):
- Corpus linguistics for media analysis
- Analysis of spoken corpora
- CQPweb
- Introduction to R and tidyverse
- Web scraping with R
- Digital humanities and the study of fiction
- Sign language corpora
- Regular expressions
- Corpora and legal research
- Critical issues in keyness analysis
- Behavioral Profiles
- Machine learning in corpus linguistics
- Corpus-based discourse analysis
- Corpus linguistics and language learning

Who are the teachers?
Our local team of corpus linguists includes Dagmar Divjak<https://www.birmingham.ac.uk/staff/profiles/languages/divjak-dagmar.aspx>, Natalie Finlayson<https://www.birmingham.ac.uk/schools/edacs/departments/englishlanguage/staf…>, Jason Grafmiller<https://www.birmingham.ac.uk/schools/edacs/departments/englishlanguage/staf…>, Jack Grieve<https://www.birmingham.ac.uk/staff/profiles/elal/grieve-jack.aspx>, Michaela Mahlberg<https://www.birmingham.ac.uk/staff/profiles/elal/mahlberg-michaela.aspx>, Karen McAuliffe<https://www.birmingham.ac.uk/staff/profiles/law/mcauliffe-karen.aspx>, Petar Milin<https://www.birmingham.ac.uk/staff/profiles/languages/milin-petar.aspx>, Akira Murakami<https://www.birmingham.ac.uk/staff/profiles/elal/murakami-akira.aspx>, Florent Perek<https://www.birmingham.ac.uk/staff/profiles/elal/perek-florent.aspx>, Laurence Romain<https://www.birmingham.ac.uk/staff/profiles/languages/romain-laurence.aspx>, Adam Schembri<https://www.birmingham.ac.uk/staff/profiles/elal/schembri-adam.aspx>, Paul Thompson<https://www.birmingham.ac.uk/staff/profiles/elal/thompson-paul.aspx>, and Bodo Winter<https://www.birmingham.ac.uk/staff/profiles/elal/winter-bodo.aspx>.

Guest speakers will include Robbie Love<https://robbielove.org/> (Aston University), Stephanie Evert<http://www.stephanie-evert.de/> (FAU Erlangen), and Alexander Piperski<https://www.linguistik.phil.fau.de/person/alexander-piperski/> (FAU Erlangen).

Sinclair lecture
As every year, the annual Sinclair Lecture will take place during the Summer School. This year, the Lecture will be delivered on Monday 11 September by our very own Susan Hunston<https://www.birmingham.ac.uk/staff/profiles/elal/hunston-susan.aspx>. The Sinclair lecture will be an in-person event.

Registration will open shortly via our webpage: https://www.birmingham.ac.uk/schools/edacs/departments/englishlanguage/even…

For updates also follow @CCR_UoB

See you there!


Call for Book Chapter Submissions: "Empowering Low-Resource Languages With NLP Solutions"
by Pankaj Dadure 02 Jun '23


Dear Sir/Ma'am,

I hope you are doing well and in good health. We are excited to announce a call for book chapters for an upcoming book titled "*Empowering Low-Resource Languages With NLP Solutions.*"

Link: https://www.igi-global.com/publish/call-for-papers/call-details/6596

The objective of this book is to provide an in-depth understanding of Natural Language Processing (NLP) techniques and applications specifically tailored for low-resource languages. We believe that your valuable insights and research in this domain would greatly enrich the content of this book.

To ensure a comprehensive and high-quality book, all submitted chapters will undergo a rigorous peer-review process. The accepted book will be *indexed in Scopus and Web of Science*, thereby enhancing the visibility and impact of your work.

The book aims to cover a wide range of topics related to NLP in low-resource languages. Suggested topics include, but are not limited to:

· Introduction to Low-Resource Languages in NLP
· Language Resource Acquisition for Low-Resource Languages
· Morphological Analysis and Morpho-Syntactic Processing
· Named Entity Recognition and Entity Linking for Low-Resource Languages
· Part-of-Speech Tagging and Syntactic Parsing
· Machine Translation for Low-Resource Languages
· Sentiment Analysis and Opinion Mining for Low-Resource Languages
· Speech and Audio Processing for Low-Resource Languages
· Text Summarization and Information Retrieval for Low-Resource Languages
· Multimodal NLP for Low-Resource Languages
· Code-switching and Language Identification for Low-Resource Languages
· Evaluation and Benchmarking for NLP in Low-Resource Languages
· Applications of NLP in Low-Resource Language Settings
· Future Directions and Challenges in NLP

We encourage you to contribute a book chapter focusing on any of the above-mentioned topics or related areas within the scope of NLP in low-resource languages.

The submission guidelines are as follows:

1. Please submit a chapter proposal (maximum 500 words) outlining the objective, methodology, and expected outcomes of your proposed chapter by July 3, 2023, to the submission portal: https://www.igi-global.com/publish/call-for-papers/call-details/6596
2. Chapter proposals should include the title of the chapter and the names and affiliations of the author(s).
3. All submissions should be original and should not have been previously published or be currently under review elsewhere.
4. The chapters should be written in English and adhere to the formatting guidelines provided after the acceptance of the proposal.

*Important Dates:*

July 3, 2023: Proposal Submission Deadline
July 17, 2023: Notification of Acceptance
September 17, 2023: Full Chapter Submission
October 31, 2023: Review Results Returned
December 12, 2023: Final Acceptance Notification
December 26, 2023: Final Chapter Submission

Thank you for considering this invitation, and we look forward to receiving your valuable contribution to this book. If you have any further questions or require additional information, please do not hesitate to contact us.

Best regards,
Editorial Team

Dr. Partha Pakray
National Institute of Technology Silchar
Email: partha(a)cse.nits.ac.in

Dr. Pankaj Dadure
University of Petroleum and Energy Studies Dehradun
Email: pankajk.dadure(a)ddn.upes.ac.in

Prof. Sivaji Bandyopadhyay
Jadavpur University, Kolkata
Email: sivaji.cse.ju(a)gmail.com


RANLP 2023 Student Workshop - Second Call for Papers
by Ivelina Nikolova 01 Jun '23


-----------------Apologies for cross-posting-------------------

Second Call for Papers
RANLP 2023 Student Research Workshop
4-6 September 2023
Varna, Bulgaria
https://sites.google.com/view/ranlp-stud-2023/

The International Conference RANLP 2023 (http://ranlp.org/) would like to invite students at all levels (undergraduate, Master-, and PhD-students) to present their ongoing or completed work at the Student Research Workshop (https://sites.google.com/view/ranlp-stud-2023/).

SUBMISSIONS

We invite two types of student submissions:

Full Papers must describe original unpublished work of the student in any topic area of the workshop. Full papers are limited to 8 pages for content, with 2 additional pages for references.

Short Papers may describe either work in progress or a research proposal. They may also be in the style of a position paper that surveys and criticizes existing literature. Short papers must include clear directions for future research. Submissions of this type are limited to 6 pages for content, with 2 additional pages for references.

All papers must be submitted in .pdf format through the START system (https://softconf.com/ranlp23/ranlp20t23stud/). The papers should follow the format of the main conference, described on the Submissions page of the RANLP website (http://ranlp.org/).

All papers must have only student authors. Submissions with non-student authors will not be considered for review. Once a paper has been accepted, the authors may add their supervisor(s) in the Acknowledgments Section. The submissions must specify the student’s level (Bachelor-, Master-, or PhD) and the type of submission (Full or Short).

Double submission

Authors may submit the same paper at several conferences. In this case, they must notify the organizers by filling in the corresponding information in the submission form, as well as notifying the contact organizer by email.
TOPICS OF INTEREST

The aim of this workshop is to facilitate the exchange of knowledge between young researchers by providing an excellent opportunity to present and discuss their work and to receive mentorship and valuable feedback from an international research community.

The research to be presented can come from any topic within Natural Language Processing (NLP) and Computational Linguistics, including but not limited to the following: Computational Social Science and Social Media; Computer-aided Language Learning; Dialogue and Interactive Systems; Discourse and Pragmatics; Ethics and NLP; Information Extraction; Information Retrieval and Text Mining; Intent Recognition and Detection; Interpretability and Analysis of Models for NLP; Language and Vision; Language Generation; Language Resources and Corpora; Linguistic Theories; Machine Translation and Computer-aided Translation Tools; Multilingual NLP; Multimodal Systems; NLP Applications – Biomedical, Educational, Healthcare, Financial, Legal, Semantic Web, etc.; Opinion Mining and Sentiment Analysis; Phonetics, Phonology, and Morphology; Question Answering; Semantics; Stylistic Analysis; Sublanguages and Controlled languages; Syntax: Tagging, Chunking, and Parsing; Temporal Processing; Text Categorization; Text Simplification and Readability Estimation; Text Summarisation; Text-to-Speech Synthesis and Speech Recognition; Textual Entailment.

All accepted papers will be presented at the Student Workshop sessions (oral or poster) during the main conference days: 4-6 September 2023. The articles will be published in special Student Session proceedings and uploaded to the ACL Anthology.
IMPORTANT DATES

Submission deadline: 3 July 2023
Acceptance notification: 4 August 2023
Camera-ready deadline: 20 August 2023
Workshop: 4-6 September 2023

All deadlines are 11:59PM UTC-12:00 ("anywhere on Earth")

ORGANISERS

Momchil Hardalov (AWS AI Labs, Spain)
Zara Kancheva (Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Bulgaria)
Boris Velichkov (Faculty of Mathematics and Informatics at Sofia University “St. Kliment Ohridski”, Bulgaria)
Ivelina Nikolova-Koleva (Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, and Sirma AI, Bulgaria)
Milena Slavcheva (Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Bulgaria)


PhD Studentship in IR/NLP/AI: Large Language Models for Academic SEarch and Recommendation
by Ingo Frommholz 01 Jun '23


Faculty of Science and Engineering Dean Research PhD Studentship
LASER - Large Language Models for Academic SEarch and Recommendation
Deadline: June 19, 2023
University of Wolverhampton, UK

Applications are invited for doctoral study in Computer Science, Information Retrieval and Natural Language Processing on the topic of Large Language Models for Academic Search and Recommendation.

Project Description

Scientific publications are an important vehicle for understanding the world around us; they contain scientific evidence that informs researchers and decision-makers, with a high impact on society. However, the large and rapidly growing number of publications, in particular on preprint servers, causes an information overload for everybody struggling to keep up with developments in their field. This makes finding relevant information of high quality a challenging task, which requires advanced scholarly search and recommendation solutions.

Recent developments in Large Language Models (LLMs) are having a huge impact on Artificial Intelligence (AI) and related fields. LLMs are a type of AI trained on huge amounts of text, with ChatGPT/GPT-4 and Bard as popular examples. LLMs combined with conversational AI provide exciting new possibilities for interactive search and recommendation, but they also suffer from severe flaws. While there are efforts to combine LLMs with, e.g., neural search, the endeavour of utilising LLMs to tackle information overload in academia has only started and more research is needed.

This PhD studentship will explore how LLMs can be used to improve academic search and recommendation and what their benefits and limitations are. This may include integrating LLMs into search and recommendation services or utilising search to keep LLMs from "hallucinating". A further part of this project is to estimate the quality of publications.
The PhD project provides exciting opportunities for the successful candidate to work with and critically reflect on innovative technologies at the forefront of AI that will shape our digital future. As a further incentive, the PhD candidate will be able to participate in an EU Horizon Europe Staff Exchange project, providing the opportunity to go on fully funded secondments to collaborate with an international network of researchers and industry partners.

For further information regarding the project or an informal discussion, please contact the Director of Studies, Dr Ingo Frommholz <i.frommholz(a)wlv.ac.uk>.

To apply for the above PhD Research Studentship, applicants must hold a first class/distinction at Master and/or Bachelor level of study. Applications must include one identified project, a full CV (including 2 referee names and contact details), transcripts and a letter of application outlining the motivation for applying (maximum of 2 pages). Applicants from outside the UK must provide evidence of meeting the English language requirement as stated in https://www.wlv.ac.uk/research/research-degrees/

The application submission deadline is 10:00am BST on 19 June 2023. Applications should be sent to FSEPGR(a)wlv.ac.uk

A shortlist of candidates will be prepared from the pool of applicants, in line with the Faculty of Science and Engineering Post Graduate Research (PGR) studentship selection criteria. Shortlisted candidates will be invited to attend an interview with a panel of academic staff in the week commencing 26 June 2023. Following this process, all successful candidates will be notified to enrol in July 2023 on a PhD degree programme.

The studentship award will include tuition fees at home level for the first three years of full-time study, including any write-up period fees and research support fees. For further information on fees, see https://www.wlv.ac.uk/apply/funding-costs-fees-and-support/fees-and-costs/r…

Informal enquiries are welcome and should be directed to the Director of Studies mentioned above.
Further information: https://www.wlv.ac.uk/schools-and-institutes/faculty-of-science-and-enginee… (look for the LASER project)

--
Ingo Frommholz, PhD, FBCS, FHEA
Reader (~Associate Professor) in Data Science
Deputy Head Digital Innovations and Solutions Centre (DISC)
University of Wolverhampton, UK
Adjunct Professor, Bern University of Applied Sciences, Switzerland
Web: http://www.frommholz.org/ | Email: ifrommholz(a)acm.org
Twitter: @iFromm | Mastodon: @ingo@idf.social
PGP/GPG fingerprint: B74E A422 C7B2 A5BB 2BC2 523B 2790 216E F8F8 D166
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x2790216EF8F8D166


LongEval: Call for papers and talk proposals
by Elena Kochkina 01 Jun '23


Dear colleagues,

We invite submissions of papers and talk proposals to the LongEval 2023 Workshop on Longitudinal Evaluation of Model Performance.
https://clef-longeval.github.io/

CLEF 2023 Conference and Labs of the Evaluation Forum<https://clef2023.clef-initiative.eu/index.php>
18-21 September 2023, Thessaloniki - Greece

Topics of interest include (but are not limited to):

• Evaluations of the temporal persistence of information retrieval (IR) systems and text classifiers for various tasks
• Challenges posed by the dynamic nature of language
• Time-aware longitudinal models
• Post-evaluation stage LongEval shared-task submissions.

Deadlines:

Papers: June 5th (to be included in the proceedings)
Submission format: https://drive.google.com/drive/folders/1r2lNOteMNoQrhQGUat6VHPUnwFAgR8Nz
Length: a maximum of 8 pages (not including references and appendices).
Submission link: https://easychair.org/my/conference?conf=clef2023

Talk proposals: July 10th (not included in the proceedings)
Submission format: a maximum of 2 pages
Submission link: https://forms.gle/fU46Fb5zJufxF5NV8

Organisers: Alkhalifa, Rabab; Bilal, Iman; Borkakoty, Hsuvas; Camacho-Collados, Jose; Deveaud, Romain; El-Ebshihy, Alaa; Espinosa-Anke, Luis; Gonzalez-Saez, Gabriela; Galusakova, Petra; Goeuriot, Lorraine; Kochkina, Elena; Liakata, Maria; Loureiro, Daniel; Tayyar Madabushi, Harish; Mulhem, Philippe; Piroi, Florina; Popel, Martin; Servan, Christophe; Zubiaga, Arkaitz.

Feel free to reach out with any questions!

Best regards,
Elena Kochkina
On behalf of LongEval organisers

