----------------------------------------------
Invitation to Arabic NER Shared Task 2023
----------------------------------------------
Dear colleagues,
We are happy to invite you to join the Arabic NER Shared Task 2023, which will be organized as part of WANLP 2023. We will provide you with a large corpus and Google Colab notebooks to help you reproduce the baseline results.
An invitation to take part in the competition on extracting named entities from Arabic texts. We will provide participants with a corpus and software for obtaining baseline results that they can build on.
For more details, please visit the shared task website: https://dlnlp.ai/st/wojood/
You can register directly at: https://docs.google.com/forms/d/e/1FAIpQLSeWwvGSRMcSa7CHStGLkE8ODY87571wGD2…
-----------------------------------------
INTRODUCTION
-----------------------------------------
Named Entity Recognition (NER) is integral to many NLP applications. It is the task of identifying named entity mentions in unstructured text and classifying them into predefined classes such as person, organization, location, or date. Due to the scarcity of Arabic resources, most research on Arabic NER focuses on flat entities and addresses a limited number of entity types (person, organization, and location). The goal of this shared task is to alleviate this bottleneck by providing Wojood, a large and rich Arabic NER corpus. Wojood consists of about 550K tokens (Modern Standard Arabic and dialectal Arabic, covering multiple domains) that are manually annotated with 21 entity types.
-----------------------------------------
REGISTRATION
-----------------------------------------
Participants need to register via this form (https://forms.gle/UCCrVNZ2LaPviCZS6). Participating teams will be provided with common training and development datasets. No external manually labelled datasets are allowed. A blind test set will be used to evaluate the output of the participating teams. Each team is allowed a maximum of 3 submissions. All teams are required to report results on both the development and test sets (after results are announced) in their write-ups.
-----------------------------------------
FAQ
-----------------------------------------
For any questions related to this task, please check our Frequently Asked Questions (https://docs.google.com/document/d/1XE2n89mFLic2P9DO_sAD51vy734BOt0kgtZ6bFf…)
-----------------------------------------
IMPORTANT DATES
-----------------------------------------
The dates below are subject to change:
- March 03, 2023: Registration available
- March 25, 2023: Data sharing and evaluation on the development set available
- April 10, 2023: Registration deadline
- May 20, 2023: Test set made available
- May 30, 2023: Deadline for evaluation on the test set
- June 25, 2023: Shared task system paper submissions due
- July 15, 2023: Notification of acceptance
- July 30, 2023: Camera-ready version due
- TBA, 2023: WANLP 2023 Conference
* All deadlines are 11:59 PM UTC-12:00 (Anywhere On Earth).
-----------------------------------------
CONTACT
-----------------------------------------
For any questions related to this task, please contact the organizers directly at NERShare...(a)gmail.com or join the Google Group: https://groups.google.com/g/ner_sharedtask2023.
-----------------------------------------
SHARED TASK
-----------------------------------------
This shared task targets both flat and nested Arabic NER through two subtasks:
Subtask 1: Flat NER
In this subtask, we provide the Wojood-Flat train (70%) and development (10%) datasets. The final evaluation will be on the test set (20%). The flat and nested NER datasets use the same train/dev/test split, and each split contains the same text. The only difference is that in the flat dataset each token is assigned a single tag: the first high-level tag assigned to that token in the nested dataset.
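To illustrate how the flat labels relate to the nested ones, here is a minimal Python sketch. It assumes, hypothetically, that each token's nested annotation is held in memory as an ordered list of tags with the high-level tag first; the function name and this representation are illustrative and are not taken from the Wojood release or the baseline code.

```python
from typing import List


def flatten_nested_tags(nested: List[List[str]], outside: str = "O") -> List[str]:
    """Reduce each token's nested tags to the single tag used in flat NER.

    Hypothetical representation: nested[i] holds the tags of token i,
    ordered with the high-level tag first; an empty list means the token
    is outside every entity.
    """
    # Keep only the first (high-level) tag; untagged tokens become "O".
    return [tags[0] if tags else outside for tags in nested]
```

For example, with illustrative labels for an organization name that contains a nested location, `flatten_nested_tags([["B-ORG", "B-GPE"], ["I-ORG"], []])` keeps only the organization tag on the first token and yields `["B-ORG", "I-ORG", "O"]`.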
Subtask 2: Nested NER
In this subtask, we provide the Wojood-Nested train (70%) and development (10%) datasets. The final evaluation will be on the test set (20%).
-----------------------------------------
METRICS
-----------------------------------------
The evaluation metrics will include precision, recall, and F1-score; the official metric is the micro F1-score.
Evaluation will be hosted on CODALAB, and teams will be provided with a CODALAB link for each subtask:
-CODALAB link for NER Shared Task Subtask 1 (Flat NER)
-CODALAB link for NER Shared Task Subtask 2 (Nested NER)
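For readers less familiar with micro-averaging, the sketch below computes micro precision, recall, and F1 by pooling exact-match entity counts across all entity types. It assumes entities are represented as hypothetical (start, end, type) tuples; the official CODALAB scorer may use a different matching scheme or input format, so treat this only as an illustration of the metric.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical span format: (start token, end token exclusive, entity type).
Span = Tuple[int, int, str]


def micro_prf(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over exact-match entity spans.

    True positives, predictions, and gold entities are counted across all
    entity types before the ratios are taken; that pooling is what makes
    the average "micro" rather than "macro".
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    # Counter intersection keeps min(count) per span: exact (start, end, type) matches.
    tp = sum((gold_counts & pred_counts).values())
    n_pred = sum(pred_counts.values())
    n_gold = sum(gold_counts.values())
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

With gold spans `[(0, 2, "ORG"), (0, 1, "GPE"), (5, 6, "PERS")]` and predictions `[(0, 2, "ORG"), (5, 6, "LOC")]`, one prediction matches exactly, giving precision 0.5, recall 1/3, and F1 0.4. Because entities are compared as collections of spans, the same function applies to nested output: spans at every nesting level are simply pooled. To score a whole corpus, each tuple would also need a sentence identifier so that spans from different sentences do not collide.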
-----------------------------------------
BASELINES
-----------------------------------------
Two baseline models trained on Wojood (flat and nested) are provided:
Nested NER baseline: described in this article, with code available on GitHub. The model achieves a micro F1-score of 0.9059 (note that this baseline does not handle nested entities of the same type).
Flat NER baseline: the same code repository (GitHub) can also be used to train a model for the flat NER task. Our flat NER baseline achieves a micro F1-score of 0.8785.
-----------------------------------------
GOOGLE COLAB NOTEBOOKS
-----------------------------------------
To allow you to experiment with the baseline, we authored four Google Colab notebooks that demonstrate how to train and evaluate our baseline models.
[1] Train Flat NER: This notebook can be used to train our ArabicNER model on the flat NER task using the sample Wojood data found in our repository.
[2] Evaluate Flat NER: This notebook uses the model saved by the training notebook above to evaluate on an unseen dataset.
[3] Train Nested NER: This notebook can be used to train our ArabicNER model on the nested NER task using the sample Wojood data found in our repository.
[4] Evaluate Nested NER: This notebook uses the model saved by the training notebook above to evaluate on an unseen dataset.
-----------------------------------------
ORGANIZERS
-----------------------------------------
- Mustafa Jarrar, Birzeit University
- Muhammad Abdul-Mageed, University of British Columbia & MBZUAI
- Mohammed Khalilia, Birzeit University
- Bashar Talafha, University of British Columbia
- AbdelRahim Elmadany, University of British Columbia
- Nagham Hamad, Birzeit University
- Alaa Omer, Birzeit University
============================================
NTCIR-17 MedNLP-SC
https://sociocom.naist.jp/mednlp-sc/
Registration due: June 1, 2023
============================================
----------------
Invitation to the MedNLP-SC shared task
----------------
Continuing the series of NTCIR Natural Language Processing shared tasks and
conferences, NTCIR-17 will be held in Tokyo on December 12-15, 2023. As in
previous editions, NTCIR-17 features the MedNLP task on medical NLP. This
year, MedNLP-SC is about Medical Natural Language Processing for Social
Media and Clinical Texts.
This task provides two types of data:
- social media (artificially created tweets) in Japanese, English, German,
and French in parallel;
- radiology reports in Japanese.
We invite you to explore the sample data at
https://sociocom.naist.jp/mednlp-sc/ and to participate in the task.
---------------
Tasks
----------------
1) Social Media (SM) Subtask: Adverse drug event detection (ADE)
(Languages: Japanese, English, French, and German)
2) Radiology Report (RR) Subtask: TNM staging
(Language: Japanese)
More detail and examples are available at
https://sociocom.naist.jp/mednlp-sc/
------------------
Schedule
------------------
* March 2023: Dataset release
* June 1, 2023: Deadline for registration
* July 10, 2023: Test data release
* July 17, 2023: Deadline for submission of test runs (Formal Run)
* August 1, 2023: Evaluation result release to the participants
* August 1, 2023: Task overview paper release (draft)
* September 1, 2023: Deadline for submission of participant papers
* November 1, 2023: Deadline for camera-ready participant papers
* December 12-15, 2023: NTCIR-17 Conference (NII, Tokyo, Japan) (hybrid
event, online presentation will be available)
------------------
Task Registration
------------------
Please register on the NTCIR-17 website:
http://research.nii.ac.jp/ntcir/ntcir-17/howto.html
------
Organizers
------
(JAPAN)
Eiji Aramaki, Ph.D. (NAIST, Japan)
Yuta Nakamura, M.D. (The University of Tokyo, Japan)
Shoko Wakamiya, Ph.D. (NAIST, Japan)
Shuntaro Yada, Ph.D. (NAIST, Japan)
Shouhei Hanaoka, M.D., Ph.D. (The University of Tokyo, Japan)
Gabriel Herman Bernardim Andrade (NAIST, Japan)
Faith Wavinya Mutinda (NAIST, Japan)
Noriki Nishida, Ph.D. (RIKEN, Japan)
Tomohiro Nishiyama (NAIST, Japan)
Hiroki Teranishi, Ph.D. (RIKEN, Japan)
Narumi Tokunaga (RIKEN, Japan)
Akiko Aizawa, Ph.D. (NII, Japan)
Yuji Matsumoto, Ph.D. (RIKEN, Japan)
(FRANCE)
Cyril Grouin, Ph.D. (Université Paris-Saclay, CNRS, LISN, France)
Thomas Lavergne, Ph.D. (Université Paris-Saclay, CNRS, LISN, France)
Aurélie Névéol, Ph.D. (Université Paris-Saclay, CNRS, LISN, France)
Patrick Paroubek, Ph.D. (Université Paris-Saclay, CNRS, LISN, France)
Hui-Syuan Yeh (Université Paris-Saclay, CNRS, LISN, France)
Pierre Zweigenbaum, Ph.D. (Université Paris-Saclay, CNRS, LISN, France)
(GERMANY)
Lisa Raithel (DFKI, Germany, TU Berlin, Germany, and Université
Paris-Saclay, CNRS, LISN, France)
Roland Roller, Ph.D. (DFKI, Germany)
Philippe Thomas, Ph.D. (DFKI, Germany)
* MedNLP-SC contact: mednlp-sc[at]is.naist.jp
*** DIPROMATS 2023 Call for participation ***
A challenge on the automatic detection and characterization of propaganda
techniques in public messages from diplomats and authorities from USA,
China, EU and Russia
https://sites.google.com/view/dipromats2023
We are glad to invite you to participate in DIPROMATS 2023, the shared task
on propaganda detection at the Iberian Languages Evaluation Forum (IberLEF
2023).
Unlike fake news, the detection of propaganda in news and social media has
not attracted as much attention from journalists, fact-checkers, or
scholars. In our view, this hinders the endeavors against hostile and
manipulative information. The deceiving intent of propaganda may be more
subtle and devious than disinformation; its content does not have to be
false, and its effects may be only discernible through systematic
observation over time.
As a means by which certain ideas and actions propagate, propaganda
involves rhetorical techniques to improve replication. This task proposes a
specific approach to detect those techniques based on the language employed
by official authorities on Twitter. The corpus provided for the task
encompasses tweets in Spanish and English from diplomats of four different
international actors: China, Russia, United States, and the European Union.
The authorities collected include government accounts, embassies,
ambassadors, and other diplomatic profiles such as consuls and missions.
This shared task challenges participants to classify tweets according to
the following two subtasks, evaluated as three tasks:
Task 1: propaganda identification: The first subtask is a binary
classification problem. The systems must decide whether a given tweet
contains propaganda techniques.
Task 2 & 3: propaganda characterization: The second subtask aims to
categorize the type of propaganda. The proposed categorization considers
multiple techniques identified in literature that are clustered according
to their rhetorical features. We propose a multiclass, multilabel
classification task, where systems have to decide, for each tweet, in which
of the available categories it fits. The proposed typology can be found
here. Evaluation will consider a coarse-grained categorization (Task 2) with
four classes of propaganda (plus the negative class), and a fine-grained
categorization (Task 3) with 15 subclasses (plus the negative class).
We encourage participation from both academic institutions and industrial
organizations. To participate in the task, please fill in the registration
form at https://sites.google.com/view/dipromats2023/registration
Important Dates:
* Registration opens: January 30th, 2023
* Training data released: March 23rd, 2023
* Test set release: April 11th, 2023
* Deadline for submitting runs: April 25th, 2023
* Release of evaluation results: May 9th, 2023
* Paper submission deadline: May 30th, 2023
* Camera-ready submissions for organizers: July 6th, 2023
* IberLEF Workshop: September 27, 2023, together with SEPLN 2023
Organizers:
Pablo Moral, Universidad Nacional de Educación a Distancia (UNED)
Guillermo Marco, Universidad Nacional de Educación a Distancia (UNED)
Julio Gonzalo, Universidad Nacional de Educación a Distancia (UNED)
Contact:
If you have any questions or need more information, please do not hesitate
to contact us at dipromats(a)lsi.uned.es
*Overview*
EACL 2023 is providing D&I funds for registration, travel/accommodation,
caregiving, bandwidth and VPN subsidies. The grants are intended for
individuals for whom attending the conference would cause a financial
burden or create risks to their safety or privacy. Please apply by 27th
March 2023 (2 weeks before the early registration deadline) at
https://forms.gle/RqAGjjEjwdvnEqBv7.
This call is also available online:
https://2023.eacl.org/calls/d-i-subsidies
Registration subsidies
This covers a waiver of the EACL registration fee. If you haven’t already
registered for an *ACL event in 2023, you would also need to pay the ACL
membership fee, e.g. $100 for a regular membership. You can also request
coverage of the ACL membership fee if that’s applicable.
Note that if you are applying for a registration fee waiver, it is expected
that you DO NOT register for the conference until you hear from us about
your D&I subsidy application. That way, you would not have to ask for a
reimbursement, since you would not be charged in the first place. We
strongly encourage folks to apply for the volunteer program in order to
maximize their chances of getting their registration fees waived.
Travel subsidies and pre-paid accommodation
We have a limited number of pre-paid accommodation places available. These are
shared double rooms in the conference hotel. Please list any special
requirements in your application form. We also have a limited budget
allocated to support international travel. Please make sure to outline how
you will benefit from in-person attendance at this stage of your career.
Caregiving and accessibility subsidies
We can reimburse caregiving purchases that you need for the conference.
You need to provide receipts for your purchases. These can be babysitting,
transportation costs, or other support required for yourself or your
dependents, which would ease your participation. If you need any personal
or technical assistance to access and navigate the conference due to a
disability, we can reimburse you for the cost of it. This holds for example
if you need assistance in using the platform for online participation due
to a visual impairment. We acknowledge that accessibility needs are highly
individual to each participant, so please feel free to reach out to us with
any issues you might face in this regard.
Bandwidth and VPN subsidies
We can reimburse bandwidth or VPN purchases that you need for the
conference. You need to provide receipts for your purchases. Bandwidth
subsidies would be for high-speed internet access costs for the duration of
the conference only. VPN generally does not cost more than 10 USD per month
and could be applied towards participation if you need anonymity to
participate completely (for example, for queer folks who might need this
for extra security depending on where they live).
Selection Criteria
Applicants for the subsidy program will be evaluated based on the material
they submit in their application packages. Preference will be given to
applicants who are presenting a paper in the main conference, the Student
Research Workshop, or any of the workshops associated with EACL 2023, and
who do not have other means of support. Preference will also be given to
residents of under-represented regions and members of marginalized
communities (as detailed in the application form).
Instructions
Applicants for the D&I subsidy program should fill in the application form at:
https://forms.gle/RqAGjjEjwdvnEqBv7.
*Application deadline:*
27th March 2023 (anywhere on earth) – 2 weeks before the early registration
deadline.
*Notification of acceptance:*
3rd April 2023 – 1 week before the early registration deadline
We aim to send all notifications by the deadline above. However, we will
start processing the applications as they come and you may be contacted
earlier.
*Contact Information*
The co-chairs of the Diversity and Inclusion committee can be contacted by
email at: eacl2023_div_incl(a)hw.ac.uk
(apologies for cross posting)
*** Second Call for Participation for GUA-SPA at IberLEF 2023 ***
GUA-SPA - Guarani-Spanish Code Switching Analysis at IberLEF 2023
News: The training dataset has just been released, and you can send submissions to the development phase!
https://codalab.lisn.upsaclay.fr/competitions/11030
Guarani is a South American indigenous language that has been in contact with Spanish and other Indo-European languages for about 500 years, resulting in many interesting varieties with different degrees of mixing. In Paraguay, according to the most recent census, most of the population speaks at least some Guarani, and Guarani-Spanish bilingualism is highly prevalent. Bilingual speakers often use the two languages together, mixing them in different ways, a phenomenon called code-switching. We propose a challenge for analyzing code-switched texts in Guarani and Spanish: identifying the language used in each span of text, the named entities mentioned in the text, and the way Spanish is used. To this end, we will provide a corpus of news articles and tweets in which each token is labeled with an appropriate language or category identifier.
Three tasks are presented:
* Language identification in code-switched data: identify each token in the text as Guarani, Spanish, mix, named entity, or other categories.
* Named entity classification: classify the named entities found in the text as locations, organizations or people.
* Spanish code classification: classify the spans in Spanish as code changes or unadapted loans.
How to participate:
If you want to participate in this task, please join our Codalab competition: https://codalab.lisn.upsaclay.fr/competitions/11030
Important Dates:
* March 22nd, 2023: training and development sets released. Development phase begins.
* May 24th, 2023: test set released and submissions open. Evaluation phase begins.
* June 7th, 2023: evaluation phase ends. Publication of results.
* June 14th, 2023: paper submission.
* June 28th, 2023: notification of acceptance.
* July 3rd, 2023: camera-ready paper submission.
* September 26th, 2023: IberLEF 2023 Workshop.
Apologies for cross-posting.
----------------------------------------
We kindly invite you to participate in PAN 2023 Task 2, “Profiling Cryptocurrency Influencers with Few-Shot Learning”.
This task is being held as part of CLEF 2023, and all participating teams will be able to publish their system description papers in the CLEF proceedings.
This shared task focuses on author profiling of cryptocurrency influencers in social media from a low-resource perspective, that is, with little training data. We also propose profiling types of influencers in a low-resource setting.
Specifically, we focus on English Twitter posts for three different sub-tasks:
• Low-resource influencer profiling (subtask-1): profile authors according to their degree of influence (null, nano, micro, macro, mega).
• Low-resource influencer interest profiling (subtask-2): profile authors according to their main interests or areas of influence (technical information, price update, trading matters, gaming, other).
• Low-resource influencer intent profiling (subtask-3): profile authors according to the intent of their messages (subjective opinion, financial information, advertising, announcement).
Important Links
• Task Website - https://pan.webis.de/clef23/pan23-web/author-profiling.html
• Dataset site - https://zenodo.org/record/7701748#.ZBxxrnbMKiM
Important Dates
• February 20, 2023: Training data ready
• May 10, 2023: Early bird software submission phase (optional)
• May 29, 2023: Software submission deadline
• June 05, 2023: Participant paper submission
• September 18-21, 2023: Conference
Task organizers
• Francisco Rangel (Symanto)
• Mara Chinea-Rios (Symanto) - Contact Email: mara.chinea(a)symanto.com
• Marc Franco-Salvador (Symanto)
• Paolo Rosso (Universidad Politécnica de Valencia)
Please reach out to the organizers at crypto-influencers-pan-organizers(a)googlegroups.com, or join the Slack workspace (https://pan2023profil-0q48349.slack.com) to connect with the other participants and organizers.
The University of Manchester is hiring a Lecturer (UK equivalent to Assistant Professor) in Computational Linguistics, to be based in the Department of Linguistics and English Language. It is a permanent post. Applications are welcome from people working in all areas of Natural Language Processing.
More information on the position and how to apply can be found here:
https://www.jobs.manchester.ac.uk/displayjob.aspx?jobid=24945
The closing date for applications is April 24th, 2023.
Best,
Colin
This full-day "Communication in Human-AI Interaction" (CHAI) workshop
welcomes submissions of position papers (8 pages maximum in Springer LNCS
format).
The workshop will be held as part of the *INTERACT 23 conference* (Aug 28 -
Sep 1, York, UK and online, https://interact2023.org/).
It will be organized as an interactive work group event, including a design
activity and group discussions.
Important dates and links
* Paper deadline: April 28, 2023 AoE
* Notification: May 19, 2023
* Workshop date: between August 28 and September 1, 2023 (TBD)
* Workshop website: https://chai-workshop.github.io/
* Submission website: https://easychair.org/conferences/?conf=chai23
*Goal and topics:*
Human interactions with AI systems are becoming part of our everyday life.
If designed and developed efficiently, these interactions have great
potential to enhance human work, abilities, and well-being. Communication,
here understood as the iterative process of establishing shared meaning, is a
crucial aspect of successful interaction and has been studied for years from
HCI, AI, and cognitive science points of view, among others. The goal of this
workshop is to bring together experts from HCI, AI, and Cognitive Sciences
to explore and understand the specificities and characteristics of
communication in human-AI interactions, as well as the salient principles,
methods, and theories one has to consider to build meaningful human-AI
communication systems.
Topics of interest include but are not limited to:
* Blended social contexts, comprising both human and technological
communication
* Communication in multi-user interaction with intelligent agents
* Verbal and non-verbal human-AI communication
* Explicit and implicit human-AI communication
* Rules of etiquette and social norms in human-AI communication
* Communication in human-AI collaboration
* Human-AI communication design
* Language and communication
* Theory of Mind
* Common ground
* Inclusion and Diversity in Human-AI Communication
* Embodied multi-modal communication between humans and physical robots
*Format:*
The workshop will be a one-day event, envisioned as a collaborative
thinking group activity focusing on the challenges surrounding
Communication in Human-AI Interaction. The workshop aims at being
interactive and will consist of two main parts.
First, participants will be invited to collaborate on a design activity.
For this activity, participants will be grouped as best as possible by
topics and each group will be given the same challenge: analyzing and
suggesting a solution to an HAI communication problem. At the end of the
activity, each group will report on their solution.
Second, we will have group discussions in two phases, for which groups will
be shuffled to be cross-disciplinary. Each group will be given a separate
topic. The goal of each group will be to identify common interests, open
questions, and challenges related to this topic. The groups are shuffled
between phases, and participants are encouraged to build on top of the
previous groups’ discussion. Each discussion phase will be followed by a
quick report from each group. The day will end with a longer plenary discussion
during which we will also outline the after-workshop steps.
The design activity problem and the discussion topics will be chosen
based on participants' submissions so that they align with the expertise
and interest of the workshop participants.
By the end of the workshop, we expect to be as close as possible to a
common agreement about an outline for the coordinated research agenda,
including key points to explore further.
Post-workshop, the results will be communicated to a larger audience
through various activities, to be coordinated with the participants.
*Submission Requirements:*
We welcome interested participants to submit position papers stating
existing work, conceptual design, or their position with respect to the
workshop topics. The submission should also include one or two points of
discussion that the participant would like to address during the workshop.
Submissions should be made through EasyChair (
https://easychair.org/conferences/?conf=chai23) and be formatted according
to the Springer LNCS format, templates available for LaTeX and Word,
maximum 8 pages. We strongly encourage authors of position papers to follow
the SIGCHI accessibility guidelines.
Workshop organizers
Jennifer Renoux, Örebro University, Sweden
Jasmin Grosinger, Örebro University, Sweden
Marta Romeo, Heriot-Watt University, UK
Victor Kaptelinin, Umeå University, Sweden
Antti Oulasvirta, Aalto University, Finland
Contact information
Workshop website: https://chai-workshop.github.io/
For inquiries about the workshop, please contact:
jennifer.renoux(a)oru.se
[Apologies for multiple postings]
ImageCLEFmedicalGANs (1st edition)
Registration: https://www.imageclef.org/2023/medical/gans
Run submission: May 10, 2023
Working notes submission: June 5, 2023
CLEF 2023 conference: September 18-21, Thessaloniki, Greece
*** CALL FOR PARTICIPATION ***
The task is focused on examining the existing hypothesis that GANs are
generating medical images that contain the "fingerprints" of the real
images used for generative network training. If the hypothesis is
correct, artificial biomedical images may be subject to the same
sharing and usage limitations as real sensitive medical data. On the
other hand, if the hypothesis is wrong, GANs may be potentially used
to create rich datasets of biomedical images that are free of ethical
and privacy regulations. The participants will test the hypothesis by
solving one or several tasks related to the detection of relations
between real and artificial biomedical image datasets.
*** TASK ***
Given a set of real-world medical images comprising 2D axial CT image
slices of the heart (including the middle sections and adjacent
slices) of patients afflicted with lung tuberculosis, the task
challenges participants to develop machine learning solutions to
automatically determine which real images were used in training the
generator of realistic synthetic examples.
*** DATA SET ***
The image datasets comprise 2D axial CT image slices of the heart,
including the middle sections of the heart and adjacent slices. These
images are obtained from patients afflicted with Lung Tuberculosis and
are stored in the form of 8 bit/pixel PNG images with dimensions of
256x256 pixels. The development dataset comprises three distinct sets
of images. One set contains images that were generated using a GAN,
while the other two sets are comprised of real images. The first of
these real image sets contains images that were used during the
algorithm's training process. The second set consists of real images
that were not used during the training process. The test dataset is a
collection of two image sets. The first set contains 10,000 images
that have been generated, while the second set is made up of a
combination of 200 real images that were either used or unused during
the training process.
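As an illustration of one possible starting point (an assumption on our part, not a method proposed by the organizers), a simple memorization heuristic scores each real image by its distance to the nearest generated image: if the GAN reproduces fingerprints of its training images, real images used in training should have closer synthetic neighbours. The sketch below assumes the PNGs have already been loaded as 256x256 8-bit grayscale NumPy arrays (for example with Pillow); the function name and the `chunk_size` parameter are hypothetical.

```python
import numpy as np


def membership_scores(real: np.ndarray, generated: np.ndarray,
                      chunk_size: int = 1000) -> np.ndarray:
    """Score real images by closeness to their nearest generated image.

    Hypothetical heuristic: a real image whose nearest synthetic neighbour
    is very close was more likely used to train the generator.

    real:      (n_real, H, W) uint8 array of real CT slices
    generated: (n_gen, H, W) uint8 array of GAN outputs
    Returns minus the nearest-neighbour Euclidean distance (pixels scaled
    to [0, 1]), so a higher score means "more likely used in training".
    """
    n_pixels = int(np.prod(real.shape[1:]))
    r = real.reshape(len(real), n_pixels).astype(np.float32) / 255.0
    r_sq = (r ** 2).sum(axis=1)
    best = np.full(len(r), np.inf, dtype=np.float32)
    # Process generated images in chunks so the full generated set
    # never has to be converted to float all at once.
    for start in range(0, len(generated), chunk_size):
        g = generated[start:start + chunk_size].reshape(-1, n_pixels)
        g = g.astype(np.float32) / 255.0
        # ||r - g||^2 = ||r||^2 - 2 r.g + ||g||^2 for every (real, generated) pair
        d2 = r_sq[:, None] - 2.0 * (r @ g.T) + (g ** 2).sum(axis=1)[None, :]
        best = np.minimum(best, d2.min(axis=1))
    # Clip tiny negative values caused by floating-point cancellation.
    return -np.sqrt(np.maximum(best, 0.0))
```

Ranking the 200 real test images by this score gives a crude used/unused ordering. Raw pixel distance is sensitive to small shifts and intensity changes, so distances between learned feature embeddings may be a stronger choice.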
*** IMPORTANT DATES ***
- Run submission: May 10, 2023
- Working notes submission: June 5, 2023
- CLEF 2023 conference: September 18-21, Thessaloniki, Greece
(https://clef2023.clef-initiative.eu/)
*** OVERALL COORDINATION ***
Serge Kozlovski, Belarusian Academy of Sciences, Belarus
Vassili Kovalev, Belarusian Academy of Sciences, Belarus
Ihar Filipovich, Belarus State University, Belarus
Alexandra Andrei, Politehnica University of Bucharest, Romania
Ioan Coman, Politehnica University of Bucharest, Romania
Bogdan Ionescu, Politehnica University of Bucharest, Romania
Henning Müller, University of Applied Sciences Western Switzerland, Switzerland
*** ACKNOWLEDGEMENT ***
The contribution of Alexandra Andrei, Ioan Coman, Bogdan Ionescu, and Henning
Müller is supported under the H2020 AI4Media "A European
Excellence Centre for Media, Society and Democracy" project, contract
#951911 https://www.ai4media.eu/.
On behalf of the Organizers,
Bogdan Ionescu
https://www.AIMultimediaLab.ro/
3 July 2023 (online, via Zoom)
Organisers: Catherine Travis & Li Nguyen (ANU; Language Data Commons of Australia (LDaCA))
Over decades of work in Australia, significant collections of language data have been amassed, including of varieties of Australian English, Australian migrant languages, Australian Indigenous languages, sign languages and others. These collections represent a trove of knowledge not only of language in Australia, but also of Australia’s social and cultural history. And yet, not all are well known and many lack published descriptions. The purpose of this workshop is to provide an opportunity to share information about existing language corpora in Australia, with a view to producing a special issue of the Australian Journal of Linguistics that introduces a selection of these corpora, explores how they can contribute to our understanding of language, society, and history in Australia, and considers avenues that such corpora open up for future research.
This workshop is being run as part of the Language Data Commons of Australia (LDaCA), which is working to build national research infrastructure for the Humanities and Social Sciences, facilitating access to and use of digital language corpora for linguists, scholars across the Humanities and Social Sciences, and non-academics.
Abstract submission
For a 20 min presentation, please submit a 250-300 word abstract in English (excluding references). The presentation should include the following information:
· Speech community/fieldsite: Describe the location of the community and/or their brief history in Australia, the languages spoken and their current status.
· Corpus design principles: Specify the sample size, sociolinguistic background of the participants, method of data collection and/or genre (e.g. sociolinguistic interviews, natural conversations, oral histories, elicited data, etc.); data format (written/spoken/audio/video, etc.) and where it is stored.
· Corpus findings and implications: Summarise some key findings from the corpus and discuss other insights that might be obtained from the data in current or future work.
Important dates
22 May Abstracts due
5 June Notification of acceptance
3 July Workshop
How to Submit: Please submit your abstract by 22 May on https://forms.gle/1pwxVVmUV5hCCZ997
Inquiries: Please contact either Catherine Travis or Li Nguyen