ENGLISH VERSION BELOW
Mae'n dda gennym lansio TestunRhydd - pecyn cymorth dwyieithog ar-lein am ddim ar gyfer dadansoddi a delweddu data testun rhydd (o arolygon, holiaduron etc) yn Gymraeg a Saesneg. Mae TestunRhydd yn defnyddio rhai o’r gwasanaethau a’r methodolegau corpws o CorCenCC ac ACC (Crynhoi Testunau Cymraeg yn Awtomatig), ac yn eu hailbecynnu fel bod modd i gynulleidfaoedd a grwpiau o ddefnyddwyr newydd ddadansoddi eu data adborth eu hunain. Wedi'i gynllunio ar y cyd ag Ymddiriedolaeth Genedlaethol Cymru, Amgueddfa Cymru, Cadw, CBAC, a'r Ganolfan Dysgu Cymraeg Genedlaethol, mae TestunRhydd ar gael i unrhyw un mewn unrhyw sector yng Nghymru a'r tu hwnt. Mae TestunRhydd:
* yn dangos a yw eich data yn gadarnhaol a/neu'n negyddol (dadansoddi sentiment) ac mae modd delweddu'r canlyniadau a'u lawrlwytho.
* yn gadael i chi archwilio/delweddu geiriau, ymadroddion a themâu cyffredin yn eich data (mewn tablau, cymylau geiriau etc.).
* yn gadael i chi grynhoi data testun-rhydd, ac archwilio'r defnydd o eiriau a'u perthnasoedd.
Mae TestunRhydd ar gael fel cod agored gyda thrwydded Apache 2.0 (https://github.com/UCREL/FreeTxt-Flask), a thrwy ryngwyneb demo gwe lletyol yn: www.freetxt.app<http://www.freetxt.app/>. Mae’n ymgorffori offer cod agored eraill o’n prosiectau blaenorol fel CyTag (tagiwr rhannau ymadrodd Cymraeg), crynodebwr Cymraeg, a PyMUSAS (ar gyfer Cymraeg a Saesneg), gweler https://www.freetxt.app/about am fwy o fanylion.
Datblygwyd TestunRhydd fel rhan o brosiect ymchwil ar y cyd a ariannwyd gan yr AHRC 'TestunRhydd yn cefnogi dadansoddi data arolygon a holiaduron testun-rhydd dwyieithog' gyda chydweithwyr o Brifysgol Caerdydd a Phrifysgol Caerhirfryn (Rhif y Grant AH/W004844/1). Roedd y tîm yn cynnwys PY - Dawn Knight; CY - Paul Rayson, Mo El-Haj; Cydymeithion Ymchwil - Ignatius Ezeani, Nouran Khallaf a Steve Morris. Roedd Grŵp Ymgynghorol y Prosiect yn cynnwys cynrychiolwyr o: Ymddiriedolaeth Genedlaethol Cymru, Cadw, Amgueddfa Cymru, CBAC a'r Ganolfan Dysgu Cymraeg Genedlaethol
We’re excited to launch FreeTxt – a free bilingual online toolkit for analysing and visualising free-text data (from surveys, questionnaires etc.) in English and Welsh. FreeTxt draws on some of the corpus-based utilities and methodologies from CorCenCC and ACC (Welsh Automatic Text Summarisation), repackaging these to enable new audiences and user-groups to analyse their own feedback data. Co-designed in collaboration with National Trust Wales, Museum Wales, Cadw, WJEC, and National Centre for Learning Welsh, FreeTxt is accessible to anyone in any sector in Wales and beyond. FreeTxt:
* indicates if your data is positive and/or negative (sentiment analysis) and provides downloadable visualisations of results.
* allows you to explore/visualise common words, phrases and themes in your data (in tables, word clouds etc.).
* enables you to summarise free-text data, and examine word use and relationships.
FreeTxt is available open source with an Apache 2.0 licence (https://github.com/UCREL/FreeTxt-Flask), and via a hosted web demo interface at: www.freetxt.app<http://www.freetxt.app/>. It incorporates other open source tools from our previous projects such as CyTag (Welsh POS tagger), a Welsh summariser, and PyMUSAS (for English and Welsh), see https://www.freetxt.app/about for more details.
FreeTxt was developed as part of the AHRC-funded collaborative research project 'FreeTxt supporting bilingual free-text survey and questionnaire data analysis', involving colleagues from Cardiff University and Lancaster University (Grant Number AH/W004844/1). The team included PI - Dawn Knight; CIs - Paul Rayson, Mo El-Haj; RAs - Ignatius Ezeani, Nouran Khallaf and Steve Morris. The Project Advisory Group included representatives from: National Trust Wales, Cadw, Museum Wales, CBAC | WJEC and the National Centre for Learning Welsh.
--
Paul Rayson
Director of UCREL and Professor of Natural Language Processing
SCC Data Theme Lead
School of Computing and Communications, InfoLab21, Lancaster University, Lancaster, LA1 4WA, UK.
Web: https://www.research.lancs.ac.uk/portal/en/people/Paul-Rayson/
Tel: +44 1524 510357
CODI, 5th Workshop on Computational Approaches to Discourse
2024-03-21 - EACL 2024 - Malta
** Direct Submission deadline: January 22nd, 2024 **
Direct submission: We now open submissions for papers rejected at another main conference.
The deadline has been updated to account for the delay in EACL notifications. Note that notifications will be sent on January 25 for direct submissions, and camera-ready will be due on January 30.
Website link: https://sites.google.com/view/codi2024
CODI considers for publication papers rejected at one of the main conferences. Authors must submit both the paper and the reviews as a supplementary PDF file. If modifications have been made since the original submission, please submit an additional file briefly describing them. The organizers will decide on acceptance based on the quality of the paper and its fit with the workshop.
As a reminder, CODI also invites presentations of papers accepted at another main conference. They will be included in the workshop program and handbook, but will not appear in the workshop proceedings.
Please submit your workshop papers (category: "direct submission") at https://softconf.com/eacl2024/CODI-2024/
In this newsletter:
Renew your LDC membership today
New publications:
KASET - Kurmanji and Sorani Kurdish Speech and Transcripts<https://catalog.ldc.upenn.edu/LDC2024S01>
LORELEI Farsi Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2024T01>
________________________________
Renew your LDC membership today
The importance of curated resources for language-related education, research, and technology development drives LDC's mission to create them, to accept data contributions from researchers across the globe, and to broadly share such resources through the LDC Catalog. LDC members enjoy no-cost access to new corpora released annually, as well as the ability to license legacy data sets from among our 950+ holdings at reduced fees. Ensure that your data needs continue to be met by renewing your LDC membership or by joining the Consortium today.
Now through March 1, 2024, 2023 members receive a 10% discount on 2024 membership, and new or returning organizations receive a 5% discount. Membership remains the most economical way to access current and past LDC releases. Consult Join LDC<https://www.ldc.upenn.edu/communications/newsletter/january-2022-newsletter> for more details on membership options and benefits.
________________________________
New publications:
KASET - Kurmanji and Sorani Kurdish Speech and Transcripts<https://catalog.ldc.upenn.edu/LDC2024S01> consists of 147 hours of telephone conversations (289 recordings) and broadcast news (410 recordings) in two Kurdish dialects, Kurmanji Kurdish and Sorani Kurdish, along with transcripts covering 60 hours of those recordings. Kurdish is spoken primarily in Turkey, Iran, Iraq, and Syria. Sorani and Kurmanji are the two widely spoken dialects of the Kurdish language.
The telephone speech was generated from calls by native Kurdish speakers in the United States to North American acquaintances in their social network. The broadcast news audio was collected from multiple streaming radio and television broadcast programs (narrowband and wideband audio), many of which contained a mix of Kurmanji and Sorani Kurdish. Native speaker auditors identified a 5-10 minute span from each broadcast recording for transcription. Full telephone recordings that passed the native speaker audit were transcribed. This release includes speaker information, such as gender, year of birth, and language.
2024 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
LORELEI Farsi Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2024T01> was developed by LDC and comprises approximately 250 million words of Farsi monolingual text, 120,000 Farsi words translated from English data, and 751,000 words of found Farsi-English parallel text. Approximately 75,000 words were annotated for named entities and up to 22,000 words were annotated for entity discovery and linking and situation frames (identifying entities, needs, and issues). Data was collected from discussion forums, news, reference sources, social networks, and weblogs.
The LORELEI (Low Resource Languages for Emergent Incidents) program was concerned with building human language technology for low resource languages in the context of emergent situations. Representative languages were selected to provide broad typological coverage.
The knowledge base for entity linking annotation is available separately as LORELEI Entity Detection and Linking Knowledge Base (LDC2020T10)<https://catalog.ldc.upenn.edu/LDC2020T10>.
2024 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
*The Third Ukrainian Natural Language Processing Workshop (UNLP 2024)*
<https://unlp.org.ua/>
UNLP 2024 features the first *Shared Task on Fine-Tuning Large Language
Models for Ukrainian*.
This Shared Task aims to challenge and assess LLMs' capabilities to
understand and generate Ukrainian, paving the way for LLM development in
Slavic languages.
*Task Description*
In this shared task, your goal is to instruction-tune a large language
model that can answer questions and perform tasks in Ukrainian. The model
should possess knowledge of Ukrainian history, language, and literature, as
well as common knowledge, and should be capable of generating fluent and
factually accurate responses.
The evaluation will be two-fold: accuracy of answers to multiple-choice
questions and human evaluation on a selection of text generation tasks.
You can find the instructions, sample data, and scripts at
https://github.com/unlp-workshop/unlp-2024-shared-task.
*Registration*
Teams that intend to participate should register by filling in this form
<https://forms.gle/MiC7pWsWbwBdSmoX9>.
*Publication*
Participants in the shared task are invited to submit a paper to the UNLP
2024 <https://unlp.org.ua/call-for-papers/> workshop. Submitting a paper is
*not mandatory* for participating in the Shared Task. Papers must follow the
workshop submission instructions and will undergo regular peer review.
Their acceptance will not depend on the results obtained in the shared
task, but on the quality of the paper. Accepted papers will appear in the
ACL Anthology and will be presented at a session of UNLP 2024 specially
dedicated to the Shared Task.
*Important Dates*
January 15, 2024 — Shared task announcement
February 15, 2024 — Release of test data to registered participants
February 22, 2024 — Registration deadline
February 23, 2024 — Submission of system responses
March 1, 2024 — Shared Task paper due
March 7, 2024 — Results of the Shared Task announced
March 29, 2024 — Notification of acceptance
TBD (mid-April) — Camera-ready Shared Task papers due
May 25, 2024 — Workshop date
*Contact*
Discord for the shared task: https://discord.gg/kCc6xgWbCJ
Email: info(a)unlp.org.ua
Website: https://unlp.org.ua/
Twitter: https://twitter.com/UNLP_workshop
Telegram: https://t.me/UNLP_workshop
Facebook: https://www.facebook.com/UNLPworkshop
TEICAI, Towards Ethical and Inclusive Conversational AI: Language Attitudes, Linguistic Diversity, and Language Rights, at EACL 2024 in Malta, March 17-22, 2024.
Workshop website: https://sites.google.com/view/teicai2024
Submission link: https://softconf.com/eacl2024/TEICAI-2024/
Submission Deadline: 17 January 2024 (anywhere on earth)
Submissions are now being accepted for papers that were previously rejected at a major conference. Authors are required to submit their paper alongside the reviews it received, provided as a supplementary PDF file. In cases where the paper has undergone revisions since its original submission, authors should also include a separate file briefly outlining the changes made. The acceptance of these papers for the workshop will be determined by the organizers, based on the paper's quality and relevance to the workshop's theme.
We are also pleased to announce that our sponsor, e-COST ACTION Language in the Human-Machine Era (LITHME), is offering two to three travel grants for authors of selected accepted papers. More information about LITHME can be found at https://lithme.eu/.
Workshop Organizers:
Sviatlana Höhn, LuxAI, Luxembourg
Nina Hosseini-Kivanani, Faculty of Science, Technology and Medicine (FSTM), University of Luxembourg, Luxembourg
Dimitra Anastasiou, Luxembourg Institute of Science and Technology, Luxembourg
Angela Soltan, State University of Moldova, Moldova
Bettina Migge, University College Dublin, Ireland
Doris Dippold, University of Surrey, UK
Fred Philippy, Zortify, Luxembourg
Ekaterina Kamlovskaya, Translatables
Program Committee:
A list of program committee members is available on the workshop website.
For any preliminary questions, you're welcome to reach out to teicai2024(a)gmail.com.
You can follow us on LinkedIn (TEICAI) and Twitter (teicai2024) to get more updates about the workshop.
On behalf of the organizers
Nina Hosseini-Kivanani
University of Luxembourg
Apologies for cross-posting.
----------------------------------------
*The International Conference on Spoken Language Translation*
*21st IWSLT 2024 – **Second** Call for Participation*
*August 15-16, 2024 – Bangkok, Thailand*
*http://iwslt.org <http://iwslt.org/>*
The International Conference on Spoken Language Translation (IWSLT) is the
premier annual conference for all aspects of Spoken Language Translation.
Every year, the conference organizes and sponsors open evaluation campaigns
around key challenges in simultaneous and consecutive translation, under
real-time/low latency or offline conditions and under low-resource or
multilingual constraints. System descriptions and results from
participants’ systems and scientific papers related to key algorithmic
advances and best practices are presented.
IWSLT is the venue of SIGSLT, the Special Interest Group on Spoken
Language Translation of ACL, ISCA and ELRA. With a track record of 20
years, IWSLT benchmarks and proceedings serve as a reference for all
researchers and practitioners working on speech translation and related
fields.
The 21st edition of IWSLT <https://iwslt.org/2024/> will be run as an
*ELRA/ACL* event and co-located with ACL 2024 <https://2024.aclweb.org/> on
August 15-16, 2024. It will be run as a hybrid event.
Important Dates
January 15, 2024: Release of shared task training and dev data
April 01-15, 2024: Evaluation period
April 29, 2024: Paper submission due (all papers)
June 4, 2024: Notification of acceptance
June 24, 2024: Camera-ready paper due
July 22, 2024: Pre-recorded video due
August 15-16, 2024: Conference
Evaluation
The IWSLT 2024 features shared tasks <https://iwslt.org/2024/#shared-tasks>
that address the following focus areas:
- Speech-to-speech track
- Simultaneous track
- Subtitling track
- Offline track
- Dubbing track
- Low-resource track
- Indic track
Training, development and test data for each shared task will be prepared
and released by the respective organizers (for further information on this
initiative, please refer to the website <https://iwslt.org/2024/>).
Participants will receive instructions about how to submit their runs. In
addition, participants have the opportunity to present their work through
a system paper that will be published in the ACL Proceedings.
Conference
IWSLT also invites submissions of scientific papers to be published in the
ACL Proceedings and presented either in oral or poster format. The
conference selects high-quality, original contributions on theoretical and
practical issues of spoken language translation research, technologies and
applications. For further information on this initiative, please refer to
the website <https://iwslt.org/2024/#paper-submission>
Contact
Please send an email to iwslt-evaluation-campaign(a)googlegroups.com if you
have any questions related to the shared tasks.
Thanks,
Marine, Marcello, Alex, Jan, Sebastian, Elizabeth, Atul
(IWSLT organisers)
The International Congress of Linguists (ICL) is organized once every five years as the meeting place for international linguistics, where all areas and sub-disciplines of linguistics as well as interdisciplinary topics can be discussed. Its 21st edition (https://icl2024poznan.pl/) will be held from 8 to 14 September 2024 in Poznań and now invites abstracts for Sections, Focus streams, and Workshops.
Call for Abstracts: Corpus Linguistics
Focus stream 8 invites abstracts of papers that examine the methods and applications of corpus linguistics. Topics may include the design and construction of corpora, the analysis and interpretation of corpus data, the use of corpus tools and software, and the implications of corpus findings for various linguistic domains and disciplines. The focus stream also explores the challenges and opportunities of corpus linguistics in the era of big data, artificial intelligence, and natural language processing.
Abstracts should clearly state the research question(s), approach, method, data, and (expected) results. They should not display the names of the presenters, nor their affiliations or addresses, or any other information that could reveal their authorship. They should contain the title, five keywords, and a text between 300 and 400 words (including examples, excluding references).
Each abstract will be reviewed anonymously by two reviewers (section/focus stream/workshop convenor + external reviewer).
Important dates
Feb 1, 2024: (Extended) submission deadline (12.00 PM CET). Submission link: https://easychair.org/conferences/?conf=icl2024poznan
Apr 15, 2024: Notification of acceptance.
Sep 11, 2024: Focus stream date
Presentations and posters
Authors may apply, upon abstract submission, for a presentation or a poster. Presentations will be organized in 30 minute slots (20 min. presentation, 7 min. discussion, 3 min. room change). Posters are always displayed during one full day. Separate time slots will be included in the program in which participants can discuss with the poster presenters.
Best regards –
Maciej Ogrodniczuk
Convenor of FS8: Corpus Linguistics at ICL 2024
The fifth workshop on Resources for African Indigenous Languages (RAIL)
Co-located with LREC-COLING 2024
https://bit.ly/rail2024
Conference dates: 20-25 May 2024
Workshop date: 25 May 2024
Venue: Lingotto Conference Centre, Torino (Italy)
The fifth RAIL workshop website: https://bit.ly/rail2024
LREC-COLING 2024 website: https://lrec-coling-2024.org/
Submission website: https://softconf.com/lrec-coling2024/rail2024/
The fifth Resources for African Indigenous Languages (RAIL) workshop
will be co-located with LREC-COLING 2024 in Lingotto Conference Centre,
Torino, Italy on 25 May 2024. The RAIL workshop is an interdisciplinary
platform for researchers working on resources (data collections, tools,
etc.) specifically targeted towards African indigenous languages. In
particular, it aims to create the conditions for the emergence of a
scientific community of practice that focuses on data, as well as
computational linguistic tools specifically designed for or applied to
indigenous languages found in Africa.
Many African languages are under-resourced while only a few of them are
somewhat better resourced. These languages often share interesting
properties such as writing systems, or tone, making them different from
most high-resourced languages. From a computational perspective, these
languages lack enough corpora to undertake high level development of
Human Language Technologies (HLT) and Natural Language Processing (NLP)
tools, which in turn impedes the development of African languages in
these areas. During previous workshops, it has become clear that the
problems and solutions presented are not only applicable to African
languages but are also relevant to many other low-resource languages.
Because these languages share similar challenges, this workshop
provides researchers with opportunities to work collaboratively on
issues of language resource development and learn from each other.
The RAIL workshop has several aims. First, the workshop brings together
researchers who work on African indigenous languages, forming a
community of practice for people working on indigenous languages.
Second, the workshop aims to reveal currently unknown or unpublished
existing resources (corpora, NLP tools, and applications), resulting in
a better overview of the current state-of-the-art, and also allows for
discussions on novel, desired resources for future research in this
area. Third, it enhances sharing of knowledge on the development of
low-resource languages. Finally, it enables discussions on how to
improve the quality as well as availability of the resources.
The workshop has “Creating resources for less-resourced languages” as
its theme, but submissions on any topic related to properties of
African indigenous languages (including non-African languages) may be
accepted. Suggested topics include (but are not limited to) the
following:
* Digital representations of linguistic structures
* Descriptions of corpora or other data sets of African indigenous
languages
* Building resources for (under resourced) African indigenous languages
* Developing and using African indigenous languages in the digital age
* Effectiveness of digital technologies for the development of African
indigenous languages
* Revealing unknown or unpublished existing resources for African
indigenous languages
* Developing desired resources for African indigenous languages
* Improving quality, availability and accessibility of African
indigenous language resources
Submission requirements:
We invite papers on original, unpublished work related to the topics of
the workshop. Submissions, presenting completed work, may consist of up
to eight (8) pages of content plus additional pages of references. The
final camera-ready versions of accepted long papers are allowed one
additional page of content (up to 9 pages) so that reviewers’ feedback
can be incorporated. Papers should be formatted according to the
LREC-COLING style sheet (https://lrec-coling-2024.org/authors-kit/),
which is provided on the LREC-COLING 2024 website
(https://lrec-coling-2024.org/). Reviewing is double-blind, so make
sure to anonymise your submission (e.g., do not provide author names,
affiliations, project names, etc.). Limit the number of self-citations
(anonymised citations should not be used). The RAIL workshop follows
the LREC-COLING submission requirements.
Please submit papers in PDF format to the START account
(https://softconf.com/lrec-coling2024/rail2024/). Accepted papers will
be published in proceedings linked to the LREC-COLING conference.
Important dates:
Submission deadline: 16 February 2024
Date of notification: 15 March 2024
Camera ready deadline: 29 March 2024
RAIL workshop: 25 May 2024
Organising Committee
Rooweither Mabuya, South African Centre for Digital Language Resources
(SADiLaR), South Africa
Muzi Matfunjwa, South African Centre for Digital Language Resources
(SADiLaR), South Africa
Mmasibidi Setaka, South African Centre for Digital Language Resources
(SADiLaR), South Africa
Menno van Zaanen, South African Centre for Digital Language Resources
(SADiLaR), South Africa
--
Prof Menno van Zaanen menno.vanzaanen(a)nwu.ac.za
Professor in Digital Humanities
South African Centre for Digital Language Resources
https://www.sadilar.org