In this newsletter:
LDC at ICASSP 2023
New publications:
2019 NIST Speaker Recognition Evaluation Test Set - CTS Challenge<https://catalog.ldc.upenn.edu/LDC2023S03>
LORELEI Zulu Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2023T06>
________________________________
LDC at ICASSP 2023
LDC will be exhibiting at ICASSP 2023<https://2023.ieeeicassp.org/>, held this year June 4-10 in Rhodes, Greece. Stop by booth 15 to learn more about recent developments at the Consortium and the latest publications.
LDC will post conference updates via Twitter<https://twitter.com/LDCupenn> and Facebook<https://www.facebook.com/ldc.upenn>. We look forward to seeing you there!
________________________________
New publications:
2019 NIST Speaker Recognition Evaluation Test Set - CTS Challenge<https://catalog.ldc.upenn.edu/LDC2023S03>, developed by LDC and NIST, contains 635 hours of Tunisian Arabic telephone recordings for development and test, answer keys, enrollment, trial files, and documentation from the CTS Challenge portion of the NIST-sponsored 2019 Speaker Recognition Evaluation<https://www.nist.gov/itl/iad/mig/nist-2019-speaker-recognition-evaluation>. The 2019 evaluation was conducted in two parts: (1) a leaderboard-style challenge based on conversational telephone speech from LDC's Call My Net 2 (CMN2) corpus; and (2) a separate evaluation using audio-visual material collected by LDC for the VAST (Video Annotation for Speech Technology) project (released as LDC2023V01<https://catalog.ldc.upenn.edu/LDC2023V01>).
The telephone speech data for the CTS Challenge was drawn from the CMN2 collection conducted by LDC in Tunisia, in which Tunisian Arabic speakers called friends or relatives who agreed to have their telephone conversations, lasting between 8 and 10 minutes, recorded. The speech segments include PSTN (public switched telephone network) and VOIP (voice over IP) data.
2023 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
LORELEI Zulu Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2023T06> comprises over 5 million words of Zulu monolingual text, 2.7 million words of found Zulu-English parallel text, and 71,000 Zulu words translated from English data. Approximately 100,000 words were annotated for named entities, and over 23,000 words were annotated for entity discovery and linking and situation frames (identifying entities, needs, and issues). Data was collected from discussion forums, news, reference, social network, and weblog sources.
The LORELEI (Low Resource Languages for Emergent Incidents) program was concerned with building human language technology for low resource languages in the context of emergent situations. Representative languages were selected to provide broad typological coverage.
The knowledge base for entity linking annotation is available separately as LORELEI Entity Detection and Linking Knowledge Base (LDC2020T10)<https://catalog.ldc.upenn.edu/LDC2020T10>.
2023 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
*Final Call For Papers: 16th International Natural Language Generation
Conference INLG 2023*
We invite the submission of long and short papers, as well as system
demonstrations, related to all aspects of Natural Language Generation
(NLG), including data-to-text, concept-to-text, text-to-text and
vision-to-text approaches. Accepted papers will be presented as oral
talks or posters.
The event is organized under the auspices of the Special Interest Group
on Natural Language Generation (SIGGEN)
(https://aclweb.org/aclwiki/SIGGEN) of the Association for Computational
Linguistics (ACL) (https://aclweb.org/). The event will be held from
11-15 September in Prague, Czech Republic. INLG 2023 will be co-located
with SIGDial 2023.
*Important dates*
All deadlines are Anywhere on Earth (UTC-12)
- UPDATE: START system regular paper title & abstract submission
deadline: May 22, 2023
- UPDATE: START system full paper submission deadline: May 29, 2023
- ARR commitment to INLG deadline via START system: June 15, 2023
- START system demo paper submission deadline: June 15, 2023
- Notification: July 11, 2023
- Camera ready: July 25, 2023
- Conference: 11-15 September 2023
*Topics*
INLG 2023 solicits papers on any topic related to NLG. General topics of
interest include, but are not limited to:
- Affect/emotion generation
- Analysis and detection of automatically generated text
- Bias and fairness in NLG systems
- Cognitive modelling of language production
- Computational efficiency of NLG models
- Content and text planning
- Corpora and resources for NLG
- Ethical considerations of NLG
- Evaluation and error analysis of NLG systems
- Explainability and Trustworthiness of NLG systems
- Generalizability of NLG systems
- Grounded language generation
- Large Language Models for NLG
- Lexicalisation
- Multimedia and multimodality in generation
- Natural language understanding techniques for NLG
- NLG and accessibility
- NLG in speech synthesis and spoken language models
- NLG in dialogue
- NLG for human-robot interaction
- NLG for low-resourced languages
- NLG for real-world applications
- Paraphrasing, summarization and translation
- Personalisation and variation in text
- Referring expression generation
- Storytelling and narrative generation
- Surface realisation
- System architectures
*Submissions & Format*
Three kinds of papers can be submitted:
- Long papers are most appropriate for presenting substantial research
results and must not exceed eight (8) pages of content, plus unlimited
pages of ethical considerations, supplementary material statements, and
references. The supplementary material statement provides detailed
descriptions to support the reproduction of the results presented in the
paper (see below for details). The final versions of long papers will be
given one additional page of content (up to 9 pages) so that reviewers'
comments can be taken into account.
- Short papers are more appropriate for presenting an ongoing research
effort and must not exceed four (4) pages, plus unlimited pages of
ethical considerations, supplementary material statements, and
references. The final versions of short papers will be given one
additional page of content (up to 5 pages) so that reviewers' comments
can be taken into account.
- Demo papers should be no more than two (2) pages, including
references, and should describe implemented systems relevant to the NLG
community. They should also include a link to a short screencast of the
working software. In addition, authors of demo papers must be willing to
present a demo of their system during INLG 2023.
Submissions should follow ACL Author Guidelines
(https://www.aclweb.org/adminwiki/index.php?title=ACL_Author_Guidelines)
and policies for submission, review and citation, and be anonymised for
double blind reviewing. Please use ACL 2023 style files; LaTeX style
files and Microsoft Word templates are available at
https://2023.aclweb.org/calls/style_and_formatting/.
Authors must honour the ethical code set out in the ACL Code of Ethics
(https://www.aclweb.org/portal/content/acl-code-ethics). If your work
raises any ethical issues, you should include an explicit discussion of
those issues. This will also be taken into account in the review
process. You may find the following checklist useful:
https://aclrollingreview.org/responsibleNLPresearch/
Authors are strongly encouraged to ensure that their work is
reproducible; see, e.g., the following reproducibility checklist
(https://2021.aclweb.org/calls/reproducibility-checklist/). Papers
involving any kind of experimental results (human judgments, system
outputs, etc.) should incorporate a data availability statement into
their paper. Authors are asked to indicate whether the data is made
publicly available. If the data is not made available, authors should
provide a brief explanation why (e.g., because the data contains
proprietary information). A statement guide is available on the INLG
2023 website (https://inlg2023.github.io/).
To submit a long or short paper to INLG 2023, authors can either submit
directly or commit a paper previously reviewed by ARR via the same paper
submission site (https://softconf.com/n/inlg2023/). For direct
submissions, the deadline for submitting abstracts and titles of papers
is May 22, 2023, 11:59:59 AOE and the full paper submission deadline is
May 29, 2023, 11:59:59 AOE. If committing an ARR paper to INLG, the
submission is also made through the INLG 2023 paper submission site,
providing the link to the paper on OpenReview. The deadline for
committing an ARR paper to INLG is June 15, 2023, 11:59:59 AOE, and the
last eligible ARR paper submission deadline for INLG 2023 is April 15,
2023. It is important to note that when committing an ARR paper to INLG,
it should be submitted through the INLG 2023 paper submission site, just
like a direct submission paper, with the only difference being the need
to provide the OpenReview link to the paper and to provide an optional
author response to reviews.
Demo papers should be submitted directly through the INLG 2023 paper
submission site (https://softconf.com/n/inlg2023/) by June 15, 2023,
11:59:59 AOE.
All accepted papers will be published in the INLG 2023 proceedings and
included in the ACL anthology. A paper accepted for presentation at INLG
2023 must not have been presented at any other meeting with publicly
available proceedings. Dual submission to other conferences is
permitted, provided that authors clearly indicate this in the submission
form. If the paper is accepted at both venues, the authors will need to
choose which venue to present at, since they cannot present the same
paper twice.
*Awards*
INLG 2023 will present several awards to recognize outstanding
achievements in the field. These awards are:
- Best Long Paper Award: This award will be given to the best long paper
submission based on its originality, impact, and contribution to the
field of NLG.
- Best Short Paper Award: This award will be given to the best short
paper submission based on its originality, impact, and contribution to
the field of NLG.
- Best Demo Paper Award: This award will recognize the best demo paper
submitted to the conference. This award considers not only the paper's
quality but also the demonstration given at the conference. The
demonstration will play a significant role in the judging process.
- Best Evaluation Award: The award is a new addition to INLG 2023. This
award is designed to honour authors who have demonstrated the most
comprehensive and insightful analysis in evaluating their results. This
award aims to highlight papers where the authors have gone the extra
mile in providing a thorough and detailed analysis of their results,
offering a nuanced understanding of their findings.
*MISDOOM 2023 - 5th Multidisciplinary International Symposium on
Disinformation in Open Online Media*
The Multidisciplinary International Symposium on Disinformation in Open
Online Media (MISDOOM) is returning for its 5th edition on 21 and 22
November 2023, hosted by the National Research Center for Mathematics and
Computer Science (CWI), Amsterdam, Netherlands.
MISDOOM values multidisciplinary research and is designed to be inclusive
of different academic disciplines and practices. The symposium provides a
platform for researchers, industry professionals, and practitioners from
various disciplines such as communication science, computer science,
computational social science, political science, psychology, journalism,
and media studies to come together and share their knowledge and insights
on online disinformation.
Symposium Topics
Participants can discuss and contribute to the following list of topics:
- Cross-platform campaigns and their impact (e.g., diffusion of
disinformation and manipulation, observations of campaigns and strategies,
communication strategies, hate speech)
- Approaches to studying misinformation (e.g., qualitative approaches,
case studies, quantitative approaches, experiments)
- User involvement with misinformation on various platforms (e.g.,
engagement, viewership)
- Counter-measures for mis- and disinformation and manipulation (e.g.,
censorship policies, behavioral changes, education, training, professional
codes, legal actions)
- Factors contributing to misinformation beliefs or hampering corrections
of false beliefs (e.g., political polarization, motivated reasoning,
confirmation bias)
- Trending topics in mis- and disinformation research
- Automated fact-checking and misinformation detection
- Models for misinformation diffusion
- Human computation approaches for misinformation detection
(e.g., crowdsourcing, human-machine interaction)
- Information quality (e.g., information quality dimensions, metrics, ethics
of information quality)
- Generative AI tools and disinformation (e.g., ChatGPT, Midjourney,
DALL-E)
Industry
Industry practitioners are also invited to participate in the conference by
submitting a contribution describing their approach to countering or
detecting misinformation.
Submission Instructions
Given that we welcome both social scientists and computer scientists, and
that the publication strategies of these fields differ, we solicit two
types of contributions that, upon acceptance, result in the same
opportunity to present at MISDOOM:
Full papers
Full papers will be published in the Springer LNCS proceedings. They may be
up to 15 pages (including references) in Springer Lecture Notes in Computer
Science (LNCS) format and must describe original, unpublished research. The
work should be structured like a research paper and cover the context of the
problem studied, the research question, the approach/methodology, and the
results in 6 to 15 pages. It should be formatted according to the LNCS Word
or LaTeX template. Such submissions will be judged based on scientific
quality and relevance to the MISDOOM symposium.
Extended Abstracts
Authors can also choose to submit an Extended Abstract. The extended
abstract should not exceed 500 words, excluding references, and can pertain
to previously published work, ongoing projects, or new research ideas.
There is no particular format for the extended abstract, but it must
include the title, authors, their affiliation, the text of the abstract,
and references, particularly if it involves previously published work.
Submissions are not archival and are not formally published. Additionally,
authors must submit a conference program abstract of no more than 150
words. Authors should add the suffix "(Extended Abstract)" to the title of
their extended abstract submission.
Important Note about Submissions
Both contribution types (full papers and extended abstracts) must specify
the discipline they are contributing to as keyword(s) in EasyChair at the
time of submission (they should enter at least one of the two keywords
“computer science” or “social science” in the keyword box).
Submission Link: https://easychair.org/conferences/?conf=misdoom2023
Important Dates
Submission Deadline: 30 June 2023
Notification: 28 August 2023
Camera ready: 11 September 2023
Symposium: 21-22 November 2023
--
Tommaso Caselli, Ph.D.
Senior Assistant Professor in Computational Semantics
Faculty of Arts, Rijksuniversiteit Groningen
The Netherlands
----------------------------
https://xs4all.academia.edu/TommasoCaselli
https://www.researchgate.net/profile/Tommaso_Caselli
Twitter: @tommaso_caselli
Dear all,
I'm very pleased to announce that we (Renato Software Ltd. and Birmingham City University) are looking to hire a Corpus/Computational Linguist for an Innovate UK-funded Knowledge Transfer Partnership around online safeguarding for children in schools. Senso.cloud is a cloud-based classroom management tool that allows schools to monitor children's online behaviour and be aware of any potential safeguarding risks their activity might indicate. The role of the KTP Associate is to work with our Safeguarding and Development teams as well as academics at Birmingham City University to enhance this offering and help deliver even better protection to vulnerable students in school. Digital safeguarding saves lives.
If you have completed a Master's or a PhD in Corpus/Computational Linguistics and you're looking for a job in industry with real meaning and impact, we'd love to hear from you!
The closing date is Sunday, 11th June and interviews will take place on Monday, 19th June. If you have any questions, feel free to email Emma Franklin at e.franklin(a)senso.cloud.
Please share with anyone who might be interested!
https://jobs.bcu.ac.uk/vacancy.aspx?ref=052023-254
CALL FOR PAPERS
1st Symposium on Challenges for Natural Language Processing (CNLPS'23)
Warsaw, Poland, 17-20 September, 2023
https://fedcsis.org/sessions/aaia/cnlps
Organized within FedCSIS 2023 (IEEE: #57573)
Strict submission deadline: May 23, 2023, 23:59:59 AOE (no extensions)
KEY FACTS: Proceedings: submitted to IEEE Digital Library; indexing: DBLP, Scopus and Web of Science; 70 MEiN points
Please feel free to forward this announcement to your colleagues and associates who could be interested in it.
********************* Statement concerning LLMs *********************
Recognizing a developing issue that affects all academic disciplines, we would like to state that, in principle, papers that include text generated by a large language model (LLM) are prohibited, unless the generated text is used within the experimental part of the work.
*********************************************************************
The Challenges for Natural Language Processing Symposium is a series of competitions oriented towards advancing human language technologies.
The goal of the symposium is to evaluate natural language processing tools in demanding, non-obvious tasks that address multimodal problems, cross-lingual learning and processing of natural languages that are not widely represented in other evaluation campaigns.
This year we invite all interested teams and individuals to participate in the following events:
+ PolEval Competition
https://fedcsis.org/sessions/aaia/cnlps/poleval
+ Center for Artificial Intelligence Challenge on Conversational AI Correctness
https://fedcsis.org/sessions/aaia/cnlps/caiccaic
+ Temporal Image Caption Retrieval Competition
https://fedcsis.org/sessions/aaia/cnlps/ticrc
More details about the competitions can be found in the linked subpages.
Apart from the competitions, we also welcome submissions to the General Session that includes the topics listed below:
* Corpora and Language Resources
* Machine Learning in NLP
* Speech Processing
* Language Modeling
* Language Generation
* Conversational AI
* Question Answering
* Sentiment and Emotion Detection
* Information Extraction
Papers submitted for the General Session must comply with all standard FedCSIS requirements.
For this session we only accept regular papers that describe new research contributions, present experiences encountered in practice or report on research topics worthy of immediate communication as explained on the page on paper categories.
Submission rules:
- Authors should submit their papers as Postscript, PDF or MSWord files.
- The total length of a paper should not exceed 10 pages IEEE style (including tables, figures and references). IEEE style templates are available here.
- Papers will be refereed and accepted on the basis of their scientific merit and relevance to the workshop.
- Preprints containing accepted papers will be published on a USB memory stick provided to the FedCSIS participants.
- Only papers presented at the conference will be published in Conference Proceedings and submitted for inclusion in the IEEE Xplore® database.
- Conference proceedings will be published in a volume with ISBN, ISSN and DOI numbers and posted at the conference WWW site.
- Conference proceedings will be submitted for indexation.
- Organizers reserve the right to move accepted papers between FedCSIS technical sessions.
Important dates:
+ Paper submission (strict deadline): May 23, 2023, 23:59:59 (AoE; there will be no extension)
+ Position paper submission: June 7, 2023
+ Author notification: July 11, 2023
+ Final paper submission and registration: July 31, 2023
+ Payment (early fee deadline): July 26, 2023
CNLPS is organized in collaboration with (within the framework of) Multi-task, Multilingual, Multi-modal Language Generation COST Action CA18231; https://multi3generation.eu/
4-YEAR PHD POSITION: Specializing NLP through reinforcement learning (Hybrid intelligence)
VU Amsterdam & Utrecht University
Deadline: 31 May 2023 23:59 CET
Do you have a Master's degree in Computational Linguistics or a related area? Are you interested in reinforcement learning, and do you want to dive deeper into the behavior of current models and the kinds of errors they make? Do you want to be part of a large consortium that addresses a variety of topics around hybrid intelligence? Do you want to work in a group that cares about core questions in NLP research and aims to provide a positive, inspiring environment for young researchers?
Then please consider applying for our fully funded 4-year PhD position at the Vrije Universiteit in Amsterdam.
Please find out more & apply at:
https://workingat.vu.nl/ad/phd-hybrid-intelligence-specializing-nlp-models-…
(applications sent via email will not be processed)
--
prof. dr. Antske Fokkens
Computational Linguistics & Text Mining Lab, Vrije Universiteit Amsterdam
Algorithms, Geometry & Applications, Eindhoven University of Technology
We are delighted to announce the programme of HealTAC 2023, the sixth annual meeting of the Healthcare Text Analytics community. Registration is now open!
Programme page: http://healtex.org/healtac-2023/programme/
The registration site is here<https://estore.manchester.ac.uk/conferences-and-events/faculty-of-science-e…> .
Dates:
* Early bird registration ends on: 1st June
* Pre-conference tutorials/workshops: 14 June
* Conference: 15 & 16 June
Keynotes:
The keynotes this year naturally focus on the impact and promise of large healthcare language models. We will hear from two experts involved in large centres that work with clinical free-text data in the UK and the US.
Dr Angus Roberts, King's College London<https://www.kcl.ac.uk/people/angus-roberts>
From regular expressions to pre-trained language models – 14 years of applying NLP at the Maudsley Biomedical Research Centre
Abstract of the talk<http://healtex.org/healtac-2023/programme/>
Bio of the speaker<http://healtex.org/healtac-2023/programme/>
Dr Yonghui Wu, University of Florida<https://hobi.med.ufl.edu/profile/wu-yonghui/>
Opportunities and Challenges of Conversational Artificial Intelligence and Large Language Models in Healthcare
Abstract of the talk<http://healtex.org/healtac-2023/programme/>
Bio of the speaker<http://healtex.org/healtac-2023/programme/>
We are looking forward to seeing you all at HealTAC 2023.
The HealTAC 2023 organising committee
--------------------------------------------------------------
Dr. Beatrice Alex
Senior Lecturer and Chancellor’s Fellow
University of Edinburgh
Good morning,
We are pleased to announce the release of Albertina PT-*
This is the first large language model specifically for Portuguese,
covering both the PT-PT and PT-BR variants, that is publicly available
and open source.
With 900 million parameters in this first version,
it sets a new state of the art among publicly available, open models
specifically for Portuguese.
It was developed at the University of Lisbon together
with colleagues from the University of Porto,
and can be obtained here:
https://huggingface.co/models?other=albertina-pt*
Its development is documented here:
https://arxiv.org/abs/2305.06721
Best regards,
On behalf of Albertina's team
Call for Papers:
The first BNLP workshop aims to provide a forum for the NLP, speech and
multimodal communities to share and discuss their ongoing work with the
international community. We particularly focus on Bangla, which is a
low-resource language, assess its current state of the art, and discuss
strategies to make further progress in NLP, speech and multimodal research.
Through this workshop, we plan to bring researchers together to come up
with frameworks and strategies that can later support other low-resource
languages. We encourage researchers to submit their papers focusing on
novel methodologies and resources that help towards the progress of Bangla
and other low-resource languages. Novel methodologies include, but are not
limited to, zero-shot learning, unsupervised learning, and simple yet
effective methods applicable to low-computation scenarios.
We invite original research papers from a wide range of topics, including
but not limited to:
Natural Language Processing: Corpus and Resource Development, Language
Modeling, Stemmer, POS Tagger, Named Entity Recognition, Relation
Extraction, Spell and Grammar Checker, Question Answering, Semantics, Text
Summarization, Machine Translation, Sentiment Analysis.
Speech Processing: Speech Synthesis and Spoken Language Generation, Speech
Recognition, Phonetics, Phonology, and Prosody, Spoken Dialog and
Conversational Systems, Speaker and Language Detection.
Multimodality: OCR - Handwriting, Printed Document, Sign Language Detection.
Human Computer Interaction: Software for Disabled People, Multimodal HCI
for Bangla.
Important dates:
Workshop paper due: 1 September 2023
Notification of acceptance: 6 October 2023
Camera-ready papers due: 18 October 2023
Workshop dates: 6-7 December 2023
All deadlines are 11:59pm anywhere on Earth (AoE).
Submission Details:
Papers must describe original, completed or in-progress, and unpublished
work. All papers will be refereed through a double-blind peer review
process by multiple reviewers with final acceptance decisions made by the
workshop organizers. Accepted papers will be given up to 9 pages (for full
papers) or 5 pages (for short papers and posters) in the workshop
proceedings, and will be presented as oral presentations or posters.
We are seeking submissions in the following categories:
- Full papers (8 pages)
- Short papers (work in progress, innovative ideas/proposals: 4 pages)
- Shared task papers (4 pages)
Both long and short papers must follow the EMNLP 2023 two-column format,
using the supplied official templates [1], which can be downloaded from the
style and formatting page. Please do not modify these style files, and do
not use templates designed for other conferences. Submissions that
do not conform to the required styles, including paper size, margin width,
and font size restrictions, will be rejected without review. To verify
conformance to publication standards, we will be using the ACL
pubcheck tool [2]. The PDFs of camera-ready papers must be run through this
tool prior to their final submission, and we recommend its use also at
submission time.
Submissions are open to all, and are to be submitted anonymously. For the
anonymity, double-blind submission and reproducibility criteria please
follow the EMNLP 2023 instructions [3].
If you have published in the field previously, and are interested in
helping out on the program committee to review papers, please fill out this
form <https://forms.gle/1WUYQjWT9UuqioX48> [4].
Submission portal: TBA
Workshop Organizers:
Firoj Alam, Qatar Computing Research Institute, HBKU, Qatar
Sudipta Kar, Amazon Alexa AI, USA
Shammur Absar Chowdhury, Qatar Computing Research Institute, HBKU, Qatar
Farig Sadeque, BRAC University, Bangladesh
Ruhul Amin, Fordham University, USA
Asif Shahriyar Sushmit, Rensselaer Polytechnic Institute, USA
[1] https://2023.emnlp.org/calls/style-and-formatting/
[2] https://github.com/acl-org/aclpubcheck
[3] https://2023.emnlp.org/calls/main_conference_papers/
[4] https://forms.gle/1WUYQjWT9UuqioX48
The Organizers