*** Apologies for cross-posting ***
FOIS 2023: 3rd Call for Papers - deadline extended
==============================
13th International Conference on Formal Ontology in Information Systems (FOIS 2023), July 17-20, 2023 (Sherbrooke, QC, Canada) and Sept 18-20, 2023 (Online)
http://fois2023.griis.ca
We are happy to announce three exciting keynote speakers for FOIS 2023:
- Deborah McGuinness, Tetherless World Senior Constellation Chair and Professor of Computer and Cognitive Science, Rensselaer Polytechnic Institute, USA
- John Heil, Professor of Philosophy, Washington University, USA and Durham University, UK
- Michael Gruninger, Professor of Industrial Engineering, University of Toronto, Canada
More information about our keynote speakers: https://fois2023.griis.ca/keynote-speakers/
New dates
==========
Abstract submission: January 31, 2023
Full paper submission: February 12, 2023
Definition and scope
====================
The FOIS conference is a meeting point for all researchers with an interest in formal ontology. Formal ontology is the systematic study of the types of entities and relations making up the domains of interest represented in modern information systems. The conference encourages submission of high-quality, previously unpublished results on both theoretical issues and practical advancements. FOIS 2023 will have distinct tracks for foundational issues, ontology applications and methods, and domain ontologies.
FOIS aims to be a nexus of interdisciplinary research and communication for researchers from many domains engaging with formal ontology. Common application areas include conceptual modeling, database design, knowledge engineering and management, software engineering, organizational modeling, artificial intelligence, robotics, computational linguistics, the life sciences, bioinformatics and scientific research in general, geographic information science, information retrieval, library and information science, as well as the Semantic Web.
FOIS is the flagship conference of the International Association for Ontology and its Applications (IAOA: http://iaoa.org/), which is a non-profit organization promoting interdisciplinary research and international collaboration in formal ontology.
Important dates (NEW!)
======================
- Abstract submission deadline: January 31, 2023
- Paper submission deadline: February 12, 2023
- Author rebuttal period: March 24-31, 2023 (tentative)
- Notifications: April 10, 2023 (tentative)
- Camera-ready papers: May 1, 2023
- Onsite conference: July 17-20, 2023
- Virtual conference: September 18-20, 2023
The submission deadline for workshops will be after the notifications, so that authors of rejected papers can submit a revised version to any conference workshop whose topics fit the paper.
Location
========
FOIS 2023 will consist of a physical meeting and a virtual meeting:
An in-person-only meeting in Sherbrooke, Quebec, from July 17 to 20, 2023, that will be very much like a traditional conference, with keynotes, regular talks, workshops, and tutorials, and plenty of social and networking opportunities. This part will not have a remote participation option, but we plan on recording selected talks (e.g., keynotes). The main conference will be held from July 17 to 19, and workshops and tutorials will be held mostly on July 20.
This will be followed by an online part to be held from September 18 to 20, 2023 that offers an opportunity for presentation and discussion of additional papers that were not presented at the physical meeting in Sherbrooke.
To plan for this two-part event, authors must indicate at the time of submission their preference and constraints for presenting either on site in Sherbrooke or virtually. Papers will be accepted either for in-person presentation or for online presentation, at which point authors can no longer change the modality. Since the numbers of in-person and online presentations are limited, we encourage authors to be as flexible as possible to maximize their chance of paper acceptance. More details are provided in the Submission Instructions.
Submissions
===========
FOIS 2023 seeks three types of full-length (14 pages) high-quality papers on a wide range of topics:
Foundational papers address content-related ontological issues, their formal representation, and their relevance to some aspect of information systems.
Application and Methods papers address novel systems, methods, and tools related to building, evaluating, or using ontologies, emphasizing the impact of ontology contents.
Domain ontology papers describe a novel ontology for a specific realm of interest, clarifying ontological choices against requirements and foundational theory, and showing ontology use.
Please refer to the submission instructions for more details. As usual, the FOIS proceedings will be published by IOS Press.
The conference will also offer workshops and tutorials related to formal ontologies. See the separate call for workshops and tutorials for more information.
Topics of interest
==================
Areas of particular interest to FOIS include the following:
- Foundational Issues
  - Kinds of entities: particulars/universals, continuants/occurrents, abstracta/concreta, dependent entities/independent entities, natural objects/artifacts, events/processes
  - Formal relations: parthood, identity, connection, dependence, constitution, causality, subsumption, instantiation
  - Vagueness and granularity
  - Space, time, and change
- Methodological issues
  - Top-level vs. domain-specific ontologies
  - Role of reference ontologies
  - Ontology similarity, integration, alignment, matching and entity reconciliation
  - Ontology modularity, patterns, and contextuality
  - Ontology evaluation, quality, reuse, adaptation, and evolution
  - Ontology compliance with FAIR principles
  - Formal comparison among ontologies
  - Relationship between conceptual modeling and ontologies
  - Relationship with cognition, language, semantics, and context
  - Connections between knowledge graphs and ontologies
  - Methodological issues in the applications of ontologies
  - Social issues, such as trust or bias, with respect to ontologies
- Applications
  - Technical applications of ontologies, such as
    - Semantic Web
    - Other areas of AI (Machine Learning, Explainable AI, Rules)
    - Qualitative modeling
  - Systems applications of ontologies, such as
    - Ontology-driven information systems design
    - Ontology-based data access
    - Knowledge management
    - Information retrieval
    - Computational linguistics
    - Metadata management
  - Domain applications of ontologies, such as
    - Ontologies for business modeling
    - Ontologies for particular scientific disciplines (biology, chemistry, geography, physics, geoscience, cognitive sciences, linguistics, etc.)
    - Ontologies for engineering: shape, form and function, artifacts, manufacturing, design, architecture, etc.
    - Ontologies for the humanities: arts, cultural studies, history, literature, philosophy, etc.
    - Ontologies for the social sciences: economics, law, political science, anthropology, archeology, etc.
    - Ontologies for Open Science and dataset sharing
- Domain-specific ontologies
  - Ontology of physical reality (matter, space, time, motion, etc.)
  - Ontology of biological reality (organisms, genes, proteins, cells, etc.)
  - Ontology of mental reality and agency (beliefs, intentions, emotions, perceptions, cognition, etc.)
  - Ontology of artifacts, functions, capacities and roles
  - Ontology of social reality (institutions, organizations, norms, social relationships, artistic expressions, etc.)
Conference Organization
=======================
General Chair:
Antony Galton, University of Exeter, UK
PC Chairs:
Nathalie Aussenac-Gilles, IRIT-CNRS Toulouse, France
Torsten Hahmann, University of Maine, USA
Local Organization Chair:
Jean-François Ethier, University of Sherbrooke, Canada
Online Chair:
Cassia Trojahn, IRIT Université Toulouse 2, France
Workshop and Tutorial Chairs:
Megan Katsumi, University of Toronto, Canada
Emilio Sanfilippo, ISTC-CNR, Trento, Italy
Early Career Chairs:
Antoine Zimmermann, École des Mines de Saint-Étienne (EMSE), France
Guendalina Righetti, Free University Bozen/Bolzano, Italy
Demo & Showcase Chairs:
Sergio de Cesare, University of Westminster, UK
Tiago Prince Sales, University of Twente, Netherlands
Publicity Chairs:
Lucia Gomez Alvarez, TU Dresden, Germany
Selja Seppälä, University College Cork, Ireland
Proceedings Chair:
Maria Hedblom, Jönköping University, Sweden
Program committee: https://fois2023.griis.ca/conference-organization/
Symposium: Corpus Approaches to Lexicogrammar (LxGr2023)
Call for Papers
Deadline for abstract submission: Friday 31 March 2023
The symposium will take place online on Friday 7 and Saturday 8 July 2023.
If you would like to present, send an abstract of 500 words (excluding references) to lxgr(a)edgehill.ac.uk. Make sure that the abstract clearly specifies the research focus (research questions or hypotheses), the corpus, the methodology (techniques and metrics), the theoretical orientation, and the main findings. Abstracts will be double-blind reviewed, and decisions will be communicated within four weeks.
Full papers will be allocated 35 minutes (including 10 minutes for discussion).
Work-in-progress reports will be allocated 20 minutes (including 5 minutes for discussion).
There will be no parallel sessions.
Participation is free.
The focus of LxGr is the interaction of lexis and grammar. The focus is influenced by Halliday's view of lexis and grammar as "complementary perspectives" (1991: 32), and his conception of the two as notional ends of a continuum (lexicogrammar), in that "if you interrogate the system grammatically you will get grammar-like answers and if you interrogate it lexically you get lexis-like answers" (1992: 64).
For more information and details of past symposia, visit the LxGr website: https://ehu.ac.uk/lxgr.
If you have any questions, contact lxgr(a)edgehill.ac.uk.
Dear colleagues,
The CYENS Center of Excellence is offering multiple fully funded positions (postdoctoral and junior researcher) for pure and applied research on:
- integrated learning and reasoning,
- cognitive computing,
- personal assistants,
- explainable and trustworthy AI,
- machine learning / learning theory,
- preference and policy elicitation,
- neural-symbolic integration,
- conversational AI,
- natural language understanding / generation,
- formal argumentation,
- knowledge-based systems.
Interested candidates should apply as soon as possible. Applications are reviewed on a rolling basis until positions are filled.
https://www.cyens.org.cy/en-gb/vacancies/job-listings/research-associates/r…
Regards,
Loizos
Dear colleagues,
The UC Santa Cruz Natural Language Processing (NLP) master's degree program
provides both depth and breadth in core algorithms and methods for NLP.
Taught intensively over 15-18 months, the program combines theoretical
learning with hands-on practice to give our students the right skill set
for a professional career in this fast-growing
field. We are accepting applications for Fall 2023 admission consideration,
and will be hosting a series of information sessions about the NLP MS
program. To review our information session schedule, visit
nlp.ucsc.edu/admissions.
Join us at our next virtual information session on January 31st at 12 PM PST
to learn more about studying NLP at UCSC and to meet with our current
students and faculty. Please complete the following registration form by 11
AM PST on January 31st to attend the session:
https://ucsc.zoom.us/meeting/register/tJwoc-uorD8jG9R-2UojWP2bPZLQc_oaiJAp
Applications for Fall 2023 admission consideration are now open. Apply by
March 1st, 2023: nlp.ucsc.edu/apply
If you have questions about the program or our upcoming information
session, please contact the NLP Support Team at nlp(a)ucsc.edu.
All the best,
The NLP Support Team
Natural Language Processing MS Program
Baskin Engineering
UCSC Silicon Valley Campus
4-year PhD position in Computational Linguistics
The CiTIUS Research Center on Intelligent Technologies of the University of
Santiago de Compostela (https://citius.gal) offers a 4-year PhD position in
Computational Linguistics. The selected candidate will work on the research
project LingUMT: Linguistically Motivated Strategies for Unsupervised
Machine Translation.
LingUMT aims at exploring, defining, and implementing linguistically
motivated strategies for unsupervised machine translation. For this
purpose, we will use monolingual corpora focusing on concepts such as
contextualized meaning, distributional-based similarity, semantic
compositionality, and syntactic-semantic constructions.
Candidate profile
- Preferably an MA/MSc in Computational Linguistics, Artificial
Intelligence, Applied Linguistics, Computer Science, or a related
discipline.
- Knowledge of Linux/Unix command line tools.
- Familiarity with NLP tools (spaCy, UDPipe, Stanza, Transformers, etc.).
- Programming skills (e.g., Python).
- Fluency in English.
- Ability to carry out research and work in a team.
Terms and conditions
This is a 4-year position funded by the Spanish Ministry. Check the call
conditions at the following link:
https://www.aei.gob.es/en/announcements/announcements-finder/ayudas-contrat…
How to Apply
The application will be made through a public call ("Ayudas para contratos
predoctorales para la formación de doctores", i.e., grants for predoctoral
contracts for doctoral training), open from January 12 to January 26.
Visit the link above (under Terms and conditions) for more information.
About CiTIUS
CiTIUS is a research center specialized in Intelligent Technologies with a
team of more than 150 researchers. Our team is the greatest asset of our
center and we are continuously looking for talent. CiTIUS provides a
stimulating, interdisciplinary and cutting-edge scientific environment
where our team can develop their careers in an international
setting. The center is part of a high-level scientific environment with four
research centers specialized in different areas (CiQUS, CiMUS, IGFAE and
CiTIUS). Santiago de Compostela is a UNESCO World Heritage City with more
than 5 million square meters of green spaces, and was named one of the 100
best places in the world by TIME Magazine in 2021.
More information: citius.kmt@usc.es; marcos.garcia.gonzalez@usc.gal;
http://imaisd.usc.es/guiaconvocatorias.asp?i=gl&s=-2-26-31-56&id=4911&t=9&s…
Hello All,
The annual European Conference on Information Retrieval (ECIR) is the main
European forum for the presentation of new research results in the field of
Information Retrieval. In 2023, the forty-fifth ECIR conference (ECIR'23)
will be held in Dublin, Ireland from the 2nd to the 6th of April 2023.
ECIR'23 will feature a high quality programme including long and short
papers, posters, demonstrations, workshops & tutorials and an Industry Day.
ECIR offers a variety of sponsorship opportunities suitable for
organisations of all sizes. Sponsors of ECIR gain visibility for their
companies and contribute to the success of the conference. Sponsorship will
be used to support student grants or enhanced delegate experiences.
For more information about the levels of sponsorship available at ECIR 2023
and their benefits, please visit http://ecir2023.org/calls/sponsorship.html
Potential sponsors should send an e-mail to sponsorship(a)ecir2023.org
including the name of the contact person, telephone number, return e-mail
address, and the desired level of sponsorship.
Best Regards,
Esraa Ali, Ph.D.
DCU
Publicity officer, ECIR 2023
Language Technologies and Digital Humanities: Resources and Applications (LTaDH-RA)
CLaDA-BG 2023 Conference
https://clada-bg.eu/en/dissemination/events/international-clada-bg-conferen…
Sofia, Bulgaria
10-12 May 2023
CLaDA-BG is the Bulgarian national research infrastructure for resources and technologies for linguistic, cultural and historical heritage, integrated within CLARIN EU and DARIAH EU. Its mission is to provide access to the resources and technologies needed to support research in the Social Sciences and Humanities (SS&H). Modeling and linking various types of knowledge and their contexts is crucial for successful research in the interdisciplinary field of resources and technologies related to language, culture and history.
This is the second edition of the CLaDA-BG conference. It aims to bring together NLP developers, linguists, digital humanists, scholars and all parties interested in knowledge modeling and linking data for research.
Topics of Interest
The topics include, but are not limited to, the following:
- Problems in SS&H – research methods, technological support
- Language technologies for sentiment analysis, semantic technologies, trustworthiness of knowledge graphs, ethical challenges in digital SS&H
- Knowledge Modeling and Elicitation for digital SS&H
- Specific Language Resources and Technologies for historical texts, parliamentary records, speech and multimodal corpora, social media data
- The role of digital libraries, archives and museums in digital SS&H research
- Language Interface to Knowledge Graphs in SS&H
- Knowledge-modeled and linked applications in SS&H
- Best practices and new trends in Knowledge Modeling and Linking for language, culture and history
Invited Speakers
Alessandro Lenci, Università di Pisa, Italy
Erhard Hinrichs, Leibniz Institut für Deutsche Sprache Mannheim and Tübingen University, Germany
Milena Dobreva, Sofia University St Kliment Ohridski, Bulgaria
TBA
Important Dates
Submission deadline: 24.02.2023
Notification of acceptance: 3.04.2023
Final Submission: 3.05.2023
Conference: 10-12.05.2023
Submissions
We welcome oral presentations or posters (optionally with a demo). There are two submission types: full papers (6 to 12 pages) or extended abstracts (3-5 pages, references excluded), in PDF format and in accordance with the Springer Computer Science Proceedings guidelines (https://www.springer.com/gp/computer-science/lncs/conference-proceedings-gu…).
Please submit your full paper or extended abstract in PDF to this EasyChair link: https://easychair.org/my/conference?conf=ltdhra2023
For contacting organizers please use the following email: ltadh-ra(a)bultreebank.org
The CLaDA-BG Organizer
-------------------------------------------
Call for Bids to Host ESSIR 2024
Deadline: 5 March 2023
Further details: https://www.essir.eu/
Contact: chair(a)essir.eu
-------------------------------------------
The Steering Committee of the European Summer School on Information Retrieval (ESSIR) invites interested parties to submit bids to host ESSIR in 2024.
## INTRODUCTION ##
The ESSIR initiative is a self-organized body, whose main mission is to promote research, innovation, and development of information access systems by educating junior and senior researchers, students, professionals, developers, and practitioners on the latest developments in the field, both methodological and technological.
ESSIR is a week-long summer event where renowned lecturers and students interact in a number of ways (e.g., lectures, hands-on sessions, flipped classrooms) aimed at the most effective teaching and learning of both basic and advanced topics in information access at large.
By targeting information access at large, ESSIR places itself at the crossroads of several neighboring disciplines, namely:
+ Information Retrieval (IR)
+ Recommender Systems (RecSys)
+ Natural Language Processing (NLP)
+ Machine Learning (ML)
+ Artificial Intelligence (AI)
+ Data Science (DS)
ESSIR gives participants a grounding in core subjects such as architectures, algorithms, formal theoretical models, and evaluation theory and practice, as well as coverage of recent topics and trends in the field, such as fairness, conversational search, and more.
ESSIR is aimed at: advanced undergraduate students; PhD students; post-doctoral researchers; academic and industrial researchers; developers.
Traditionally, ESSIR is co-located with accompanying events (such as the Symposium on Future Directions in Information Access, FDIA) that give the participants an excellent opportunity for focused discussions on recent emerging topics in Information Retrieval.
Further details on ESSIR can be found at: https://www.essir.eu/
## PROPOSAL SUBMISSION AND SELECTION ##
Parties interested in hosting ESSIR 2024 are invited to submit proposals, in PDF format, by email to the ESSIR Steering Committee chair Nicola Ferro at
chair(a)essir.eu
by
** 5 MARCH 2023 **
Proposals will be evaluated by the ESSIR Steering Committee. Evaluation will take into account:
+ venue and timing: attractiveness of the location, hosting facilities, transportation options, accommodation, social program options, targeted event week, key dates, avoidance of timing conflicts with other relevant IR events and large-scale local public events.
+ scientific program: foundational topics, special lectures, accompanying events (podium discussions, poster sessions, etc.), strategy for recruiting and organizationally supporting high-quality IR lecturers. Proposals should also take into consideration the scheduling of relevant co-located events (symposiums, workshops) such as FDIA.
+ support for student participants: grants, special conditions for participation and/or accommodation, opportunity to collect ECTS credit points, networking events, opportunities for personal dialogue with ESSIR lecturers.
+ financial viability: initial draft of the financial plan including major fixed and variable costs, budget cut-off points, strategy of sponsoring acquisition.
+ plans for organisation: local organizer consortium and its expertise in event/hosting management, key roles and initial responsibility assignments. Analysis of major risks (such as appropriate number of participants, commitment of key lecturers, sufficient amount of sponsoring) and reasonable fallback options.
+ dissemination and publicity: plans for reaching the target audience through mailing lists, direct contacts to research groups, scientific social networks, Web 2.0 channels, Web presence. Opportunities to share ESSIR materials within research community: slides, video lectures, scripts, post-proceedings, etc.
The ESSIR charter and a bid template are available at:
https://www.essir.eu/assets/charter/essir-sc-charter.pdf
## INQUIRIES AND FURTHER INFORMATION ##
For any inquiries or if any additional information is needed, please write to chair(a)essir.eu
[Apologies for multiple postings]
ACM Transactions on Multimedia Computing, Communications, and Applications
Special Issue on Realistic Synthetic Data: Generation, Learning, Evaluation
Impact Factor 4.094
https://mc.manuscriptcentral.com/tomm
Submission deadline: 31 March 2023
*** CALL FOR PAPERS ***
[Guest Editors]
Bogdan Ionescu, Universitatea Politehnica din Bucuresti, România
Ioannis Patras, Queen Mary University of London, UK
Henning Müller, University of Applied Sciences Western Switzerland, Switzerland
Alberto Del Bimbo, Università degli Studi di Firenze, Italy
[Scope]
In the current context of Machine Learning (ML) and Deep Learning
(DL), data, and especially high-quality data, are central to ensuring
proper training of the networks. It is well known that DL models
require a large quantity of annotated data to reach their full
potential. Annotation is traditionally done by human experts or at
least by typical users, e.g., via crowdsourcing. This is a tedious,
time-consuming, and expensive task: massive resources are required,
content has to be curated, and so on. Moreover, in some domains data
confidentiality makes this process even more challenging, e.g., in the
medical domain, where patient data cannot easily be made publicly
available.
With the advancement of neural generative models such as Generative
Adversarial Networks (GANs) or, more recently, diffusion models, a
promising way of solving or alleviating the problems associated with
the need for domain-specific annotated data is realistic synthetic
data generation. These data are generated by learning specific
characteristics of different classes of target data. The advantage is
that these networks allow for infinite variations within those classes
while producing realistic outcomes that are typically hard to
distinguish from real data. Such data have no proprietary or
confidentiality restrictions and seem a viable solution for generating
new datasets or augmenting existing ones. Existing work shows very
promising results for signal and image generation, among others.
Nevertheless, some limitations need to be overcome to advance the
field. For instance, how can one control or manipulate the latent codes
of GANs, or the diffusion process, so as to produce the desired classes
and the desired variations, as in real data? In many cases, results
are not of high quality and selection must be made by the user, which
amounts to manual annotation. Bias may enter the generation process
through bias in the input dataset. Are the networks trustworthy? Does
the generated content violate data privacy? In some cases, one can
infer from a generated image the actual data source used to train the
network. Would it be possible to train the networks to produce new
classes and learn causality of the data? How do we objectively assess
the quality of the generated data? These are just a few open research
questions.
[Topics]
In this context, the special issue seeks innovative algorithms and
approaches addressing topics including, but not limited to, the
following:
- Synthetic data for various modalities, e.g., signals, images,
volumes, audio, etc.
- Controllable generation for learning from synthetic data.
- Transfer learning and generalization of models.
- Causality in data generation.
- Addressing bias, limitations, and trustworthiness in data generation.
- Evaluation measures/protocols and benchmarks to assess quality of
synthetic content.
- Open synthetic datasets and software tools.
- Ethical aspects of synthetic data.
[Important Dates]
- Submission deadline: 31 March 2023
- First-round review decisions: 30 June 2023
- Deadline for revised submissions: 31 July 2023
- Notification of final decisions: 30 September 2023
- Tentative publication: December 2023
[Submission Information]
Prospective authors are invited to submit their manuscripts
electronically through the ACM TOMM online submission system (see
https://mc.manuscriptcentral.com/tomm) while adhering strictly to the
journal guidelines (see https://tomm.acm.org/authors.cfm). For the
article type, please select the Special Issue denoted SI: Realistic
Synthetic Data: Generation, Learning, Evaluation.
Submitted manuscripts should not have been published previously, nor
be under consideration for publication elsewhere. If the submission is
an extension of a previously published conference paper, please
include the original work and a cover letter describing the new
content and results that were added. According to ACM TOMM publication
policy, previously published conference papers can be eligible for
publication provided that at least 40% new material is included in the
journal version.
[Contact]
For questions and further information, please contact Bogdan Ionescu /
bogdan.ionescu(a)upb.ro.
[Acknowledgement]
The Special Issue is endorsed by the AI4Media "A Centre of Excellence
delivering next generation AI Research and Training at the service of
Media, Society and Democracy" H2020 ICT-48-2020 project
https://www.ai4media.eu/.
On behalf of the Guest Editors,
Bogdan Ionescu
https://www.aimultimedialab.ro/
In this newsletter:
Renew your LDC membership today
30th Anniversary Highlight: CSR
New publications:
AIDA Ukrainian Broadcast and Telephone Speech Audio and Transcripts<https://catalog.ldc.upenn.edu/LDC2023S01>
LORELEI Swahili Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2023T01>
________________________________
Renew your LDC membership today
The importance of curated resources for language-related education, research, and technology development drives LDC's mission to create them, to accept data contributions from researchers across the globe, and to broadly share such resources through the LDC Catalog. LDC members enjoy no-cost access to new corpora released annually, as well as the ability to license legacy data sets from among our 925+ holdings at reduced fees. Ensure that your data needs continue to be met by renewing your LDC membership or by joining the Consortium today.
Through March 1, 2023, members from 2022 receive a 10% discount on 2023 membership, and new or returning organizations receive a 5% discount. Membership remains the most economical way to access current and past LDC releases. Consult Join LDC<https://www.ldc.upenn.edu/members/join-ldc> for more details on membership options and benefits.
30th Anniversary Highlight: CSR
The CSR (continuous speech recognition) corpus series was developed in the early 1990s under DARPA's Spoken Language Program to support research on large-vocabulary CSR systems.
CSR-I (WSJ0) Complete (LDC93S6A)<https://catalog.ldc.upenn.edu/LDC93S6A> and CSR-II (WSJ1) Complete (LDC94S13A)<https://catalog.ldc.upenn.edu/LDC94S13A> contain speech from a machine-readable corpus of Wall Street Journal news text. They also include spontaneous dictation by journalists of hypothetical news articles as well as transcripts.
The text in CSR-I (WSJ0) was selected to fall within either a 5,000-word subset or a 20,000-word subset. Audio includes speaker-dependent and speaker-independent sections as well as sentences with verbalized and nonverbalized punctuation (Doddington, 1992<https://aclanthology.org/H92-1074.pdf>). CSR-II features "Hub and Spoke" test sets that include a 5,000-word subset and a 64,000-word subset. Both data sets were collected using two microphones: a close-talking Sennheiser HMD414 and a second microphone of varying type.
WSJ0 Cambridge Read News (LDC95S24)<https://catalog.ldc.upenn.edu/LDC95S24> was developed by Cambridge University and consists of native British English speakers reading CSR WSJ news text, specifically, sentences from the 5,000-word and 64,000-word subsets. All speakers also recorded a common set of 18 adaptation sentences.
The CSR corpora continue to have value for the research community. CSR-I (WSJ0) target utterances were used in the CHiME2 and CHiME3 challenges, which focused on distant-microphone automatic speech recognition in real-world environments. CHiME2 WSJ0 (LDC2017S10)<https://catalog.ldc.upenn.edu/LDC2017S10> and CHiME2 Grid (LDC2017S07)<https://catalog.ldc.upenn.edu/LDC2017S07> each contain over 120 hours of English speech from a noisy living room environment. CHiME3 (LDC2017S24)<https://catalog.ldc.upenn.edu/LDC2017S24> consists of 342 hours of English speech and transcripts from noisy environments and 50 hours of noisy environment audio.
CSR-I target utterances were also used in the Distant-Speech Interaction for Robust Home Applications (DIRHA) Project, which addressed natural spontaneous speech interaction with distant microphones in a domestic environment. DIRHA English WSJ Audio (LDC2018S01)<https://catalog.ldc.upenn.edu/LDC2018S01> comprises approximately 85 hours of real and simulated read speech from native American English speakers in an apartment setting with typical domestic background noises and inter/intra-room reverberation effects.
Multi-Channel WSJ Audio (LDC2014S03)<https://catalog.ldc.upenn.edu/LDC2014S03>, designed to address the challenges of speech recognition in meetings, contains 100 hours of audio from British English speakers reading sentences from WSJ0 Cambridge Read News. There were three recording scenarios: a single stationary speaker, two stationary overlapping speakers, and a single moving speaker.
All CSR corpora and their related data sets are available for licensing by Consortium members and non-members. Visit Obtaining Data<https://www.ldc.upenn.edu/language-resources/data/obtaining> for more information.
________________________________
New publications:
AIDA Ukrainian Broadcast and Telephone Speech Audio and Transcripts<https://catalog.ldc.upenn.edu/LDC2023S01> comprises approximately 156 hours of Ukrainian conversational telephone speech and broadcast news audio with 1.2 million words of corresponding orthographic transcripts.
The news audio data was taken from 87 recordings broadcast by various Ukrainian sources. The telephone speech was generated from telephone calls by native Ukrainian speakers to acquaintances in their social network. Native Ukrainian speakers manually segmented the data into sentence-level units as part of the transcription process.
The broadcast recordings and transcripts were produced by LDC to support the DARPA AIDA (Active Interpretation of Disparate Alternatives) program, which aimed to develop a multi-hypothesis semantic engine to generate explicit alternative interpretations of events, situations, and trends from a variety of unstructured sources. The telephone speech audio recordings were collected by LDC to support the NIST 2011 Language Recognition Evaluation <https://www.nist.gov/itl/iad/mig/2011-language-recognition-evaluation> and are also contained in Multi-Language Conversational Telephone Speech 2011 - Slavic Group (LDC2016S11)<https://catalog.ldc.upenn.edu/LDC2016S11>.
2023 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
*
LORELEI Swahili Representative Language Pack<https://catalog.ldc.upenn.edu/LDC2023T01> was developed by LDC and comprises approximately 4.3 million words of Swahili monolingual text, 90,000 Swahili words translated from English data, and 545,000 words of found Swahili-English parallel text. Approximately 100,000 words were annotated for named entities, and up to 26,000 words were annotated for entity discovery and linking and for situation frames (identifying entities, needs, and issues). Data was collected from discussion forum, news, reference, social network, and weblog sources.
The LORELEI (Low Resource Languages for Emergent Incidents) program was concerned with building human language technology for low resource languages in the context of emergent situations. Representative languages were selected to provide broad typological coverage.
The knowledge base for entity linking annotation is available separately as LORELEI Entity Detection and Linking Knowledge Base (LDC2020T10)<https://catalog.ldc.upenn.edu/LDC2020T10>.
2023 members can access this corpus through their LDC accounts. Non-members may license this data for a fee.
To unsubscribe from this newsletter, log in to your LDC account<https://catalog.ldc.upenn.edu/login> and uncheck the box next to "Receive Newsletter" under Account Options, or contact LDC for assistance.
Membership Coordinator
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc(a)ldc.upenn.edu<mailto:ldc@ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104