Good morning,
We are pleased to announce the release of Albertina PT-*.
This is the first large language model specifically for Portuguese,
covering both variants PT-PT and PT-BR, publicly available
and open source.
With its 900 million parameters in this first version,
it sets a new state of the art among models developed specifically for Portuguese
that are publicly available and open.
It was developed at the University of Lisbon together
with colleagues from the University of Porto,
and can be obtained here:
https://huggingface.co/models?other=albertina-pt*
Its development is documented here:
https://arxiv.org/abs/2305.06721
Best regards,
On behalf of Albertina's team
***Apologies for Cross-Posting***
Call for Papers:
The first BNLP workshop aims to provide a forum for the NLP, speech and
multimodal communities to share and discuss their ongoing work with the
international community. We particularly focus on Bangla, which is a
low-resource language, and assess its current state-of-the-art and discuss
strategies to make further progress in NLP, speech and multimodal research.
Through this workshop, we plan to bring researchers together to come up
with frameworks and strategies that can later support other low-resource
languages. We encourage researchers to submit their papers focusing on
novel methodologies and resources that help towards the progress of Bangla
and other low resource languages. Novel methodologies include, but are not
limited to, zero-shot learning, unsupervised learning, and simple yet
effective methods applicable to low-computation scenarios.
We invite original research papers from a wide range of topics, including
but not limited to:
Natural Language Processing: Corpus and Resource Development, Language
Modeling, Stemmer, POS Tagger, Named Entity Recognition, Relation
Extraction, Spell and Grammar Checker, Question Answering, Semantics, Text
Summarization, Machine Translation, Sentiment Analysis.
Speech Processing: Speech Synthesis and Spoken Language Generation, Speech
Recognition, Phonetics, Phonology, and Prosody, Spoken Dialog and
Conversational System, Speaker and Language Detection.
Multimodality: OCR - Handwriting, Printed Document, Sign Language Detection.
Human Computer Interaction: Software for Disabled People, Multimodal HCI
for Bangla.
Important dates:
Workshop paper due: 1 September 2023
Notification of acceptance: 6 October 2023
Camera-ready papers due: 18 October 2023
Workshop dates: 6-7 December 2023
All deadlines are 11:59pm anywhere on Earth (AoE).
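As a worked example of the AoE convention, the sketch below converts the paper deadline to UTC. It assumes the usual reading of "anywhere on Earth" as the UTC-12 offset; the helper name aoe_deadline_to_utc is hypothetical and only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "Anywhere on Earth" (AoE) is the fixed offset UTC-12.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_to_utc(year: int, month: int, day: int) -> datetime:
    """Return the UTC instant of 11:59pm AoE on the given calendar date."""
    local = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local.astimezone(timezone.utc)


# Workshop paper due: 1 September 2023, 11:59pm AoE.
paper_due_utc = aoe_deadline_to_utc(2023, 9, 1)
print(paper_due_utc.isoformat())  # 2023-09-02T11:59:00+00:00
```

In other words, a submission is still on time until 11:59 UTC on the day after the listed date.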
Submission Details:
Papers must describe original, completed or in-progress, and unpublished
work. All papers will be refereed through a double-blind peer review
process by multiple reviewers with final acceptance decisions made by the
workshop organizers. Accepted papers will be given up to 9 pages (for full
papers) or 5 pages (for short papers and posters) in the workshop
proceedings, and will be presented as an oral paper or a poster.
We are seeking submissions in the following categories:
- Full papers (8 pages)
- Short papers (work in progress, innovative ideas/proposals: 4 pages)
- Shared task papers (4 pages)
Both long and short papers must follow the EMNLP 2023 two-column format,
using the supplied official templates [1], which can be downloaded from the
EMNLP 2023 style and formatting page. Please do not modify these style files, nor
should you use templates designed for other conferences. Submissions that
do not conform to the required styles, including paper size, margin width,
and font size restrictions, will be rejected without review. To verify
conformance to publication standards, we will be using the ACL
pubcheck tool [2]. The PDFs of camera-ready papers must be run through this
tool prior to their final submission, and we recommend its use also at
submission time.
Submissions are open to all, and are to be submitted anonymously. For the
anonymity, double-blind submission and reproducibility criteria please
follow the EMNLP 2023 instructions [3].
If you have published in the field previously, and are interested in
helping out in the program committee to review papers, please fill out this
form <https://forms.gle/1WUYQjWT9UuqioX48> [4].
Submission portal: TBA
Workshop Organizers:
Firoj Alam, Qatar Computing Research Institute, HBKU, Qatar
Sudipta Kar, Amazon Alexa AI, USA
Shammur Absar Chowdhury, Qatar Computing Research Institute, HBKU, Qatar
Farig Sadeque, BRAC University, Bangladesh
Ruhul Amin, Fordham University, USA
Asif Shahriyar Sushmit, Rensselaer Polytechnic Institute, USA
[1] https://2023.emnlp.org/calls/style-and-formatting/
[2] https://github.com/acl-org/aclpubcheck
[3] https://2023.emnlp.org/calls/main_conference_papers/
[4] https://forms.gle/1WUYQjWT9UuqioX48
The Organizers
[Apologies for cross posting]
==================================================
MULTI-Fake-DetectiVE @ EVALITA 2023
Final call for Participation
Task website, news and updates: https://sites.google.com/unipi.it/multi-fake-detective
Contacts: multifakedetective [at] gmail [dot] com
==================================================
We invite you to participate in the FIRST shared task on Multimodal Fake News Detection in Italian (MULTI-Fake-DetectiVE).
The shared task is aimed at broadening the horizon of research on disinformation in Italian by addressing both the textual and visual modalities.
MULTI-Fake-DetectiVE proposes two subtasks. Participants are allowed to participate in either task or both of them.
The evaluation window for both subtasks is open until May 19! Test data are available on the website.
---------
Tasks
---------
Task 1. Multimodal Fake News Detection
The task is structured as a multi-class classification problem in a multimodal setting.
Given a piece of content c = ⟨ t, v ⟩ including a textual component t and a visual component v (i.e., an image), classify it as being one of the following labels on a scale: Certainly Fake, Probably Fake, Probably Real, Certainly Real.
Task 2. Cross-modal relations in Fake and Real News
The task is aimed at assessing how the two modalities (i.e., textual and visual) relate to each other in the context of fake and real news.
The task is formulated as a three-class classification problem.
Given a piece of content c = ⟨ t, v ⟩ which includes a textual component t and a visual component v, decide whether their combination is misleading or not misleading with respect to the interpretation of the information provided by either component, or whether the two components are unrelated.
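The two label schemes can be sketched as simple data types. This is only an illustration of the task formulations: the class names, the integer encodings, and the image_path field are assumptions, not the official data format (see the task website for that), and the constant baseline is a placeholder rather than a real system.

```python
from dataclasses import dataclass
from enum import Enum


class Veracity(Enum):
    """Task 1: four labels on a scale (integer encoding is an assumption)."""
    CERTAINLY_FAKE = 0
    PROBABLY_FAKE = 1
    PROBABLY_REAL = 2
    CERTAINLY_REAL = 3


class CrossModalRelation(Enum):
    """Task 2: three classes (integer encoding is an assumption)."""
    MISLEADING = 0
    NOT_MISLEADING = 1
    UNRELATED = 2


@dataclass(frozen=True)
class Content:
    """A piece of content c = <t, v>: textual component t, visual component v."""
    text: str
    image_path: str  # hypothetical field: path to the image file


def constant_baseline_task1(content: Content) -> Veracity:
    """Placeholder baseline: always predicts the same Task 1 label."""
    return Veracity.PROBABLY_REAL


example = Content(text="Titolo di esempio", image_path="example.jpg")
print(constant_baseline_task1(example).name)
```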
-------
Data
-------
The training and test datasets are available here: https://sites.google.com/unipi.it/multi-fake-detective/data
-----------------------
Important Dates
-----------------------
12th - 19th May 2023: evaluation window (started)
30th May 2023: assessment returned to participants
14th June 2023: final reports due to task organizers
10th July 2023: review deadline
25th July 2023: camera ready deadline
7th - 8th September 2023: final workshop in Parma
-----------------------
Task Organizers
-----------------------
Alessandro Bondielli, Department of Computer Science, University of Pisa
Pietro Dell'Oglio, Department of Information Engineering, University of Florence
Alessandro Lenci, Department of Philology, Literature, and Linguistics, University of Pisa
Francesco Marcelloni, Department of Information Engineering, University of Pisa
Lucia Passaro, Department of Computer Science, University of Pisa
Marco Sabbatini, Department of Philology, Literature, and Linguistics, University of Pisa
* The 37th Pacific Asia Conference on Language, Information and Computation (PACLIC 37) *
* December 2-4, 2023 (deadline paper submission: July 16, 2023 AoE) *
* The Hong Kong Polytechnic University, Hong Kong (China) *
Conference website: https://paclic2023.github.io/
CONFERENCE OBJECTIVES
Following the long tradition of PACLIC conferences, PACLIC 37 emphasizes the synergy of theoretical frameworks and processing of natural language, providing a forum for researchers from different fields to share and discuss progress in scientific studies, development and application of the topics related to the study of languages.
TOPICS
Topics include but are not limited to:
- Language Studies
o Clinical linguistics and language disorders
o Corpus linguistics
o Discourse Analysis
o Language Acquisition
o Language and Social Media
o Language Learning
o Language, Mind and Culture
o Linguistic Theories
o Morphology
o Multilingualism
o Phonology
o Pragmatics
o Semantics
o Sociolinguistics
o Spoken language processing
o Syntax
o Typology
- Information Processing and Computational Applications
o Cognitive modeling and psycholinguistics
o Dialogue systems
o Digital Humanities
o Ethics in Natural Language Processing
o Information retrieval/extraction
o Interpretability of Natural Language Processing systems
o Language models
o Language resources
o Linguistic diversity
o Machine Learning and Natural Language Processing
o Machine Translation
o Multimodality
o Natural Language Generation
o Natural Language Processing applications
o Sentiment Analysis
o Summarization
o Word segmentation
PAPER SUBMISSION
Papers may consist of up to eight (8) pages of content, plus references and appendices. Submissions will be judged based on relevance, technical strength, significance and opportunities, and interest to the attendees. As the reviewing will be double-blind, authors must not indicate their names and affiliations when submitting their papers. Papers must be submitted through the EasyChair conference system: https://easychair.org/cfp/PACLIC37. Accepted papers will be presented orally or as posters, as determined by the PACLIC 37 program committee. Papers in the proceedings of PACLIC have been indexed in Scopus since PACLIC 19 (2005). They are also listed in the ACL Anthology. Double submissions with other conferences/workshops are allowed, but authors are asked to declare them at submission time.
SUBMISSION FORMAT
The conference will only accept papers formatted according to the standard ACL templates (downloadable at: https://2023.aclweb.org/calls/style_and_formatting/).
IMPORTANT DATES
* Deadline of paper submission: July 16, 2023 (AoE) *
Notification: September 10, 2023
Camera-ready: October 1, 2023
Early bird registration deadline: October 1, 2023
Conference: December 2-4, 2023
CONFERENCE ORGANIZERS (To be completed)
Chu-Ren Huang (The Hong Kong Polytechnic University)
Yasunari Harada (Waseda University)
Jong-Bok Kim (Kyung Hee University)
Emmanuele Chersoni (The Hong Kong Polytechnic University)
Sophia Yat Mei Lee (The Hong Kong Polytechnic University)
Sarah Chen (The Hong Kong Polytechnic University)
Yu-Yin Hsu (The Hong Kong Polytechnic University)
Pranav A (Dayta AI)
Winnie Zeng (The Hong Kong Polytechnic University)
Bo Peng (The Hong Kong Polytechnic University)
Yuxi Li (The Hong Kong Polytechnic University)
Junlin Li (The Hong Kong Polytechnic University)
CONTACT
paclic37(a)gmail.com
*** Apologies for cross-posting ***
Artificial Intelligence Research in Applied Linguistics (AIRiAL)
Conference at Teachers College, Columbia University
Theme
The Future of Artificial Intelligence in Applied Linguistics
Location
Teachers College, Columbia University
Dates
September 29-30, 2023
CALL FOR PROPOSALS
The AL & TESOL Language and Technology Research Group
<https://sites.google.com/tc.columbia.edu/al-tesol-language-technology/home>
in the Applied Linguistics & TESOL program at Teachers College will host
the Conference on Artificial Intelligence Research in Applied Linguistics
(AIRiAL)
<https://sites.google.com/tc.columbia.edu/al-tesol-language-technology/event…>.
This conference is a forum for scholarly discussions on Artificial
Intelligence research in Applied Linguistics (e.g., Natural Language
Processing, Speech Technologies, Computer Vision, and Biometrics). Applied
Linguistics is a broad field including scholarship about language analysis
and how language is learned in order to achieve some purpose or solve some
problem in the real world. It includes areas such as language acquisition,
language assessment, language use, language & technology and other related
sub-fields.
We welcome abstracts exploring the relationship between Applied Linguistics
and Artificial Intelligence that align with our conference theme. Research
areas include, but are not limited to:
- Affective computing
- Automated scoring
- Conversational AI
- Intelligent tutoring
- Immersive technologies
- Language models
- Multilingual ASR
- AI literacy education
- AI policy decisions
- and other related topics
Presentation Types
Papers
Formal presentations of completed research making original scholarly
contributions. Presenters will have 15 minutes to discuss their papers,
followed by 5 minutes for questions and comments from the audience.
Posters
Poster sessions provide an opportunity for the presentation of work
visually. Poster topics can be works-in-progress and research that is being
planned as well as completed projects. Presenters will discuss their
posters with participants informally during a one-hour poster session.
Proposal Evaluation Criteria
Proposals will be evaluated on (1) Contribution to the field of AI in AL,
(2) Quality of the proposal, and (3) Clarity of the abstract.
Preparation and Submission of Proposals
Please submit your abstract through this submission form
<https://tccolumbia.qualtrics.com/jfe/form/SV_50jI0FuFAvt5xem>.
Abstracts should be max. 250 words.
Submission deadline: June 2, 2023
Notification date: June 30, 2023
Student Paper Award
An award will be presented to the best student paper presentation at the
conference. All authors on student papers must be actively enrolled
graduate students at the time of the conference.
--
Erik Voss, Ph.D.
Assistant Professor, Applied Linguistics & TESOL program
Language & Technology Specialization
Department of Arts & Humanities
Teachers College, Columbia University
TC Faculty Profile <https://www.tc.columbia.edu/faculty/ev2449/>, Linkedin
Profile <https://www.linkedin.com/in/erik-voss-ph-d-941a3ab9>, Google
Scholar <https://scholar.google.com/citations?user=FMnVdjcAAAAJ&hl=en>
ALTESOL Language & Technology Research Group
<https://sites.google.com/tc.columbia.edu/al-tesol-language-technology/home>
** apologies for cross-posting **
Linking Lexicographic and Language Learning Resources (4LR)
Workshop at LDK 2023 – Call for Papers
Workshop website: https://lexicala.com/4lr/
The workshop ‘Linking Lexicographic and Language Learning Resources’ (4LR) will be held in conjunction with LDK 2023 – 4th conference on Language, Data and Knowledge – (http://2023.ldk-conf.org/) at the University of Vienna, Austria, on September 13 (tentative), in hybrid mode.
The aim of this workshop is to explore linguistic linked (open) data and knowledge management methods and technologies for linking lexicographic and language learning resources, tools and applications in general and dictionaries and CEFR lists in particular.
Our starting point is, on the one hand, enhancing CEFR-graded language proficiency lists with lexicographic content and, on the other hand, incorporating CEFR labels in learner’s dictionaries. CEFR – the Common European Framework of Reference for Languages – is a generally established international standard for describing language proficiency, and CEFR-graded resources have been developed for many languages in Europe. However, incorporating their information is still not common practice in modern lexicography for most languages, with the notable exception of two English dictionaries for advanced learners (Cambridge and Oxford). There are substantial unsolved issues, such as inconsistencies in vocabulary size per level between languages; no, or limited, sense disambiguation in CEFR resources; and words from a higher CEFR level appearing in definitions and example sentences. Moreover, there has been limited collaboration and interoperability so far among the related fields of lexicography, language acquisition, and linguistic linked data, whether regarding research, development, or practical application.
4LR will feature an overview by the organizers, as well as an invited talk by Jorge Gracia from University of Zaragoza and chair of NexusLinguarum CA on Linked Data for Lexicographic Resources.
In addition, we invite submissions for papers (20 minutes, plus discussion) on the following topics:
• Linking lexicographic content to CEFR-graded vocabularies
• Pedagogical lexicography and knowledge graphs
• Attributing CEFR labels in learner’s dictionaries
• Incorporating vocabulary and grammar profiles in lexicographic resources
• Creating and linking crosslingual concept-based CEFR resources
• Multilingual knowledge management and language learning applications and tools
SUBMISSION AND DATES
Please submit an abstract of 300-500 words via EasyChair [https://easychair.org/conferences/?conf=4lr2023].
19 May 2023 Deadline for abstract submission
29 May 2023 Notification of abstract acceptance
30 June 2023 Deadline for camera-ready paper submission
13 Sep 2023 (tentative) 4LR workshop
14–15 Sep LDK 2023 conference
ORGANIZERS AND CONTACT
Kris Heylen. Dutch Language Institute (Kris DOT Heylen AT ivdnt DOT org)
Jelena Kallas. Institute of the Estonian Language
Ilan Kernerman. Lexicala by K Dictionaries
Carole Tiberius. Dutch Language Institute
Website: https://lexicala.com/4lr/
4LR is supported by NexusLinguarum COST Action (CA18209) – European network for Web-centered linguistic data science.
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
4LR workshop at LDK 2023 will follow a related workshop ‘Lexicography and CEFR: Linking Lexicographic Resources and Language Proficiency Levels’ that will be held in conjunction with eLex 2023 on June 29 in Brno, Czech Republic.
[apologies for cross-posting]
Call for Participation in Clinkart@EVALITA2023
Linking a Lab Result to its Test Event in the Clinical Domain
***SYSTEM SUBMISSION DEADLINE: MAY 19, 2023***
We invite you to participate in the Clinkart task on the extraction of
relations from clinical cases. Clinkart is organised in the context of
Evalita 2023. The task consists mainly of identifying test results and
measurements and linking them to the textual mentions of the laboratory
tests and measurements from which they were obtained, as in the example
below:
All’ECG RS 66 bpm, deviazione assiale sinistra, BBD incompleto.
66 bpm → RS
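The link in the example above can be represented as a pair of character spans over the sentence. The sketch below is only an illustration: the Span and ResultTestLink classes and the find_span helper are hypothetical and do not reflect the official Clinkart data or submission format, which is described on the task website.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Character offsets into a document; end is exclusive."""
    start: int
    end: int

    def text_in(self, doc: str) -> str:
        return doc[self.start:self.end]


@dataclass(frozen=True)
class ResultTestLink:
    """A relation from a result mention to the test it was obtained from."""
    result: Span
    test: Span


def find_span(doc: str, mention: str) -> Span:
    """Locate the first occurrence of a mention (raises ValueError if absent)."""
    start = doc.index(mention)
    return Span(start, start + len(mention))


sentence = "All’ECG RS 66 bpm, deviazione assiale sinistra, BBD incompleto."
link = ResultTestLink(
    result=find_span(sentence, "66 bpm"),
    test=find_span(sentence, "RS"),
)
print(link.result.text_in(sentence), "->", link.test.text_in(sentence))
```

Using offsets rather than raw strings keeps a link unambiguous when the same mention occurs more than once in a clinical case.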
Check the website for more detailed information and for downloading the
data: https://e3c.fbk.eu/clinkart.
Participants are invited to submit their results by May 19th (by email to
clinkart[at]fbk.eu). Each team is allowed to submit up to 2 different runs.
Assessment of the results will be returned to the participants by May 30th.
The evaluation dataset is based on the European Clinical Case Corpus (E3C),
which consists of clinical cases in five European languages and is freely
available (CC-BY-NC-4.0).
Participants are encouraged to also use the data of the TESTLINK task
(which focuses on Spanish and Basque) to train their systems (TESTLINK
website: https://e3c.fbk.eu/testlinkiberlef).
Join us!
The Clinkart organising team