Dear all,
the DRX department of DFKI Berlin has opened two student assistant
positions for a research project related to LLM evaluation and
usability. We would be grateful if you could circulate them to
students who may be interested:
* Student Assistant (m/f/d) on the topic of Frontend-Development and
Design
https://jobs.dfki.de/en/vacancy/en-student-assistant-m-f-d-frontenddevelopm…
* Student Assistant (m/f/d) on the topic of UX-Design
https://jobs.dfki.de/en/vacancy/en-student-assistant-m-f-d-545837.html
Deadline is January 31st
best
Lefteris
--
Eleftherios Avramidis, senior researcher
German Research Center for Artificial Intelligence (DFKI)
departments: Design Research eXplorations, Speech and Language Technology
short name: Lefteris, (pronouns: he/him), languages: English, German, Greek
Website: https://www.dfki.de/~elav01
Address: Alt Moabit 91c, 10559 Berlin, Germany
Tel.: +49 30 23895 1806
Sec.: +49 30 23895 1800
Fax.: +49 30 23895 1810
Firmensitz: Trippstadter Strasse 122, D-67663 Kaiserslautern, Germany
Geschäftsführung: Prof. Dr. Antonio Krüger, Helmut Ditzer
Vorsitzender des Aufsichtsrats: Dr. Ferri Abolhassan
Amtsgericht Kaiserslautern, HRB 2313
tl;dr:
- submission deadline for research track papers via SoftConf: December 18th
  2023
- submission deadline for research track submissions already reviewed via
  ARR: January 17th 2024
  https://openreview.net/group?id=eacl.org/EACL/2024/Workshop/SCI-CHAT_ARR_Co…
- submission deadline for shared task systems: January 20th 2024
  https://forms.gle/r7HgxZKgqdencRrHA
- submission deadline for shared task system descriptions via SoftConf:
  January 26th 2024
https://sites.google.com/view/dialogue-evaluation/
Call for Papers
The aim of this workshop is to bring together experts working in the area
of open-domain dialogue. In this speedily advancing research area many
challenges still exist, such as learning information from conversations,
engaging in realistic and convincing simulation of human intelligence,
reasoning, and so on.
SCI-CHAT follows previous workshops on open-domain dialogue, but with a
focus on the simulation of intelligent conversation, including the ability
to follow a challenging topic over a multi-turn conversation and the ability
to pose questions, refute, and reason, with live human evaluation
employed as the primary mechanism for evaluating models. The workshop will
include a research track and a shared task:
SCI-CHAT's research track aims to explore recent advances and challenges in
open-domain dialogue research. Researchers working on all aspects of
open-domain dialogue are invited to submit papers on recent advances,
resources, tools, analysis, evaluation, and challenges on the broad theme
of open-domain dialogues.
The topics of the workshop include but are not limited to the following:
- Intelligent conversation, chit-chat, open-domain dialogue;
- Automatic and human evaluation of open-domain dialogue;
- Limitations, risks and safety in open-domain dialogue;
- Instruction-tuned and instruction-enabled models;
- Any other topic of interest to the dialogue community.
SCI-CHAT's shared task will focus on simulating intelligent conversations;
participants will be asked to submit (access to the APIs of) automated
dialogue agents with the aim of carrying out nuanced conversations over
multiple dialogue turns. Participating systems will be interactively
evaluated in a live human evaluation. All data acquired within the context
of the shared task will be made public, providing an important resource for
improving metrics and systems in this research area.
Submission guidelines:
Authors are invited to submit their unpublished work that represents novel
research through either direct submission or ARR commitment. Papers should
consist of up to 8 pages of content, plus unlimited pages for references
and appendix. Authors should make use of the EACL LaTeX Template
<https://2023.eacl.org/calls/styles/> alongside supplementary materials,
including technical appendices, links to source code, datasets, and
multimedia appendices.
Papers can also be submitted as non-archival, so that their content can be
reused for other venues by adding "(NON-ARCHIVAL)" to the title of their
submission. Previously published work can also be submitted as non-archival
in the same way, with the additional requirement to state such on the first
page.
- Direct paper submissions must be made through the SoftConf submission
  link: https://softconf.com/eacl2024/SCI-CHAT-2024/
Submitting the same paper to multiple EACL workshops is forbidden.
All papers will be double-blind peer-reviewed, by at least 2 program
committee members. As such, all submissions, including the main paper and
its supplementary materials, should be fully anonymized. For more
information on formatting and anonymity guidelines, please refer to EACL
guidelines <https://eacl.org/index.html>.
Organizers
- Yvette Graham (Trinity College Dublin, Ireland)
- Qun Liu (Huawei Noah's Ark Lab, China)
- Gerasimos Lampouras (Huawei Noah's Ark Lab, UK)
- Ignacio Iacobacci (Huawei Noah's Ark Lab, UK)
- Sinead Madden (Trinity College Dublin, Ireland)
- Haider Khalid (Trinity College Dublin, Ireland)
- Rameez Qureshi (Trinity College Dublin, Ireland)
Important Dates
Regarding Research Track:
- Research paper via SoftConf: December 18th 2023
- Pre-reviewed ARR commitment deadline: January 17th 2024
- Notification of research paper acceptance: January 20th 2024
- Camera-ready papers due: January 30th 2024
Regarding Shared Task:
- Release of training and development data: November 9th 2023
- Release of baseline systems: November 9th 2023
- Preliminary system submission deadline: January 13th 2024 (optional; if
  you want help testing your API, please submit early)
- System submission (API) deadline: January 20th 2024
- System description paper via SoftConf: January 26th 2024
- Camera-ready papers due: January 30th 2024
Overview of results at one-day workshop: March 21 or 22, 2024
CONTACT: sci-chat(a)adaptcentre.ie
Dear corpora-list members,
We are announcing the first SemEval shared task on Semantic Textual
Relatedness (STR): A shared task on automatically detecting the degree of
semantic relatedness (closeness in meaning) between pairs of sentences.
The semantic relatedness of two language units has long been considered
fundamental to understanding meaning (Halliday and Hasan, 1976; Miller and
Charles, 1991), and automatically determining relatedness has many
applications such as evaluating sentence representation methods, question
answering, and summarization.
Two sentences are considered semantically similar when they have a
paraphrasal or entailment relation. On the other hand, relatedness is a
much broader concept that accounts for all the commonalities between two
sentences: whether they are on the same topic, express the same view,
originate from the same time period, one elaborates on (or follows from)
the other, etc. For instance, for the following sentence pairs:
- Pair 1: a. There was a lemon tree next to the house. b. The boy enjoyed
  reading under the lemon tree.
- Pair 2: a. There was a lemon tree next to the house. b. The boy was an
  excellent football player.
Most people will agree that the sentences in pair 1 are more related than
the sentences in pair 2.
In this task, new textual datasets will be provided for Afrikaans
<https://en.wikipedia.org/wiki/Afrikaans>, Algerian Arabic
<https://en.wikipedia.org/wiki/Algerian_Arabic>, Amharic
<https://en.wikipedia.org/wiki/Amharic>, English, Hausa
<https://en.wikipedia.org/wiki/Hausa_language>, Hindi
<https://en.wikipedia.org/wiki/Hindi>, Indonesian
<https://en.wikipedia.org/wiki/Indonesian_language>, Kinyarwanda
<https://en.wikipedia.org/wiki/Kinyarwanda>, Marathi
<https://en.wikipedia.org/wiki/Marathi_language>, Moroccan Arabic
<https://en.wikipedia.org/wiki/Moroccan_Arabic>, Modern Standard Arabic
<https://en.wikipedia.org/wiki/Modern_Standard_Arabic>, Punjabi
<https://en.wikipedia.org/wiki/Punjabi_language>, Spanish
<https://en.wikipedia.org/wiki/Spanish_language>, and Telugu
<https://en.wikipedia.org/wiki/Telugu_language>.
Data
Each instance in the training, development, and test sets is a sentence
pair. The instance is labeled with a score representing the degree of
semantic textual relatedness between the two sentences. The scores can
range from 0 (maximally unrelated) to 1 (maximally related). These gold
label scores have been determined through manual annotation. Specifically,
a comparative annotation approach was used to avoid known limitations of
traditional rating scale annotation methods. This comparative annotation
process (which avoids several biases of traditional rating scales) led to a
high reliability of the final relatedness rankings.
Further details about the task, the method of data annotation, how STR is
different from semantic textual similarity, applications of semantic
textual relatedness, etc. can be found in this paper:
https://aclanthology.org/2023.eacl-main.55.pdf
Tracks
Each team can provide submissions for one, two or all of the tracks shown
below:
Track A: Supervised
Participants are to submit systems that have been trained using the labeled
training datasets provided. Participating teams are allowed to use any
publicly available datasets (e.g., other relatedness and similarity
datasets or datasets in any other languages). However, they must report
additional data they used, and ideally report how impactful each resource
was on the final results.
Track B: Unsupervised
Participants are to submit systems that have been developed without the use
of any labeled datasets pertaining to semantic relatedness or semantic
similarity between units of text more than two words long in any language.
The use of unigram or bigram relatedness datasets (from any language) is
permitted.
Track C: Cross-lingual
Participants are to submit systems that have been developed without the use
of any labeled semantic similarity or semantic relatedness datasets in the
target language and with the use of labeled dataset(s) from at least one
other language. Note: Using labeled data from another track is mandatory
for submission to this track.
Deciding which track a submission should go to:
- If a submission uses labeled data in the target language: submit to
  Track A
- If a submission does not use labeled data in the target language but
  uses labeled data from another language: submit to Track C
- If a submission does not use labeled data in any language: submit to
  Track B
** Here ‘labeled data’ refers to labeled datasets pertaining to semantic
relatedness or semantic similarity between units of text more than two
words long.
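The decision rules above can be sketched as a small helper. This is only an illustration of the three rules as stated in this call; the function name `choose_track` and its two boolean parameters are hypothetical and are not part of any official task tooling.

```python
def choose_track(uses_target_language_labels: bool,
                 uses_other_language_labels: bool) -> str:
    """Return the track letter implied by the rules in the call.

    'Labels' here means labeled relatedness/similarity data for units of
    text more than two words long, as defined in the call's footnote.
    """
    if uses_target_language_labels:
        # Any labeled data in the target language: supervised track.
        return "A"
    if uses_other_language_labels:
        # No target-language labels, but labels from another language.
        return "C"
    # No labeled data in any language: unsupervised track.
    return "B"


if __name__ == "__main__":
    print(choose_track(True, False))
    print(choose_track(False, True))
    print(choose_track(False, False))
```

Note that the target-language check comes first, so a system trained on labeled data from both the target language and other languages still falls under Track A.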
Evaluation
The official evaluation metric for this task is the Spearman rank
correlation coefficient, which captures how well the system-predicted
rankings of test instances align with human judgments. You can find the
evaluation script for this shared task on our Github page
<https://github.com/semantic-textual-relatedness/Semantic_Relatedness_SemEva…>.
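For readers unfamiliar with the metric, here is a minimal pure-Python sketch of the Spearman rank correlation between gold and predicted scores. It is not the official evaluation script; the function names `average_ranks` and `spearman` are illustrative assumptions, and tied values are given the mean of their rank positions, which is a common convention that the official script may or may not share.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(gold, pred):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(gold) != len(pred) or len(gold) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    rg, rp = average_ranks(gold), average_ranks(pred)
    n = len(rg)
    mean_g, mean_p = sum(rg) / n, sum(rp) / n
    cov = sum((a - mean_g) * (b - mean_p) for a, b in zip(rg, rp))
    var_g = sum((a - mean_g) ** 2 for a in rg)
    var_p = sum((b - mean_p) ** 2 for b in rp)
    if var_g == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_g * var_p)


if __name__ == "__main__":
    gold_scores = [0.1, 0.4, 0.9]       # made-up gold relatedness scores
    system_scores = [0.2, 0.5, 0.7]     # made-up system predictions
    print(spearman(gold_scores, system_scores))
```

Because only the ranking matters, a system does not need to reproduce the gold scores themselves; it only needs to order the sentence pairs the way the annotators did.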
Helpful Links
- Competition Website: https://codalab.lisn.upsaclay.fr/competitions/15704
- Task Website: https://semantic-textual-relatedness.github.io
- Twitter/X: https://twitter.com/SemRel2024
- Contact organisers: semrel-semeval-organisers(a)googlegroups.com
- Google group for participants:
  semrel-semeval-participants(a)googlegroups.com
Important Dates
- Training data ready: 11 September 2023
- Evaluation Starts: *20 January 2024*
- Evaluation End: 31 January 2024
- System Description Paper Due: 19 February 2024
- Notification of acceptance: 01 April 2024
- Camera-ready Due: 22 April 2024
- SemEval workshop: 16-21 June (co-located with NAACL 2024)
NB. We will organise a QA mentorship tomorrow (January 16th 2024 from 4 to
5 pm GMT) and a system description writing tutorial in February for all
participants, especially students and junior researchers. The zoom links
will be shared by email and on Slack.
References
- Shima Asaadi, Saif Mohammad, and Svetlana Kiritchenko. 2019. Big BiRD: A
  Large, Fine-Grained, Bigram Relatedness Dataset for Examining Semantic
  Composition. In Proceedings of the 2019 Conference of the North American
  Chapter of the Association for Computational Linguistics: Human Language
  Technologies.
- M. A. K. Halliday and R. Hasan. 1976. Cohesion in English. London:
  Longman.
- George A. Miller and Walter G. Charles. 1991. Contextual Correlates of
  Semantic Similarity. Language and Cognitive Processes, 6(1):1–28.
- Mohamed Abdalla, Krishnapriya Vishnubhotla, and Saif Mohammad. 2023.
  What Makes Sentences Semantically Related? A Textual Relatedness Dataset
  and Empirical Study. In Proceedings of the 17th Conference of the European
  Chapter of the Association for Computational Linguistics, pages 782–796,
  Dubrovnik, Croatia. Association for Computational Linguistics.
Task Organizers
Nedjma Ousidhoum
Shamsuddeen Hassan Muhammad
Mohamed Abdalla
Krishnapriya Vishnubhotla
Vladimir Araujo
Meriem Beloucif
Idris Abdulmumin
Seid Muhie Yimam
Nirmal Surange
Christine De Kock
Sanchit Ahuja
Oumaima Hourrane
Manish Shrivastava
Alham Fikri Aji
Thamar Solorio
Saif M. Mohammad
17th Workshop on Building and Using Comparable Corpora --- Call for Papers
Co-located with LREC-COLING 2024
Torino, Italia, 20 May 2024
Workshop website: https://comparable.limsi.fr/bucc2024/
LREC-COLING website: lrec-coling-2024.org/
Workshop proceedings to be published in the ACL Anthology
MOTIVATION
In the language engineering and linguistics communities, research in comparable corpora has been motivated by two main reasons. In language engineering, on the one hand, it is chiefly motivated by the need to use comparable corpora as training data for statistical NLP applications such as statistical and neural machine translation or cross-lingual retrieval. In linguistics, on the other hand, comparable corpora are of interest because they enable cross-language discoveries and comparisons. It is generally accepted in both communities that comparable corpora consist of documents that are comparable in content and form to various degrees and along various dimensions across several languages. Parallel corpora are at one end of this spectrum, unrelated corpora at the other.
Comparable corpora have been used in a range of applications, including Information Retrieval, Machine Translation, Cross-lingual text classification, etc. The linguistic definitions and observations related to comparable corpora can improve methods to mine such corpora for applications of neural NLP, for example, to extract parallel corpora from comparable corpora for neural machine translation. As such, it is of great interest to bring together builders and users of such corpora.
TOPICS
We solicit contributions on all topics related to comparable (and parallel) corpora, including but not limited to the following:
Building Comparable Corpora:
- Automatic and semi-automatic methods
- Methods to mine parallel and non-parallel corpora from the web
- Tools and criteria to evaluate the comparability of corpora
- Parallel vs non-parallel corpora, monolingual corpora
- Rare and minority languages, across language families
- Multi-media/multi-modal comparable corpora
Applications of comparable corpora:
- Human translation
- Language learning
- Cross-language information retrieval & document categorization
- Bilingual and multilingual projections
- (Unsupervised) Machine translation
- Writing assistance
- Machine learning techniques using comparable corpora
Mining from Comparable Corpora:
- Cross-language distributional semantics, word embeddings and pre-trained multilingual transformer models
- Extraction of parallel segments or paraphrases from comparable corpora
- Methods to derive parallel from non-parallel corpora (e.g. to provide for low-resource languages in neural machine translation)
- Extraction of bilingual and multilingual translations of single words, multi-word expressions, proper names, named entities, sentences, paraphrases etc. from comparable corpora
- Induction of morphological, grammatical, and translation rules from comparable corpora
- Induction of multilingual word classes from comparable corpora
Comparable Corpora in the Humanities:
- Comparing linguistic phenomena across languages in contrastive linguistics
- Analyzing properties of translated language in translation studies
- Studying language change over time in diachronic linguistics
- Assigning texts to authors via authors' corpora in forensic linguistics
- Comparing rhetorical features in discourse analysis
- Studying cultural differences in sociolinguistics
- Analyzing language universals in typological research
IMPORTANT DATES
21 Feb 2024: Paper submission deadline
24 Mar 2024: Notification of acceptance
7 Apr 2024: Camera-ready final papers
20 May 2024: Workshop date
For updates, please see the workshop website at https://comparable.limsi.fr/bucc2024/
PRACTICAL INFORMATION
The workshop is an in-person event. Workshop registration is via the main conference registration site, see lrec-coling-2024.org/
The workshop proceedings will be published in the ACL Anthology.
SUBMISSION GUIDELINES
Please follow the style sheet and templates (for LaTeX, Overleaf and MS-Word) provided for the main conference at lrec-coling-2024.org/authors-kit/
Papers should be submitted as a PDF file using the START conference manager at https://secure-web.cisco.com/1UUoVNXimK0Jzna4dQKSutgJlLRB94SkbvGnq5AUpyqLNT…
Submissions must describe original and unpublished work and range from 4 to 8 pages plus unlimited references.
Reviewing will be double blind, so the papers should not reveal the authors' identity. Accepted papers will be published in the workshop proceedings, which will be included in the ACL Anthology.
Double submission policy: Parallel submission to other meetings or publications is possible but must be immediately (i.e. as soon as known to the authors) notified to the workshop organizers by e-mail.
For further information and updates, please see the BUCC 2024 website: https://comparable.limsi.fr/bucc2024/
WORKSHOP ORGANIZERS
- Pierre Zweigenbaum (Université Paris-Saclay, CNRS, LISN, Orsay, France)
- Reinhard Rapp (University of Mainz and Magdeburg-Stendal University of Applied Sciences, Germany)
- Serge Sharoff (University of Leeds, United Kingdom)
Contact: pz (at) lisn (dot) fr
PROGRAMME COMMITTEE
- Ebrahim Ansari (Institute for Advanced Studies in Basic Sciences, Iran)
- Thierry Etchegoyhen (Vicomtech, Spain)
- Kyo Kageura (University of Tokyo, Japan)
- Natalie Kübler (Université Paris Cité, France)
- Philippe Langlais (Université de Montréal, Canada)
- Yves Lepage (Waseda University, Japan)
- Shervin Malmasi (Amazon, USA)
- Michael Mohler (Language Computer Corporation, USA)
- Emmanuel Morin (Nantes Université, France)
- Dragos Stefan Munteanu (Language Weaver, Inc., USA)
- Ted Pedersen (University of Minnesota, Duluth, USA)
- Ayla Rigouts Terryn (KU Leuven, Belgium)
- Reinhard Rapp (University of Mainz and Magdeburg-Stendal University of Applied Sciences, Germany)
- Nasredine Semmar (CEA LIST, Paris, France)
- Silvia Severini (Leonardo Labs, Italy)
- Serge Sharoff (University of Leeds, UK)
- Richard Sproat (OGI School of Science & Technology, USA)
- Tim Van de Cruys (KU Leuven, Belgium)
- Pierre Zweigenbaum (Université Paris-Saclay, CNRS, LISN, Orsay, France)
*** First Call for Journal First Submissions ***
36th International Conference on Advanced Information Systems Engineering
(CAiSE'24)
June 3-7, 2024, 5* St. Raphael Resort and Marina, Limassol, Cyprus
https://cyprusconferences.org/caise2024/
(*** Submission Deadline: 31st March, 2024 AoE ***)
CAiSE 2024 is organising journal-first sessions as part of the scientific program. The aim of
these sessions is to disseminate recent important research contributions and spark
discussions between authors and researchers in the CAiSE community. Authors of selected
journal articles on CAiSE-related topics will be invited to present their work at the
conference.
SCOPE
For the journal-first sessions, we solicit submissions related to articles that have been
accepted for publication by a reputable journal and that meet the following criteria:
• The article relates to the topics of the CAiSE conference and the recent call for papers.
• The article is an original submission to the journal and not an extension of an earlier
conference or workshop paper.
• The article is an original research article; review articles or commentaries will not be
considered.
• The article was accepted for publication by a journal on or after 1 January 2023, the
acceptance must have been publicly announced, the article must be available at the
publisher’s website (e.g., as "articles in advance" or published on a journal’s website), and
the article must be written in English.
• The article has not been presented at, and is not under consideration for, journal-first
tracks of other conferences.
FORMAT
Accepted submissions will be presented as part of the CAiSE 2024 scientific programme.
SUBMISSION
Submissions must be made electronically via EasyChair
(https://easychair.org/my/conference?conf=caise2024) and include:
• Title and author information of the article.
• The original abstract and keywords.
• DOI of the original publication or, alternatively, a link to the publication at the journal’s
website.
EVALUATION
All submissions will be reviewed by the track chairs with the aim to accept all qualifying
submissions subject to ability to accommodate them in the program. If needed, priority will
be given to submissions according to their topical fit with the scope of the conference, the
importance of the contribution, as well as the standing of the respective journal (including,
but not limited to, the journal's impact factor and ranking results).
ATTENDANCE AND PRESENTATION
At least one author of each submission accepted for the journal-first track must register
and attend the conference to present the work. The author needs a full registration to
present the journal article. As the articles of the journal-first track have been published
already, they will not be part of the CAiSE 2024 proceedings. The articles will be listed in
the conference program and CAiSE 2024 participants will have access to the respective
abstracts and a pointer to the original journal article.
IMPORTANT DATES
• Submission: 31st March, 2024 (AoE)
• Notification of Acceptance: 14th April, 2024
• Author Registration: 17th May, 2024
• Conference Dates: 3rd-7th June, 2024
JOURNAL FIRST CHAIRS
• Paolo Giorgini, University of Trento, Italy
• Jeffrey Parsons, Memorial University of Newfoundland, Canada
The UKP Lab at the Department of Computer Science, Technical University Darmstadt, Germany, is hiring several
*** Postdoc Research Fellows in the field of AI/Natural Language Processing. ***
Areas of work include Conversational AI, Multimodal fact-checking, Interactive Code Generation, NLP for mental health and privacy-aware NLP. It is also possible to propose a topic bottom-up.
https://www.informatik.tu-darmstadt.de/ukp/ukp_home/jobs_ukp/2023_postdoc_u…
Join our internationally recognized team at TU Darmstadt, enjoy diverse opportunities for professional development, and conduct cutting-edge research! Application deadline: January 30th, 2024.
Please submit your application via the following form: https://careers.ukp.informatik.tu-darmstadt.de/ukprecruitment
--------------------------------------------------------------------
Prof. Dr. Iryna Gurevych
UKP Lab
Technical University Darmstadt, Germany
http://www.ukp.tu-darmstadt.de/
EvaLatin, now in its third edition, is the campaign devoted to the evaluation of NLP tools for Latin. This year we invite all those interested in parsing and sentiment analysis to take up the challenge of working on Latin by participating in the following tasks:
- dependency parsing;
- emotion polarity detection.
Test sets for both tasks will be released on the EvaLatin 2024 web page in the first half of February.
Check all the important dates here: https://circse.github.io/LT4HALA/2024/EvaLatin
EvaLatin 2024 is organized as part of the "Workshop on Language Technologies for Historical and Ancient Languages" (LT4HALA) which will be held in Turin on May 25, 2024 in the context of the LREC-COLING 2024 conference.
Prof. Marco C. Passarotti
Computational Linguistics
Index Thomisticus Treebank https://itreebank.marginalia.it/
ERC Grantee, P.I. LiLa https://lila-erc.eu/ (Grant Agreement No. 769994)
CIRCSE Research Centre https://centridiricerca.unicatt.it/circse_index.html
Università Cattolica del Sacro Cuore
Largo Gemelli, 1
20123 Milan, Italy
marco.passarotti(a)unicatt.it
tel. +39-02-72342380
Dear list members (esp. those in the Japanese community),
for a cross-linguistic evaluation of co-reference annotations, I was
interested in looking into the NAIST Coreference Corpus, which is based
on the Kyoto Corpus. Luckily, both annotations are available, but not the
primary text. According to the documentation of both corpora, it is
necessary to first acquire the Mainichi Shimbun CD-ROM (1995). I really
tried my best, and I followed several catalogues (incl.
https://www.jaist.ac.jp/project/NLP_Portal/doc/LR/lr-cat-e.html#jp:mainichi…),
but the URL it points to (
https://www.nichigai.co.jp/sales/mainichi/mainichi-data.html) isn't
operational any more. Does anyone know where and how to buy that CD-ROM? Is
there another way to get access to that data?
Thanks a lot,
Christian
Journal of Data Mining and Digital Humanities (JDMDH)
organizes a call for papers about the topic
Chinese Natural Language Processing for Digital Humanities (CNLP4DH)
As a reminder, JDMDH is an international journal managed by French
national research institutions, with green open access (no charge for
readers and authors).
This special issue is dedicated to natural language processing for digital
humanities involving documents written in Chinese, including Modern,
Ancient and dialectal Chinese. Work on Mandarin, the official national
language and main common language, is accepted, and research on texts
written in other languages, such as those of Tibet and Inner Mongolia, is
also welcome.
Suitable topics include but are not limited to:
- Text analysis and processing related to humanities using computational
methods
- Dataset creation and curation for NLP (e.g. digitization, datafication,
and data preservation).
- Research on cultural heritage collections such as national archives and
libraries using NLP
- NLP for error detection, correction, normalization and denoising data
- Generation and analysis of literary works such as poetry and novels
- Analysis and detection of text genres
- Word segmentation, part-of-speech tagging of Ancient Chinese
- Large Language Models (LLM) for Chinese in Digital Humanities
- Cross modal Models (text-speech-video-image) for Chinese in Digital
Humanities
- Visualization of text analytics
- Ontology models for natural language text
- Applications in Chinese Literature, Traditional Chinese medicine,
Learning Chinese language as second language, Sentiment Analysis in Chinese
Social Media, China Cultural Heritage, Chinese History, Ancient Chinese
language
Website and more details:
https://jdmdh.episciences.org/page/chinese-natural-language-processing-for-…
submission guideline: https://jdmdh.episciences.org/page/submissions
Paper submission : https://jdmdh.episciences.org/submit
Guest Editors:
Dr. Wenhe FENG (Guangdong University of Foreign Studies, Laboratory of
Language Engineering and Computing)
Dr. Bin LI (Nanjing Normal University, School of Chinese Language and
Literature, Center of Linguistic Big Data and Computational Humanities)
Dr. Nicolas TURENNE (Guangdong University of Foreign Studies, School of
Information Science and Technology)
Dr. Tong WEI (Beijing University, Digital Humanities Center)
************************************************************************************
Second Call for papers: CALD-pseudo workshop on Computational Approaches to Language Data Pseudonymization @ EACL 2024, March 21 or 22, 2024
Website:
https://mormor-karl.github.io/events/CALD-pseudo/
Submission website: https://softconf.com/eacl2024/CALD-pseudo-2024/
Submission Deadline: Monday, 18 December 2023
We invite submissions to the first edition of the CALD-pseudo workshop on Computational Approaches to Language Data Pseudonymization, to be held at EACL 2024 on March 21 or 22, 2024.
[Important Dates]
* December 18, 2023: paper submission deadline
* January 17, 2024: resubmission of already pre-reviewed ARR papers
* January 20, 2024: notification of acceptance
* January 30, 2024: camera-ready papers due
* March 21 or 22, 2024: workshop date (the date to be confirmed by the EACL)
[Introduction]
Accessibility of research data is critical for advances in many research fields, but textual data often cannot be shared due to the personal and sensitive information it contains, e.g. names, political opinions, sensitive personal information and medical data. The General Data Protection Regulation, GDPR (EU Commission, 2016), suggests pseudonymization as a solution to secure open access to research data, but we need to learn more about pseudonymization as an approach before adopting it for manipulation of research data (Volodina et al., 2023). The main challenge is how to effectively pseudonymize data so that individuals cannot be identified, while at the same time keeping the data usable for the research for which it was collected in, among others, computational linguistics, linguistics and natural language processing.
[Topics of Interest]
The CALD-pseudo workshop invites a broad community of researchers in all concerned cross-disciplinary fields to jointly discuss challenges within pseudonymization, such as
* automatic approaches to detection and labelling of personal information in unstructured language data, including events and other context-dependent cues revealing a person;
* developing context-sensitive algorithms for replacement of personal information in unstructured data;
* studies into the effects of pseudonymization on unstructured data, e.g. applicability of pseudonymised data for the intended research questions, readability of pseudonymised data or addition of unwelcome biases through pseudonymization;
* effectiveness of pseudonymization as a way of protecting writer identity;
* reidentification studies, e.g. adversarial learning techniques that attempt to breach the privacy protections of pseudonymized data;
* constructing datasets for automatic pseudonymization, including methodological and ethical aspects of those;
* approaches to the evaluation of automatic pseudonymization both in concealing the private information and preserving the semantics of the non-personal data;
* pseudonymization tools and software: evaluating the available tools and software for pseudonymization in different languages, and their ease of use, scalability, and performance;
* and numerous other open questions.
[Submission Guidelines]
Authors are invited to submit by December 18, 2023 original and unpublished research papers in the following categories:
* Full papers (up to 8 pages) for substantial contributions
* Short papers (up to 4 pages) for ongoing or preliminary work
All submissions must be in PDF format, must follow the EACL 2024 guidelines described in the ARR CfP (https://aclrollingreview.org/cfp), and use the official ACL style templates available here: https://github.com/acl-org/acl-style-files
Direct submission deadline: December 18, 2023 at https://softconf.com/eacl2024/CALD-pseudo-2024/
Deadline for registration of ARR reviewed papers: January 17, 2024. (Further instructions will follow.)
We also invite authors of papers on the topics of the workshop accepted to Findings to reach out to the organizing committee of CALD-pseudo to present them at the workshop.
[Invited speakers]
We are happy to announce that the workshop will host two invited speakers:
* Anders Søgaard, University of Copenhagen, Denmark
* Ildikó Pilán, the Norwegian Computing Center, Norway
[Workshop Organizers]
* Elena Volodina, University of Gothenburg, Sweden
* Therese Lindström Tiedemann, University of Helsinki, Finland
* Simon Dobnik, University of Gothenburg, Sweden
* Xuan-Son Vu, Umeå university, Sweden
[Program Committee]
A list of program committee members is available on the workshop website.
[Contact]
For inquiries, please contact mormor.karl(a)svenska.gu.se
ACL link to the call: https://www.aclweb.org/portal/content/computational-approaches-language-dat…
___________________
Elena Volodina, PhD, Docent
https://spraakbanken.gu.se/en/about/staff/elena
Life is like a mirror. Smile at it and it smiles back at you.
Peace Pilgrim