The Natural Language Processing Section at the Department of Computer Science at the University of Copenhagen is advertising an 18-month position for a Postdoctoral Researcher in Natural Language Processing. The position is funded by a VILLUM Young Investigator Grant held by the principal investigator, Desmond Elliott. The overall goal of the project is to develop a new family of language models that can process any written language by rendering text as images. This allows the models to learn from the visual similarities between written languages, facilitating effective transfer to lower-resource or unseen languages.
The Natural Language Processing Section provides a strong, international and diverse environment for research within core as well as emerging topics in natural language processing, natural language understanding, computational linguistics and multi-modal language processing. It is housed within the main Science Campus, which is centrally located in Copenhagen. Further information about research at the Department is available here: https://di.ku.dk/english/research/.
The application deadline is 31 August 2023, with a preferred start date of January 2024, or as soon as possible thereafter. More information is available, and applications can be submitted, here: https://di.ku.dk/english/about/vacancies/postdoc-in-natural-language-proces…
Informal enquiries about the position can be made to Desmond Elliott, Department of Computer Science, University of Copenhagen, e-mail: de(a)di.ku.dk.
The Research Training Group 2853 “Neuroexplicit Models of Language, Vision, and Action” is looking for 3 PhD students and 1 postdoc, starting November 2023 or later.
Neuroexplicit models combine neural and human-interpretable (“explicit”) models in order to overcome the limitations that each model class has separately. They include neurosymbolic models, which combine neural and symbolic models, but also e.g. combinations of neural and physics-based models. In the RTG, we will improve the state of the art in natural language processing (“Language”), computer vision (“Vision”), and planning and reinforcement learning (“Action”) through the use of neuroexplicit models and investigate the cross-cutting design principles of effective neuroexplicit models (“Foundations”).
The RTG is scheduled to grow to a total of 24 PhD students and one postdoc by 2025. Through the inclusion of ~20 further PhD students and postdocs funded from other sources, it will be one of the largest research centers on neuroexplicit or neurosymbolic models in the world. The RTG brings together researchers at Saarland University, the Max Planck Institute for Informatics, the Max Planck Institute for Software Systems, the CISPA Helmholtz Center for Information Security, and the German Research Center for Artificial Intelligence (DFKI). All of these institutions are colocated on the same campus in Saarbrücken, Germany.
In a previous round of recruiting, we have already filled three PhD positions, leaving three positions free for applicants with a later MSc graduation date. The positions are funded as follows:
• PhD students will be funded for up to four years at the TV-L E13 100% pay scale. You should have or be about to complete an MSc degree in computer science or a related field and have demonstrated expertise in one of the research areas of the RTG, e.g. through an excellent Master’s thesis or relevant publications.
• The postdoc will initially be funded for three years, with the possibility of extension up to five years, at the TV-L E13 100% pay scale. As the RTG postdoc, you will pursue your own research agenda in the field of neuroexplicit models and work with the PhD students to identify and pursue opportunities for collaborative research. You should have or be about to complete a PhD in computer science or a related field and have demonstrated your expertise in one or more of the RTG’s research areas through publications in top venues.
The RTG is part of the Saarland Informatics Campus, one of the leading centers for research in computer science, artificial intelligence, and natural language processing in Europe. The Saarland Informatics Campus brings together 900 researchers and 2500 students from 81 countries. The CISPA Helmholtz Center, located on the same campus, is home to an additional 350 researchers and on track to grow to 800 by 2026. Researchers at SIC and CISPA are part of the ELLIS network and have been awarded more than 35 ERC grants.
Each PhD student in the RTG will be jointly supervised by two PhD advisors from the list of Principal Investigators below. Each student will freely define their own research topic; we encourage the choice of topics that cross the traditional boundaries of research fields. Students may be affiliated with Saarland University or with one of the participating institutes.
Vera Demberg, Saarland University - Computational Linguistics
Jörg Hoffmann, Saarland University - AI Planning
Eddy Ilg, Saarland University - Computer Vision, Machine Learning
Dietrich Klakow, Saarland University - Natural Language Processing
Alexander Koller, Saarland University - Computational Linguistics
Bernt Schiele, MPI for Informatics - Computer Vision, Machine Learning
Philipp Slusallek, DFKI and Saarland University - Computer Graphics, Artificial Intelligence
Christian Theobalt, MPI for Informatics - Visual Computing, Machine Learning
Mariya Toneva, MPI for Software Systems - Computational Neuroscience, Machine Learning
Isabel Valera, Saarland University - Machine Learning
Jilles Vreeken, CISPA - Machine Learning, Causality
Joachim Weickert, Saarland University - Mathematical Data Analysis
Verena Wolf, DFKI and Saarland University - Modeling and Simulation, Reinforcement Learning
Ellie Pavlick, Brown University and Google AI, will join us regularly as a Mercator Fellow.
Please send your application by 31 July 2023 to apply(a)neuroexplicit.org. Include the reference number W2350 for the postdoc position and the reference number W2351 for the PhD positions. We aim to conduct job interviews in September, ideally in person in Saarbrücken, Germany.
Make sure to check the RTG website for details on the application process and what materials to include in the application: https://www.neuroexplicit.org/jobs
For all further information about the RTG, check out our website: https://www.neuroexplicit.org/
AAC/CFP Corpus 26 - 2025 - https://journals.openedition.org/corpus/
Background noise or added value? Managing noise during computer processing of linguistic corpora
Elisa Gugliotta, Luca Pallanti, Olivier Kraif, Iris Fabry et Martina Barletta (eds.)
-------FRENCH VERSION BELOW-----
The increasing influence of NLP-related methodologies on corpus linguistics has compelled researchers to reassess their practices for managing noise and its impact on research results (Fuchs & Habert, 2004; Léon, 2018; Zalmout et al., 2018). Whether working with long-diachronic corpora (e.g., medieval French), dialectal corpora with limited resources (e.g., oral or written texts in dialectal Arabic, cf. Arabizi), or corpora of texts deviating from the norm (e.g., learner corpora), conducting noise analysis becomes an essential step in drawing linguistic conclusions from the available data (Molinelli & Putzu, 2015; Scaglione, 2018; Litosseliti, 2018). This special issue of Corpus builds upon a workshop held in April 2023 (https://je-bruit-corpus.sciencesconf.org/) and offers an opportunity to examine noise management methods in the fields of NLP and corpus linguistics, as well as their impact on the quality of linguistic data (Kraif & Ponton, 2007; Goutte et al., 2012; Zeroual, 2018).
The fundamental questions in any linguistic study revolve around defining the research object, understanding the nature of the data, and determining ways to preserve its inherent characteristics throughout the various processing steps (such as lemmatisation, normalisation, labelling, etc.) (Sarrica et al., 2016). Hence, selecting appropriate methods for identifying and controlling noise becomes crucial throughout the entire process, from data collection to the archiving phase, and from data preparation to annotation (Egbert & Baker, 2019). The definition of noise itself is diverse and far from self-evident. In the field of NLP alone, this term encompasses a wide range of highly heterogeneous phenomena, including web peritexts (such as hyperlinks, menus and computer code) as well as code switching and the spelling or grammatical errors that punctuate productions (Al Sharou et al., 2021).
This special issue aims to delve into the definition of noise, from a linguistic perspective, and the practices employed by researchers to mitigate the biases that can arise from it. These practices are implemented during collection, recording, and annotation of data. The question of noise inevitably emerges at each stage of the empirical process involved in data construction and analysis:
1. Noise during data collection and recording
If one accepts the postulate that "linguistic data is a result" (Benveniste, 1966), decoding the noise stemming from data collection and recording becomes crucial. Depending on the research object, various factors may contribute to data alteration, including the researcher's preconceptions or the biases introduced by an OCR system (Jentsch & Porada, 2020). The key challenge lies in predicting or identifying the potential biases induced by these factors during the selection and formatting of data. This enables better control over subsequent research stages and ensures greater accuracy in the analysis process.
2. Data preparation and pre-processing
The methods employed to refine raw data and prepare it for advanced manipulation can give rise to a significant source of noise (or, conversely, of silence, if noise elimination filters are applied). This is particularly evident during the data normalization process (Al Sharou et al., 2021). When transcribing data or correcting errors, researchers must make choices that inevitably influence the nature of the data, either by reducing or enriching its content. As a result, it becomes essential to anticipate the consequences of the transformations introduced by data processing methods (Tanguy, 2012).
3. The annotation process and metadata
Initially, corpus annotation aims to enrich the data by categorizing units through a labelling process, depending on the developed analysis model (Péry-Woodley et al., 2011). However, while this process has the potential to introduce noise, it can also result in detrimental silence (when missing or erroneous labels lead to incomplete results during data analysis or querying). The concept of metadata also raises questions: does categorizing data transform it into something different? Furthermore, does the absence of agreement or low agreement in annotations produced by humans reflect inter-individual variations akin to noise, or does it stem from the inherent vagueness of the categorizations themselves?
***
At every step of the process, key methodological questions arise: what threshold can be considered acceptable for noise? How can we differentiate between noise and methodological bias? Is it possible to estimate noise without a ground truth? Which statistical tools are specific to corpus studies and enable the definition of confidence intervals? How can we strike the balance needed to prevent the noise resulting from data processing from compromising research outcomes?
***
Proposals for articles may address these topics from a general point of view, offering a theoretical and methodological perspective. Alternatively, they can be based on one or more case studies that focus on specific observations, while highlighting the noise management methods employed throughout the study.
References
Al Sharou, K., Li, Z., & Specia, L. (2021). Towards a Better Understanding of Noise in Natural Language Processing. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), 53-62. https://aclanthology.org/2021.ranlp-1.7
Benveniste, É. (1966). Problèmes de linguistique générale. Gallimard.
Egbert, J., & Baker, P. (Eds.). (2019). Using corpus methods to triangulate linguistic analysis. Routledge.
Fuchs, C., & Habert, B. (2004). Le traitement automatique des langues : Des modèles aux ressources. Le Français Moderne - Revue de linguistique Française, CILF (conseil international de la langue française), LXXII: 1, online.
Goutte, C., Carpuat, M., & Foster, G. (2012). The impact of sentence alignment errors on phrase-based machine translation performance. In Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers.
Jentsch, P., & Porada, S. (2020). From Text to Data: Digitization, Text Analysis and Corpus Linguistics. In S. Schwandt (Ed.), Digital Humanities Research (1st ed., Vol. 1, pp. 89-128). transcript Verlag / Bielefeld University Press. https://doi.org/10.14361/9783839454190-004
Kraif, O., & Ponton, C. (2007). Du bruit, du silence et des ambiguïtés : Que faire du TAL pour l'apprentissage des langues ? TALN 2007, 143-152. https://hal.archives-ouvertes.fr/hal-01073706
Léon, J. (2018). Tal et linguistique : Application, expérimentation, instrumentalisation. ELA. Études de linguistique appliquée, 2(190), 195-203.
Litosseliti, L. (Ed.). (2018). Research methods in linguistics. Bloomsbury Publishing.
Molinelli, P., & Putzu, I. (2015). Modelli epistemologici, metodologie della ricerca e qualità del dato. Dalla linguistica storica alla sociolinguistica storica. Franco Angeli.
Péry-Woodley, M.-P., Afantenos, S. D., Ho-Dac, L.-M., & Asher, N. (2011). La ressource ANNODIS, un corpus enrichi d'annotations discursives. TAL, 52(3), 71-101.
Sarrica, M., Mingo, I., Mazzara, B., & Leone, G. (2016). The effects of lemmatization on textual analysis conducted with IRaMuTeQ: results in comparison. JADT2016: 13ème Journées Internationales d'Analyse Statistique de Données Textuelles.
Scaglione, F. (2018). "Lavorare" il dato linguistico: Prospettive e limiti. Alcune considerazioni dall'esperienza dell'Atlante Linguistico della Sicilia (ALS). In G. Sampino (Ed.), Atti del convegno internazionale dei dottorandi (pp. 101-122).
Tanguy, L. (2012). Complexification des données et des techniques en linguistique : contribution du TAL aux solutions et aux problèmes. HDR dissertation, Université de Toulouse 2 - le Mirail.
Zalmout, N., Erdmann, A., & Habash, N. (2018). Noise-robust morphological disambiguation for dialectal Arabic. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (pp. 953-964).
Zeroual, I. (2018). Building Arabic Corpora: Concepts, Methodologies, Tools, and Experiments (Doctoral dissertation, University of Maryland, USA).
Retro-planning
* July 2023: publication of the call for papers.
* November 2023: pre-selection based on article summaries.
* March 2024: article submission deadline.
* June 2024: response to the authors.
* June-October 2024: review process with authors to submit the final version of the article.
* November-December 2024: editing process.
* January 2025: publication.
Please note that this retro-planning outlines a general timeline and may vary depending on the specific publication requirements.
Abstract submission
* Your abstract should be no longer than 1,500 words, including bibliographical references.
* Please submit your abstracts by November 6, 2023 to elisa.gugliotta(a)ilc.cnr.it and luca.pallanti(a)univ-lyon2.fr.
----- FRENCH VERSION------
Bruit de fond ou valeur ajoutée ? Gérer le bruit lors des traitements informatiques des corpus linguistiques
Sous la direction de Elisa Gugliotta, Luca Pallanti, Olivier Kraif, Iris Fabry et Martina Barletta
English version above
L'influence croissante des méthodologies liées au TAL sur la linguistique de corpus oblige les chercheurs à réinterroger les pratiques de gestion du bruit et son impact dans les résultats de recherche (Fuchs & Habert, 2004 ; Léon, 2018 ; Zalmout et al., 2018). Qu'il s'agisse de corpus en diachronie longue (ex. français médiéval), de corpus dialectaux aux ressources limitées (ex. textes oraux ou écrits en arabe dialectal, cf. arabizi), ou encore de corpus de textes éloignés de la norme (ex. corpus d'apprenants), l'analyse du bruit est une étape nécessaire pour tirer des conclusions linguistiques des données ainsi évaluées (Molinelli & Putzu, 2015 ; Scaglione, 2018 ; Litosseliti, 2018). Ce numéro thématique de la revue Corpus, qui fait suite à une journée d'étude sur le même thème organisée en avril 2023 (https://je-bruit-corpus.sciencesconf.org/), sera l'occasion de réfléchir sur les méthodes de gestion du bruit dans les domaines du TAL et de la linguistique de corpus outillée, et à son impact sur la qualité des données linguistiques (Kraif et Ponton, 2007 ; Goutte et al., 2012 ; Zeroual, 2018).
Les questions sous-jacentes à toute étude linguistique concernent la définition de l'objet de recherche, la nature des données elles-mêmes, et la manière de préserver autant que possible leurs caractéristiques dans les différents traitements (lemmatisation, normalisation, étiquetage, etc.) (Sarrica et al., 2016). Ainsi, le choix des méthodes d'identification et de contrôle du bruit, de la phase de collecte à celle d'archivage, de la préparation des données à l'annotation, joue un rôle fondamental (Egbert & Baker, 2019). La définition même du bruit est multiple, et ne va pas de soi : dans le seul champ du TAL, ce terme, souvent peu interrogé, désigne des phénomènes variables et très hétérogènes, allant des péritextes du Web - hyperliens, menus et codes informatiques - au code switching, en passant par les erreurs d'orthographe ou de grammaire qui émaillent les productions (Al Sharou et al., 2021).
Ce numéro thématique propose de mener une réflexion sur la définition du bruit, dans une perspective linguistique, et sur les pratiques des chercheurs visant à réduire la portée des biais qui en découlent, que ce soit durant la collecte, l'enregistrement ou l'annotation des données. Dans le concret de la recherche, la question du bruit se pose à chaque étape de la démarche empirique de construction et d'analyse des données :
1. Le bruit pendant la collecte et l'enregistrement des données
Si l'on accepte le postulat selon lequel " la donnée linguistique est un résultat " (Benveniste, 1966), comment décoder le bruit causé par le recueil des données et leur enregistrement ? En effet, en fonction des objets de recherche, il existe des facteurs potentiels d'altération des données, comme par exemple les préconceptions du chercheur, ou les biais introduits par un système OCR donné (Jentsch & Porada, 2020). L'enjeu consiste alors à prédire ou à déterminer les biais potentiels induits par ces facteurs lors de la sélection et la mise en forme des données pour mieux contrôler les phases de recherche successives.
2. La préparation et le prétraitement des données.
Les méthodes choisies pour affiner les données brutes et les rendre disponibles pour des manipulations avancées peuvent représenter une importante source de bruit (ou, au contraire, de silence si on applique un filtre pour éliminer le bruit) : c'est notamment le cas du processus de normalisation des données (Al Sharou et al., 2021). Qu'il s'agisse de transcrire des données ou de corriger des erreurs, le chercheur fait des choix qui impactent nécessairement la nature des données, soit en les réduisant, soit en les enrichissant. Il s'agit donc d'anticiper les conséquences des transformations produites par les méthodes de traitement des données (Tanguy, 2012).
3. Le processus d'annotation et les métadonnées
À la base, l'annotation des corpus est une étape visant l'enrichissement des données : en fonction du modèle d'analyse mis au point, le chercheur tente de catégoriser des unités à travers un processus d'étiquetage (Péry-Woodley et al., 2011). Cependant, si d'un côté ce processus peut générer du bruit, de l'autre, il peut être une cause de silence fort préjudiciable aux résultats des recherches et à leur interprétation (des étiquettes absentes ou erronées pouvant générer des résultats lacunaires lors de l'analyse ou du requêtage des données). La notion de métadonnée peut également être mise en cause : catégoriser une donnée signifie-t-il la transformer en quelque chose d'autre ? Par ailleurs, l'absence d'accord ou un faible accord dans les annotations produites par l'humain manifeste-t-il des variations interindividuelles assimilables à du bruit, ou au caractère trop vague des catégorisations en jeu ?
***
À chaque étape se posent des questions méthodologiques centrales : à partir de quel seuil peut-on considérer le bruit comme acceptable ? Comment différencier bruit et biais méthodologique ? Comment estimer le bruit sans vérité de terrain ? Quels outils statistiques spécifiques à l'étude des corpus permettent de délimiter des intervalles de confiance ? Comment atteindre l'équilibre nécessaire pour que le bruit causé par les traitements des données ne compromette pas les résultats des recherches ?
***
Les propositions d'article pourront aborder ces questions d'un point de vue général, sous un angle théorique et méthodologique, ou s'appuyer sur une ou plusieurs études de cas portant sur des observations particulières, en prenant soin de mettre en lumière les méthodes de gestion du bruit tout au long de l'étude.
Retro-planning
* Juillet 2023 : publication de l'appel.
* Novembre 2023 : pré-sélection sur résumé.
* Mars 2024 : remise des articles.
* Juin 2024 : réponse aux auteurs.
* Juin-octobre 2024 : navette avec les auteurs pour remise de l'article en forme définitive.
* Novembre-décembre 2024 : édition.
* Janvier 2025 : publication.
Soumission des résumés
* Votre résumé comptera 1.500 mots au maximum, références bibliographiques incluses.
* Merci de soumettre vos résumés pour le 6 novembre 2023 aux adresses elisa.gugliotta(a)ilc.cnr.it et luca.pallanti(a)univ-lyon2.fr
The Centre for Translation Studies (CTS) at the University of Surrey invites applications for a place on our MRes in Translation and Interpreting Studies course (academic year 2023-24 entry). Students attending this course get in-depth, systematic research training in translation and interpreting. This unique and innovative course is the first of its kind in the UK and draws on the research areas CTS is well known for: translation and interpreting technologies, translation as intercultural mediation, corpus-based translation, audiovisual translation, machine translation and Natural Language Processing for translation/interpreting. The research we carry out at CTS is in touch with recent technological and social developments, as we maintain a strong focus on the responsible integration of technologies in workflows where multilingual and multimodal mediation is key.
As an MRes student, you will take two compulsory taught modules and select two optional modules (60 credits). You will then complete your degree with a dissertation (120 credits), which is longer than in a typical MA dissertation format, thus allowing you to research a topic in greater depth. This year, we invite in particular students interested in pursuing dissertation topics related to:
- NLP and generative AI for multilingual communication
- accessibility services in museums
For further inspiration, take a look at what past students say about the course and their MA projects: https://www.surrey.ac.uk/student-life/what-our-students-say/zeynep-polat-po…
And for more details about the programme or how to apply visit:
https://www.surrey.ac.uk/postgraduate/translation-and-interpreting-studies-…
If you feel that an MRes is not for you, you can check our other postgraduate courses on topics related to translation and interpreting at:
https://www.surrey.ac.uk/centre-translation-studies/study/postgraduate-cour…
Watch our new video "More than an MA": https://www.youtube.com/watch?v=R2oVf3X2LEg
---
Prof Constantin Orăsan
Professor of Language and Translation Technologies
Centre for Translation Studies
University of Surrey
https://dinel.org.uk
* We apologize if you receive multiple copies of this CFP *
* For the online version of this call, visit:
https://2023-eu.semantics.cc/page/workshops *
SEMANTiCS 2023 (20th-22nd September - Leipzig, Germany) is hosting an
enriched collection of three workshops. Please find the relevant
workshops still open for accepting your submissions as long and short
paper contributions below.
# Onto4FAIR: 3rd Workshop on Ontologies for FAIR and FAIR Ontologies
Organizers: Cassia Trojahn (Institut de Recherche en Informatique de
Toulouse, France), Luiz Olavo Bonino da Silva Santos (University of
Twente, Leiden University Medical Centre, the Netherlands), Giancarlo
Guizzardi (University of Twente, the Netherlands), Clement Jonquet
(French National Research Institute for Agriculture, Food and
Environment, Mathematics, Informatics and Statistics for Environment and
Agronomy research unit, Montpellier, France)
https://onto4fair.github.io/2023-semantics.html
# NLP4KGC: 2nd Workshop on Natural Language Processing for Knowledge
Graph Construction
Organizers: Edlira Vakaj (Birmingham City University, Birmingham, UK),
Sanju Tiwari (Universidad Autónoma de Tamaulipas, Tamaulipas, Mexico),
Rizou Stamatia (Singular Logic, Athens, Greece), Nandana
Mihindukulasooriya (IBM Research, Dublin, Ireland), Fernando
Ortiz-Rodríguez (Universidad Autónoma de Tamaulipas, Tamaulipas,
Mexico), Ryan McGranaghan (NASA Jet Propulsion Laboratory, California,
United States)
https://sites.google.com/view/2nd-nlp4kgc/home
We are looking forward to your contribution!
Workshop & Tutorial Chairs
The School of Electronic Engineering and Computer Science at Queen Mary
University of London is currently advertising up to 15 faculty positions at
Lecturer (= Assistant Professor) or Senior Lecturer (= Associate Professor)
levels, seeking candidates with experience and research interests in a
range of topics including generative AI, explainable AI and responsible AI.
Please see this link for more information:
https://www.qmul.ac.uk/jobs/vacancies/items/8638.html
Closing date is 1st August 2023.
--
Matthew Purver - http://www.eecs.qmul.ac.uk/~mpurver/
Computational Linguistics Lab - http://compling.eecs.qmul.ac.uk/
Cognitive Science Research Group - http://cogsci.eecs.qmul.ac.uk/
School of Electronic Engineering and Computer Science
Queen Mary University of London, London E1 4NS, UK
*My working days for QMUL are Monday-Wednesday; responses to mail on other
days may be delayed.*
Call for workshop papers: the 6th workshop on Challenges and Applications
of Automated Extraction of Socio-political Events from Text - CASE @ RANLP
2023
************************************************************************************
URL: https://emw.ku.edu.tr/case-2023/
(new) Paper submission deadline: 24 July 2023
Paper acceptance notification: 5 August 2023
Paper camera-ready: 25 August 2023
Workshop dates: 7-8 September 2023
Softconf page of the workshop: https://softconf.com/ranlp23/CASE/
************************************************************************************
We invite contributions from researchers in computer science, NLP, ML, DL,
AI, socio-political sciences, conflict analysis and forecasting, peace
studies, as well as computational social science scholars involved in the
collection and utilization of socio-political event data. This includes
(but is not limited to) the following topics:
1) Extracting events and their arguments such as time and location in and
beyond a sentence or document, event coreference resolution.
2) Research in NLP technologies in relation to event detection: geocoding,
temporal reasoning, argument structure detection, syntactic and semantic
analysis of event structures, text classification for event type
detection, learning event-related lexica, event co-reference resolution,
fake news analysis, and others with a focus on real or potential event
detection applications.
3) New datasets, training data collection, and annotation for event
information.
4) Event-event relations, e.g., subevents, main events, spatio-temporal
relations, causal relations.
5) Event dataset evaluation in light of reliability and validity metrics.
6) Defining, populating, and facilitating event schemas and ontologies.
7) Automated tools and pipelines for event collection related tasks.
8) Lexical, syntactic, semantic, discursive, and pragmatic aspects of event
manifestation.
9) Methodologies for development, evaluation, and analysis of event
datasets.
10) Applications of event databases, e.g. early warning, conflict
prediction, and policymaking.
11) Estimating what is missing in event datasets using internal and
external information.
12) Detection of new and emerging socio-political event (SPE) types, e.g.
creative protests.
13) Release of new event datasets.
14) Bias and fairness of the sources and event datasets.
15) Ethics, misinformation, privacy, and fairness concerns pertaining to
event datasets.
16) Copyright issues on event dataset creation, dissemination, and sharing.
17) Cross-lingual, multilingual and multimodal aspects in event analysis.
18) Resources and approaches related to contentious politics around climate
change.
**** Shared tasks ****
Please check the workshop page and Github repositories of the respective
task for additional details.
Task 1 - Multilingual protest news detection: Contact person: Ali
Hürriyetoğlu (ali.hurriyetoglu(a)gmail.com), Github:
https://github.com/emerging-welfare/case-2022-multilingual-event
Task 2 - Collecting and Geocoding Armed Clash Events in Russian Ukrainian
Conflict: Contact persons: Hristo Tanev (hristo.tanev(a)ec.europa.eu) and Onur
Uca (onuruca(a)mersin.edu.tr), Github:
https://github.com/zavavan/case2023_task2
Task 3 - Event causality identification: Contact person: Fiona Anting Tan
(tan.f(a)u.nus.edu), Github: https://github.com/tanfiona/CausalNewsCorpus
Task 4 - Multimodal Hate Speech Event Detection: Contact person:
Surendrabikram Thapa (surendrabikram(a)vt.edu), Github:
https://github.com/therealthapa/case2023_task4
*** Keynotes ***
We will continue our tradition of inviting keynote speakers from both
social and computational sciences. The social science keynote will be
delivered by Erdem Yörük with the title “Using Automated Text Processing to
Understand Social Movements and Human Behaviour” and the computational ones
will be delivered by Ruslan Mitkov and Kiril Simov.
Please see the workshop webpage (https://emw.ku.edu.tr/case-2023/) for
additional details.
*** apologies for cross-posting ***
==================================================
*FINAL CALL: ML/NLP Competition on Automatic Classification of Literary
Epochs (CoLiE)*
To advance the field of implicit temporal information retrieval from
text, this competition challenges participants to develop automatic
methods for identifying the literary epoch of a given text, which is
treated here as the implicit temporal context of a book.
The task on Automatic Classification of Literary Epochs (CoLiE) aims at
automatic identification of the literary epoch of a given text from its
writing style: (1) Romanticism (1798-1837), (2) Victorian Literature
(1837-1901), (3) Modernism (1900-1945), (4) Postmodernism (1945-2000), and
(5) our days (from 2000).
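As a rough illustration of the label scheme above, the following Python sketch encodes the five epochs and their year ranges exactly as listed in this call. The helper name candidate_epochs and the choice to treat both boundary years as inclusive are assumptions made for illustration only, not part of the competition definition. Note that the competition itself asks for classification from writing style, not from a known year; the sketch only shows how the listed ranges relate to each other, including the boundary years where two epochs overlap.

```python
from typing import List, Optional, Tuple

# Epoch labels and year ranges as listed in the call. The last class is
# open-ended ("from 2000"), which is represented here by an end year of None.
EPOCHS: List[Tuple[str, int, Optional[int]]] = [
    ("Romanticism", 1798, 1837),
    ("Victorian Literature", 1837, 1901),
    ("Modernism", 1900, 1945),
    ("Postmodernism", 1945, 2000),
    ("Our days", 2000, None),
]


def candidate_epochs(year: int) -> List[str]:
    """Return every epoch whose year range contains `year`.

    Assumption: both boundary years are inclusive, so a boundary year
    such as 1900 can map to more than one epoch.
    """
    return [
        name
        for name, start, end in EPOCHS
        if year >= start and (end is None or year <= end)
    ]
```

For example, under these assumptions candidate_epochs(1850) returns only "Victorian Literature", while candidate_epochs(1900) returns both "Victorian Literature" and "Modernism", because the listed ranges overlap in 1900-1901.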
The competition is held as a part of the IACT’23
<https://en.sce.ac.il/news/iact23> workshop, held on July 27, 2023, in
conjunction with the 46th International ACM SIGIR Conference on Research
and Development in Information Retrieval.
This competition is open to anyone with a passion for information
retrieval, machine learning, and natural language processing. Whether you
are a seasoned expert or a newcomer to the field, we welcome you to
participate and extend the boundaries of automated text analysis!
Competition site: http://www.kaggle.com/competitions/colie
Competition Timeline
- May 28, 2023: The competition is open to participants. Training and
validation sets together with their labels are available.
- July 10, 2023: Test dataset available.
- July 17, 2023, 23:59 UTC: Final submission deadline.
- July 27, 2023: The winners are announced at the special session at the
IACT'23 <https://en.sce.ac.il/news/iact23> workshop.
*The organizing team*
- Dr. Marina Litvak (marinal(a)ac.sce.ac.il),
Software Engineering Department,
Shamoon College of Engineering, Beer Sheva,
84100, Israel
- Dr. Irina Rabaev (irinar(a)ac.sce.ac.il),
Software Engineering Department,
Shamoon College of Engineering, Beer Sheva,
84100, Israel
- Prof. Ricardo Campos (ricardo.campos(a)ipt.pt),
Ci2 - Smart Cities Research Center, Polytechnic Institute of Tomar
INESC TEC, Porto
Porto, Portugal
- Prof. Alípio Mário Jorge (amjorge(a)fc.up.pt)
University of Porto
Porto, Portugal
- Prof. Adam Jatowt (adam.jatowt(a)uibk.ac.at)
University of Innsbruck,
Innsbruck, Austria
- Mr. Vladimir Younkin (vladiyo(a)ac.sce.ac.il),
Software Engineering Department,
Shamoon College of Engineering, Beer Sheva,
84100, Israel
--
Best regards,
Marina Litvak
[Apologies for cross-posting]
*****************************************************************************************************
*CALL FOR PAPERS*
7th Workshop on Natural Language for Artificial Intelligence (NL4AI)
at the 22nd International Conference of the Italian Association for
Artificial Intelligence (AIxIA 2023)
November 6th - 9th, 2023, Rome, Italy
Website:
http://sag.art.uniroma2.it/NL4AI/
http://www.aixia2023.cnr.it/
*****************************************************************************************************
*IMPORTANT DATES*
Paper submission deadline: September 11th, 2023
Notification of paper acceptance: September 29th, 2023
Camera-ready version deadline: October 9th, 2023
Workshop (at AI*IA 2023): November 6th - 9th, 2023
*****************************************************************************************************
*INTRODUCTION*
The goal of the NL4AI workshop is to explore the role of Computational
Linguistics and Natural Language Processing in Artificial Intelligence
applications. We believe that new technological challenges and
opportunities arise at the boundary between NLP and AI. On the one hand,
AI applications benefit from a deeper understanding of problems related
to Natural Language, and thus the integration of advanced NLP
techniques. On the other hand, NLP benefits greatly from being used in
wider areas of AI where problems and methodologies related to NL can be
evaluated in new contexts.
*TOPICS OF INTEREST*
We invite papers that pertain to the workshop theme including, but not
limited to:
* NLP and AI Applications (health, legal domain, social media and
journalism, etc.)
* Natural Language Interfaces for Human Robot Interaction
* Resources and Evaluation
* Discourse and Pragmatics
* Natural Language Generation
* Information extraction in AI applications
* Machine Learning for NLP
* Sentiment analysis and Opinion mining
* Natural Language Inference
* NLP and Industrial Challenges
* Semantics
* Conversational Agents in Human-Computer Interaction
* Cognitive modeling and psycholinguistics
* Language and other Multimodality
* Speech and Spoken language processing
* Ethics and NLP
* Interpretability, Explainability and Analysis of Models for NLP
* Abusive Language Detection and Analysis
* Machine Translation and Multilinguality
* Question Answering
* Summarization
* NLP for Fact Checking, Fake News Detection and Analysis
* LLMs and Applications
* Multimodal (text-image) data sources
Accepted papers will be published in the workshop proceedings via CEUR
Workshop Proceedings. Depending on the number and quality of papers
received, we will consider proposing a special issue in relevant
journals. The Program Committee will select the Best Workshop Paper from
the accepted papers.
*HOW TO SUBMIT*
We encourage submissions that describe new theoretical models, applied
techniques, and research in progress. Substantial extensions to works
already published or presented in other locations are also welcome.
We invite two kinds of submissions, both following the new 2022 CEUR-ART
1-column paper style (http://ceur-ws.org/Vol-XXX/CEURART.zip).
Short/Demo papers: The maximum length is 6 pages (plus up to 2 pages of
references).
Long papers: The maximum length is 12 pages (plus up to 2 pages of
references).
Please note that papers with fewer than 25,000 characters will be
considered short papers in the CEUR proceedings. Submissions will be
peer-reviewed (single-blind) by the program committee members.
Evaluation criteria will include novelty, significance for
theory/practice, technical soundness, and quality of presentation. All
the submissions should be submitted via EasyChair at:
https://easychair.org/conferences/?conf=nl4ai2023
*WORKSHOP ORGANIZERS*
Elisa Bassignana, IT University of Copenhagen, Denmark
Dominique Brunato, Institute for Computational Linguistics “A. Zampolli”
(CNR-ILC), Italy
Marco Polignano, University of Bari Aldo Moro, Italy
Alan Ramponi, Fondazione Bruno Kessler, Italy