July 2023 - Corpora - ELRA lists

Call for Participation: Challenge on Medical Video Question Answering at TRECVID 2023
by Deepak Gupta 15 Jul '23

Dear colleagues and friends,

This year, we are organizing the MedVidQA challenge <https://medvidqa.github.io/> with TRECVID 2023 <https://www-nlpir.nist.gov/projects/tv2023/index.html>. This challenge aims to develop models for (1) retrieving relevant videos and locating the visual answer in those videos for medical or health-related questions, and (2) generating medical instructional questions from video segments.

Following the success of the 1st MedVidQA shared task <https://aclanthology.org/2022.bionlp-1.25/>, MedVidQA at TRECVID 2023 expands the tasks and introduces a new track on language-video understanding and generation. This track consists of two main tasks: Video Corpus Visual Answer Localization (VCVAL) and Medical Instructional Question Generation (MIQG). For more details, please visit the challenge website (https://medvidqa.github.io/) and the TRECVID 2023 website (https://www-nlpir.nist.gov/projects/tv2023/index.html).

Submission links:
- Task 1 (VCVAL): https://codalab.lisn.upsaclay.fr/competitions/13445
- Task 2 (MIQG): https://codalab.lisn.upsaclay.fr/competitions/13546

*Important Dates*
- *Release of the training and validation datasets:* April 30, 2023
- *Release of the video corpus:* May 12, 2023
- *Release of the test sets:* July 14, 2023
- *Run submission deadline:* August 4, 2023
- *Release of the official results:* September 29, 2023

We look forward to your participation in MedVidQA at TRECVID 2023. Join our Google Group <https://groups.google.com/g/trecvid-medvidqa2023> for important updates! If you have any questions, ask in our Google Group or email us at deepak.gupta(a)nih.gov.

Thank you,
MedVidQA 2023 Organizers

Linguamática V15N1 is available!
by Hugo Gonçalo Oliveira 14 Jul '23

----------------------------------------------------------------------
Linguamática Volume 15, Number 1
ISSN 1647-0818 --- http://www.linguamatica.com
----------------------------------------------------------------------

** Articles related to DIP: Desafio de Identificação de Personagens (Character Identification Challenge) **

Diana Santos, Cristina Mota, Emanoel Pires, Marcia Langfeldt, Rebeca Schumacher Fuão, Roberto Willrich. DIP - Desafio de Identificação de Personagens: objectivo, organização, recursos e resultados [DIP - Character Identification Challenge: aim, organization, resources and results]
https://linguamatica.com/index.php/linguamatica/article/view/399/492

Eckhard Bick. Extração de Informação sobre Personagens Literários em Português [Information Extraction about Literary Characters in Portuguese]
https://linguamatica.com/index.php/linguamatica/article/view/397/490

Diana Santos, Cristina Mota. Pais, filhos e outras relações familiares no DIP [Parents, children and other family relations in DIP]
https://linguamatica.com/index.php/linguamatica/article/view/402/493

Emanoel Pires, Marcia Caetano Langfeldt, Rebeca Schumacher Fuão. Desafios e vantagens do processo de identificação automática do gênero e das profissões das personagens no DIP [Challenges and advantages of automatically identifying the gender and professions of characters in DIP]
https://linguamatica.com/index.php/linguamatica/article/view/401/491

Roberto Willrich, Diana Santos. Avaliação no Desafio de Identificação de Personagens [Evaluation in the Character Identification Challenge]
https://linguamatica.com/index.php/linguamatica/article/view/398/494

** Research Articles **

David Soares Batista. Extracção de Relações de Apoio e Oposição em Títulos de Notícias de Política em Português [Extraction of Support and Opposition Relations in Portuguese Political News Headlines]
https://linguamatica.com/index.php/linguamatica/article/view/386/495

Cássio Faria da Silva, Vânia Paula de Almeida Neris, Helena de Medeiros Caseli. Classificação da qualidade da argumentação em tweets no domínio da política brasileira [Classification of argumentation quality in tweets in the Brazilian political domain]
https://linguamatica.com/index.php/linguamatica/article/view/387/496

** New Perspectives **

Átila Augusto Soares Vital. A compilação e a análise de métricas textuais de um corpus de redações [Compiling and analysing textual metrics of a corpus of essays]
https://linguamatica.com/index.php/linguamatica/article/view/393/497

18-month postdoc position in Natural Language Processing at the University of Copenhagen
by Desmond Elliott 14 Jul '23

The Natural Language Processing Section at the Department of Computer Science, University of Copenhagen, is advertising an 18-month position for a Postdoctoral Researcher in Natural Language Processing. The position is funded by a VILLUM Young Investigator Grant held by the principal investigator, Desmond Elliott. The overall goal of the project is to develop a new family of language models that can process any written language by rendering text as images. This allows the models to learn from the visual similarities between written languages, facilitating effective transfer to lower-resource or unseen languages.

The Natural Language Processing Section provides a strong, international and diverse environment for research on core as well as emerging topics in natural language processing, natural language understanding, computational linguistics and multimodal language processing. It is housed on the main Science Campus, which is centrally located in Copenhagen. Further information about research at the Department is available here: https://di.ku.dk/english/research/.

The application deadline is 31 August 2023, with a preferred start date of January 2024, or as soon as possible thereafter. More information is available, and applications can be submitted, here: https://di.ku.dk/english/about/vacancies/postdoc-in-natural-language-proces…

Informal enquiries about the position can be made to Desmond Elliott, Department of Computer Science, University of Copenhagen, e-mail: de(a)di.ku.dk.

Three PhD students, one postdoc on neurosymbolic models
by Alexander Koller 14 Jul '23

The Research Training Group (RTG) 2853 “Neuroexplicit Models of Language, Vision, and Action” is looking for three PhD students and one postdoc, starting November 2023 or later.

Neuroexplicit models combine neural and human-interpretable (“explicit”) models in order to overcome the limitations that each model class has on its own. They include neurosymbolic models, which combine neural and symbolic models, but also, e.g., combinations of neural and physics-based models. In the RTG, we will improve the state of the art in natural language processing (“Language”), computer vision (“Vision”), and planning and reinforcement learning (“Action”) through the use of neuroexplicit models, and investigate the cross-cutting design principles of effective neuroexplicit models (“Foundations”).

The RTG is scheduled to grow to a total of 24 PhD students and one postdoc by 2025. Together with ~20 further PhD students and postdocs funded from other sources, it will be one of the largest research centers on neuroexplicit or neurosymbolic models in the world. The RTG brings together researchers at Saarland University, the Max Planck Institute for Informatics, the Max Planck Institute for Software Systems, the CISPA Helmholtz Center for Information Security, and the German Research Center for Artificial Intelligence (DFKI). All of these institutions are colocated on the same campus in Saarbrücken, Germany.

In a previous round of recruiting, we already filled three PhD positions; the remaining three positions are open to applicants with a later MSc graduation date. The positions are funded as follows:

• PhD students will be funded for up to four years at the TV-L E13 100% pay scale. You should have, or be about to complete, an MSc degree in computer science or a related field and have demonstrated expertise in one of the research areas of the RTG, e.g. through an excellent Master’s thesis or relevant publications.
• The postdoc will initially be funded for three years, with the possibility of extension up to five years, at the TV-L E13 100% pay scale. As the RTG postdoc, you will pursue your own research agenda in the field of neuroexplicit models and work with the PhD students to identify and pursue opportunities for collaborative research. You should have, or be about to complete, a PhD in computer science or a related field and have demonstrated your expertise in one or more of the RTG’s research areas through publications in top venues.

The RTG is part of the Saarland Informatics Campus (SIC), one of the leading centers for research in computer science, artificial intelligence, and natural language processing in Europe. The Saarland Informatics Campus brings together 900 researchers and 2500 students from 81 countries. The CISPA Helmholtz Center, located on the same campus, is home to an additional 350 researchers and is on track to grow to 800 by 2026. Researchers at SIC and CISPA are part of the ELLIS network and have been awarded more than 35 ERC grants.

Each PhD student in the RTG will be jointly supervised by two PhD advisors from the list of Principal Investigators below. Each student will freely define their own research topic; we encourage topics that cross the traditional boundaries of research fields. Students may be affiliated with Saarland University or with one of the participating institutes.
Principal Investigators:
Vera Demberg, Saarland University - Computational Linguistics
Jörg Hoffmann, Saarland University - AI Planning
Eddy Ilg, Saarland University - Computer Vision, Machine Learning
Dietrich Klakow, Saarland University - Natural Language Processing
Alexander Koller, Saarland University - Computational Linguistics
Bernt Schiele, MPI for Informatics - Computer Vision, Machine Learning
Philipp Slusallek, DFKI and Saarland University - Computer Graphics, Artificial Intelligence
Christian Theobalt, MPI for Informatics - Visual Computing, Machine Learning
Mariya Toneva, MPI for Software Systems - Computational Neuroscience, Machine Learning
Isabel Valera, Saarland University - Machine Learning
Jilles Vreeken, CISPA - Machine Learning, Causality
Joachim Weickert, Saarland University - Mathematical Data Analysis
Verena Wolf, DFKI and Saarland University - Modeling and Simulation, Reinforcement Learning

Ellie Pavlick, Brown University and Google AI, will join us regularly as a Mercator Fellow.

Please send your application by 31 July 2023 to apply(a)neuroexplicit.org. Include the reference number W2350 for the postdoc position or the reference number W2351 for the PhD positions. We aim to conduct job interviews in September, ideally in person in Saarbrücken, Germany. Make sure to check the RTG website for details on the application process and the materials to include in the application: https://www.neuroexplicit.org/jobs

For all further information about the RTG, check out our website: https://www.neuroexplicit.org/

Call for papers - AAC/CFP Corpus 26 - 2025
by Luca Pallanti 13 Jul '23

AAC/CFP Corpus 26 - 2025 - https://journals.openedition.org/corpus/

Background noise or added value? Managing noise during computer processing of linguistic corpora
Elisa Gugliotta, Luca Pallanti, Olivier Kraif, Iris Fabry and Martina Barletta (eds.)

The increasing influence of NLP-related methodologies on corpus linguistics has compelled researchers to reassess their practices for managing noise and its impact on research results (Fuchs & Habert, 2004; Léon, 2018; Zalmout et al., 2018). Whether working with corpora spanning long diachronic periods (e.g., medieval French), low-resource dialectal corpora (e.g., oral or written texts in dialectal Arabic, cf. Arabizi), or corpora of texts deviating from the norm (e.g., learner corpora), noise analysis is an essential step in drawing linguistic conclusions from the available data (Molinelli & Putzu, 2015; Scaglione, 2018; Litosseliti, 2018).

This special issue of Corpus builds upon a workshop held in April 2023 (https://je-bruit-corpus.sciencesconf.org/) and offers an opportunity to examine noise management methods in NLP and corpus linguistics, as well as their impact on the quality of linguistic data (Kraif & Ponton, 2007; Goutte et al., 2012; Zeroual, 2018). The fundamental questions in any linguistic study concern how the research object is defined, the nature of the data, and how to preserve the data's inherent characteristics throughout the various processing steps (such as lemmatisation, normalisation, labelling, etc.) (Sarrica et al., 2016). Hence, selecting appropriate methods for identifying and controlling noise is crucial throughout the entire process, from data collection to archiving, and from data preparation to annotation (Egbert & Baker, 2019).

The definition of noise itself is diverse and far from self-evident. In the field of NLP alone, the term covers a wide range of highly heterogeneous phenomena, from web peritexts (such as hyperlinks, menus and computer code) to code switching and the spelling or grammatical errors that punctuate productions (Al Sharou et al., 2021). This special issue aims to explore the definition of noise from a linguistic perspective, as well as the practices researchers employ to mitigate the biases that can arise from it during the collection, recording, and annotation of data. The question of noise inevitably arises at each stage of the empirical process of data construction and analysis:

1. Noise during data collection and recording
If one accepts the postulate that "linguistic data is a result" (Benveniste, 1966), decoding the noise stemming from data collection and recording becomes crucial. Depending on the research object, various factors may alter the data, including the researcher's preconceptions or the biases introduced by a given OCR system (Jentsch & Porada, 2020). The key challenge lies in predicting or identifying the potential biases these factors induce during the selection and formatting of data. This enables better control over the subsequent research stages and greater accuracy in the analysis.

2. Data preparation and pre-processing
The methods employed to refine raw data and prepare it for advanced manipulation can be a significant source of noise (or, conversely, of silence, if noise-elimination filters are applied). This is particularly evident in data normalisation (Al Sharou et al., 2021). When transcribing data or correcting errors, researchers make choices that inevitably affect the nature of the data, either reducing or enriching its content. It is therefore essential to anticipate the consequences of the transformations introduced by data processing methods (Tanguy, 2012).

3. The annotation process and metadata
In principle, corpus annotation aims to enrich the data: depending on the analysis model developed, units are categorised through a labelling process (Péry-Woodley et al., 2011). However, while this process can introduce noise, it can also produce detrimental silence (when missing or erroneous labels lead to incomplete results during data analysis or querying). The concept of metadata also raises questions: does categorising data transform it into something different? Furthermore, does the absence of agreement, or low agreement, between human annotators reflect inter-individual variation akin to noise, or the inherent vagueness of the categories themselves?

***
At each step of the process, key methodological questions arise: what level of noise can be considered acceptable? How can we differentiate between noise and methodological bias? Is it possible to estimate noise without a ground truth? Which statistical tools specific to corpus studies make it possible to define confidence intervals? How can we strike the right balance so that the noise resulting from data processing does not compromise research outcomes?
***

Proposals may address these topics from a general, theoretical and methodological perspective. Alternatively, they may be based on one or more case studies focusing on specific observations, while highlighting the noise management methods employed throughout the study.

References
Al Sharou, K., Li, Z., & Specia, L. (2021). Towards a Better Understanding of Noise in Natural Language Processing. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), 53-62. https://aclanthology.org/2021.ranlp-1.7
Benveniste, É. (1966). Problèmes de linguistique générale. Gallimard.
Egbert, J., & Baker, P. (Eds.). (2019). Using corpus methods to triangulate linguistic analysis. Routledge.
Fuchs, C., & Habert, B. (2004). Le traitement automatique des langues : Des modèles aux ressources. Le Français Moderne - Revue de linguistique Française, CILF (conseil international de la langue française), LXXII: 1, online.
Goutte, C., Carpuat, M., & Foster, G. (2012). The impact of sentence alignment errors on phrase-based machine translation performance. In Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers.
Jentsch, P., & Porada, S. (2020). From Text to Data: Digitization, Text Analysis and Corpus Linguistics. In S. Schwandt (Ed.), Digital Humanities Research (1st ed., Vol. 1, pp. 89-128). transcript Verlag / Bielefeld University Press. https://doi.org/10.14361/9783839454190-004
Kraif, O., & Ponton, C. (2007). Du bruit, du silence et des ambiguïtés : Que faire du TAL pour l'apprentissage des langues ? TALN 2007, 143-152. https://hal.archives-ouvertes.fr/hal-01073706
Léon, J. (2018). TAL et linguistique : Application, expérimentation, instrumentalisation. ELA. Études de linguistique appliquée, 2(190), 195-203.
Litosseliti, L. (Ed.). (2018). Research methods in linguistics. Bloomsbury Publishing.
Molinelli, P., & Putzu, I. (2015). Modelli epistemologici, metodologie della ricerca e qualità del dato. Dalla linguistica storica alla sociolinguistica storica. Franco Angeli.
Péry-Woodley, M.-P., Afantenos, S. D., Ho-Dac, L.-M., & Asher, N. (2011). La ressource ANNODIS, un corpus enrichi d'annotations discursives. TAL, 52(3), 71-101.
Sarrica, M., Mingo, I., Mazzara, B., & Leone, G. (2016). The effects of lemmatization on textual analysis conducted with IRaMuTeQ: results in comparison. JADT 2016: 13ème Journées internationales d'Analyse statistique des Données Textuelles.
Scaglione, F. (2018). "Lavorare" il dato linguistico: Prospettive e limiti. Alcune considerazioni dall'esperienza dell'Atlante Linguistico della Sicilia (ALS). In G. Sampino (Ed.), Atti del convegno internazionale dei dottorandi (pp. 101-122).
Tanguy, L. (2012). Complexification des données et des techniques en linguistique : contribution du TAL aux solutions et aux problèmes. HDR dissertation, Université de Toulouse 2 - le Mirail.
Zalmout, N., Erdmann, A., & Habash, N. (2018). Noise-robust morphological disambiguation for dialectal Arabic. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (pp. 953-964).
Zeroual, I. (2018). Building Arabic Corpora: Concepts, Methodologies, Tools, and Experiments (Doctoral dissertation, University of Maryland, USA).

Retro-planning
* July 2023: call for papers.
* November 2023: pre-selection based on abstracts.
* March 2024: article submission deadline.
* June 2024: notification to authors.
* June-October 2024: review process, with authors submitting the final version of their article.
* November-December 2024: editing.
* January 2025: publication.
Please note that this retro-planning outlines a general timeline and may vary depending on specific publication requirements.

Abstract submission
* Your abstract should be no longer than 1,500 words, including bibliographical references.
* Please submit your abstract by November 6, 2023 to elisa.gugliotta(a)ilc.cnr.it and luca.pallanti(a)univ-lyon2.fr.

MRes in Translation and Interpreting Studies 2023-2024
by Constantin Orasan 13 Jul '23

The Centre for Translation Studies (CTS) at the University of Surrey invites applications for a place on our MRes in Translation and Interpreting Studies (academic year 2023-24 entry). Students on this course receive in-depth, systematic research training in translation and interpreting. This unique and innovative course is the first of its kind in the UK and draws on the research areas CTS is well known for: translation and interpreting technologies, translation as intercultural mediation, corpus-based translation, audiovisual translation, machine translation, and Natural Language Processing for translation/interpreting. The research we carry out at CTS keeps pace with recent technological and social developments, with a strong focus on the responsible integration of technologies in workflows where multilingual and multimodal mediation is key.

As an MRes student, you will take two compulsory taught modules and select two optional modules (60 credits). You will then complete your degree with a dissertation (120 credits), which is longer than a typical MA dissertation and thus allows you to research a topic in greater depth.
This year, we particularly welcome students interested in pursuing dissertation topics related to:
- NLP and generative AI for multilingual communication
- accessibility services in museums

For further inspiration, take a look at what past students say about the course and their MA projects: https://www.surrey.ac.uk/student-life/what-our-students-say/zeynep-polat-po…

For more details about the programme or how to apply, visit: https://www.surrey.ac.uk/postgraduate/translation-and-interpreting-studies-…

If you feel that an MRes is not for you, you can explore our other postgraduate courses on topics related to translation and interpreting at: https://www.surrey.ac.uk/centre-translation-studies/study/postgraduate-cour…

Watch our new video "More than an MA": https://www.youtube.com/watch?v=R2oVf3X2LEg

---
Prof Constantin Orăsan
Professor of Language and Translation Technologies
Centre for Translation Studies
University of Surrey
https://dinel.org.uk

1 0

[CfP] SEMANTiCS 2023 – Open Call for Workshop Papers
by Anisa Rula & Jennifer D'Souza 13 Jul '23

* We apologize if you receive multiple copies of this CFP *
* For the online version of this call, visit: https://2023-eu.semantics.cc/page/workshops *

SEMANTiCS 2023 (20-22 September, Leipzig, Germany) is hosting a collection of three workshops. The workshops below are still open for long and short paper submissions.

# Onto4FAIR: 3rd Workshop on Ontologies for FAIR and FAIR Ontologies
Organizers: Cassia Trojahn (Institut de Recherche en Informatique de Toulouse, France), Luiz Olavo Bonino da Silva Santos (University of Twente, Leiden University Medical Centre, the Netherlands), Giancarlo Guizzardi (University of Twente, the Netherlands), Clement Jonquet (French National Research Institute for Agriculture, Food and Environment, Mathematics, Informatics and Statistics for Environment and Agronomy research unit, Montpellier, France)
https://onto4fair.github.io/2023-semantics.html

# NLP4KGC: 2nd Workshop on Natural Language Processing for Knowledge Graph Construction
Organizers: Edlira Vakaj (Birmingham City University, Birmingham, UK), Sanju Tiwari (Universidad Autónoma de Tamaulipas, Tamaulipas, Mexico), Stamatia Rizou (Singular Logic, Athens, Greece), Nandana Mihindukulasooriya (IBM Research, Dublin, Ireland), Fernando Ortiz-Rodríguez (Universidad Autónoma de Tamaulipas, Tamaulipas, Mexico), Ryan McGranaghan (NASA Jet Propulsion Laboratory, California, United States)
https://sites.google.com/view/2nd-nlp4kgc/home

We look forward to your contributions!
Workshop & Tutorial Chairs

Faculty positions available at Queen Mary University of London
by Matthew Purver 13 Jul '23

The School of Electronic Engineering and Computer Science at Queen Mary University of London is currently advertising up to 15 faculty positions at Lecturer (= Assistant Professor) or Senior Lecturer (= Associate Professor) level, seeking candidates with experience and research interests in a range of topics, including generative AI, explainable AI and responsible AI. Please see this link for more information: https://www.qmul.ac.uk/jobs/vacancies/items/8638.html The closing date is 1 August 2023. -- Matthew Purver - http://www.eecs.qmul.ac.uk/~mpurver/ Computational Linguistics Lab - http://compling.eecs.qmul.ac.uk/ Cognitive Science Research Group - http://cogsci.eecs.qmul.ac.uk/ School of Electronic Engineering and Computer Science Queen Mary University of London, London E1 4NS, UK *My working days for QMUL are Monday-Wednesday; responses to mail on other days may be delayed.*

Extended Deadline, CASE @ RANLP 2023: CFP: Automated Extraction of Socio-political Events from Text - CASE @ RANLP 2023
by ali hürriyetoglu 12 Jul '23

Call for workshop papers: the 6th workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text - CASE @ RANLP 2023 ************************************************************************************ URL: https://emw.ku.edu.tr/case-2023/ Paper submission deadline (new): 24 July 2023 Paper acceptance notification: 5 August 2023 Camera-ready deadline: 25 August 2023 Workshop dates: 7-8 September 2023 Softconf page of the workshop: https://softconf.com/ranlp23/CASE/ ************************************************************************************ We invite contributions from researchers in computer science, NLP, ML, DL, AI, socio-political sciences, conflict analysis and forecasting, and peace studies, as well as computational social science scholars involved in the collection and utilization of socio-political event data. Topics include, but are not limited to: 1) Extracting events and their arguments, such as time and location, within and beyond a sentence or document; event coreference resolution. 2) Research in NLP technologies related to event detection: geocoding, temporal reasoning, argument structure detection, syntactic and semantic analysis of event structures, text classification for event type detection, learning event-related lexica, event coreference resolution, fake news analysis, and others, with a focus on real or potential event detection applications. 3) New datasets, training data collection, and annotation for event information. 4) Event-event relations, e.g., subevents, main events, spatio-temporal relations, causal relations. 5) Event dataset evaluation in light of reliability and validity metrics. 6) Defining, populating, and facilitating event schemas and ontologies. 7) Automated tools and pipelines for event collection related tasks. 8) Lexical, syntactic, semantic, discursive, and pragmatic aspects of event manifestation.
9) Methodologies for development, evaluation, and analysis of event datasets. 10) Applications of event databases, e.g., early warning, conflict prediction, and policymaking. 11) Estimating what is missing in event datasets using internal and external information. 12) Detection of new and emerging socio-political event (SPE) types, e.g., creative protests. 13) Release of new event datasets. 14) Bias and fairness of the sources and event datasets. 15) Ethics, misinformation, privacy, and fairness concerns pertaining to event datasets. 16) Copyright issues in event dataset creation, dissemination, and sharing. 17) Cross-lingual, multilingual and multimodal aspects of event analysis. 18) Resources and approaches related to contentious politics around climate change. **** Shared tasks **** Please check the workshop page and the GitHub repositories of the respective tasks for additional details. Task 1 - Multilingual protest news detection: Contact person: Ali Hürriyetoğlu (ali.hurriyetoglu(a)gmail.com), GitHub: https://github.com/emerging-welfare/case-2022-multilingual-event Task 2 - Collecting and Geocoding Armed Clash Events in Russian Ukrainian Conflict: Contact persons: Hristo Tanev (hristo.tanev(a)ec.europa.eu) and Onur Uca (onuruca(a)mersin.edu.tr), GitHub: https://github.com/zavavan/case2023_task2 Task 3 - Event causality identification: Contact person: Fiona Anting Tan (tan.f(a)u.nus.edu), GitHub: https://github.com/tanfiona/CausalNewsCorpus Task 4 - Multimodal Hate Speech Event Detection: Contact person: Surendrabikram Thapa (surendrabikram(a)vt.edu), GitHub: https://github.com/therealthapa/case2023_task4 *** Keynotes *** We will continue our tradition of inviting keynote speakers from both the social and the computational sciences. The social science keynote will be delivered by Erdem Yörük, titled “Using Automated Text Processing to Understand Social Movements and Human Behaviour”, and the computational science keynotes will be delivered by Ruslan Mitkov and Kiril Simov.
Please see the workshop webpage (https://emw.ku.edu.tr/case-2023/) for additional details.

FINAL CALL: ML/NLP Competition on Automatic Classification of Literary Epochs (CoLiE)
by Marina Litvak 12 Jul '23

*** apologies for cross-posting *** ================================================== *FINAL CALL: ML/NLP Competition on Automatic Classification of Literary Epochs (CoLiE)* To advance the field of implicit temporal information retrieval from text, this competition challenges participants to develop automatic methods for identifying the literary epoch of a given text, which is treated here as the implicit temporal context of a book. The task on Automatic Classification of Literary Epochs (CoLiE) aims at automatically identifying the literary epoch of a given text from its writing style: (1) Romanticism (1798-1837), (2) Victorian Literature (1837-1901), (3) Modernism (1900-1945), (4) Postmodernism (1945-2000), and (5) the present day (from 2000). The competition is held as part of the IACT’23 <https://en.sce.ac.il/news/iact23> workshop, which takes place on July 27, 2023, in conjunction with the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. This competition is open to anyone with a passion for information retrieval, machine learning, and natural language processing. Whether you are a seasoned expert or a newcomer to the field, we welcome you to participate and push the boundaries of automated text analysis! Competition site: http://www.kaggle.com/competitions/colie Competition Timeline: - May 28, 2023: The competition opens to participants. Training and validation sets, together with their labels, are available. - July 10, 2023: Test dataset available. - July 17, 2023, 23:59 UTC: Final submission deadline. - July 27, 2023: The winners are announced at a special session of the IACT'23 <https://en.sce.ac.il/news/iact23> workshop. *The organizing team* - Dr. Marina Litvak (marinal(a)ac.sce.ac.il), Software Engineering Department, Shamoon College of Engineering, Beer Sheva, 84100, Israel - Dr. Irina Rabaev (irinar(a)ac.sce.ac.il), Software Engineering Department, Shamoon College of Engineering, Beer Sheva, 84100, Israel - Prof. Ricardo Campos (ricardo.campos(a)ipt.pt), Ci2 - Smart Cities Research Center, Polytechnic Institute of Tomar; INESC TEC, Porto, Portugal - Prof. Alípio Mário Jorge (amjorge(a)fc.up.pt), University of Porto, Porto, Portugal - Prof. Adam Jatowt (adam.jatowt(a)uibk.ac.at), University of Innsbruck, Innsbruck, Austria - Mr. Vladimir Younkin (vladiyo(a)ac.sce.ac.il), Software Engineering Department, Shamoon College of Engineering, Beer Sheva, 84100, Israel -- Best regards, Marina Litvak
