[Apologies for multiple postings]
ImageCLEFmedicalGANs (1st edition)
Registration: https://www.imageclef.org/2023/medical/gans
Run submission: May 10, 2023
Working notes submission: June 5, 2023
CLEF 2023 conference: September 18-21, Thessaloniki, Greece
*** CALL FOR PARTICIPATION ***
The task focuses on examining the hypothesis that GANs generate medical
images containing the "fingerprints" of the real images used to train
the generative network. If the hypothesis is correct, artificial
biomedical images may be subject to the same sharing and usage
limitations as real sensitive medical data. If, on the other hand, the
hypothesis is wrong, GANs could potentially be used to create rich
biomedical image datasets free of ethical and privacy restrictions.
Participants will test the hypothesis by solving one or several tasks
related to detecting relations between real and artificial biomedical
image datasets.
*** TASK ***
Given a set of real-world medical images comprising 2D axial CT image
slices of the heart (including the middle sections and adjacent
slices) of patients afflicted with lung tuberculosis, the task
challenges participants to develop machine learning solutions that
automatically determine which real images were used to train the
generator of realistic synthetic examples.
*** DATA SET ***
The image datasets comprise 2D axial CT image slices of the heart,
including the middle sections of the heart and adjacent slices. The
images are obtained from patients afflicted with lung tuberculosis and
are stored as 8 bit/pixel PNG images of 256x256 pixels. The development
dataset comprises three distinct sets of images: one set of images
generated using a GAN, and two sets of real images. The first real set
contains images that were used during the generator's training; the
second consists of real images that were not used during training. The
test dataset is a collection of two image sets: the first contains
10,000 generated images, while the second is a mix of 200 real images
that were either used or unused during training.
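The membership-detection idea behind the task can be sketched with a simple nearest-neighbour baseline: score each real image by its distance to the closest generated image, on the assumption that memorised training images end up unusually close to some generated output. This is an illustrative sketch only, not an official baseline; the flattened image vectors, toy data, and threshold are assumptions.

```python
import numpy as np

def membership_scores(real_images, generated_images):
    """For each real image (a flat float vector), return the distance to
    its nearest neighbour in the generated set. Low scores hint that the
    generator may have memorised that image."""
    return np.array([
        np.linalg.norm(generated_images - r, axis=1).min()
        for r in real_images
    ])

# Toy stand-ins for 256x256 CT slices: 4 "real" and 3 "generated" vectors.
rng = np.random.default_rng(0)
real = rng.random((4, 8))
# Make the first generated image a near-copy of real[0], mimicking memorisation.
generated = np.vstack([real[0] + 0.01, rng.random((2, 8))])

scores = membership_scores(real, generated)
used = scores < 0.1  # threshold is an arbitrary illustrative choice
```

A real submission would of course compare learned representations (e.g. features from a pretrained network) rather than raw pixels, and would calibrate the threshold on the development sets.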
*** IMPORTANT DATES ***
- Run submission: May 10, 2023
- Working notes submission: June 5, 2023
- CLEF 2023 conference: September 18-21, Thessaloniki, Greece
(https://clef2023.clef-initiative.eu/)
*** OVERALL COORDINATION ***
Serge Kozlovski, Belarusian Academy of Sciences, Belarus
Vassili Kovalev, Belarusian Academy of Sciences, Belarus
Ihar Filipovich, Belarus State University, Belarus
Alexandra Andrei, Politehnica University of Bucharest, Romania
Ioan Coman, Politehnica University of Bucharest, Romania
Bogdan Ionescu, Politehnica University of Bucharest, Romania
Henning Müller, University of Applied Sciences Western Switzerland, Switzerland
*** ACKNOWLEDGEMENT ***
The contributions of Alexandra Andrei, Ioan Coman, Bogdan Ionescu, and
Henning Müller are supported under the H2020 AI4Media project, "A
European Excellence Centre for Media, Society and Democracy", contract
#951911, https://www.ai4media.eu/.
On behalf of the Organizers,
Bogdan Ionescu
https://www.AIMultimediaLab.ro/
3 July 2023 (online, via Zoom)
Organisers: Catherine Travis & Li Nguyen (ANU; Language Data Commons of Australia (LDaCA))
Over decades of work in Australia, significant collections of language data have been amassed, including varieties of Australian English, Australian migrant languages, Australian Indigenous languages, sign languages and others. These collections represent a trove of knowledge not only of language in Australia, but also of Australia's social and cultural history. And yet, not all are well known and many lack published descriptions. The purpose of this workshop is to provide an opportunity to share information about existing language corpora in Australia, with a view to producing a special issue of the Australian Journal of Linguistics that introduces a selection of these corpora, explores how they can contribute to our understanding of language, society, and history in Australia, and considers avenues that such corpora open up for future research.
This workshop is being run as part of the Language Data Commons of Australia (LDaCA), which is working to build national research infrastructure for the Humanities and Social Sciences, facilitating access to and use of digital language corpora for linguists, scholars across the Humanities and Social Sciences, and non-academics.
Abstract submission
For a 20 min presentation, please submit a 250-300 word abstract in English (excluding references). The presentation should include the following information:
· Speech community/fieldsite: Describe the location of the community and/or their brief history in Australia, the languages spoken and their current status.
· Corpus design principles: Specify the sample size, sociolinguistic background of the participants, method of data collection and/or genre (e.g. sociolinguistic interviews, natural conversations, oral histories, elicited data, etc.); data format (written/spoken/audio/video, etc.) and where it is stored.
· Corpus findings and implications: Summarise some key findings from the corpus and discuss other insights that might be obtained from the data in current or future work.
Important dates
22 May Abstracts due
5 June Notification of acceptance
3 July Workshop
How to Submit: Please submit your abstract by 22 May on https://forms.gle/1pwxVVmUV5hCCZ997
Inquiries: Please contact either Catherine Travis or Li Nguyen
CALL FOR PARTICIPATION
IBERLEF 2023 Task - FinancES. Financial Targeted Sentiment Analysis in
Spanish
Training set released!
Held as part of iberLEF 2023 <https://sites.google.com/view/iberlef-2023>,
a shared evaluation campaign for Natural Language Processing (NLP) systems
in Spanish and other Iberian languages
September 26th 2023, Jaén
Codalab link: https://codalab.lisn.upsaclay.fr/competitions/10052
Dear All,
We invite researchers and students to participate in the shared task
FinancES: Financial Targeted Sentiment Analysis in Spanish, held as
part of IberLEF 2023, a shared evaluation campaign for Natural
Language Processing (NLP) systems in Spanish and other Iberian
languages.
This shared task aims to explore targeted sentiment analysis in the
financial domain. Specifically, the approach adopted here is grounded in
the field of microeconomics. In this regard, Bowles (2004) explains the
role of economic agents, that is to say, individuals or organizations
impacting the economy. The author states that the main microeconomic agents
in the capital market are consumers (households/individuals), companies
(firms), governments, and central banks. Consequently, in order to develop
a sentiment analysis method where different viewpoints are considered,
three different perspectives are included: (1) the economic target of
the news item; (2) companies as an individual economic agent; and (3)
consumers as an individual economic agent. Here, the target is the
sector where the economic fact applies, and companies produce the goods
and services that households/individuals consume. From these three
viewpoints, the impact of a news item on the target and on the economic
agents is classified as positive, negative, or neutral. Accordingly,
two tasks are proposed. The first combines the challenges of
aspect-term extraction, for identifying the target entity in a text,
and aspect-based sentiment classification, for determining the
sentiment polarity towards that target. The second is devoted to
assessing the impact of a news headline on the other economic agents,
namely companies and consumers.
The participants will be provided development, development_test, training
and test datasets in Spanish. The dataset for this task is composed of news
headlines written in Spanish collected from digital newspapers specialized
in economic, financial and political news. The dataset is labeled with the
target entity and the sentiment polarity on three dimensions: target,
companies, and consumers. That is, given a headline, it has been manually
classified as positive, neutral, or negative for three specific entities:
(1) target entity (i.e., the specific company or asset where the economic
fact applies), (2) companies (i.e., the entities producing the goods and
services that others consume), and (3) consumers (i.e.,
households/individuals). Each headline was annotated by three members
of the organizing committee. In case of disagreement, the annotators
discussed the case and, if no agreement was reached, the headline was
discarded. In this first step we compiled about 14k headlines;
headlines that were too short or did not specify a target entity were
filtered out. The final dataset is composed of 8k-10k news headlines.
For the shared tasks, training and test sets will be released
(80%-20%). Today we have released the training dataset, which can be
found in the "Files" subsection of the "Participate" tab. Note that it
includes all the instances released during the Practice stage, so there
is no need to combine the two datasets.
Finally, remember that the CodaLab competition remains open for
submitting results on the provided development dataset, which is
available in the same section as the training dataset.
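As a concrete illustration of the three-dimensional labelling described above, here is a minimal sketch of a record layout together with a trivial all-neutral baseline. The field names, the example headline, and its labels are invented for illustration and do not reflect the official data schema.

```python
from dataclasses import dataclass

@dataclass
class Headline:
    """One annotated headline; the three sentiment fields mirror the
    target/companies/consumers dimensions described in the call."""
    text: str
    target_entity: str
    target_sentiment: str     # polarity for the target sector/asset
    companies_sentiment: str  # polarity for companies as an economic agent
    consumers_sentiment: str  # polarity for consumers (households/individuals)

example = Headline(
    text="El precio de la electricidad baja un 10% este mes",
    target_entity="electricidad",
    target_sentiment="negative",    # illustrative labels only
    companies_sentiment="positive",
    consumers_sentiment="positive",
)

def all_neutral_baseline(headlines):
    """Trivial baseline: predict 'neutral' on all three dimensions."""
    return [("neutral", "neutral", "neutral") for _ in headlines]

preds = all_neutral_baseline([example])
```

Such a baseline only serves to anchor the evaluation; the point of the task is precisely that the three dimensions can diverge, as in the example above, where the same headline carries different polarities for different agents.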
Best regards,
The FinancES 2023 organizing committee
References
- Bowles, S. (2004). Microeconomics: Behavior, institutions, and
evolution. Princeton University Press.
Important dates
- Release of development corpora: Feb 13, 2023
- Release of training corpora: Mar 13, 2023
- Release of test corpora and start of evaluation campaign: Apr 17, 2023
- End of evaluation campaign (deadline for runs submission): May 3, 2023
- Publication of official results: May 5, 2023
- Paper submission: May 28, 2023
- Review notification: Jun 16, 2023
- Camera ready submission: Jul 6, 2023
- IberLEF Workshop (SEPLN 2023): Sep 26, 2023
- Publication of proceedings: Sep ??, 2023
Organizing committee
- José Antonio García-Díaz (UMUTeam, Universidad de Murcia)
- Ángela Almela Sánchez-Lafuente (UMUTeam, Universidad de Murcia)
- Francisco García-Sánchez (UMUTeam, Universidad de Murcia)
- Gema Alcaraz-Mármol (UMUTeam, Universidad de Castilla-La Mancha)
- María José Marín (UMUTeam, Universidad de Murcia)
- Rafael Valencia-García (UMUTeam, Universidad de Murcia)
The Chair of Data Science and Natural Language Processing (Prof. Siegfried Handschuh) invites applications for a PhD position as part of the recently granted Swiss National Science Foundation (SNF) research project "Conversational AI: Dialogue-based Adaptive Argumentative Writing Support". You will contribute to its successful implementation, as well as the chair's varied activities in teaching and outreach.
Our research at the Data Science and NLP Chair at ICS-HSG focuses on the cutting-edge field of Natural Language Processing (NLP) and Natural Language Understanding (NLU). Our group delves into various aspects, including conversational AI, Large Language Models (LLM), intricate language analysis, Knowledge Graphs and more. Utilising sophisticated methods and algorithms, we aim to extract deeper knowledge and understanding from vast amounts of textual and auditory data. Our research not only delves into the core principles of NLP but also explores its practical applications across various industries.
Details of the post and how to apply: https://bit.ly/phd_conversational_ai
You have a university-level master's degree in Computer Science, Computational Linguistics, Data Science or a related discipline. You have a strong background in Machine Learning and Natural Language Processing; knowledge of Deep Learning architectures (e.g., Transformer, BERT) and frameworks (e.g., PyTorch, TensorFlow) is a plus. You have good programming skills in Python. Desirable qualifications include one or more of the following areas: Chatbots, Argument Mining and Text Generation. You have excellent written and verbal communication skills in English. You have an analytical, structured and independent working style, as well as initial experience in writing scientific papers.
--
Prof. Dr. Siegfried Handschuh
Full Professor of Data Science and Natural Language Processing
Director, Institute of Computer Science
University of St.Gallen
E-mail: siegfried.handschuh(a)unisg.ch
Hi,
This is an invitation to attend the NLP Challenge Day organized by
CERIST, Algeria, on March 29, 2023.
The link to attend the event is:
https://visioconf.cerist.dz/CERISTNLPChallenge
The program is available at:
http://www.nlpchallenge.cerist.dz/
Thank you.
Best regards.
Hassina Aliane.
Dr. Hassina Aliane, Director of Research
Head of the Natural Language Processing and Digital Content Team,
Director of the Digital Humanities R&D laboratory,
Editor of the Information Processing at The Digital Age Review,
Research Center on Scientific and Technical Information.
The 6th ASAIL workshop, focused on Natural Language Processing for legal texts and co-located with ICAIL 2023 in Braga, Portugal, is coming up soon.
We would like to invite you to submit papers on, and demonstrations of, original work on automated detection, extraction and analysis of semantic information in legal texts.
Submission deadline: 26th April 2023
Workshop date: 23rd June 2023
We are accepting three tiers of papers (two-column format): long (10 pages), short (6 pages), and position (2 pages).
Since we are very interested in sparking discussion around ideas and work in their early stages, short and position papers are particularly welcome.
You can find more information, including the full call for papers, at our website: https://sites.google.com/view/asail/asail-2023-call-for-papers
Best wishes,
Daphne Odekerken
On behalf of the ASAIL Organising Committee
Apologies for cross-posting.
----------------------------------------
We invite proposals for tasks to be run as part of SemEval-2024
<https://semeval.github.io/SemEval2024/>. SemEval (the International
Workshop on Semantic Evaluation) <https://semeval.github.io/> is an ongoing
series of evaluations of computational semantics systems, organized under
the umbrella of SIGLEX <https://siglex.org/>, the Special Interest Group on
the Lexicon of the Association for Computational Linguistics.
SemEval tasks explore the nature of meaning in natural languages: how to
characterize meaning and how to compute it. This is achieved in practical
terms, using shared datasets and standardized evaluation metrics to
quantify the strengths and weaknesses of possible solutions. SemEval tasks
encompass a broad range of semantic topics from the lexical level to the
discourse level, including word sense identification, semantic parsing,
coreference resolution, and sentiment analysis, among others.
For SemEval-2024, we welcome any task that can test an automatic system for
the semantic analysis of text, which could be an intrinsic semantic
evaluation or an application-oriented evaluation. We especially encourage
tasks for languages other than English, cross-lingual tasks, and tasks that
develop novel applications of computational semantics. See the websites of
previous editions of SemEval to get an idea about the range of tasks
explored, SemEval-2022 <https://semeval.github.io/SemEval2022/> and
SemEval-2023 <https://semeval.github.io/SemEval2023/>.
We strongly encourage proposals based on pilot studies that have already
generated initial data, as this can provide concrete examples and can help
to foresee the challenges of preparing the full task. In the event of
receiving many proposals, preference will be given to proposals that have
already run a pilot study.
In case you are not sure whether a task is suitable for SemEval, please
feel free to get in touch with the SemEval organizers at
semevalorganizers(a)gmail.com to discuss your idea.
=== Task Selection ===
Task proposals will be reviewed by experts, and reviews will serve as the
basis for acceptance decisions. Everything else being equal, more
innovative new tasks will be given preference over task reruns. Task
proposals will be evaluated on:
- Novelty: Is the task on a compelling new problem that has not been
explored much in the community? Is the task a rerun, but covering
substantially new ground (new subtasks, new types of data, new languages,
etc.)?
- Interest: Is the proposed task likely to attract a sufficient number
of participants?
- Data: Are the plans for collecting data convincing? Will the resulting
data be of high quality? Will annotations have meaningfully high
inter-annotator agreements? Have all appropriate licenses for use and
re-use of the data after the evaluation been secured? Have all
international privacy concerns been addressed? Will the data annotation be
ready on time?
- Evaluation: Is the methodology for evaluation sound? Is the necessary
infrastructure available or can it be built in time for the shared task?
Will research inspired by this task be able to evaluate in the same manner
and on the same data after the initial task?
- Impact: What is the expected impact of the data in this task on future
research beyond the SemEval Workshop?
=== New Tasks vs. Task Reruns ===
We welcome both new tasks and task reruns. For a new task, the proposal
should address whether the task would be able to attract participants.
Preference will be given to novel tasks that have not received much
attention yet.
For reruns of previous shared tasks (whether or not the previous task was
part of SemEval), the proposal should address the need for another
iteration of the task. Valid reasons include: a new form of evaluation
(e.g. a new evaluation metric, a new application-oriented scenario), new
genres or domains (e.g. social media, domain-specific corpora), or a
significant expansion in scale. We further discourage carrying over a
previous task and just adding new subtasks, as this can lead to the
accumulation of too many subtasks. Evaluating on a different dataset with
the same task formulation, or evaluating on the same dataset with a
different evaluation metric, typically should not be considered a separate
subtask.
=== Task Organization ===
We welcome people who have never organized a SemEval task before, as well
as those who have. Apart from providing a dataset, task organizers are
expected to:
- Verify the data annotations have sufficient inter-annotator agreement
- Verify licenses for the data to allow its use in the competition and
afterwards. In particular, text that is publicly available online is not
necessarily in the public domain; unless a license has been provided, the
author retains all rights associated with their work, including copying,
sharing and publishing. For more information, see:
https://creativecommons.org/faq/#what-is-copyright-and-why-does-it-matter
- Resolve any potential security, privacy, or ethical concerns about the
data
- Make the data available in a long-term repository under an appropriate
license, preferably using Zenodo: https://zenodo.org/communities/semeval/
- Provide task participants with format checkers and standard scorers.
- Provide task participants with baseline systems to use as a starting
point (in order to lower the obstacles to participation). A baseline system
typically contains code that reads the data, creates a baseline response
(e.g. random guessing, majority class prediction), and outputs the
evaluation results. Whenever possible, baseline systems should be written
in widely used programming languages and/or should be implemented as a
component for standard NLP pipelines.
- Create a mailing list and website for the task and post all relevant
information there.
- Create a CodaLab or other similar competition for the task and upload the
evaluation script.
- Manage submissions on CodaLab or a similar competition site.
- Write a task description paper to be included in SemEval proceedings, and
present it at the workshop.
- Manage participants' submissions of system description papers, manage
participants' peer review of each other's papers, and possibly shepherd
papers that need additional help in improving the writing.
- Review other task description papers.
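As an example of the format checkers mentioned above, here is a minimal sketch that validates a tab-separated submission file; the two-column `id<TAB>label` layout and the label set are illustrative assumptions, not a SemEval-mandated format.

```python
import csv
import io

def check_submission(text,
                     expected_labels=frozenset({"positive", "neutral", "negative"})):
    """Return a list of error messages for lines that are not
    `id<TAB>label` with a label from the expected set (empty list = OK)."""
    errors = []
    for lineno, row in enumerate(csv.reader(io.StringIO(text), delimiter="\t"), 1):
        if len(row) != 2:
            errors.append(f"line {lineno}: expected 2 columns, got {len(row)}")
        elif row[1] not in expected_labels:
            errors.append(f"line {lineno}: unknown label {row[1]!r}")
    return errors
```

Distributing a checker like this alongside the scorer lets participants catch malformed runs before the submission deadline rather than after.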
=== Important dates ===
- Task proposals due April 17, 2023 (Anywhere on Earth)
- Task selection notification May 22, 2023
=== Preliminary timetable ===
- Sample data ready July 15, 2023
- Training data ready September 1, 2023
- Evaluation data ready December 1, 2023 (internal deadline; not for public
release)
- Evaluation starts January 10, 2024
- Evaluation end by January 31, 2024 (latest date; task organizers may
choose an earlier date)
- Paper submission due February 2024
- Notification to authors in March 2024
- Camera-ready due April 2024
- SemEval workshop Summer 2024 (co-located with a major NLP conference)
Tasks that fail to keep up with crucial deadlines (such as the dates for
having the task and CodaLab website up and dates for uploading samples,
training, and evaluation data) may be cancelled at the discretion of
SemEval organizers. While consideration will be given to extenuating
circumstances, our goal is to provide sufficient time for the participants
to develop strong and well-thought-out systems. Cancelled tasks will be
encouraged to submit proposals for the subsequent year’s SemEval. To reduce
the risk of tasks failing to meet the deadlines, we are unlikely to accept
multiple tasks with overlap in the task organizers.
=== Submission Details ===
The task proposal should be a self-contained document of no longer than 3
pages (plus additional pages for references). All submissions must be in
PDF format, following the ACL template
<https://github.com/acl-org/acl-style-files>.
Each proposal should contain the following:
- Overview
- Summary of the task
- Why this task is needed and which communities would be interested in
participating
- Expected impact of the task
- Data & Resources
- How the training/testing data will be produced. Please discuss whether
existing corpora will be re-used.
- Details of copyright, so that the data can be used by the research
community both during the SemEval evaluation and afterwards
- How much data will be produced
- How data quality will be ensured and evaluated
- An example of what the data would look like
- Resources required to produce the data and prepare the task for
participants (annotation cost, annotation time, computation time, etc.)
- Assessment of any concerns with respect to ethics, privacy, or security
(e.g. personally identifiable information of private individuals; potential
for systems to cause harm)
- Pilot Task (strongly recommended)
- Details of the pilot task
- What lessons were learned and how these will impact the task design
- Evaluation
- The evaluation methodology to be used, including clear evaluation
criteria
- For Task Reruns
- Justification for why a new iteration of the task is needed (see
criteria above)
- What will differ from the previous iteration
- Expected impact of the rerun compared with the previous iteration
- Task organizers
- Names, affiliations, email addresses
- (optional) brief description of relevant experience or expertise
- (if applicable) years and task numbers, of any SemEval tasks you have
run in the past
Proposals will be reviewed by an independent group of area experts who may
not have familiarity with recent SemEval tasks, and therefore all proposals
should be written in a self-explanatory manner and contain sufficient
examples.
The submission webpage is:
https://openreview.net/group?id=aclweb.org/ACL/2023/Workshop/SemEval
=== Chairs ===
Atul Kr. Ojha, SFI Insight Centre for Data Analytics, DSI, University of
Galway
A. Seza Doğruöz, Ghent University
Giovanni Da San Martino, University of Padua
Harish Tayyar Madabushi, The University of Bath
Ritesh Kumar, Dr. Bhimrao Ambedkar University
Contact: semevalorganizers(a)gmail.com
*** LREC-COLING 2024 Announcement ***
LREC-COLING 2024 - The 2024 Joint International Conference on
Computational Linguistics, Language Resources and Evaluation
Lingotto Conference Centre - Turin (Italy)
20-25 May, 2024
Conference website: https://lrec-coling-2024.lrec-conf.org/
Twitter: @LrecColing2024
Two key international players in the area of computational linguistics,
the ELRA Language Resources Association (ELRA) and the International
Committee on Computational Linguistics (ICCL), are joining forces to
organize the 2024 Joint International Conference on Computational
Linguistics, Language Resources and Evaluation (LREC-COLING 2024), to
be held in Turin (Italy) on 20-25 May, 2024.
The hybrid conference will bring together researchers and practitioners
in computational linguistics, speech, multimodality, and natural
language processing, with special attention to evaluation and the
development of resources that support work in these areas. Following in
the tradition of the well-established parent conferences COLING and
LREC, the joint conference will feature grand challenges and provide
ample opportunity for attendees to exchange information and ideas
through both oral presentations and extensive poster sessions,
complemented by a friendly social program.
The three-day main conference will be accompanied by a total of three
days of workshops and tutorials held in the days immediately before and
after.
*General Chairs*
Nicoletta Calzolari, CNR-ILC, Pisa
Min-Yen Kan, National University of Singapore
*Advisors to General Chairs*
Chu-Ren Huang, The Hong Kong Polytechnic University
Joseph Mariani, LISN-CNRS, Paris-Saclay University
*Programme Chairs*
Veronique Hoste, Ghent University
Alessandro Lenci, University of Pisa
Sakriani Sakti, Japan Advanced Institute of Science and Technology
Nianwen Xue, Brandeis University
*Management Chair*
Khalid Choukri, ELDA/ELRA, Paris
*Local Chairs*
Valerio Basile, University of Turin
Cristina Bosco, University of Turin
Viviana Patti, University of Turin
Job advertisement!
TurkuNLP (Natural Language Processing) is a multidisciplinary research group combining NLP and digital linguistics. We develop machine learning methods and tools to automatically process and understand text data, and we apply these to explore human interaction, communication and language use in very large digital text datasets, such as those automatically crawled from the internet and historical text collections.
We invite applications for postdoctoral researcher positions. The postdocs recruited will work within our research projects on web-as-corpus research, corpus linguistics and NLP, on topics such as human diversity, multilingual modeling of web genres (registers), and semantic search.
For more details and to leave an application, please see job ID 14647 at https://www.utu.fi/en/university/come-work-with-us/open-vacancies and visit our websites at turkunlp.org and https://sites.utu.fi/humandiversity/. I am also happy to answer any questions you might have, please don't hesitate to contact me!
The postdocs are expected to begin their employment on 1 May 2023, or as soon as possible thereafter by agreement.
Best regards,
Veronika Laippala
Dear list members,
I am delighted to announce the latest publication in the Elements in Corpus Linguistics series, published by Cambridge University Press. The title is "Corpus-Assisted Discourse Studies", and the authors are Mathew Gillings, Gerlinde Mautner and Paul Baker. This Element is now available FREE until 4 April 2023 at the following URL:
https://www.cambridge.org/core/search?q=9781009168151
Here is a summary of the Element:
"The breadth and spread of corpus-assisted discourse studies (CADS) indicate its usefulness for exploring language use within a social context. However, its theoretical foundations, limitations, and epistemological implications must be considered so that we can adjust our research designs accordingly. This Element offers a compact guide to which corpus linguistic tools are available and how they can contribute to finding out more about discourse. It will appeal to researchers both new and experienced, within the CADS community and beyond."
Best wishes
Susan Hunston (Series Editor)
Professor Susan Hunston (she/her)
Department of English Language and Linguistics
University of Birmingham
Birmingham B15 2TT
UK
(+44) 0121 414 5675
s.e.hunston(a)bham.ac.uk