February 2023 - Corpora

Invitation to the 2023 Joint International Conference (Seoul, South Korea, 6-8 July 2023)
by Prof CK Jung 06 Feb '23

Dear Colleagues,

My name is CK Jung and I’m Director of the Institute for Corpus Research at Incheon National University. I will be President of the Korea Association of Secondary English Education (KASEE) from 1st March 2023, and I would like to introduce you to the 2023 Joint International Conference on English Language Teaching in Korea.

The conference takes place at Konkuk University, Seoul, South Korea, from 6 to 8 July. Our plenary speakers are:
- Tony McEnery (Lancaster University, UK)
- Joan Kelly Hall (Penn. State University, USA)
- Julio C. Rodriguez (University of Hawai‘i at Mānoa, USA)
- Kazuya Saito (University College London, UK)
- Yuko Goto Butler (University of Pennsylvania, USA)
- Youngju Yi (The Ohio State University, USA)

If you’re interested in presenting a paper or poster (we welcome all research areas), please visit the following website and fill out the online registration form by 13 February (we only need your presentation title at this time): https://docs.google.com/forms/d/e/1FAIpQLSevA4eQyAN5lnLeQFRQF4d3g5sUllRQdzo…

*Please note that it is very important to choose ‘KASEE’ in the ‘Affiliated Association (소속학회) (Choose One)’ section while you’re filling out the form.*

If you would like to know more about the conference, please visit http://jointconference2023.com

Looking forward to seeing you in Seoul in July.

Best regards
CK Jung

---
*CK Jung BEng(Hons) Birmingham MSc Warwick EdD Warwick Cert Oxford*
Department of English Language and Literature, Incheon National University, *South Korea*
Director | Institute for Corpus Research, Incheon National University, *South Korea* (http://icr.or.kr)
Editor | Asia Pacific Journal of Corpus Research, ICR, *International* (http://icr.or.kr/apjcr)
Editorial Board | Corpora, Edinburgh University Press, *UK*
Editorial Board | English Today, Cambridge University Press, *UK*
E: ckjung(a)inu.ac.kr / T: +82 (0)32 835 8129
H(EN): http://ckjung.org

Deadline Extended: Special Issue on The Role of Context in Neural Machine Translation Systems and its Evaluation
by Knowles, Rebecca 02 Feb '23

*Apologies for cross-posting*

Deadline Extended: Special Issue on The Role of Context in Neural Machine Translation Systems and its Evaluation in Natural Language Engineering

Submission deadline extended to 15 Feb., 2023.

Guest editors:
- Sheila Castilho (The ADAPT Centre, School of Applied Languages and Intercultural Studies, Dublin City University)
- Rebecca Knowles (National Research Council Canada)

For this special issue, we invite the submission of papers focusing on the variety of novel implementations of context into neural machine translation systems as well as novel approaches to its evaluation. Recent claims that machine translation systems are reaching (near) human parity at the sentence level have been followed by subsequent analyses that indicate remaining gaps in translation quality at the document level. How best to evaluate machine translation at the document level (and what exactly constitutes document-level evaluation) remains an open question. At the same time, there is work seeking to add discourse and context into neural machine translation systems. Papers that focus on topics of context in neural machine translation, machine translation evaluation, or both are welcome.

For full details, see: https://sites.google.com/dcu.ie/nlecontextnmt/home

Topics of interest include, but are not limited to:
- Novel language processing techniques for implementing discourse in NMT systems
- Document-level NMT and evaluation
- Use of target and source context
- Context-aware techniques for quality evaluation
- Context-aware automatic and human evaluation metrics
- The size and composition of the training data and its effect on context-aware systems
- The effect of the quality of training data and test sets on context-aware systems
- Translationese and its effect on document-level training
- Lexical diversity and lexical density in discourse NMT
- Discourse NMT for different domains

Publication Timeline:
- Article deadline submission: now 15 February 2023
- Return of reviews to contributors: 1 April 2023
- Revised articles deadline submission: 1 May 2023
- Return of second reviews to contributors (if applicable): 1 July 2023
- Final Submission: 15 September 2023
- Publication: November 2023 / January 2024

Format and Submission:
Typical submissions will be 12-25 pages in length. Authors should follow the "Author Instructions" section on the journal website: https://www.cambridge.org/core/journals/natural-language-engineering/inform…
We highly recommend using the LaTeX template found under "Preparing your materials" at the link above. All manuscripts must be submitted online via the NLE ScholarOne website: http://mc.manuscriptcentral.com/nle. Under "Special Issue Designation", choose "The Role of Context in Neural Machine Translation Systems and its Evaluation".

Queries:
Any queries related to this special issue should be addressed to sheila.castilho(a)dcu.ie with NLE-ContextNMT in the subject line.

JLC 2023: call for papers with new submission deadline
by Marie-Paule Jacques 02 Feb '23

Hello,

Following several requests, the deadline for submitting a proposal to the next Journées de la Linguistique de Corpus has been postponed. Please find the updated call below.

*NEW DEADLINE*

* The 11th International Conference on Corpus Linguistics (JLC2023) *
3-6 July 2023, Grenoble, France
https://jlc2023.sciencesconf.org/
* Call for Papers *

The International Conference on Corpus Linguistics (JLC), founded by Geoffrey Williams in 2001 at the University of South Brittany, Lorient, France, regularly draws together an interdisciplinary community whose research focus is corpus linguistics. After seven gatherings in Lorient and an interlude in Orléans in 2015 (8th International Conference on Corpus Linguistics), the conference moved to Grenoble in early July 2017 and again in November 2019, organized by the LIDILEM Laboratory with contributions from LIG, ILCEA4, Litt&Arts and the MSH-Alpes. Université Grenoble Alpes is honored to host this international conference again from July 3rd to July 6th 2023. JLC2023 is organized in collaboration with other labs from French universities (Lyon, Montpellier, Toulouse): DDL, ICAR, Praxiling, CLLE.

The objective of JLC2023 is to (re)unite a community that adopts various approaches, be they methodological or disciplinary, to promote corpus linguistics, and to contribute to the evolution of practices in the field by building bridges between different approaches to digital corpora. Participants are invited to share and compare their knowledge of tools, experiences, and findings, in every field in which corpora are used.

In the tradition of previous conferences, the JLC in Grenoble will offer three days of presentations, guest speakers, poster sessions and discussion sessions among the participants. Training sessions on corpus tools and methods, particularly for teaching purposes, will be organized over a half day.

This edition of the JLC will put a particular focus on corpora and didactics, and part of the conference will be specifically dedicated to this theme. We expect papers that show and question the use of corpora in teaching, be they feedback from real uses, presentations of methodological approaches for various audiences, or more theoretical points of view.

The conference will not be limited to this theme and is open to all kinds of contributions on written, oral or multimodal corpora, including but not limited to:
1. Linguistic approaches to corpora
2. Methods and tools
3. Variations, genres, and discourse
4. Applications and uses of corpora for teaching and learning, translation, terminology...

*Guest speakers include*: Florence Mourlhon-Dallies (Université Paris Cité), Jérôme Jacquin (Université de Lausanne)

Submissions for a presentation or a demonstration, in French or English, should not exceed three pages (excluding figures and bibliographic references) and must be anonymous. Each submission will be reviewed by two members of the scientific board. Submissions are managed through the SciencesConf system. In addition to classic presentations, you may also propose a demonstration (identical submission guidelines).

Publication: following the conference, authors are welcome to submit an article. This collection of articles will be reviewed and published online.

Timetable:
1. First CFP: November 2022
2. Submission deadline: *Friday February 17th 2023* (extended from Friday February 3rd 2023)
3. Notification of acceptance: Mid-April 2023
4. Final submission version: Friday May 19th 2023
5. Registration begins: May 2023

--
Marie-Paule Jacques
/Mobilized in defense of public higher education and research/
Maître de conférences HDR, Sciences du langage - Senior Lecturer in Linguistics
INSPE and LIDILEM (Laboratoire de linguistique et didactique des langues étrangères et maternelles)
Université Grenoble Alpes

GUM Corpus V9 - new data and annotations
by Amir Zeldes 02 Feb '23

(Apologies for cross-postings)

*** The GUM Corpus - Release 9.0.0 ***
*** Georgetown University Multilayer corpus ***

Corpling@GU <https://gucorpling.org/corpling/> is happy to announce the first release of series 9 of the Georgetown University Multilayer corpus (GUM V9.0.0): https://gucorpling.org/gum/

New in this version:
- 20 new documents added, including more conversational data (total tokens: 203,879)
- Abstractive summaries for each document
- Annotations for salient/non-salient entities in each document
- Foreign language tags to identify individual source languages where relevant
- New, easier process for reconstructing Reddit text data
- Many corrections to all annotation layers

GUM is an open source corpus of richly annotated English texts from multiple genres: academic, bio, conversation, fiction, interview, news, speeches, textbooks, travel, vlogs, how-to and Reddit forum discussions. The corpus is created by students as part of the Computational Linguistics curriculum at Georgetown University and is available under Creative Commons licenses.

This is the first version of GUM series 9, containing roughly 200K tokens annotated for:
- Multiple POS tags (100% manual gold PTB, extended PTB, converted CLAWS5 and UPOS) and UD morphological features
- Manually corrected lemmatization
- Sentence segmentation and rough speech act (manual)
- Document structure using TEI tags (paragraphs, headings, figures, captions etc., all manual)
- Constituent and dependency syntax (manually corrected Universal Dependencies, and PTB parses from gold tags with function labels)
- Information status (given-active/inactive, accessible-inferable/common ground/aggregate, and new)
- Entity type, salience and coreference annotation (including non-named entities, singletons, appositions, cataphora and several types of bridging)
- Entity linking (Wikification) of all named entities with Wikipedia articles, including their non-named and pronominal mentions
- Discourse parses in Rhetorical Structure Theory and discourse dependencies
- Abstractive summaries

Note on Reddit data: token text is not contained in the release but can be downloaded with an included script.

For more information and to search or download the corpus online, see the corpus website <https://gucorpling.org/gum/>.

Best wishes,
The GUM team

[Touché@CLEF 2023] Intra-Multilingual Stance Classification in Online Debates — Call for Participation (apologies for cross-posting)
by valbarrierepro＠gmail.com 02 Feb '23

We invite you to participate in our multilingual stance classification shared task, as part of the Touché Lab, which will be held in conjunction with the CLEF'23 conference in Thessaloniki, Greece [1].

Context:
Participatory democracy at the scale of a continent like Europe brings many difficulties due to the high diversity of languages and cultures. At the same time, machine learning is an interesting tool for stance recognition in a large-scale context, in terms of data size, but also regarding the topics and themes addressed or the languages employed by the participants. Public consultations of citizens using online participatory democracy platforms offer this kind of setting and are good use cases for automatic stance recognition systems.

In the context of the Touché Lab at CLEF 2023 [2], we are proposing a shared task on data coming from the platform used during the Conference on the Future of Europe [3], which was inaugurated in 2021, where users can submit proposals and comment on them in any of the 24 official EU languages. A particularity of this platform is its use of a machine translation system that lets users interact with each other in their native languages, leading to what we call Intra-Multilingual data: pairs of proposal and comment in different languages.

[1] https://clef2023.clef-initiative.eu/
[2] https://touche.webis.de/
[3] https://futureu.europa.eu/

Tasks:
Given a proposal on a socially important issue, the task is to classify whether a comment is in favor, against, or neutral towards the proposal.
- Subtask 1: Cross-debate Stance Classification
- Subtask 2: All-data-available Classification

Learn more about this and other argumentation- and causality-related tasks at https://touche.webis.de/
Data available at https://touche.webis.de/clef23/touche23-web/multilingual-stance-classificat…
Register via the CLEF website: https://clef2023-labs-registration.dei.unipd.it/

-------------------------------------------------------------------------------
Important Dates
-------------------------------------------------------------------------------
- Now open: Registration
- Jan. 15, 2023: Development data available
- April 30, 2023: Test data available
- May 2, 2023: Approaches submission on the test data
- June 5, 2023: Participant paper submission
- July 7, 2023: Camera-ready participant papers submission
- Sep. 18-21, 2023: Conference
- One of the conference days: Touché Workshop on Argument and Causal Retrieval

-------------------------------------------------------------------------------
Special Announcements
-------------------------------------------------------------------------------
Touché Open Source Proceedings: Touché will host a collection of software developed by participants on GitHub. The Touché team invites you to publish your software too, and invites software submissions using TIRA [ https://www.tira.io/ ].

In case of questions or suggestions, please reach us at touche(a)webis.de.

Best regards,
CoFE Team @ Touché

Second Call For Papers: The Fourth Workshop on Insights from Negative Results (co-located with EACL 2023)
by Shabnam Tafreshi 02 Feb '23

Dear colleagues,

The Fourth Workshop on Insights from Negative Results in NLP
Co-located with EACL, May 5 or 6, 2023
First Call for Participation

Insights Website: https://insights-workshop.github.io/
Contact email: insights-workshop-organizers(a)googlegroups.com

* Overview
Publication of negative results is difficult in most fields, but in NLP the problem is exacerbated by the near-universal focus on improvements in benchmarks. This situation implicitly discourages hypothesis-driven research, and it turns creation and fine-tuning of NLP models into art rather than science. Furthermore, it increases the time, effort, and carbon emissions spent on developing and tuning models, as the researchers have no opportunity to learn what has already been tried and failed.

This workshop invites both practical and theoretical unexpected or negative results that have important implications for future research, highlight methodological issues with existing approaches, and/or point out pervasive misunderstandings or bad practices. In particular, the most successful NLP models currently rely on different kinds of pretrained meaning representations (from word embeddings to Transformer-based models like BERT and GPT-3). To complement all the success stories, it would be insightful to see where and possibly why they fail. Any NLP tasks are welcome: sequence labeling, question answering, inference, dialogue, machine translation - you name it.

A successful negative results paper would contribute one of the following:
** broadly applicable recommendations for training/fine-tuning, especially if X that didn’t work is something that many practitioners would think reasonable to try, and if the demonstration of X’s failure is accompanied by some explanation/hypothesis;
** ablation studies of components in previously proposed models, showing that their contributions are different from what was initially reported;
** datasets or probing tasks showing that previous approaches do not generalize to other domains or language phenomena;
** trivial baselines that work suspiciously well for a given task/dataset;
** cross-lingual studies showing that a technique X is only successful for a certain language or language family;
** experiments on (in)stability of the previously published results due to hardware, random initializations, preprocessing pipeline components, etc;
** theoretical arguments and/or proofs for why X should not be expected to work;
** demonstration of issues with data processing/collection/annotation pipelines, especially if they are widely used;
** demonstration of issues with evaluation metrics (e.g. accuracy, F1 or BLEU), which prevent their usage for fair comparison of methods.

* Important Dates
** Submission due: February 13, 2023
** Submission due for papers reviewed through ACL Rolling Review: March 17, 2023
** Notification of acceptance: March 13, 2023
** Camera-ready papers due: March 27, 2023
** Workshop: May 5 or 6, 2023

* Submission
Submission is electronic, using the Softconf START conference management system.
Submission link: https://softconf.com/eacl2023/insights2023/

The workshop will accept short papers (up to 4 pages, excluding references), as well as 1-2 page non-archival abstract submissions for papers published elsewhere (e.g. in one of the main conferences or in non-NLP venues). The goal of this event is to stimulate a meaningful community-wide discussion of the deep issues in NLP methodology, and the authors of both types of submissions will be welcome to take part in our get-togethers.

The workshop will run its own review process, and papers can be submitted directly to the workshop by Feb 13, 2023. It is also possible to submit a paper accompanied by reviews from the ACL Rolling Review system by March 17, 2023. The submission deadline for ARR papers follows the ACL RR calendar.

Both research papers and abstracts must follow the ACL two-column format. Official style sheets:
https://www.overleaf.com/read/crtcwgxzjskr
https://github.com/acl-org/ACLPUB/tree/master/templates
Please do not modify these style files or use templates designed for other conferences. Submissions that do not conform to the required styles, including paper size, margin width, and font size restrictions, will be rejected without review.

* Multiple Submission Policy
The workshop cannot accept work for publication or presentation that will be (or has been) published elsewhere, or that has been or will be submitted to other meetings or publications whose review periods overlap with that of Insights. Any questions regarding submissions can be sent to insights-workshop-organizers(a)googlegroups.com.

If the paper has been rejected from another venue, the authors will have the option to provide the original reviews and the author response. The new reviewers will not have access to this information, but the organizers will be able to take into account the fact that the paper has already been revised and improved.

* Anonymity Period
We are not enforcing any anonymity period.

* Presentation
All accepted papers must be presented at the workshop to appear in the proceedings. Authors of accepted papers must notify the program chairs by the camera-ready deadline if they wish to withdraw the paper. At least one author of each accepted paper must register for the workshop. Previous presentations of the work (e.g. preprints on arXiv.org) should be noted in a footnote in the camera-ready version (but not in the anonymized version of the paper).

The workshop will take place on May 5 or 6, 2023. It will be hybrid, with both in-person and virtual presentations.

* Organization Committee
** Shabnam Tafreshi, University of Maryland: ARLIS
** Arjun Reddy Akula, Google
** João Sedoc, New York University
** Anna Rogers, University of Copenhagen
** Aleksandr Drozd, RIKEN
** Anna Rumshisky, University of Massachusetts Lowell / Amazon Alexa

* Contact info
Any questions regarding the workshop can be sent to insights-workshop-organizers(a)googlegroups.com.

Please continue reading about Authorship, Citation and Comparison, Ethics Policy, Reproducibility, Anonymity Period, and Presentation on the call for papers page on our website: https://insights-workshop.github.io/2023/cfp/

Regards,
Insights 2023 Organizers

--
*Shabnam Tafreshi, PhD*
*Assistant Research Scientist*
*Computational Linguistics, NLP*
*UMD: ARLIS @ College Park*
*"All the problems of the world could be settled easily, if people were only willing to think."*
*-Thomas J. Watson*

[job] PhD position on NLP for video subtitling at U. of Amsterdam
by vlad.niculae.uva＠gmail.com 02 Feb '23

Fully funded 4-year PhD position on NLP for video subtitling at the University of Amsterdam, Language Technology Lab. This is a collaboration with RTL and part of the LTP ROBUST program. The call text is below my signature, mirrored from the official listing: https://vacatures.uva.nl/UvA/job/PhD-Candidate-in-Natural-Language-Processi…

Apply, only through the link above, before Feb 24. For more context, see also my web site https://vene.ro/jobs.html. For further questions, don’t hesitate to e-mail me; please include [PhD 11053] in the subject line so my filters can catch your email.

Vlad Niculae [he/him]
Asst. Prof. @ LTL, IvI, University of Amsterdam
https://vene.ro

---

PhD Candidate in Natural Language Processing for Video Subtitling

Faculty/Department: Faculty of Science (Faculteit der Natuurwetenschappen, Wiskunde en Informatica)
Education level: Master
Position type: PhD position
Closing date: 24 February 2023
Vacancy number: 11053

We are inviting applications for a fully-funded, four-year PhD position in natural language processing for video subtitling. This is a collaboration between core Computer Science, Science, Technology, and Social Studies.

Are you eager to work on applied research models for accessibility language technologies? Do you want to research controllability of language generation for generating adequate, appropriate, and faithful subtitles? This position might be the one for you!

What are you going to do?

You will be embedded in the Language Technology Lab (LTL) under the supervision of Dr. Vlad Niculae and lead a project to investigate and improve NLP generative models for semi-automatic subtitling for Dutch and English television and video-on-demand. As captions provide access to information to many, high quality and unbiased performance are of critical societal importance. Powerful speech recognition systems are available today and provide a solid basis, but do not solve subtitling.
We aim towards a subtitling system that is:
- contextualized: it uses speaker identity, available scripts, and visual cues for improved accuracy;
- machine-in-the-loop: it quantifies its own uncertainty, giving control to expert human operators;
- faithful: it maintains good performance across languages, topics, and speaker identities (such as gender, age, region).

The PhD position will be part of the large LTP ROBUST program “Trustworthy AI-based Systems for Sustainable Growth” consortium, comprising 17 universities, 19 industry partners, and 15 collaborating partners representing diverse stakeholder groups. You will gain valuable experience working with an industry partner and will be able to tap into a wealth of networking, career development, and training opportunities in conjunction with ICAI, the Innovation Center for Artificial Intelligence at the University of Amsterdam.

You will be part of one of the 17 new ICAI labs, named TAIM (Trustworthy AI for Media Lab), consisting of 5 PhD students, who will collaborate on developing methods, metrics and tools to evaluate and improve diversity and inclusion in media.

Tasks and responsibilities

With our help and support, you will:
- innovate in research on contextual, uncertainty-aware, faithful generative models of language for subtitling;
- deploy prototypes and evaluate subtitling in the applied setting of RTL;
- complete and defend your PhD thesis;
- become an active participant in the research community and collaborate within and outside the TAIM lab and the Language Technology Lab;
- publish and present work regularly at international conferences, workshops, and journals;
- assist in educational tasks (labs/tutorials, supervising Bachelor and Master projects).

Additionally, you will have the opportunity to closely collaborate with a leading entertainment brand, RTL. You are expected to work at their premises one day per week in Hilversum and one day remotely, making use of their resources and deployment context.
We care strongly about respecting work-life balance and contractual hours.

What do you have to offer?

Your experience and profile:
- A Master’s degree (completed or near completion) with a thesis in Natural Language Processing, Machine Learning, Computer Science, or similar relevant areas;
- Serious interest in pursuing fundamental research with concrete applications;
- A good background in Natural Language Processing, Machine Learning, and Deep Learning;
- Advanced programming skills;
- Professional command of the English language;
- A commitment to maintaining an inclusive, collaborative, diverse, and supportive work environment.

Interdisciplinary collaborations and backgrounds are appreciated, especially in fields related to linguistics and communication science. Experience with using subtitles or similar accessibility language technologies is a plus.

If this describes you, we encourage your application. If you are interested but unsure if you are qualified, please contact Dr. Vlad Niculae before applying. If your Master’s degree is near completion, it must be completed before the start date.

Knowledge of the Dutch language is not required for this position, but can help both for living in Amsterdam and for a good understanding of the video content. The UvA provides the opportunity to attend Dutch language classes.

Our offer

A temporary contract for 38 hours per week for the duration of 4 years (the initial contract will be for a period of 18 months and after satisfactory evaluation it will be extended for a total duration of 4 years). The preferred starting date is April 2023. Your work should lead to a dissertation (PhD thesis). We will draft an educational plan that includes attendance of courses and (international) meetings. We also expect you to assist in teaching undergraduates and master students.

The gross monthly salary, based on 38 hours per week and dependent on relevant experience, ranges from € 2,541 in the first year to € 3,247 in the last year (scale P).
UvA additionally offers an extensive package of secondary benefits, including 8% holiday allowance and a year-end bonus of 8.3%. The UFO profile PhD Candidate is applicable. A favourable tax agreement, the ‘30% ruling’, may apply to non-Dutch applicants. The Collective Labour Agreement of Universities of the Netherlands is applicable.

Besides the salary and a vibrant and challenging environment at Science Park, we offer you multiple fringe benefits:
- 232 holiday hours per year (based on fulltime) and extra holidays between Christmas and 1 January;
- Multiple courses to follow from our Teaching and Learning Centre;
- A complete educational program for PhD students;
- Multiple courses on topics such as leadership for academic staff;
- Multiple courses on topics such as time management and handling stress, and an online learning platform with 100+ different courses;
- 7 weeks birth leave (partner leave) with 100% salary;
- Partly paid parental leave;
- The possibility to set up a workplace at home;
- A pension at ABP for which UvA pays two thirds of the contribution;
- The possibility to follow courses to learn Dutch;
- Help with housing for a studio or small apartment when you’re moving from abroad.

If you are curious to read more about our extensive package of secondary employment benefits, see the official listing linked above.

About us

The University of Amsterdam is the Netherlands' largest university, offering the widest range of academic programmes. At the UvA, 42,000 students, 6,000 staff members and 3,000 PhD candidates study and work in a diverse range of fields, connected by a culture of curiosity.

The Faculty of Science has a student body of around 8,000, as well as 1,800 members of staff working in education, research or support services. Researchers and students at the Faculty of Science are fascinated by every aspect of how the world works, be it elementary particles, the birth of the universe or the functioning of the brain.
The mission of the Informatics Institute (IvI) is to perform curiosity-driven and use-inspired fundamental research in Computer Science. The main research themes are Artificial Intelligence, Computational Science and Systems and Network Engineering. Our research involves complex information systems at large, with a focus on collaborative, data-driven, computational and intelligent systems, all with a strong interactive component. The Language Technology Lab (LTL) is a research group focusing on information access from natural language data. Our work ranges from basic research in natural language processing to key applications in human language technology, and covers areas such as machine translation, summarization, question answering, language modeling, and image captioning. LTL positions itself primarily in the AI research theme, with some links to the Data Science theme of the Informatics Institute. You will be part of one of the 17 new ICAI labs, named TAIM (Trustworthy AI for Media Lab), consisting of 5 PhD students who will collaborate on developing methods, metrics and tools to evaluate and improve diversity and inclusion in media. You are joining a unique team that also includes the Department of Advanced Computing Sciences at Maastricht University (UM) and the media and entertainment company RTL Nederland. The TAIM lab will bring together two of the strongest groups on personalization and recommender systems in the Netherlands (UM and UvA) with a leading media organization (RTL) to develop trustworthy and personalized media. The lab will focus on the development of media that is inclusive, informed by democratic norms, and in line with RTL’s values to represent, and give a voice to, all of the Netherlands in the design of their personalization algorithms. Want to know more about our organisation? Read more about working at the University of Amsterdam. Any questions? Do you have any questions or do you require additional information? Please contact: Dr. Vlad Niculae, Assistant Professor.
Job application If you feel the profile fits you, and you are interested in the job, we look forward to receiving your application. You can apply online via the button below. We accept applications until and including 24 February 2023. Applications should include the following information (all files besides your CV should be submitted in one single pdf file): a letter of motivation (max 2 pages) in which you: motivate your choice for this position and your interest in the proposed project; indicate your preferred starting date and availability; sketch out some thoughts and ideas about tackling the project (not a fully-detailed or binding proposal); a Curriculum Vitae (including start/end months of education and work experience); a summary of, or a copy of, your Master’s thesis; a copy of your Master’s and Bachelor’s transcript/diploma. If your MSc thesis is not finished or not in English, submit a brief summary in 1-4 pages. If your transcripts or diplomas are not available yet, please attach a note clearly stating which documents are not available, and when they will be available. This note can be in your own words. Before submitting, please make sure to provide ALL requested documents mentioned above. You can use the CV field to upload your resume as a separate pdf document. Use the Cover Letter field to upload the other requested documents, including the motivation letter, as one single pdf file. Please do not submit applications by e-mail. Only complete applications received within the response period via the link below will be considered. The interviews will be held in March 2023. The UvA is an equal-opportunity employer. We prioritize diversity and are committed to creating an inclusive environment for everyone. We value a spirit of enquiry and perseverance, provide the space to keep asking questions, and promote a culture of curiosity and creativity.
If you encounter Error GBB451/GBC451, please try using a VPN connection when outside of the European Union. Please reach out directly to our HR Department. They will gladly help you continue your application. No agencies please.

Second Call for Participation - EXIST 2023: sEXism Identification in Social neTworks @ CLEF
by JORGE AMANDO CARRILLO DE ALBORNOZ CUADRADO 02 Feb '23

Please consider participating and/or forwarding to colleagues and groups. ****We apologize for multiple postings of this e-mail**** Second Call for Participation: EXIST 2023 at CLEF 2023 Task: EXIST 2023: sEXism Identification in Social neTworks Website: http://nlp.uned.es/exist2023/ EXIST is a series of scientific events and shared tasks on sexism identification in social networks that aims to capture sexism in a broad sense, from explicit misogyny to other subtle expressions that involve implicit sexist behaviours (EXIST 2021, EXIST 2022). The third edition of the EXIST shared task will be held as a Lab at CLEF 2023, which will take place on September 18-21, 2023, at the Centre for Research & Technology Hellas (CERTH), Thessaloniki, Greece. Social networks are the main platforms for social complaint, activism and expression of opinions and personal views in general. Movements like #MeToo, #8M or #TimesUp have spread rapidly. Under the umbrella of social networks, many women all around the world have reported abuses, discrimination and other sexist experiences suffered in real life. Social networks are also contributing to the transmission of sexism and other disrespectful and hateful behaviours. In this context, automatic tools may help not only to detect and alert against sexist behaviours and discourses, but also to estimate how often sexist and abusive situations are found in social media platforms, what forms of sexism are more frequent and how sexism is expressed in these media. Given the success of the earlier editions, EXIST 2023 is a follow-up of the tasks addressed in previous years, while facing a new challenge: the identification of the intention of the author of the sexist message.
Additionally, the main novelty will be the adoption of the “learning with disagreements” paradigm for the development of the dataset and for the evaluation of the systems. The adoption of this paradigm, along with our effort to control bias in the annotations, will allow us to evaluate whether including the different views and sensibilities of the annotators contributes to the development of more accurate and fairer NLP systems. Participants will be asked to classify tweets (in English and Spanish) according to the following three tasks: TASK 1 - Sexism Identification: a binary classification where systems have to decide whether or not a given tweet contains sexist expressions or behaviours (i.e., it is sexist itself, describes a sexist situation or criticizes a sexist behaviour). TASK 2 - Source Intention: for the tweets that have been classified as sexist, the second task aims to classify each tweet according to the intention of the person who wrote it. We propose a ternary classification task: (i) direct sexist message, (ii) reported sexist message and (iii) judgemental message. TASK 3 - Sexism Categorization: once a message has been classified as sexist, the third task aims to categorize the message into different types of sexism (according to a categorization proposed by experts that takes into account the different facets of women that are undermined). In particular, each sexist tweet must be categorized in one or more of the following categories: (i) Ideological and inequality, (ii) Stereotyping and dominance, (iii) Objectification, (iv) Sexual violence and (v) Misogyny and non-sexual violence. Although we recommend participating in all three tasks, participants are allowed to participate in just one of them. During the training phase, the task organizers will provide participants with the manually annotated EXIST 2023 dataset. For the evaluation of the teams, the unlabelled test data will be released.
We encourage participation from both academic institutions and industrial organizations. We invite the participants to register for the lab at CLEF 2023 Labs Registration site (http://clef2023-labs-registration.dei.unipd.it/registrationForm.php). Upon registration participants will receive information about how to join the Google Group about the EXIST 2023 shared task. Important Dates: * 14 November 2022: Registration open. * 13 February 2023: Training set available. * 27 March 2023: Development set available. * 10 April 2023: Test set available. * 28 April 2023: Registration closes. * 10 May 2023: Runs submission due. * 26 May 2023: Results notification. * 5 June 2023: Submission of Working Notes by participants. * 23 June 2023: Notification of acceptance (peer-reviews). * 7 July 2023: Camera-ready participant papers due. * 18-21 September 2023: EXIST 2023 at CLEF Conference. **Note: All deadlines are 11:59PM UTC-12:00 ("anywhere on Earth").** Organizers: Laura Plaza, Universidad Nacional de Educación a Distancia (UNED) Jorge Carrillo-de-Albornoz, Universidad Nacional de Educación a Distancia (UNED) Roser Morante, Universidad Nacional de Educación a Distancia (UNED) Enrique Amigó, Universidad Nacional de Educación a Distancia (UNED) Julio Gonzalo, Universidad Nacional de Educación a Distancia (UNED) Damiano Spina, Royal Melbourne Institute of Technology (RMIT) Paolo Rosso, Universitat Politècnica de Valencia (UPV) Contact: Contact the organizers by writing to: jcalbornoz(a)lsi.uned.es Website: http://nlp.uned.es/exist2023/
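As an illustration of the “learning with disagreements” idea mentioned in the EXIST 2023 call above, the following is a minimal sketch, not the organizers' actual evaluation procedure. It assumes that each tweet carries one vote per annotator, turns the votes into a soft label (a probability distribution over classes), and scores a system's predicted probabilities with cross-entropy against that soft label. The label names, the number of annotators, and the helper functions are hypothetical.

```python
import math
from collections import Counter

# Hypothetical label set for the binary task (Task 1).
LABELS = ["sexist", "not_sexist"]


def soft_label(votes):
    """Turn a list of annotator votes into a probability distribution."""
    counts = Counter(votes)
    return {label: counts[label] / len(votes) for label in LABELS}


def soft_cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a soft target and predicted probabilities."""
    return -sum(target[l] * math.log(predicted[l] + eps) for l in LABELS)


# Assumed example: six annotators disagree on one tweet (4 vs 2).
votes = ["sexist", "sexist", "not_sexist", "sexist", "not_sexist", "sexist"]
target = soft_label(votes)

# Two hypothetical system outputs for the same tweet.
overconfident = {"sexist": 0.99, "not_sexist": 0.01}
calibrated = {"sexist": 0.67, "not_sexist": 0.33}

loss_overconfident = soft_cross_entropy(target, overconfident)
loss_calibrated = soft_cross_entropy(target, calibrated)
print(f"overconfident: {loss_overconfident:.3f}, calibrated: {loss_calibrated:.3f}")
```

Under this kind of scoring, a system that mirrors the annotators' split is penalized less than one that is confidently one-sided, which is the intuition behind evaluating against the full distribution of annotations rather than a single majority label.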

Call-for-Participation: 2nd Fusion Task @ ImageCLEF 2023 (Machine Learning System Fusion)
by Bogdan Ionescu 01 Feb '23

[Apologies for multiple postings] ImageCLEFfusion (2nd edition) Registration: https://www.imageclef.org/2023/fusion Run submission: May 10, 2023 Working notes submission: June 5, 2023 CLEF 2023 conference: September 18-21, Thessaloniki, Greece *** CALL FOR PARTICIPATION *** While deep neural networks have proven their predictive power in many tasks, there are still several domains where a single deep learning network is not enough for attaining high precision, e.g., prediction of subjective concepts such as violence, memorability, etc. Late fusion, also called ensembling or decision-level fusion, represents one of the approaches that researchers employ to increase the performance of single-system approaches. It consists of using a series of weaker learner methods, called inducers, whose prediction outputs are combined in the final step via a fusion mechanism to create a new and improved super predictor. These systems have a long history and have been shown to be particularly useful in scenarios where the performance of single-system approaches is not considered satisfactory. The task challenges participants to develop and benchmark late fusion schemes. It allows participants to explore various aspects of late fusion mechanisms, such as the performance of different fusion methods, the methods for selecting inducers from a larger set, the exploitation of positive and negative correlations between inducers, and so on. *** TASK *** The participants will receive a data set of real inducers and are expected to provide a fusion mechanism that combines them into a super-system yielding superior performance compared to the highest performing individual system. The provided inducers were developed to solve three real tasks: (i) prediction of visual interestingness (int --- regression task), (ii) diversification of image search results (div --- retrieval task), (iii) medical image captioning (cap --- multi-class labeling task). *** DATA SET *** ImageCLEFfusion-int.
The data for this task is extracted from the Interestingness10k dataset. We will provide output data from 33 inducers, while 1,826 samples will be used for the development set, and 609 samples will be used for the testing set. ImageCLEFfusion-div. The data for this task is extracted from the Retrieving Diverse Social Images Task dataset. We will provide output data from 117 inducers, while 104 queries will be used for the development set, and 35 samples will be used for the testing set. ImageCLEFfusion-cap. The data for this task is extracted from the ImageCLEFmedical Caption task. We will provide output data from 85 inducers, while 5,700 images will be used for the development set, and 1,900 images will be used for the testing set. *** METRICS *** Evaluation will be performed using the metrics specific to each dataset we use, e.g., MAP@10, F1@20, ClusterRecall@20, accuracy. *** IMPORTANT DATES *** - Run submission: May 10, 2023 - Working notes submission: June 5, 2023 - CLEF 2023 conference: September 18-21, Thessaloniki, Greece (https://clef2023.clef-initiative.eu/) *** OVERALL COORDINATION *** Liviu-Daniel Stefan, Politehnica University of Bucharest, Romania Mihai Gabriel Constantin, Politehnica University of Bucharest, Romania Mihai Dogariu, Politehnica University of Bucharest, Romania Bogdan Ionescu, Politehnica University of Bucharest, Romania *** ACKNOWLEDGEMENT *** The task is supported under the H2020 AI4Media “A European Excellence Centre for Media, Society and Democracy” project, contract #951911 https://www.ai4media.eu/. On behalf of the Organizers, Bogdan Ionescu https://www.AIMultimediaLab.ro/
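To make the late fusion idea in the call above concrete, here is a minimal sketch of one simple fusion mechanism for a regression-style task: a weighted average in which each inducer is weighted by its inverse error on a development set. This is only an illustration under assumed toy data. The inducer outputs, the choice of mean absolute error, and the function names are hypothetical and are not taken from the ImageCLEFfusion data or baselines.

```python
from statistics import mean


def mean_absolute_error(pred, truth):
    """Average absolute difference between predictions and ground truth."""
    return mean(abs(p - t) for p, t in zip(pred, truth))


def fit_weights(inducer_preds, truth):
    """Weight each inducer by its inverse development-set error.

    The weights are normalized so that they sum to 1.
    """
    inverse = [1.0 / (mean_absolute_error(p, truth) + 1e-9) for p in inducer_preds]
    total = sum(inverse)
    return [w / total for w in inverse]


def fuse(inducer_preds, weights):
    """Combine per-sample inducer outputs with a weighted average."""
    n_samples = len(inducer_preds[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, inducer_preds))
        for i in range(n_samples)
    ]


# Toy development data (assumed): four samples, three inducers.
dev_truth = [0.2, 0.8, 0.5, 0.1]
dev_preds = [
    [0.3, 0.7, 0.6, 0.2],  # inducer A, tends to deviate in one direction
    [0.1, 0.9, 0.4, 0.0],  # inducer B, tends to deviate in the other
    [0.6, 0.4, 0.5, 0.5],  # inducer C, noticeably weaker
]

weights = fit_weights(dev_preds, dev_truth)
fused = fuse(dev_preds, weights)

best_single = min(mean_absolute_error(p, dev_truth) for p in dev_preds)
fused_error = mean_absolute_error(fused, dev_truth)
print(f"best single inducer MAE: {best_single:.3f}, fused MAE: {fused_error:.3f}")
```

In this toy setup the fused prediction beats the best individual inducer because the errors of inducers A and B partly cancel out, which is one example of the negative correlation between inducers that the call mentions. In practice the weights would be fitted on the development set and the comparison made on held-out test data.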

Second CFP: Field Matters - The Second Workshop on NLP Applications to Field Linguistics
by fieldmattersworkshop＠gmail.com 01 Feb '23

Dear colleagues, The Second Workshop on NLP Applications to Field Linguistics (Field Matters 2023) invites paper submissions. The workshop will take place at EACL 2023 (https://2023.eacl.org/) in Dubrovnik, Croatia on May 5 or 6 (online participants are also welcome). We accept papers on the following topics: - Application of NLP to field linguistics workflow; - Transfer learning for under-resourced language processing; - The use of fieldwork data to build NLP systems; - Modeling morphology and syntax of typologically diverse languages in the low-resource setting; - Speech processing for under-resourced languages; - Computational analysis of field linguistics datasets; - Using technology for preserving culture via language; - Improving ways of interaction with Indigenous communities; - Machine-readable field linguistic datasets. The submission deadline is February 13. The workshop will run its own review process, and papers can be submitted directly to the workshop via Start (https://softconf.com/eacl2023/FieldMatters2023/). You can find more information on the submission process and format requirements on our website (https://field-matters.github.io/cfp2023). Follow our Twitter page (https://twitter.com/field_matters) for updates. If you have any questions, feel free to ask! Best regards, Anna Postnikova Field Matters workshop organizing committee
