Hi there,
Could you please distribute the following job offer? Thanks.
Best,
Pascal
-------------------------------------------------------------------------------------
We invite applications for a 3-year PhD position co-funded by Inria,
the French national research institute in Computer Science and Applied
Mathematics, and LexisNexis France [2], the leader in legal information
in France and a subsidiary of the RELX Group.
The overall objective of this project is to develop an automated
system for detecting argumentation structures in French legal
decisions, using recent machine learning-based approaches (i.e. deep
learning approaches). In the general case, these structures take the
form of a directed labeled graph, whose nodes are the elements of the
text (propositions or groups of propositions, not necessarily
contiguous) which serve as components of the argument, and edges are
relations that signal the argumentative connection between them (e.g.,
support, attack). By revealing the argumentation structure behind
legal decisions, such a system will provide a crucial milestone
towards their detailed understanding, their use by legal
professionals, and, above all, greater transparency of
justice.
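For illustration only, here is a minimal sketch in Python of such a
directed labeled graph (class and field names are purely hypothetical
and do not reflect the project's actual data model):

from dataclasses import dataclass, field

# Hypothetical, minimal data model for an argumentation graph: nodes are
# (possibly non-contiguous) text spans serving as argument components,
# edges are labeled argumentative relations between them.

@dataclass
class ArgumentComponent:
    component_id: str
    spans: list          # (start, end) character offsets, possibly non-contiguous
    component_type: str  # e.g. "premise" or "conclusion"

@dataclass
class ArgumentRelation:
    source_id: str
    target_id: str
    label: str           # e.g. "support" or "attack"

@dataclass
class ArgumentationGraph:
    components: dict = field(default_factory=dict)  # component_id -> ArgumentComponent
    relations: list = field(default_factory=list)   # list of ArgumentRelation

    def add_component(self, component):
        self.components[component.component_id] = component

    def add_relation(self, relation):
        self.relations.append(relation)

# Example: one premise supporting one conclusion.
graph = ArgumentationGraph()
graph.add_component(ArgumentComponent("c1", [(120, 185)], "premise"))
graph.add_component(ArgumentComponent("c2", [(300, 360)], "conclusion"))
graph.add_relation(ArgumentRelation("c1", "c2", "support"))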
The main challenges and milestones of this project start with the
creation and release of a large-scale dataset of French legal
decisions annotated with argumentation structures. To minimize the
manual annotation effort, we will resort to semi-supervised and
transfer learning techniques to leverage existing argument mining
corpora, such as the European Court of Human Rights (ECHR) corpus, as
well as annotations already started by LexisNexis. Another promising
research direction, which is likely to improve over state-of-the-art
approaches, is to better model the dependencies between the different
sub-tasks (argument span detection, argument typing, etc.) instead of
learning these tasks independently. A third research avenue is to find
innovative ways to inject domain knowledge (in particular the rich
legal ontology developed by LexisNexis) to enrich the
representations used in these models. Finally, we would like to take
advantage of other discourse structures, such as coreference and
rhetorical relations, conceived as auxiliary tasks in a multi-task
architecture.
The successful candidate will hold a Master's degree in computational
linguistics, natural language processing, or machine learning, ideally
with prior experience in legal document processing and discourse
processing. Furthermore, the candidate will have strong programming
skills and expertise in machine learning approaches, and will be eager
to work at the interface between academia and industry.
The position is affiliated with MAGNET [1], a research group at
Inria, Lille, which has expertise in Machine Learning and Natural
Language Processing, in particular Discourse Processing. The PhD
student will also work in close collaboration with the R&D team at
LexisNexis France, who will provide their expertise in the legal
domain and the data they have collected.
Applications will be considered until the position is filled. However,
you are encouraged to apply early as we shall start processing the
applications as and when they are received. Applications, written in
English or French, should include a brief cover letter with research
interests and vision, a CV (including your contact address, work
experience, publications), and contact information for at least 2
referees. Applications (and questions) should be sent to Pascal Denis
(pascal.denis(a)inria.fr).
The starting date of the position is 1 November 2022 or soon
thereafter, for a total of 3 full years.
Best regards,
Pascal Denis
[1] https://team.inria.fr/magnet/
[2] https://www.lexisnexis.fr/
--
Pascal
----
For an independent, transparent and rigorous evaluation!
I support Inria's Evaluation Committee (Commission d'Évaluation).
----
+++++++++++++++++++++++++++++++++++++++++++++++
Pascal Denis
Equipe MAGNET, INRIA Lille Nord Europe
Bâtiment B, Avenue Heloïse
Parc scientifique de la Haute Borne
59650 Villeneuve d'Ascq
Tel: +33 3 59 35 87 24
Url: http://researchers.lille.inria.fr/~pdenis/
+++++++++++++++++++++++++++++++++++++++++++++++
Dear colleagues,
Last month, we shared the result of our collaborative work on a core metadata scheme for learner corpora with LCR2022 participants. Our proposal builds on Granger and Paquot's (2017) first attempt to design such a scheme; during our presentation, we explained the rationale for expanding on the initial proposal and discussed selected aspects of the revised scheme.
Our proposal is available at https://docs.google.com/spreadsheets/d/1-RbX5iUCUtCBkZU9Rfk-kv-Vzc--F-eUW2O…
We firmly believe that our efforts to develop a core metadata scheme for learner corpora will only be successful to the extent that (1) the LCR community is given the opportunity to engage with our work in various ways (provide feedback on the general structure of the scheme, the list of variables that we identified as core and their operationalization; test the metadata on other learner corpora; use the scheme to start a new corpus compilation, etc.) and (2) the core metadata scheme is the result of truly collaborative work.
As mentioned at LCR2022, we will be collecting feedback on the metadata scheme until the end of October. The online feedback form is available at:
https://docs.google.com/document/d/1NeDUuxGJlPSJI9wHVA1xgGM-aV8jXTa8Qlb45K-…
We'd like to thank all the colleagues who already got back to us (at LCR2022, by email or via the online form). We also thank them for their appreciation and enthusiasm for our work! We'd also like to encourage more colleagues (and particularly those of you who have experience in learner corpus compilation) to provide feedback! We need help in finalizing the core metadata scheme to make sure that it can be applied in all learner corpus compilation contexts. In short, we need you to make sure the scheme meets the needs of the LCR community at large.
With very best wishes,
Magali Paquot (also on behalf of Alexander König, Jennifer-Carmen Frey, and Egon W. Stemle)
Reference
Granger, S. & M. Paquot (2017). Towards standardization of metadata for L2 corpora. Invited talk at the CLARIN workshop on Interoperability of Second Language Resources and Tools, 6-8 December 2017, University of Gothenburg, Sweden.
Dr. Magali Paquot
Centre for English Corpus Linguistics
Institut Langage et Communication
UCLouvain
https://perso.uclouvain.be/magali.paquot/
Dear all
Just wanted to let you know that APJCR Vol. 3, No. 1 is now available to
view online.
http://icr.or.kr/ejournals-apjcr
CK
---
CK Jung BEng(Hons) Birmingham MSc Warwick EdD Warwick Cert Oxford
Department of English Language and Literature, Incheon National University, South Korea
Vice President | The Korea Association of Primary English Education (KAPEE), South Korea
Vice President | The Korea Association of Secondary English Education (KASEE), South Korea
Director | Institute for Corpus Research, Incheon National University, South Korea (http://icr.or.kr)
Editor | Asia Pacific Journal of Corpus Research, ICR, International (http://icr.or.kr/apjcr)
Deputy Editor | Korean Journal of English Language and Linguistics, KASELL, South Korea
Editorial Board | Corpora, Edinburgh University Press, UK
Editorial Board | English Today, Cambridge University Press, UK
E: ckjung(a)inu.ac.kr / T: +82 (0)32 835 8129
H(EN): http://ckjung.org
H(KR): http://prof1.inu.ac.kr/user/ckjung
PhD in ML/NLP – Efficient, fair, robust and knowledge-informed
self-supervised learning for speech processing
Starting date: November 1st, 2022 (flexible)
Application deadline: September 5th, 2022
Interviews (tentative): September 19th, 2022
Salary: ~2000€ gross/month (social security included)
Mission: research oriented (teaching possible but not mandatory)
*Keywords:* speech processing, natural language processing,
self-supervised learning, knowledge-informed learning, robustness, fairness
*CONTEXT*
The ANR project E-SSL (Efficient Self-Supervised Learning for Inclusive
and Innovative Speech Technologies) will start on November 1st 2022.
Self-supervised learning (SSL) has recently emerged as one of the most
promising artificial intelligence (AI) methods, as it is now feasible
to take advantage of the colossal amounts of existing unlabeled data to
significantly improve performance on various speech processing tasks.
*PROJECT OBJECTIVES*
Recent SSL models for speech such as HuBERT or wav2vec 2.0 have shown an
impressive impact on downstream task performance. This is mainly due to
their ability to benefit from large amounts of data, at the cost of a
tremendous carbon footprint, rather than to improvements in learning
efficiency. Another concern with SSL models is their unpredictable
behaviour once applied to realistic scenarios, which exposes their lack
of robustness. Furthermore, as for any pre-trained model applied in
society, it is important to be able to measure the bias of such models,
since they can amplify social unfairness.
The goals of this PhD position are threefold:
- to design new evaluation metrics for SSL speech models;
- to develop knowledge-driven SSL algorithms;
- to propose methods for learning robust and unbiased representations.
SSL models are evaluated with downstream task-dependent metrics, e.g.,
word error rate for speech recognition. This couples the evaluation of
the universality of SSL representations to a potentially biased and
costly fine-tuning step that also hides the efficiency information
related to the pre-training cost. In practice, we will seek to measure
training efficiency as the ratio between the amount of data, computation
and memory needed and the gain observed on a metric of interest, whether
downstream-dependent or not. The first step will be to document standard
markers that can be used to assess these quantities robustly at training
time. Potential candidates are, for instance, floating point operations
for computational intensity, number of neural parameters coupled with
precision for storage, online measurement of memory consumption for
training, and cumulative input sequence length for data.
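Purely as an illustration (defining the actual metric is part of the PhD
work; all names and the aggregation below are assumptions), such an
efficiency ratio could be tracked at training time roughly as follows:

from dataclasses import dataclass

# Hypothetical markers logged during pre-training (see the candidates above).
@dataclass
class TrainingCost:
    flops: float             # cumulative floating point operations
    num_parameters: int      # neural parameters (to be weighted by precision)
    peak_memory_gb: float    # online measurement of memory consumption
    seen_audio_hours: float  # cumulative input sequence length, in hours

def efficiency_ratio(metric_gain, cost, weights=(1.0, 1.0, 1.0, 1.0)):
    """Illustrative efficiency score: performance gain per unit of aggregated
    cost. The weighted-sum aggregation is an assumption, not the project's
    actual definition."""
    w_flops, w_params, w_mem, w_data = weights
    aggregated_cost = (w_flops * cost.flops / 1e18              # exaFLOPs
                       + w_params * cost.num_parameters / 1e9   # billions of parameters
                       + w_mem * cost.peak_memory_gb / 100.0
                       + w_data * cost.seen_audio_hours / 1e4)
    return metric_gain / aggregated_cost

# Example: +2.5 points on a downstream metric for a given pre-training budget.
cost = TrainingCost(flops=3.2e18, num_parameters=95_000_000,
                    peak_memory_gb=64.0, seen_audio_hours=60_000)
print(efficiency_ratio(2.5, cost))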
Most state-of-the-art SSL models for speech rely on masked prediction,
e.g. HuBERT and WavLM, or contrastive losses, e.g. wav2vec 2.0. Such
prevalence in the literature is mostly linked to the size, amount of
data and computational resources invested by the companies producing
these models. In fact, vanilla masking approaches and contrastive losses
may be regarded as uninformed solutions, as they do not benefit from
in-domain expertise. For instance, it has been demonstrated that blindly
masking frames in the input signal, as in HuBERT and WavLM, results in
much worse downstream performance than using unsupervised phonetic
boundaries [Yue2021] to generate informed masks. Recently, some studies
have demonstrated the superiority of an informed multitask learning
strategy, which carefully selects self-supervised pretext tasks with
respect to a set of downstream tasks, over the vanilla wav2vec 2.0
contrastive learning loss [Zaiem2022]. In this PhD project, our
objective is: 1. to continue developing knowledge-driven SSL algorithms
reaching higher efficiency ratios and better results in terms of
convergence, data consumption and downstream performance; and 2. to
scale these novel approaches to a point enabling comparison with current
state-of-the-art systems and therefore motivating a paradigm change in
SSL for the wider speech community.
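As a minimal sketch of the contrast between blind and boundary-informed
masking (the helper functions below are hypothetical and are not the
implementations used in HuBERT, WavLM or [Yue2021]):

import random

def blind_masks(num_frames, mask_ratio=0.08, span=10):
    """Vanilla masking: spans of frames starting at randomly chosen positions."""
    num_starts = int(num_frames * mask_ratio)
    starts = random.sample(range(num_frames - span), num_starts)
    masked = set()
    for s in starts:
        masked.update(range(s, s + span))
    return sorted(masked)

def informed_masks(segments, mask_prob=0.3):
    """Informed masking: mask whole segments delimited by (possibly
    unsupervised) phonetic boundaries, so masked regions align with
    linguistic units. `segments` is a list of (start_frame, end_frame)."""
    masked = []
    for start, end in segments:
        if random.random() < mask_prob:
            masked.extend(range(start, end))
    return masked

# Example: 500 frames; segments from a hypothetical boundary detector.
segments = [(0, 40), (40, 95), (95, 160), (160, 230), (230, 500)]
print(len(blind_masks(500)), len(informed_masks(segments)))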
Despite remarkable performance on academic benchmarks, SSL-powered
technologies, e.g. speech and speaker recognition, speech synthesis and
many others, may exhibit highly unpredictable results once applied to
realistic scenarios. This can translate into a global accuracy drop due
to a lack of robustness to adverse acoustic conditions, or into biased
and discriminatory behaviour with respect to different pools of end
users. Documenting and facilitating the control of such aspects prior to
the deployment of SSL models in real-life settings is necessary for the
industrial market. To evaluate such aspects within the project, we will
create novel robustness regularization and debiasing techniques along
two axes: 1. debiasing and regularizing speech representations at the
SSL level; 2. debiasing and regularizing downstream-adapted models
(e.g. using a pre-trained model).
To ensure the creation of fair and robust SSL pre-trained models, we
propose to act both at the optimization and data levels, following some
of our previous work on adversarial protected-attribute disentanglement
and the NLP literature on data sampling and augmentation [Noé2021].
Here, we wish to extend this technique to more complex SSL architectures
and more realistic conditions by increasing the disentanglement
complexity (the sex attribute studied in [Noé2021] is particularly easy
to discriminate). Then, to benefit from the expert knowledge induced by
the scope of the task of interest, we will build on a recently
introduced task-dependent counterfactual equal-odds criterion [Sari2021]
to minimize the downstream performance gap observed between individuals
with different protected attributes and to maximize the overall
accuracy. Following this multi-objective optimization scheme, we will
then inject further identified constraints, as inspired by previous NLP
work [Zhao2017]. Intuitively, constraints are injected so that the
predictions are calibrated towards a desired, i.e. unbiased,
distribution.
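As an illustration of the kind of quantity such constraints target, here
is a minimal sketch of a per-group downstream performance gap (the data
layout is hypothetical and this is not the counterfactual criterion of
[Sari2021]):

from collections import defaultdict

def group_accuracies(predictions, labels, protected_attributes):
    """Per-group accuracy on a downstream task, grouped by a protected attribute."""
    correct, total = defaultdict(int), defaultdict(int)
    for pred, label, group in zip(predictions, labels, protected_attributes):
        total[group] += 1
        correct[group] += int(pred == label)
    return {g: correct[g] / total[g] for g in total}

def fairness_gap(predictions, labels, protected_attributes):
    """Gap between the best- and worst-served groups: one of the quantities a
    debiasing objective would try to shrink while keeping overall accuracy high."""
    accs = group_accuracies(predictions, labels, protected_attributes)
    return max(accs.values()) - min(accs.values()), accs

# Toy example with two groups "A" and "B".
preds  = [1, 0, 1, 1, 0, 1, 0, 0]
golds  = [1, 0, 0, 1, 0, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
gap, accs = fairness_gap(preds, golds, groups)
print(gap, accs)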
*SKILLS*
* Master 2 in Natural Language Processing, Speech Processing, Computer
Science or Data Science.
* Good command of Python programming and deep learning frameworks.
* Previous experience in self-supervised learning, acoustic modeling or
ASR would be a plus.
* Very good communication skills in English.
* Good command of French would be a plus but is not mandatory.
*SCIENTIFIC ENVIRONMENT*
The thesis will be conducted within the GETALP team of the LIG
laboratory (https://lig-getalp.imag.fr/) and the LIA laboratory
(https://lia.univ-avignon.fr/). The GETALP team and the LIA have strong
expertise and a solid track record in natural language processing and
speech processing. The recruited person will be welcomed within teams
that offer a stimulating, multinational and pleasant working
environment.
The means to carry out the PhD will be provided, both in terms of
missions in France and abroad and in terms of equipment. The candidate
will have access to the GPU clusters of both LIG and LIA. Furthermore,
access to the national supercomputer Jean Zay will make it possible to
run large-scale experiments.
The PhD position will be co-supervised by Mickael Rouvier (LIA, Avignon)
and Benjamin Lecouteux and François Portet (Université Grenoble Alpes).
Joint meetings are planned on a regular basis and the student is
expected to spend time in both places. Moreover, the PhD student will
collaborate with several team members involved in the project, in
particular the two other PhD candidates who will be recruited and the
partners from LIA, LIG and Dauphine Université PSL, Paris. Furthermore,
the project will involve one of the founders of SpeechBrain, Titouan
Parcollet, with whom the candidate will interact closely.
*INSTRUCTIONS FOR APPLYING*
Applications must contain: a CV, a letter/message of motivation, and
Master's grade transcripts; candidates should also be ready to provide
letter(s) of recommendation on request. Applications should be addressed
to Mickael Rouvier (mickael.rouvier(a)univ-avignon.fr), Benjamin
Lecouteux (benjamin.lecouteux(a)univ-grenoble-alpes.fr) and François
Portet (francois.portet(a)imag.fr). We celebrate diversity and are
committed to creating an inclusive environment for all employees.
*REFERENCES:*
[Noé2021] Noé, P.- G., Mohammadamini, M., Matrouf, D., Parcollet, T.,
Nautsch, A. & Bonastre, J.- F. Adversarial Disentanglement of Speaker
Representation for Attribute-Driven Privacy Preservation in Proc.
Interspeech 2021 (2021), 1902–1906.
[Sari2021] Sarı, L., Hasegawa-Johnson, M. & Yoo, C. D. Counterfactually
Fair Automatic Speech Recognition. IEEE/ACM Transactions on Audio,
Speech, and Language Processing 29, 3515–3525 (2021)
[Yue2021] Yue, X. & Li, H. Phonetically Motivated Self-Supervised Speech
Representation Learning in Proc. Interspeech 2021 (2021), 746–750.
[Zaiem2022] Zaiem, S., Parcollet, T. & Essid, S. Pretext Tasks Selection
for Multitask Self-Supervised Speech Representation in AAAI, The 2nd
Workshop on Self-supervised Learning for Audio and Speech Processing,
2023 (2022).
[Zhao2017] Zhao, J., Wang, T., Yatskar, M., Ordonez, V. & Chang, K.-W.
Men Also Like Shopping: Reducing Gender Bias Amplification using
Corpus-level Constraints in Proceedings of the 2017 Conference on
Empirical Methods in Natural Language Processing (2017), 2979–2989.
--
François PORTET
Professeur - Univ Grenoble Alpes
Laboratoire d'Informatique de Grenoble - Équipe GETALP
Bâtiment IMAG - Office 333
700 avenue Centrale
Domaine Universitaire - 38401 St Martin d'Hères
FRANCE
Phone: +33 (0)4 57 42 15 44
Email: francois.portet@imag.fr
www: http://membres-liglab.imag.fr/portet/
*Call for Research Fellow Chairs 2023*
MIAI, the Grenoble Interdisciplinary Institute in Artificial
Intelligence (https://miai.univ-grenoble-alpes.fr/), is opening three
research fellow chairs in AI reserved for persons who have spent most of
their research career outside France (see below). MIAI is one of the
four AI institutes created by the French government and is dedicated to
AI for human beings and the environment. Research activities in MIAI
aim to cover all aspects of AI and applications of AI with a current
focus on embedded and hardware architectures for AI, learning and
reasoning, perception and interaction, AI & society, AI for health, AI
for environment & energy, and AI for industry 4.0.
These research fellow chairs aim to address important and ambitious
research problems in AI-related fields and will partly pave the way for
the future research to be conducted in MIAI. Successful candidates will
be appointed by MIAI and will be allocated, for the whole duration of
the chair, a budget of 250k€ covering PhD and/or postdoc salaries,
internships, travel, etc. They will be part of MIAI and the French network
of AI institutes (comprising, in addition to MIAI, the AI institutes in
Paris, Toulouse and Nice) which provide a very dynamic environment for
conducting research in AI.
*Eligibility* To be eligible, candidates must hold a PhD from a
non-French university obtained after January 2014 for male applicants
and after 2014 minus n, where n is the number of children, for female
applicants. They must also have spent more than two thirds of their
research career since the beginning of their PhD outside France. Lastly,
they should be pursuing internationally recognized research in
AI-related fields (including applications of AI to any research field).
*To apply* Interested candidates should first contact Eric Gaussier
(eric.gaussier(a)univ-grenoble-alpes.fr) to discuss salary and application
modalities. It is important to note that candidates should identify a
local collaborator working in one of the Grenoble academic research labs
with whom they will interact. If selected, they will join the research
team of this collaborator. They should then send their application to
Manel Boumegoura (manel.boumegoura(a)univ-grenoble-alpes.fr) and Eric
Gaussier (eric.gaussier(a)univ-grenoble-alpes.fr) by March 11, 2023.
Each application should comprise a 2-page CV, a complete list of
publications, 2 reference letters, a letter from the local collaborator
indicating the relevance and importance of the proposed project, and a
4-page description of the research project which can target any topic of
AI or applications of AI. It is important to emphasize, in the
description, the ambition, the originality and the potential impact of
the research to be conducted, as well as the collaborations the
candidate has or will develop with Grenoble researchers in order to
achieve her or his research goals.
*Starting date and duration* Each chair is intended for 3 to 4 years,
starting no later than September 2023.
*Location* The work will take place in Grenoble, in the research lab of
the identified collaborator.
For any question, please contact Eric Gaussier
(eric.gaussier(a)univ-grenoble-alpes.fr) or Manel Boumegoura
(manel.boumegoura(a)univ-grenoble-alpes.fr).
*******
** apologies for cross-posting **
Linking Lexicographic and Language Learning Resources (4LR)
Workshop at LDK 2023 – Call for Papers
Workshop website: https://lexicala.com/4lr/
The workshop ‘Linking Lexicographic and Language Learning Resources’ (4LR) will be held in conjunction with LDK 2023 – 4th conference on Language, Data and Knowledge – (http://2023.ldk-conf.org/) at the University of Vienna, Austria, on September 13 (tentative), in hybrid mode.
The aim of this workshop is to explore linguistic linked (open) data and knowledge management methods and technologies for linking lexicographic and language learning resources, tools and applications in general and dictionaries and CEFR lists in particular.
Our starting point is, on the one hand, enhancing CEFR-graded language proficiency lists with lexicographic content and, on the other hand, incorporating CEFR labels in learner’s dictionaries. CEFR – the Common European Framework of Reference for Languages – is a generally established international standard for describing language proficiency, and CEFR-graded resources have been developed for many languages in Europe. However, incorporating their information is still not a common practice in modern lexicography for most languages, except for notably two English dictionaries for advanced learners (Cambridge and Oxford). There are substantial unsolved issues, such as inconsistencies in vocabulary size per level between languages; no, or limited, sense disambiguation in CEFR resources; words from a higher CEFR level in definitions and example sentences. Moreover, there has been limited collaboration and interoperability so far among the related fields of lexicography, language acquisition, and linguistic linked data, whether regarding research, development, or practical application.
4LR will feature an overview by the organizers, as well as an invited talk on Linked Data for Lexicographic Resources by Jorge Gracia from the University of Zaragoza, chair of the NexusLinguarum COST Action.
In addition, we invite submissions for papers (20 minutes, plus discussion) on the following topics:
• Linking lexicographic content to CEFR-graded vocabularies
• Pedagogical lexicography and knowledge graphs
• Attributing CEFR labels in learner’s dictionaries
• Incorporating vocabulary and grammar profiles in lexicographic resources
• Creating and linking crosslingual concept-based CEFR resources
• Multilingual knowledge management and language learning applications and tools
SUBMISSION AND DATES
Please submit your abstract of 300-500 words via EasyChair [https://easychair.org/conferences/?conf=4lr2023].
19 May 2023 Deadline for abstract submission
29 May 2023 Deadline for notification for abstract submission
30 June 2023 Deadline for camera-ready paper submission
13 Sep 2023 (tentative) 4LR workshop
14–15 Sep LDK 2023 conference
ORGANIZERS AND CONTACT
Kris Heylen. Dutch Language Institute (Kris DOT Heylen AT ivdnt DOT org)
Jelena Kallas. Institute of the Estonian Language
Ilan Kernerman. Lexicala by K Dictionaries
Carole Tiberius. Dutch Language Institute
Website: https://lexicala.com/4lr/
4LR is supported by NexusLinguarum COST Action (CA18209) – European network for Web-centered linguistic data science.
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
The 4LR workshop at LDK 2023 will follow a related workshop, ‘Lexicography and CEFR: Linking Lexicographic Resources and Language Proficiency Levels’, which will be held in conjunction with eLex 2023 on June 29 in Brno, Czech Republic.
(apologies for multiple postings)
*CALL FOR PAPERS* <https://elex.link/elex2023/call-for-papers/>
*eLex 2023: Electronic lexicography in the 21st century.* The topic of next
year's conference is Invisible Lexicography.
Dates: 27-29 June 2023 (with workshops on June 26th)
Venue: Hotel Passage, Brno, Czechia
Deadline for abstract submissions: January 31st 2023
Conference website: https://elex.link/elex2023/
Language of the conference: English
Format:
The conference will be organized as a hybrid event, and while we encourage
everyone to participate on-site, we plan to provide live streaming and
recording of the event for registered participants.
Looking forward to seeing you all in Brno,
Miloš Jakubíček
in the name of the organising committee
First Call for Papers
The 18th Workshop on Innovative Use of NLP for Building Educational
Applications (BEA 2023)
Toronto
Thursday, July 13, 2023
(co-located with ACL 2023)
https://sig-edu.org/bea/current
*Submission Deadline: Monday, April 24, 2023, 11:59pm UTC-12*
WORKSHOP DESCRIPTION
The BEA Workshop is a leading venue for NLP innovation in the context of
educational applications. It is one of the largest one-day workshops in the
ACL community with over 100 registered attendees in the past several years.
The growing interest in educational applications and a diverse community of
researchers involved resulted in the creation of the Special Interest Group
in Educational Applications (SIGEDU)
<https://www.aclweb.org/adminwiki/index.php?title=2019Q3_Reports:_SIGEDU>
in 2017, which currently has over 300 members.
We will solicit papers that incorporate NLP methods, including, but not
limited to:
- automated scoring of open-ended textual and spoken responses;
- automated scoring/evaluation for written student responses (across multiple genres);
- game-based instruction and assessment;
- educational data mining;
- intelligent tutoring;
- collaborative learning environments;
- peer review;
- grammatical error detection and correction;
- learner cognition;
- spoken dialog;
- multimodal applications;
- annotation standards and schemas;
- tools and applications for classroom teachers, learners and/or test developers; and
- use of corpora in educational tools.
INVITED TALKS
The workshop will feature invited talks from Susan Lottridge (Cambium
Assessment) and Jordana Heller (Textio), as well as a speaker from one of
the IAALDE <https://alliancelss.com/> societies.
IMPORTANT DATES
All deadlines are 11:59 pm UTC-12 (anywhere on earth).
- Anonymity Period Begins: *Friday, March 24, 2023*
- Submission Deadline: Monday, April 24, 2023
- Notification of Acceptance: Monday, May 22, 2023
- Camera-ready Papers Due: Tuesday, May 30, 2023
- Workshop: Thursday, July 13, 2023
SUBMISSION INFORMATION
We will be using the ACL Submission Guidelines for the BEA Workshop this
year. Authors are invited to submit a long paper of up to eight (8) pages
of content, plus unlimited references; final versions of long papers will
be given one additional page of content (up to 9 pages) so that reviewers’
comments can be taken into account. We also invite short papers of up to
four (4) pages of content, plus unlimited references. Upon acceptance,
short papers will be given five (5) content pages in the proceedings.
Authors are encouraged to use this additional page to address reviewers’
comments in their final versions. Authors of papers that describe systems
are also invited to give a demo of their system. If you would like to present a demo
in addition to presenting the paper, please make sure to select either
“long paper + demo” or “short paper + demo” under “Submission Category” in
the START submission page.
Previously published papers cannot be accepted. The submissions will be
reviewed by the program committee. As reviewing will be blind, please
ensure that papers are anonymous. Self-references that reveal the author’s
identity, e.g., “We previously showed (Smith, 1991) …”, should be avoided.
Instead, use citations such as “Smith previously showed (Smith, 1991) …”.
We have also included conflict of interest in the submission form. You
should mark all potential reviewers who have been authors on the paper, are
from the same research group or institution, or who have seen versions of
this paper or discussed it with you.
We will be using the START conference system to manage submissions:
https://www.softconf.com/acl2023/bea2023/
DOUBLE SUBMISSION POLICY
We will follow the official ACL double-submission policy
<https://www.aclweb.org/archive/policies/current/double-submission-policy.ht…>.
Specifically:
Papers being submitted both to BEA and another conference or workshop must:
● Note on the title page the other conference or workshop to which
they are being submitted.
● State on the title page that if the authors choose to present their
paper at BEA (assuming it was accepted), then the paper will be withdrawn
from other conferences and workshops.
ORGANIZING COMMITTEE
- Ekaterina Kochmar <https://ekochmar.github.io/about/>, MBZUAI
- Jill Burstein <https://sites.google.com/site/jbursteinets/>, Duolingo
- Andrea Horbach <https://www.ltl.uni-due.de/team/andrea-horbach/>, FernUniversität in Hagen
- Ronja Laarmann-Quante <https://www.ltl.uni-due.de/team/ronja-laarmann-quante>, Ruhr University Bochum
- Nitin Madnani <https://desilinguist.org/>, Educational Testing Service
- Anaïs Tack <https://anaistack.github.io/>, KU Leuven
- Victoria Yaneva <http://www.victoriayaneva.info/>, National Board of Medical Examiners
- Zheng Yuan <https://www.cl.cam.ac.uk/~zy249/>, King’s College London
- Torsten Zesch <https://www.ltl.uni-due.de/team/torsten-zesch>, FernUniversität in Hagen
Workshop contact email address: bea.nlp.workshop(a)gmail.com
There are all kinds of lists on Wikipedia of various kinds of
authors: linguists, philosophers, mathematicians ... In most (almost
all?) cases there is a brief page about the author's biography and
their work.
As you might expect, some folks (isbndb.com) came up with the
great idea of selling you the air you breathe. exaly.com, WorldCat,
freelibrary.org ... do a marginally better job, but I find their web
interfaces too constraining and, "of course", there is no "download
the whole damn thing" option.
What I am looking for is an openly and collectively maintained DB à
la Wikipedia whose interface lets you download all search hits
as well-formatted, parsable lines in a text file without having to
"click next", copy and paste, and all that kind of nonsense.
I could imagine someone in the corpora research community has taken
the time to compile a database which IMO should include:
a) work:
a.1) original name
a.2) original language
a.3) topical bags index
a.4) received category index (children's book, book review, degree
thesis, article in a periodical, ...)
a.5) publications:
a.5.1) date
a.5.2) metadata RDF including: language, "co-"authors (preface, those
writing back-cover blurbs), editors, translators, ISBNs, publisher,
copyright notice, ...
b) name(s):
b.1) first/given name(s) (at birth)
b.2) last name(s) (at birth)
b.3) pen name(s)
b.4) also known as
c) birth place
d) date of birth
e) languages
f) date of death
Author-work pairs should be prioritized. In the case of
compilations of various author-work pairs in a single book, the
compilation in which an article appears should be specified in the
metadata.
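To make the wished-for structure concrete, here is a minimal sketch of
such records in Python (field names are only illustrative and do not
follow any existing standard):

from dataclasses import dataclass, field

@dataclass
class Publication:
    date: str
    # RDF-style metadata: language, "co-"authors (preface, blurb writers),
    # editors, translators, ISBNs, publisher, copyright notice, ...
    metadata_rdf: dict = field(default_factory=dict)

@dataclass
class Work:
    original_name: str
    original_language: str
    topical_bags: list = field(default_factory=list)         # topical bags index
    received_categories: list = field(default_factory=list)  # children's book, review, thesis, ...
    publications: list = field(default_factory=list)         # list of Publication
    compilation: str = ""  # compilation in which the article appears, if any

@dataclass
class Author:
    given_names: list  # first/given name(s) at birth
    last_names: list   # last name(s) at birth
    pen_names: list = field(default_factory=list)
    also_known_as: list = field(default_factory=list)
    birth_place: str = ""
    date_of_birth: str = ""
    date_of_death: str = ""
    languages: list = field(default_factory=list)
    works: list = field(default_factory=list)  # the prioritized author-work pairs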
Please let me know where I could find such a database (even a
partial one) that can be downloaded. If you don't know of such a
general registry of published books/texts, which other entries do
you think are important?
lbrtchx