November 2022 - Corpora

3-year PhD position in Computational Models of Semantic Memory and its Acquisition (Inria and University of Lille, France)
by Pascal Denis 13 May '25

Hello,

Could you please distribute the following job offer? Thanks.

Best,
Pascal

-------------------------------------------------------------------------------------

3-year PhD position in Computational Models of Semantic Memory and its Acquisition (Inria and University of Lille, France)

We invite applications for a 3-year PhD position at the University of Lille in the context of the recently funded research project "COMANCHE" (Computational Models of Lexical Meaning and Change). The position is funded by Inria, the French national research institute in Computer Science and Applied Mathematics.

COMANCHE proposes to transfer and adapt neural word embedding algorithms to model the acquisition and evolution of word meaning, by comparing them with linguistic theories on language acquisition and language evolution. At the intersection between Natural Language Processing, psycholinguistics and historical linguistics, this project intends to validate or revise some of these theories, while also developing computational models that are less data-hungry and computationally intensive as they exploit new inductive biases inspired by these disciplines.

The first strand of the project, on which the successful candidate will work, focuses on the development of computational models of semantic memory and its acquisition. Two main research directions will be pursued. On the one hand, we will compare the structural properties associated with different semantic spaces derived from word embedding algorithms to those found in human semantic memory as reflected in behavioral data (such as typicality norms) as well as brain imaging data. The latter data will then be used as additional supervision to inject more hierarchical structure into the learned semantic spaces. On the other hand, we intend to experiment with training regimes for word embedding algorithms that are closer to those of humans when they acquire language, controlling the quantity as well as the linguistic complexity of the inputs fed to the learning algorithms through the use of longitudinal and child-directed speech corpora (e.g., CHILDES, Colaje). In both cases, both English and French data will be considered.

The successful candidate holds a Master's degree in computational linguistics, computer science or cognitive science and has prior experience in word embedding models. Furthermore, the candidate has strong programming skills and expertise in machine learning approaches, and is eager to work across languages.

The position is affiliated with the MAGNET team at Inria, Lille [1] as well as with the SCALAB group at University of Lille [2] in an effort to strengthen collaborations between these two groups, and ultimately foster cross-fertilization between Natural Language Processing and Psycholinguistics.

Applications will be considered until the position is filled. However, you are encouraged to apply early as we shall start processing the applications as and when they are received. Applications, written in English or French, should include a brief cover letter with research interests and vision, a CV (including your contact address, work experience, publications), and contact information for at least 2 referees. Applications (and questions) should be sent to Angèle Brunellière (angele.brunelliere(a)univ-lille.fr) and Pascal Denis (pascal.denis(a)inria.fr).

The starting date of the position is 1 October 2022 or soon thereafter, for a total of 3 full years.

Best regards,
Angèle Brunellière and Pascal Denis

[1] https://team.inria.fr/magnet/
[2] https://scalab.univ-lille.fr/

--
Pascal
----
For an independent, transparent and rigorous evaluation! I support the Inria Evaluation Committee.
----
Pascal Denis
Equipe MAGNET, INRIA Lille Nord Europe
Bâtiment B, Avenue Heloïse
Parc scientifique de la Haute Borne
59650 Villeneuve d'Ascq
Tel: ++33 3 59 35 87 24
Url: http://researchers.lille.inria.fr/~pdenis/

3-year PhD position in Automatic Argumentation Mining in French Legal Decisions (Inria Lille, University of Lille, and LexisNexis France)
by Pascal Denis 10 Nov '23

Hi there,

Could you please distribute the following job offer? Thanks.

Best,
Pascal

-------------------------------------------------------------------------------------

We invite applications for a 3-year PhD position co-funded by Inria, the French national research institute in Computer Science and Applied Mathematics, and LexisNexis France, the leader in legal information in France and a subsidiary of the RELX Group.

The overall objective of this project is to develop an automated system for detecting argumentation structures in French legal decisions, using recent machine learning-based approaches (i.e. deep learning approaches). In the general case, these structures take the form of a directed labeled graph, whose nodes are the elements of the text (propositions or groups of propositions, not necessarily contiguous) that serve as components of the argument, and whose edges are relations that signal the argumentative connection between them (e.g., support, offensive). By revealing the argumentation structure behind legal decisions, such a system will provide a crucial milestone towards their detailed understanding and their use by legal professionals, and above all will contribute to greater transparency of justice.

The main challenges and milestones of this project start with the creation and release of a large-scale dataset of French legal decisions annotated with argumentation structures. To minimize the manual annotation effort, we will resort to semi-supervised and transfer learning techniques to leverage existing argument mining corpora, such as the European Court of Human Rights (ECHR) corpus, as well as annotations already started by LexisNexis. Another promising research direction, which is likely to improve over state-of-the-art approaches, is to better model the dependencies between the different sub-tasks (argument span detection, argument typing, etc.) instead of learning these tasks independently.
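
As an aside for readers, the directed labeled graph described above can be sketched in a few lines of Python. This is only an illustrative data structure, not the project's actual representation: the class names, the character-offset spans, and the "attack" label are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    """One argument component; spans are (start, end) character offsets
    and need not be contiguous."""
    ident: str
    spans: tuple


@dataclass
class ArgumentGraph:
    """Directed labeled graph over argument components (hypothetical sketch)."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (source, target, label) triples

    def add_component(self, ident, spans):
        self.nodes[ident] = Component(ident, tuple(spans))

    def add_relation(self, source, target, label):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be existing components")
        self.edges.append((source, target, label))

    def incoming(self, target, label=None):
        """Identifiers of components pointing at `target`, optionally filtered by label."""
        return [s for s, t, l in self.edges
                if t == target and (label is None or l == label)]


graph = ArgumentGraph()
graph.add_component("claim", [(0, 42)])
graph.add_component("premise", [(50, 80), (95, 120)])  # non-contiguous spans
graph.add_component("rebuttal", [(130, 170)])
graph.add_relation("premise", "claim", "support")
graph.add_relation("rebuttal", "claim", "attack")  # assumed label name
print(graph.incoming("claim", label="support"))
```

Storing edges as (source, target, label) triples keeps the sketch small; a real annotation format would also need document identifiers and annotator metadata.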
A third research avenue is to find innovative ways to inject domain knowledge (in particular the rich legal ontology developed by LexisNexis) to enrich the representations used in these models. Finally, we would like to take advantage of other discourse structures, such as coreference and rhetorical relations, conceived as auxiliary tasks in a multi-task architecture.

The successful candidate holds a Master's degree in computational linguistics, natural language processing or machine learning, ideally with prior experience in legal document processing and discourse processing. Furthermore, the candidate has strong programming skills and expertise in machine learning approaches, and is eager to work at the interplay between academia and industry.

The position is affiliated with MAGNET [1], a research group at Inria, Lille, which has expertise in Machine Learning and Natural Language Processing, in particular Discourse Processing. The PhD student will also work in close collaboration with the R&D team at LexisNexis France, who will provide their expertise in the legal domain and the data they have collected.

Applications will be considered until the position is filled. However, you are encouraged to apply early as we shall start processing the applications as and when they are received. Applications, written in English or French, should include a brief cover letter with research interests and vision, a CV (including your contact address, work experience, publications), and contact information for at least 2 referees. Applications (and questions) should be sent to Pascal Denis (pascal.denis(a)inria.fr).

The starting date of the position is 1 November 2022 or soon thereafter, for a total of 3 full years.

Best regards,
Pascal Denis

[1] https://team.inria.fr/magnet/
[2] https://www.lexisnexis.fr/

--
Pascal
----
For an independent, transparent and rigorous evaluation! I support the Inria Evaluation Committee.
----
Pascal Denis
Equipe MAGNET, INRIA Lille Nord Europe
Bâtiment B, Avenue Heloïse
Parc scientifique de la Haute Borne
59650 Villeneuve d'Ascq
Tel: ++33 3 59 35 87 24
Url: http://researchers.lille.inria.fr/~pdenis/

Core metadata scheme for learner corpora - feedback needed!
by Magali Paquot 30 Oct '23

Dear colleagues,

Last month, we shared the result of our collaborative work on a core metadata scheme for learner corpora with LCR2022 participants. Our proposal builds on Granger and Paquot's (2017) first attempt to design such a scheme. During our presentation, we explained the rationale for expanding on the initial proposal and discussed selected aspects of the revised scheme. Our proposal is available at https://docs.google.com/spreadsheets/d/1-RbX5iUCUtCBkZU9Rfk-kv-Vzc--F-eUW2O…

We firmly believe that our efforts to develop a core metadata scheme for learner corpora will only be successful to the extent that (1) the LCR community is given the opportunity to engage with our work in various ways (provide feedback on the general structure of the scheme, the list of variables that we identified as core and their operationalization; test the metadata on other learner corpora; use the scheme to start a new corpus compilation, etc.) and (2) the core metadata scheme is the result of truly collaborative work.

As mentioned at LCR2022, we will be collecting feedback on the metadata scheme until the end of October. The online feedback form is available at: https://docs.google.com/document/d/1NeDUuxGJlPSJI9wHVA1xgGM-aV8jXTa8Qlb45K-…

We'd like to thank all the colleagues who already got back to us (at LCR2022, by email or via the online form). We also thank them for their appreciation and enthusiasm for our work!

We'd also like to encourage more colleagues (and particularly those of you who have experience in learner corpus compilation) to provide feedback! We need help in finalizing the core metadata scheme to make sure that it can be applied in all learner corpus compilation contexts. In short, we need you to make sure the scheme meets the needs of the LCR community at large.

With very best wishes,
Magali Paquot (also on behalf of Alexander König, Jennifer-Carmen Frey, and Egon W. Stemle)

Reference
Granger, S. & M. Paquot (2017). Towards standardization of metadata for L2 corpora. Invited talk at the CLARIN workshop on Interoperability of Second Language Resources and Tools, 6-8 December 2017, University of Gothenburg, Sweden.

Dr. Magali Paquot
Centre for English Corpus Linguistics
Institut Langage et Communication
UCLouvain
https://perso.uclouvain.be/magali.paquot/

New articles for Asia Pacific Journal of Corpus Research (APJCR) Vol. 3, No. 1 are available online (Open Access)
by Prof CK Jung 08 Jul '23

Dear all,

Just wanted to let you know that APJCR Vol. 3, No. 1 is now available to view online.

http://icr.or.kr/ejournals-apjcr

CK

---
*CK Jung BEng(Hons) Birmingham MSc Warwick EdD Warwick Cert Oxford*
Department of English Language and Literature, Incheon National University, *South Korea*
Vice President | The Korea Association of Primary English Education (KAPEE), *South Korea*
Vice President | The Korea Association of Secondary English Education (KASEE), *South Korea*
Director | Institute for Corpus Research, Incheon National University, *South Korea* (http://icr.or.kr)
Editor | Asia Pacific Journal of Corpus Research, ICR, *International* (http://icr.or.kr/apjcr)
Deputy Editor | Korean Journal of English Language and Linguistics, KASELL, *South Korea*
Editorial Board | Corpora, Edinburgh University Press, *UK*
Editorial Board | English Today, Cambridge University Press, *UK*
E: ckjung(a)inu.ac.kr / T: +82 (0)32 835 8129
H(EN): http://ckjung.org
H(KR): http://prof1.inu.ac.kr/user/ckjung

PhD in ML/NLP – Efficient, Fair, robust and knowledge informed self-supervised learning for speech processing
by François Portet 01 Jun '23

PhD in ML/NLP – Efficient, Fair, robust and knowledge informed self-supervised learning for speech processing

Starting date: November 1st, 2022 (flexible)
Application deadline: September 5th, 2022
Interviews (tentative): September 19th, 2022
Salary: ~2000€ gross/month (social security included)
Mission: research oriented (teaching possible but not mandatory)

*Keywords:* speech processing, natural language processing, self-supervised learning, knowledge informed learning, robustness, fairness

*CONTEXT*

The ANR project E-SSL (Efficient Self-Supervised Learning for Inclusive and Innovative Speech Technologies) will start on November 1st 2022. Self-supervised learning (SSL) has recently emerged as one of the most promising artificial intelligence (AI) methods, as it is now feasible to take advantage of the colossal amounts of existing unlabeled data to significantly improve the performance of various speech processing tasks.

*PROJECT OBJECTIVES*

Recent SSL models for speech such as HuBERT or wav2vec 2.0 have shown an impressive impact on downstream task performance. This is mainly due to their ability to benefit from a large amount of data at the cost of a tremendous carbon footprint, rather than to improvements in the efficiency of the learning. Another question related to SSL models is their unpredictable results once applied to realistic scenarios, which expose their lack of robustness. Furthermore, as for any pre-trained model applied in society, it is important to be able to measure the bias of such models since they can amplify social unfairness.

The goals of this PhD position are threefold:
- to design new evaluation metrics for SSL of speech models;
- to develop knowledge-driven SSL algorithms;
- to propose methods for learning robust and unbiased representations.

SSL models are evaluated with downstream task-dependent metrics, e.g., word error rate for speech recognition. This couples the evaluation of the universality of SSL representations to a potentially biased and costly fine-tuning that also hides the efficiency information related to the pre-training cost. In practice, we will seek to measure the training efficiency as the ratio between the amount of data, computation and memory needed to observe a certain gain in terms of performance on a metric of interest, i.e., downstream-dependent or not. The first step will be to document standard markers that can be used to measure these values robustly at training time. Potential candidates are, for instance, floating point operations for computational intensity, number of neural parameters coupled with precision for storage, online measurement of memory consumption for training, and cumulative input sequence length for data.

Most state-of-the-art SSL models for speech rely on masked prediction, e.g. HuBERT and WavLM, or contrastive losses, e.g. wav2vec 2.0. Such prevalence in the literature is mostly linked to the size, amount of data and computational resources injected by the companies producing these models. In fact, vanilla masking approaches and contrastive losses may be identified as uninformed solutions as they do not benefit from in-domain expertise. For instance, it has been demonstrated that blindly masking frames in the input signal (as in HuBERT and WavLM) results in much worse downstream performance than applying unsupervised phonetic boundaries [Yue2021] to generate informed masks. Recently, some studies have demonstrated the superiority of an informed multitask learning strategy, carefully selecting self-supervised pretext tasks with respect to a set of downstream tasks, over the vanilla wav2vec 2.0 contrastive learning loss [Zaiem2022].

In this PhD project, our objectives are to:
1. continue to develop knowledge-driven SSL algorithms reaching higher efficiency ratios and results at the convergence, data consumption and downstream performance levels; and
2. scale these novel approaches to a point enabling the comparison with current state-of-the-art systems and therefore motivating a paradigm change in SSL for the wider speech community.

Despite remarkable performance on academic benchmarks, SSL-powered technologies (e.g. speech and speaker recognition, speech synthesis and many others) may exhibit highly unpredictable results once applied to realistic scenarios. This can translate into a global accuracy drop due to a lack of robustness to adversarial acoustic conditions, or biased and discriminatory behaviors with respect to different pools of end users. Documenting and facilitating the control of such aspects prior to the deployment of SSL models in real-life settings is necessary for the industrial market. To evaluate such aspects, within the project, we will create novel robustness regularization and debiasing techniques along two axes:
1. debiasing and regularizing speech representations at the SSL level;
2. debiasing and regularizing downstream-adapted models (e.g. using a pre-trained model).

To ensure the creation of fair and robust SSL pre-trained models, we propose to act both at the optimization and data levels, following some of our previous work on adversarial protected attribute disentanglement and the NLP literature on data sampling and augmentation [Noé2021]. Here, we wish to extend this technique to more complex SSL architectures and more realistic conditions by increasing the disentanglement complexity (the sex attribute studied in [Noé2021] is particularly discriminatory). Then, to benefit from the expert knowledge induced by the scope of the task of interest, we will build on a recent introduction of task-dependent counterfactual equal odds criteria [Sari2021] to minimize the downstream performance gap observed between different individuals of certain protected attributes and to maximize the overall accuracy. Following this multi-objective optimization scheme, we will then inject further identified constraints as inspired by previous NLP work [Zhao2017]. Intuitively, constraints are injected so the predictions are calibrated towards a desired distribution (i.e., unbiased).

*SKILLS*

* Master 2 in Natural Language Processing, Speech Processing, computer science or data science.
* Good command of Python programming and deep learning frameworks.
* Previous experience in Self-Supervised Learning, acoustic modeling or ASR would be a plus
* Very good communication skills in English
* Good command of French would be a plus but is not mandatory

*SCIENTIFIC ENVIRONMENT*

The thesis will be conducted within the GETALP team of the LIG laboratory (https://lig-getalp.imag.fr/) and the LIA laboratory (https://lia.univ-avignon.fr/). The GETALP team and the LIA have strong expertise and a track record in Natural Language Processing and speech processing. The recruited person will be welcomed within the teams, which offer a stimulating, multinational and pleasant working environment.

The means to carry out the PhD will be provided both in terms of missions in France and abroad and in terms of equipment. The candidate will have access to the GPU clusters of both the LIG and LIA. Furthermore, access to the national supercomputer Jean-Zay will make it possible to run large-scale experiments.

The PhD position will be co-supervised by Mickael Rouvier (LIA, Avignon) and Benjamin Lecouteux and François Portet (Université Grenoble Alpes). Joint meetings are planned on a regular basis and the student is expected to spend time in both places. Moreover, the PhD student will collaborate with several team members involved in the project, in particular the two other PhD candidates who will be recruited and the partners from LIA, LIG and Dauphine Université PSL, Paris. Furthermore, the project will involve one of the founders of SpeechBrain, Titouan Parcollet, with whom the candidate will interact closely.

*INSTRUCTIONS FOR APPLYING*

Applications must contain: CV + letter/message of motivation + Master's grade transcripts + be ready to provide letter(s) of recommendation; and be addressed to Mickael Rouvier (mickael.rouvier(a)univ-avignon.fr), Benjamin Lecouteux (benjamin.lecouteux(a)univ-grenoble-alpes.fr) and François Portet (francois.Portet(a)imag.fr). We celebrate diversity and are committed to creating an inclusive environment for all employees.

*REFERENCES:*

[Noé2021] Noé, P.-G., Mohammadamini, M., Matrouf, D., Parcollet, T., Nautsch, A. & Bonastre, J.-F. Adversarial Disentanglement of Speaker Representation for Attribute-Driven Privacy Preservation in Proc. Interspeech 2021 (2021), 1902–1906.
[Sari2021] Sarı, L., Hasegawa-Johnson, M. & Yoo, C. D. Counterfactually Fair Automatic Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing 29, 3515–3525 (2021).
[Yue2021] Yue, X. & Li, H. Phonetically Motivated Self-Supervised Speech Representation Learning in Proc. Interspeech 2021 (2021), 746–750.
[Zaiem2022] Zaiem, S., Parcollet, T. & Essid, S. Pretext Tasks Selection for Multitask Self-Supervised Speech Representation in AAAI, The 2nd Workshop on Self-supervised Learning for Audio and Speech Processing, 2023 (2022).
[Zhao2017] Zhao, J., Wang, T., Yatskar, M., Ordonez, V. & Chang, K.-W. Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (2017), 2979–2989.

--
François PORTET
Professeur - Univ Grenoble Alpes
Laboratoire d'Informatique de Grenoble - Équipe GETALP
Bâtiment IMAG - Office 333
700 avenue Centrale
Domaine Universitaire - 38401 St Martin d'Hères
FRANCE
Phone: +33 (0)4 57 42 15 44
Email: francois.portet@imag.fr
www: http://membres-liglab.imag.fr/portet/

eLex 2023: Invisible Lexicography - Call for papers
by Miloš Jakubíček 10 May '23

(apologies for multiple postings)

*CALL FOR PAPERS* <https://elex.link/elex2023/call-for-papers/>

*eLex 2023: Electronic lexicography in the 21st century.*

The topic of next year's conference is Invisible Lexicography.

Dates: 27-29 June 2023 (with workshops on June 26th)
Venue: Hotel Passage, Brno, Czechia
Deadline for abstract submissions: January 31st 2023
Conference website: https://elex.link/elex2023/
Language of the conference: English
Format: The conference will be organized as a hybrid event and while we encourage everyone to participate on-site, we plan to provide live streaming and recording of the event for registered participants.

Looking forward to seeing you all in Brno,
Miloš Jakubíček
in the name of the organising committee

Digital lexicography and lexical computing workshop, Cambridge, UK
by Ondřej Matuška 02 May '23

*<Lexicom/>*
a workshop in digital lexicography and lexical computing

*SAVE THE DATE*
*Jesus College, Cambridge, UK*
11 – 15 September 2023

Your 5 days to get up-to-date with the latest developments in *corpus-driven lexicography* and to activate and enhance your *corpus building and corpus query skills* with some of the top experts in the field.

For the programme, lecturers, invited speakers, fees and registration, visit this website: *lexicom.courses <https://lexicom.courses/lexicom-2023-cambridge-uk-lexicography-workshop/>*

I hope to meet you in Cambridge in September!

Ondřej

*Ondřej Matuška*
sketchengine.co.uk <http://www.sketchengine.co.uk> | Facebook <https://www.facebook.com/SketchEngine/> | LinkedIn <https://www.linkedin.com/in/ondrejmatuska> | Twitter <https://twitter.com/SketchEngine>

[Corpora-List]NLP4CALL 2022 First call for papers
by David Alfter 20 Feb '23

==11th NLP4CALL, Louvain-la-Neuve, Belgium==

The workshop series on Natural Language Processing (NLP) for Computer-Assisted Language Learning (NLP4CALL) is a meeting place for researchers working on the integration of Natural Language Processing and Speech Technologies in CALL systems and exploring the theoretical and methodological issues arising in this connection. The latter include, among others, insights from Second Language Acquisition (SLA) research, on the one hand, and the promotion of "Computational SLA" through the setting up of Second Language research infrastructure(s), on the other.

The intersection of Natural Language Processing (or Language Technology / Computational Linguistics) and Speech Technology with Computer-Assisted Language Learning (CALL) brings "understanding" of language to CALL tools, thus making CALL intelligent. This fact has given the name for this area of research: Intelligent CALL, ICALL. As the definition suggests, apart from having excellent knowledge of Natural Language Processing and/or Speech Technology, ICALL researchers need good insights into second language acquisition theories and practices, as well as knowledge of second language pedagogy and didactics. This workshop therefore invites a wide range of ICALL-relevant research, including studies where NLP-enriched tools are used for testing SLA and pedagogical theories, and vice versa, where SLA theories, pedagogical practices or empirical data are modeled in ICALL tools.

The NLP4CALL workshop series is aimed at bringing together competences from these areas for sharing experiences and brainstorming around the future of the field.

We welcome papers:
- that describe research directly aimed at ICALL;
- that demonstrate actual or discuss the potential use of existing Language and Speech Technologies or resources for language learning;
- that describe the ongoing development of resources and tools with potential usage in ICALL, either directly in interactive applications, or indirectly in materials, application or curriculum development, e.g. learning material generation, assessment of learner texts and responses, individualized learning solutions, provision of feedback;
- that discuss challenges and/or research agenda for ICALL;
- that describe empirical studies on language learner data.

This year a special focus is given to work done on second language vocabulary and grammar profiling, as well as the use of crowdsourcing for creating, collecting and curating data in NLP projects. We encourage paper presentations and software demonstrations describing the above-mentioned themes primarily, but not exclusively, for the Nordic languages.

==Invited speakers==

This year, we have the pleasure to announce two invited talks. The first talk is by Christopher Bryant from Reverso and the University of Cambridge. The second talk is given by Marije Michel from the University of Amsterdam.

==Submission information==

Authors are invited to submit long papers (8-12 pages) or short papers (4-7 pages), page count not including references.

We will be using the NLP4CALL workshop template for the workshop this year. The author kit, including LaTeX and Microsoft Word templates, can be accessed here, or alternatively on Overleaf:
<https://spraakbanken.gu.se/sites/default/files/2022/NLP4CALL%20workshop%20t…>
<https://spraakbanken.gu.se/sites/default/files/2022/nlp4call%20template.doc>
<https://www.overleaf.com/latex/templates/nlp4call-workshop-template/qqqzqqy…>

Submissions will be managed through the electronic conference management system EasyChair <https://easychair.org/conferences/?conf=nlp4call2022>. Papers must be submitted digitally through the conference management system, in PDF format. Final camera-ready versions of accepted papers will be given an additional page to address reviewer comments.

Papers should describe original unpublished work or work-in-progress. Papers will be peer reviewed by at least two members of the program committee in a double-blind fashion. All accepted papers will be collected into a proceedings volume to be submitted for publication in the NEALT Proceedings Series (Linköping Electronic Conference Proceedings) and, additionally, double-published through the ACL anthology, following experiences from the previous NLP4CALL editions (<https://www.aclweb.org/anthology/venues/nlp4call/>).

==Important dates==

7 October 2022: paper submission deadline
4 November 2022: notification of acceptance
25 November 2022: camera-ready papers for publication
9 December 2022: workshop date

==Organizers==

David Alfter (1,2), Elena Volodina (2), Thomas François (1), Piet Desmet (3), Frederik Cornillie (3), Arne Jönsson (4), Eveline Rennes (4)

(1) CENTAL, Institute for Language and Communication, Université Catholique de Louvain, Belgium
(2) Språkbanken, University of Gothenburg, Sweden
(3) Itec, Department of Linguistics at KU Leuven & imec, Belgium
(4) Department of Computer and Information Science, Linköping University, Sweden

==Contact==

For any questions, please contact David Alfter, david.alfter(a)uclouvain.be

For further information, see the workshop website <https://spraakbanken.gu.se/en/research/themes/icall/nlp4call-workshop-serie…>

Follow us on Twitter @NLP4CALL <https://twitter.com/NLP4CALL/>

David Alfter, PhD
Post-doctoral researcher
Institut Langage et communication, CENTAL
Université catholique de Louvain
Place Montesquieu, 3 (box L2.06.04)
1348 Louvain-la-Neuve

2-year full-time postdoc in computational linguistics at FAU Erlangen-Nürnberg
by Stephanie Evert 05 Dec '22

The project “Reading concordances in the 21st century (RC21)”, run jointly by Friedrich-Alexander-Universität Erlangen-Nürnberg and the University of Birmingham, is looking for a

POSTDOCTORAL RESEARCHER IN COMPUTATIONAL LINGUISTICS (100%, E13 TV-L, starting ASAP)

Application deadline: 16 Dec 2022
Interviews: week beginning 19 Dec 2022 (held via Zoom)
Format of application: by email to stephanie.evert(a)fau.de (including evidence of all relevant qualifications, preferably in a single PDF)
Start of position: 1 Feb 2023
Duration of employment: until 31 Jan 2025
Placement: Computational Corpus Linguistics group (www.linguistik.fau.de)
Contact / queries: Stephanie Evert

https://www.jobs.fau.de/jobs/postdoctoral-researcher-in-computational-lingu…

This is the first of two postdoctoral positions. The second post, which will be based at the University of Birmingham, will be advertised in the new year, with a starting date in the spring.

PROJECT INFO

In today's digital world, the amount of text communicated in electronic form is ever-increasing and there is a growing need for approaches and methods to extract meanings from texts at scale. Corpus linguists have long been studying recurring patterns in digitised texts with the help of concordances, i.e. displays that show many occurrences of a word, phrase or construction across a range of contexts in a compact format. However, lacking a well-established and clear-cut methodology, the art of reading concordances has not yet realised its full potential. At the same time, there has been very little innovation in algorithms in the concordance software packages available to corpus linguists. This project proposes an innovative approach to reading concordances in the 21st century.
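
For readers unfamiliar with concordances as described above, a minimal keyword-in-context (KWIC) display can be sketched as follows. This is an illustrative baseline only, not the project's method or any existing concordancer's API; the function name `kwic`, the `width` parameter and the sample text are assumptions made for the example.

```python
import re


def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of `keyword`."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keyword forms a central column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


sample = ("The court found the argument persuasive. "
          "Counsel repeated the argument on appeal, "
          "but the second argument was dismissed.")
for line in kwic(sample, "argument"):
    print(line)
```

Aligning the keyword in a fixed column is what lets a reader scan left and right contexts for recurring patterns; sorting or grouping those lines is where more elaborate algorithms would come in.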
Through the collaboration between the University of Birmingham and Friedrich-Alexander-Universität Erlangen-Nürnberg we combine strengths in theoretical work in corpus linguistics with expertise in computational algorithms in order to develop a systematic methodology for reading concordances and corresponding algorithms for the semi-automatic analysis of concordance lines. Through two case studies on English and German data sets, we will establish an approach that not only provides innovation in corpus linguistics, but also has wider implications for the analysis of textual data at scale, while still retaining a humanities perspective.

RESPONSIBILITIES OF THE POSTDOCTORAL RESEARCHER

• Developing, implementing, and applying novel computer algorithms to support and improve the manipulation of concordance displays
• Producing documentation
• Carrying out interdisciplinary case studies using the new methodology in close collaboration with other researchers
• Leading on the organisation of workshops and other dissemination activities
• Developing training materials for workshops and materials for web presence
• Analysing data, writing up results, and co-authoring publications
• Managing project tasks, supervising student assistants, and providing progress reports
• Managing collaborative activities with the UK team

REQUIRED QUALIFICATIONS

• PhD or DPhil in Computational Linguistics, Computer Science, Machine Learning, Corpus Linguistics, Computational Humanities, or similar subject (completed or near completion)
• Evidence of strong programming skills, ideally in Python
• Evidence of ability to analyse linguistic data
• Native-like language skills in German and English
• Excellent communication skills in English, including the ability to write for publication, present research proposals and results, and represent the project team at meetings and research events
• Ability to work independently, manage own academic research and associated activities, and to supervise student assistants

OPTIONAL QUALIFICATIONS

• Strong publication record (commensurate with opportunities and experience) at relevant international conferences (computational linguistics, DH, corpus linguistics)
• Experience in interdisciplinary research is a plus
• Experience with deep learning approaches and/or statistical methods is a plus

UK Jobs in Computing and Data Science
by Eric Atwell 30 Nov '22

Promotion opportunities:

Head of the School of Computing <https://jobs.leeds.ac.uk/vacancy.aspx?ref=ASN-84381> University of Leeds
Lecturers in Computer Science <https://jobs.leeds.ac.uk/vacancy.aspx?ref=EPSCP1116> University of Leeds
Lecturer in Computer Science <https://jobs.leeds.ac.uk/vacancy.aspx?ref=SWJTU1007> University of Leeds
Teaching Assistants in Computer Science <https://jobs.leeds.ac.uk/vacancy.aspx?ref=EPSCP1115> University of Leeds
Data Engineering Developer <https://jobs.leeds.ac.uk/vacancy.aspx?ref=ITDSV1000> University of Leeds
Data Analyst <https://jobs.leeds.ac.uk/vacancy.aspx?ref=ITDSV1002> University of Leeds
Professor in Artificial Intelligence <https://www.linkedin.com/jobs/view/3375574406/?eBP=JYMBII_JOBS_HOME_ORGANIC…> Manchester Metropolitan University
Professor in Data Science <https://www.linkedin.com/jobs/view/3381166449/?eBP=JYMBII_JOBS_HOME_ORGANIC…> University of the Arts London
Deputy Dean, Faculty of Computing <https://www.linkedin.com/jobs/view/3376789059/?eBP=JYMBII_JOBS_HOME_ORGANIC…> De Montfort University Leicester
Professor in Creative Computing <https://www.linkedin.com/jobs/view/3381171110/?eBP=JYMBII_JOBS_HOME_ORGANIC…> University of the Arts London
Programme Director - Computing <https://www.linkedin.com/jobs/view/3368580632/?eBP=JYMBII_JOBS_HOME_ORGANIC…> Kaplan International, Leeds office
Professor/Reader of Statistical Data Science <https://www.linkedin.com/jobs/view/3373317196/?eBP=JYMBII_JOBS_HOME_ORGANIC…> Newcastle University
Professor in Fintech <https://www.linkedin.com/jobs/view/3356652057/?eBP=JYMBII_JOBS_HOME_ORGANIC…> University of East London
Director of Education <https://www.linkedin.com/jobs/view/3344955080/?eBP=JYMBII_JOBS_HOME_ORGANIC…> HyperionDev Computing Education
Senior Lecturer/Associate Professor: Data Science <https://www.linkedin.com/jobs/view/3374847671/?eBP=JYMBII_JOBS_HOME_ORGANIC…> Falmouth University
