Hi there,
Could you please distribute the following job offer? Thanks.
Best,
Pascal
-------------------------------------------------------------------------------------
We invite applications for a 3-year PhD position co-funded by Inria,
the French national research institute in Computer Science and Applied
Mathematics, and LexisNexis France, leader of legal information in
France and subsidiary of the RELX Group.
The overall objective of this project is to develop an automated
system for detecting argumentation structures in French legal
decisions, using recent machine learning-based approaches (i.e. deep
learning approaches). In the general case, these structures take the
form of a directed labeled graph, whose nodes are the elements of the
text (propositions or groups of propositions, not necessarily
contiguous) which serve as components of the argument, and edges are
relations that signal the argumentative connection between them (e.g.,
support, offensive). By revealing the argumentation structure behind
legal decisions, such a system will provide a crucial milestone
towards their detailed understanding, their use by legal
professionals, and above all contributes to greater transparency of
justice.
The main challenges and milestones of this project start with the
creation and release of a large-scale dataset of French legal
decisions annotated with argumentation structures. To minimize the
manual annotation effort, we will resort to semi-supervised and
transfer learning techniques to leverage existing argument mining
corpora, such as the European Court of Human Rights (ECHR) corpus, as
well as annotations already started by LexisNexis. Another promising
research direction, which is likely to improve over state-of-the-art
approaches, is to better model the dependencies between the different
sub-tasks (argument span detection, argument typing, etc.) instead of
learning these tasks independently. A third research avenue is to find
innovative ways to inject the domain knowledge (in particular the rich
legal ontology developed by LexisNexis) to enrich enrich the
representations used in these models. Finally, we would like to take
advantage of other discourse structures, such as coreference and
rhetorical relations, conceived as auxiliary tasks in a multi-tasking
architecture.
The successful candidate holds a Master's degree in computational
linguistics, natural language processing, machine learning, ideally
with prior experience in legal document processing and discourse
processing. Furthermore, the candidate will provide strong programming
skills, expertise in machine learning approaches and is eager to work
at the interplay between academia and industry.
The position is affiliated with the MAGNET [1], a research group at
Inria, Lille, which has expertise in Machine Learning and Natural
Language Processing, in particular Discourse Processing. The PhD
student will also work in close collaboration with the R&D team at
LexisNexis France, who will provide their expertise in the legal
domain and the data they have collected.
Applications will be considered until the position is filled. However,
you are encouraged to apply early as we shall start processing the
applications as and when they are received. Applications, written in
English or French, should include a brief cover letter with research
interests and vision, a CV (including your contact address, work
experience, publications), and contact information for at least 2
referees. Applications (and questions) should be sent to Pascal Denis
(pascal.denis(a)inria.fr).
The starting date of the position is 1 November 2022 or soon
thereafter, for a total of 3 full years.
Best regards,
Pascal Denis
[1] https://team.inria.fr/magnet/
[2] https://www.lexisnexis.fr/
--
Pascal
----
Pour une évaluation indépendante, transparente et rigoureuse !
Je soutiens la Commission d'Évaluation de l'Inria.
----
+++++++++++++++++++++++++++++++++++++++++++++++
Pascal Denis
Equipe MAGNET, INRIA Lille Nord Europe
Bâtiment B, Avenue Heloïse
Parc scientifique de la Haute Borne
59650 Villeneuve d'Ascq
Tel: ++33 3 59 35 87 24
Url: http://researchers.lille.inria.fr/~pdenis/
+++++++++++++++++++++++++++++++++++++++++++++++
Dear colleagues,
Last month, we shared the result of our collaborative work on a core metadata scheme for learner corpora with LCR2022 participants. Our proposal builds on Granger and Paquot (2017)'s first attempt to design such a scheme and during our presentation, we explained the rationale for expanding on the initial proposal and discussed selected aspects of the revised scheme.
Our proposal is available at https://docs.google.com/spreadsheets/d/1-RbX5iUCUtCBkZU9Rfk-kv-Vzc--F-eUW2O…<https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.goog…>
We firmly believe that our efforts to develop a core metadata scheme for learner corpora will only be successful to the extent that (1) the LCR community is given the opportunity to engage with our work in various ways (provide feedback on the general structure of the scheme, the list of variables that we identified as core and their operationalization; test the metadata on other learner corpora; use the scheme to start a new corpus compilation, etc.) and (2) the core metadata scheme is the result of truly collaborative work.
As mentioned at LCR2022, we will be collecting feedback on the metadata scheme until the end of October. The online feedback form is available at:
https://docs.google.com/document/d/1NeDUuxGJlPSJI9wHVA1xgGM-aV8jXTa8Qlb45K-…<https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.goog…>
We'd like to thank all the colleagues who already got back to us (at LCR2022, by email or via the online form). We also thank them for their appreciation and enthusiasm for our work! We'd also like to encourage more colleagues (and particularly those of you who have experience in learner corpus compilation) to provide feedback! We need help in finalizing the core metadata scheme to make sure that it can be applied in all learner compilation contexts. In short, we need you to make sure the scheme meets the needs of the LCR community at large.
With very best wishes,
Magali Paquot (also on behalf of Alexander König, Jennifer-Carmen Frey, and Egon W. Stemle)
Reference
Granger, S. & M. Paquot (2017). Towards standardization of metadata for L2 corpora. Invited talk at the CLARIN workshop on Interoperability of Second Language Resources and Tools, 6-8 December 2017, University of Gothenburg, Sweden.
Dr. Magali Paquot
Centre for English Corpus Linguistics
Institut Langage et Communication
UCLouvain
https://perso.uclouvain.be/magali.paquot/
Dear all
Just wanted to let you know that APJCR Vol. 3, No. 1 is now available to
view online.
http://icr.or.kr/ejournals-apjcr
CK
---
*CK Jung BEng(Hons) Birmingham MSc Warwick EdD Warwick Cert Oxford*
Department of English Language and Literature, Incheon National
University, *South
Korea*
Vice President | The Korea Association of Primary English Education
(KAPEE), *South Korea*
Vice President | The Korea Association of Secondary English Education
(KASEE), *South Korea*
Director | Institute for Corpus Research, Incheon National University, *South
Korea* (http://icr.or.kr)
Editor | Asia Pacific Journal of Corpus Research, ICR, *International* (
http://icr.or.kr/apjcr)
Deputy Editor | Korean Journal of English Language and Linguistics,
KASELL, *South
Korea*
Editorial Board | Corpora, Edinburgh University Press, *UK*
Editorial Board | English Today, Cambridge University Press, *UK*
E: ckjung(a)inu.ac.kr / T: +82 (0)32 835 8129
H(EN): http://ckjung.org
H(KR): http://prof1.inu.ac.kr/user/ckjung
PhD in ML/NLP – Efficient, Fair, robust and knowledge informed
self-supervised learning for speech processing
Starting date: November 1st, 2022 (flexible)
Application deadline: September 5th, 2022
Interviews (tentative): September 19th, 2022
Salary: ~2000€ gross/month (social security included)
Mission: research oriented (teaching possible but not mandatory)
*Keywords:*speech processing, natural language processing,
self-supervised learning, knowledge informed learning, Robustness, fairness
*CONTEXT*
The ANR project E-SSL (Efficient Self-Supervised Learning for Inclusive
and Innovative Speech Technologies) will start on November 1st 2022.
Self-supervised learning (SSL) has recently emerged as one of the most
promising artificial intelligence (AI) methods as it becomes now
feasible to take advantage of the colossal amounts of existing unlabeled
data to significantly improve the performances of various speech
processing tasks.
*PROJECT OBJECTIVES*
Recent SSL models for speech such as HuBERT or wav2vec 2.0 have shown an
impressive impact on downstream tasks performance. This is mainly due to
their ability to benefit from a large amount of data at the cost of a
tremendous carbon footprint rather than improving the efficiency of the
learning. Another question related to SSL models is their unpredictable
results once applied to realistic scenarios which exhibit their lack of
robustness. Furthermore, as for any pre-trained models applied in
society, it isimportant to be able to measure the bias of such models
since they can augment social unfairness.
The goals of this PhD position are threefold:
- to design new evaluation metrics for SSL of speech models ;
- to develop knowledge-driven SSL algorithms ;
- to propose methods for learning robust and unbiased representations.
SSL models are evaluated with downstream task-dependent metrics e.g.,
word error rate for speech recognition. This couple the evaluation of
the universality of SSL representations to a potentially biased and
costly fine-tuning that also hides the efficiencyinformation related to
the pre-training cost. In practice, we will seek to measure the training
efficiency as the ratio between the amount of data, computation and
memory needed to observe a certain gain in terms of performance on a
metric of interest i.e.,downstream dependent or not. The first step will
be to document standard markers that can be used as robust measurements
to assess these values robustly at training time. Potential candidates
are, for instance, floating point operations for computational
intensity, number of neural parameters coupled with precision for
storage, online measurement of memory consumption for training and
cumulative input sequence length for data.
Most state-of-the-art SSL models for speech rely onmasked prediction
e.g. HuBERT and WavLM, or contrastive losses e.g. wav2vec 2.0. Such
prevalence in the literature is mostly linked to the size, amount of
data and computational resources injected by thecompany producing these
models. In fact, vanilla masking approaches and contrastive losses may
be identified as uninformed solutions as they do not benefit from
in-domain expertise. For instance, it has been demonstrated that blindly
masking frames in theinput signal i.e. HuBERT and WavLM results in much
worse downstream performance than applying unsupervised phonetic
boundaries [Yue2021] to generate informed masks. Recently some studies
have demonstrated the superiority of an informed multitask learning
strategy carefully selecting self-supervised pretext-tasks with respect
to a set of downstream tasks, over the vanilla wav2vec 2.0 contrastive
learning loss [Zaiem2022]. In this PhD project, our objective is: 1.
continue to develop knowledge-driven SSL algorithms reaching higher
efficiency ratios and results at the convergence, data consumption and
downstream performance levels; and 2. scale these novel approaches to a
point enabling the comparison with current state-of-the-art systems and
therefore motivating a paradigm change in SSL for the wider speech
community.
Despite remarkable performance on academic benchmarks, SSL powered
technologies e.g. speech and speaker recognition, speech synthesis and
many others may exhibit highly unpredictable results once applied to
realistic scenarios. This can translate into a global accuracy drop due
to a lack of robustness to adversarial acoustic conditions, or biased
and discriminatory behaviors with respect to different pools of end
users. Documenting and facilitating the control of such aspects prior to
the deployment of SSL models into the real-life is necessary for the
industrial market. To evaluate such aspects, within the project, we will
create novel robustness regularization and debasing techniques along two
axes: 1. debasing and regularizing speech representations at the SSL
level; 2. debasing and regularizing downstream-adapted models (e.g.
using a pre-trained model).
To ensure the creation of fair and robust SSL pre-trained models, we
propose to act both at the optimization and data levels following some
of our previous work on adversarial protected attribute disentanglement
and the NLP literature on data sampling and augmentation [Noé2021].
Here, we wish to extend this technique to more complex SSL architectures
and more realistic conditions by increasing the disentanglement
complexity i.e. the sex attribute studied in [Noé2021] is particularly
discriminatory. Then, and to benefit from the expert knowledge induced
by the scope of the task of interest, we will build on a recent
introduction of task-dependent counterfactual equal odds criteria
[Sari2021] to minimize the downstream performance gap observed in
between different individuals of certain protected attributes and to
maximize the overall accuracy. Following this multi-objective
optimization scheme, we will then inject further identified constraints
as inspired by previous NLP work [Zhao2017]. Intuitively, constraints
are injected so the predictions are calibrated towards a desired
distribution i.e. unbiased.
*SKILLS*
*
Master 2 in Natural Language Processing, Speech Processing, computer
science or data science.
*
Good mastering of Python programming and deep learning framework.
*
Previous in Self-Supervised Learning, acoustic modeling or ASR would
be a plus
*
Very good communication skills in English
*
Good command of French would be a plus but is not mandatory
*SCIENTIFIC ENVIRONMENT*
The thesis will be conducted within the Getalp teams of the LIG
laboratory (_https://lig-getalp.imag.fr/_ <https://lig-getalp.imag.fr/>)
and the LIA laboratory (https://lia.univ-avignon.fr/). The GETALP team
and the LIA have a strong expertise and track record in Natural Language
Processing and speech processing. The recruited person will be welcomed
within the teams which offer a stimulating, multinational and pleasant
working environment.
The means to carry out the PhD will be providedboth in terms of missions
in France and abroad and in terms of equipment. The candidate will have
access to the cluster of GPUs of both the LIG and LIA. Furthermore,
access to the National supercomputer Jean-Zay will enable to run large
scale experiments.
The PhD position will be co-supervised by Mickael Rouvier (LIA, Avignon)
and Benjamin Lecouteux and François Portet (Université Grenoble Alpes).
Joint meetings are planned on a regular basis and the student is
expected to spend time in both places. Moreover, the PhD student will
collaborate with several team members involved in the project in
particular the two other PhD candidates who will be recruited and the
partners from LIA, LIG and Dauphine Université PSL, Paris. Furthermore,
the project will involve one of the founders of SpeechBrain, Titouan
Parcollet with whom the candidate will interact closely.
*INSTRUCTIONS FOR APPLYING*
Applications must contain: CV + letter/message of motivation + master
notes + be ready to provide letter(s) of recommendation; and be
addressed to Mickael Rouvier (_mickael.rouvier(a)univ-avignon.fr_
<mailto:mickael.rouvier@univ-avignon.fr>), Benjamin
Lecouteux(benjamin.lecouteux(a)univ-grenoble-alpes.fr) and François Portet
(_francois.Portet(a)imag.fr_ <mailto:francois.Portet@imag.fr>). We
celebrate diversity and are committed to creating an inclusive
environment for all employees.
*REFERENCES:*
[Noé2021] Noé, P.- G., Mohammadamini, M., Matrouf, D., Parcollet, T.,
Nautsch, A. & Bonastre, J.- F. Adversarial Disentanglement of Speaker
Representation for Attribute-Driven Privacy Preservation in Proc.
Interspeech 2021 (2021), 1902–1906.
[Sari2021] Sarı, L., Hasegawa-Johnson, M. & Yoo, C. D. Counterfactually
Fair Automatic Speech Recognition. IEEE/ACM Transactions on Audio,
Speech, and Language Processing 29, 3515–3525 (2021)
[Yue2021] Yue, X. & Li, H. Phonetically Motivated Self-Supervised Speech
Representation Learning in Proc. Interspeech 2021 (2021), 746–750.
[Zaiem2022] Zaiem, S., Parcollet, T. & Essid, S. Pretext Tasks Selection
for Multitask Self-Supervised Speech Representation in AAAI, The 2nd
Workshop on Self-supervised Learning for Audio and Speech Processing,
2023 (2022).
[Zhao2017] Zhao, J., Wang, T., Yatskar, M., Ordonez, V. & Chang, K. - W.
Men Also Like Shopping: Reducing Gender Bias Amplification using
Corpus-level Constraints in Proceedings of the 2017 Conference on
Empirical Methods in Natural Language Processing (2017), 2979–2989.
--
François PORTET
Professeur - Univ Grenoble Alpes
Laboratoire d'Informatique de Grenoble - Équipe GETALP
Bâtiment IMAG - Office 333
700 avenue Centrale
Domaine Universitaire - 38401 St Martin d'Hères
FRANCE
Phone: +33 (0)4 57 42 15 44
Email:francois.portet@imag.fr
www:http://membres-liglab.imag.fr/portet/
(apologies for multiple postings)
*CALL FOR PAPERS* <https://elex.link/elex2023/call-for-papers/>
*eLex 2023: Electronic lexicography in the 21st century.* The topic of next
year's conference is Invisible Lexicography.
Dates: 27-29 June 2023 (with workshops on June 26th)
Venue: Hotel Passage, Brno, Czechia
Deadline for abstract submissions: January 31st 2023
Conference website: https://elex.link/elex2023/
Language of the conference: English
Format:
The conference will be organized as a hybrid event and while we encourage
everyone to participate on-site, we plan to provide live streaming and
recording of the event for registered participants.
Looking forward to seeing you all in Brno,
Miloš Jakubíček
in the name of the organising committee
Hello,
Could you please distribute the following job offer? Thanks.
Best,
Pascal
-------------------------------------------------------------------------------------
3-year PhD position in Computational Models of Semantic Memory and its Acquisition (Inria and University of Lille, France)
We invite applications for a 3-year PhD position at the University of
Lille in the context of the recently funded research project
"COMANCHE" (Computational Models of Lexical Meaning and Change). The
position is funded by Inria, the French national research institute in
Computer Science and Applied Mathematics.
COMANCHE proposes to transfer and adapt neural word embeddings
algorithms to model the acquisition and evolution of word meaning, by
comparing them with linguistic theories on language acquisition and
language evolution. At the intersection between Natural Language
Processing, psycholinguistics and historical linguistics, this project
intends to validate or revise some of these theories, while also
developing computational models that are less data hungry and
computationally intensive as they exploit new inductive biases
inspired by these disciplines.
The first strand of the project, on which the successful candidate
will work, focuses on the development of computational models of
semantic memory and its acquisition. Two main research directions will
be pursued. On the one hand, we will compare the structural properties
associated to different semantic spaces derived from word embedding
algorithms to those found in human semantic memory as reflected in
behavioral data (such as typicality norms) as well as brain imaging
data. The latter data will then used as additional supervision to
inject more hierarchical structure into the learned semantic
spaces. One the other hand, we intend to experiment with training
regimes for word embedding algorithms that are closer to those of
humans when they acquire language, controlling the quantity as well as
the linguistic complexity of the inputs fed to the learning algorithms
through the use of longitudinal and child directed speech corpora
(e.g., CHILDES, Colaje). In both cases, both English and French data
will be considered.
The successful candidate holds a Master's degree in computational
linguistics or computer science or cognitive science and has prior
experience in word embedding models. Furthermore, the candidate will
provide strong programming skills, expertise in machine learning
approaches and is eager to work across languages.
The position is affiliated with the MAGNET team at Inria, Lille [1] as
well as with the SCALAB group at University of Lille [2] in an effort
to strenghten collaborations between these two groups, and ultimately
foster cross-fertilizations between Natural Language Processing and
Psycholinguistics.
Applications will be considered until the position is filled. However,
you are encouraged to apply early as we shall start processing the
applications as and when they are received. Applications, written in
English or French, should include a brief cover letter with research
interests and vision, a CV (including your contact address, work
experience, publications), and contact information for at least 2
referees. Applications (and questions) should be sent to Angèle
Brunellière (angele.brunelliere(a)univ-lille.fr) and Pascal Denis
(pascal.denis(a)inria.fr).
The starting date of the position is 1 October 2022 or soon
thereafter, for a total of 3 full years.
Best regards,
Angèle Brunellière and Pascal Denis
[1] https://team.inria.fr/magnet/
[2] https://scalab.univ-lille.fr/
--
Pascal
----
Pour une évaluation indépendante, transparente et rigoureuse !
Je soutiens la Commission d'Évaluation de l'Inria.
----
+++++++++++++++++++++++++++++++++++++++++++++++
Pascal Denis
Equipe MAGNET, INRIA Lille Nord Europe
Bâtiment B, Avenue Heloïse
Parc scientifique de la Haute Borne
59650 Villeneuve d'Ascq
Tel: ++33 3 59 35 87 24
Url: http://researchers.lille.inria.fr/~pdenis/
+++++++++++++++++++++++++++++++++++++++++++++++
== 11th NLP4CALL, Louvain-la-Neuve, Belgium==
The workshop series on Natural Language Processing (NLP) for Computer-Assisted Language Learning (NLP4CALL) is a meeting place for researchers working on the integration of Natural Language Processing and Speech Technologies in CALL systems and exploring the theoretical and methodological issues arising in this connection. The latter includes, among others, insights from Second Language Acquisition (SLA) research, on the one hand, and promote development of "Computational SLA" through setting up Second Language research infrastructure(s), on the other.
The intersection of Natural Language Processing (or Language Technology / Computational Linguistics) and Speech Technology with Computer-Assisted Language Learning (CALL) brings "understanding" of language to CALL tools, thus making CALL intelligent. This fact has given the name for this area of research – Intelligent CALL, ICALL. As the definition suggests, apart from having excellent knowledge of Natural Language Processing and/or Speech Technology, ICALL researchers need good insights into second language acquisition theories and practices, as well as knowledge of second language pedagogy and didactics. This workshop invites therefore a wide range of ICALL-relevant research, including studies where NLP-enriched tools are used for testing SLA and pedagogical theories, and vice versa, where SLA theories, pedagogical practices or empirical data are modeled in ICALL tools.
The NLP4CALL workshop series is aimed at bringing together competences from these areas for sharing experiences and brainstorming around the future of the field.
We welcome papers:
- that describe research directly aimed at ICALL;
- that demonstrate actual or discuss the potential use of existing Language and Speech Technologies or resources for language learning;
- that describe the ongoing development of resources and tools with potential usage in ICALL, either directly in interactive applications, or indirectly in materials, application or curriculum development, e.g. learning material generation, assessment of learner texts and responses, individualized learning solutions, provision of feedback;
- that discuss challenges and/or research agenda for ICALL
- that describe empirical studies on language learner data.
This year a special focus is given to work done on second language vocabulary and grammar profiling, as well as the use of crowdsourcing for creating, collecting and curating data in NLP projects.
We encourage paper presentations and software demonstrations describing the above-mentioned themes primarily, but not exclusively, for the Nordic languages.
==Invited speakers==
This year, we have the pleasure to announce two invited talks.
The first talk is by Christopher Bryant from Reverso and the University of Cambridge.
The second talk is given by Marije Michel from the University of Amsterdam.
==Submission information==
Authors are invited to submit long papers (8-12 pages) alternatively short papers (4-7 pages), page count not including references. We will be using the NLP4CALL workshop template for the workshop this year. The author kit, including LaTeX and Microsoft Word templates can be accessed here, alternatively on Overleaf:
<https://spraakbanken.gu.se/sites/default/files/2022/NLP4CALL%20workshop%20t…>
<https://spraakbanken.gu.se/sites/default/files/2022/nlp4call%20template.doc>
<https://www.overleaf.com/latex/templates/nlp4call-workshop-template/qqqzqqy…>
Submissions will be managed through the electronic conference management system EasyChair <https://easychair.org/conferences/?conf=nlp4call2022>. Papers must be submitted digitally through the conference management system, in PDF format. Final camera-ready versions of accepted papers will be given an additional page to address reviewer comments.
Papers should describe original unpublished work or work-in-progress. Papers will be peer reviewed by at least two members of the program committee in a double-blind fashion. All accepted papers will be collected into a proceedings volume to be submitted for publication in the NEALT Proceeding Series (Linköping Electronic Conference Proceedings) and, additionally, double-published through the ACL anthology, following experiences from the previous NLP4CALL editions (<https://www.aclweb.org/anthology/venues/nlp4call/>).
==Important dates==
7 October 2022: paper submission deadline
4 November 2022: notification of acceptance
25 November 2022: camera-ready papers for publication
9 December 2022: workshop date
==Organizers==
David Alfter (1,2), Elena Volodina (2), Thomas François (1), Piet Desmet (3), Frederik Cornillie (3), Arne Jönsson (4), Eveline Rennes (4)
(1) CENTAL, Institute for Language and Communication, Université Catholique de Louvain, Belgium
(2) Språkbanken, University of Gothenburg, Sweden
(3) Itec, Department of Linguistics at KU Leuven & imec, Belgium
(4) Department of Computer and Information Science, Linköping University, Sweden
==Contact==
For any questions, please contact David Alfter, david.alfter(a)uclouvain.be
For further information, see the workshop website <https://spraakbanken.gu.se/en/research/themes/icall/nlp4call-workshop-serie…>
Follow us on Twitter @NLP4CALL <https://twitter.com/NLP4CALL/>
David Alfter, PhD
Post-doctoral researcher
Institut Langage et communication, CENTAL
Université catholique de Louvain
Place Montesquieu, 3 (box L2.06.04)
1348 Louvain-la-Neuve
The project “Reading concordances in the 21st century (RC21)”, run jointly by Friedrich-Alexander-Universität Erlangen-Nürnberg and the University of Birmingham is looking for a
POSTDOCTORAL RESEARCHER IN COMPUTATIONAL LINGUISTICS (100%, E13 TV-L, starting ASAP)
Application deadline: 16 Dec 2022
Interviews: week beginning 19 Dec 2022 (held via zoom)
Format of application: by email to stephanie.evert(a)fau.de
(including evidence of all relevant qualifications, preferably in a single PDF)
Start of position: 1 Feb 2023
Duration of employment: until 31 Jan 2025
Placement: Computational Corpus Linguistics group (www.linguistik.fau.de)
Contact / queries: Stephanie Evert
https://www.jobs.fau.de/jobs/postdoctoral-researcher-in-computational-lingu…
This is the first of two postdoctoral positions. The second post, which will be based at the University of Birmingham, will be advertised in the new year, with a starting date in the spring.
PROJECT INFO
In today's digital world, the amount of text communicated in electronic form is ever-increasing and there is a growing need for approaches and methods to extract meanings from texts at scale. Corpus linguists have long been studying recurring patterns in digitised texts with the help of concordances, i.e. displays that show many occurrences of a word, phrase or construction across a range of contexts in a compact format. However, lacking a well-established and clear-cut methodology, the art of reading concordances has not yet realised its full potential. At the same time, there has been very little innovation in algorithms in the concordance software packages available to corpus linguists. This project proposes an innovative approach to reading concordances in the 21st century. Through the collaboration between the University of Birmingham and Friedrich-Alexander-Universität Erlangen-Nürnberg we combine strengths in theoretical work in corpus linguistics with expertise in computational algorithms in order to develop a systematic methodology for reading concordances and corresponding algorithms for the semi-automatic analysis of concordance lines. Through two case studies on English and German data sets, we will establish an approach that not only provides innovation in corpus linguistics, but also has wider implications for the analysis of textual data at scale, while still retaining a humanities perspective.
RESPONSIBILITIES OF THE POSTDOCTORAL RESEARCHER
• Developing, implementing, and applying novel computer algorithms to support and improve the manipulation of concordance displays
• Producing documentation
• Carrying out interdisciplinary case studies using the new methodology in close collaboration with other researchers
• Leading on the organisation of workshops and other dissemination activities
• Developing training materials for workshops and materials for web presence
• Analysing data, writing up results, and co-authoring publications
• Managing project tasks, supervising student assistants, and providing progress reports
• Managing collaborative activities with the UK team
REQUIRED QUALIFICATIONS
• PhD or DPhil in Computational Linguistics, Computer Science, Machine Learning, Corpus Linguistics, Computational Humanities, or similar subject (completed or near completion)
• Evidence of strong programming skills, ideally in Python
• Evidence of ability to analyse linguistic data
• Native-like language skills in German and English
• Excellent communication skills in English, including the ability to write for publication, present research proposals and results, and represent the project team at meetings and research events
• Ability to work independently, manage own academic research and associated activities, and to supervise student assistants
OPTIONAL QUALIFICATIONS
• Strong publication record (commensurate with opportunities and experience) at relevant international conferences (computational linguistics, DH, corpus linguistics)
• Experience in interdisciplinary research is a plus
• Experience with deep learning approaches and/or statistical methods is a plus