- Corpora - ELRA lists

New positions at Language Technologies Unit at BSC-CNS
by Montserrat Marimon 12 Jan '23

The Language Technologies Unit at the Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-CNS) invites applications for the following 6 positions:

- Deep Learning Engineer for Language Technologies (RE1): https://www.bsc.es/join-us/job-opportunities/1123lstmre1
- Deep Learning Engineer for Language Technologies (RE2): https://www.bsc.es/join-us/job-opportunities/1423lstmre2
- Machine Translation Engineer (RE1): https://www.bsc.es/join-us/job-opportunities/1023lstmre1
- Machine Translation Engineer (RE2): https://www.bsc.es/join-us/job-opportunities/1323lstmre2
- Data Engineer for Language and Translation Technologies (RE1): https://www.bsc.es/join-us/job-opportunities/923lstmre1
- Data Engineer for Language and Translation Technologies (RE2): https://www.bsc.es/join-us/job-opportunities/1223lstmre2

We offer:

- Full-time contracts, a highly stimulating environment with state-of-the-art infrastructure, flexible working hours, extensive training plan, tickets restaurant, private health insurance, and full support for relocation procedures.
- A competitive salary commensurate with the qualifications and experience of the candidate and according to the cost of living in Barcelona.
- Open-ended contract due to technical and scientific activities linked to the project and budget duration.

About BSC and the Language Technologies Unit

The Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-CNS) is the leading supercomputing center in Spain. It houses MareNostrum, one of the most powerful supercomputers in Europe, and is a hosting member of the PRACE European distributed supercomputing infrastructure. The mission of BSC is to research, develop and manage information technologies in order to facilitate scientific progress. BSC combines HPC service provision and R&D into both computer and computational science (life, earth and engineering sciences) under one roof, and currently has over 770 staff from 55 countries.

The Language Technologies Unit at BSC has extensive experience in several NLP areas, such as massive language model building, biomedical text mining, machine translation and unsupervised learning for under-resourced languages and domains. It has been entrusted by the Spanish and the Catalan governments with the mission to develop essential open-source resources and technologies for Spanish and Catalan languages. In connection with this, the LT Unit is currently in charge of two flagship projects at the national and regional level: the Spanish National Language Technology Plan, funded by the Spanish Secretariat of Digitalisation and Artificial Intelligence, and the AINA project, aimed at developing AI resources for Catalan, funded by the Catalan Digitalisation Department. In addition, the Unit participates in various EU-funded international projects.

--
*Montserrat Marimon*
Language Technologies Unit - Life Sciences
BSC-CNS

First Call for Participation - EXIST 2023: sEXism Identification in Social neTworks @ CLEF
by JORGE AMANDO CARRILLO DE ALBORNOZ CUADRADO 12 Jan '23

Please consider participating and/or forwarding to colleagues and groups.

****We apologize for multiple postings of this e-mail****

----------------------------------------------------------------------------------------------------
Call for Participation
----------------------------------------------------------------------------------------------------

First Call for Participation
EXIST 2023 at CLEF 2023
Task: EXIST 2023: sEXism Identification in Social neTworks
Website: http://nlp.uned.es/exist2023/

EXIST is a series of scientific events and shared tasks on sexism identification in social networks that aims to capture sexism in a broad sense, from explicit misogyny to other subtle expressions that involve implicit sexist behaviours (EXIST 2021, EXIST 2022). The third edition of the EXIST shared task will be held as a Lab at CLEF 2023, which will take place on September 18-21, 2023, in the Centre for Research & Technology Hellas (CERTH), Thessaloniki, Greece.

Social networks are the main platforms for social complaint, activism and expression of opinions and personal views in general. Movements like #MeToo, #8M or #TimesUp have spread rapidly. Under the umbrella of social networks, many women all around the world have reported abuses, discrimination and other sexist experiences suffered in real life. Social networks are also contributing to the transmission of sexism and other disrespectful and hateful behaviours. In this context, automatic tools may help not only to detect and alert against sexist behaviours and discourses, but also to estimate how often sexist and abusive situations are found on social media platforms, what forms of sexism are more frequent and how sexism is expressed in these media.

Given the success of the previous tasks, EXIST 2023 is a follow-up to the tasks addressed in previous years, while facing a new challenge: the identification of the intention of the author of the sexist message. Additionally, the main novelty will be the adoption of the “learning with disagreements” paradigm for the development of the dataset and for the evaluation of the systems. The adoption of this paradigm, along with our effort to control bias in the annotations, will allow us to evaluate whether including the different views and sensibilities of the annotators contributes to the development of more accurate and fairer NLP systems.

Participants will be asked to classify tweets (in English and Spanish) according to the following three tasks:

TASK 1 - Sexism Identification: a binary classification where systems have to decide whether or not a given text (tweet) contains sexist expressions or behaviours (i.e., it is sexist itself, describes a sexist situation or criticizes a sexist behaviour).

TASK 2 - Source Intention: for the tweets that have been classified as sexist, the second task aims to classify each tweet according to the intention of the person who wrote it. We propose a ternary classification task: (i) direct sexist message, (ii) reported sexist message and (iii) judgemental message.

TASK 3 - Sexism Categorization: once a message has been classified as sexist, the third task aims to categorize the message into different types of sexism (according to a categorization proposed by experts that takes into account the different facets of women that are undermined). In particular, each sexist tweet must be assigned to one or more of the following categories: (i) Ideological and inequality, (ii) Stereotyping and dominance, (iii) Objectification, (iv) Sexual violence and (v) Misogyny and non-sexual violence.

Although we recommend participating in all subtasks, participants may take part in just one of them.

During the training phase, the task organizers will provide participants with the manually annotated EXIST 2023 dataset. For the evaluation of the teams, the unlabelled test data will be released.

We encourage participation from both academic institutions and industrial organizations. We invite participants to register for the lab at the CLEF 2023 Labs Registration site (http://clef2023-labs-registration.dei.unipd.it/registrationForm.php). Upon registration, participants will receive information about how to join the Google Group for the EXIST 2023 shared task.

Important Dates:

* 14 November 2022: Registration open.
* 13 February 2023: Training set available.
* 27 March 2023: Development set available.
* 10 April 2023: Test set available.
* 28 April 2023: Registration closes.
* 10 May 2023: Runs submission due.
* 26 May 2023: Results notification.
* 5 June 2023: Submission of Working Notes by participants.
* 23 June 2023: Notification of acceptance (peer-reviews).
* 7 July 2023: Camera-ready participant papers due.
* 18-21 September 2023: EXIST 2023 at CLEF Conference.

**Note: All deadlines are 11:59PM UTC-12:00 ("anywhere on Earth").**

Organizers:

Laura Plaza, Universidad Nacional de Educación a Distancia (UNED)
Jorge Carrillo-de-Albornoz, Universidad Nacional de Educación a Distancia (UNED)
Roser Morante, Universidad Nacional de Educación a Distancia (UNED)
Enrique Amigó, Universidad Nacional de Educación a Distancia (UNED)
Julio Gonzalo, Universidad Nacional de Educación a Distancia (UNED)
Damiano Spina, Royal Melbourne Institute of Technology (RMIT)
Paolo Rosso, Universitat Politècnica de Valencia (UPV)

Contact: Contact the organizers by writing to: jcalbornoz(a)lsi.uned.es
Website: http://nlp.uned.es/exist2023/

We will carry out a “hard evaluation” and a “soft evaluation”. Hard evaluation: the hard evaluation will assume that a single label is provided by the systems for every example in the dataset. Soft evaluation: the soft evaluation is intended to measure the ability of the model to capture disagreements, by considering the distribution of labels in the output as a soft label and ...
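To make the difference between hard and soft labels in the “learning with disagreements” setting more concrete, here is a minimal Python sketch. It assumes that each tweet comes with a list of individual annotator votes. The label names, the function names, the example votes and the use of cross-entropy to compare distributions are all illustrative assumptions; they are not the official EXIST 2023 data format or evaluation measure.

```python
from collections import Counter
import math

# Hypothetical label names for Task 1 (not sexist / sexist).
LABELS = ("NO", "YES")


def soft_label(votes, labels=LABELS):
    """Turn raw annotator votes into a probability distribution over labels."""
    counts = Counter(votes)
    total = len(votes)
    return {label: counts[label] / total for label in labels}


def hard_label(votes, labels=LABELS):
    """Majority vote; ties go to the label listed first (an arbitrary choice)."""
    counts = Counter(votes)
    return max(labels, key=lambda label: counts[label])


def cross_entropy(gold, predicted, eps=1e-12):
    """Cross-entropy between a gold soft label and a predicted distribution.

    Assumption: this is just one possible way to score a soft prediction.
    """
    return -sum(p * math.log(max(predicted[label], eps))
                for label, p in gold.items())


# Six made-up annotator votes for a single tweet.
votes = ["YES", "YES", "NO", "YES", "NO", "YES"]
gold_soft = soft_label(votes)   # keeps the disagreement (4 vs. 2 votes)
gold_hard = hard_label(votes)   # collapses it to a single label

# A made-up system output expressed as a distribution over the two labels.
system_output = {"NO": 0.25, "YES": 0.75}
print(gold_soft, gold_hard, cross_entropy(gold_soft, system_output))
```

A hard evaluation would only compare the system's single best label with `gold_hard`, whereas a soft evaluation can reward a system whose output distribution is close to `gold_soft`.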

CFP: Constraint Grammar Workshop at NoDaLiDa 2023 - Thórshavn
by Eckhard Bick 11 Jan '23

=============================================================================
CONSTRAINT GRAMMAR WORKSHOP - CALL FOR PAPERS
Constraint Grammar - Methods, Tools and Applications
in conjunction with NoDaLiDa 2023, Thórshavn, Faroe Islands, May 22, 2023
https://visl.sdu.dk/nodalida2023.html
=============================================================================

This workshop on practical and theoretical aspects of CG will be co-located with NoDaLiDa 2023 in Thórshavn. The new edition of the workshop continues the tradition of CG workshops at NoDaLiDa, which started in 2005. Apart from the traditional field of corpus-oriented tagging and parsing, Constraint Grammar continues to inspire applicational work, providing a robust NLP backbone in end user-oriented systems in various areas of language technology, such as spell and grammar checking, comma correction, ICALL, machine translation, lexicography and others. We therefore envision workshop contributions both regarding basic grammatical research and corpus linguistics on the one hand, and CG-based applications on the other hand. Constraint Grammar has always elicited a strong interest from researchers working on less-resourced languages, such as the Sami languages, Greenlandic, Faroese, Tibetan and the Celtic languages, for which we explicitly invite both finished and ongoing work. Finally, there will be room for methodological contributions on the CG formalism itself regarding either its expressive power or improvements in compiler implementation.

CALL FOR ABSTRACTS

We invite contributions concerning CG grammars for various languages or CG systems used in tools and applications. Research reports from fields relevant to the CG framework on the input side - such as finite-state analyzers, ontologies etc. - are also welcome. Finally, we are hoping for methodological contributions and experiments exploiting advances in expressive power in the most widely used CG compiler, CG-3. As usual, we encourage short papers on ongoing work.

The workshop will be organized as a half-day workshop with both full and short papers. Contributions will be reviewed anonymously, and the papers will be published in the NoDaLiDa 2023 workshop proceedings. We invite extended abstracts, approximately 1500 words (for an 8 page full paper) or 750 words (for a 4 page short paper) - additional pages with bibliographic references not included. Final full versions of accepted papers can be submitted after the workshop, and will be published in the NEALT Proceedings Series by Linköping University Electronic Press.

IMPORTANT DATES

Monday, April 10, 2023: Submission of abstracts
Monday, April 17, 2023: Notification of acceptance
Monday, May 22, 2023: Workshop (NoDaLiDa main conference May 23-24)
Monday, June 26, 2023: Submission of camera-ready full manuscripts

SUBMISSION FORMATS

All submissions must follow the NoDaLiDa 2023 style files, which are available for LaTeX (preferred) and MS Word and can be retrieved from the following address: https://www.nodalida2023.fo/authorkit-nodalida23

Submissions must be anonymous, i.e. not reveal author(s) on the title page or through self-references. Abstracts (1500 words for full papers and 750 words for short papers, excluding bibliography) must be submitted digitally, in PDF, and uploaded through the on-line conference system. Abstract submissions that violate either of these requirements will be returned without review.

SUBMISSION MANAGEMENT

Submissions to the conference must be uploaded electronically, obeying the above requirements and no later than (end of day, world-wide): Monday, April 10, 2023

NoDaLiDa 2023 utilizes the OpenReview conference management system for submission, reviewing, and preparation of proceedings. Submission for the conference can be made at: https://openreview.net/group?id=NoDaLiDa/2023/Workshop/CG-MTA

ORGANIZERS

* Eckhard Bick, bick(a)sdu.dk, University of Southern Denmark
* Tino Didriksen, tinod(a)sdu.dk, GrammarSoft ApS & University of Southern Denmark
* Kristin Hagen, kristin.hagen(a)iln.uio.no, University of Oslo
* Kaili Müürisep, kaili.muurisep(a)ut.ee, University of Tartu
* Trond Trosterud, trond.trosterud(a)uit.no, University of Tromsø
* Linda Wiechetek, linda.wiechetek(a)uit.no, University of Tromsø

--
Eckhard Bick, cand.med., dr.phil.
University of Southern Denmark
e-mail: eckhard.bick(a)gmail.com
web: http://beta.visl.sdu.dk

1st Call for the Anthony C. Clarke Award for the 2022 EAMT Best Thesis
by Carol Scarton 10 Jan '23

*********************************************************************
The Anthony C. Clarke Award for the 2022 EAMT Best Thesis
Submission deadline: March 3, 2023, 23:59 CEST
*********************************************************************

The European Association for Machine Translation (EAMT, http://www.eamt.org) is an organization that serves the growing community of people interested in MT and translation tools, including translators, users, developers, and researchers of this increasingly viable technology.

The EAMT invites entries for its eleventh EAMT Best Thesis Award for a PhD or equivalent thesis on a topic related to machine translation. Previous years' winners can be found at https://eamt.org/best-thesis-award/.

* Eligibility *

Researchers who
- have completed a PhD (or equivalent) thesis on a relevant topic in a European, African or Middle Eastern institution within calendar year 2022,
- have not previously won another international award for that thesis, and
- are members of the EAMT at the time of submission,
are invited to submit their theses to the EAMT for consideration.

* Panel *

The submissions will be judged by a panel of experts who will be specifically appointed, based on the EAMT 2023 program committee, and which will be ratified by the Executive Board of the EAMT.

* Selection criteria *

Each thesis will be judged according to how challenging the problem was, how relevant the results are for machine translation as a field, and the strength of its impact in terms of scientific publications.

* Scope *

The scope of the thesis does not need to be confined to a technical area, and applications are also invited from students who carried out their research into commercial and management aspects of machine translation. Possible areas of research include:
- development of machine translation or advanced computer-assisted translation: methods, software or resources
- machine translation for less-resourced languages
- the use of these systems in professional environments (freelance translators, translation agencies, localisation, etc.)
- the increasing impact of machine translation on non-professional Internet users and its impact in communications, social networking, etc.
- spoken language translation
- the integration of machine translation and translation memory systems
- the integration of machine translation software in larger IT applications
- the evaluation of machine translation systems in real tasks such as those above
- the cross-fertilisation between machine translation and other language technologies

* Prize *

The winner will be announced on the 31st of March 2023 and will receive a prize of €500, together with an inscribed certificate. The recipient of the award will be required to briefly present their research at EAMT 2023, to be held from 12th June to 15th June 2023 in Tampere, Finland. In order to facilitate this, the EAMT will waive the winner's registration costs, and will make available a travel bursary of €200 to enable the recipient of the award to attend the said conference. The prize includes complimentary membership in the EAMT for 2024.

* Submission *

Candidates will submit using EasyChair: https://easychair.org/conferences/?conf=eamt2023 (Submission type: Thesis Award), a single PDF file containing:
- a 2-page summary of your thesis in English, containing:
  - your full contact details,
  - the name and contact details of your supervisor(s),
- a copy of your CV in English (at most one page, plus a complete list of publications directly related to the thesis)
- an electronic copy of your thesis
- optionally, an appendix with any other relevant information on the thesis

By submitting their work, authors
- agree that, in case they are granted the award, any subsequently published version of the thesis should carry the citation "The Anthony C. Clarke Award for the 2022 EAMT Best Thesis" and
- acknowledge the right of the EAMT to publicize the granting of the award.

For this year's Best Thesis Award we are requiring candidates to be an individual EAMT member at the time of submission. For EAMT memberships, please visit: http://www.eamt.org/membership.php.

* Closing date *

Submission deadline: March 3, 2023, 23:59 CEST.
Award notification: March 31, 2023.

--
*Carolina Scarton*
Lecturer in Natural Language Processing
Department of Computer Science
University of Sheffield
http://staffwww.dcs.shef.ac.uk/people/C.Scarton/

Postdoctoral position at Cardiff University
by Steven Schockaert 10 Jan '23

Location: Cardiff, UK
Deadline for applications: 31st January 2023
Start date: as soon as possible
Duration: 30 months
Keywords: natural language processing, neurosymbolic AI, graph neural networks, commonsense reasoning

Details about the post

Applications are invited for a Research Associate post in the Cardiff University School of Computer Science & Informatics, to work on the EPSRC Open Fellowship project ReStoRe (Reasoning about Structured Story Representations), which is focused on story-level language understanding. The overall aim of this project is to develop methods for learning graph-structured representations of stories. For this post, the specific focus will be on developing common sense reasoning strategies, based on graph neural networks, to fill the gap between what is explicitly stated in a story and what a human reader would infer by “reading between the lines”.

More details about the post and instructions on how to apply are available here: https://www.jobs.ac.uk/job/CWM298/research-associate

Background about the ReStoRe project

When we read a story as a human, we build up a mental model of what is described. Such mental models are crucial for reading comprehension. They allow us to relate the story to our earlier experiences, to make inferences that require combining information from different sentences, and to interpret ambiguous sentences correctly. Crucially, mental models capture more information than what is literally mentioned in the story. They are representations of the situations that are described, rather than the text itself, and they are constructed by combining the story text with our commonsense understanding of how the world works.

The field of Natural Language Processing (NLP) has made rapid progress in the last few years, but the focus has largely been on sentence-level representations. Stories, such as news articles, social media posts or medical case reports, are essentially modelled as collections of sentences. As a result, current systems struggle with the ambiguity of language, since the correct interpretation of a word or sentence can often only be inferred by taking its broader story context into account. They are also severely limited in their ability to solve problems where information from different sentences needs to be combined. As a final example, current systems struggle to identify correspondences between related stories (e.g. different news articles about the same event), especially if they are written from a different perspective.

To address these fundamental challenges, we need a method to learn story-level representations that can act as an analogue to mental models. Intuitively, there are two steps involved in learning such story representations: first we need to model what is literally mentioned in the story, and then we need some form of commonsense reasoning to fill in the gaps. In practice, however, these two steps are closely interrelated: interpreting what is mentioned in the story requires a model of the story context, but constructing this model requires an interpretation of what is mentioned.

The solution that is proposed in this fellowship is based on representations called story graphs. These story graphs encode the events that occur, the entities involved, and the relationships that hold between these entities and events. A story can then be viewed as an incomplete specification of a story graph, similar to how a symbolic knowledge base corresponds to an incomplete specification of a possible world.

The proposed framework will allow us to reason about textual information in a principled way. It will lead to significant improvements in NLP tasks where a commonsense understanding is required of the situations that are described, or where information from multiple sentences or documents needs to be combined. It will furthermore enable a step change in applications that directly rely on structured text representations, such as situational understanding, information retrieval systems for the legal, medical and news domains, and tools for inferring business insights from news stories and social media feeds.
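As a rough illustration of the story graph idea described in this posting (events, entities and the relations between them, with a story giving only an incomplete specification), here is a minimal Python sketch. The class names, relation names and the two-sentence example story are invented for illustration; they are not the representation used in the ReStoRe project.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str    # "event" or "entity"
    label: str


@dataclass
class StoryGraph:
    nodes: dict = field(default_factory=dict)
    # Maps (source_id, relation, target_id) to True if the relation is stated
    # explicitly in the text, or False if it was filled in by inference.
    edges: dict = field(default_factory=dict)

    def add_node(self, node_id, kind, label):
        self.nodes[node_id] = Node(node_id, kind, label)

    def add_edge(self, source, relation, target, explicit=True):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[(source, relation, target)] = explicit

    def inferred_edges(self):
        """Edges that are not literally stated in the story text."""
        return sorted(edge for edge, explicit in self.edges.items() if not explicit)


# Toy story: "Anna dropped the glass. She fetched a broom."
graph = StoryGraph()
graph.add_node("anna", "entity", "Anna")
graph.add_node("glass", "entity", "glass")
graph.add_node("broom", "entity", "broom")
graph.add_node("drop", "event", "drop")
graph.add_node("fetch", "event", "fetch")
graph.add_edge("drop", "agent", "anna")
graph.add_edge("drop", "patient", "glass")
graph.add_edge("fetch", "agent", "anna")
graph.add_edge("fetch", "theme", "broom")

# What a reader infers "between the lines": the glass broke.
graph.add_node("break", "event", "break")
graph.add_edge("drop", "causes", "break", explicit=False)
graph.add_edge("break", "patient", "glass", explicit=False)
print(graph.inferred_edges())
```

The explicit edges correspond to what the text literally says, while the edges marked as inferred stand for the commonsense completion that the posting describes as filling in the gaps.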

2-year Postdoc position in NLP at Cental (University of Louvain, Belgium)
by pat 10 Jan '23

UCLouvain is looking for: a postdoctoral researcher in machine learning / natural language processing

- Full-time (100%) fixed-term contract of two years
- for the Centre de traitement automatique du langage (Cental) within the Institut Langage & Communication (IL&C) in UCLouvain (Louvain-la-Neuve)
- Start date: as soon as possible

This postdoctoral position is part of a research project led by the Cental (https://uclouvain.be/fr/instituts-recherche/ilc/cental) on legal data processing. In terms of concrete application, the project aims to automate the analysis of documents related to clinical trials (meeting minutes, legal documents, contracts, ...) in order to assess their compliance with the GDPR (RGPD). The proposed solution should thus be flexible enough to, on the one hand, ensure that the model(s) can be adapted to the various document types and, on the other hand, limit the need for specialist expertise in training data annotation. Consequently, the scientific core of this project is directly related to the question of few-shot learning, which we intend to address through active learning and meta-learning.

The role of the hired postdoc will be to (1) develop the resources needed for learning, (2) implement an architecture that incorporates active learning and meta-learning, (3) evaluate the models and (4) implement the components in a web service. The postdoc will also be required to disseminate the results through scientific publications and/or reports.

Work environment:

CENTAL is part of the Institut Langage & Communication (https://uclouvain.be/fr/instituts-recherche/ilc), in UCLouvain. This university is located in Louvain-la-Neuve, Belgium (https://uclouvain.be/fr/sites/louvain-la-neuve), a walkable city that offers a pleasant and dynamic living environment. The research project will be supervised by Patrick Watrin.

Required skills:

- A completed PhD in Computer Science, Machine Learning, NLP or a similar domain.
- Excellent programming skills:
  - Python
  - TensorFlow/Keras or PyTorch
  - Linux (server administration)
- Knowledge of the main supervised learning algorithms and deep learning algorithms is required
- A good knowledge of the main NLP tools and algorithms is a plus
- Strong research track record (publications, conferences, etc.)
- Autonomy, teamwork, ability to understand and analyze needs, adaptability
- Excellent command of the French language (at least C1) and good command of English (at least B2)

Conditions:

- Fixed-term contract of one year, renewable once
- Salary based on experience, ranging from 4250€ to 4850€ (monthly, gross)

The position requires residency in Belgium. Candidates from outside the EU are responsible for obtaining the appropriate visa and/or permits, with support from UCLouvain.

How to apply:

- Deadline: February 15
- The application file should be sent electronically to Patrick Watrin (patrick.watrin(a)uclouvain.be) and contain:
  - A detailed resume showing the relevant qualifications and skills, as well as scientific/academic experience and publications;
  - A cover letter in French, describing your interest in the role, how your profile matches the project's needs, etc.;
  - A recommendation letter in French or in English.

The shortlisted candidates will be invited to participate in a remote video call (details will be communicated in a timely manner).

3-year PhD position on quantitative typology, Université Paris Nanterre
by Sylvain Kahane 10 Jan '23

The Autogramm project (https://autogramm.github.io/en) invites applications for a 3-year PhD position starting between now and October 2023. The position is funded by ANR (Agence Nationale de la Recherche), France.

Applications and questions can be sent to Sylvain Kahane <sylvain(a)kahane.fr>

Applications should include:
- Cover letter outlining interest in the position
- Names of two referees
- Curriculum Vitae (CV) with publications (if applicable)
- Copy of MA degree
- University grade sheet of at least the two last years

Today, we have databases concerning several dozen languages, including corpora annotated according to the same principles, thanks in particular to corpora annotated with interlinear glosses (IGT, see for example the Pangloss collection, https://pangloss.cnrs.fr) or with the Universal Dependencies annotation scheme (UD, https://universaldependencies.org and its SUD variant, https://surfacesyntacticud.github.io/). These databases allow typological studies and have several advantages:
- the results obtained are based directly on primary data (corpora) and not secondary data (grammars written by linguists). (This is only partially true, since the results still depend on the choices made by a linguist in selecting the corpus and annotating it; nevertheless, these choices are visible and can be discussed.)
- the results are reproducible as long as the data are freely accessible;
- the nature of the data allows for quantitative results: we will not say that a language is OV or VO, but that it has such and such a percentage of OV constructions, and we will be able to observe directly on the data which factors determine the distribution between OV and VO (Levshina 2019, Gerdes et al. 2019, Futrell et al. 2015). (See also https://typometrics.elizia.net/#/.)
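As an illustration of the kind of quantitative measure mentioned in this posting (a percentage of OV constructions rather than a categorical OV/VO label), here is a minimal Python sketch that counts how often an object precedes its verbal head in UD-style annotation. The two-sentence German sample and the function name are made up for illustration; a real study would read actual CoNLL-U treebank files and would have to decide how to treat pronouns, clausal objects, auxiliaries and so on.

```python
# Tiny hand-made sample in CoNLL-U column order:
# ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SAMPLE = """\
# text = Sie liest Bücher
1 Sie sie PRON _ _ 2 nsubj _ _
2 liest lesen VERB _ _ 0 root _ _
3 Bücher Buch NOUN _ _ 2 obj _ _

# text = dass sie Bücher liest
1 dass dass SCONJ _ _ 4 mark _ _
2 sie sie PRON _ _ 4 nsubj _ _
3 Bücher Buch NOUN _ _ 4 obj _ _
4 liest lesen VERB _ _ 0 root _ _
"""


def ov_percentage(conllu_text):
    """Percentage of `obj` dependents that precede their verbal head.

    Simplification: columns are split on any whitespace; real CoNLL-U files
    are tab-separated and may contain spaces inside a field.
    """
    ov = vo = 0
    for block in conllu_text.strip().split("\n\n"):
        tokens = {}
        for line in block.splitlines():
            if not line or line.startswith("#"):
                continue  # skip blank lines and sentence-level comments
            cols = line.split()
            if not cols[0].isdigit():
                continue  # skip multiword ranges (1-2) and empty nodes (1.1)
            tokens[int(cols[0])] = (cols[3], int(cols[6]), cols[7])
        for idx, (upos, head, deprel) in tokens.items():
            is_object = deprel.split(":")[0] == "obj"  # also matches subtypes
            if is_object and head in tokens and tokens[head][0] == "VERB":
                if idx < head:
                    ov += 1
                else:
                    vo += 1
    total = ov + vo
    return 100.0 * ov / total if total else None


print(ov_percentage(SAMPLE))
```

In the sample, the main clause has the object after the verb and the subordinate clause has it before the verb, so the measure captures a mixed distribution that a single categorical feature would hide.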
The goal of the thesis is to contribute to the development of quantitative typology by participating in the construction of a quantitative database on a large number of typologically diverse languages and by focusing on the exploitation of such a dataset (Levshina 2022). The originality of the project lies in the fact that we are working on quantitative data and not on categorical features like existing typological databases (see in particular the World Atlas of Language Structures Online, https://wals.info/, which gives access to data on more than 2500 languages).

The following questions can be studied:
- How to identify cross-linguistic regularities, such as quantitative entailment universals, from a set of corpora of world languages (see for example Gerdes et al. 2021)? How can we make inferences between quantitatively valued features?
- What quantitative information can be extracted from a corpus that is useful for a typological study? Which features require prior annotation of the data and what is the nature of the annotations needed (see for example the case of IGT for morphosyntactic features and treebanks for word order)?
- How to identify the typological signature of a language from an annotated corpus and determine what makes it special within a group of languages (see Bickel & Nichols 2002 and AutoTyp project)?
- How to take into account the imbalance of a database that is not representative of the distribution of languages in the world, but includes a higher proportion of languages from certain regions or families (Indo-European languages, Semitic languages, East Asian languages, etc.) to the detriment of other regions or families (Papua New Guinea, Oceania, Sub-Saharan Africa, Amerindian languages, aboriginal languages)? (see Guzmán Naranjo & Becker 2022).
- How to solve the question of the commensurability of the categories used in the description of the different languages? How can we check the consistency of the data? This question can be addressed by studying the consistency of treebanks of the same language or language family. How to detect the presence of aberrations in some treebanks (categorization choices not conforming to the universal scheme, e.g. assignment of the subject relation in ergative languages, use of the ADJ category in languages without real adjectives, etc.)?
- How to visualize multidimensional quantitative data? Linguistic data pose many challenges.

The work will be conducted in collaboration with the members of the ANR Autogramm project (https://autogramm.github.io/), researchers in field linguistics, typology, formal linguistics and automatic language processing. It could lead, with the help of engineers, to the constitution of a typometric database accompanied by query and data visualization tools.

Bickel & Nichols 2021
Futrell 2015
Gerdes et al. 2019
Gerdes et al. 2021
Guzmán Naranjo & Becker 2022
Levshina 2019
Levshina 2022


PhD position on Computational Journalism at the University of Tartu, Estonia
by Rajesh Sharma 10 Jan '23


Hello all,

Happy New Year 2023! Sorry for cross-posting. Please feel free to spread the word about the PhD position on "Computational Journalism" in my group.

The Computational Social Science group (https://css.cs.ut.ee/) is looking for motivated researchers who are interested in working on topics in computational journalism, especially understanding echo chambers, bias in news media, and fairness in news media applications (recommendation).

We expect the candidate to know one or more (if not all) of the following techniques and programming languages:
(i) Preferred programming languages: Python or R.
(ii) Exploratory data analysis: feature extraction, visualization, etc.
(iii) Machine learning and deep learning, with some hands-on experience.
(iv) Social media analysis: this includes collecting data from Twitter/Reddit and analyzing it for insights. An ideal candidate should also be aware of what is happening on social media.
(v) Social network analysis and natural language processing.

Program Benefits
================
- The funding covers the student fees and a monthly stipend of 2000 euros (gross salary) for 4 years; the tuition fee is waived.
- Health insurance is provided.
- Academic and industrial professional development, including travel support.
- Interaction with world-renowned external board members and speakers.
- Travel grant for attending conferences and workshops.

Location of PhD study: Institute of Computer Science, University of Tartu, Estonia.

The Institute of Computer Science is located in the University of Tartu Delta Centre (https://delta.ut.ee/en/), a unique multidisciplinary centre for digital technology, analytics and economic thought that brings together more than 2500 students, university teachers, scientists and R&D staff from companies. In short, you will get an opportunity to work in a diverse environment and collaborate with colleagues. The Delta Centre opened in January 2020 and is one of the most modern centres of digital technology, analytical and economic thought in the Nordic region.

The University of Tartu is the leading higher education and research center in Estonia, with more than 16000 students and 1800 academic staff. It is also the highest-ranked university in the Baltic States according to both the Times Higher Education and the QS World University rankings. The University of Tartu's Institute of Computer Science ranks 176-200 (according to Times Higher Education) and hosts 750 Bachelor's and Master's students and 60 doctoral students. The institute has a strong international orientation: over 40% of graduate students and a quarter of academic and research staff members are international. Graduate teaching in the institute is in English.

Estonia is famous for its e-approach and is home to many startups, such as Skype, Transferwise and Bolt, to name a few. Tartu, a university town, is the second largest city of Estonia. It is relatively inexpensive (compared to neighbors like Sweden and Finland) and is surrounded by nature within walking distance of the city.

The applicant should have:
- A master's degree in computer science, mathematics or another relevant discipline.
- Excellent programming skills.
- A good command of spoken and written English.
- A background in statistics, data mining, machine learning or social media analysis would be ideal. Knowledge of social network analysis would be an additional advantage.

Applications with a CV (max. 2 pages) describing research experience (publications) and knowledge of programming languages/tools can be sent to rajesh.sharma(a)ut.ee with the subject "PhD application".

If you have any queries, please do not hesitate to contact me.

Kind regards,
Rajesh Sharma, Associate Professor
Head, Computational Social Science Group
Institute of Computer Science
University of Tartu, Estonia.
Group webpage: https://css.cs.ut.ee/


Deadline extension to 15/02: JLM spec. issue on Computational Approaches to Morphological Typology
by Sacha Beniamine 10 Jan '23


Dear colleagues,

Happy new year! We are extending the deadline for this call to the 15th of February. At the request of some authors, we have also adapted the most recent JLM LaTeX template so that it is compatible with Overleaf. It can be found here: https://fr.overleaf.com/latex/templates/template-for-journal-of-language-mo…

Please find below the updated call:

-------------

We invite researchers in the broad area of computational morphology to submit their recent, unpublished work to a special issue of the Journal of Language Modelling <https://jlm.ipipan.waw.pl/index.php/JLM>.

Motivation:

Computational techniques have a long history of use in the study of morphology, where they have been used both for practical tasks, such as the analysis and production of complex word forms, and for theoretical ones, such as structural and informational analysis of morphological systems. As both systems and datasets improve, these techniques are increasingly developed and evaluated on a typologically diverse array of languages, including many which are endangered or lack large-scale resources. Detailed comparisons across languages can help to reveal typological biases or assumptions within existing computational techniques [1, 2]. Conversely, computational methods and analyses can shed light on questions within linguistic typology [3, 4, 5, 6].

The goal of this special issue is to bring researchers from multiple communities together in exploring issues of linguistic typology across a wide range of different languages and phenomena. We encourage the submission of work on endangered or less-studied languages.

The Journal of Language Modelling is a free (for readers and authors alike) open-access peer-reviewed journal. All articles are peer-reviewed by at least 3 reviewers, usually including at least one member of the Editorial Board.
Topics of interest:

- Typological clustering or classification of languages
- Investigation of particular linguistic features which improve or detract from the performance of computational morphology tools
- Comparison of morphological structures (e.g., inflection classes, implicative networks) across typologically different languages
- Investigation of diachronic typological change using computational methods
- Creation, curation or analysis of typological databases via computational methods

Submissions:

The submissions should be journal papers, not proceedings papers, totalling 25-50 pages, excluding references. Authors are advised to use the online manuscript submission for the journal. Make sure to select the special issue when asked to provide the article type. More information, including formatting instructions for authors, can be found on the journal's webpage at: https://jlm.ipipan.waw.pl/index.php/JLM/about/submissions

An adaptation of the LaTeX template for Overleaf can be found at: https://fr.overleaf.com/latex/templates/template-for-journal-of-language-mo…

Important dates:

Call for papers issued: 15/7/2022
Submissions due: 15/1/2023 --- extended to 15/02/2023
Author notification: Spring 2023

Guest editors:

Sacha Beniamine (University of Surrey)
Micha Elsner (The Ohio State University)
Katharina Kann (University of Colorado, Boulder)

References

[1] Ryan Cotterell, Christo Kirov, John Sylak-Glassman, David Yarowsky, Jason Eisner, and Mans Hulden. 2016a. The SIGMORPHON 2016 shared Task—Morphological reinflection. In Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 10–22, Berlin, Germany. Association for Computational Linguistics.

[2] Huiming Jin, Liwei Cai, Yihui Peng, Chen Xia, Arya McCarthy, and Katharina Kann. 2020. Unsupervised morphological paradigm completion. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6696–6707, Online. Association for Computational Linguistics.

[3] Neil Rathi, Michael Hahn, and Richard Futrell. 2021. An Information-Theoretic Characterization of Morphological Fusion. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 10115–10120, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

[4] Parker, J., Reynolds, R., & Sims, A. (2022). Network Structure and Inflection Class Predictability: Modeling the Emergence of Marginal Detraction. In A. Sims, A. Ussishkin, J. Parker, & S. Wray (Eds.), Morphological Diversity and Linguistic Cognition (pp. 247-281). Cambridge: Cambridge University Press. DOI: 10.1017/9781108807951.010

[5] Guzmán Naranjo, Matías and Becker, Laura. Statistical bias control in typology. Linguistic Typology, to appear, 2021. DOI: 10.1515/lingty-2021-0002

[6] Sacha Beniamine. 2021. One lexeme, many classes: Inflection class systems as lattices. In Berthold Crysmann & Manfred Sailer (eds.), One-to-many relations in morphology, syntax, and semantics, 23--51. Berlin: Language Science Press. DOI: 10.5281/zenodo.4729789


Call for abstracts: Sinn und Bedeutung 28 (Bochum, Sept. 5-8, 2023), with a special session on big data
by Tatjana Scheffler 10 Jan '23


Sinn und Bedeutung 28 will take place at Ruhr University Bochum (RUB) from September 5-8, 2023. The conference is jointly organized by the RUB Department of Linguistics, the Linguistic Data Science Lab, the Department of German Language and Literature, and the Departments of Philosophy I and II. The conference will feature a three-day main session (Sept. 6-8) and two parallel one-day special sessions on The Semantics and Pragmatics of Co-Speech / Co-Sign Communication and on Big Data in Semantics and Pragmatics (Sept. 5).

Conference Website: https://www.ruhr-uni-bochum.de/sub28/

Invited Speakers (main session):
- Dorothy Ahn (Rutgers University)
- Hazel Pearson (Queen Mary University of London)
- Graham Priest (City University of New York, University of Melbourne, RUB)

Invited Speakers (special sessions):
- Semantics and Pragmatics of Co-Speech / Co-Sign Communication: Cornelia Ebert (Goethe University Frankfurt)
- Big Data in Semantics and Pragmatics: Raquel Fernández (University of Amsterdam)

Call for Papers

We invite abstract submissions for talks or posters on topics pertaining to natural language semantics, pragmatics, the syntax-semantics interface, super semantics, philosophy of language, and psycho-/neurolinguistic investigations related to meaning. We specifically welcome submissions on the semantics of under-represented languages and phenomena.

Abstracts should contain original research that, at the time of submission, has neither been published nor accepted for publication. One person can submit at most one abstract as sole author and one abstract as co-author (or two co-authored abstracts) for the main session and special session combined. Submissions must be anonymous and must not reveal the identity of the authors in any form.

Abstracts should fit on two pages (letter size or A4 paper, 2.54cm or 1 inch margins on all sides, 12 point font, Times New Roman), with an additional third page used *exclusively* for the following elements: references (obligatory), large figures or tables, and as many lines of text as there are lines of glosses and translations in non-English glossed examples.

Abstracts must be submitted in PDF format via EasyChair by Wednesday, March 15, 2023 (23:59 Central European Standard Time): https://easychair.org/conferences/?conf=sub28. EasyChair will open for submissions on January 15, 2023.

Note: Since Bochum gets very busy during the summer, we strongly recommend booking your accommodation as early as possible (with a cancellation option).

Important Dates:
- Submission deadline: March 15, 2023
- Notification of acceptance: May 30, 2023
- Special sessions: September 5, 2023
- Main session: September 6-8, 2023

Organizers:
- Kristina Liefke (RUB Philosophy II)
- Ralf Klabunde (RUB Linguistics, Linguistic Data Science Lab)
- Agata Renans (RUB Linguistics)
- Daniel Gutzmann (RUB German Language & Literature)
- Tatjana Scheffler (RUB German Language & Literature)
- Dolf Rami (RUB Philosophy I)
- Heinrich Wansing (RUB Philosophy I)
- Markus Werning (RUB Philosophy II)

Email: sub28(a)ruhr-uni-bochum.de

---
Jun.-Prof. Dr. Tatjana Scheffler (she/her)
GB 5/157
Ruhr-Universität Bochum
Fakultät für Philologie, Germanistik
Universitätsstraße 150
44780 Bochum
Germany
Mail: tatjana.scheffler(a)rub.de
Web: http://staff.germanistik.rub.de/digitale-forensische-linguistik/
Tel.: +49 234 32-21471

