December 2022 - Corpora

Assistant Professor, Humanities Data Science, University of Washington
by Gina-Anne Levow 14 Dec '22


Assistant Professor of Humanities Data Sciences

Position Description

The Division of the Humanities in the College of Arts and Sciences at the University of Washington, Seattle, WA, USA invites applications for a full-time, tenure-track Assistant Professor position in Non-Anglophone Humanities Data Science. The successful candidate will be appointed to one of the eight departments in the Division: Asian Languages and Literature, French and Italian Studies, German Studies, Linguistics, Middle Eastern Languages and Cultures, Scandinavian Studies, Slavic Languages and Literatures, and Spanish and Portuguese Studies.

The successful candidate for this position, researching and teaching primarily in one or several languages other than English, will be expected to:

- create, analyze, use, present and communicate data, broadly understood, in disciplines of language, literature, and/or culture
- explore a variety of forms of data including large historical or contemporary corpora, textual archives, digital media and genres, or online datasets and their uses
- engage a range of analytical techniques and methods of analysis
- investigate questions and challenges posed by data science and by data, such as the role of bias in algorithms, uncertainty in artificial intelligence, and/or the use of ever-larger datasets in machine translation
- engage in an active research agenda

Since this is a cross-disciplinary position, the successful applicant will also be expected to work across departmental and disciplinary boundaries, and contribute to the development of existing cross-disciplinary initiatives (e.g., Translation Studies Hub <https://urldefense.com/v3/__https://uwtranslationhub.wixsite.com/uwtranslat…>, Textual and Digital Studies <http://txtds.uw.edu/>, Global Literary Studies <https://slavic.washington.edu/fields/global-literary-studies>) and to new curricula and programs in areas of inquiry such as human and machine translation, computational text analysis, AI and creativity, and the future of language learning.

In particular, this position has been created in conjunction with the establishment of a new minor in Data Science intended for Social Science and Humanities students (http://www.washington.edu/uaa/advising/single-pages/data-science-minor/). The candidate hired into this position will dedicate 25% of their teaching and service responsibilities to this program, with the remainder determined within the context of the appointing department. The successful candidate will be expected to take a leading role in the minor, including developing and teaching courses in the required Data Skills and Data Studies categories, and serving on the Steering and Curriculum Committees of the Data Science minor.

Candidates should in addition be prepared to take full advantage of the rich array of resources at the UW for research and teaching in data science as well as in the digital humanities, including the Simpson Center for the Humanities <https://urldefense.com/v3/__https://simpsoncenter.org/programs/digital-huma…>, the eScience Institute <https://escience.washington.edu/>, the Open Scholarship Commons <https://www.lib.washington.edu/openscholarship> in the UW Libraries, and the Humanities Data Lab <http://humanitiesdatalab.ds.lib.uw.edu/>.

The candidate is also expected to actively contribute to the diversity, equity and inclusivity goals of their potential department, the Humanities Division, and the University (as articulated in departmental statements and in the UW Diversity Blueprint: http://www.washington.edu/diversity/diversity-blueprint/).

University of Washington faculty engage in teaching, research and service. This position has an anticipated start date of September 2023, and will have a 9-month service period.
Qualifications

Applicants must, by the start of the appointment, have a PhD, or foreign equivalent, in a field consistent with an appointment in a department in the Division of the Humanities. The successful candidate must have a record of innovative and effective teaching and student mentoring, and a record of contribution to departmental or institutional diversity, equity and inclusion initiatives.

Instructions

Priority will be given to applicants who submit the following materials by January 9 at https://ap.washington.edu/ahr/position-details/?job_id=107708:

- A cover letter that describes the candidate’s suitability for this position and explains how the candidate’s experiences, activities, and goals will lead to success in a cross-disciplinary position
- CV
- Research statement, not to exceed one page, describing present projects and future directions
- A representative example of research (such as a chapter, article, or a link to an online project or a digital tool)
- A teaching portfolio, which should provide evidence of teaching excellence and may include such items as: example syllabi, lecture videos, website links for courses or student activities, or digital tools created by the candidate
- A statement, not to exceed one page, on how their work would contribute to the diversity, equity and inclusivity goals of their department, the Humanities Division, and the University (as articulated in the UW Diversity Blueprint: http://www.washington.edu/diversity/diversity-blueprint/)
- Three letters of recommendation

Because applicants are expected to come from a variety of disciplinary orientations and from traditional and nontraditional backgrounds, resulting in different professional profiles, applicants are welcome to provide details in any of the above materials that can help to contextualize their dossier. Such details might include experiences, aspects of training, outreach, teaching or research that are felt to contribute to a distinctive professional profile.

Contact Prof. Gina-Anne Levow for questions regarding this position (levow(a)uw.edu).


2nd Call for Abstracts (Extended Deadline) - 2023 NARNiHS Research Incubator
by Lauersdorf, Mark R. 13 Dec '22


*** *Second* Call for Abstracts and *Extended Deadline* ***
2023 NARNiHS Research Incubator
*** North American Research Network in Historical Sociolinguistics ***
5th edition
20-22 April 2023 - entirely online!

The 2023 NARNiHS Research Incubator will take place as an entirely online event (with free registration). This presents a great opportunity for scholars in historical sociolinguistics from all over the world to participate as presenters and/or attendees without the limitations imposed by international travel, and we encourage our fellow historical sociolinguists, and scholars from related fields, from our global scholarly community (in addition to North America), to join us online for our Research Incubator this spring.

==> NEW Abstract submission deadline:
==> 15 January 2023, 11:59 PM (U.S. Eastern Time).
==> Abstract submission online:
==> http://linguistlist.org/easyabs/NARNiHS2023_RI .

The North American Research Network in Historical Sociolinguistics (NARNiHS) is accepting abstracts for its 2023 NARNiHS Research Incubator. Building on the great success of the first four years, the 5th edition of this unique kind of NARNiHS conference seeks to provide a collaborative environment where presenters bring work that is in-progress, exploratory, proof-of-concept, prototyping; and the audience actively participates in the brainstorming and workshopping of those new ideas. We see the NARNiHS Research Incubator as a place for testing/pushing boundaries; developing new theories, methods, models, tools; seeking feedback from peers willing to engage in productive assessment of fledgling ideas and nascent projects. Successful abstracts for this research incubator environment will demonstrate thorough grounding in the field, scientific rigor in the formulation of research questions, and promise for rich discussion of ideas.
NARNiHS welcomes papers in all areas of historical sociolinguistics, which is understood as the application/development of sociolinguistic theories, methods, and models for the study of historical language variation and change over time, or more broadly, the study of the interaction of language and society in historical periods and from historical perspectives. Thus, a wide range of linguistic areas, subdisciplines, and methodologies easily find their place within the field, and we encourage submission of abstracts that reflect this broad scope.

We are soliciting abstracts for 25-minute presentations. Presenters will have the entire 25 minutes for their presentations, with discussion happening in the "incubation session" at the end of each panel. Abstracts should be no more than one page (not including examples and references, see below). Abstracts will be accepted until the extended deadline of 15 January 2023, 11:59 PM (U.S. Eastern Time); late abstracts will not be considered.

Successful abstracts will be explicit about which theoretical frameworks, methodological protocols, and analytical strategies are being applied or critiqued; and data sources and examples should be sufficiently (if briefly) presented, so as to allow reviewers a full understanding of the scope and claims of the research. Please note that the connection of your research to the field of historical sociolinguistics should be explicitly outlined in your abstract. Failure to adhere to these criteria will likely result in non-acceptance.

To encourage maximum exchange of ideas in the brainstorming/workshopping environment of the NARNiHS Research Incubator, presentations will be grouped into thematic panels of three presentations, each panel followed by an hour-long discussion with the audience led by specialists. Discussion will encompass specific feedback on the individual papers as well as consideration of overarching questions of theory, methods, and models emerging from the papers.
To facilitate such discussion, authors will be required to submit a draft of their presentation materials for distribution to the panel discussants and to the other presenters 10 days prior to the start of the conference.

General Requirements:
1) Abstracts must be submitted electronically, using the following link: http://linguistlist.org/easyabs/NARNiHS2023_RI
2) Papers must be delivered as projected in the abstract or represent bona fide developments of the same research.
3) Authors are expected to virtually attend the conference and present their own papers.
4) Presentations will be delivered via a video-conferencing platform, most likely Zoom. Technical details and instructions regarding the platform for our NARNiHS Research Incubator will be sent to authors in due time.

Content Requirements:
1) Abstracts should be explicit about which theoretical frameworks, methodological protocols, and analytical strategies are being applied or critiqued.
2) Data sources and examples should be sufficiently (if briefly) presented, so as to allow reviewers a full understanding of the scope and claims of the research.
3) The connection of your research to the field of historical sociolinguistics should be explicitly outlined.

Abstract Format Guidelines:
1) Abstracts must be submitted in PDF format.
2) Abstracts must fit on one standard 8.5×11 inch page, with margins no smaller than 1 inch and a font style and size no smaller than Times New Roman 12 point. All additional content (visualizations, trees, tables, figures, captions, examples, and references) must fit on a single (1) additional page. No exceptions to these requirements are allowed.
3) Anonymize your abstract. We realize that sometimes it is not possible to attain complete anonymity, but there is a difference between "inability to anonymize completely" (due to the nature of the research) and "careless non-anonymizing" (for example: "In Jones 2021, I describe..."). In addition, be sure to anonymize your PDF file (you may do so in Adobe Acrobat Reader by clicking on "File", then "Properties", removing your name if it appears in the "Author" line of the "Description" tab, and re-saving before submitting it). Please be aware that abstract file names might not be automatically anonymized by the system; do not use your name (e.g. Smith_Abstract.pdf) when saving your abstract in PDF format; rather, use non-identifying information (e.g. HistSoc4Lyfe_NARNiHS.pdf). Your name should only appear in the online form accompanying your abstract submission. Papers that are not sufficiently anonymized wherever possible (whether in the text of the abstract or in the metadata of the digital file) risk being rejected.

Please contact us at NARNiHistSoc(a)gmail.com with any questions.


CfP Workshop @ICHL26 on Computational models of diachronic language change
by Stefania Degaetano-Ortlieb 13 Dec '22


Call for abstracts

Workshop: Computational models of diachronic language change @the International Conference on Historical Linguistics (ICHL26)

Organizers: Stefania Degaetano-Ortlieb*, Lauren Fonteyn+, Marie-Pauline Krielke*, Elke Teich*
*Saarland University, +Leiden University

Submission deadline: January 1st 2023
Submission format: One-page abstract (plus references) to be sent to s.degaetano(a)mx.uni-saarland.de
Notification of acceptance: January 12th 2023

Additional information: We envisage a full-day, in-person workshop with presentations (20 min + 5-10 min discussion). After the workshop, we aim to publish a Special Journal Issue in an open access journal.

Workshop description:

While the study of diachronic language change has long been firmly grounded in corpus data analysis, it seems fair to state that the field has been the subject of a computational turn over the last decade or so, with computational models being increasingly adopted across several research communities, including corpus and computational linguistics, computational social science, digital humanities, and historical linguistics.

The core technique for the investigation of diachronic change is the distributional model (DM). DMs rely on the fact that related meanings occur in similar contexts and allow us to study lexical-semantic change in a data-driven way (e.g. as argued by Sagi et al. 2011), and on a larger scale (e.g. as shown on the Google NGram corpus by Gulordava & Baroni 2011). Besides count-based models (e.g. Hilpert & Saavedra 2017), contextualized word embeddings are increasingly employed for diachronic modeling, as such models are able to encode rich, context-sensitive information on word usage (see Lenci 2018 or Fonteyn et al., 2022 for discussion). In previous work, DMs have been used to determine laws of semantic change (e.g. Hamilton et al. 2016b, Dubossarsky et al. 2017) as well as to develop statistical measures that help detect different types of change (e.g. specification vs. broadening; cultural change vs. linguistic change; Hamilton et al. 2016a, Del Tredici et al. 2019). DMs have also been used to map change in specific (groups of) concepts (e.g. racism, knowledge; see Sommerauer & Fokkens 2019 for a discussion). Further studies have suggested ways of improving the models that generate (diachronic) word embeddings to attain these goals (e.g. Rudolph & Blei 2018).

Existing studies and projects focus on capturing and quantifying aspects of semantic change. Yet, over the past decade, DMs have also been shown to be useful to investigate other types of change in language use, including grammatical change. Within the computational and corpus linguistic communities, for example, Bizzoni et al. (2019, 2020) have shown an interdependency between lexical and grammatical changes and Teich et al. (2021) use embeddings to detect (lexico-) grammatical conventionalization (which may lead to grammaticalization). Within diachronic linguistics, the use of distributional models is focused on examining the underlying functions of grammatical structures across time (e.g. Perek 2016, Hilpert and Perek 2015, Gries and Hilpert 2008, Fonteyn 2020, Budts 2020). Specifically targeting historical linguistic questions, Rodda et al. (2019) and Sprugnoli et al. (2020) have shown that computational models are promising for analyzing ancient languages, and McGillivray et al. (2022) highlight the advantages of word embeddings (vs. count-based methods) while also pointing to the challenges and the limitations of these models.

A common concern across these different communities is to better understand the general principles or laws of language change and the underlying mechanisms (analogy, priming, processing efficiency, contextual predictability as measured by surprisal, etc.).
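
To make the distributional idea concrete, here is a minimal sketch of a count-based model, assuming nothing beyond the Python standard library. It builds a co-occurrence vector for one word in each of two time slices and compares the vectors by cosine similarity; a lower value suggests that the word's typical contexts differ more between the slices. The sentences are invented toy data, and the names `cooccurrence_vector` and `cosine` are hypothetical helpers chosen for this illustration. The studies cited above use far larger corpora and more refined models.

```python
from collections import Counter
from math import sqrt


def cooccurrence_vector(sentences, target, window=2):
    """Count the words that occur within `window` tokens of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Invented toy "corpora" standing in for an earlier and a later period.
early_period = [
    "the cat chased the mouse across the barn",
    "a mouse nibbled the cheese",
]
late_period = [
    "click the mouse to open the file",
    "the mouse pointer moved across the screen",
]

early_vec = cooccurrence_vector(early_period, "mouse")
late_vec = cooccurrence_vector(late_period, "mouse")
similarity = cosine(early_vec, late_vec)
print(f"cosine similarity of 'mouse' across periods: {similarity:.3f}")
```

In this toy setting the shared context is mostly the function word "the", which is one reason real studies typically filter or reweight frequent words (for example with PPMI) before comparing vectors.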
In the proposed workshop, we want to bring together researchers from relevant communities to talk about the unique promises that computational models hold when applied to diachronic data as well as the specific challenges they involve. In doing so, we will identify common ground and explore the most pressing problems and possible solutions. Specific questions will concern:

Model utility: How can we capture change in language use beyond lexical-semantic change, e.g. change in grammatical constructions, collocations, phraseology?

Model quality: How can we evaluate computational models of historical language stages in absence of native-speaker gold standards? To what extent does the quality of historical and diachronic corpora affect the performance of models?

Model analytics: How do we transition from testing the reliability of models to employing them to address previously unanswered research questions on language change? How can we detect and measure change? What are suitable analytic procedures to interpret the output of models?

References:

Bizzoni, Y., Degaetano-Ortlieb, S., Menzel, K., Krielke, P., and Teich, E. (2019). Grammar and meaning: analysing the topology of diachronic word embeddings. In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, ACL, Florence, Italy, pp. 175-185.

Bizzoni, Y., Degaetano-Ortlieb, S., Fankhauser, P., and Teich, E. (2020). Linguistic variation and change in 250 years of English scientific writing: a data-driven approach. Frontiers in Artificial Intelligence, 3.

Budts, S. (2020). "A connectionist approach to analogy. On the modal meaning of periphrastic do in Early Modern English". Corpus Linguistics and Linguistic Theory, 18(2), pp. 337-364.

Del Tredici, M., Fernández, R., and Boleda, G. (2019). Short-term meaning shift: A distributional exploration. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL): Human Language Technologies, Minneapolis, Minnesota, USA, pp. 2069-2075.

Dubossarsky, H., Weinshall, D., and Grossman, E. (2017). Outta control: laws of semantic change and inherent biases in word representation models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Copenhagen, Denmark, pp. 1136-1145.

Fonteyn, L. (2020). "What about grammar? Using BERT embeddings to explore functional-semantic shifts of semi-lexical and grammatical constructions." Computational Humanities Research CEUR-WS, pp. 257-268.

Fonteyn, L., Manjavacas, E., and Budts, S. (2022). Exploring Morphosyntactic Variation & Change with Distributional Semantic Models. Journal of Historical Syntax, 7(12), pp. 1-41.

Gries, S. T., and Hilpert, M. (2008). The identification of stages in diachronic data: variability-based Neighbor Clustering. Corpora, 3(1), pp. 59-81.

Gulordava, K., and Baroni, M. (2011). A distributional similarity approach to the detection of semantic change in the Google Books Ngram corpus. In Proceedings of Geometrical Models for Natural Language Semantics (GEMS), EMNLP, Edinburgh, United Kingdom, pp. 67-71.

Hamilton, W. L., Leskovec, J., and Jurafsky, D. (2016a). Cultural shift or linguistic drift? Comparing two computational models of semantic change. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP, Austin, Texas, USA, pp. 2116-2121.

Hamilton, W. L., Leskovec, J., and Jurafsky, D. (2016b). Diachronic word embeddings reveal statistical laws of semantic change. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL, Berlin, Germany, pp. 1489-1501.

Hilpert, M., and Saavedra, D.C. (2020). "Using token-based semantic vector spaces for corpus-linguistic analyses: From practical applications to tests of theoretical claims". Corpus Linguistics and Linguistic Theory, 16(2), pp. 393-424.

Hilpert, M., and Perek, F. (2015). Meaning change in a petri dish: constructions, semantic vector spaces, and motion charts. Linguistics Vanguard, 1(1), pp. 339-350.

Lenci, A. (2018). Distributional Models of Word Meaning. Annual Review of Linguistics, 4, pp. 151-171.

Perek, F. (2016). Using distributional semantics to study syntactic productivity in diachrony: a case study. Linguistics, 54(1), pp. 149-188.

Rodda, M.A., Probert, P., and McGillivray, B. (2019). Vector space models of Ancient Greek word meaning, and a case study on Homer. TAL Traitement Automatique des Langues, 60(3), pp. 63-87.

Rudolph, M., and Blei, D. (2018). Dynamic embeddings for language evolution. In Proceedings of the 2018 World Wide Web Conference (WWW '18), Lyon, France, pp. 1003-1011.

Sagi, E., Kaufmann, S., and Clark, B. (2011). Tracing semantic change with Latent Semantic Analysis. Current Methods in Historical Semantics, 73, pp. 161-183.

Sommerauer, P., and Fokkens, A. (2019). Conceptual Change and Distributional Semantic Models: An Exploratory Study on Pitfalls and Possibilities. In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, Florence, Italy, pp. 223-233.

Sprugnoli, R., Moretti, G., and Passarotti, M. (2020). Building and Comparing Lemma Embeddings for Latin. Classical Latin versus Thomas Aquinas. IJCoL. Italian Journal of Computational Linguistics, 6(6-1), pp. 29-45.

Teich, E., Fankhauser, P., Degaetano-Ortlieb, S., and Bizzoni, Y. (2021). Less is More/More Diverse: On the Communicative Utility of Linguistic Conventionalization. Frontiers in Communication, 5.


Tenure-Track position in computational linguistics at Brandeis University
by Nianwen Xue 13 Dec '22


*Tenure-track Assistant Professor, Computational Linguistics*

The Department of Computer Science at Brandeis University invites applications for a tenure-track assistant professor in computational linguistics, beginning Fall 2023. Qualifications required of all applicants include a Ph.D., in hand by Fall 2023, in Computer Science or a related discipline, a strong research record, and a commitment to teaching at the undergraduate and graduate levels. Particular attention will be given to candidates pursuing research in the broad area of speech, dialogue, or multimodal language processing. This position is subject to budgetary approval.

The Department consists of a diverse group of 20 full-time faculty members and researchers and offers programs leading to B.A./B.S., M.S., and Ph.D. degrees in Computer Science and an M.S. in Computational Linguistics. The Department has research strengths in computational linguistics, theoretical linguistics, machine learning, computer vision, data mining, networking, distributed systems, operating systems, databases, algorithms, and software design and implementation. In addition, members of the Department collaborate closely with faculty across the university in fields including biology, neuroscience, economics, physics, and political science, among others.

At Brandeis, we believe that diversity, equity, and inclusion are essential components of academic excellence. Brandeis University is an affirmative action, equal opportunity employer that is committed to creating equitable access and opportunities for applicants to all employment positions. Because diversity, equity, and inclusion are at the core of Brandeis’ history and mission, we value and are seeking candidates who represent a variety of social identities, including those that have been underrepresented in higher education, who possess skills that spark innovation, and who, through their scholarly pursuits, teaching, and/or service experiences, bring expertise in building, engaging, and sustaining a pluralistic, just, and inclusive campus community.

Applicants should submit a CV, research statement, and teaching statement, and arrange for at least three reference letters to be submitted to AcademicJobsOnline. Because Brandeis is committed to advancing diversity, equity, and inclusion in all areas of faculty effort, applicants should address at least one of these areas in their cover letter and/or the teaching statement.

Qualified applicants should apply at https://academicjobsonline.org/ajo/jobs/22659 . First consideration will be given to applications received by December 20, 2022. Questions about the position can be directed to Professor Nianwen Xue at xuen(a)brandeis.edu. Additional information about the Department is available at https://www.brandeis.edu/computer-science/, and information about the Computational Linguistics program is available at https://www.brandeis.edu/computer-science/computational-linguistics/index.h… .


Learning with Small Data -- First CfP
by Sharid Loáiciga 13 Dec '22


# Learning with Small Data -- 1st CfP

https://sites.google.com/view/learning-with-small-data/home

There is now an acute need for intensive research on the possibility of effective learning with small data. Our 2023 conference, LSD, is devoted to work on this problem, with application to computational linguistics.

Learning with Small Data will bring together researchers from various areas to discuss the sustainability of current state-of-the-art methods in computational linguistics which rely on very large models, such as GPT2-3, BERT, and XLNet.

The conference encourages contributions from machine learning, computational linguistics, theoretical linguistics, philosophy, cognitive science, and psycholinguistics, as well as from artificial intelligence ethics and social policy. We hope to see innovative technical proposals, and we will cultivate a wide spectrum of views within a lively dialog on the issues that the conference addresses.

The conference is organized by CLASP, University of Gothenburg.

## Important Dates:

Submission deadline: 2023 May 5, anywhere on Earth
Notification of acceptance: 2023 June 12, anywhere on Earth
Camera ready: 2023 August 14, anywhere on Earth
Conference: 2023 September 11-12, not anywhere on Earth, but in Gothenburg

## Topics of interest

We welcome all relevant approaches to text-based and multimodal computational neural language modeling as well as psycholinguistic perspectives, neurolinguistic perspectives, ethical, and policy issues. Papers are invited on topics in these and closely related areas, including (but not limited to) the following:

- small-scale neural language modeling, both text-only and multimodal
- training corpus and test task development
- visual, dialog and multi-modal inference systems
- neurolinguistic and psycholinguistic experimental approaches to human language processing
- semantics and pragmatics in neural models
- dialog modeling and linguistic interaction
- formal and theoretical approaches to language production and comprehension
- language acquisition in the context of computational linguistics
- statistical, machine learning, reinforcement learning and information theoretic approaches that embrace small data
- methodologies and practices for annotating datasets
- visual, dialog and multi-modal generation
- text generation in both the dialog and monologue settings
- semantics-pragmatics interface
- social and ethical implications of the development and application of large or small neural language models, as well as relevant policy implications and debates.

## Submission Requirements

LSD 2023 will feature three types of submissions: long papers, student papers, and short papers. All types of papers should be submitted not later than 5 May 2023.

Long papers must describe original research, and they must not exceed 8 pages excluding references. They will be presented at the conference either orally or as posters.

Student papers describe original research, and the first author must be a student, or at least 2/3 of the work on a paper should be done by students. Student papers must not exceed 6 pages excluding references. Reviewers will give special support to student authors through mentoring. The papers will be presented orally or as posters at the conference.

Short papers present work in progress, or they describe systems and/or projects. They must not exceed 4 pages excluding references. They will be presented as posters at the conference and summarized in lightning talks.

Position papers are also accepted. These should be formatted in the same way as long papers.

All types of papers will be published in the 2023 ACL Anthology as a CLASP Conference Proceedings.

Submissions should be PDF files and use the LaTeX or Word templates provided for ACL 2023 submissions (https://2023.aclweb.org/calls/style_and_formatting/). Submissions have to be anonymous. Papers should be electronically submitted in PDF format via the softconf system at: https://softconf.com/n/lsd2023/. Please make sure that you select the right track when submitting your paper. Contact the organisers if you have problems using softconf.

## Concurrent Submissions

Papers that have been or will be submitted to other meetings or publications must indicate this at submission time using a footnote on the title page of the submissions. Authors of papers accepted for presentation at Learning with Small Data 2023 must notify the program chairs by the camera-ready deadline as to whether the paper will be presented. All accepted papers must be presented at the conference to appear in the proceedings. We will not accept for publication or presentation papers that overlap significantly in content or results with papers that will be (or have been) published elsewhere.

## Camera Ready Versions

Camera ready versions should follow the same guidelines with respect to style and page numbers as the initial submission, i.e. there are no additional pages allowed in the final submission. Please submit the camera ready version by 2023 August 14.


PhD opportunity at King's College London
by Barbara McGillivray 13 Dec '22


(Apologies for cross-posting) A fully funded PhD position is now available at King’s College London on the project “‘Lost for words’: semantic search in the Find Case Law service of The National Archives”, a Collaborative Doctoral Award received by King’s College London in collaboration with The National Archives and funded by the London Arts & Humanities Partnership (LAHP<https://www.lahp.ac.uk/about-us/>). This interdisciplinary project is an exciting opportunity to work in natural language processing (particularly computational semantics and information retrieval) applied to legal texts and digital humanities. About the project Access to case law is vital for safeguarding the constitutional right of access to justice. It enables members of the public to understand their position when facing litigation and to scrutinise court judgements. Since April 2022, UK court and tribunal decisions are preserved by The National Archives’ Find Case Law service as freely accessible online public records. This project seeks to improve Find Case Law by enhancing it with meaning-sensitive (semantic) search functionality. It will study how individuals without legal training use language to navigate court judgments and it will develop tools to facilitate this navigation. In most digital cultural heritage catalogues, while we can search for words within the metadata describing their records, we cannot search for records based on the meaning of words contained within these records, for example the different words to refer to “knife crime”. Therefore, users’ access to collection is determined by their ability to articulate their information need precisely. Recent advances in natural language processing unlock new possibilities for querying documents via state-of-the-art semantic search. Incorporating such search capabilities in the Find Case Law collection is crucial for democratising access to digital collections, helping expose the social impact of how the law is written. 
For queries specific to the project, please contact the project’s lead supervisor Barbara McGillivray (barbara.mcgillivray(a)kcl.ac.uk <mailto:barbara.mcgillivray@kcl.ac.uk>).
Supervisory team
* Barbara McGillivray <https://www.kcl.ac.uk/people/barbara-mcgillivray> (Department of Digital Humanities, King’s College London)
* Nicki Welch (The National Archives)
* Rose Rees Jones (The National Archives)
* Niccolò Ridi <https://www.kcl.ac.uk/people/niccolo-ridi> (Department of Law, King’s College London)
* Marton Ribary <https://pure.royalholloway.ac.uk/en/persons/marton-ribary> (Department of Law and Criminology, Royal Holloway University of London)
Skills required
Essential
• Experience with Natural Language Processing research and applied work, including developing new tools.
• Interest in working with UK case law to improve access to justice.
Desirable
• Background in law or legal research.
• Experience working with digital archives.
• Knowledge of user experience (UX) research.
• Knowledge of lexical semantics.
• Experience with semantic search.
• Experience with NLP applied to legal texts.
About the application process
Applicants will need to submit an application for a PhD in Digital Humanities at King’s <https://www.kcl.ac.uk/study-legacy/postgraduate/research-courses/digital-hu…> (details here <https://www.kcl.ac.uk/study-legacy/postgraduate/research-courses/digital-hu….>) and an application for the LAHP (details here <https://www.lahp.ac.uk/prospective-students/collaborative-doctoral-awards-p…>). Both applications need to be submitted by 27 January 2023 at 5pm.
About Collaborative Doctoral Awards
Collaborative Doctoral Awards (CDAs) provide funding for doctoral students to work on a project in collaboration with an organisation outside higher education. They are intended to encourage and develop collaboration and partnerships and to provide opportunities for doctoral students to gain first-hand experience of work outside the university environment.
They enhance the employment-related skills and training available to the research student during the course of the award. The studentship includes a stipend at the Research Council UK Home/EU rate (£19,668 per annum) plus fees for three and a half years. The awarded candidate will also be entitled to a £550 per annum stipend top-up.
LAHP welcomes applications:
* From ‘home’ and ‘international’ (including EU) applicants who meet the residency requirements as detailed in the UKRI Guidance document on EU and International eligibility <https://www.ukri.org/what-we-offer/developing-people-and-skills/find-studen…>;
* From those who have recently completed their Master’s programmes and those with relevant professional and/or practitioner experience;
* From those wishing to study on a full-time or part-time basis;
* From applicants of all ages and backgrounds.
For full details on the LAHP Collaborative Doctoral Awards, please visit https://www.lahp.ac.uk/prospective-students/collaborative-doctoral-awards-p…
Barbara McGillivray | @BarbaraMcGilli <https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftwitter.c…>
Lecturer in Digital Humanities and Cultural Computation
Strand Campus, Strand, London, WC2R 2LS, Room 3.28, Department of Digital Humanities, King’s College London
Turing Fellow <https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.turin…>, The Alan Turing Institute
Editor-in-chief of Journal of Open Humanities Data <https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fopenhuman…>

Call For Papers: FICTA-2023, Cardiff Metropolitan University, United Kingdom (UK top university for Sustainability) || Springer SIST Series (Scopus Indexed) || Paper Submission Deadline: 10 January 2023 || https://ficta.co.uk/
by Sandeep Sengar 13 Dec '22

13 Dec '22

Dear All, We are happy to inform you that the Eleventh International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA-2023) will be organized by Cardiff Metropolitan University, United Kingdom. We invite you to participate in FICTA-2023 (https://ficta.co.uk/) on 11-12 April 2023. The conference will be held in hybrid mode, with high-profile committee members from across the globe.
Publication: All registered and presented FICTA-2023 papers will be published in the conference proceedings in the Springer Smart Innovation, Systems and Technologies (SIST) series (https://lnkd.in/eGDscF_X).
Topics of interest: Quality papers are invited in all areas of research and application in intelligent computing; see the call for papers at https://lnkd.in/eD_UmBrT.
Paper submission: Submissions are handled through Springer EquinOCS at https://lnkd.in/eiKgiTc4
Call for special session proposals: If you are interested in organizing a special session, please visit the link and follow the guidelines: https://lnkd.in/ev9c9y7k
For any queries related to the conference, please e-mail FICTA2023(a)cardiffmet.ac.uk. Kindly share this information in your network to help make the conference a success.
--
Warm regards,
Sandeep Singh Sengar
Lecturer in Computer Science, Cluster Leader Computer Vision / Image Processing
Cardiff Metropolitan University, Cardiff, UK CF5 2YB
Email: SSSengar(a)cardiffmet.ac.uk
Web: https://sites.google.com/view/sandeepsengar

Release of BFM2022 Corpus
by Alexey Lavrentev 13 Dec '22

13 Dec '22

[Apologies for cross-posting] Dear colleagues, We are delighted to announce that the BFM2022 corpus of Old and Middle French (9th to 15th centuries) is now available from the web portal of the Base de Français Médiéval at https://txm-bfm.huma-num.fr/txm/?command=documentation&path=/BFM2022. The Base de Français Médiéval provides free access to several corpora (source texts and digital annotation) under a French public open data license (https://www.etalab.gouv.fr/licence-ouverte-open-licence).
Three modes of access are supported:
• search, analysis and reading tools provided by the TXM-BFM web portal;
• download of a binary corpus file for use with the TXM local application;
• download of TEI XML source files from the NAKALA repository: https://nakala.fr/collection/10.34847/nkl.93ee3ts1.
The BFM portal is now hosted by the Huma-Num infrastructure, which provides a secure connection for user data. The BFM2022 corpus includes some fifty new texts, amounting to approximately 6,450,000 words. All the texts are formatted according to the TEI guidelines (including the instances of direct speech), automatically POS-tagged and lemmatized. The POS tags have been manually verified in 8 new texts (46 total, approximately 1,000,000 words), and the lemmatization has been verified and disambiguated in 27 texts (approximately 620,000 words). An original digital edition of the Psautier d’Arundel by C. Pignatelli is one of the new texts included in the corpus.
In addition to BFM2022, a syntactically annotated corpus, PROFITEROLE-V1-0, is now available from the BFM web portal. Produced by the ANR-funded PROFITEROLE Project (https://www.lattice.cnrs.fr/projets/projets-passes/projet-anr-profiterole), it supports querying syntactic relations encoded according to the Universal Dependencies guidelines (https://universaldependencies.org). We would appreciate any feedback on technical issues or errors in texts you may encounter while using the BFM.
Best regards, The BFM Team bfm [at] ens-lyon [dot] fr -- Alexey Lavrentev Ingénieur de recherche UMR 5317 IHRIM, CNRS

New release: the FR-R-MIGR-TWIT and the UK-R-MIGR-RA-TWIT Corpora.
by Paola Pietrandrea 13 Dec '22

13 Dec '22

The *MIGR-TWIT Corpus* is a bilingual diachronic corpus of tweets created with the aim of studying the evolution of public discourse on migration in Europe over the past 10 years. We are pleased to announce the release of the *first two components* of the corpus: the *FR-R-MIGR-TWIT-2011-2022* and the *UK-R-MIGR-RA-TWIT-2012-2022* Corpora.
· The *FR-R-MIGR-TWIT-2011-2022 Corpus* includes all the tweets displaying at least one occurrence of the lexical root -*migr*- (*i.e.*, the words *immigration(s), migrant(s), immigré(s)*), posted by *16 right and far-right French politicians and political parties between 2011 and 2022*, for a total of 11,761 tweets and 358,491 words.
· The *UK-R-MIGR-RA-TWIT-2012-2022 Corpus* includes all the tweets displaying at least one occurrence of the words derived from the Latin lexical root “*migr*” of *migrare* (*to move from one place to another*), in addition to the keywords “*refugee(s)*” and “*asylum*”, posted by *12 right and far-right British politicians and political parties between 2012 and 2022*, for a total of 6,472 tweets and 174,707 words.
The whole corpus contains 18,233 tweets and 533,198 words. The posts were automatically retrieved using the *Twitter API v2 Academic Research*. The whole corpus contains two zipped CSV files (tab-delimited format), one for each sub-corpus.
The complete corpus is presented in two versions:
- version 1, with the tweet identifier (*data__id*) and the text of the tweet (*data__text*) as the header (folders named *FR-R-MIGR-TWIT-2011-2022_textonly* and *UK-R-MIGR-RA-TWIT-2012-2022_textonly*, composed of 12 and 11 Zip files respectively, one per year);
- version 2, with all tweet fields included in the header, such as the posting date (*data__created__at*), the username (*author__name*), and the number of retweets (*data__public_metrics__retweet_count*), with two folders named *FR-R-MIGR-TWIT-2011-2022_meta* and *UK-R-MIGR-RA-TWIT-2012-2022_meta*.
The corpus was created by Elena Battaglia (Università della Svizzera Italiana and Université de Lille), Guido Blandino (University of Wolverhampton), Paola Pietrandrea and Sangwan Jeon (Université de Lille), with the collaboration of Adelina Stojan (Université de Lille), within the framework of the observatory OLiNDiNUM, *Observatoire LINguistique du DIscours NUMérique* <https://olindinum.huma-num.fr/> [Linguistic Observatory of Digital Discourse], coordinated by Paola Pietrandrea. The creation of the corpus was funded by Université de Lille, Projet d'Internationalisation 2021 - Université Franco-italienne / Università Italo Francese - Campus France (Hubert Curien Partnerships): Italie - PHC Galilée 2018-19, Pays-Bas - PHC Van Gogh 2018-19. The corpus is freely accessible through the platforms Ortolang <https://www.ortolang.fr/market/corpora/migr-twit-corpus/v1> and Zenodo <https://zenodo.org/record/7347479#.Y5ee5naZMuE>.
Elena Battaglia, Guido Blandino, Sangwan Jeon, Paola Pietrandrea
The *MIGR-TWIT Corpus* is a diachronic corpus of bilingual tweets, compiled in order to study the evolution of public discourse on immigration in Europe over the last 10 years. We are pleased to announce the publication of the *first two components* of the corpus: the *FR-R-MIGR-TWIT-2011-2022* and *UK-R-MIGR-RA-TWIT-2012-2022* corpora.
· The *FR-R-MIGR-TWIT-2011-2022 corpus* gathers all the tweets containing at least one occurrence of vocabulary derived from the lexical root -*migr*- (*i.e.*, *immigration(s), migrant(s), immigré(s)*), posted by *16 right and far-right French political figures and parties between 2011 and 2022*, for a total of 11,761 tweets and 358,491 words.
· The *UK-R-MIGR-RA-TWIT-2012-2022 corpus* gathers all the tweets containing at least one occurrence of vocabulary derived from the Latin root “*migr*” of *migrare* (*to leave a place*), in addition to the keywords “*refugee(s)*” and “*asylum*”, posted by *12 right and far-right British political figures, parties, and institutions between 2012 and 2022*, for a total of 6,472 tweets and 174,707 words.
The corpus as a whole totals 18,233 tweets and 533,198 words. The data were automatically retrieved using the *Twitter API v2 Academic Research*. The complete corpus contains two CSV files (tabular data format), one for each sub-corpus.
The complete corpus comes in two versions:
- version 1, with the tweet identifier (*data__id*) and the text of the tweet (*data__text*) as the header (files named *FR-R-MIGR-TWIT-2011-2022_textonly* and *UK-R-MIGR-RA-TWIT-2012-2022_textonly*, composed of 12 and 11 CSV files respectively, one per year);
- version 2, with all the tweet metadata in the header, such as the publication date (*data__created__at*), the username (*author__name*), and the number of retweets (*data__public_metrics__retweet_count*), with two files named *FR-R-MIGR-TWIT-2011-2022_meta* and *UK-R-MIGR-RA-TWIT-2012-2022_meta*.
The corpus was created by Elena Battaglia (Università della Svizzera Italiana and Université de Lille), Guido Blandino (University of Wolverhampton), Paola Pietrandrea and Sangwan Jeon (Université de Lille), with the collaboration of Adelina Stojan (Université de Lille), as part of the project *OLiNDiNUM, Observatoire LINguistique du DIscours NUMérique* <https://olindinum.huma-num.fr/>, coordinated by Paola Pietrandrea. The creation of the corpus was funded by Université de Lille, Projet d’Internationalisation 2021 - Université Franco-italienne / Università Italo Francese - Campus France (Hubert Curien Partnerships): Italie - PHC Galilée 2018-19, Pays-Bas - PHC Van Gogh 2018-19. The corpus is freely accessible via the platforms Ortolang <https://www.ortolang.fr/market/corpora/migr-twit-corpus/v1> and Zenodo <https://zenodo.org/record/7347479#.Y5ee5naZMuE>.
Elena Battaglia, Guido Blandino, Sangwan Jeon, Paola Pietrandrea
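For readers who want to try the text-only version of the corpus, the following is a minimal sketch of how one yearly file might be read. It assumes the file is tab-delimited with a header row containing *data__id* and *data__text*, as the announcement describes; the function names and the two sample rows are hypothetical and are not taken from the corpus, and the quoting and encoding of the real files may differ.

```python
import csv
import io
import re

# Column names as given in the announcement for the "textonly" version.
ID_FIELD = "data__id"
TEXT_FIELD = "data__text"

# Case-insensitive match on the lexical root "migr" anywhere in the text.
MIGR_ROOT = re.compile(r"migr", re.IGNORECASE)


def read_tweets(handle):
    """Yield (tweet_id, text) pairs from a tab-delimited file with a header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        yield row[ID_FIELD], row[TEXT_FIELD]


def count_migr_tweets(handle):
    """Count the tweets whose text contains the root "migr"."""
    return sum(1 for _, text in read_tweets(handle) if MIGR_ROOT.search(text))


# Hypothetical two-row sample standing in for one yearly file.
sample = io.StringIO(
    "data__id\tdata__text\n"
    "1\tExemple de texte sur l'immigration\n"
    "2\tUn autre texte sans le mot recherché\n"
)
print(count_migr_tweets(sample))
```

With a real file, the `io.StringIO` sample would be replaced by an open file handle, for example `open(path, encoding="utf-8", newline="")`, where `path` points to one of the unzipped yearly files.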

Reminder CILC2023
by Paula Rodríguez-Puente 12 Dec '22

12 Dec '22

Dear colleagues, This is a reminder of the third call for papers for the *XIV International Conference on Corpus Linguistics*, which will be held in Oviedo on 10-12 May 2023.
*IMPORTANT INFORMATION:*
1) *Deadline extension: 15 January 2023.*
2) Several hotels in Oviedo have agreed to offer *special prices* <https://cilc2023.wordpress.com/accommodation-travel-information/> for conference attendees.
Please visit the conference website for further information: https://cilc2023.wordpress.com/
*SUBMISSION OF PROPOSALS:* https://old.linguistlist.org/confservices/14CILC.OVIEDO
Best regards, The Organising Committee
--
Paula Rodríguez Puente paula.r.puente(a)gmail.com http://www.usc-vlcg.es/PRP.htm
