UMRs in Boulder Summer School - 3rd Call for Applications - DEADLINE EXTENDED to Feb. 9, 2024
University of Colorado, Boulder, June 10-13, 2024
Held in conjunction with the UMR Parsing Workshop, June 14, 2024
https://umr4nlp.github.io/web/SummerSchool.html
Impressive progress has been made in many aspects of natural language processing (NLP) in recent years. Most notably, the achievements of transformer-based large language models such as ChatGPT would seem to obviate the need for any type of semantic representation beyond what can be encoded as contextualized word embeddings of surface text. Advances have been particularly notable in areas where large training data sets exist, and it is advantageous to build an end-to-end training architecture without resorting to intermediate representations. For any truly interactive NLP applications, however, a more complete understanding of the information conveyed by each sentence is needed to advance the state of the art. Here, "understanding" entails the use of some form of meaning representation. NLP techniques that can accurately capture the required elements of the meaning of each utterance in a formal representation are critical to making progress in these areas and have long been a central goal of the field. As with end-to-end NLP applications, the dominant approach for deriving meaning representations from raw textual data is through the use of machine learning and appropriate training data. This allows the development of systems that can assign appropriate meaning representations to previously unseen text.
In this four-day course, instructors from the University of Colorado and Brandeis University will describe the framework of Uniform Meaning Representations (UMRs), a recent cross-lingual, multi-sentence incarnation of Abstract Meaning Representations (AMRs), which addresses these issues and constitutes such a transformative representation. Incorporating Named Entity tagging, discourse relations, intra-sentential coreference, negation and modality, and the popular PropBank-style predicate argument structures with semantic role labels into a single directed acyclic graph structure, UMR builds on AMR and keeps the essential characteristics of AMR while making it cross-lingual and extending it to be a document-level representation. It also adds aspect, multi-sentence coreference and temporal relations, and scope. Each day will include lectures and hands-on practice.
Topics to be covered June 10-13:
1. The basic structural representation of UMR and its application to multiple languages;
2. How UMR encodes different types of MWE (multi-word expressions), discourse and temporal relations, and TAM (tense-aspect-modality) information in multiple languages, and differences between AMR and UMR;
3. Going from IGT (interlinear glossed text) to UMR graphs semi-automatically;
4. Formal semantic interpretation of UMR incorporating a continuation-based semantics for scope phenomena involving modality, negation, and quantification;
5. Extension to UMR for encoding gesture in multimodal dialogue, Gesture AMR (GAMR), which aligns with speech-based UMR to account for situated grounding in dialogue.
The fifth day of the summer school, June 14, will be co-located with a UMR Parsing Workshop, focusing on parsing algorithms that generate AMR and UMR representations over multiple languages.
https://umr4nlp.github.io/web/UMRParsingWorkshop.html
Participation will be fully funded (reasonable airfare, lodging, and meals). This summer school has been made possible by funding from NSF Collaborative Research: Building a Broad Infrastructure for Uniform Meaning Representations (Award # 2213805), with additional support from the University of Colorado Boulder and the CLEAR Center.
To apply, please complete this form by Feb. 9, 2024.
https://www.colorado.edu/linguistics/umrs-boulder-summer-school-application
Other important dates:
● Notification of acceptance: Feb. 20, 2024
● Confirmation of participation: Mar. 1, 2024
● Arrival in Boulder June 9, departure June 15, 2024.
SOMD: Shared Task on Software Mention Detection in Scholarly
Publications
co-located with the 1st Workshop on Natural Scientific Language Processing
and Research Knowledge Graphs (NSLP 2024)
26 or 27 May 2024 (tbc)
Hersonissos, Crete, Greece
(co-located with ESWC2024)
Website: https://nfdi4ds.github.io/nslp2024/docs/somd_shared_task.html
* Task Description *
***********************
Scientific research is almost exclusively published in unstructured text
formats, which are not readily machine-readable. Thus, information
extraction methods have been used widely to extract entities of
different types from scholarly publications.
While software is an important part of the scientific process and should
therefore be recognized as a first-class citizen of research, methods for
software mention detection are still not widely available or used.
Given the scale and heterogeneity of software citations, robust methods
are required to detect and disambiguate mentions of software and related
metadata. The SOftware Mention Detection in Scholarly Publications
(SOMD) task will utilise the SoMeSci (Software Mentions in Science)
corpus to address three different subtasks in the context of software
citations. Participants can sign up for one or more subtasks. Automated
evaluations of submitted systems are done through the Codalab platform.
Subtask I: Software mention recognition.
Subtask II: Additional information.
Subtask III: Relation classification.
More information about the task and how to participate is available at
https://nfdi4ds.github.io/nslp2024/docs/somd_shared_task.html
* Important dates *
************************
* Training and test data: already released
* Deadline for system submissions: February 22, 2024
* Organisers *
*********************
* Stefan Dietze (GESIS Leibniz Institut für Sozialwissenschaften,
Cologne & Heinrich-Heine-University Düsseldorf, Germany)
* Frank Krüger (Wismar University of Applied Sciences, Germany)
* Saurav Karmarkar (GESIS Leibniz Institut für Sozialwissenschaften,
Cologne, Germany)
* Contact *
*****************
* Frank Krüger (frank.krueger(a)hs-wismar.de)
Postdoc in Sociolinguistics at the University of Iceland
Job percentage: 100%
Application deadline until end of: 15.02.2024
*See ad on Euraxess:* https://euraxess.ec.europa.eu/jobs/189143
(Note that knowledge of Icelandic is not required at the time of applying.)
The Language and Technology lab at the University of Iceland, led by
associate professor Dr. Anton Karl Ingason, is seeking to hire a full time
post-doctoral researcher in sociolinguistics. The position is initially for
12 months and can be extended by 12 additional months. The position is a
part of the project Explaining Individual Lifespan Change (EILisCh); this
is a five-year research project which is backed by the European Research
Council (ERC). The goal of this project is to explain Individual Lifespan
Change in linguistic behavior, drawing on recent advances in
sociolinguistics, quantitative syntactic theory, clinical linguistics, as
well as resources recently made available by Language Technology.
Our group works at the intersection of Language and Technology. In addition
to our work on Lifespan Change, we focus on automated assistance for
language use (such as proofreading), corpora (especially treebanks),
analysis of Cognitive Decline, parsing, Language Technology
infrastructure, and the interfaces between language, society, and
technology. We emphasize work related to the Icelandic language, but
the methods we use are generally language-independent.
Our group: http://linguist.is/language-and-technology-lab/
*Tasks:*
The person hired will use Natural Language Processing
tools to extract information about variables from transcribed speech and
will develop models that account for sociolinguistic trajectories in
the data.
*Requirements:*
- PhD degree in a discipline related to Sociolinguistics and
quantitative data analysis or an expected PhD award date (with evidence)
before the start date of the position.
- Proficiency in Python and R.
- Ability to analyze quantitative findings using modern statistical
methods
- Effective collaboration skills and experience with working in a group.
- Good written and spoken English language skills.
- Ability to actively participate in preparing grant proposals.
Wages are set according to the current collective agreement between the
Minister of Finance and Economic Affairs and the relevant trade union.
The position's start date is in the summer or fall of 2024.
This is primarily an in-office position at a physical lab in Iceland.
Working remotely from abroad is only available to a limited extent, such as
for shorter term travel, as agreed upon by the PI.
The application materials must be submitted before the application
deadline. The application must be in English or Icelandic and must include:
- A letter that explains why you are the right candidate for the job.
- A detailed CV with a list of publications and other relevant items.
- Full text of your most important publications (in your opinion). In
the case of co-authored work, describe your role in the work in question.
- Documentation of academic degrees (degree certificates).
- Names and emails of two references.
All applications will be answered and applicants will be informed about the
appointment when a decision has been made. We may request more information
to help us assess your application. Applications may remain valid for six
months.
Appointments to positions at the University of Iceland are made in
consideration of the Equal Rights Policy
<http://english.hi.is/university/equal_rights_policy> of the University of
Iceland.
The University of Iceland has a special Language Policy
<https://english.hi.is/node/24581>. Note that knowledge of Icelandic is not
required at the time of applying.
Specialized assistance and practical support are offered to all incoming
international staff and their families on various issues related to moving
to Iceland. More information can be found on the University of Iceland
website, International Staff Service
<https://english.hi.is/international_staff_services>.
*For more information, contact:*
Eiríkur Smári Sigurðarson - esmari(a)hi.is
Anton Karl Ingason - antoni(a)hi.is
*Where to apply:*
https://radningarkerfi.orri.is/?s=36312&oj_Router=1N4IgTg9hAuIFwgPwGcC8BmAb…
--
www.linguist.is
The Cog-SUP <https://cog-sup.fr/> master's degree is an interdisciplinary and collaborative master's program in Cognitive Science, taught in English and the heir of the Cogmaster <https://cogmaster.ens.psl.eu/en>. We offer broad interdisciplinary openness and a fundamentally collaborative spirit, bringing together professors, researchers and students from a wide range of backgrounds in the cognitive sciences and beyond.
Among the various tracks offered by Cog-SUP, we would like to draw your attention to the Computational Linguistics track. The track enables students to acquire genuine expertise in the concepts, methods and techniques specific to the field. A common core curriculum and introductory courses to the other tracks create a common culture right from the first year. In the second year, most courses are taught in English, are entirely interdisciplinary and open to all tracks. In this way, we aim to train specialists in computational linguistics who possess both solid disciplinary expertise and a broad interdisciplinary culture, the two keys to fruitful collaboration between disciplines.
The application procedure is described at <https://cog-sup.fr/application/>, and the registration platform is open at <https://apply.cog-sup.fr/>. Please note that the registration period begins on January 17, 2024 and ends on March 10, 2024.
Do not hesitate to spread the word!
Benoit Crabbé and François Yvon.
Useful links:
Cog-SUP: https://cog-sup.fr/about/
Applications: https://cog-sup.fr/application/
[Apologies for cross-posting]
Dear linguists,
We would like to remind you that this is the last week to submit your abstract to the NooJ Conference!
The linguistic software NooJ is organising its 18th International Conference in Bergamo, Italy! This conference is for linguists, scholars, and professionals to engage in thought-provoking discussions on a myriad of topics encompassing Natural Language Processing (NLP), Linguistic Resources, Digital Humanities, and Language in Society.
We are thrilled to invite you to submit to the Call for Papers by February 4, which covers the following topics:
🗣️Linguistic Resources:
Typography, Spelling, Syllabification, Phonemic and Prosodic Transcription, Morphology, Lexical Analysis, Local Syntax, Structural Syntax, Transformational Analysis, Paraphrase Generation, Semantic Annotations, Semantic Analysis.
🧠Digital Humanities:
Corpus Linguistics, Discourse Analysis, Sentiment analysis, Literature Studies, Second-Language Teaching, Narrative content analysis, Corpus processing for the Social Sciences.
💻Natural Language Processing Applications:
Business Intelligence, Text Mining, Text Generation, Language Teaching Software, Automatic Paraphrasing, Machine Translation, etc.
📚NLP Societal applications and citizen science:
Computational Sociolinguistics (migration, geography, tourism, political discourse, cinema, social media, gender studies…)
Important dates!
Abstract Submission: Feb 4 2024
Notification of acceptance: March 10 2024
Camera ready: March 24 2024
Early bird registrations: From March 11 to March 31st 2024
Deadline for the other registrations: April 15 2024
Selected papers submission: Sept 15 2024
Important links!
NooJ Conference website: https://nooj2024.x-23.org/
Submitting the paper via EasyChair: https://easychair.org/conferences/?conf=18nj
A selection of the papers presented at the 18th NooJ International Conference 2024 will be published by Springer Verlag in their CCIS Series (Communications in Computer and Information Science). CCIS is abstracted/indexed in DBLP, Google Scholar, EI-Compendex, Mathematical Reviews, SCImago, and Scopus. CCIS volumes are also submitted for inclusion in ISI Proceedings. The deadline for submission of full camera-ready papers is September 15th, 2024.
Please feel free to contact us in case of any questions.
Best,
The 18th NooJ Conference Organisation Board
__________________
THE 18TH NOOJ INTERNATIONAL CONFERENCE 2024
JUN 4th to 7th, 2024 — Bergamo, Italy
Managed by The NooJ Association
Powered and hosted by X23 Srl
*********************************************************************************
Second Call for Papers:
The 6th workshop on: "Open-Source Arabic Corpora and Processing Tools (OSACT6) with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation"
Workshop: co-located with LREC-COLING 2024 | Torino (Italy) | 20-25 May, 2024
The OSACT6 Workshop invites the submission of long and short papers on current language resources, tools and technologies, and on issues in the design, construction and use of Arabic language resources.
In addition to the general topics of CL, NLP and IR, the workshop will give a special emphasis on two shared tasks, namely: Arabic LLMs Hallucination and Dialect to MSA Machine Translation.
Website: https://osact-lrec.github.io/
Shared Tasks:
Task 1: Arabic LLMs Hallucination
Task 2: Dialect to MSA Machine Translation
Important dates:
Submission deadline: Feb 25, 2024
Paper acceptance notification: March 25, 2024
Camera-ready versions: March 30, 2024
OSACT 2024 day: May 25, 2024
LREC-COLING 2024 conference: 20–25 May 2024
Don’t miss this opportunity to contribute to a pioneering field!
***********************************************************************************
OSACT6 workshop encourages researchers and practitioners of Arabic language technologies, including CL, NLP and IR to share and discuss their latest research efforts, corpora, and tools. The workshop will also give special attention to Large Language Models (LLMs) and Generative AI, which is a hot topic nowadays.
We are inviting papers on topics including, but not limited to, the following topics:
Pre-trained Arabic language models and their applications.
Surveying and evaluating the design of available Arabic corpora and their associated processing tools.
Making available new annotated corpora for NLP and IR applications such as named entity recognition, machine translation, sentiment analysis, text classification, and language learning.
Evaluating the use of crowdsourcing platforms for Arabic data annotation.
Open source Arabic processing toolkits.
Language modeling and pre-trained models.
Tokenization, normalization, word segmentation, morphological analysis, part-of-speech tagging, etc.
Sentiment analysis, dialect identification, and text classification.
Dialect translation.
Fake news detection.
Web and social media search and analytics.
Issues in the design, construction, and use of Arabic LRs: text, speech, sign, gesture, image, in single or multimodal/multimedia data.
Guidelines, standards, best practices, and models for LRs interoperability.
Methodologies and tools for LRs construction and annotation.
Methodologies and tools for extraction and acquisition of knowledge.
Ontologies, terminology and knowledge representation.
LRs and Semantic Web (including Linked Data, Knowledge Graphs, etc.).
Submissions for both short and long papers will be made directly via START, following submission guidelines issued by LREC-COLING 2024.
Paper submission instructions: https://lrec-coling-2024.org/authors-kit/
Paper submission: https://softconf.com/lrec-coling2024/osact2024/
For full submission details, please refer to the workshop website: https://osact-lrec.github.io/
Contact email: OSACT.W...(a)gmail.com
The OSACT 2024 Organizing Committee
Hend Al-Khalifa, King Saud University, KSA;
Hamdy Mubarak, Qatar Computing Research Institute, Qatar;
Kareem Darwish, aiXplain Inc., US;
Tamer Elsayed, Qatar University, Qatar;
Mona Ali, Northeastern University, Canada
Looking forward to your participation and to seeing you at LREC-COLING in May 2024!
************************************************************************************
* Deadline extended to February 2, 2024 *
You are invited to submit your contribution to the 14th international workshop on Bibliometric-enhanced Information Retrieval (BIR 2024), to be held as part of the 46th European Conference on Information Retrieval (ECIR 2024, https://www.ecir2024.org/) in Glasgow, Scotland.
https://sites.google.com/view/bir-ws/bir-2024
The workshop is planned as an onsite event. We encourage all speakers to join us in Glasgow (UK).
=== Important Dates ===
All dates are in Anywhere on Earth – AoE Time Zone
- Submissions: 2 February 2024
- Notifications: 19 February 2024
- Camera Ready Contributions: 3 March 2024
- Workshop: 24 March 2024
=== tl;dr ===
The Bibliometric-enhanced Information Retrieval (BIR) workshop series at ECIR tackles issues related to academic search, at the intersection between Information Retrieval and Bibliometrics. BIR is a hot topic investigated by both academia and industry (e.g., Dimensions, Lens, Google Scholar, scite.ai, Semantic Scholar). The BIR workshop at ECIR is a full-day workshop.
An overview of the BIR/BIRNDL workshop series can be found at: https://sites.google.com/view/bir-ws/home. Past BIR proceedings are available online at https://dblp.org/search?q=BIR.ECIR as open access.
=== Keywords ===
Academic Search • Information Retrieval • Digital Libraries • Bibliometrics • Scientometrics
=== Workshop Topics ===
BIR 2024 addresses, but is not limited to, the following current research topics regarding four aspects of the academic search and recommendation process:
User needs and behaviour regarding scientific information, such as:
Finding relevant papers/authors for a literature review.
Identifying expert reviewers for a given submission.
Understanding information-seeking behaviour and HCI in academic search.
Filtering high-quality research papers, e.g., in preprint servers.
Measuring the degree of plagiarism in a paper.
Flagging predatory conferences and journals, or other forms of scientific misbehaviour.
Mining the scientific literature, such as:
Information extraction, text mining and parsing of scholarly literature.
Natural language processing of scientific papers (e.g., citation contexts).
Discourse modelling and argument mining.
Academic search/recommendation systems, such as:
Modelling the multifaceted nature of scientific information.
Building test collections for reproducible BIR.
System support for literature search and recommendation.
Computational methods for systematic reviewing.
Generative AI and Large Language Models with bibliometric-enhanced IR, such as:
Retrieval-augmented LLMs for academic search and recommendation.
LM-enhanced retrieval and recommendation in scholarly settings.
Challenges with generative LLMs for scholarly texts and references.
We especially invite descriptions of running projects and ongoing work as well as contributions from industry. Papers that investigate multiple themes directly are especially welcome.
=== Submission Details ===
All submissions must be written in English following the CEURART 1-column paper style (6 pages for short papers, 12 pages for full papers; please see below) and should be submitted as PDF files to EasyChair. All submissions will be reviewed by at least two independent reviewers. Please note that at least one author per paper needs to register for and attend the workshop to present the work. In case of a no-show, the paper (even if accepted) will be deleted from the proceedings AND from the program.
CEURART (incl. LaTeX and Word templates)
https://ceurws.wordpress.com/2020/03/31/ceurws-publishes-ceurart-paper-styl…
Submission via EasyChair:
https://easychair.org/conferences/?conf=bir2024
Page limits:
Full paper: 12 pages excluding references
Short paper: 6 pages excluding references
Workshop proceedings will be deposited online in the CEUR workshop proceedings publication service (ISSN 1613-0073) - this way the proceedings will be permanently available and citable (digital persistent identifiers and long-term preservation).
=== Workshop Chairs ===
Ingo Frommholz, University of Wolverhampton, UK
Philipp Mayr, GESIS - Leibniz Institute for the Social Sciences, Germany
Guillaume Cabanac, University of Toulouse, France
Suzan Verberne, Leiden University, the Netherlands
For any enquiries please email bir2024(a)easychair.org.
--
Ingo Frommholz (he/him), PhD, FBCS, FHEA
Reader (~Associate Professor) in Data Science
ACM CIKM 2023 General Chair
Head of Data, AI, Interaction, Retrieval and Language Group http://dairel.org
Deputy Head Digital Innovations and Solutions Centre (DISC)
University of Wolverhampton, UK
Adjunct Professor, Bern University of Applied Sciences, Switzerland
Web: http://www.frommholz.org/ | Email: ifrommholz(a)acm.org
Twitter: @iFromm | Mastodon: @ingo@idf.social
PGP/GPG fingerprint: B74E A422 C7B2 A5BB 2BC2 523B 2790 216E F8F8 D166
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x2790216EF8F8D166
School of Computer Science and Digital Technologies, Aston University, UK, is offering two PhD positions in language and speech processing on the following two topics. The application deadline is 16th February 2024. Applications for the position can be submitted via Aston's PGR webpage (https://www.aston.ac.uk/graduate-school/how-to-apply/studentships). Enquiries about the positions can be made to Dr Tharindu Ranasinghe, School of Computer Science and Digital Technologies, Aston University, UK - t.ranasinghe(a)aston.ac.uk .
Building Trustworthy Automatic Speech Recognition Systems
Dr Tharindu Ranasinghe<https://research.aston.ac.uk/en/persons/tharindu-ranasinghe> (School of Computer Science and Digital Technologies - Applied AI & Robotics Department)
Dr Phil Weber<https://research.aston.ac.uk/en/persons/phil-weber> (Aston Centre for Artificial Intelligence Research and Application – ACAIRA, School of Computer Science and Digital Technologies - Applied AI & Robotics Department)
Prof Aniko Ekart<https://research.aston.ac.uk/en/persons/aniko-ek%C3%A1rt> (Aston Centre for Artificial Intelligence Research and Application – ACAIRA, School of Computer Science and Digital Technologies - Applied AI & Robotics Department)
Dr Muhidin Mohamed<https://research.aston.ac.uk/en/persons/muhidin-mohamed> (College of Business and Social Sciences - Operations & Information Management)
Project Summary, Aim and Objectives:
Automatic Speech Recognition (ASR) has gained popularity in the last decade thanks to advancements in speech and natural language processing, along with the availability of powerful hardware for processing extensive data streams. ASR is crucial in transcription services for various sectors, including legal, healthcare, and entertainment. It also plays a vital role in e-learning platforms, customer support systems, and enhancing accessibility for individuals with disabilities. Additionally, ASR significantly contributes to language translation, making it widely adopted across diverse sectors.
Although ASR has come a long way in recent years, it still has limitations, and its output is far from perfect. However, most commercial ASR systems do not explicitly communicate this to the user, leaving the user to assume that the output is accurate. Most large-scale ASR systems perform better for widely spoken languages, while output quality for low-resource languages is lower. ASR systems also struggle to handle different accents and dialects, especially those of non-native speakers. Furthermore, most ASR systems are trained on general-domain data and do not perform optimally in specific domains such as healthcare. These limitations result in erroneous outputs, and the lack of transparency and accountability can lead to severe consequences, especially in critical domains such as healthcare or law. Quality indicators for ASR systems have therefore become essential, as they can play a significant role in informing users about output quality.
This PhD research aims to develop a comprehensive quality indicator system for ASR. The specific goals are (1) Investigate what makes ASR trustworthy (2) Evaluate ASR systems in challenging scenarios (3) Design quality indicator metrics in ASR (i.e. sentence level scores, word level error spans, critical errors, etc.) (4) Introduce public benchmarks and investigate novel approaches for predicting quality in ASR. The output of the PhD will contribute towards trustworthy ASR systems.
Knowledge and skills required in applicant:
Natural Language Processing, Speech Processing, Machine Learning and Deep Learning. The applicant should be familiar with Python and neural network framework(s) such as PyTorch or TensorFlow and should have excellent programming skills.
Evidence-based detection of misuse of large language models
Dr Phil Weber<https://research.aston.ac.uk/en/persons/phil-weber> (Aston Centre for Artificial Intelligence Research and Application – ACAIRA, School of Computer Science and Digital Technologies - Applied AI & Robotics Department)
Dr Tharindu Ranasinghe<https://research.aston.ac.uk/en/persons/tharindu-ranasinghe> (School of Computer Science and Digital Technologies - Applied AI & Robotics Department)
Dr Muhidin Mohamed<https://research.aston.ac.uk/en/persons/muhidin-mohamed> (College of Business and Social Sciences - Operations & Information Management)
Dr Paul Grace<https://research.aston.ac.uk/en/persons/paul-grace> (Cyber Security Innovation Research Centre – CSI, School of Computer Science and Digital Technologies)
Project Summary, Aim and Objectives:
Large language models (LLMs) have become ubiquitous since the release of ChatGPT, bringing a paradigm shift in the processing and generation of text, images, speech and video. New methods for training very large neural models using massive unlabelled data created the opportunity for foundation models able to generate data with apparently human-like ability. Publicly available pre-trained models facilitate novel tools; Google Gemini, Microsoft Copilot, DALL-E and many start-ups allow non-experts to conversationally instruct and use AI systems in everyday life, seamlessly employing complex technologies including automatic speech recognition, natural language processing, machine translation and image captioning.
New dangers accompany this rapid and unstructured step-change in technology. Beyond unease over energy use, environmental impact, and digital divides, many are concerned about how easily fake media, increasingly difficult to distinguish from real media, can be created. In education, plagiarism detection becomes more nuanced with the need to identify AI-generated text. In the justice domain, forensic determination of the source of a voice or face is obfuscated by the potential that it was artificially generated. Politicians worry about the impact of undetectable deepfakes on democracy, and cybersecurity experts about identity theft. The problems are exacerbated by the potential for LLM-generated data to be reused for training downstream models.
Scientifically well-founded methods for detecting and quantifying the risk of LLM-generated media are therefore urgently needed.
This project builds on established methods in forensic data analysis to develop rigorous methods for detecting AI-generated media. Specifically: 1) review existing approaches to detecting AI-generated and spoofed media, 2) build on methods for forensic voice comparison to develop and validate new approaches to forensic text comparison, 3) apply them to detecting plagiarism and deepfakes, 4) extend them to image data, 5) propose principles to contribute to broader questions of safe, fair and transparent use of LLMs.
Knowledge and skills required in applicant:
Strong programming skills, preferably in Python, including development of large language models. Knowledge of machine learning theory, applications, and related statistical and probability theory. Awareness of modern approaches to forensic data science.
Dear Colleagues,
We have eight openings for professional teaching
faculty at the University of Maryland at all levels of seniority. The
minimum requirement is an MS degree (although a PhD is a plus), and one
of the degrees needs to be in CS or a related field (computational
linguistics, information science, etc. all count). All areas are
needed, including computational linguistics and data science (and I'd
particularly want to see those kinds of applications!).
You'd be teaching courses at all levels of the curriculum: from
introductory courses to courses around your research specialty to
supervising undergraduate research or collaborating with the faculty
at the University of Maryland.
We're located just outside Washington, DC, an exceedingly
international city. Please consider applying here or forwarding to
your colleagues:
https://ejobs.umd.edu/postings/116061
The best consideration date is February 3, 2024 (02/03/2024).
Best,
Jordan
***********************************************************************************
Second Call for Papers:
The 5th workshop on: "Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from
people with various forms of cognitive/psychiatric/developmental impairments"
Workshop: co-located with LREC-COLING 2024 | Turin, Italy | May 21st, 2024
RaPID-5 serves as an interdisciplinary platform for researchers to exchange insights, methods, and experiences related to collecting and processing data from individuals with mental, cognitive, neuropsychiatric, or neurodegenerative impairments. The workshop focuses on creating, processing, and applying such data resources from individuals at different stages and severity levels of these impairments. The ultimate goal of RaPID-5 is to facilitate the study of relationships among linguistic, paralinguistic, and extra-linguistic observations, with applications ranging from aiding diagnosis to enhancing monitoring and predicting individuals at higher risk, ultimately promoting multidisciplinary collaboration across clinical, language technology, computational linguistics, and computer science communities.
Submission deadline: Sun., 17th of March, 2024 (anywhere on earth - new date!)
Paper submission: https://softconf.com/lrec-coling2024/rapid2024/
Website and more details: https://spraakbanken.gu.se/en/rapid-2024
Contact: Dimitrios Kokkinakis
Contact email: dimitrios.kokkinakis(a)gu.se
Invited Speakers:
* Dr. Alexandra König, BSc MSc PhD, Institut national de recherche en informatique et en automatique (INRIA); Cobtek (Cognition; Behaviour; Technology) Lab; University Côte d'Azur, France
* Prof. Maria Liakata, EPSRC/UKRI Turing Institute AI fellow, Queen Mary University of London, UK
Organizing committee:
* Kathleen C. Fraser, National Research Council, Canada;
* Dimitrios Kokkinakis, University of Gothenburg, Sweden;
* Kristina Lundholm Fors, Lund University, Sweden;
* Charalambos K. Themistocleous, University of Oslo, Norway;
* Athanasios Tsanas, The University of Edinburgh, UK;
* Fredrik Öhman, University of Gothenburg and Sahlgrenska University Hospital, Sweden
************************************************************************************