January 2024 - Corpora

CfP: Workshop on Advanced Analysis and recognition of Parliamentary corpora (ARPC) [Updated Due Date for Paper Submission: 15 March 2024]
by George Mikros 30 Jan '24

Apologies for cross-posting
=======================

CALL FOR PAPERS (Updated Due Date for Paper Submission: 15 March 2024)

Workshop on Advanced analysis and recognition of parliamentary corpora (ARPC)

The ARPC organizing committee invites papers for the workshop, to be held in physical format during the ICDAR 2024 conference (August 30 - September 4, 2024) in Athens, Greece (https://icdar2024.net/). The exact date of the ARPC workshop will be communicated soon.

Workshop Context

Data-driven insights from archives have the potential to steer academic research in a variety of fields. This workshop addresses the growing importance of employing advanced recognition and analytical methods and tools to decode the complexities within legislative and administrative documents of parliamentary origin. The workshop will dive deep into cutting-edge OCR techniques for parliamentary corpora. Further attention will be given to recognizing patterns, extracting meaningful insights, and understanding the intricate dimensions of contemporary and historical parliamentary discourse.

The relevance of this topic lies in its potential to bridge previously isolated domains of research, fostering interdisciplinary collaboration. By connecting history, political science, and linguistics, participants will unlock a richer understanding of legislative evolution, political trends, and linguistic nuances embedded in parliamentary proceedings.

A keynote presentation will open the workshop, followed by a couple of sessions dedicated to specific topics related to the analysis and recognition of parliamentary corpora. Each session will conclude with a structured panel discussion. The organization of the ARPC workshop is supported by the Hellenic OCR Team. We encourage authors to submit papers on the topics detailed below.

Topics
- The recognition of polytonic Greek fonts
- Recognition of mixed text (printed and handwritten)
- Parliamentary discourse analysis
- Historical trends in parliamentary language use
- Integration of linguistic and political science methodologies in OCR
- Cross-lingual OCR challenges in parliamentary texts
- Machine learning approaches for semantic analysis of parliamentary proceedings
- Ethical considerations in the digitization and analysis of parliamentary records
- Developing standardized formats for parliamentary data preservation
- The role of OCR technology in enhancing public access to parliamentary archives
- Comparative analysis of parliamentary rhetoric across different eras
- The impact of digital humanities tools on legislative studies
- Application of Natural Language Processing techniques in political discourse analysis
- Automated categorization and indexing of parliamentary documents
- Challenges and solutions in digitizing non-standard parliamentary texts

Paper Tracks

There is both a standard conference paper track and a journal track at ICDAR 2024; details regarding the journal track may be found in a separate Call for Papers on the conference website, https://icdar2024.net/. ICDAR 2024 will follow a double-blind review process. Authors should not include their names and affiliations anywhere in the manuscript. Authors should also ensure that their identity is not revealed indirectly, by citing their previous work in the third person and omitting acknowledgements until the camera-ready version.

Important Dates
15 March 2024 - Paper submission deadline
19 April 2024 - Paper acceptance notification
30 April 2024 - Camera-ready paper
31 August 2024 - ARPC Workshop

Submission Guidelines & Enquiries

All proposals should be submitted electronically via the EasyChair online submission form: https://easychair.org/conferences/?conf=icdar2024 . Enquiries should be sent to pc-chairs(a)icdar2024.net .

Submitted papers are subject to the same policy and conditions as ICDAR 2023 conference papers. Papers should be formatted according to the instructions and style files provided by Springer. Papers accepted for the conference will be allocated up to 15 pages (usually not counting references) in the proceedings. Submissions are expected to be in the range of 10-15 pages. Each accepted paper requires at least one author to complete a full registration. The registration fee for workshop-only participants will be discounted.

Publisher

ICDAR 2024 proceedings will be published in the Springer Lecture Notes in Computer Science (LNCS) series. This gives the proceedings of the conference and the workshops excellent online accessibility, including free access to SpringerLink via links on the conference website for one year after publication, and free access for everyone in SpringerLink four years after publication.

Organizing Committee
Dr. Fotios Fitsilis (Scientific Service, Hellenic Parliament), Email: fitsilisf(a)parliament.gr
Prof. George Mikros (College of Humanities and Social Sciences, Hamad Bin Khalifa University), Email: gmikros(a)hbku.edu.qa

[CfP] e-Commerce and NLP Workshop @ LREC-COLING 2024
by besnikf＠amazon.com 30 Jan '24

The Seventh Workshop on e-Commerce and NLP (ECNLP 7)
Co-located with LREC-COLING 2024 in Torino, Italy – May 21, 2024
https://sites.google.com/view/ecnlp/

Submission Deadline: Friday Feb 23, 2024 - 23:59 (AoE)

ECNLP focuses on NLP for e-Commerce and online shopping applications. We welcome papers covering all aspects of online commerce and data, including search, retrieval, and customer-facing applications and tasks.

Important Dates
Submission Deadline: Friday Feb 23, 2024 - 23:59 (AoE)
Acceptance Notification: Friday March 29, 2024
Camera-ready versions: Friday April 12, 2024
Workshop: Tuesday May 21, 2024

Instructions for Authors
Papers must be submitted in PDF format using the official LREC-COLING template. More details are available on the website.

Additional Information and Contact Details
https://sites.google.com/view/ecnlp/home/

Workshop Scope
ECNLP invites quality research contributions as short or long papers. All submissions will undergo a double-blind review process, and accepted submissions will be presented at the workshop.

NLP and IR have been powering e-Commerce applications since the early days of the fields. Today, NLP and IR already play a significant role in e-commerce tasks, including product search, recommender systems, product question answering, machine translation, sentiment analysis, product description and review summarization, and customer review processing, among many other tasks. With the exploding popularity of chatbots and shopping assistants – both text- and voice-based – NLP, IR, question answering, and dialogue systems research is poised to transform e-commerce once again, but requires a forum where new and unfinished ideas can be discussed.

The ECNLP workshop will provide a venue for the dissemination of NLP and IR research results related to e-commerce and online shopping, bringing together researchers from both academia and industry. The workshop welcomes submissions of late-breaking and preliminary research results, as well as opinion and position papers.

Topics of interest include but are not limited to:
- Product classification and cataloguing (including into types and hierarchies)
- NER for products, brands, attributes, and part names
- Search and product query auto-completion
- Recommender systems and product suggestions
- Machine Translation applied to e-commerce (e.g. translating product titles/reviews)
- Voice & dialogue-based e-commerce applications; ASR for e-commerce
- Advertising and ad prediction/forecasting models
- Fraud and spam detection in e-commerce (e.g. in customer reviews/comments)
- Product description and review summarization
- Product similarity and matching of seller-provided listings to catalog products
- Technical support request processing (user emails, chat agents, etc.)
- E-commerce related social media processing
- The intersection of Computer Vision and NLP (e.g. product images and text)
- Product Question Answering
- Shopping assistants, agents, and chat bots
- Sentiment analysis, opinion mining, and stance detection in user-generated content
- Relevant resources and datasets

Thank you,
The ECNLP Organizing Committee

UMRs in Boulder Summer School - 3rd Call for Applications - DEADLINE EXTENDED to Feb. 9, 2024
by kristine.stenzel＠colorado.edu 29 Jan '24

UMRs in Boulder Summer School - 3rd Call for Applications - DEADLINE EXTENDED to Feb. 9, 2024

University of Colorado, Boulder, June 10-13, 2024
Held in conjunction with the UMR Parsing Workshop, June 14, 2024
https://umr4nlp.github.io/web/SummerSchool.html

Impressive progress has been made in many aspects of natural language processing (NLP) in recent years. Most notably, the achievements of transformer-based large language models such as ChatGPT would seem to obviate the need for any type of semantic representation beyond what can be encoded as contextualized word embeddings of surface text. Advances have been particularly notable in areas where large training data sets exist, and it is advantageous to build an end-to-end training architecture without resorting to intermediate representations. For any truly interactive NLP applications, however, a more complete understanding of the information conveyed by each sentence is needed to advance the state of the art. Here, "understanding" entails the use of some form of meaning representation. NLP techniques that can accurately capture the required elements of the meaning of each utterance in a formal representation are critical to making progress in these areas and have long been a central goal of the field. As with end-to-end NLP applications, the dominant approach for deriving meaning representations from raw textual data is through the use of machine learning and appropriate training data. This allows the development of systems that can assign appropriate meaning representations to previously unseen text.

In this four-day course, instructors from the University of Colorado and Brandeis University will describe the framework of Uniform Meaning Representations (UMRs), a recent cross-lingual, multi-sentence incarnation of Abstract Meaning Representations (AMRs) that addresses these issues and constitutes such a transformative representation. Incorporating Named Entity tagging, discourse relations, intra-sentential coreference, negation and modality, and the popular PropBank-style predicate argument structures with semantic role labels into a single directed acyclic graph structure, UMR builds on AMR and keeps the essential characteristics of AMR while making it cross-lingual and extending it to be a document-level representation. It also adds aspect, multi-sentence coreference and temporal relations, and scope.

Each day will include lectures and hands-on practice. Topics to be covered June 10-13:
1. The basic structural representation of UMR and its application to multiple languages;
2. How UMR encodes different types of MWE (multi-word expressions), discourse and temporal relations, and TAM (tense-aspect-modality) information in multiple languages, and differences between AMR and UMR;
3. Going from IGT (interlinear glossed text) to UMR graphs semi-automatically;
4. Formal semantic interpretation of UMR incorporating a continuation-based semantics for scope phenomena involving modality, negation, and quantification;
5. Extension to UMR for encoding gesture in multimodal dialogue, Gesture AMR (GAMR), which aligns with speech-based UMR to account for situated grounding in dialogue.

The fifth day of the summer school, June 14, will be co-located with a UMR Parsing Workshop, focusing on parsing algorithms that generate AMR and UMR representations over multiple languages.
https://umr4nlp.github.io/web/UMRParsingWorkshop.html

Participation will be fully funded (reasonable airfare, lodging, and meals). This summer school has been made possible by funding from NSF Collaborative Research: Building a Broad Infrastructure for Uniform Meaning Representations (Award # 2213805), with additional support from the University of Colorado Boulder and the CLEAR Center.

To apply, please complete this form by Feb. 9, 2024:
https://www.colorado.edu/linguistics/umrs-boulder-summer-school-application

Other important dates:
● Notification of acceptance: Feb. 20, 2024
● Confirmation of participation: Mar. 1, 2024
● Arrival in Boulder June 9, departure June 15, 2024

CFP: Shared Task on Software Mention Detection in Scholarly Publications (SOMD2024)
by Stefan Dietze 29 Jan '24

SOMD: Shared Task on Software Mention Detection in Scholarly Publications

Co-located with the 1st Workshop on Natural Scientific Language Processing and Research Knowledge Graphs (NSLP 2024)
26 or 27 May 2024 (tbc)
Hersonissos, Crete, Greece (co-located with ESWC2024)
Website: https://nfdi4ds.github.io/nslp2024/docs/somd_shared_task.html

* Task Description *

Scientific research is almost exclusively published in unstructured text formats, which are not readily machine-readable. Information extraction methods have therefore been widely used to extract entities of different types from scholarly publications. While software is an important part of the scientific process and should therefore be recognized as a first-class citizen of research, methods for software mention detection are still not widely available or used. Given the scale and heterogeneity of software citations, robust methods are required to detect and disambiguate mentions of software and related metadata.

The SOftware Mention Detection in Scholarly Publications (SOMD) task will utilise the SoMeSci (Software Mentions in Science) corpus to address three different subtasks in the context of software citations. Participants can sign up for one or more subtasks. Automated evaluation of submitted systems is done through the Codalab platform.

Subtask I: Software mention recognition.
Subtask II: Additional information.
Subtask III: Relation classification.

More information about the task and how to participate is available at https://nfdi4ds.github.io/nslp2024/docs/somd_shared_task.html

* Important dates *
* Training and test data: already released
* Deadline for system submissions: February 22, 2024

* Organisers *
* Stefan Dietze (GESIS Leibniz Institut für Sozialwissenschaften, Cologne & Heinrich-Heine-University Düsseldorf, Germany)
* Frank Krüger (Wismar University of Applied Sciences, Germany)
* Saurav Karmarkar (GESIS Leibniz Institut für Sozialwissenschaften, Cologne, Germany)

* Contact *
* Frank Krüger (frank.krueger(a)hs-wismar.de)

Postdoc in Sociolinguistics at the University of Iceland
by Anton Karl Ingason 29 Jan '24

Postdoc in Sociolinguistics at the University of Iceland

Job percentage: 100%
Application deadline: until the end of 15.02.2024

*See ad on Euraxess:* https://euraxess.ec.europa.eu/jobs/189143

(Note that knowledge of Icelandic is not required at the time of applying.)

The Language and Technology lab at the University of Iceland, led by associate professor Dr. Anton Karl Ingason, is seeking to hire a full-time post-doctoral researcher in sociolinguistics. The position is initially for 12 months and can be extended by 12 additional months. The position is part of the project Explaining Individual Lifespan Change (EILisCh), a five-year research project backed by the European Research Council (ERC). The goal of this project is to explain Individual Lifespan Change in linguistic behavior, drawing on recent advances in sociolinguistics, quantitative syntactic theory, and clinical linguistics, as well as resources recently made available by Language Technology.

Our group works at the intersection of Language and Technology. In addition to our work on Lifespan Change, we focus on automated assistance for language use (such as proofreading), corpora (especially treebanks), analysis of Cognitive Decline, parsing, Language Technology infrastructure, and the interfaces between language, society, and technology. We emphasize work that is related to the Icelandic language, but the methods we use are in general language-independent.

Our group: http://linguist.is/language-and-technology-lab/

*Tasks:*
The person hired will use Natural Language Processing tools to extract information about variables from transcribed speech and will develop models that account for sociolinguistic trajectories in the data.

*Requirements:*
- PhD degree in a discipline related to Sociolinguistics and quantitative data analysis, or an expected PhD award date (with evidence) before the start date of the position.
- Python and R.
- Ability to analyze quantitative findings using modern statistical methods.
- Effective collaboration skills and experience with working in a group.
- Good written and spoken English language skills.
- Ability to actively participate in preparing grant proposals.

Wages are according to the current collective agreement between the Minister of Finance and Economic Affairs and the relevant trade union.

The position's start date is in the summer or fall of 2024. This is mostly an in-office position at a physical lab in Iceland. Working remotely from abroad is only available to a limited extent, such as for shorter-term travel, as agreed upon with the PI.

The application materials must be submitted before the application deadline. The application must be in English or Icelandic and must include:
- A letter that explains why you are the right candidate for the job.
- A detailed CV with a list of publications and other relevant items.
- Full text of your most important publications (in your opinion). In the case of co-authored work, describe your role in the work in question.
- Documentation of academic degrees (degree certificates).
- Names and emails of two references.

All applications will be answered and applicants will be informed about the appointment when a decision has been made. We may request more information to help us assess your application. Applications may be valid for six months.

Appointments to positions at the University of Iceland are made in consideration of the Equal Rights Policy <http://english.hi.is/university/equal_rights_policy> of the University of Iceland.

The University of Iceland has a special Language Policy <https://english.hi.is/node/24581>.

Specialized assistance and practical support is offered to all incoming international staff and their families on various issues related to moving to Iceland. More information can be found at the University of Iceland website, International Staff Service <https://english.hi.is/international_staff_services>.

*More info provided by*
- Eiríkur Smári Sigurðarson - esmari(a)hi.is
- Anton Karl Ingason - antoni(a)hi.is

*Where to apply:* https://radningarkerfi.orri.is/?s=36312&oj_Router=1N4IgTg9hAuIFwgPwGcC8BmAb…

--
www.linguist.is

Computational linguistics track in the master CogSUP: registration is open
by Benoît Crabbé 29 Jan '24

The Cog-SUP <https://cog-sup.fr/> master's degree is an interdisciplinary and collaborative master's program in Cognitive Science, taught in English and heir to the Cogmaster <https://cogmaster.ens.psl.eu/en>. We offer broad interdisciplinary openness and a fundamentally collaborative spirit, bringing together professors, researchers and students from a wide range of backgrounds in the cognitive sciences and beyond.

Among the various tracks offered by Cog-SUP, we would like to draw your attention to the Computational Linguistics track. The track enables students to acquire genuine expertise in the concepts, methods and techniques specific to the field. A common core curriculum and introductory courses to the other tracks create a common culture right from the first year. In the second year, most courses are taught in English, are entirely interdisciplinary and are open to all tracks. In this way, we aim to train specialists in computational linguistics who possess both solid disciplinary expertise and a broad interdisciplinary culture, the two keys to fruitful collaboration between disciplines.

The application procedure can be found here <https://cog-sup.fr/application/>, and the registration platform is open here <https://apply.cog-sup.fr/>. Please note that the registration period runs from January 17, 2024 to March 10, 2024.

Do not hesitate to spread the word!

Benoit Crabbé and François Yvon.

Useful links:
Cog-SUP: https://cog-sup.fr/about/
Applications: https://cog-sup.fr/application/

One week Countdown! Call for Papers for Linguists - 18th International NooJ Conference
by THE 18TH NOOJ INTERNATIONAL CONFERENCE 2024 29 Jan '24

[Apologies for cross-posting]

Dear linguists,

We would like to remind you that this is the last week to submit your abstract to the NooJ Conference!

NooJ, the linguistic software, is holding its 18th International Conference in Bergamo, Italy! This conference is for linguists, scholars, and professionals to engage in thought-provoking discussions on a myriad of topics encompassing Natural Language Processing (NLP), Linguistic Resources, Digital Humanities, and Language in Society.

We are thrilled to invite you to respond to the Call for Papers by 4 February. It covers the following topics:

📚NLP Societal applications and citizen science: Typography, Spelling, Syllabification, Phonemic and Prosodic Transcription, Morphology, Lexical Analysis, Local Syntax, Structural Syntax, Transformational Analysis, Paraphrase Generation, Semantic Annotations, Semantic Analysis.
🗣️Linguistic Resources: Corpus Linguistics, Discourse Analysis, Sentiment analysis, Literature Studies, Second-Language Teaching, Narrative content analysis, Corpus processing for the Social Sciences.
🧠Digital Humanities: Business Intelligence, Text Mining, Text Generation, Language Teaching Software, Automatic Paraphrasing, Machine Translation, etc.
💻Natural Language Processing Applications: Computational Socio-Linguistics (migration, geography, tourism, political discourse, cinema, social media, gender studies…)

Important dates!
Abstract Submission: Feb 4 2024
Notification of acceptance: March 10 2024
Camera ready: March 24 2024
Early bird registrations: From March 11 to March 31st 2024
Deadline for the other registrations: April 15 2024
Selected papers submission: Sept 15 2024

Important links!
NooJ Conference website: https://nooj2024.x-23.org/
Submitting the paper via EasyChair: https://easychair.org/conferences/?conf=18nj

A selection of the papers presented at the 18th NooJ International Conference 2024 will be published by Springer Verlag in their CCIS Series (Communications in Computer and Information Science). CCIS is abstracted/indexed in DBLP, Google Scholar, EI-Compendex, Mathematical Reviews, SCImago, Scopus. CCIS volumes are also submitted for inclusion in ISI Proceedings. The deadline for submission of full camera-ready papers is September 15th, 2024.

Please feel free to contact us if you have any questions.

Best,
The 18th NooJ Conference Organisation Board
__________________
THE 18TH NOOJ INTERNATIONAL CONFERENCE 2024
JUN 4th to 7th, 2024 - Bergamo, Italy
Managed by The Nooj Association
Powered and hosted by X23 Srl

Second Call for Papers: OSACT 2024 workshop@LREC-COLING 2024
by m.zakiali80＠gmail.com 29 Jan '24

Second Call for Papers: The 6th workshop on "Open-Source Arabic Corpora and Processing Tools (OSACT6) with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation"

Workshop: co-located with LREC-COLING 2024 | Torino (Italia) | 20-25 May, 2024

The OSACT6 Workshop invites the submission of long and short papers on current language resources, tools and technologies, and on issues in the design, construction and use of Arabic language resources. In addition to the general topics of CL, NLP and IR, the workshop will give special emphasis to two shared tasks, namely: Arabic LLMs Hallucination and Dialect to MSA Machine Translation.

Website: https://osact-lrec.github.io/

Shared Tasks:
Task 1: Arabic LLMs Hallucination
Task 2: Dialect to MSA Machine Translation

Important dates:
Submission deadline: Feb 25, 2024
Paper acceptance notification: March 25, 2024
Camera-ready versions: March 30, 2024
OSACT 2024 day: May 25, 2024
LREC-COLING 2024 conference: 20–25 May 2024

Don't miss this opportunity to contribute to a pioneering field!

The OSACT6 workshop encourages researchers and practitioners of Arabic language technologies, including CL, NLP and IR, to share and discuss their latest research efforts, corpora, and tools. The workshop will also give special attention to Large Language Models (LLMs) and Generative AI, which is a hot topic nowadays.

We invite papers on topics including, but not limited to, the following:
- Pre-trained Arabic language models and their applications.
- Surveying and evaluating the design of available Arabic corpora and their associated processing tools.
- Making available new annotated corpora for NLP and IR applications such as named entity recognition, machine translation, sentiment analysis, text classification, and language learning.
- Evaluating the use of crowdsourcing platforms for Arabic data annotation.
- Open source Arabic processing toolkits.
- Language modeling and pre-trained models.
- Tokenization, normalization, word segmentation, morphological analysis, part-of-speech tagging, etc.
- Sentiment analysis, dialect identification, and text classification.
- Dialect translation.
- Fake news detection.
- Web and social media search and analytics.
- Issues in the design, construction, and use of Arabic LRs: text, speech, sign, gesture, image, in single or multimodal/multimedia data.
- Guidelines, standards, best practices, and models for LRs interoperability.
- Methodologies and tools for LRs construction and annotation.
- Methodologies and tools for extraction and acquisition of knowledge.
- Ontologies, terminology and knowledge representation.
- LRs and Semantic Web (including Linked Data, Knowledge Graphs, etc.).

Submissions for both short and long papers will be made directly via START, following submission guidelines issued by LREC-COLING 2024.

Paper submission instructions: https://lrec-coling-2024.org/authors-kit/
Paper submission: https://softconf.com/lrec-coling2024/osact2024/

For full submission details please refer to our workshop website (https://osact-lrec.github.io/).

Contact email: OSACT.W...(a)gmail.com

The OSACT 2024 Organizing Committee
Hend Al-Khalifa, King Saud University, KSA;
Hamdy Mubarak, Qatar Computing Research Institute, Qatar;
Kareem Darwish, aiXplain Inc., US;
Tamer Elsayed, Qatar University, Qatar;
Mona Ali, Northeastern University, Canada

Looking forward to your participation and to seeing you at LREC-COLING in May 2024!

Deadline extended: BIR@ECIR2024 - 14th International Workshop on Bibliometric-enhanced Information Retrieval
by Ingo Frommholz 28 Jan '24

* Deadline extended to February 2, 2024 *

You are invited to submit your contribution to the 14th international workshop on Bibliometric-enhanced Information Retrieval (BIR 2024), to be held as part of the 46th European Conference on Information Retrieval (ECIR 2024, https://www.ecir2024.org/) in Glasgow, Scotland.

https://sites.google.com/view/bir-ws/bir-2024

The workshop is planned as an onsite event. We encourage all speakers to join us in Glasgow (UK).

=== Important Dates ===
All dates are in Anywhere on Earth – AoE Time Zone
- Submissions: 2 February 2024
- Notifications: 19 February 2024
- Camera Ready Contributions: 3 March 2024
- Workshop: 24 March 2024

=== tl;dr ===
The Bibliometric-enhanced Information Retrieval (BIR) workshop series at ECIR tackles issues related to academic search, at the intersection between Information Retrieval and Bibliometrics. BIR is a hot topic investigated by both academia and industry (e.g., Dimensions, Lens, Google Scholar, scite.ai, Semantic Scholar). The BIR workshop at ECIR is a full-day workshop. An overview of the BIR/BIRNDL workshop series can be found at: https://sites.google.com/view/bir-ws/home. Past BIR proceedings are available online at https://dblp.org/search?q=BIR.ECIR as open access.

=== Keywords ===
Academic Search • Information Retrieval • Digital Libraries • Bibliometrics • Scientometrics

=== Workshop Topics ===
During BIR 2024, we address, but are not limited to, the following current research topics regarding 4 aspects of the academic search and recommendation process:

User needs and behaviour regarding scientific information, such as:
- Finding relevant papers/authors for a literature review.
- Identifying expert reviewers for a given submission.
- Understanding information-seeking behaviour and HCI in academic search.
- Filtering high-quality research papers, e.g., in preprint servers.
- Measuring the degree of plagiarism in a paper.
- Flagging predatory conferences and journals, or other forms of scientific misbehaviour.

Mining the scientific literature, such as:
- Information extraction, text mining and parsing of scholarly literature.
- Natural language processing of scientific papers (e.g., citation contexts).
- Discourse modelling and argument mining.

Academic search/recommendation systems, such as:
- Modelling the multifaceted nature of scientific information.
- Building test collections for reproducible BIR.
- System support for literature search and recommendation.
- Computational methods for systematic reviewing.

Generative AI and Large Language Models with bibliometric-enhanced IR, such as:
- Retrieval-augmented LLMs for academic search and recommendation.
- LM-enhanced retrieval and recommendation in scholarly settings.
- Challenges with generative LLMs for scholarly texts and references.

We especially invite descriptions of running projects and ongoing work as well as contributions from industry. Papers that investigate multiple themes directly are especially welcome.

=== Submission Details ===
All submissions must be written in English following the CEURART 1-column paper style (6 pages for short papers, 12 pages for full papers; see the page limits below) and should be submitted as PDF files to EasyChair. All submissions will be reviewed by at least two independent reviewers. Please be aware that at least one author per paper needs to register for the workshop and attend the workshop to present the work. In case of a no-show, the paper (even if accepted) will be deleted from the proceedings AND from the program.

CEURART (incl. LaTeX and Word templates): https://ceurws.wordpress.com/2020/03/31/ceurws-publishes-ceurart-paper-styl…
Submission via EasyChair: https://easychair.org/conferences/?conf=bir2024

Page limits:
Full paper: 12 pages excluding references
Short paper: 6 pages excluding references

Workshop proceedings will be deposited online in the CEUR workshop proceedings publication service (ISSN 1613-0073). This way the proceedings will be permanently available and citable (digital persistent identifiers and long-term preservation).

=== Workshop Chairs ===
Ingo Frommholz, University of Wolverhampton, UK
Philipp Mayr, GESIS - Leibniz Institute for the Social Sciences, Germany
Guillaume Cabanac, University of Toulouse, France
Suzan Verberne, Leiden University, the Netherlands

For any enquiries please email bir2024(a)easychair.org.

--
Ingo Frommholz (he/him), PhD, FBCS, FHEA
Reader (~Associate Professor) in Data Science
ACM CIKM 2023 General Chair
Head of Data, AI, Interaction, Retrieval and Language Group http://dairel.org
Deputy Head Digital Innovations and Solutions Centre (DISC)
University of Wolverhampton, UK
Adjunct Professor, Bern University of Applied Sciences, Switzerland
Web: http://www.frommholz.org/ | Email: ifrommholz(a)acm.org
Twitter: @iFromm | Mastodon: @ingo@idf.social
PGP/GPG fingerprint: B74E A422 C7B2 A5BB 2BC2 523B 2790 216E F8F8 D166
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x2790216EF8F8D166


Two PhD Studentships in Language and Speech Processing at Aston University, Birmingham, UK
by Tharindu Ranasinghe 28 Jan '24


School of Computer Science and Digital Technologies, Aston University, UK, is offering two PhD positions in language and speech processing on the following two topics. The application deadline is 16th February 2024. Applications for the positions can be submitted via Aston's PGR webpage (https://www.aston.ac.uk/graduate-school/how-to-apply/studentships). Enquiries about the positions can be made to Dr Tharindu Ranasinghe, School of Computer Science and Digital Technologies, Aston University, UK - t.ranasinghe(a)aston.ac.uk.

Building Trustworthy Automatic Speech Recognition Systems

Dr Tharindu Ranasinghe<https://research.aston.ac.uk/en/persons/tharindu-ranasinghe> (School of Computer Science and Digital Technologies - Applied AI & Robotics Department)
Dr Phil Weber<https://research.aston.ac.uk/en/persons/phil-weber> (Aston Centre for Artificial Intelligence Research and Application – ACAIRA, School of Computer Science and Digital Technologies - Applied AI & Robotics Department)
Prof Aniko Ekart<https://research.aston.ac.uk/en/persons/aniko-ek%C3%A1rt> (Aston Centre for Artificial Intelligence Research and Application – ACAIRA, School of Computer Science and Digital Technologies - Applied AI & Robotics Department)
Dr Muhidin Mohamed<https://research.aston.ac.uk/en/persons/muhidin-mohamed> (College of Business and Social Sciences - Operations & Information Management)

Project Summary, Aim and Objectives: Automatic Speech Recognition (ASR) has gained popularity in the last decade thanks to advancements in speech and natural language processing, along with the availability of powerful hardware for processing extensive data streams. ASR is crucial in transcription services for various sectors, including legal, healthcare, and entertainment. It also plays a vital role in e-learning platforms, customer support systems, and enhancing accessibility for individuals with disabilities.
Additionally, ASR significantly contributes to language translation, making it widely adopted across diverse sectors. Although ASR has come a long way in recent years, it still has limitations, and the produced output is far from perfect. However, most commercial ASR systems do not explicitly state this to the user, leaving the user to assume that the output is accurate. Most large-scale ASR systems perform better for widely spoken languages, while output quality is lower for low-resource languages. ASR systems also struggle to handle different accents and dialects, especially those of non-native speakers. Furthermore, most ASR systems are trained on general-domain data and do not perform optimally in specific domains such as healthcare. These limitations result in wrong outputs, and the lack of transparency and accountability can lead to severe consequences, especially in critical domains such as healthcare or law. Therefore, quality indicators for ASR systems have become essential, as they can play a significant role in informing the user about the output quality.

This PhD research aims to develop a comprehensive quality indicator system for ASR. The specific goals are to (1) investigate what makes ASR trustworthy, (2) evaluate ASR systems in challenging scenarios, (3) design quality indicator metrics for ASR (e.g. sentence-level scores, word-level error spans, critical errors), and (4) introduce public benchmarks and investigate novel approaches for predicting quality in ASR. The output of the PhD will contribute towards trustworthy ASR systems.

Knowledge and skills required in applicant: Natural Language Processing, Speech Processing, Machine Learning and Deep Learning. The applicant should be familiar with Python and neural network framework(s) such as PyTorch or TensorFlow and should have excellent programming skills.
Evidence-based detection of misuse of large language models

Dr Phil Weber<https://research.aston.ac.uk/en/persons/phil-weber> (Aston Centre for Artificial Intelligence Research and Application – ACAIRA, School of Computer Science and Digital Technologies - Applied AI & Robotics Department)
Dr Tharindu Ranasinghe<https://research.aston.ac.uk/en/persons/tharindu-ranasinghe> (School of Computer Science and Digital Technologies - Applied AI & Robotics Department)
Dr Muhidin Mohamed<https://research.aston.ac.uk/en/persons/muhidin-mohamed> (College of Business and Social Sciences - Operations & Information Management)
Dr Paul Grace<https://research.aston.ac.uk/en/persons/paul-grace> (Cyber Security Innovation Research Centre – CSI, School of Computer Science and Digital Technologies - School of Computer Science and Digital Technologies)

Project Summary, Aim and Objectives: Large language models (LLMs) have become ubiquitous since the release of ChatGPT, bringing a paradigm shift in the processing and generation of text, images, speech and video. New methods for training very large neural models using massive unlabelled data created the opportunity for foundation models able to generate data with apparently human-like ability. Publicly available pre-trained models facilitate novel tools; Google Gemini, Microsoft Copilot, DALL-E and many start-ups allow non-experts to conversationally instruct and use AI systems in everyday life, seamlessly employing complex technologies including automatic speech recognition, natural language processing, machine translation and image captioning. New dangers accompany this rapid and unstructured step-change in technology. Beyond unease over energy use, environmental impact, and digital divides, many are concerned about the ease with which fake media, increasingly difficult to distinguish from real media, can be created.
In education, plagiarism detection becomes more nuanced with the need to identify AI-generated text. In the justice domain, forensic determination of the source of a voice or face is obfuscated by the potential that it was artificially generated. Politicians worry about the impact on democracy of undetectable deepfakes, and cybersecurity experts about identity theft. The problems are exacerbated by the potential for LLM-generated data to be reused for training downstream models. Scientifically well-founded methods for detecting and quantifying the risk of LLM-generated media are therefore urgently needed.

This project builds on established methods in forensic data analysis to develop rigorous methods for detecting AI-generated media. Specifically: 1) review existing approaches to detecting AI-generated and spoofed media, 2) build on methods for forensic voice comparison to develop and validate new approaches to forensic text comparison, 3) apply these to detecting plagiarism and deepfakes, 4) extend to image data, 5) propose principles to contribute to broader questions of safe, fair and transparent use of LLMs.

Knowledge and skills required in applicant: Strong programming skills, preferably in Python, including development of large language models. Knowledge of machine learning theory, applications, and related statistical and probability theory. Awareness of modern approaches to forensic data science.

