January 2024 - Corpora

Edge Hill Corpus Research Group, 11 January 2024
by Costas Gabrielatos 04 Jan '24

The next meeting of the Edge Hill Corpus Research Group will take place online (via Teams) on Thursday 11 January 2024, 2:00-3:00 pm (GMT). Attendance is free. You can register here: https://store.edgehill.ac.uk/conferences-and-events/conferences/events/edge…

Registration closes on Wednesday 10 January, 12 noon (GMT).

Topics: Corpus Methodology, Phraseology
Speaker: Benet Vincent <https://www.coventry.ac.uk/life-on-campus/staff-directory/arts-and-humaniti…> (Coventry University, UK)
Title: Methodological issues and challenges in the use of phrase-frames to investigate phraseology

Abstract

The importance of gaining a better understanding of phraseology has been recognised for some time now in the area of English for Academic Purposes (EAP). A widespread approach is to extract from a corpus frequently occurring fixed strings (lexical bundles, or clusters) of potentially useful phrases/multi-word units (see e.g. Gilmore & Millar, 2018). A limitation of this sort of study is the focus on fixed continuous sequences, when phrases are well known to allow a degree of variation (see e.g. Gries, 2008). One proposal to address this limitation is the 'phrase frame' (p-frame), a fixed sequence of items occurring frequently in a corpus with one or two empty slots (Lu, Yoon, & Kisselev, 2021). This approach allows researchers to retrieve the most frequent p-frames in a particular corpus, then identify which items typically fill these slots and what meanings/functions might be associated with them. The idea is that the results of such research can help us better understand how members of a specific discourse community typically express themselves, which in turn may inform EAP pedagogy (Lu, Yoon, & Kisselev, 2018). Our project aimed to use a p-frame approach to create a list of pedagogically useful phrases to help novice writers of RA introductions in Health Sciences.
A number of studies have used a p-frame approach with similar aims, though for different discipline areas, including Fuster-Márquez and Pennock-Speck (2015), Cunningham (2017) and Lu et al. (2018, 2021). However, analysis of these studies indicates that they lack consensus on a number of issues central to p-frame methodology, presenting a challenge for new work in this area. This presentation will provide an overview of the key issues in p-frame research which we have identified and show how we have addressed them. The main aim will be to underline the importance of ensuring that the methods applied by a p-frame study align with the aims of the project.

References

Cunningham, K. J. (2017). A phraseological exploration of recent mathematics research articles through key phrase frames. Journal of English for Academic Purposes, 25, 71. https://doi.org/10.1016/j.jeap.2016.11.005

Fuster-Márquez, M., & Pennock-Speck, B. (2015). Target frames in British hotel websites. International Journal of English Studies, 15(1), 51-69. https://doi.org/10.6018/ijes/2015/1/213231

Gilmore, A., & Millar, N. (2018). The language of civil engineering research articles: A corpus-based approach. English for Specific Purposes, 51, 1-17. https://doi.org/10.1016/j.esp.2018.02.002

Gries, S. (2008). Phraseology and linguistic theory. In S. Granger & F. Meunier (eds.), Phraseology: An interdisciplinary perspective, 3-26.

Lu, X., Yoon, J., & Kisselev, O. (2018). A phrase-frame list for social science research article introductions. Journal of English for Academic Purposes, 36, 76-85. https://doi.org/10.1016/j.jeap.2018.09.004

Lu, X., Yoon, J., & Kisselev, O. (2021). Matching phrase-frames to rhetorical moves in social science research article introductions. English for Specific Purposes, 61, 63-83. https://doi.org/10.1016/j.esp.2020.10.001
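
For readers unfamiliar with the p-frame idea described in the abstract above, the following is a minimal illustrative sketch, not the speaker's method. It assumes whitespace tokenisation, 4-word frames with exactly one internal slot, and a raw frequency threshold; the function name `extract_pframes`, the `*` slot marker and the toy sentences are all hypothetical choices made for this example.

```python
from collections import Counter, defaultdict

SLOT = "*"  # hypothetical marker for the open slot


def extract_pframes(tokens, n=4, min_freq=2):
    """Map each one-slot frame of length n to a Counter of its slot fillers.

    Only internal positions are opened as slots, so the first and last
    words of every frame stay fixed (an assumption of this sketch).
    """
    frames = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        ngram = tokens[i:i + n]
        for pos in range(1, n - 1):
            frame = tuple(ngram[:pos]) + (SLOT,) + tuple(ngram[pos + 1:])
            frames[frame][ngram[pos]] += 1
    # Keep frames whose total frequency (over all fillers) meets the threshold.
    return {f: c for f, c in frames.items() if sum(c.values()) >= min_freq}


# Toy "corpus" invented for illustration only.
text = (
    "the aim of this study is to examine the role of diet . "
    "the purpose of this study is to assess the effect of diet . "
    "the aim of this paper is to assess the role of sleep ."
)
tokens = text.split()
pframes = extract_pframes(tokens, n=4, min_freq=3)

for frame, fillers in sorted(pframes.items()):
    print(" ".join(frame), dict(fillers))
```

On this toy input the frame "the * of this" is retrieved with the fillers "aim" and "purpose". A real study would also need to settle the questions the abstract raises, such as frame length, number of slots, and frequency and dispersion thresholds.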

Call for Papers with Publication Opportunities for Linguists
by THE 18TH NOOJ INTERNATIONAL CONFERENCE 2024 04 Jan '24

Dear linguists,

We would like to give our warmest wishes to you! Happy 2024!

In the spirit of new beginnings, we are thrilled to invite you again to respond to the Call for Papers for the 18th NooJ International Conference, taking place in Bergamo from June 4th to 7th, 2024. This conference is for linguists, scholars, and professionals to engage in thought-provoking discussions on a myriad of topics encompassing Natural Language Processing (NLP), Linguistic Resources, Digital Humanities, and Language in Society.

Website: https://nooj2024.x-23.org/
Submission website: https://easychair.org/conferences/?conf=18nj
Abstract submission deadline: 4th Feb 2024

We are currently calling for papers on the following topics:

📚 NLP Societal applications and citizen science: Typography, Spelling, Syllabification, Phonemic and Prosodic Transcription, Morphology, Lexical Analysis, Local Syntax, Structural Syntax, Transformational Analysis, Paraphrase Generation, Semantic Annotations, Semantic Analysis.

🗣️ Linguistic Resources: Corpus Linguistics, Discourse Analysis, Sentiment Analysis, Literature Studies, Second-Language Teaching, Narrative Content Analysis, Corpus Processing for the Social Sciences.

🧠 Digital Humanities: Business Intelligence, Text Mining, Text Generation, Language Teaching Software, Automatic Paraphrasing, Machine Translation, etc.

💻 Natural Language Processing Applications: Computational Sociolinguistics (migration, geography, tourism, political discourse, cinema, social media, gender studies…)

Important dates

Abstract submission: Feb 4, 2024
Notification of acceptance: March 10, 2024
Camera ready: March 24, 2024
Early bird registrations: March 11 to March 31, 2024
Deadline for other registrations: April 15, 2024
Selected papers submission: Sept 15, 2024

Opportunity for publication

A selection of the papers presented at the 18th NooJ International Conference 2024 will be published by Springer in their CCIS series (Communications in Computer and Information Science). CCIS is abstracted/indexed in DBLP, Google Scholar, EI-Compendex, Mathematical Reviews, SCImago, Scopus. CCIS volumes are also submitted for inclusion in ISI Proceedings. The deadline for submission of full camera-ready papers is September 15th, 2024.

Please feel free to contact us if you have any questions.

Best,
The 18th NooJ Conference Organisation Board

THE 18TH NOOJ INTERNATIONAL CONFERENCE 2024
June 4th to 7th, 2024, Bergamo, Italy
Managed by The NooJ Association
Powered and hosted by X23 Srl

CfP: Workshop on Advanced analysis and recognition of parliamentary corpora (ARPC)
by George Mikros 04 Jan '24

Apologies for cross-posting

CALL FOR PAPERS

Workshop on Advanced analysis and recognition of parliamentary corpora (ARPC)

The ARPC organizing committee invites papers for the workshop, to be held in physical format during the ICDAR 2024 conference (August 30 - September 4, 2024) in Athens, Greece (https://icdar2024.net/). The exact date of the ARPC workshop will be communicated soon.

Workshop Context

Data-driven insights from archives have the potential to steer academic research in a variety of fields. This workshop attempts to address the growing importance of employing advanced recognition and analytical methods and tools to decode the complexities within legislative and administrative documents of parliamentary origin. The workshop will take a deep dive into cutting-edge OCR techniques for parliamentary corpora. Further attention will be placed on recognizing patterns, extracting meaningful insights and understanding the intricate dimensions of contemporary and historical parliamentary discourse.

The relevance of this topic lies in its potential to bridge previously isolated domains of research, fostering interdisciplinary collaboration. By connecting history, political science, and linguistics, participants will unlock a richer understanding of legislative evolution, political trends, and linguistic nuances embedded in parliamentary proceedings.

A keynote presentation will open the workshop, followed by a couple of sessions dedicated to specific topics related to the analysis and recognition of parliamentary corpora. Each session will be concluded by a structured panel discussion. The organization of the ARPC workshop is supported by the Hellenic OCR Team. We encourage authors to submit papers on the topics detailed below.

Topics

- The recognition of polytonic Greek fonts
- Recognition of mixed text (printed and handwritten)
- Parliamentary discourse analysis
- Historical trends in parliamentary language use
- Integration of linguistic and political science methodologies in OCR
- Cross-lingual OCR challenges in parliamentary texts
- Machine learning approaches for semantic analysis of parliamentary proceedings
- Ethical considerations in the digitization and analysis of parliamentary records
- Developing standardized formats for parliamentary data preservation
- The role of OCR technology in enhancing public access to parliamentary archives
- Comparative analysis of parliamentary rhetoric across different eras
- The impact of digital humanities tools on legislative studies
- Application of Natural Language Processing techniques in political discourse analysis
- Automated categorization and indexing of parliamentary documents
- Challenges and solutions in digitizing non-standard parliamentary texts

Paper Tracks

There is both a standard conference paper track and a journal track at ICDAR 2024; details regarding the journal track may be found in a separate Call for Papers on the conference website, https://icdar2024.net/.

ICDAR 2024 will follow a double-blind review process. Authors should not include their names and affiliations anywhere in the manuscript. Authors should also ensure that their identity is not revealed indirectly by citing their previous work in the third person, and should omit acknowledgements until the camera-ready version.

Important Dates

1 February 2024 - Paper submission deadline
15 April 2024 - Paper acceptance notification
30 April 2024 - Camera-ready paper
Date TBC - Workshop

Submission Guidelines & Enquiries

All proposals should be submitted electronically via the EasyChair online submission form: https://easychair.org/conferences/?conf=icdar2024. Enquiries should be sent to pc-chairs@icdar2024.net.

The submitted papers will respect the same policy and conditions as ICDAR 2023 conference papers. Papers should be formatted according to the instructions and style files provided by Springer. Papers accepted for the conference will be allocated up to 15 pages (usually not counting references) in the proceedings. Submissions are expected to be in the range of 10-15 pages. Each accepted paper requires at least one author to perform a full registration. The registration fee for workshop-only participants will be discounted.

Publisher

ICDAR 2024 proceedings will be published in the Springer Lecture Notes in Computer Science (LNCS) series. This provides the proceedings of the conference and the workshops with excellent online accessibility, including free access to SpringerLink via links on the conference website for one year after publication and free access for everyone in SpringerLink four years after publication.

Organizing Committee

Dr. Fotios Fitsilis (Scientific Service, Hellenic Parliament), Email: fitsilisf@parliament.gr
Prof. George Mikros (College of Humanities and Social Sciences, Hamad Bin Khalifa University), Email: gmikros@hbku.edu.qa

[2nd call] SIGTYP 2024 Shared Task on Word Embedding Evaluation for Ancient and Historical Languages
by Oksana Dereza 04 Jan '24

Dear colleagues,

[apologies for cross-posting]

We would like to remind you that this year SIGTYP is hosting a Shared Task on Word Embedding Evaluation for Ancient and Historical Languages: https://github.com/sigtyp/ST2024/

Test data has been released, and CodaLab competitions are up and running, so we encourage you to register if you still haven't! There is still a week before the deadline. :)

*Summary*

In recent years, sets of downstream tasks called benchmarks have become a very popular, if not default, method to evaluate general-purpose word and sentence embeddings. Starting with decaNLP (McCann et al., 2018) and SentEval (Conneau & Kiela, 2018), multitask benchmarks for NLU keep appearing and improving every year. However, even the largest multilingual benchmarks, such as XGLUE, XTREME, XTREME-R or XTREME-UP (Hu et al., 2020; Liang et al., 2020; Ruder et al., 2021, 2023), only include modern languages. When it comes to ancient and historical languages, scholars mostly adapt/translate intrinsic evaluation datasets from modern languages or create their own diagnostic tests. We argue that there is a need for a universal evaluation benchmark for embeddings learned from ancient and historical language data and view this shared task as a proving ground for it.

The shared task involves solving the problems listed under Subtasks for 12+ ancient and historical languages that belong to 4 language families and use 6 different scripts. Participants will be invited to describe their system in a paper for the SIGTYP workshop proceedings. The task organizers will write an overview paper that describes the task, summarizes the different approaches taken, and analyzes their results.

*Subtasks*

For subtask A, participants are not allowed to use any additional data; however, they can reduce and balance provided training datasets if they see fit. For subtask B, participants are allowed to use any additional data in any language, including pre-trained embeddings and LLMs.

A. Constrained
1. POS-tagging
2. Full morphological annotation
3. Lemmatisation

B. Unconstrained
1. POS-tagging
2. Detailed morphological annotation
3. Lemmatisation
4. Filling the gaps
   - Word-level
   - Character-level

*Important links*

- Registration form: <https://docs.google.com/forms/d/e/1FAIpQLSdINgMfzzZGIZ-uBVQhvyndB6yeaaj-wT7…>
- Detailed description, incl. submission format: https://github.com/sigtyp/ST2024
- Constrained subtask on CodaLab: https://codalab.lisn.upsaclay.fr/competitions/16822
- Unconstrained subtask on CodaLab: https://codalab.lisn.upsaclay.fr/competitions/16818

*Important dates*

- 05 Nov 2023: Release of training and validation data
- 02 Jan 2024: Release of test data
- 09 Jan 2024: Submission of results for Phase 1 of the Constrained Subtask
- 12 Jan 2024: Submission of results for Phase 2 of the Constrained Subtask and for the Unconstrained Subtask
- 13 Jan 2024: Notification of results
- 20 Jan 2024: Submission of shared task papers
- 27 Jan 2024: Notification of acceptance to authors
- 03 Feb 2024: Camera-ready
- 15 Mar 2024: Video recordings due
- 21/22 Mar 2024: SIGTYP workshop

Kind regards,
Oksana and the organisers' team

--
Oksana Dereza | PhD student on the Cardamom <http://cardamom.insight-centre.org/> project | Unit for Linguistic Data | Insight Centre for Data Analytics | Data Science Institute | University of Galway

Postdoc positions at the Alan Turing Institute (Deadline: 07/01/2024)
by Pranava Madhyastha 03 Jan '24

Dear all,

We are hiring for the following two postdoctoral positions at the Alan Turing Institute, both focussed on probabilistic program scaffolds for large language models. This is a collaborative project led by Dr. Pranava Madhyastha from City, University of London, along with Prof. Alessandra Russo from Imperial College London and Prof. Anthony Cohn from the University of Leeds.

Opportunity 1: LLM Inference Expert

The first position requires experience with controlling inference in LLMs and transformer-based sequence-to-sequence models. More details and the application link can be found here: https://cezanneondemand.intervieweb.it/turing/jobs/senior-research-associat…

Opportunity 2: Probabilistic Programming Specialist

The second position requires a solid background in probabilistic programming, logic programming or symbolic models for artificial intelligence. More details and the application link can be found here: https://cezanneondemand.intervieweb.it/turing/jobs/research-associate-proba…

As a postdoctoral researcher at the Alan Turing Institute, you will be part of a vibrant and collaborative research environment, surrounded by renowned experts and cutting-edge technologies. This position provides an excellent platform to advance your career and make lasting contributions to the field of artificial intelligence.

For any questions, get in touch with me at pranava.madhyastha@city.ac.uk.

Kind regards,
Pranava

CAiSE'24 Forum: Second Call for Papers and Tool Demonstrations
by Announce 03 Jan '24

*** CAiSE'24 Forum: Second Call for Papers and Tool Demonstrations ***

36th International Conference on Advanced Information Systems Engineering (CAiSE'24)
June 3-7, 2024, 5* St. Raphael Resort and Marina, Limassol, Cyprus
https://cyprusconferences.org/caise2024/

(*** Submission Deadline: 4th March, 2024 AoE ***)

The CAiSE Forum is a space within the CAiSE conference to present and discuss exciting new ideas and tools related to Information Systems Engineering. The Forum intends to serve as an interactive platform, encourage potential authors to present emerging topics and controversial positions, and demonstrate innovative systems, tools, and applications. The Forum sessions at the CAiSE conference will facilitate interaction, discussion, and exchange of ideas among presenters and participants.

Contributions to the CAiSE'24 Forum are welcome to address any of the CAiSE'24 conference topics and, particularly, this year's theme: Information Systems in the Age of Artificial Intelligence.

We invite two types of submissions:

• Visionary papers present innovative research projects, which are still at a relatively early stage and do not necessarily include a full-scale validation. Visionary papers will be presented as posters in the Forum.
• Demo papers describe innovative tools and prototypes that implement the results of research efforts. The tools and prototypes will be presented as demos in the Forum, accompanied by a poster.

Both visionary papers and demo papers must not exceed 8 pages in LNCS format. See the authors' guidelines at the Springer site: https://www.springer.com/gp/computer-science/lncs/conference-proceedings-gu…

Papers should be submitted in PDF format through the conference management system available at EasyChair (https://easychair.org/my/conference?conf=caise2024), selecting the Forum option. The submitted papers must be unpublished and must not be under review elsewhere.

PUBLICATION AND PRESENTATIONS

Accepted papers will be published by Springer in a CAiSE Forum proceedings volume within the Lecture Notes in Business Information Processing (LNBIP) series (https://www.springer.com/series/7911). Authors should consult Springer's authors' guidelines and use their LaTeX or Word proceedings templates for the preparation of their papers. Springer encourages authors to include their ORCIDs in their papers. In addition, the corresponding author of each paper, acting on behalf of all of the authors of that paper, must complete and sign a Consent-to-Publish form. The corresponding author signing the copyright form should match the corresponding author marked on the paper. Once the files have been sent to Springer, changes relating to the authorship of the papers cannot be made.

It is expected that at least one of the authors attends CAiSE'24, presents the poster/delivers the demo, and interacts with the Forum participants. We also envision a short oral presentation for all papers to attract participants to the posters.

IMPORTANT DATES

• Paper Submission Deadline: 4th March, 2024 (AoE)
• Notification of Acceptance: 1st April, 2024
• Camera-ready Deadline: 8th April, 2024
• Author Registration Deadline: 8th April, 2024

FORUM CHAIRS

• Shareeful Islam, Anglia Ruskin University, United Kingdom
• Arnon Sturm, Ben-Gurion University of the Negev, Israel

FORUM COMMITTEE

• Steven Alter, University of San Francisco
• Abel Armas Cervantes, The University of Melbourne
• Giuseppe Berio, Université de Bretagne Sud and IRISA UMR 6074
• Drazen Brdjanin, University of Banja Luka
• Corentin Burnay, University of Namur
• Cinzia Cappiello, Politecnico di Milano
• Suphamit Chittayasothorn, King Mongkut's Institute of Technology Ladkrabang
• Maya Daneva, University of Twente
• Sergio de Cesare, University of Westminster
• Johannes De Smedt, KU Leuven
• Marne de Vries, University of Pretoria
• Michael Fellmann, University of Rostock
• Christophe Feltus, Luxembourg Institute of Science and Technology
• Hans-Georg Fill, University of Fribourg
• Janis Grabis, Riga Technical University
• Sergio Guerreiro, INESC-ID / Instituto Superior Técnico
• Martin Henkel, Stockholm University
• Jennifer Horkoff, Chalmers University of Technology
• Shareeful Islam, Anglia Ruskin University
• Janis Kampars, RTU
• Evangelia Kavakli, University of the Aegean
• Marite Kirikova, Riga Technical University
• Janne J. Korhonen, Aalto University
• Elena Kornyshova, CNAM
• Agnes Koschmider, University of Bayreuth
• Chung Lawrence, University of Texas at Dallas
• Henrik Leopold, Kühne Logistics University
• Tong Li, Beijing University of Technology
• Beatriz Marín, Universidad Politecnica de Valencia
• Andrea Marrella, Sapienza University of Rome
• Raimundas Matulevicius, University of Tartu
• Jose Ignacio Panach Navarrete, Universitat de València
• Oscar Pastor, Universidad Politécnica de Valencia
• Francisca Pérez, Universidad San Jorge
• Pierluigi Plebani, Politecnico di Milano
• Manuel Resinas, University of Seville
• Genaina Rodrigues, University of Brasilia
• Ben Roelens, Open Universiteit, Ghent University
• Mattia Salnitri, Politecnico di Milano
• Stefan Strecker, University of Hagen
• Arnon Sturm, Ben-Gurion University of the Negev
• Irene Vanderfeesten, Katholieke Universiteit Leuven
• Yves Wautelet, Katholieke Universiteit Leuven
• Hans Weigand, Tilburg University
• Manuel Wimmer, Johannes Kepler University Linz
• Anna Zamansky, University of Haifa

1st Workshop on NLP for Indigenous Languages of Lusophone Countries (ILLC-NLP 2024) -- 2nd CFP
by Aline Paes 02 Jan '24

Apologies for cross-posting.

*1st Workshop on NLP for Indigenous Languages of Lusophone Countries (ILLC-NLP 2024) -- 2nd CFP*

January 10, 2024: Paper submission due
January 25, 2024: Notification of acceptance
March 12, 2024: Workshop

Workshop website: https://sites.google.com/view/illc-nlp-2024/home
Co-located with PROPOR 2024 <https://propor2024.citius.gal/> in Santiago de Compostela

*Overview and goals:*

The workshop aims to explore, discuss, and enhance the development of resources, methods, and applications of NLP for indigenous languages, especially those spoken in, or that have influenced languages spoken in, countries where Portuguese is currently the official language. We hope to contribute to the preservation and promotion of these languages. This is one of several initiatives aiming at expanding knowledge and research in NLP for underrepresented languages.

We encourage the participation of everyone who shares an interest in preserving and enriching the linguistic and cultural heritage of indigenous languages in a broad sense. Accordingly, we welcome the submission of works covering languages from all Portuguese-speaking nations, such as those of African origin in Angola, Mozambique, and the Atlantic islands, as well as minority languages in Portugal.

*Submissions:*

ILLC-NLP seeks submissions under the following categories:

- Full papers: 8 pages + unlimited references
- Short papers (work in progress, innovative ideas/proposals, research ideas): 4 pages + unlimited references

Submissions should be written in English. At submission time, papers must be in PDF format only. For the final versions, authors of accepted papers will be given one extra content page to address the reviews. Authors of accepted papers will be requested to send the source files to produce the proceedings.

All submitted papers must conform to the official ACL style guidelines (LaTeX <https://github.com/acl-org/acl-style-files/tree/master/latex> or Word <https://github.com/acl-org/acl-style-files/tree/master/word>). Both long and short papers will be published in the ACL Anthology.

Submission site: https://easychair.org/conferences/?conf=illcnlp2024

Reviewing format: At least two reviewers will evaluate each submission. The reviewing format will be single-blind.

Please help us spread the word about this event by sharing this call with your contacts and institutions. Your participation and support are crucial for the success of this workshop.

Sincerely,
Aline Paes, Aline Villavicencio, Claudio Pinhanez, Edward Gow-Smith, Paulo Rodrigo Cavalin (Workshop organisers)

--
Profa. Dra. Aline Paes (she/her)
Associate professor - Computer Science (Artificial Intelligence)
Institute of Computing / Universidade Federal Fluminense (IC/UFF)
Member of CE-PLN <https://sites.google.com/view/ce-pln/inicio> and BPLN <https://brasileiraspln.com/>
CNPq PQ-E and FAPERJ JCNE
url: www.ic.uff.br/~alinepaes
Av Gal Milton Tavares de Souza, S/N, Computing Building, Office 504, São Domingos, Niterói, RJ, Brazil. ZIP 24210-346

CFP: FinNLP-KDF workshop@COLING 2024
by Zhiqiang Ma 02 Jan '24

*FinNLP-KDF-2024: Joint Workshop of the 7th Financial Technology and Natural Language Processing (FinNLP) and the 5th Knowledge Discovery from Unstructured Data in Financial Services (KDF)*

LREC-COLING-2024
Torino, Italy, May 20, 2024

*Conference website:* https://sites.google.com/nlg.csie.ntu.edu.tw/finnlp-kdf-2024/home
*Submission deadline:* March 1, 2024

*Introduction*

FinNLP has emerged since 2019 as one of the pivotal workshops dedicated to harnessing NLP for financial technology applications. By co-locating with representative conferences like IJCAI, EMNLP, and IJCNLP-AACL, it has bridged the AI and NLP communities. Its proceedings can be accessed on the ACL Anthology. KDF, on the other hand, initiated in 2020 as a workshop at AAAI, concentrates on multimodal knowledge discovery for financial services. It is particularly renowned for its keynote speaker series, inviting both academic and industry researchers to shed light on the latest topics. Recognizing the consistent efforts and contributions of both workshops over the years, we believe it is an opportune moment to convene their audiences, reflecting on the achievements of the past five years and envisaging the roadmap for the next half-decade.

The adoption of artificial intelligence (AI) and machine learning (ML) in financial technology has been extensive. A significant observation is the diminishing of barriers between diverse data modalities and also between the distinct model architectures of different tasks. This progress has emerged particularly since the inception of the Transformer model and the potential of large language models (LLMs) as foundation models or generic resolvers of a variety of NLP tasks. For instance, financial question-answering diverged from conventional machine reading comprehension tasks in NLP, such as SQuAD. This divergence was mainly attributed to the integration of knowledge from tabular data and image data. Given these advancements, we plan to expand the purview of both workshops. We are confident that such a merger will generate unprecedented synergy.

*List of Topics*

We invite submissions of original contributions on methods, theories, applications, and systems on artificial intelligence, machine learning, natural language processing & understanding, big data, statistical learning, data analytics, and deep learning, with a focus on knowledge discovery in the financial services domain. The scope of the workshop includes, but is not limited to, the following areas:

- Representation learning, and distributed representation learning and encoding in natural language processing for financial documents
- Language modeling on financial corpora including tabular and numerical data, and multi-modal modeling; large language models (LLMs) and applications for finance
- Graph representation learning, mining learning on graph structures from financial data
- Multi-source knowledge integration and fusion, and knowledge alignment and integration from heterogeneous data
- Synthetic or genuine financial datasets and benchmarks for baseline models
- Transfer learning applications for financial data, knowledge distillation as a method for compression of pre-trained models or adaptation to financial datasets
- Search and question answering systems designed for financial corpora
- Event discovery from alternative data and impact on organization equity price
- Environmental, social, governance (ESG) event discovery, evaluation, and impact assessment

*Submission Guidelines*

- Submission Deadline: March 1st, 2024 (AoE)
- Submission System: https://softconf.com/lrec-coling2024/finnlp-kdf2024/
- Long Paper: May consist of up to 8 pages of content, plus unlimited pages for references and appendix.
- Short Paper and Demo Paper: May consist of up to 4 pages of content, plus unlimited references and appendix.

The proceedings of accepted papers will be published in the ACL Anthology <https://www.aclweb.org/anthology/>.

*Organizing Committees*

- Chung-Chi Chen, Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, Japan
- Xiaomo Liu, JP Morgan AI Research, US
- Armineh Nourbakhsh, JP Morgan AI Research, US
- Zhiqiang Ma, JP Morgan AI Research, US
- Charese Smiley, JP Morgan AI Research, US
- Manling Li, University of Illinois Urbana-Champaign, US
- Mohammad Ghassemi, Michigan State University, US
- Hen-Hsen Huang, Institute of Information Science, Academia Sinica, Taiwan
- Hiroya Takamura, Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, Japan
- Hsin-Hsi Chen, Department of Computer Science and Information Engineering, National Taiwan University, Taiwan

*Venue*

In conjunction with LREC-COLING-2024 <https://lrec-coling-2024.org/>, May 20-25, 2024, Torino, Italy

*Contact*

For general inquiries about the workshop, please write to finnlp@nlg.csie.ntu.edu.tw.

--
Zhiqiang Ma
ZQMASTER@GMAIL.COM

2nd CFP - Third Workshop on Digital Humanities and Natural Language Processing
by Leonardo Zilio 02 Jan '24

* We apologize if you receive multiple copies of this CFP *

For the online version of this Call, visit: https://easychair.org/cfp/3rdDHandNLP

3rd DHandNLP
Third Workshop on Digital Humanities and Natural Language Processing
Co-located with PROPOR 2024
12-15 March 2024, Universidade de Santiago de Compostela, Galicia, Spain

*Website:* https://sites.google.com/view/dhandnlp-propor
*Submission deadline:* 20 January 2024 (23:59 GMT)
*Submission link:* https://easychair.org/conferences/?conf=3rddhandnlp

*3rd DHandNLP is a one-day workshop on 12 March 2024*

*Workshop description*

Digital humanities (DH) stand at the intersection of computing and the humanities, involving collaborative transdisciplinary research. While current DH practice already shows an impressive array of new digital tools and methods for the study of the humanities, we believe that natural language processing techniques and experience can significantly enhance the field, while DH can also bring new testbeds and problems for the NLP community.

As shown in the previous workshops, there is a growing set of researchers in the processing of Portuguese who are interested in this active collaboration, and we believe that we should provide a forum that brings together the two communities, DH and NLP, showcasing the different aspects enabled by this cross-fertilization.

The 3rd DHandNLP welcomes papers stemming from humanities that deal with language, such as philosophy, history, geography, law, philology, linguistics, or literature, and that can benefit from a digital approach or be enhanced with computational linguistics methods or techniques, be it by using large sets of (written or spoken) textual data or by developing applications for an increasingly digital world. We also welcome papers that use "traditional" DH tools or techniques, such as topic modeling, and papers that use standard NLP tools that have already been applied in different DH contexts, such as named entity recognition, document clustering and classification, sentiment analysis, dialect/language identification and linked data.

*Main workshop topics*

- Digital philology, critical editions production and textual criticism
- Lexicometrics, lexicology and lexicography
- Visualization or sonification of large textual bodies in specific domains
- Computational stylometry, authorship attribution and profiling
- Distant reading of literature
- Construction of historical thesauri

Finally, we are especially interested in approaches that deal with historical material, involving not only historical linguistics but also historical lexicology, corpus processing and their multilingual analysis.

*SUBMISSION GUIDELINES*

All papers must be anonymous, original and not simultaneously submitted to another journal or conference. They must strictly adhere to the submission templates of the main conference. We welcome submissions of:

- Short papers, consisting of up to 4 pages of content, plus unlimited pages of references
- Full papers, consisting of up to 8 pages of content, plus unlimited pages of references

Kind regards,
Maria José B. Finatto and Leonardo Zilio (on behalf of the organising committee)
