October 2023 - Corpora

[CFP] for SIGIR Forum - December 2023 Edition (Deadline - Nov 13)
by Tirthankar Ghosal 26 Oct '23

Dear Colleague,

We invite you to submit your contribution to the upcoming December 2023 Edition of the SIGIR Forum, the official newsletter of the ACM Special Interest Group on Information Retrieval (SIGIR). The SIGIR Forum consists of two issues (June, December). It serves as a medium for disseminating general information and opinions on matters of interest to the IR community, conference and workshop reports, papers and book reviews, and Ph.D. dissertation abstracts.

*** Call for Contributions for the December 2023 issue ***

We invite contributions to the following categories, including:

- Reports of IR-related conferences and workshops: Reports from the chairpersons of IR-related workshops (such as the satellite workshops of SIGIR, JCDL, or CIKM, or other workshops such as NTCIR, INEX) or IR-related conferences other than SIGIR (such as ECIR, HLT, CHIIR, SPIRE, or TREC);
- Papers from IR-related invited talks which are not published in full in the relevant conference proceedings;
- Papers describing new public infrastructures for IR research, such as in-depth descriptions of newly available test collections, newly available open-source or public domain IR software of particular relevance, new evaluation campaigns, etc.;
- Papers about funding initiatives, industry trends, connections between research and industry, legal issues that are of potential interest to the IR community at large;
- Any paper that, while of general interest to the IR community, is non-technical, and because of this would be unsuitable for publication in technical publishing forums such as the SIGIR Annual Conference;
- Book reviews, bibliographies of general interest to the IR community;
- Abstracts of recently published Ph.D. theses of interest to the general IR community.

Note: Unless specifically stated, contents of the SIGIR Forum do not represent the official position of SIGIR or ACM. Contributions to the Forum are unrefereed papers unless otherwise indicated.
The editorial board may desk-reject papers if they are out of scope. Since June 2020, the SIGIR Forum newsletter has been published online only.

*** Important dates for the December 2023 Edition ***

- *13 November 2023: Deadline for contributions*
- December 2023: Online publication

*** Submission Instructions ***

Kindly see http://sigir.org/forum/ for details on previous issues, template, and submission instructions and checklist. For inquiries about contributions, please contact the editors at editors_SIGIR(a)acm.org.

Tirthankar Ghosal (Oak Ridge National Laboratory, US)
Josiane Mothe (IRIT, Univ. de Toulouse)
Julián Urbano (Delft University of Technology)

Edge Hill Corpus Research Group, Thursday 9 November 2023
by Costas Gabrielatos 26 Oct '23

The next meeting of the Edge Hill Corpus Research Group will take place online (via MS Teams) on Thursday 9 November 2023, 2-3 pm (UK time).

Topics: Discourse-Oriented Corpus Studies, Immigration
Speaker: Katia Adimora (Edge Hill University, UK)
Title: Towards more positive portrayals of Mexican immigration/immigrants in the American and Mexican press

Abstract: Various studies (e.g., Galindo Gómez, 2019; Taylor, 2009; Gabrielatos and Baker, 2008) have explored press attitudes towards immigration/immigrants in different countries. To analyse the attitudes towards Mexican immigration/immigrants in the American and Mexican press, two specialised corpora of 30 million words were created. The American corpus includes more than 12,000 articles from six American newspapers: The New York Times, The Washington Post, USA Today, Los Angeles Times, The Arizona Republic and Chicago Tribune. The corpus articles were published between 16 June 2015, which marked the start of Trump's presidential campaign, and 20 January 2021, the date of Biden's presidential inauguration. The Mexican corpus includes more than 20,000 articles from six Mexican newspapers, published during Trump's era: El Universal, Elimparcial.com, Reforma, El Norte, Lacronica.com and Mural. Even though negative discourse prosodies seem to dominate newspaper discourses, this study argues that the attitudes towards Mexican immigration/immigrants in American and, especially, in Mexican newspapers are not as negative as expected. The results show that two-thirds (66%) of the instances in the American corpus newspapers and more than three quarters (78%) of the instances in the Mexican corpus newspapers express a positive perspective. However, among the most frequent negative attitudes in the American and Mexican corpus newspapers is the description of immigrants as criminals (20% and 18%, respectively).
The diachronic frequency analysis of the attitudes towards 'immigration' and 'immigrant(s)' shows correlations between socio-political events and press discourses, which might contribute to public opinion about Mexican immigration/immigrants. For instance, Trump's family separation policy might have ignited empathy towards immigrants in the corpus newspapers.

You can register here: https://store.edgehill.ac.uk/conferences-and-events/conferences/events/edge…

The EHU CRG programme for 2023-24 is here: https://sites.edgehill.ac.uk/crg/next

Webinar by Emily M. Bender (University of Washington)
by HiTZ zentroa 26 Oct '23

**** We apologize for the multiple copies of this email. In case you are already registered to the next webinar, you do not need to register again. ****

Dear colleague,

We are happy to announce the next webinar in the Language Technology webinar series organized by the HiTZ research center (Basque Center for Language Technology, http://hitz.eus). You can check the videos of previous webinars and the schedule for upcoming webinars here: http://www.hitz.eus/webinars

Next webinar:

* *Speaker*: Emily M. Bender (University of Washington)
* *Title*: Meaning making with artificial interlocutors and risks of language technology
* *Date*: Nov 2, 2023, 16:00 CET
* *Summary*: Humans make sense of language in context, bringing to bear their own understanding of the world including their model of their interlocutor's understanding of the world. In this talk, I will explore various potential risks that arise when we as humans bring this sense-making capacity to interactions with artificial interlocutors. That is, I will ask what happens in conversations where one party has no (or extremely limited) access to meaning and all of the interpretative work rests with the other, and briefly explore what this entails for the design of language technology.
* *Bio*: Emily M. Bender is a Professor of Linguistics and an Adjunct Professor in the School of Computer Science and the Information School at the University of Washington, where she has been on the faculty since 2003. Her research interests include multilingual grammar engineering, computational semantics, and the societal impacts of language technology. In 2022 she was elected as a Fellow of the American Association for the Advancement of Science (AAAS).
*Upcoming webinars:*

* Heng Ji (February 1, 2024)
* Smaranda Muresan (March 7, 2024)
* Ralf Schlüter (May 2, 2024)
* Marco Baroni (June 6, 2024)

Check past and upcoming webinars at the following URL: http://www.hitz.eus/webinars

If you are interested in participating, please complete this registration form: http://www.hitz.eus/webinar_izenematea

If you cannot attend this seminar, but you want to be informed of the following HiTZ webinars, please complete this registration form instead: http://www.hitz.eus/webinar_info

Best wishes,
HiTZ Zentroa

SPECIAL INTEREST GROUP: NLP & LLM SECURITY
by Leon Derczynski 26 Oct '23

https://sig.llmsecurity.net/

We're proud to announce a new research special interest group, SIGSEC, to cover work on LLM and NLP security. SIGSEC is part of the Association for Computational Linguistics (www.aclweb.org). We host regular talks on NLP & LLM Security, a mailing list for people interested in NLP & LLM security, and an annual research workshop.

The ACL Special Interest Group on NLP Security exists to:

* provide infrastructure and community for those many ACL members working in NLP Security;
* establish a serious research body that represents NLP and ACL interests in the burgeoning field of LLM and NLP security; and
* bridge the Information Security and Computational Linguistics communities, which is a link already actively being pursued by the Information Security community.

Membership is free, and there's an exciting talks series. The video links are posted on https://sig.llmsecurity.net/talks/. We start with:

* Thursday November 2nd, 10.00 ET / 15.00 CET - Text Embeddings Reveal (Almost) As Much As Text - John X. Morris
* Thursday November 9th, 11.00 ET / 17.00 CET - LLM-Deliberation: Evaluating LLMs with Interactive Multi-Agent Negotiation Games - Sahar Abdelnabi
* Thursday November 23rd, 11.00 ET / 17.00 CET - Privacy Side Channels in Machine Learning Systems - Edoardo Debenedetti

All talks present cutting-edge research on LLM security vulnerabilities and assessment methods.

Join us here! https://sig.llmsecurity.net/join/

We look forward to welcoming you.

SIGSEC President: Leon Derczynski, ITU Copenhagen / NVIDIA Corp
SIGSEC Secretary: Muhao Chen, University of Southern California
SIGSEC Expert Advisor: Jekaterina Novikova, AI Risk and Vulnerability Alliance / Cambridge Cognition

DEADLINE EXTENSION RATIO-24: 1st Intern. Conference on Robust Argumentation Machines, June 5th-7th, 2024, Bielefeld, Germany
by Anette Frank 26 Oct '23

*Final CFP (EXTENDED DEADLINE) for RATIO-24*

The 1st International Conference on Robust Argumentation Machines (RATIO-24) will take place from June 5th-7th, 2024, in Bielefeld, Germany.

https://ratio-conference.net

In recent years, we have witnessed significant advances in our ability to develop approaches that support the automated analysis, summarization, aggregation, retrieval and ranking of arguments exchanged "in the wild" at large scale. By "in the wild" we mean arguments exchanged on the web in debate portals or other online formats where users share opinions and viewpoints on topics relevant to them. Argument analysis methods have indeed reached a level of maturity and robustness that makes them applicable to the analysis of real online debates, to find the main arguments exchanged, to summarize and group arguments, or even to automatically generate arguments to present different viewpoints and perspectives.

We call for submissions of original research work on the following topics:

* automatic semantic analysis of arguments, including tasks such as stance detection, keypoint identification, attack/support classification, etc.
* analysis of arguments in discourse and dialogue
* automatic synthesis and generation of arguments
* summarization of arguments
* argument retrieval
* methods for predicting argument quality
* ranking of arguments according to, e.g., quality
* methods for rephrasing and repurposing arguments
* inferring the frame, viewpoint or perspective of an argument
* common sense knowledge in the automated analysis of arguments
* scalable reasoning methods for arguments
* applications of argument analysis in domains such as political discourse, law, science, education, finance, social sciences, etc.

Papers will be peer-reviewed and published by Springer in the LNCS series. Two types of papers will be accepted:

* Long Papers (up to 15 pages including references): Description of substantive and original research.
* Short Papers (up to 8 pages including references): Description of work in progress or original research contribution of limited scope.

Papers should be submitted via Easychair: https://easychair.org/conferences/?conf=ratio24

Important dates (NOTE THE EXTENDED DEADLINES)

Abstract submission deadline: *November 10th, 2023*
Full paper submission deadline: *November 24th, 2023*
Notification of Acceptance: *February 2nd, 2024*
Camera-ready version: *March 1st, 2024*

Conference Chairs:

Philipp Cimiano (CITEC, Bielefeld University)
Anette Frank (University of Heidelberg)
Michael Kohlhase (University of Erlangen-Nürnberg)
Benno Stein (Bauhaus University Weimar)
Jürgen Ziegler (University of Duisburg-Essen)

Invited Speakers:

Elena Cabrio (Université Côte d'Azur, Inria)
Yufang Hou (IBM Research Europe)
Henning Wachsmuth (Institute for Artificial Intelligence, Leibniz University of Hannover)

Venue: The conference will be held in Bielefeld, Germany at the Cognitive Interaction Technology Center (CITEC).

All questions about submissions should be emailed to Philipp Cimiano: cimiano(a)cit-ec.uni.bielefeld.de

--
Prof. Dr. Anette Frank
http://www.cl.uni-heidelberg.de/~frank
Computational Linguistics Department
University of Heidelberg

1st Call for abstracts - COST Action CA21167 UniDive
by johanna monti 26 Oct '23

Call for abstracts
COST Action CA21167 UniDive 2nd general meeting
University of Naples L'Orientale, Italy
7-9 February 2024

UniDive (https://www.cost.eu/actions/CA21167/) is a COST action, i.e. a scientific network, dedicated to universality, diversity and idiosyncrasy in language technology. It is structured around 4 Working Groups:

- WG1: Corpus annotation
- WG2: Lexicon-corpus interface
- WG3: Multilingual and cross-lingual language technology
- WG4: Quantifying and promoting diversity

The second general meeting of the action will take place on February 7-9, 2024 at the University of Naples L'Orientale in Italy. We invite UniDive WG members <https://www.cost.eu/actions/CA21167/#tabs+Name:Working%20Groups%20and%20Mem…> to submit abstract proposals related to the scientific program of the WGs. Proposals may describe diverse types of contributions, according to 3 different tracks:

- Planned work
- Work in progress
- Complete work, also previously published

A proposal should be anonymous, written in English and submitted in PDF only. It should include (on the title page) the list of the relevant WGs. It should not exceed 2 pages, including figures and tables (bibliographic references may go beyond the 2-page limit). If linguistic examples from languages other than English are included, those should be glossed and translated into English, and an extra half page is allowed for this purpose.

For the sake of uniformity and to ease the reviewers' effort, we encourage authors to use the following Overleaf LaTeX template: https://www.overleaf.com/read/yqbpxcbjmjjw

Other formats (not necessarily LaTeX-based) can also be used, provided that they conform to the following specifications: A4 paper, 11pt font, 1in margins.

The submission link will be announced soon.

The reviewing process is double-blind. The selection of proposals will be done by the UniDive Program Committee according to the following criteria:

- relevance to UniDive and the work program of its Working Groups (see pp. 18-20 of the Memorandum of Understanding <https://e-services.cost.eu/files/domain_files/CA/Action_CA21167/mou/CA21167…>),
- clarity,
- diversity of the languages covered by the workshop program.

The selected proposals will be presented at the 2nd UniDive general meeting as posters and/or oral presentations. At least one author per selected proposal will be reimbursed for their travel and stay.

Important dates

- 26 October 2023, Call for abstracts
- 24 November 2023, Submission deadline
- 15 December 2023, Notification of acceptance
- 20 December 2023, Communication of the names of the presenters
- 12 January 2024, Final versions of abstracts
- 7-9 February 2024, UniDive 2nd general meeting

The time zone for all deadlines is Anywhere on Earth (UTC-12). Due to the tight schedule, no extension of the submission deadline is foreseen.

Johanna Monti
Chief Scientist of the UNIOR NLP Research Group
Department of Literary, Linguistic and Comparative Studies
University of Naples "L'Orientale"

[CfP] The Sixth International and Interdisciplinary Conference on the Quantitative and Computational Analysis of Textual Data (COMPTEXT) will be held in Amsterdam, the Netherlands, on 2-4 May 2024.
by Johannes B. Gruber 26 Oct '23

Dear colleagues,

COMPTEXT is an international community of quantitative text analysis and computational social science scholars in political science, international relations and beyond. COMPTEXT 2024 in Amsterdam follows in the footsteps of previous conferences in Budapest (2018), Tokyo (2019), Innsbruck (online, 2020), Dublin (2022), and Glasgow (2023).

COMPTEXT conferences offer ample opportunities to network with computational scholars, to exchange technological knowledge of computational methods, and to obtain useful feedback on ongoing research.

For COMPTEXT 2024 in Amsterdam we are seeking paper submissions that:

- rely on image, video, text or other digital trace data to study social and political phenomena broadly construed
- propose or evaluate new computational methods or tools
- seek to make contributions at the intersection of social science and computer science

We accept both substantive and methodological papers for presentation: substantive papers may be on any studies in social sciences or humanities that utilize computational methods; methodological papers may describe new computational methods, tools and approaches. Note that conference proceedings will not be published, as the conference format follows social science practices.

In keeping with our tradition, ahead of the conference a series of methods training tutorials will be held for registered participants. Courses will be offered for both beginner and advanced level participants.

*Submission of Paper Abstracts:*

Abstracts of max. 250 words and three substantive and/or methods-related keywords should be submitted by *Wednesday 20 December 2023*. Notifications of acceptance will be sent by *16 February, 2024*. The registration deadline is *15 March, 2024*.

Please submit your paper at https://forms.gle/VrzhEzJEcTNdM3RN9

Please be advised that a conference fee will be charged for participants with accepted papers.

The COMPTEXT 2024 Organising Committee consists of:

- Mariken A.C.G. van der Velden (Vrije Universiteit Amsterdam)
- Roan Buma (Vrije Universiteit Amsterdam)
- Alona O. Dolinsky (Vrije Universiteit Amsterdam)
- Johannes Gruber (Universiteit van Amsterdam)
- Kasper Welbers (Vrije Universiteit Amsterdam)
- Miklós Sebők (Centre for Social Sciences, Budapest)

*Equality, Diversity, and Inclusion:*

COMPTEXT is committed to creating an inclusive conference where diversity is celebrated, and everyone is afforded equality of opportunity. We welcome applications from everyone, including those who identify with any of the protected characteristics that are set out in VU's Equality, Diversity and Inclusion policy (https://vu.nl/en/about-vu/more-about/diversity). We especially encourage scholars from traditionally underrepresented groups, female scholars, and early-career researchers to apply.

For more information, please visit our website: http://www.comptextconference.org/

Questions related to COMPTEXT Amsterdam 2024 should be directed to comptext2024(a)gmail.com.

Best regards,
The Organizers

Re: [External] Re: NIF: NLP Interchange Format
by Bilgin, Orhan (Postgraduate Researcher) 26 Oct '23

Hi Ada,

Thank you for your reply.

I don't think it is possible to follow your advice to wean ourselves off the concept of a lemma and at the same time think of "a verb that can be conjugated", because that is precisely an example of what I would call a lemma.

I never claimed that anything exists beyond the reality of my mind. I only asked why I am not allowed to talk about things that can be conjugated / inflected etc. and to use the word "lemma" to refer to those things. You haven't answered that question.

Best,
Orhan

On 18 Oct 2023 17:49, Ada Wan <adawan919(a)gmail.com> wrote:

[To those who do not have shared interests on issues that pertain to Corpora-List matters, such as data/corpora and their handling which includes but is not limited to linguistic/NLP theories/methods (and the validity thereof): please disregard.]

Dear Orhan

Thanks for your interest in this discussion. I think it is high time that our community comes to a critical (re-)examination of (linguistic) morphology (and to address issues concerning reinterpretation and transition).

First of all, allow me to put my traditional grammarian hat on to get to your question more directly. You brought up an example of a morphological paradigm. Now, as linguists or language professionals, we know that language is (re-)productive in nature. So, if you don't mind, we can do a thought experiment and go through this dialectically (pls note that I only check my emails about once a day on weekdays, however).

1. Let's think of a verb that does not yet exist (in any particular language(s) that you can think of or that you are used to). Would you mind conjugating it for me? How many patterns would you have? And what would the forms be like?

2. Where did you get the patterns/paradigm from? If you were able to come up with a "full paradigm" (whatever that should refer to (?) --- but let's suppose, you have 6 forms (as per some textbook paradigms from some "Indo-European languages" --- 1st/2nd/3rd person in sg/pl), you surely haven't seen any of these forms combined with the verb before, have you? So where is your evidence that these forms exist in reality beyond that of your mind? And if such "perfect/ideal paradigm" exists only in your mind (and minds of some of your friends as well), how do you justify that morphological paradigma (the form/"structure"/pattern) are a necessary or intrinsic part of language (may these be of any particular language (which "one"?) or language in general)? Wouldn't morphology as well as the perpetual construction and reconstruction of morphological patterns be a self-fulfilling prophecy only? And how often do we impose our conceptual/perceptual habits/categories upon whatever "new" that we encounter?

3. If, however, you were not able to construct a "full paradigm" or any part thereof at all, or you claim you were not able to think of a hypothetical verb either, because to you morphology is solely based on what has been written and analyzed beforehand/historically, then what is there to claim about morphological analyses? Not only does such practice not generalize, but it would also just apply to calcified segments analyzed/interpreted in a certain way as part of philological pursuits in the past. One should bear in mind that philological methods can progress and update as well.

There are no limits as to how one can *use* (or some might even claim *define* here) "language", including how various modalities can combine/fuse with each other. Meaning has no fixed boundaries. When it comes to language or meaning, there is no "completeness" to "speak of" or to serve as basis of any science/study. And there are no fixed demarcations between any "particular languages" either.

Other perspectives on (the shortcomings of) morphology and "words" can be found on my rebuttal page here: https://openreview.net/forum?id=-llS6TiOew. Please also read the references cited therein.

I look forward to your reply, comments/remarks, or questions. (Actually, the floor can also be opened to anyone who would like to join.)

Thank you and best
Ada

Shared hosting of LLMs for research?
by Amanda Stent 24 Oct '23

So, everyone wants to host their own (copy of a) large language model (LLM), but many academic institutions can't spin up multiple LLMs simultaneously, in perpetuity, nor do I believe the Scientific Funding Agencies in each country would want to pay for everyone to get a GPU cluster just to host 500+ copies of tomorrow's version of LLAMA-2(ish).

Are you aware of any effort proposing or planning to host LLMs for use by researchers in some shared infrastructure?

After all, hosting the LLM costs the same per hour whether 1, 3 or 20 people are calling it, and at most academic institutions usage would be a little bursty.

Best,
Amanda Stent

--
(they/she)
Director, Davis Institute for AI
Professor, Computer Science
Colby College

Say IT again: International Workshop on Interpreting Technologies
by Dola Mullage, Damith P. 24 Oct '23

We are pleased to announce "Say IT again: International Workshop on Interpreting Technologies" (SAY-IT AGAIN 2023), which will take place on the 2nd and 3rd of November 2023.

Like our previous edition, SAY-IT AGAIN 2023 will also be hybrid, which means that both attendees and participants will have the chance to choose whether they want to attend the workshop ON SITE (at the University of Malaga, Spain) or fully ONLINE. Limited spots!

This workshop seeks to act as a meeting point for researchers working in interpreting-related technologies (CAI tools, machine interpreting, speech to text/speech translation, remote interpreting, etc.); practicing tech-savvy interpreters; companies and freelancers providing services in interpreting as well as companies developing tools for interpreters. In addition to the short papers for presentation, SAY-IT AGAIN will feature invited talks by prominent experts as well as presentations and panels hosted by practitioners.

You can see the full provisional programme here: https://lexytrad.es/SAYITAGAIN2023/wp-content/uploads/2023/10/PROGRAMME_SAY…

Registration is available through the following link (until full capacity): https://lexytrad.es/SAYITAGAIN2023/registration/

For further information, you can access SAY-IT AGAIN 2023's official website: https://lexytrad.es/SAYITAGAIN2023/

Thank you so much in advance for disseminating this event among your colleagues and students who might be interested in the latest advances in the field of interpreting technologies.
