August 2023 - Corpora

LREC-COLING 2024 website is alive!
by sara goggi 07 Aug '23

07 Aug '23

Dear Corpora members, this is to announce that the LREC COLING 2024 website is now available at: https://lrec-coling-2024.org/ On the website you will find the 2nd Call for Papers for the main conference, the Workshops CfP and the Tutorials CfP, the Author's kit plus other information about Torino. All the best, LREC-COLING 2024 Organizers

1 0

Fwd: looking for summarization datasets in italian and spanish langs
by Saeed Farzi 06 Aug '23

06 Aug '23

Hi guys, I am going to implement a summarization system in the medical domain in Italian and Spanish. So I am looking for free summarization datasets both in the public and medical domains in both languages. Any help would be appreciated. sincerely Ciao -- *Dr. Saeed Farzi,* Faculty of Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran. Phone: +98-21-8462450-401 Fax: +98-21-88462066 P.O. Box: 16315-1355, Web: http://wp.kntu.ac.ir/saeedfarzi/ Lab: https://www.trlab.ir/ -- *Dr. Saeed Farzi,* Faculty of Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran. Phone: +98-21-8462450-401 Fax: +98-21-88462066 P.O. Box: 16315-1355, Web: http://wp.kntu.ac.ir/saeedfarzi/ Lab: https://www.trlab.ir/

2 1

Third CFP (ArabicNLP 2023): The 1st Arabic Natural Language Processing Conference
by Amr Keleg 06 Aug '23

06 Aug '23

*** Apologies for Cross-Posting *** The First Arabic Natural Language Processing Conference (ArabicNLP 2023) co-located with EMNLP 2023 in Singapore. What's in a name? To mark our move from a workshop to a conference, we changed our acronym from WANLP to ArabicNLP. Conference URL: <https://wanlp2023.sigarab.org/> https://arabicnlp2023.sigarab.org/ Submission URL: https://openreview.net/group?id=SIGARAB.org/ArabicNLP/2023/Conference ArabicNLP 2023 invites the submission of original long, short, or demo papers in the area of Arabic Natural Language Processing. ArabicNLP 2023 builds on seven previous workshop editions, which have been extremely successful, drawing in a large active participation in various capacities. This conference is timely given the continued rise in research projects focusing on Arabic NLP. ArabicNLP 2023 will also feature shared tasks, allowing participants to work on specific NLP challenges related to Arabic language processing. The conference is organized by the Special Interest Group on Arabic NLP (SIGARAB), an Association for Computational Linguistics Special Interest Group on Arabic Natural Language Processing. Important Dates - May 7, 2023: submission of shared tasks proposals - May 14, 2023: notification of acceptance of shared tasks - September 5, 2023: conference papers due date - October 12, 2023: notification of acceptance - October 20, 2023: camera-ready papers due - December 7, 2023: conference day All deadlines are 11:59 pm UTC -12h <https://www.timeanddate.com/time/zone/timezone/utc-12> (“Anywhere on Earth”). We accept long (up to 8 pages), short (up to 4 pages), and demo paper (up to 4 pages) submissions. Long and short papers will be presented orally or as posters as determined by the program committee. Submissions are invited on topics that include, but are not limited to, the following: - Enabling core technologies: language models and large language models, morphological analysis, disambiguation, tokenization, POS tagging, named entity detection, chunking, parsing, semantic role labeling, sentiment analysis, Arabic dialect modeling, etc. - Applications: dialog modeling, machine translation, speech recognition, speech synthesis, optical character recognition, pedagogy, assistive technologies, social media, etc. - Resources: dictionaries, annotated data, corpora, etc. Submissions may include work in progress as well as finished work. Submissions must have a clear focus on specific issues pertaining to the Arabic language whether it is standard Arabic, dialectal, classical, or mixed. Papers on other languages sharing problems faced by Arabic NLP researchers, such as Semitic languages or languages using Arabic script, are welcome provided that they propose techniques or approaches that would be of interest to Arabic NLP, and they explain why this is the case. Additionally, papers on efforts using Arabic resources but targeting other languages are also welcome. Descriptions of commercial systems are welcome, but authors should be willing to discuss the details of their work. If you have any questions, please contact us at: <https://groups.google.com/u/1/>arabicnlp-pc-chairs(a)sigarab.org The ArabicNLP 2023 Publicity Chairs, Amr Keleg and Salam Khalifa

1 0

Re: Any literature about tensors-based corpora NLP research with actual examples (and homework ;-)) you would suggest? ...
by Albretch Mueller 05 Aug '23

05 Aug '23

On 8/3/23, Toms Bergmanis <toms.bergmanis(a)tilde.lv> wrote: ... I, for one, have benefited from Ada's, as well as other member's suggestions and comments as I hope they have somehow benefited from mine. lbrtchx

4 7

1st CfP special issue in « Computational Linguistics »
by Abdellah Fourtassi 05 Aug '23

05 Aug '23

1st Call for Papers: Special Issue of the Computational Linguistics journal on Language Learning, Representation, and Processing in Humans and MachinesGuest Editors Marianna Apidianaki (University of Pennsylvania) Abdellah Fourtassi (Aix Marseille University) Sebastian Padó (University of Stuttgart) *Submission deadline: December, 10* Large language models (LLMs) acquire rich world knowledge from the data they are exposed to during training, in a way that appears to parallel how children learn from the language they hear around them. Indeed, since the introduction of these powerful models, there has been a general feeling among researchers in both NLP and cognitive science that a systematic understanding of how these models work and how they use the knowledge they encode, would shed light on the way humans acquire, represent, and process this same knowledge (and vice versa). Yet, despite the similarities, there are important differences between machines and humans that have prevented a direct translation of insights from the analysis of LLMs to a deeper understanding of human learning. Chief among these differences is that the size of data required to train LLMs far exceeds -- by several orders of magnitude -- the data children need to acquire sophisticated conceptual structures and meanings. Besides, the engineering-driven architectures of LLMs do not appear to have obvious equivalents in children's cognitive apparatus, at least as studied by standard methods in experimental psychology. Finally, children acquire world knowledge not only via exposure to language but also via sensory experience and social interaction. This edited volume aims to create a forum of exchange and debate between linguists, cognitive scientists and experts in deep learning, NLP and computational linguistics, on the broad topic of learning in humans and machines. Experts from these communities can contribute with empirical and theoretical papers that advance our understanding of this question. Submissions might address the acquisition of different types of linguistic and world knowledge. Additionally, we invite contributions that characterize and address challenges related to the mismatch between humans and LLMs in terms of the size and nature of input data, and the involved learning and processing mechanisms. Topics include, but are not limited to: - Grounded learning: comparison of unimodal (e.g., text) vs multimodal (e.g., images and video) learning. - Social learning: comparison of input-driven mechanisms vs. interaction-based learning. - Exploration of different knowledge types (e.g., procedural / declarative); knowledge integration and inference in LLMs. - Methods to characterize and quantify human-like language learning or processing in LLMs. - Interpretability/probing methods addressing the linguistic and world knowledge encoded in LLM representations. - Knowledge enrichment methods aimed at improving the quality and quantity of the knowledge encoded in LLMs. - Semantic representation and processing in humans and machines in terms of, e.g., abstractions made, structure of the lexicon, property inheritance and generalization, geometrical approaches to meaning representation, mental associations, and meaning retrieval. - Bilingualism in humans and machines; second language acquisition in children and adults; construction of multi-lingual spaces and cross-lingual correspondences. - Exploration of language models that incorporate cognitively plausible mechanisms and reasonably-sized training data. - Use of techniques from other disciplines (e.g., neuroscience or computer vision) for analyzing and evaluating LLMs. - Open-source tools for analysis, visualization, or explanation. Submission Instructions Papers should be formatted according to the Computational Linguistics style guidelines: https://cljournal.org/ We accept both long and short papers. Long papers are between 25 and 40 journal pages in length; short papers are between 15 and 25 pages in length. Papers for this special issue will be submitted through the CL electronic submission system, just like regular papers: https://cljournal.org/submissions.html Authors of special issue papers will need to select ‟Special Issue on LLRP‟ under the Journal Section heading in the CL submission system. Please note that papers submitted to a special issue undergo the same reviewing process as regular papers. Timeline Deadline for submissions : December, 10 2023 Notification after 1st round of reviewing : February, 10 2024 Revised versions of the papers : April, 30 2024 Final decisions : June, 10 2024 Final version of the papers : July, 1 2024Guest Editors Marianna Apidianaki marapi(a)seas.upenn.edu Abdellah Fourtassi abdellah.fourtassi(a)gmail.com Sebastian Padó pado(a)ims.uni-stuttgart.de *Computational Linguistics* is the longest-running flagship journal of the Association for Computational Linguistics. The journal has a high impact factor: 9.3 in 2022 and 7.778 in 2021. Average time to first decision of regular papers and full survey papers (excluding desk rejects) is 34 days for the period January to May 2023, and 47 days for the period January to December 2022. -- This email was sent from my smartphone. Forgive the brevity, the typos, and the lack of nuance.

1 0

[Last CFP] 3rd Workshop for NLP Open Source Software at EMNLP 2023
by liling tan 04 Aug '23

04 Aug '23

(apologies for cross-posting) *-----Workshop for NLP Open Source Software (NLP-OSS)* 06 Dec 2023, Co-located with EMNLP 2023 https://nlposs.github.io/ Deadline for Long and Short Paper submission: *09 August, 2023 (23:59, GMT-11)* ----- You have tried to use the latest, bestest, fastest LLM models and bore grievances but found the solution after hours of coffee and computer staring. Share that at NLP-OSS and suggest how open source could change for the better (e.g. best practices, documentation, API design etc.) You came across an awesome SOTA system on NLP task X that no LLM has beaten its F1 score. But now the code is stale and it takes a dinosaur to understand the code. Share your experience at NLP-OSS and propose how to "replicate" these forgotten systems. You see this shiny GPT from a blog post, tried it to reproduce similar results on a different task and it just doesn't work on your dataset. You did some magic to the code and now it works. Show us how you did it! Though they're small tweaks, well-motivated and empirically tested are valid submissions to NLP-OSS. You have tried 101 NLP tools and there's none that really do what you want. So you wrote your own shiny new package and made it open source. Tell us why your package is better than the existing tools. How did you design the code? Is it going to be a one-time thing? Or would you like to see thousands of people using it? You have heard enough of open-source LLM and pseudo-open-source GPT but not enough about how it can be used for your use-case or your commercial product at scale. So you contacted your legal department and they explained to you about how data, model and code licenses work. Sharing the knowledge with the NLP-OSS community. You have a position/opinion to share about free vs open vs closed source LLMs and have valid arguments, references or survey/data to support your position. We would like to hear more about it. At last, you've found the avenue to air these issues in an academic platform at the NLP-OSS workshop!!! Sharing your experiences, suggestions and analysis from/of NLP-OSS ---- P/S: 2nd Call for Paper *Workshop for NLP Open Source Software (NLP-OSS)* 06 Dec 2023, Co-located with EMNLP 2023 https://nlposs.github.io/ Deadline for Long and Short Paper submission: 09 August, 2023 (23:59, GMT-11) ------------------------------ The Third Workshop for NLP Open Source Software (NLP-OSS) will be co-located with EMNLP 2023 on 06 Dec 2023. Focusing more on the social and engineering aspect of NLP software and less on scientific novelty or state-of-art models, the Workshop for NLP-OSS is an academic forum to advance open source developments for NLP research, teaching and application. NLP-OSS also provides an academic workshop to announce new software/features, promote the collaborative culture and best practices that go beyond the conferences. We invite full papers (8 pages) or short papers (4 pages) on topics related to NLP-OSS broadly categorized into (i) software development, (ii) scientific contribution and (iii) NLP-OSS case studies. - *Software Development* - Designing and developing NLP-OSS - Licensing issues in NLP-OSS - Backwards compatibility and stale code in NLP-OSS - Growing, maintaining and motivating an NLP-OSS community - Best practices for NLP-OSS documentation and testing - Contribution to NLP-OSS without coding - Incentivizing OSS contributions in NLP - Commercialization and Intellectual Property of NLP-OSS - Defining and managing NLP-OSS project scope - Issues in API design for NLP - NLP-OSS software interoperability - Analysis of the NLP-OSS community - *Scientific Contribution* - Surveying OSS for specific NLP task(s) - Demonstration, introductions and/or tutorial of NLP-OSS - Small but useful NLP-OSS - NLP components in ML OSS - Citations and references for NLP-OSS - OSS and experiment replicability - Gaps between existing NLP-OSS - Task-generic vs task-specific software - *Case studies* - Case studies of how a specific bug is fixed or feature is added - Writing wrappers for other NLP-OSS - Writing open-source APIs for open data - Teaching NLP with OSS - NLP-OSS in the industry Submission should be formatted according to the EMNLP 2023 templates <https://2023.emnlp.org/call-for-papers> and submitted to OpenReview <https://openreview.net/group?id=EMNLP/2023/Workshop/NLP-OSS> ORGANIZERS Geeticka Chauhan, Massachusetts Institute of Technology Dmitrijs Milajevs, Grayscale AI Elijah Rippeth, University of Maryland Jeremy Gwinnup, Air Force Research Laboratory Liling Tan, Amazon

1 0

Re: Any literature about tensors-based corpora NLP research with actual examples (and homework ;-)) you would suggest? ...
by Ada Wan 04 Aug '23

04 Aug '23

Toms No, not my arrogance, but my expertise is outstanding. To my background: before my graduate-level theoretical linguistics curriculum (Chomskyan lineage) in the 1990s [1], I'd spent about 1-2 decades in multilingual, international environments, making keen observations and reflections on various language, cultural/social phenomena. After graduation, I have traveled to all 7 continents to continue with my linguistic and philosophical observations and learning (aka fieldwork). I have studied in 3 continents and had about 5 rounds of graduate training [2]. I have learned about 10+(?) languages/varieties (EN,ZH,FR,ES,RU,DE,LA,NL,IT,JA,ASL) and dabbled on a few other more (Sanskrit, Ancient Greek, AR...) and did fieldwork on/with a couple more (Zapotec, various varieties of PNG, Tok Pisin...) --- I mean, I don't remember much/any of these, it's been quite a while... now that I am/was (?) [3] about to retire. In the 2000s, I revisited "Linguistics proper" from another perspective, including but not limited to what you may know as "NLP" nowadays. I did not start publishing until the 2010s, so I assume that's when one might have become familiar with my work (assuming that they have read it). [1] doing original research, including but not limited to something similar to G2P work, and on what one could consider as writing pseudo code for computational systems (so, Computational Linguistics) [2] So to have finally come up with the experimental results I did and to have figured out what "language complexity" (in both the context of computing and not) as well as various other DL/NN phenomena were all about, I surely think it deserves a celebration! [3] Recent happenings seem to suggest that I should stay in the arena to keep an eye out on things. Indeed, as you might have liked to suggest, there are plenty of "NLP practitioners" out there nowadays who think they are qualified to work on "language" just because they can speak one. But if you need to pick a battle about that with me, I'm afraid you might have picked the wrong person. So I hope you could consider me as "not-a-noob". Being a woman in STEM/tech can be hard, but I didn't realize how little of a benefit of a doubt some choose to afford. I write this because not only of the tone of your inquiry, but also because of it is hard to not take offense with what you actually wrote, including this "yet that did not put you off from writing bogus papers on machine translation". Which part of my work do you regard as "bogus papers on MT"? *** Re ""The priority of my communications here is to clarify the part on the scientific front, to make sure that if one happens to have gotten oneself involved in this space, how one can come to more clarity on the status quo, esp. given my results." Is this about your "results" in that one paper...": No, it is not only about: i. my results in "Fairness in representation for multilingual NLP" ( https://openreview.net/forum?id=-llS6TiOew or https://drive.google.com/file/d/1eKbhdZkPJ0HgU1RsGXGFBPGameWIVdt9/view?pli=1), or ii. those in "Representation and Bias in Multilingual NLP" ( https://openreview.net/forum?id=dKwmCtp6YI), or iii. some of the remarks on language matters and current practices in the "language space" I've tweeted --- some of which further explain my solution to language complexity, some advance arguments from traditional academic debates, some on meta-theoretical and transdisciplinary development, some on possible future directions... etc., but also iv. some of the dependencies or potential impact of my results, if one were to take their work with responsibility and integrity. To better understand the *impact* of my results (as in, why they are important), it'd be helpful for one to have knowledge/experience of the language space (or the "language enterprise" as Noam sometimes refers/referred to it) --- to understand the development and the dependencies (and the lack thereof) between Linguistics, Computational Linguistics, Natural Language Processing, Computing or Computer Science and/or Computational Sciences, Statistics/Mathematics, Social Sciences, Information Theory, Philosophy... etc. [4] (both as academic disciplines as well as as sciences on their own). [4] because *language*! (But if I had to opine on which areas are to be impacted most directly with my findings, as in paradigm-shifting type of impact, I'd probably say/write the first 3 or 4, i.e. Lx, CL, NLP, CS/CS-related.) Re "For anyone wanting to continue this discussion, I strongly recommend reading Ada's work, so you have an informed opinion about what evidence she is referring to.": Thanks for your help in promoting my work. Yes, I think it'd be helpful for everyone to read my work, whether they'd like to partake in a conversation on it or not. Please feel free to send me any questions you may have --- it's been a few really intense years for me. I don't always know if/how my writing has been understood. I'd be grateful for your feedback. Best Ada On Thu, Aug 3, 2023 at 5:42 PM Toms Bergmanis <toms.bergmanis(a)tilde.lv> wrote: > Ada, > > > > "it is not the right time right now to be "campy" about (as in, to be > arguing/protesting for) "grammar", at the moment, esp. if you do not have a > background in Linguistics." > > Your arrogance is outstanding. I will ask again, as I have asked before - > what background do you have? Last time I checked, I could not find any > evidence of your background in NLP, yet that did not put you off from > writing bogus papers on machine translation. > > > > "The priority of my communications here is to clarify the part on the > scientific front, to make sure that if one happens to have gotten oneself > involved in this space, how one can come to more clarity on the status quo, > esp. given my results." > > Is this about your "results" in that one paper evaluating which data > representation is better in machine translation without actually > considering machine translation quality? It sounds like something that > everyone should read before engaging in a debate with you. > > > > For anyone wanting to continue this discussion, I strongly recommend > reading Ada's work, so you have an informed opinion about what evidence she > is referring to. > > Sincerely, > > Toms Bergmanis > ------------------------------ > > *From:* Ada Wan via Corpora <corpora(a)list.elra.info> > *Sent:* Wednesday, August 2, 2023 7:17:42 PM > *To:* Albretch Mueller <lbrtchx(a)gmail.com> > *Cc:* corpora <corpora(a)list.elra.info> > *Subject:* [Corpora-List] Re: Any literature about tensors-based corpora > NLP research with actual examples (and homework ;-)) you would suggest? ... > > > > Re RML or any "text technologies" leveraging "grammar" (misnomer or not): > > it is not the right time right now to be "campy" about (as in, to be > arguing/protesting for) "grammar", at the moment, esp. if you do not have a > background in Linguistics. > > There has been quite some abuse/misconduct with concepts/units/assumptions > such as "words", "sentences", and "grammar" in the language space (with or > without computational implementation). > > > > The priority of my communications here is to clarify the part on the > scientific front, to make sure that if one happens to have gotten oneself > involved in this space, how one can come to more clarity on the status quo, > esp. given my results. There is a lot that needs to be re-evaluated and > re-interpreted. Simply stating that something might have been useful in the > past is not going to be helpful with going forward. > > > > If one is working in technologies with language/text data (e.g., in a > user-based format/framework, and not working on "grammar" as a > "linguistic"/philological pursuit), it is recommended that the name(s) of > such technologies get updated --- if "grammar" [1] does not have to be > mentioned or be involved, don't. > > [1] or, including but not limited to any of the following: "word", > "sentence", "linguistic structure(s)", "meaning", "morphology", "syntax", > "parsing", various terms related to parts of speech (e.g. "nouns", > "verbs").... > > > > Re "BTW, regarding that "parsing" aspect, what is the term used to > describe the gradual process of "terminological inception"?": > > conceptualization? Coining of terms? > > According to me, "lexical priming" is different from "terminological > inception". > > > > Re "How could you clarified intersubjectivity?": > > https://en.wikipedia.org/wiki/Intersubjectivity :) > > Your question is way too broad, or requires an answer that is such, which > I cannot entertain at the moment. > > > > Thanks for sharing your perspectives. I must admit I have not had time to > digest all of your points. But this impression recurred in me as I was > reading them: > > sometimes, I sense that when one claims some concepts are not universal > (e.g. the ones mentioned in [1] above), others take it as that all concepts > are categorically invalid. That is not what I intended to communicate (with > all my papers, scientific work, and my comments here). It is an expert > opinion/finding that I shared, upon some careful evaluation. > > > > > > On Tue, Aug 1, 2023 at 10:26 PM Albretch Mueller via Corpora < > corpora(a)list.elra.info> wrote: > > On 7/31/23, Ada Wan <adawan919(a)gmail.com> wrote: > > That having been expressed, here are a couple of points re RML that one > should pay heed: > > i. to what extent and in what context is this a technology relevant? > > If you were able to device an algorithm which taking as input only NL > texts (composed of: 1) a start (semantic end); b) a sequence of > characters from a relatively large and representative text bank; c) an > end (a semantic start)) is able to exhaustively "deduce" the grammar > of such texts, in addition to being able to use it with any language, > you would then: > > 1) have defined a "space"/"coordinate system" for those texts, to > frame (pretty much) all possible "meaningful 'points'"/"phrases" in > terms of such grammar, which would also; > 2) be a 0-search structure describing the text bank/corpus (every > text segment would also become a pointer to every single actualization > of that very segment in all texts, no more "n-grams" necessary!), > which could; > 3) be used with minimal turking/supervision to: > 3.1) cleanse up all automatic translations from youtube; > 3.2) keep multilingual corpora; > 3.3) use it for automatic translations (demonstrably, in an almost > foolproof, perfect way, since you always have the words/phrases with > their context); > 3.4) "cosmic/tree reading": instead reading books/sequences of > characters, you would read that text as it relates to all other texts > from the same topic; > 3.5) parsing: you would keep a corpus of what you know so you wont > have to reread about certain topics and aspects you already know > (great Lord! how I hate reading a whole book to only find a few, at > times marginal, sentences worth reading! or that "youthful" thing of > thinking that they just discovered/created an idea because they are > just verbalizing it or made a movie about it!) BTW, regarding that > "parsing" aspect, what is the term used to describe the gradual > process of "terminological inception"? I have heard the term > "Adamization", but, even though that word doesn't really rub me the > wrong way, I could imagine it is "too sexist" to some people. I > wouldn't really care calling it Eveization or "pussyfication" or > whatever. I just don't want to use the term that the government uses: > "lexical priming" and "terminological inception" sounds too cumbersome > as a verb: "terminologically incept"? doesn't sound OK in English; > 3.6) of course, an easy application of that contextual parsing would > be removing all that js crap and ads before they reach your awareness; > ... > 3.n) not last and definitely not least I am thinking hard about how > to make sure police and politicians at least have a hard time while > using what I have described to "freedom love" people (I know, I know, > ... "3.n" doesn't "technically" pertain to quality of implementation > issues ..., but I, for one, disagree. Giving the "all tangible things" > (tm) panopticon in which we are all living these days, each of us in > one's own "virtual prison cell" to call it somehow, we should also > think about, be openly honest about such matters) > > I am working right now on such Leibnizian "characteristica > universalis" kind of thing. First cleansing approx. 1.2 million texts > mostly from archive.org, *.pub and the NYS Regents exams > (nysedregents.org + nysl.ptfs.com) which they have, at least > partially, translated to more than 10 languages. Is that relevant > enough to you? ;-) I am also being quite selfish about it because I > have always dreamed of being able to "read"/mind all texts which have > ever been written in the same way that teens think they have to have > sex with everybody in town to make sense of things. > > > ii. one can certainly dissect/decompose texts ... > > Computing power has become insanely cheap, but it has also enabled > too much "cleverhansing" out there. The Delphic phrase: "you can make > sense or money" these times translates as some sort of corollary to: > "using computers and then thinking about it makes you smart"; but, > does it really? > > It amazes me how easily you can "dissect"/"decompose texts", talk > about "tensors", "vectors", ... (I am not trying to police language > usage, it just amazes me); let alone all the insufferable bsing claims > by the "Artificial Intelligentsia". > > I would go with one character after the other and an open attempt to > use the minimal amount of principles to then see what I get. IMO, when > you start getting too smart about what you do, of course, you will > "see" how smart you are. The poet in me likes Borges' stanzas: "... el > nombre es arquetipo de la cosa, en las letras de 'rosa' está la rosa y > todo el Nilo en la palabra 'Nilo'" ("its name is a thing's archetype, > in the letters of 'rose' is the rose and the whole of the Nile (river) > in the word 'Nile'") > > > II. Re ""magical" in the sense that when we go about our intersubjective > business": some intersubjectivity can be further clarified. I don't see > much of your examples as being "magical". > > I actually do! How could you clarified intersubjectivity? I am trying > to do so (somewhat) Mathematically (to the extent you could). Could > you share any papers, "prior art" on such matters? > > > ii. "other people may read, mind, as well ...;": so? > > which is a good thing it is alright, fine and dandy in the hippie way, I > meant. > > > iii. "Alice bought some veggies from Bob, ...)": this I don't understand. > > iv. "We see more in money ("words", ...) than just a piece of paper" > > iii. and iv. overlap to some extent so I will try to explain them > both quickly (which is impossible since you can write philosophies > about each line, but there I'll go). To understand what Marx (may > have) meant by „gesellschaftlich notwendige Arbeit” ("socially > necessary labour time", wording which has made quite a few go berserk > ever since): > > https://en.wikipedia.org/wiki/Socially_necessary_labour_time > > https://en.wikipedia.org/wiki/Transformation_problem > > you have to understand the basic mathematical concepts of: > > a) combined rates, and > b) intratextual systems of linear equations > > Based on my teaching experience §b is easier to understand. Sorry I > couldn't find an "easier" explanation on youtube of that type of SLEs > than the one I used with my students preparing for the Regents: > > https://ergosumus.files.wordpress.com/2018/10/sle04-en.pdf > > the intratextuality of those problems matter to corpora research > because different strata of "like terms" ("verbs", "adjectives", ...) > is what creates grammar. "Crazy me" thinks you could to some extent > describe the "likeness of terms" underlying grammar! > ~ > I also have a guideline about combined rates which I successfully > used with my students: > > https://ergosumus.files.wordpress.com/2018/06/word_problems12-en00.pdf > ~ > What the eff do combined rates and SLEs have to do with Marx' > transformation problem? ;-) > > Well, notice that the -equitable aspect- used to solve combined rates > problems is the time (regardless of how differently fast one "works" > in comparison with others). There is also another type of combine rate > problems: you drive to some place with a friend who doesn't care about > driving fast, but you need to rest so she drives for a while ... that > problem is different from two people meeting at a place each driving > "on their own cars" (at their own average speed). > > Serge Heiden shared a paper about presidential debates which could be > also Mathematically studied as a CR kind of problem (even if > politicians as the crowd management clowns they all are don't have to > make sense, anyway), but as it happens with any dialogue there are > parts of the conversations in which both the cars and the time is > shared and other times when only (or more of) the time. I don't know > of a general Mathematical formulation to CRs kinds of problems, which > could be used for corpora research. On my "to do" list I have writing > papers studying Euclid's Elements and Plato's Dialogues in that way. > > Karl Marx's as part of his „Wertgesetz der Waren” (reChristened in > English as "labor theory of value") somewhat metaphorically stated > that the exchange value of a commodity is a function of "society's > labour-time". He also rendered his ideas as equations (in more of a > verbally descriptive, metaphorical way), but that phrase: "society's > labour-time", was and is still found from questionable to > unfalsifiably wild. I don't claim to have mind reading powers, but I > think in his letter to his friend Ludwig Kugelmann, the thoroughgoing > Hegelian Marx was, he clearly explained what he meant (page: 222 in > file, 208 in book): > > > https://archive.org/download/marxengelsselectedcorrespondence/Marx%20%26%20… > > Marx To Ludwig Kugelmann In Hanover London, July 11, 1868: > All that palaver about the necessity of proving the concept of value > comes from complete ignorance both of the subject dealt with and of > scientific method. Every child knows that a nation which ceased to > work, I will not say for a year, but even for a few weeks, would > perish. Every child knows, too, that the masses of products > corresponding to the different needs require different and > quantitatively determined masses of the total labour of society. That > this necessity of the distribution of social labour in definite > proportions cannot possibly be done away with by a particular form of > social production but can only change the mode of its appearance, is > self-evident. No natural laws can be done away with. What can change > in historically different circumstances is only the form in which > these laws assert themselves. And the form in which this proportional > distribution of labour asserts itself, in a state of society where the > interconnection of social labour is manifested in the private exchange > of the individual products of labour, is precisely the exchange value > of these products. > ~ > So, as I see it, in a Hegelian way, Marx was seeing the whole of > society as a corpus (in which we all live through our own > texts/narratives) talking about "socially necessary labour time" in > the way that "time" becomes the equitable aspect shared when > people/(-society as a whole-) work together as described by combined > rates kinds of problems. > > When "Alice buys some veggies from Bob, ..." she used money as > "equitable aspect" to get Bob's veggies (in the Marxian way they were > both part of a combine rates problem) and you tell me this is not > magical! > > > v. "some transactional electronic ("air"...) excitations": I don't get > this. > > you may pay with cash using coins or bills or using your debit card > which at the end of the day become transactional electronic > excitations on some hard drives. When you speak there is more to it > than vibrations/fluctuations of air. (I am referring to the medium > which Saussurean signifiers use) > > > vi. "your 'magic' and mine are different we are still able to > 'communicate'. How on earth do such things happen?": a disclaimer: I am not > using any magic in my attempts to communicate with you here. I try my best > to place myself in your shoes to guesstimate the points that you are trying > to get across. But many (as you can see above) didn't quite reach me. > > "I try my best to place myself in your shoes" ... ;-) Ha, ha, ha! > that is just a functional illusion. What do you know about "my shoes"? > I work as a gardener (which I love to do) so they are dirty and > smelly, ... I also love to eat garlic ... As I see things standing on > "my dirty and smelly shoes and voicing it from my garlicky mouth" > being honest and true to matters is good enough. > > lbrtchx > _______________________________________________ > Corpora mailing list -- corpora(a)list.elra.info > https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/ > To unsubscribe send an email to corpora-leave(a)list.elra.info > > >

1 0

CFP - SIMBig 2023 in Mexico - Extended Deadline
by Juan Antonio Lossio-Ventura 04 Aug '23

04 Aug '23

> [Apologies for cross-posting] > ====================================================================== > FINAL DEADLINE OF PAPER SUBMISSION - **AUGUST 10** > ====================================================================== > > SIMBig 2023 - 10th International Conference on Information Management and Big Data > Where: Instituto Politécnico Nacional, Mexico DF, MEXICO > When: October 18 - 20, 2023 > Website: https://simbig.org/SIMBig2023/ > > ====================================================================== > > OVERVIEW > ---------------------------------- > > SIMBig 2023 seeks to present new methods of Artificial Intelligence (AI), Data Science, Machine Learning, Natural Language Processing, Semantic Web, and related fields, for analyzing, managing, and extracting insights and patterns from large volumes of data. > > > KEYNOTE SPEAKERS (to be confirmed) > ---------------------------------- > Mona Diab, Meta AI, USA Huan Liu, Arizona State University, USA > > and more to be announced soon... > > IMPORTANT DATES > ---------------------------------- > > July 24, 2023 August 10, 2023 --> Full papers and short papers due > August 28, 2023 --> Notification of acceptance > September 10, 2023 --> Camera-ready versions > October 18 - 20, 2023 --> Conference held in Mexico DF, Mexico > > PUBLICATION > ---------------------------------- > > All accepted papers of SIMBig 2023 (tracks including) will be published with Springer CCIS Series <https://www.springer.com/series/7899> (to be confirmed). > > Best papers of SIMBig 2023 (tracks including) will be selected to submit an extension to be published in the Springer SN Computer Science Journal. <https://www.springer.com/journal/42979> > TOPICS OF INTEREST > ---------------------------------- > > SIMBig 2023 has a broad scope. We invite contributions on theory and practice, including but not limited to the following technical areas: > > Artificial Intelligence > Big/Masive Data > Data Science > Machine Learning > Deep Learning > Natural Language Processing > Semantic Web > Data-driven Software Engineering > Data-driven software adaptation > Healthcare Informatics > Biomedical Informatics > Data Privacy and Security > Information Retrieval > Ontologies and Knowledge Representation > Social Networks and Social Web > Information Visualization > OLAP and Business intelligence > Crowdsourcing > > SPECIAL TRACKS > ---------------------------------- > > SIMBig 2023 proposes six special tracks in addition to the main conference: > > ANLP <https://simbig.org/SIMBig2023/en/anlp.html> - Applied Natural Language Processing > DISE <https://simbig.org/SIMBig2023/en/dise.html> - Data-Driven Software Engineering > EE-AI-HPC <https://simbig.org/SIMBig2023/en/eeaihpc.html> - Efficiency Enhancement for AI and High-Performance Computing > SNMAM <https://simbig.org/SIMBig2023/en/snmam.html> - Social Network and Media Analysis and Mining > > CONTACT > ---------------------------------- > > SIMBig 2023 General Chairs > > Juan Antonio Lossio-Ventura, National Institutes of Health, USA (juan.lossio(a)nih.gov <mailto:juan.lossio@nih.gov>) > Hugo Alatrista-Salas, Pontificia Universidad Católica del Perú, Peru (halatrista(a)pucp.pe <mailto:halatrista@pucp.pe>)

1 0

Postdoc in computational linguistics at Saarland University
by Alexander Koller 04 Aug '23

04 Aug '23

Research and teaching position in Computational Linguistics Department of Language Science and Technology Saarland University, Saarbrücken, Germany Start date: late 2023/early 2024 Contract duration: 3 years (can be extended) Payscale: E13 100% (postdoc) / E13 75% (PhD student) https://www.coli.uni-saarland.de/~koller/page.php?id=jobs We are looking to fill a research and teaching position in computational linguistics at the Department of Language Science and Technology at Saarland University. The position is part of the research group of Prof. Alexander Koller. It offers great flexibility in developing your own research and teaching agenda, and collaborations with other research groups are encouraged. The position is flexible with respect to topic, but it should connect thematically with current topics of interest to the research group. These include semantic parsing, reasoning with LLMs (e.g. planning, chain-of-thought), personalized dialogue and language generation, and the use of neurosymbolic models in NLP. You should have expertise in neural and/or linguistically principled methods in computational linguistics and be willing to take an active role in shaping the research and teaching environment of the department. The position includes a teaching load of up to four hours per week in the BSc Computational Linguistics (in German) and/or the MSc Language Science and Technology (in English). Both programs attract excellent and highly motivated students; it is not unusual for our students to publish papers at peer-reviewed conferences before graduation. The MSc students in particular are a very international crowd, with two thirds joining us from abroad. You will typically teach two seminars per semester on topics of your choice, which will allow you to motivate students to do BSc and MSc theses under your supervision. This is a position on the German TV-L E13 scale (100% position at the postdoc level; 75% position at the PhD student level). The starting salary of a 100% TV-L E13 position is a bit over 50,000 Euros per year and increases with experience. The initial appointment will be for three years; the position can be extended up to the limits of the German law for academic contracts (WissZeitVG). The starting date could be late 2023 or early 2024; we would be willing to adapt to the time requirements of an ideal candidate. Requirements We are looking for candidates who have finished, or are about to complete, an excellent PhD degree (at the postdoc level) or MSc degree (at the PhD student level) in computational linguistics, computer science, or a related discipline. You must be proficient in English (spoken and written); the ability to teach in German is a plus. The position is primarily intended for applicants at the postdoc level, who should have demonstrated their research expertise through high-quality publications. We will consider applicants at the PhD level in exceptional cases. About the department Saarland University is one of the leading centers for computational linguistics in Europe, and offers a dynamic and stimulating research environment. The Department of Language Science and Technology consists of about 100 research staff in nine research groups in the fields of computational linguistics, psycholinguistics, speech processing, and corpus linguistics. The department is a core member of the new Research Training Group "Neuroexplicit Models of Language, Vision, and Action", which is on track to grow into one of the largest centers for research on neurosymbolic models in NLP and other fields of AI in the world. It is also the centerpiece of the Collaborative Research Center 1102 "Information Density and Linguistic Encoding" and part of the Saarland Informatics Campus, which brings together computer science research at the university with world-class research institutions on campus, such as the Max Planck Institute for Informatics, the Max Planck Institute for Software Systems, and the German Research Center for Artificial Intelligence (DFKI). The Saarland Informatics Campus brings together 900 researchers and 2100 students from 81 countries; SIC faculty have won 36 ERC grants. Saarland University is located in Saarbrücken, a mid-sized city in the tri-border area of Germany, France, and Luxembourg. Saarbrücken combines a lively culture scene with a relaxed atmosphere, and is quite an affordable place to live in. Our department maintains an international and diverse work environment. The primary working language is English; learning German while you are here will make it easier to connect with the local culture, but is not necessary for your work. How to apply Please submit your application at http://apply.coli.uni-saarland.de/ak23. Preference will be given to applications received by 31 August 2023. Include a single PDF file with the following information: • a statement of research interests that motivates why you are applying for this position and outlines your research agenda; • a full CV including your list of publications; • scans of transcripts and academic degree certificates; • the names, affiliations, and e-mail addresses of two people who can provide letters of reference for you. Saarland University especially welcomes applications from women and people with disabilities. If you have further questions, please email Alexander Koller <koller(a)coli.uni-saarland.de>. Applications should _not_ be emailed to this address, but submitted through the online form.

1 0

Postdoc (full-time) in LLM Robustness and Safety at the University of Bonn, Germany
by Prof. Dr. Lucie Flek 03 Aug '23

03 Aug '23

We are expanding our core research team at the University of Bonn, looking for a Postdoctoral Researcher in Natural Language Processing and Machine Learning. (The position is offered at a full-time TV-L E13 level, corresponding to the gross salary of cca 57,000 Eur/year, covering health and social insurance and 30 vacation days. The duration of the contract is two years, with career growth opportunities afterwards.) This exciting opportunity is a chance to work on adversarial robustness, safety and explainability in machine learning, applied to modern LLMs. This position will involve close collaboration with the Lamarr Institute for Machine Learning and Artificial Intelligence (https://lamarr-institute.org/), the Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS) and the OpenGPT-X initiative (https://opengpt-x.de/). The candidate will have a central role in a project aiming to advance the state-of-the-art in the robustness and generalization capabilities of LLMs. Sounds like fun? Apply here: https://caisa-lab.github.io/resources/13-08-2023-postdoc-position.pdf You shall have a strong background in Computer Science with a specialization in Machine Learning or Natural Language Processing, and a corresponding publication record in global AI/ML/NLP venues. Demonstrable excellent python programming skills, e.g. through previous projects, and knowledge of current neural network models and implementation tools for neural networks (e.g., PyTorch) are expected. Our team consists of top researchers of varied backgrounds and cultures and we welcome applications from all appropriately qualified candidates worldwide! ____________________ Prof. Dr. Lucie Flek Data Science and Language Technologies Institut für Informatik / b-it Rheinische Friedrich-Wilhelms-Universität Bonn Friedrich-Hirzebruch-Allee 6 / 8, Raum 2.123 53115 Bonn, Germany Tel.: 0228-73-69200 flek(a)bit.uni-bonn.de

1 0

2025

2024

2023

2022

Corpora August 2023