Hi guys,
I am going to implement a summarization system for the medical domain in
Italian and Spanish, so I am looking for free summarization datasets in
both the public and medical domains in both languages.
Any help would be appreciated.
Sincerely,
Ciao
--
*Dr. Saeed Farzi,*
Faculty of Computer Engineering,
K. N. Toosi University of Technology, Tehran, Iran.
Phone: +98-21-8462450-401
Fax: +98-21-88462066
P.O. Box: 16315-1355,
Web: http://wp.kntu.ac.ir/saeedfarzi/
Lab: https://www.trlab.ir/
*** Apologies for Cross-Posting ***
The First Arabic Natural Language Processing Conference (ArabicNLP 2023)
co-located with EMNLP 2023 in Singapore.
What's in a name? To mark our move from a workshop to a conference, we
changed our acronym from WANLP to ArabicNLP.
Conference URL: https://arabicnlp2023.sigarab.org/ (formerly
https://wanlp2023.sigarab.org/)
Submission URL:
https://openreview.net/group?id=SIGARAB.org/ArabicNLP/2023/Conference
ArabicNLP 2023 invites the submission of original long, short, or demo
papers in the area of Arabic Natural Language Processing. ArabicNLP 2023
builds on seven previous workshop editions, which have been extremely
successful, drawing large and active participation in various capacities.
This conference is timely given the continued rise in research projects
focusing on Arabic NLP. ArabicNLP 2023 will also feature shared tasks,
allowing participants to work on specific NLP challenges related to Arabic
language processing. The conference is organized by the Special Interest
Group on Arabic NLP (SIGARAB), an Association for Computational Linguistics
Special Interest Group on Arabic Natural Language Processing.
Important Dates
- May 7, 2023: submission of shared task proposals
- May 14, 2023: notification of acceptance of shared tasks
- September 5, 2023: conference papers due
- October 12, 2023: notification of acceptance
- October 20, 2023: camera-ready papers due
- December 7, 2023: conference day
All deadlines are 11:59 pm UTC-12h
<https://www.timeanddate.com/time/zone/timezone/utc-12> ("Anywhere on
Earth").
We accept long (up to 8 pages), short (up to 4 pages), and demo paper (up
to 4 pages) submissions. Long and short papers will be presented orally or
as posters as determined by the program committee.
Submissions are invited on topics that include, but are not limited to, the
following:
- Enabling core technologies: language models and large language models,
morphological analysis, disambiguation, tokenization, POS tagging, named
entity detection, chunking, parsing, semantic role labeling, sentiment
analysis, Arabic dialect modeling, etc.
- Applications: dialog modeling, machine translation, speech recognition,
speech synthesis, optical character recognition, pedagogy, assistive
technologies, social media, etc.
- Resources: dictionaries, annotated data, corpora, etc.
Submissions may include work in progress as well as finished work.
Submissions must have a clear focus on specific issues pertaining to the
Arabic language whether it is standard Arabic, dialectal, classical, or
mixed. Papers on other languages sharing problems faced by Arabic NLP
researchers, such as Semitic languages or languages using Arabic script,
are welcome provided that they propose techniques or approaches that would
be of interest to Arabic NLP, and they explain why this is the case.
Additionally, papers on efforts using Arabic resources but targeting other
languages are also welcome. Descriptions of commercial systems are welcome,
but authors should be willing to discuss the details of their work.
If you have any questions, please contact us at:
arabicnlp-pc-chairs(a)sigarab.org
The ArabicNLP 2023 Publicity Chairs,
Amr Keleg and Salam Khalifa
On 8/3/23, Toms Bergmanis <toms.bergmanis(a)tilde.lv> wrote:
...
I, for one, have benefited from Ada's, as well as other members',
suggestions and comments, as I hope they have somehow benefited from
mine.
lbrtchx
1st Call for Papers: Special Issue of the Computational Linguistics journal
on Language Learning, Representation, and Processing in Humans and Machines
Guest Editors
Marianna Apidianaki (University of Pennsylvania)
Abdellah Fourtassi (Aix Marseille University)
Sebastian Padó (University of Stuttgart)
*Submission deadline: December 10, 2023*
Large language models (LLMs) acquire rich world knowledge from the data
they are exposed to during training, in a way that appears to parallel how
children learn from the language they hear around them. Indeed, since the
introduction of these powerful models, there has been a general feeling
among researchers in both NLP and cognitive science that a systematic
understanding of how these models work and how they use the knowledge they
encode, would shed light on the way humans acquire, represent, and process
this same knowledge (and vice versa).
Yet, despite the similarities, there are important differences between
machines and humans that have prevented a direct translation of insights
from the analysis of LLMs to a deeper understanding of human learning.
Chief among these differences is that the size of data required to train
LLMs far exceeds -- by several orders of magnitude -- the data children
need to acquire sophisticated conceptual structures and meanings. Besides,
the engineering-driven architectures of LLMs do not appear to have obvious
equivalents in children's cognitive apparatus, at least as studied by
standard methods in experimental psychology. Finally, children acquire
world knowledge not only via exposure to language but also via sensory
experience and social interaction.
This edited volume aims to create a forum of exchange and debate between
linguists, cognitive scientists and experts in deep learning, NLP and
computational linguistics, on the broad topic of learning in humans and
machines. Experts from these communities can contribute with empirical and
theoretical papers that advance our understanding of this question.
Submissions might address the acquisition of different types of linguistic
and world knowledge. Additionally, we invite contributions that
characterize and address challenges related to the mismatch between humans
and LLMs in terms of the size and nature of input data, and the involved
learning and processing mechanisms.
Topics include, but are not limited to:
- Grounded learning: comparison of unimodal (e.g., text) vs multimodal
(e.g., images and video) learning.
- Social learning: comparison of input-driven mechanisms vs.
interaction-based learning.
- Exploration of different knowledge types (e.g., procedural /
declarative); knowledge integration and inference in LLMs.
- Methods to characterize and quantify human-like language learning or
processing in LLMs.
- Interpretability/probing methods addressing the linguistic and world
knowledge encoded in LLM representations.
- Knowledge enrichment methods aimed at improving the quality and
quantity of the knowledge encoded in LLMs.
- Semantic representation and processing in humans and machines in terms
of, e.g., abstractions made, structure of the lexicon, property inheritance
and generalization, geometrical approaches to meaning representation,
mental associations, and meaning retrieval.
- Bilingualism in humans and machines; second language acquisition in
children and adults; construction of multi-lingual spaces and cross-lingual
correspondences.
- Exploration of language models that incorporate cognitively plausible
mechanisms and reasonably-sized training data.
- Use of techniques from other disciplines (e.g., neuroscience or
computer vision) for analyzing and evaluating LLMs.
- Open-source tools for analysis, visualization, or explanation.
Submission Instructions
Papers should be formatted according to the Computational Linguistics style
guidelines: https://cljournal.org/
We accept both long and short papers. Long papers are between 25 and 40
journal pages in length; short papers are between 15 and 25 pages in length.
Papers for this special issue will be submitted through the CL electronic
submission system, just like regular papers:
https://cljournal.org/submissions.html
Authors of special issue papers will need to select "Special Issue on LLRP"
under the Journal Section heading in the CL submission system. Please note
that papers submitted to a special issue undergo the same reviewing process
as regular papers.
Timeline
Deadline for submissions: December 10, 2023
Notification after 1st round of reviewing: February 10, 2024
Revised versions of the papers: April 30, 2024
Final decisions: June 10, 2024
Final version of the papers: July 1, 2024
Guest Editors
Marianna Apidianaki
marapi(a)seas.upenn.edu
Abdellah Fourtassi
abdellah.fourtassi(a)gmail.com
Sebastian Padó
pado(a)ims.uni-stuttgart.de
*Computational Linguistics* is the longest-running flagship journal of the
Association for Computational Linguistics. The journal has a high impact
factor: 9.3 in 2022 and 7.778 in 2021. Average time to first decision of
regular papers and full survey papers (excluding desk rejects) is 34 days
for the period January to May 2023, and 47 days for the period January to
December 2022.
--
This email was sent from my smartphone. Forgive the brevity, the typos, and
the lack of nuance.
(apologies for cross-posting)
-----
*Workshop for NLP Open Source Software (NLP-OSS)*
06 Dec 2023, Co-located with EMNLP 2023
https://nlposs.github.io/
Deadline for Long and Short Paper submission: *09 August, 2023 (23:59,
GMT-11)*
-----
You have tried the latest, bestest, fastest LLMs, bore grievances, but
found the solution after hours of coffee and computer staring. Share that
at NLP-OSS and suggest how open source could change for the better (e.g.,
best practices, documentation, API design, etc.).
You came across an awesome SOTA system on NLP task X whose F1 score no LLM
has beaten. But now the code is stale and it takes a dinosaur to understand
it. Share your experience at NLP-OSS and propose how to "replicate" these
forgotten systems.
You saw this shiny GPT in a blog post, tried to reproduce similar results
on a different task, and it just doesn't work on your dataset. You did some
magic to the code and now it works. Show us how you did it! Small tweaks,
when well motivated and empirically tested, are valid submissions to
NLP-OSS.
You have tried 101 NLP tools and there's none that really does what you want.
So you wrote your own shiny new package and made it open source. Tell us
why your package is better than the existing tools. How did you design the
code? Is it going to be a one-time thing? Or would you like to see
thousands of people using it?
You have heard enough about open-source LLMs and pseudo-open-source GPTs,
but not enough about how they can be used for your use case or your
commercial product at scale. So you contacted your legal department, and
they explained to you how data, model, and code licenses work. Share that
knowledge with the NLP-OSS community.
You have a position/opinion to share about free vs open vs closed source
LLMs and have valid arguments, references or survey/data to support your
position. We would like to hear more about it.
At last, you've found the avenue to air these issues on an academic
platform: the NLP-OSS workshop! Share your experiences, suggestions, and
analyses from/of NLP-OSS.
----
P.S.: 2nd Call for Papers
*Workshop for NLP Open Source Software (NLP-OSS)*
06 Dec 2023, Co-located with EMNLP 2023
https://nlposs.github.io/
Deadline for Long and Short Paper submission: 09 August, 2023 (23:59,
GMT-11)
------------------------------
The Third Workshop for NLP Open Source Software (NLP-OSS) will be
co-located with EMNLP 2023 on 06 Dec 2023.
Focusing more on the social and engineering aspects of NLP software and less
on scientific novelty or state-of-the-art models, the Workshop for NLP-OSS is
an academic forum to advance open source developments for NLP research,
teaching and application.
NLP-OSS also provides an academic workshop to announce new
software/features, promote the collaborative culture and best practices
that go beyond the conferences.
We invite full papers (8 pages) or short papers (4 pages) on topics related
to NLP-OSS broadly categorized into (i) software development, (ii)
scientific contribution and (iii) NLP-OSS case studies.
*Software Development*
- Designing and developing NLP-OSS
- Licensing issues in NLP-OSS
- Backwards compatibility and stale code in NLP-OSS
- Growing, maintaining and motivating an NLP-OSS community
- Best practices for NLP-OSS documentation and testing
- Contribution to NLP-OSS without coding
- Incentivizing OSS contributions in NLP
- Commercialization and Intellectual Property of NLP-OSS
- Defining and managing NLP-OSS project scope
- Issues in API design for NLP
- NLP-OSS software interoperability
- Analysis of the NLP-OSS community
*Scientific Contribution*
- Surveying OSS for specific NLP task(s)
- Demonstrations, introductions and/or tutorials of NLP-OSS
- Small but useful NLP-OSS
- NLP components in ML OSS
- Citations and references for NLP-OSS
- OSS and experiment replicability
- Gaps between existing NLP-OSS
- Task-generic vs task-specific software
*Case studies*
- Case studies of how a specific bug is fixed or feature is added
- Writing wrappers for other NLP-OSS
- Writing open-source APIs for open data
- Teaching NLP with OSS
- NLP-OSS in the industry
Submissions should be formatted according to the EMNLP 2023 templates
<https://2023.emnlp.org/call-for-papers> and submitted to OpenReview
<https://openreview.net/group?id=EMNLP/2023/Workshop/NLP-OSS>.
ORGANIZERS
Geeticka Chauhan, Massachusetts Institute of Technology
Dmitrijs Milajevs, Grayscale AI
Elijah Rippeth, University of Maryland
Jeremy Gwinnup, Air Force Research Laboratory
Liling Tan, Amazon
Toms,
No, not my arrogance, but my expertise is outstanding.
To my background:
before my graduate-level theoretical linguistics curriculum (Chomskyan
lineage) in the 1990s [1], I'd spent about 1-2 decades in multilingual,
international environments, making keen observations and reflections on
various language, cultural/social phenomena. After graduation, I have
traveled to all 7 continents to continue with my linguistic and
philosophical observations and learning (aka fieldwork). I have studied in
3 continents and had about 5 rounds of graduate training [2]. I have
learned about 10+(?) languages/varieties
(EN, ZH, FR, ES, RU, DE, LA, NL, IT, JA, ASL) and dabbled in a few others
(Sanskrit, Ancient Greek, AR...) and did fieldwork on/with a couple more
(Zapotec, various varieties of PNG, Tok Pisin...) --- I mean, I don't
remember much/any of these, it's been quite a while... now that I am/was
(?) [3] about to retire. In the 2000s, I revisited "Linguistics proper"
from another perspective, including but not limited to what you may know as
"NLP" nowadays. I did not start publishing until the 2010s, so I assume
that's when one might have become familiar with my work (assuming that they
have read it).
[1] doing original research, including but not limited to something similar
to G2P work, and on what one could consider as writing pseudo code for
computational systems (so, Computational Linguistics)
[2] So to have finally come up with the experimental results I did and to
have figured out what "language complexity" (in both the context of
computing and not) as well as various other DL/NN phenomena were all about,
I surely think it deserves a celebration!
[3] Recent happenings seem to suggest that I should stay in the arena to
keep an eye out on things. Indeed, as you might have liked to suggest,
there are plenty of "NLP practitioners" out there nowadays who think they
are qualified to work on "language" just because they can speak one. But if
you need to pick a battle about that with me, I'm afraid you might have
picked the wrong person.
So I hope you could consider me "not-a-noob". Being a woman in STEM/tech
can be hard, but I didn't realize how little benefit of the doubt some
choose to afford. I write this not only because of the tone of your
inquiry, but also because it is hard not to take offense at what you
actually wrote, including this: "yet that did not put you off from writing
bogus papers on machine translation". Which part of my work do you regard
as "bogus papers on MT"?
***
Re "The priority of my communications here is to clarify the part on the
scientific front, to make sure that if one happens to have gotten oneself
involved in this space, how one can come to more clarity on the status quo,
esp. given my results."
Is this about your "results" in that one paper...":
No, it is not only about:
i. my results in "Fairness in representation for multilingual NLP" (
https://openreview.net/forum?id=-llS6TiOew or
https://drive.google.com/file/d/1eKbhdZkPJ0HgU1RsGXGFBPGameWIVdt9/view?pli=1),
or
ii. those in "Representation and Bias in Multilingual NLP" (
https://openreview.net/forum?id=dKwmCtp6YI), or
iii. some of the remarks on language matters and current practices in the
"language space" I've tweeted --- some of which further explain my solution
to language complexity, some advance arguments from traditional academic
debates, some on meta-theoretical and transdisciplinary development, some
on possible future directions... etc.,
but also
iv. some of the dependencies or potential impact of my results, if one were
to take their work with responsibility and integrity.
To better understand the *impact* of my results (as in, why they are
important), it'd be helpful for one to have knowledge/experience of the
language space (or the "language enterprise" as Noam sometimes
refers/referred to it) --- to understand the development and the
dependencies (and the lack thereof) between Linguistics, Computational
Linguistics, Natural Language Processing, Computing or Computer Science
and/or Computational Sciences, Statistics/Mathematics, Social Sciences,
Information Theory, Philosophy... etc. [4] (both as academic disciplines
and as sciences in their own right).
[4] because *language*! (But if I had to opine on which areas are to be
impacted most directly with my findings, as in paradigm-shifting type of
impact, I'd probably say/write the first 3 or 4, i.e. Lx, CL, NLP,
CS/CS-related.)
Re "For anyone wanting to continue this discussion, I strongly recommend
reading Ada's work, so you have an informed opinion about what evidence she
is referring to.":
Thanks for your help in promoting my work. Yes, I think it'd be helpful for
everyone to read my work, whether they'd like to partake in a conversation
on it or not. Please feel free to send me any questions you may have ---
it's been a few really intense years for me. I don't always know if/how my
writing has been understood. I'd be grateful for your feedback.
Best
Ada
On Thu, Aug 3, 2023 at 5:42 PM Toms Bergmanis <toms.bergmanis(a)tilde.lv>
wrote:
> Ada,
>
>
>
> "it is not the right time right now to be "campy" about (as in, to be
> arguing/protesting for) "grammar", at the moment, esp. if you do not have a
> background in Linguistics."
>
> Your arrogance is outstanding. I will ask again, as I have asked before -
> what background do you have? Last time I checked, I could not find any
> evidence of your background in NLP, yet that did not put you off from
> writing bogus papers on machine translation.
>
>
>
> "The priority of my communications here is to clarify the part on the
> scientific front, to make sure that if one happens to have gotten oneself
> involved in this space, how one can come to more clarity on the status quo,
> esp. given my results."
>
> Is this about your "results" in that one paper evaluating which data
> representation is better in machine translation without actually
> considering machine translation quality? It sounds like something that
> everyone should read before engaging in a debate with you.
>
>
>
> For anyone wanting to continue this discussion, I strongly recommend
> reading Ada's work, so you have an informed opinion about what evidence she
> is referring to.
>
> Sincerely,
>
> Toms Bergmanis
> ------------------------------
>
> *From:* Ada Wan via Corpora <corpora(a)list.elra.info>
> *Sent:* Wednesday, August 2, 2023 7:17:42 PM
> *To:* Albretch Mueller <lbrtchx(a)gmail.com>
> *Cc:* corpora <corpora(a)list.elra.info>
> *Subject:* [Corpora-List] Re: Any literature about tensors-based corpora
> NLP research with actual examples (and homework ;-)) you would suggest? ...
>
>
>
> Re RML or any "text technologies" leveraging "grammar" (misnomer or not):
>
> it is not the right time right now to be "campy" about (as in, to be
> arguing/protesting for) "grammar", at the moment, esp. if you do not have a
> background in Linguistics.
>
> There has been quite some abuse/misconduct with concepts/units/assumptions
> such as "words", "sentences", and "grammar" in the language space (with or
> without computational implementation).
>
>
>
> The priority of my communications here is to clarify the part on the
> scientific front, to make sure that if one happens to have gotten oneself
> involved in this space, how one can come to more clarity on the status quo,
> esp. given my results. There is a lot that needs to be re-evaluated and
> re-interpreted. Simply stating that something might have been useful in the
> past is not going to be helpful with going forward.
>
>
>
> If one is working in technologies with language/text data (e.g., in a
> user-based format/framework, and not working on "grammar" as a
> "linguistic"/philological pursuit), it is recommended that the name(s) of
> such technologies get updated --- if "grammar" [1] does not have to be
> mentioned or be involved, don't.
>
> [1] or, including but not limited to any of the following: "word",
> "sentence", "linguistic structure(s)", "meaning", "morphology", "syntax",
> "parsing", various terms related to parts of speech (e.g. "nouns",
> "verbs")....
>
>
>
> Re "BTW, regarding that "parsing" aspect, what is the term used to
> describe the gradual process of "terminological inception"?":
>
> conceptualization? Coining of terms?
>
> According to me, "lexical priming" is different from "terminological
> inception".
>
>
>
> Re "How could you clarify intersubjectivity?":
>
> https://en.wikipedia.org/wiki/Intersubjectivity :)
>
> Your question is way too broad, or requires an answer that is such, which
> I cannot entertain at the moment.
>
>
>
> Thanks for sharing your perspectives. I must admit I have not had time to
> digest all of your points. But this impression recurred in me as I was
> reading them:
>
> sometimes, I sense that when one claims some concepts are not universal
> (e.g. the ones mentioned in [1] above), others take it as that all concepts
> are categorically invalid. That is not what I intended to communicate (with
> all my papers, scientific work, and my comments here). It is an expert
> opinion/finding that I shared, upon some careful evaluation.
>
>
>
>
>
> On Tue, Aug 1, 2023 at 10:26 PM Albretch Mueller via Corpora <
> corpora(a)list.elra.info> wrote:
>
> On 7/31/23, Ada Wan <adawan919(a)gmail.com> wrote:
> > That having been expressed, here are a couple of points re RML that one
> should pay heed:
> > i. to what extent and in what context is this a technology relevant?
>
> If you were able to devise an algorithm which, taking as input only NL
> texts (composed of: a) a start (semantic end); b) a sequence of
> characters from a relatively large and representative text bank; c) an
> end (a semantic start)) is able to exhaustively "deduce" the grammar
> of such texts, in addition to being able to use it with any language,
> you would then:
>
> 1) have defined a "space"/"coordinate system" for those texts, to
> frame (pretty much) all possible "meaningful 'points'"/"phrases" in
> terms of such grammar, which would also;
> 2) be a 0-search structure describing the text bank/corpus (every
> text segment would also become a pointer to every single actualization
> of that very segment in all texts, no more "n-grams" necessary!),
> which could;
> 3) be used with minimal turking/supervision to:
> 3.1) clean up all automatic translations from youtube;
> 3.2) keep multilingual corpora;
> 3.3) use it for automatic translations (demonstrably, in an almost
> foolproof, perfect way, since you always have the words/phrases with
> their context);
> 3.4) "cosmic/tree reading": instead of reading books/sequences of
> characters, you would read that text as it relates to all other texts
> from the same topic;
> 3.5) parsing: you would keep a corpus of what you know so you won't
> have to reread about certain topics and aspects you already know
> (great Lord! how I hate reading a whole book to only find a few, at
> times marginal, sentences worth reading! or that "youthful" thing of
> thinking that they just discovered/created an idea because they are
> just verbalizing it or made a movie about it!) BTW, regarding that
> "parsing" aspect, what is the term used to describe the gradual
> process of "terminological inception"? I have heard the term
> "Adamization", but, even though that word doesn't really rub me the
> wrong way, I could imagine it is "too sexist" to some people. I
> wouldn't really care calling it Eveization or "pussyfication" or
> whatever. I just don't want to use the term that the government uses:
> "lexical priming" and "terminological inception" sounds too cumbersome
> as a verb: "terminologically incept"? doesn't sound OK in English;
> 3.6) of course, an easy application of that contextual parsing would
> be removing all that js crap and ads before they reach your awareness;
> ...
> 3.n) not last and definitely not least I am thinking hard about how
> to make sure police and politicians at least have a hard time while
> using what I have described to "freedom love" people (I know, I know,
> ... "3.n" doesn't "technically" pertain to quality of implementation
> issues ..., but I, for one, disagree. Given the "all tangible things"
> (tm) panopticon in which we are all living these days, each of us in
> one's own "virtual prison cell" to call it somehow, we should also
> think about, be openly honest about such matters)
>
> I am working right now on such Leibnizian "characteristica
> universalis" kind of thing. First cleansing approx. 1.2 million texts
> mostly from archive.org, *.pub and the NYS Regents exams
> (nysedregents.org + nysl.ptfs.com) which they have, at least
> partially, translated to more than 10 languages. Is that relevant
> enough to you? ;-) I am also being quite selfish about it because I
> have always dreamed of being able to "read"/mind all texts which have
> ever been written in the same way that teens think they have to have
> sex with everybody in town to make sense of things.
>
> > ii. one can certainly dissect/decompose texts ...
>
> Computing power has become insanely cheap, but it has also enabled
> too much "cleverhansing" out there. The Delphic phrase: "you can make
> sense or money" these days translates as some sort of corollary to:
> "using computers and then thinking about it makes you smart"; but,
> does it really?
>
> It amazes me how easily you can "dissect"/"decompose texts", talk
> about "tensors", "vectors", ... (I am not trying to police language
> usage, it just amazes me); let alone all the insufferable bsing claims
> by the "Artificial Intelligentsia".
>
> I would go with one character after the other and an open attempt to
> use the minimal amount of principles to then see what I get. IMO, when
> you start getting too smart about what you do, of course, you will
> "see" how smart you are. The poet in me likes Borges' stanzas: "... el
> nombre es arquetipo de la cosa, en las letras de 'rosa' está la rosa y
> todo el Nilo en la palabra 'Nilo'" ("its name is a thing's archetype,
> in the letters of 'rose' is the rose and the whole of the Nile (river)
> in the word 'Nile'")
>
> > II. Re ""magical" in the sense that when we go about our intersubjective
> business": some intersubjectivity can be further clarified. I don't see
> much of your examples as being "magical".
>
> I actually do! How could you clarify intersubjectivity? I am trying
> to do so (somewhat) Mathematically (to the extent you could). Could
> you share any papers, "prior art" on such matters?
>
> > ii. "other people may read, mind, as well ...;": so?
>
> which is a good thing, I meant --- alright, fine and dandy, in the
> hippie way.
>
> > iii. "Alice bought some veggies from Bob, ...)": this I don't understand.
> > iv. "We see more in money ("words", ...) than just a piece of paper"
>
> iii. and iv. overlap to some extent so I will try to explain them
> both quickly (which is impossible since you can write philosophies
> about each line, but there I'll go). To understand what Marx (may
> have) meant by „gesellschaftlich notwendige Arbeit” ("socially
> necessary labour time", wording which has made quite a few go berserk
> ever since):
>
> https://en.wikipedia.org/wiki/Socially_necessary_labour_time
>
> https://en.wikipedia.org/wiki/Transformation_problem
>
> you have to understand the basic mathematical concepts of:
>
> a) combined rates, and
> b) intratextual systems of linear equations
>
> Based on my teaching experience §b is easier to understand. Sorry I
> couldn't find an "easier" explanation on youtube of that type of SLEs
> than the one I used with my students preparing for the Regents:
>
> https://ergosumus.files.wordpress.com/2018/10/sle04-en.pdf
>
> the intratextuality of those problems matters to corpora research
> because the different strata of "like terms" ("verbs", "adjectives", ...)
> are what create grammar. "Crazy me" thinks you could to some extent
> describe the "likeness of terms" underlying grammar!
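The "intratextual systems of linear equations" referred to above are ordinary simultaneous equations. As a minimal sketch (the 2x2 system and its numbers are my own toy illustration, not taken from the linked PDF), grouping the like terms of a word problem is exactly what makes such a system solvable:

```python
# Solve ax + by = e, cx + dy = f by Cramer's rule (pure Python).
# Example system (illustrative): 2x + 3y = 12 and x - y = 1.
def solve_2x2(a, b, c, d, e, f):
    det = a * d - b * c  # determinant of the coefficient matrix
    if det == 0:
        raise ValueError("system is singular")
    return (e * d - b * f) / det, (a * f - e * c) / det

x, y = solve_2x2(2, 3, 1, -1, 12, 1)
print(x, y)  # 3.0 2.0
```

Each equation collects one stratum of "like terms" (the x's with the x's, the y's with the y's); the email's analogy is that grammar likewise groups comparable terms into strata.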
> ~
> I also have a guideline about combined rates which I successfully
> used with my students:
>
> https://ergosumus.files.wordpress.com/2018/06/word_problems12-en00.pdf
> ~
> What the eff do combined rates and SLEs have to do with Marx's
> transformation problem? ;-)
>
> Well, notice that the -equitable aspect- used to solve combined rates
> problems is the time (regardless of how differently fast one "works"
> in comparison with others). There is also another type of combined-rates
> problem: you drive to some place with a friend who doesn't care about
> driving fast, but you need to rest so she drives for a while ... that
> problem is different from two people meeting at a place, each driving
> their own car (at their own average speed).
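The two combined-rates situations just described can be sketched in a few lines (a toy illustration; the function names and numbers are mine, not from the guideline PDF): when parties work simultaneously their rates add and the elapsed time is the shared "equitable aspect", whereas when they take turns the distances split and the partial times add.

```python
# Case 1: simultaneous work --- rates add, elapsed time is shared
# (e.g., two workers finishing 1/6 and 1/3 of a job per hour).
def time_working_together(rates):
    return 1.0 / sum(rates)

# Case 2: taking turns (one car, two drivers) --- driver A covers
# distance_a at speed_a, driver B covers the rest at speed_b.
def shared_trip_time(total_distance, speed_a, speed_b, distance_a):
    return distance_a / speed_a + (total_distance - distance_a) / speed_b

print(time_working_together([1/6, 1/3]))    # 2.0 (hours)
print(shared_trip_time(300, 100, 50, 100))  # 5.0 (hours): 1 + 4
```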
>
> Serge Heiden shared a paper about presidential debates which could be
> also Mathematically studied as a CR kind of problem (even if
> politicians as the crowd management clowns they all are don't have to
> make sense, anyway), but as it happens with any dialogue there are
> parts of the conversations in which both the cars and the time are
> shared and other times when only (or more of) the time. I don't know
> of a general Mathematical formulation to CRs kinds of problems, which
> could be used for corpora research. On my "to do" list I have writing
> papers studying Euclid's Elements and Plato's Dialogues in that way.
>
> Karl Marx, as part of his „Wertgesetz der Waren” ("law of value of
> commodities", rechristened in English as the "labor theory of value"),
> somewhat metaphorically stated
> that the exchange value of a commodity is a function of "society's
> labour-time". He also rendered his ideas as equations (in more of a
> verbally descriptive, metaphorical way), but that phrase: "society's
> labour-time", was and is still found from questionable to
> unfalsifiably wild. I don't claim to have mind-reading powers, but I
> think that in his letter to his friend Ludwig Kugelmann, the
> thoroughgoing Hegelian that Marx was clearly explained what he meant
> (page 222 in the file, 208 in the book):
>
>
> https://archive.org/download/marxengelsselectedcorrespondence/Marx%20%26%20…
>
> Marx To Ludwig Kugelmann In Hanover London, July 11, 1868:
> All that palaver about the necessity of proving the concept of value
> comes from complete ignorance both of the subject dealt with and of
> scientific method. Every child knows that a nation which ceased to
> work, I will not say for a year, but even for a few weeks, would
> perish. Every child knows, too, that the masses of products
> corresponding to the different needs require different and
> quantitatively determined masses of the total labour of society. That
> this necessity of the distribution of social labour in definite
> proportions cannot possibly be done away with by a particular form of
> social production but can only change the mode of its appearance, is
> self-evident. No natural laws can be done away with. What can change
> in historically different circumstances is only the form in which
> these laws assert themselves. And the form in which this proportional
> distribution of labour asserts itself, in a state of society where the
> interconnection of social labour is manifested in the private exchange
> of the individual products of labour, is precisely the exchange value
> of these products.
> ~
> So, as I see it, in a Hegelian way, Marx was seeing the whole of
> society as a corpus (in which we all live through our own
> texts/narratives) talking about "socially necessary labour time" in
> the way that "time" becomes the equitable aspect shared when
> people/(-society as a whole-) work together as described by combined
> rates kinds of problems.
>
> When "Alice buys some veggies from Bob, ..." she used money as
> "equitable aspect" to get Bob's veggies (in the Marxian way they were
> both part of a combine rates problem) and you tell me this is not
> magical!
>
> > v. "some transactional electronic ("air"...) excitations": I don't get
> this.
>
> you may pay with cash using coins or bills or using your debit card
> which at the end of the day become transactional electronic
> excitations on some hard drives. When you speak there is more to it
> than vibrations/fluctuations of air. (I am referring to the medium
> which Saussurean signifiers use)
>
> > vi. "your 'magic' and mine are different we are still able to
> 'communicate'. How on earth do such things happen?": a disclaimer: I am not
> using any magic in my attempts to communicate with you here. I try my best
> to place myself in your shoes to guesstimate the points that you are trying
> to get across. But many (as you can see above) didn't quite reach me.
>
> "I try my best to place myself in your shoes" ... ;-) Ha, ha, ha!
> that is just a functional illusion. What do you know about "my shoes"?
> I work as a gardener (which I love to do) so they are dirty and
> smelly, ... I also love to eat garlic ... As I see things standing on
> "my dirty and smelly shoes and voicing it from my garlicky mouth"
> being honest and true to matters is good enough.
>
> lbrtchx
> _______________________________________________
> Corpora mailing list -- corpora(a)list.elra.info
> https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
> To unsubscribe send an email to corpora-leave(a)list.elra.info
>
>
>
> [Apologies for cross-posting]
> ======================================================================
> FINAL DEADLINE OF PAPER SUBMISSION - **AUGUST 10**
> ======================================================================
>
> SIMBig 2023 - 10th International Conference on Information Management and Big Data
> Where: Instituto Politécnico Nacional, Mexico DF, MEXICO
> When: October 18 - 20, 2023
> Website: https://simbig.org/SIMBig2023/
>
> ======================================================================
>
> OVERVIEW
> ----------------------------------
>
> SIMBig 2023 seeks to present new methods of Artificial Intelligence (AI), Data Science, Machine Learning, Natural Language Processing, Semantic Web, and related fields, for analyzing, managing, and extracting insights and patterns from large volumes of data.
>
>
> KEYNOTE SPEAKERS (to be confirmed)
> ----------------------------------
>
> Mona Diab, Meta AI, USA
> Huan Liu, Arizona State University, USA
>
> and more to be announced soon...
>
> IMPORTANT DATES
> ----------------------------------
>
> August 10, 2023 (extended from July 24, 2023) --> Full papers and short papers due
> August 28, 2023 --> Notification of acceptance
> September 10, 2023 --> Camera-ready versions
> October 18 - 20, 2023 --> Conference held in Mexico DF, Mexico
>
> PUBLICATION
> ----------------------------------
>
> All accepted papers of SIMBig 2023 (all tracks included) will be published in the Springer CCIS Series <https://www.springer.com/series/7899> (to be confirmed).
>
> Best papers of SIMBig 2023 (all tracks included) will be selected to submit an extended version for publication in the Springer SN Computer Science Journal <https://www.springer.com/journal/42979>.
 
> TOPICS OF INTEREST
> ----------------------------------
>
> SIMBig 2023 has a broad scope. We invite contributions on theory and practice, including but not limited to the following technical areas:
>
> Artificial Intelligence
> Big/Massive Data
> Data Science
> Machine Learning
> Deep Learning
> Natural Language Processing
> Semantic Web
> Data-driven Software Engineering
> Data-driven software adaptation
> Healthcare Informatics
> Biomedical Informatics
> Data Privacy and Security
> Information Retrieval
> Ontologies and Knowledge Representation
> Social Networks and Social Web
> Information Visualization
> OLAP and Business intelligence
> Crowdsourcing
>
> SPECIAL TRACKS
> ----------------------------------
>
> SIMBig 2023 proposes the following special tracks in addition to the main conference:
>
> ANLP <https://simbig.org/SIMBig2023/en/anlp.html> - Applied Natural Language Processing
> DISE <https://simbig.org/SIMBig2023/en/dise.html> - Data-Driven Software Engineering
> EE-AI-HPC <https://simbig.org/SIMBig2023/en/eeaihpc.html> - Efficiency Enhancement for AI and High-Performance Computing
> SNMAM <https://simbig.org/SIMBig2023/en/snmam.html> - Social Network and Media Analysis and Mining
>
> CONTACT
> ----------------------------------
>
> SIMBig 2023 General Chairs
>
> Juan Antonio Lossio-Ventura, National Institutes of Health, USA (juan.lossio(a)nih.gov <mailto:juan.lossio@nih.gov>)
> Hugo Alatrista-Salas, Pontificia Universidad Católica del Perú, Peru (halatrista(a)pucp.pe <mailto:halatrista@pucp.pe>)
Research and teaching position in Computational Linguistics
Department of Language Science and Technology
Saarland University, Saarbrücken, Germany
Start date: late 2023/early 2024
Contract duration: 3 years (can be extended)
Payscale: E13 100% (postdoc) / E13 75% (PhD student)
https://www.coli.uni-saarland.de/~koller/page.php?id=jobs
We are looking to fill a research and teaching position in computational linguistics at the Department of Language Science and Technology at Saarland University. The position is part of the research group of Prof. Alexander Koller. It offers great flexibility in developing your own research and teaching agenda, and collaborations with other research groups are encouraged.
The position is flexible with respect to topic, but it should connect thematically with current topics of interest to the research group. These include semantic parsing, reasoning with LLMs (e.g. planning, chain-of-thought), personalized dialogue and language generation, and the use of neurosymbolic models in NLP. You should have expertise in neural and/or linguistically principled methods in computational linguistics and be willing to take an active role in shaping the research and teaching environment of the department.
The position includes a teaching load of up to four hours per week in the BSc Computational Linguistics (in German) and/or the MSc Language Science and Technology (in English). Both programs attract excellent and highly motivated students; it is not unusual for our students to publish papers at peer-reviewed conferences before graduation. The MSc students in particular are a very international crowd, with two thirds joining us from abroad. You will typically teach two seminars per semester on topics of your choice, which will allow you to motivate students to do BSc and MSc theses under your supervision.
This is a position on the German TV-L E13 scale (100% position at the postdoc level; 75% position at the PhD student level). The starting salary of a 100% TV-L E13 position is a bit over 50,000 Euros per year and increases with experience. The initial appointment will be for three years; the position can be extended up to the limits of the German law for academic contracts (WissZeitVG). The starting date could be late 2023 or early 2024; we would be willing to adapt to the time requirements of an ideal candidate.
Requirements
We are looking for candidates who have finished, or are about to complete, an excellent PhD degree (at the postdoc level) or MSc degree (at the PhD student level) in computational linguistics, computer science, or a related discipline. You must be proficient in English (spoken and written); the ability to teach in German is a plus.
The position is primarily intended for applicants at the postdoc level, who should have demonstrated their research expertise through high-quality publications. We will consider applicants at the PhD level in exceptional cases.
About the department
Saarland University is one of the leading centers for computational linguistics in Europe, and offers a dynamic and stimulating research environment. The Department of Language Science and Technology consists of about 100 research staff in nine research groups in the fields of computational linguistics, psycholinguistics, speech processing, and corpus linguistics.
The department is a core member of the new Research Training Group "Neuroexplicit Models of Language, Vision, and Action", which is on track to grow into one of the largest centers for research on neurosymbolic models in NLP and other fields of AI in the world. It is also the centerpiece of the Collaborative Research Center 1102 "Information Density and Linguistic Encoding" and part of the Saarland Informatics Campus, which brings together computer science research at the university with world-class research institutions on campus, such as the Max Planck Institute for Informatics, the Max Planck Institute for Software Systems, and the German Research Center for Artificial Intelligence (DFKI). The Saarland Informatics Campus brings together 900 researchers and 2100 students from 81 countries; SIC faculty have won 36 ERC grants.
Saarland University is located in Saarbrücken, a mid-sized city in the tri-border area of Germany, France, and Luxembourg. Saarbrücken combines a lively culture scene with a relaxed atmosphere, and is quite an affordable place to live in. Our department maintains an international and diverse work environment. The primary working language is English; learning German while you are here will make it easier to connect with the local culture, but is not necessary for your work.
How to apply
Please submit your application at http://apply.coli.uni-saarland.de/ak23. Preference will be given to applications received by 31 August 2023.
Include a single PDF file with the following information:
• a statement of research interests that motivates why you are applying for this position and outlines your research agenda;
• a full CV including your list of publications;
• scans of transcripts and academic degree certificates;
• the names, affiliations, and e-mail addresses of two people who can provide letters of reference for you.
Saarland University especially welcomes applications from women and people with disabilities.
If you have further questions, please email Alexander Koller <koller(a)coli.uni-saarland.de>. Applications should _not_ be emailed to this address, but submitted through the online form.
We are expanding our core research team at the University of Bonn, looking
for a Postdoctoral Researcher in Natural Language Processing and Machine
Learning.
(The position is offered at a full-time TV-L E13 level, corresponding to
the gross salary of ca. 57,000 EUR/year, covering health and social
insurance and 30 vacation days. The duration of the contract is two years,
with career growth opportunities afterwards.)
This exciting opportunity is a chance to work on adversarial robustness,
safety and explainability in machine learning, applied to modern LLMs. This
position will involve close collaboration with the Lamarr Institute for
Machine Learning and Artificial Intelligence (https://lamarr-institute.org/),
the Fraunhofer Institute for Intelligent Analysis and Information Systems
(IAIS) and the OpenGPT-X initiative (https://opengpt-x.de/). The candidate
will have a central role in a project aiming to advance the
state-of-the-art in the robustness and generalization capabilities of LLMs.
Sounds like fun? Apply here:
https://caisa-lab.github.io/resources/13-08-2023-postdoc-position.pdf
You should have a strong background in Computer Science with a
specialization in Machine Learning or Natural Language Processing, and a
corresponding publication record in major AI/ML/NLP venues. Demonstrably
excellent Python programming skills (e.g. through previous projects) and
knowledge of current neural network models and implementation tools for
neural networks (e.g., PyTorch) are expected.
Our team consists of top researchers of varied backgrounds and cultures and
we welcome applications from all appropriately qualified candidates
worldwide!
____________________
Prof. Dr. Lucie Flek
Data Science and Language Technologies
Institut für Informatik / b-it
Rheinische Friedrich-Wilhelms-Universität Bonn
Friedrich-Hirzebruch-Allee 6 / 8, Raum 2.123
53115 Bonn, Germany
Tel.: 0228-73-69200
flek(a)bit.uni-bonn.de
On 7/31/23, Ada Wan <adawan919(a)gmail.com> wrote:
> That having been expressed, here are a couple of points re RML that one should pay heed to:
> i. to what extent and in what context is this technology relevant?
If you were able to devise an algorithm which, taking as input only NL
texts (composed of: a) a start (semantic end); b) a sequence of
characters from a relatively large and representative text bank; c) an
end (a semantic start)), is able to exhaustively "deduce" the grammar
of such texts, in addition to being able to use it with any language,
you would then:
1) have defined a "space"/"coordinate system" for those texts, to
frame (pretty much) all possible "meaningful 'points'"/"phrases" in
terms of such grammar, which would also;
2) be a 0-search structure describing the text bank/corpus (every
text segment would also become a pointer to every single actualization
of that very segment in all texts, no more "n-grams" necessary!),
which could;
3) be used with minimal turking/supervision to:
3.1) clean up all the automatic translations from YouTube;
3.2) keep multilingual corpora;
3.3) use it for automatic translations (demonstrably, in an almost
foolproof, perfect way, since you always have the words/phrases with
their context);
3.4) "cosmic/tree reading": instead of reading books/sequences of
characters, you would read a text as it relates to all other texts
on the same topic;
3.5) parsing: you would keep a corpus of what you know so you won't
have to reread about certain topics and aspects you already know
(great Lord! how I hate reading a whole book to only find a few, at
times marginal, sentences worth reading! or that "youthful" thing of
thinking that they just discovered/created an idea because they are
just verbalizing it or made a movie about it!) BTW, regarding that
"parsing" aspect, what is the term used to describe the gradual
process of "terminological inception"? I have heard the term
"Adamization", but, even though that word doesn't really rub me the
wrong way, I could imagine it is "too sexist" to some people. I
wouldn't really mind calling it Eveization or "pussyfication" or
whatever. I just don't want to use the term that the government uses:
"lexical priming"; and "terminological inception" sounds too cumbersome
as a verb: "terminologically incept" doesn't sound OK in English;
3.6) of course, an easy application of that contextual parsing would
be removing all that js crap and ads before they reach your awareness;
...
3.n) not last and definitely not least I am thinking hard about how
to make sure police and politicians at least have a hard time while
using what I have described to "freedom love" people (I know, I know,
... "3.n" doesn't "technically" pertain to quality of implementation
issues ..., but I, for one, disagree. Given the "all tangible things"
(tm) panopticon in which we are all living these days, each of us in
one's own "virtual prison cell" to call it somehow, we should also
think about, be openly honest about such matters)
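As an aside, the "0-search structure" of point 2 above can be illustrated, very roughly, with a plain positional index: every character segment becomes a pointer to every actualization of itself across the text bank, so looking a segment up needs no search at all. This is only my sketch of the idea (the toy texts and the length cap are my own; a suffix-array-style structure would avoid the cap):

```python
from collections import defaultdict

def build_index(texts, max_len=4):
    """Map every character segment (up to max_len chars) to all of
    its occurrences, as (text_id, offset) pairs."""
    index = defaultdict(list)
    for tid, text in enumerate(texts):
        for i in range(len(text)):
            for j in range(i + 1, min(i + max_len, len(text)) + 1):
                index[text[i:j]].append((tid, i))
    return index

idx = build_index(["la rosa", "el nilo"])
print(idx["l"])     # [(0, 0), (1, 1), (1, 5)]
print(idx["rosa"])  # [(0, 3)]
```

Once built, every segment lookup is a dictionary access, which is the "no more n-grams necessary" point: the occurrences are stored once, with their positions, rather than recounted per n-gram order.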
I am working right now on such a Leibnizian "characteristica
universalis" kind of thing, first cleansing approx. 1.2 million texts
mostly from archive.org, *.pub and the NYS Regents exams
(nysedregents.org + nysl.ptfs.com) which they have, at least
partially, translated to more than 10 languages. Is that relevant
enough to you? ;-) I am also being quite selfish about it because I
have always dreamed of being able to "read"/mind all texts which have
ever been written in the same way that teens think they have to have
sex with everybody in town to make sense of things.
> ii. one can certainly dissect/decompose texts ...
Computing power has become insanely cheap, but it has also enabled
too much "cleverhansing" out there. The Delphic phrase "you can make
sense or money" these days translates into some sort of corollary:
"using computers and then thinking about it makes you smart"; but
does it really?
It amazes me how easily you can "dissect"/"decompose texts", talk
about "tensors", "vectors", ... (I am not trying to police language
usage, it just amazes me); let alone all the insufferable bsing claims
by the "Artificial Intelligentsia".
I would go with one character after the other and an open attempt to
use the minimal amount of principles to then see what I get. IMO, when
you start getting too smart about what you do, of course, you will
"see" how smart you are. The poet in me likes Borges' stanzas: "... el
nombre es arquetipo de la cosa, en las letras de 'rosa' está la rosa y
todo el Nilo en la palabra 'Nilo'" ("its name is a thing's archetype,
in the letters of 'rose' is the rose and the whole of the Nile (river)
in the word 'Nile'")
> II. Re ""magical" in the sense that when we go about our intersubjective business": some intersubjectivity can be further clarified. I don't see much of your examples as being "magical".
I actually do! How could you clarify intersubjectivity? I am trying
to do so (somewhat) Mathematically (to the extent you could). Could
you share any papers, "prior art", on such matters?
> ii. "other people may read, mind, as well ...;": so?
Which is a good thing; it is alright, fine and dandy in the hippie way, I meant.
> iii. "Alice bought some veggies from Bob, ...)": this I don't understand.
> iv. "We see more in money ("words", ...) than just a piece of paper"
iii. and iv. overlap to some extent, so I will try to explain them
both quickly (which is impossible, since you could write philosophies
about each line, but here I'll go). To understand what Marx (may
have) meant by „gesellschaftlich notwendige Arbeit” ("socially
necessary labour time", wording which has made quite a few go berserk
ever since):
https://en.wikipedia.org/wiki/Socially_necessary_labour_time
https://en.wikipedia.org/wiki/Transformation_problem
you have to understand the basic mathematical concepts of:
a) combined rates, and
b) intratextual systems of linear equations
Based on my teaching experience, §b is easier to understand. Sorry I
couldn't find an "easier" explanation on YouTube of that type of SLEs
than the one I used with my students preparing for the Regents:
https://ergosumus.files.wordpress.com/2018/10/sle04-en.pdf
The intratextuality of those problems matters to corpora research
because the different strata of "like terms" ("verbs", "adjectives", ...)
are what create grammar. "Crazy me" thinks you could, to some extent,
describe the "likeness of terms" underlying grammar!
~
I also have a guideline about combined rates which I successfully
used with my students:
https://ergosumus.files.wordpress.com/2018/06/word_problems12-en00.pdf
~
What the eff do combined rates and SLEs have to do with Marx's
transformation problem? ;-)
Well, notice that the -equitable aspect- used to solve combined rates
problems is the time (regardless of how differently fast one "works"
in comparison with others). There is also another type of combined-rates
problems: you drive to some place with a friend who doesn't care about
driving fast, but you need to rest so she drives for a while ... that
problem is different from two people meeting at a place each driving
"on their own cars" (at their own average speed).
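A minimal sketch of the first type of combined-rates problem, where the shared time is the equitable aspect (illustrative numbers of my own):

```python
def combined_time(*solo_times):
    """Time for several workers to finish one job together: everyone
    works for the whole (shared) duration, so the rates add and
    t = 1 / (1/t1 + 1/t2 + ...)."""
    return 1.0 / sum(1.0 / t for t in solo_times)

# Alice paints a fence alone in 6 hours, Bob in 3: together, 2 hours.
print(combined_time(6, 3))  # 2.0
```

The second type mentioned above (taking turns driving) is different: only one person "works" at any moment, so the per-stint distances add up rather than the rates.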
Serge Heiden shared a paper about presidential debates which could be
also Mathematically studied as a CR kind of problem (even if
politicians as the crowd management clowns they all are don't have to
make sense, anyway), but as it happens with any dialogue there are
parts of the conversations in which both the cars and the time are
shared, and other times when only (or mostly) the time is. I don't know
of a general Mathematical formulation of CR kinds of problems which
could be used for corpora research. On my "to do" list I have writing
papers studying Euclid's Elements and Plato's Dialogues in that way.
Karl Marx, as part of his „Wertgesetz der Waren” ("law of the value of
commodities", rechristened in English as the "labor theory of value"),
somewhat metaphorically stated that the exchange value of a commodity
is a function of "society's labour-time". He also rendered his ideas
as equations (in more of a verbally descriptive, metaphorical way),
but that phrase, "society's labour-time", was and is still found
anywhere from questionable to unfalsifiably wild. I don't claim to
have mind-reading powers, but I think that in his letter to his friend
Ludwig Kugelmann, the thoroughgoing Hegelian that Marx was clearly
explained what he meant (page 222 in the file, 208 in the book):
https://archive.org/download/marxengelsselectedcorrespondence/Marx%20%26%20…
Marx to Ludwig Kugelmann in Hanover. London, July 11, 1868:
All that palaver about the necessity of proving the concept of value
comes from complete ignorance both of the subject dealt with and of
scientific method. Every child knows that a nation which ceased to
work, I will not say for a year, but even for a few weeks, would
perish. Every child knows, too, that the masses of products
corresponding to the different needs require different and
quantitatively determined masses of the total labour of society. That
this necessity of the distribution of social labour in definite
proportions cannot possibly be done away with by a particular form of
social production but can only change the mode of its appearance, is
self-evident. No natural laws can be done away with. What can change
in historically different circumstances is only the form in which
these laws assert themselves. And the form in which this proportional
distribution of labour asserts itself, in a state of society where the
interconnection of social labour is manifested in the private exchange
of the individual products of labour, is precisely the exchange value
of these products.
~
So, as I see it, in a Hegelian way, Marx was seeing the whole of
society as a corpus (in which we all live through our own
texts/narratives) talking about "socially necessary labour time" in
the way that "time" becomes the equitable aspect shared when
people/(-society as a whole-) work together as described by combined
rates kinds of problems.
When "Alice buys some veggies from Bob, ...", she uses money as the
"equitable aspect" to get Bob's veggies (in the Marxian way they are
both part of a combined-rates problem), and you tell me this is not
magical!
> v. "some transactional electronic ("air"...) excitations": I don't get this.
You may pay with cash, using coins or bills, or with your debit card,
which at the end of the day becomes transactional electronic
excitations on some hard drives. When you speak, there is more to it
than vibrations/fluctuations of air. (I am referring to the medium
which Saussurean signifiers use.)
> vi. "your 'magic' and mine are different we are still able to 'communicate'. How on earth do such things happen?": a disclaimer: I am not using any magic in my attempts to communicate with you here. I try my best to place myself in your shoes to guesstimate the points that you are trying to get across. But many (as you can see above) didn't quite reach me.
"I try my best to place myself in your shoes" ... ;-) Ha, ha, ha!
That is just a functional illusion. What do you know about "my shoes"?
I work as a gardener (which I love to do), so they are dirty and
smelly ... I also love to eat garlic ... As I see things, standing in
"my dirty and smelly shoes and voicing it from my garlicky mouth",
being honest and true to matters is good enough.
lbrtchx