There are all kinds of lists on Wikipedia of various kinds of authors: linguists, philosophers, mathematicians ... In most (almost all?) cases there is a brief page about the author's biography and their work.
As you would expect, some folks (isbndb.com) have come up with the great idea of selling you the air you breathe. exaly.com, WorldCat, freelibrary.org ... do a marginally better job, but I find their web interfaces too constraining and, "of course", there is no "download the whole damn thing" option.
What I am looking for is an openly and collectively maintained DB, a la Wikipedia, whose interface would let you download all search hits as well-formatted, parsable lines in a text file, without having to "click next", copy and paste, and all that kind of nonsense.
I could imagine someone in the corpora research community has taken the time to compile a database which IMO should include:
a) work:
   a.1) original name
   a.2) original language
   a.3) topical bags index
   a.4) received category index (a children's book, book review, degree thesis, article in a periodical, ...)
   a.5) publications:
      a.5.1) date
      a.5.2) RDF metadata including: language, "co-"authors (preface, those writing back-cover blurbs), editors, translators, ISBNs, publisher, copyright notice, ...
b) name(s):
   b.1) first/given name(s) (at birth)
   b.2) last name(s) (at birth)
   b.3) pen name(s)
   b.4) also known as
c) birth place
d) date of birth
e) languages
f) date of death
Author-work pairs should be prioritized. In the case of compilations of various author-work pairs in a single book, the compilation in which an article appears should be specified in the metadata.
Please let me know where I could find such a database (even a partial one) that can be downloaded. In case you don't know of such a general registry of published books/texts, which other entries do you think are important?
lbrtchx
What I am looking for is an openly and collectively maintained DB, a la Wikipedia, whose interface would let you download all search hits as well-formatted, parsable lines in a text file, without having to "click next", copy and paste, and all that kind of nonsense.
I could imagine someone in the corpora research community has taken the time to compile a database which IMO should include: ...
That sounds like a description of Wikidata; that is the kind of thing it was created to achieve, at least.
You can construct your own queries in SPARQL, which I've found quite hard, though I've not done much with it the past couple of years, so it may have got better.
The query builder will make it a bit easier: https://query.wikidata.org/querybuilder/?uselang=en It will still be a bit of work, finding out the property numbers for all the fields you want included.
(If you go down this route, and it works, maybe you can post back the SPARQL query you end up with.)
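As a very rough sketch of where you might end up, here is the kind of query that covers a few of the fields from your wish-list. The P- and Q-numbers below are quoted from memory, so double-check them on wikidata.org, and a query this broad will probably hit the timeout on the public endpoint unless you narrow it down first (to one author, one genre, etc.):

  SELECT ?work ?workLabel ?authorLabel ?langLabel ?pubDate ?isbn13 ?born ?died WHERE {
    ?work wdt:P31 wd:Q47461344 .              # instance of 'written work'
    ?work wdt:P50 ?author .                   # P50 = author
    OPTIONAL { ?work wdt:P407 ?lang . }       # P407 = language of work or name
    OPTIONAL { ?work wdt:P577 ?pubDate . }    # P577 = publication date
    OPTIONAL { ?work wdt:P212 ?isbn13 . }     # P212 = ISBN-13
    OPTIONAL { ?author wdt:P569 ?born . }     # P569 = author's date of birth
    OPTIONAL { ?author wdt:P570 ?died . }     # P570 = author's date of death
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  }
  LIMIT 100

Once a query runs, the query service page has a Download menu (CSV, TSV, JSON, if I remember right), which should cover the "parsable lines in a text file, no click-next" requirement.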
Darren
On 2/4/23, Darren Cook via Corpora corpora@list.elra.info wrote:
The query builder will make it a bit easier: https://query.wikidata.org/querybuilder/?uselang=en It will still be a bit of work, finding out the property numbers for all the fields you want included.
(If you go down this route, and it works, maybe you can post back the SPARQL query you end up with.)
I played with it for a little while and couldn't make sense of it. The best I could do was to get an exception, which at least showed me what they were using:
https://query.wikidata.org/querybuilder/?uselang=en
Wikidata Query Service
Query timeout limit reached
SPARQL-QUERY: queryStr=
SELECT DISTINCT ?item ?itemLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  {
    SELECT DISTINCT ?item WHERE {
      ?item p:P50 ?statement0.
      ?statement0 (ps:P50/(wdt:P279*)) _:anyValueP50.
    }
    LIMIT 100
  }
}
java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(FutureTask.java:205)
at com.bigdata.rdf.sail.webapp.BigdataServlet.submitApiTask(BigdataServlet.java:292)
at com.bigdata.rdf.sail.webapp.QueryServlet.doSparqlQuery(QueryServlet.java:678)
...
Hi Albretch,
given that Wikidata is far from being a complete source of knowledge about authors and their works (https://arxiv.org/abs/2212.13104) and has very strict timeout rules,
you can try this basic query and start building from it. (Please note that the wikibase:label service often leads to timeouts; a mitigation strategy may be to retrieve the labels in a separate step):
SELECT DISTINCT ?authorLabel ?workLabel ?workPage ?authorPage WHERE {
  { ?work wdt:P31 wd:Q7725634 } UNION { ?work wdt:P31 wd:Q47461344 } .  ### all instances of the type 'literary work' or 'written work'
  ?work wdt:P50 ?author .                                               ### the association between works and their authors
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }    ### the service for retrieving labels from Wikidata
  ?workPage schema:about ?work ;
            schema:isPartOf <https://en.wikipedia.org/> .               ### retrieve the Wikipedia page of the work
  ?authorPage schema:about ?author ;
              schema:isPartOf <https://en.wikipedia.org/> .             ### retrieve the Wikipedia page of the author
}
LIMIT 10
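As a rough sketch of the two-step alternative (untested, just to show the shape of it): first collect the bare Q-ids without the label service, then resolve labels for only those ids with VALUES in a second, much cheaper query:

  ### step 1: collect work/author pairs as plain Q-ids (no label service)
  SELECT DISTINCT ?work ?author WHERE {
    { ?work wdt:P31 wd:Q7725634 } UNION { ?work wdt:P31 wd:Q47461344 } .
    ?work wdt:P50 ?author .
  }
  LIMIT 1000

  ### step 2: resolve the labels only for the ids returned by step 1
  SELECT ?item ?itemLabel WHERE {
    VALUES ?item { wd:Q692 wd:Q1067 }   ### placeholders; paste the Q-ids from step 1 here
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  }

LIMIT/OFFSET can then be used to page through the full result set in batches, so the label service never becomes the bottleneck.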
On Sat, 4 Feb 2023 at 20:34, Albretch Mueller via Corpora <corpora@list.elra.info> wrote:
On 2/4/23, Darren Cook via Corpora corpora@list.elra.info wrote:
The query builder will make it a bit easier: https://query.wikidata.org/querybuilder/?uselang=en It will still be a bit of work, finding out the property numbers for all the fields you want included.
(If you go down this route, and it works, maybe you can post back the SPARQL query you end up with.)
I played with it for a little while and couldn't make sense of it. The best I could do was to get an exception, which at least showed me what they were using:
https://query.wikidata.org/querybuilder/?uselang=en
Query timeout limit reached
It would search through a school’s syllabus site ...
I am not trying to "search" for author-work pairs, though going through schools’ syllabi is a great idea. I thought there would already be such a registry somewhere.
I didn't find Wikibase intuitive, and/or it was taking too much of my time.
I am trying to sweet-talk the Wikipedia folks into releasing a copy of all the references on their pages, and archive.org and books.google.com into providing "we the people" with a list of all the books they have cataloged.
lbrtchx
Hi Marco:
thank you for sharing your great work:
https://arxiv.org/pdf/2212.13104.pdf
I have kept pestering "bigwigs", trying to make them understand the need for, and the benefit of (if not monetary, then definitely as a common good), exactly what you describe in your paper.
// __ where could you find a master list of book-author pairs?
https://support.google.com/websearch/thread/207859826?hl=en ~ lbrtchx
I also meant to say that I am not too convinced by, or enthusiastic about, knowledge graphs or any kind of annotations tagged onto texts (what Franzosi calls "coding"), which I questioned in a rant about his book:
// __ somewhere between a draft for the story line of a movie and listening to Žižek ...
https://www.amazon.com/gp/customer-reviews/R1N1USZJW30O80/
I think corpora should be self-describing from the texts themselves. Nothing more, nothing less. Any functionality built into corpora should be indexed into the texts themselves, not "described" as some sort of "value-added" whatever. Do you know of any work emphasizing these aspects, these ways of understanding corpora? By the way, I am a mathematician/physicist, so my "metrical" way of seeing things is, rightly or wrongly, influenced by my background.
lbrtchx
On 3/26/23, Albretch Mueller lbrtchx@gmail.com wrote:
Thank you for citing your sources (LoC, goodreads, wikipedia, ...). I would imagine that as part of your work you had to get a more general list of author-work pairs, from which you then selected the ones you were interested in for your research. Where is that more general list? ;-) The only link I found when I searched for: site:github.com marcostranisci list, was about some python code.
Regarding your "Discovery tools for Digital Humanities", in §5, what do you mean when you say that "it is possible to track a subject through time"? That is the only requirement which is not metadata. Do you mean the topics the author engaged with? The terminology/words used? The subject matter (say, Chemistry)?
I am myself working right now on a similar project, but instead of shedding light on non-Western authors, I am interested in author-work pairs with "universal appeal", regardless of genre, meaning that the texts they authored have been:
* translated into other languages (even if it is just one book by the author),
* published in another country (even if in the same language),
* cited by authors belonging to other cultures/who speak other languages, or
* republished more than a generation later.
I know those are not exactly foolproof criteria, since some local report about "how human pee affected the fauna of the Ohio River" may not be considered to be of universal appeal while a report about the pollution of the Hudson River by General Electric with all kinds of PCBs definitely would, and that report about the Ohio River may well relate to the Hudson River one ... so that kind of "universal appeal" corpus should honestly and functionally keep its edges and extensions.
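Just to make the first of those criteria concrete against Wikidata (a sketch only; I am assuming editions are modelled with P629 'edition or translation of' and P407 'language of work or name', both quoted from memory, and coverage of editions there is patchy), one could count the distinct languages of a work's editions:

  SELECT ?work (COUNT(DISTINCT ?lang) AS ?nLangs) WHERE {
    ?work wdt:P31 wd:Q7725634 .   # 'literary work'
    ?edition wdt:P629 ?work ;     # an edition or translation of that work
             wdt:P407 ?lang .     # the language of that edition
  }
  GROUP BY ?work
  HAVING (COUNT(DISTINCT ?lang) > 1)
  LIMIT 100

Labels could then be resolved in a second step, as suggested earlier in the thread, and a query this broad would most likely need to be narrowed down to stay under the timeout.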
You also considered the Spanish-American War "the first decolonization process". As someone who was born and raised in Cuba in a family of high-profile political dissidents (so, even though I am not that much into "historical" topics, basically narratives to keep the proles, they were talked about in my home since I was little), let me just tell you that, even though I am not exactly a "patriot" or an ideologically/politically minded person, the idea sounded so weird to me that I couldn't be cynically quiet about it.
I have no way of knowing if you are being sarcastic in a cosmic way and/or your statement is an "honoring favor" to your donors. Let me tell you that both the people for and against the U.S.-trained dictator Fulgencio Batista, and the people who were for and against those who toppled him, have wondered about and questioned the legal grounds for the U.S. occupation of Cuban territory. USG, as they talk, says that "they 'negotiated' that land based on the Platt Amendment" (they actually say that!) and they use that "alternate fact" to justify the use of torture in that place (jurisdictionally, not part of U.S. territory). Let me just point out that Cuba was basically a country occupied by the U.S. military, a country which, as part of the Monroe Doctrine, has always received some "special attention" from USG's "freedom loving".
Why is USG still in gitmo? Once again, another piece of gringo twisted thinking. They have actually stated that the reason they have remained there is "because" "they had paid the Cuban government (after they toppled Batista) the first month of rent" and "they had accepted it" ... So they do admit that, legally, they don't own the place which they use to justify "torture"; oh, no, wait!: "enhanced interrogation techniques". They are basically renting (not even squatting) it under their own "terms"! ... Is this what you call "decolonization"?
lbrtchx
In terms of defining decolonization, or its inverse, colonization, I have often wondered if we are seeing modern colonization: e.g., of a military style (Russia in Syria or Ukraine, and/or the US in Iraq), or of an economic style (China in Vanuatu, and in some African contexts), etc. That is, must colonization only apply to the 19th century? Following that, to what degree do the corpora and ML/AI tools we create facilitate these unequal relationships?
Kind regards -Hugh
... so that kind of "universal appeal" corpus
The one-liner "you are not a prophet in your own land" (apparently derived from the Bible, Luke 4:24, where Jesus says: "Assuredly, I say to you, no prophet is accepted in his own country") I take as meaning: "if all you know is 'your land', you will only be able to 'preach to your own choir'".
I think truth, true relevance always has more to it.
On 4/2/23, Hugh Paterson III sil.linguist@gmail.com wrote:
In terms of defining decolonization or its inverse colonization ...
"In terms of defining decolonization or its inverse colonization", did you just say? Based on your email address, you seem to have a vested interest in portraying yourself as some sort of "SIL linguist". If you are an actual person, let me just tell you that I had never heard the word "decolonization" before, I have no idea what it could possibly mean, and since colonization hasn't stopped to begin with, you should at least have written your sentence as "In terms of defining colonization AND its inverse decolonization", right? Even one of the AI bots being used to keep the Internet would notice the difference.
... must colonization only apply to the 19th century?
Do you mean "must colonization only apply to the 15th, 16th, 17th, 18th, 19th, 20th and 21st centuries"?
https://en.wikipedia.org/wiki/History_of_colonialism
Let's not get into what you call "modern colonization".
lbrtchx