Greetings,
I am working on a project that uses lemmatization. I'm wondering how people have approached combining NIF and lemmatization. Are there any "blessed" extensions or ontologies? I'm not seeing nif:lemma as an option in the NIF ontology... though I am likely missing something.
Kind regards, - Hugh
Dear Hugh,
Actually, there is nif:lemma in NIF 2.0 (https://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.htm...). It's a datatype property, so you just give the lemma as a string value.
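A minimal sketch of what that looks like for "Peter came home.", written with plain string formatting so it needs no RDF library. The document URI (`http://example.org/doc1`), the lemma table and the regex tokenizer are hypothetical stand-ins for a real project's data and lemmatizer; the NIF terms used (`nif:Context`, `nif:isString`, `nif:Word`, `nif:anchorOf`, `nif:beginIndex`, `nif:endIndex`, `nif:referenceContext`, `nif:lemma`) are from NIF 2.0 Core.

```python
import json
import re

NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/doc1#"  # hypothetical document URI

# Hypothetical lemma table standing in for whatever lemmatizer the project uses.
LEMMAS = {"Peter": "Peter", "came": "come", "home": "home"}


def lit(s: str) -> str:
    # json.dumps escapes quotes, backslashes and control characters in a way
    # that is also a valid Turtle string literal.
    return json.dumps(s, ensure_ascii=False)


def idx(n: int) -> str:
    # NIF declares beginIndex/endIndex as xsd:nonNegativeInteger.
    return f'"{n}"^^xsd:nonNegativeInteger'


def nif_lemma_turtle(text: str, lemmas: dict[str, str]) -> str:
    ctx = f"<{BASE}char=0,{len(text)}>"
    lines = [
        f"@prefix nif: <{NIF}> .",
        f"@prefix xsd: <{XSD}> .",
        f"{ctx} a nif:Context, nif:RFC5147String; nif:isString {lit(text)} .",
    ]
    for m in re.finditer(r"\w+", text):  # naive tokenizer: runs of word characters
        start, end = m.start(), m.end()  # end offset is exclusive, as in NIF URIs
        word = m.group()
        lines.append(
            f"<{BASE}char={start},{end}> a nif:RFC5147String, nif:Word; "
            f"nif:anchorOf {lit(word)}; "
            f"nif:beginIndex {idx(start)}; nif:endIndex {idx(end)}; "
            f"nif:referenceContext {ctx}; "
            f"nif:lemma {lit(lemmas.get(word, word))} ."
        )
    return "\n".join(lines)


print(nif_lemma_turtle("Peter came home.", LEMMAS))
```

The token "came" gets the URI `<http://example.org/doc1#char=6,10>` and the triple `nif:lemma "come"`; nothing beyond a string literal is needed for the lemma itself.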
If you need an object property, you could model your lemmas as lexical entries and use OntoLex. For plain lemmatization this would be overkill, but if you want to add metadata about your lemmas or link them with additional information, say, derivational information or a proper dictionary, that would be the most conventional approach in the context of NIF.
Example: Peter came home.
Lemma entry: :come_le a ontolex:Word; ontolex:canonicalForm [ ontolex:writtenRep "come"@en ].
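If the lemmas come out of a lemmatizer as strings, one way to mint such entries is sketched below. It assumes the `<lemma>_le` naming from the `:come_le` example and a hypothetical lexicon namespace `http://example.org/lexicon#`; the OntoLex namespace and terms (`ontolex:Word`, `ontolex:canonicalForm`, `ontolex:writtenRep`) are the standard ones.

```python
import json
import re

ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
LEXICON = "http://example.org/lexicon#"  # hypothetical namespace for the ":" prefix


def entry_id(lemma: str) -> str:
    # Mirrors the ":come_le" naming in the example; non-word characters become "_".
    return ":" + re.sub(r"\W", "_", lemma) + "_le"


def lexical_entry(lemma: str, lang: str = "en") -> str:
    rep = json.dumps(lemma, ensure_ascii=False)  # valid Turtle string literal
    return (f"{entry_id(lemma)} a ontolex:Word; "
            f"ontolex:canonicalForm [ ontolex:writtenRep {rep}@{lang} ] .")


def lexicon(lemmas: list[str], lang: str = "en") -> str:
    unique = dict.fromkeys(lemmas)  # de-duplicate while keeping first-seen order
    header = [f"@prefix : <{LEXICON}> .", f"@prefix ontolex: <{ONTOLEX}> ."]
    return "\n".join(header + [lexical_entry(lem, lang) for lem in unique])


print(lexicon(["Peter", "come", "home", "come"]))
```

Each distinct lemma yields exactly one entry, so repeated tokens like "came"/"comes" all point to the same `:come_le` resource, which is what makes the object-property route useful for attaching further metadata.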
As for linking lemmas and corpus, you could use either Web Annotation (annotating the text with the lemmas, which here are lexical entries) or OntoLex-FrAC.
Web Annotation (https://www.w3.org/TR/annotation-model, https://www.w3.org/ns/oa):
[ a oa:Annotation ] oa:hasBody :come_le; oa:hasTarget [ a oa:TextPositionSelector; oa:start 6; oa:end 10 ].
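In the Web Annotation vocabulary, `oa:start`/`oa:end` belong to `oa:TextPositionSelector` (an `oa:RangeSelector` instead points to a start selector and an end selector), and the end offset is exclusive. A minimal sketch of computing those offsets and emitting the annotation, keeping the simplification of using the selector directly as the target (a full annotation would wrap it in an `oa:SpecificResource` with `oa:hasSource`); the function names are hypothetical.

```python
def text_position(text: str, token: str, occurrence: int = 0) -> tuple[int, int]:
    """Offsets of the n-th (0-based) occurrence of token; end is exclusive."""
    start = -1
    for _ in range(occurrence + 1):
        start = text.find(token, start + 1)
        if start < 0:
            raise ValueError(f"{token!r} occurrence {occurrence} not found")
    return start, start + len(token)


def lemma_annotation(body: str, start: int, end: int) -> str:
    # Selector used directly as the target, as in the example above.
    return (f"[ a oa:Annotation ] oa:hasBody {body}; "
            f"oa:hasTarget [ a oa:TextPositionSelector; oa:start {start}; oa:end {end} ] .")


start, end = text_position("Peter came home.", "came")
print(lemma_annotation(":come_le", start, end))
```

For "Peter came home." this gives start 6 and end 10: "came" occupies characters 6 to 9, and the end position is the one after the last character.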
OntoLex-FrAC (https://github.com/ontolex/frequency-attestation-corpus-information):
:come_le frac:attestation [ a frac:Attestation; frac:locus ... ]. (The object of frac:locus could be a Web Annotation selector or a NIF String URI.)
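A minimal sketch of the FrAC variant with a NIF String URI as the locus, one attestation per occurrence of the lemma. The document URI is hypothetical, prefix declarations are omitted, and only `frac:attestation`, `frac:Attestation` and `frac:locus` from the snippet above are used.

```python
def nif_string_uri(doc: str, start: int, end: int) -> str:
    # NIF 2.0 offset-based String URI: <document>#char=begin,end (end exclusive).
    return f"<{doc}#char={start},{end}>"


def attestations(entry: str, loci: list[str]) -> str:
    # One frac:Attestation blank node per occurrence, as a Turtle object list.
    objs = ", ".join(f"[ a frac:Attestation; frac:locus {locus} ]" for locus in loci)
    return f"{entry} frac:attestation {objs} ."


doc = "http://example.org/doc1"  # hypothetical document URI
print(attestations(":come_le", [nif_string_uri(doc, 6, 10)]))
```

Because the locus is just a URI here, the same lexical entry can collect attestations from many documents without any selector triples.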
If you'd like to avoid treating lemmas as lexical entries (because they typically don't have parts of speech, for example), you can also use plain Web Annotation:
[ a oa:Annotation, my_types:Lemma ] oa:hasBodyValue "come"; oa:hasTarget [ a oa:TextPositionSelector; oa:start 6; oa:end 10 ].
Here, my_types:Lemma is a placeholder for whatever class you introduce to define lemmas.
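Applied to a whole sentence, this pattern yields one annotation per token with the lemma as a literal body. A minimal sketch, assuming a naive regex tokenizer and a hypothetical token-to-lemma mapping; `my_types:Lemma` remains the placeholder class.

```python
import json
import re


def lemma_annotations(text: str, lemma_of: dict[str, str],
                      lemma_class: str = "my_types:Lemma") -> list[str]:
    """One oa:Annotation per token, with the lemma as a literal body."""
    anns = []
    for m in re.finditer(r"\w+", text):  # naive tokenizer, for illustration only
        lemma = lemma_of.get(m.group(), m.group())  # fall back to the surface form
        anns.append(
            f"[ a oa:Annotation, {lemma_class} ] "
            f"oa:hasBodyValue {json.dumps(lemma, ensure_ascii=False)}; "
            f"oa:hasTarget [ a oa:TextPositionSelector; "
            f"oa:start {m.start()}; oa:end {m.end()} ] ."
        )
    return anns


for ann in lemma_annotations("Peter came home.", {"came": "come"}):
    print(ann)
```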
I would recommend the following order of preference:
1. nif:lemma, if you just have a lemma string and work with NIF anyway;
2. OntoLex with frac:attestation, if you can cast your lemmas as lexical entries;
3. Web Annotation with oa:hasBodyValue, if you have strong opinions about your lemmas being something other than lexical entries.
Note that you are free to use NIF String URIs in place of Web Annotation selectors (which saves you three triples), but the Web Annotation community would probably prefer its established conventions.
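The triple count can be checked by writing both targets out as (subject, predicate, object) tuples. This sketch assumes the simplified pattern from the examples, where the selector itself is the target; wrapping it in an `oa:SpecificResource` would widen the gap further. Names and URIs are hypothetical.

```python
# Triples are modelled as plain (subject, predicate, object) tuples.
Triple = tuple[str, str, str]


def selector_target(ann: str, start: int, end: int) -> list[Triple]:
    sel = "_:sel"  # blank node for the selector
    return [
        (ann, "oa:hasTarget", sel),
        (sel, "rdf:type", "oa:TextPositionSelector"),
        (sel, "oa:start", str(start)),
        (sel, "oa:end", str(end)),
    ]


def nif_target(ann: str, doc: str, start: int, end: int) -> list[Triple]:
    # The NIF String URI encodes the offsets, so one triple suffices.
    return [(ann, "oa:hasTarget", f"<{doc}#char={start},{end}>")]


saved = (len(selector_target("_:ann", 6, 10))
         - len(nif_target("_:ann", "http://example.org/doc1", 6, 10)))
print(saved)  # 3
```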
Best, Christian
PS: I won't debate the use and future of lemmatization here. Lemmas won't disappear from linguistics, language teaching or philological practice, regardless of what happens in NLP.
On Tue, Oct 10, 2023 at 00:15, Hugh Paterson III via Corpora <corpora@list.elra.info> wrote:
Corpora mailing list -- corpora@list.elra.info https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/ To unsubscribe send an email to corpora-leave@list.elra.info
Dear Christian,
Re your PS: one doesn't need to debate the use/future of lemmatization, though I'd welcome such a debate as part of scholarship. For those experienced in matters in/of linguistics, it should be clear that lemmatization was simply a construct, an entry-level philological exercise (especially for those from computer science with less of a background in linguistics and language(s)). It is sad that some have picked up the habit of using lemmatization as a heuristic (though for what, specifically?) and may, apparently, have become too addicted to it to let it go. It is imperative that one wean oneself from such a habit. Methods for linguistic morphology, e.g. (morphological) parsing or stemming, are not a universal decomposition scheme, nor a universal method for language/linguistic analysis. It is also important to bear in mind that neither linguistic morphology nor lemmas/lemmata have that long a history.
Thanks for being open-minded enough to read this far.
Best, Ada
On Tue, Oct 17, 2023 at 12:28 PM Christian Chiarcos via Corpora <corpora@list.elra.info> wrote: