Dear Hugh,
actually, there is nif:lemma in NIF 2.0 (https://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/nif-core.htm...). It's a datatype property whose value is simply the lemma string.
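For illustration, here is a minimal sketch of what the nif:lemma route can produce for the example sentence used below. It assumes the offset-based NIF String URIs (`#char=begin,end`, end exclusive) used in NIF 2.0; the document URI `http://example.org/doc1`, the `LEMMAS` dictionary and the naive `\w+` tokenization are hypothetical stand-ins for your own data and lemmatizer, and the output is abbreviated rather than a complete NIF document.

```python
import re

# NIF 2.0 core namespace; the document URI below is a hypothetical example.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
BASE = "http://example.org/doc1"

# Hypothetical lemma lookup standing in for the output of your lemmatizer.
LEMMAS = {"Peter": "Peter", "came": "come", "home": "home"}


def turtle_literal(value: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def nif_lemma_turtle(text: str, lemmas: dict, base: str = BASE) -> str:
    """Emit Turtle: one nif:Context for `text`, one nif:Word with nif:lemma per known token."""
    context = f"<{base}#char=0,{len(text)}>"
    lines = [
        f"@prefix nif: <{NIF}> .",
        "",
        f"{context} a nif:Context ;",
        f"    nif:isString {turtle_literal(text)} .",
    ]
    for match in re.finditer(r"\w+", text):  # naive tokenization: runs of word characters
        token = match.group()
        lemma = lemmas.get(token)
        if lemma is None:
            continue  # skip tokens the lemmatizer has no lemma for
        begin, end = match.start(), match.end()  # end offset is exclusive
        lines += [
            "",
            f"<{base}#char={begin},{end}> a nif:Word ;",
            f"    nif:referenceContext {context} ;",
            # Plain integers are typed xsd:integer; add ^^xsd:nonNegativeInteger
            # if your NIF validator is strict about the declared range.
            f"    nif:beginIndex {begin} ;",
            f"    nif:endIndex {end} ;",
            f"    nif:anchorOf {turtle_literal(token)} ;",
            f"    nif:lemma {turtle_literal(lemma)} .",
        ]
    return "\n".join(lines)


print(nif_lemma_turtle("Peter came home.", LEMMAS))
```

The String URI of "came" (`...#char=6,10`) is also what you would reuse if you point to the text with NIF Strings instead of Web Annotation selectors.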
If you need an object property, you could model your lemmas as lexical entries and use OntoLex. For plain lemmatization this would be overkill, but if you want to add metadata about your lemmas or link them to additional information, say, derivational information or a proper dictionary, that would be the most conventional way of doing things in the context of NIF.
Example: Peter came home.
Lemma entry: :come_le a ontolex:Word; ontolex:canonicalForm [ ontolex:writtenRep "come"@en ].
As for linking lemmas and corpus, you could use either Web Annotation (annotating the text with lemmas, which here are lexical entries) or OntoLex-FrAC.
Web Annotation (https://www.w3.org/TR/annotation-model, https://www.w3.org/ns/oa):
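[ a oa:Annotation ] oa:hasBody :come_le; oa:hasTarget [ oa:hasSource :doc; oa:hasSelector [ a oa:TextPositionSelector; oa:start 6; oa:end 10 ] ].
(:doc stands for the annotated document; character offsets start at 0 and the end offset is exclusive, so "came" spans 6 to 10.)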
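Getting the offsets right is the fiddly part: in the Web Annotation model, oa:TextPositionSelector counts characters from 0 and its end is exclusive, so "came" in "Peter came home." spans 6 to 10. Below is a minimal sketch that computes the offsets and prints such an annotation. The lexicon namespace bound to ":" and the document URI are hypothetical examples; `text_position` and `lemma_annotation` are illustrative helpers, not part of any library.

```python
# Hypothetical URIs: the lexicon namespace bound to ":" and the document URI are examples only.
OA = "http://www.w3.org/ns/oa#"
LEXICON = "http://example.org/lexicon#"


def text_position(text: str, needle: str, occurrence: int = 1) -> tuple[int, int]:
    """Return (start, end) of the n-th occurrence of `needle`; start counts from 0, end is exclusive."""
    if occurrence < 1:
        raise ValueError("occurrence must be >= 1")
    start = -1
    for _ in range(occurrence):
        start = text.find(needle, start + 1)
        if start == -1:
            raise ValueError(f"{needle!r} occurs fewer than {occurrence} times")
    return start, start + len(needle)


def lemma_annotation(text: str, token: str, entry: str, source: str) -> str:
    """Turtle for an oa:Annotation whose body is the lexical entry `entry`."""
    start, end = text_position(text, token)
    return "\n".join([
        f"@prefix oa: <{OA}> .",
        f"@prefix : <{LEXICON}> .",
        "",
        f"[ a oa:Annotation ] oa:hasBody {entry} ;",
        "    oa:hasTarget [ a oa:SpecificResource ;",
        f"        oa:hasSource <{source}> ;",
        "        oa:hasSelector [ a oa:TextPositionSelector ;",
        f"            oa:start {start} ; oa:end {end} ] ] .",
    ])


print(lemma_annotation("Peter came home.", "came", ":come_le", "http://example.org/doc1"))
```

The `occurrence` parameter is there because a token such as "the" usually appears several times in a text, and each occurrence needs its own selector.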
OntoLex-FrAC (https://github.com/ontolex/frequency-attestation-corpus-information):
:come_le frac:attestation [ a frac:Attestation; frac:locus ... ]. (The object of frac:locus could be a Web Annotation selector or a NIF String URI.)
If you'd like to avoid treating lemmas as lexical entries (because they typically don't have parts of speech, for example), you can also use plain Web Annotation:
[ a oa:Annotation, my_types:Lemma ] oa:bodyValue "come"; oa:hasTarget [ oa:hasSource :doc; oa:hasSelector [ a oa:TextPositionSelector; oa:start 6; oa:end 10 ] ].
Here, my_types:Lemma is a placeholder for whatever class you introduce to define lemmas, and :doc stands for the annotated document.
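For the variant without lexical entries, the same pattern extends to a whole text. Here is a minimal sketch, assuming tokens are runs of word characters and that lemmas come from a plain dictionary. `my_types:` is the placeholder namespace from the example above (bound here to a hypothetical example.org URI), the document URI is made up, and the lemma string goes into oa:bodyValue, the Web Annotation property for a plain-text body.

```python
import re

# Hypothetical namespace for the placeholder lemma class my_types:Lemma.
PREFIXES = "\n".join([
    "@prefix oa: <http://www.w3.org/ns/oa#> .",
    "@prefix my_types: <http://example.org/my_types#> .",
])


def lemma_body_value_annotations(text: str, lemmas: dict, source: str) -> str:
    """One oa:Annotation per lemmatized token, carrying the lemma string as oa:bodyValue."""
    blocks = [PREFIXES]
    for match in re.finditer(r"\w+", text):  # naive tokenization: runs of word characters
        lemma = lemmas.get(match.group())
        if lemma is None:
            continue  # leave tokens without a lemma unannotated
        escaped = lemma.replace("\\", "\\\\").replace('"', '\\"')
        blocks.append("\n".join([
            f'[ a oa:Annotation, my_types:Lemma ] oa:bodyValue "{escaped}" ;',
            "    oa:hasTarget [ a oa:SpecificResource ;",
            f"        oa:hasSource <{source}> ;",
            "        oa:hasSelector [ a oa:TextPositionSelector ;",
            # match.end() is exclusive, matching oa:end
            f"            oa:start {match.start()} ; oa:end {match.end()} ] ] .",
        ]))
    return "\n\n".join(blocks)


print(lemma_body_value_annotations(
    "Peter came home.", {"came": "come", "home": "home"}, "http://example.org/doc1"))
```

Each annotation is a separate blank node, so the output can be concatenated across documents without URI clashes; if you need to refer to individual annotations later, mint URIs for them instead.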
I would recommend the following order of preference:
1. nif:lemma, if you just have a lemma string and work with NIF anyway;
2. OntoLex with frac:attestation, if you can cast your lemmas as lexical entries;
3. Web Annotation with a string body value, if you have strong opinions about your lemmas being something other than lexical entries.
Note that you are free to use NIF String URIs in place of Web Annotation selectors (which saves you a few triples), but the Web Annotation community would probably prefer its established conventions.
Best, Christian
PS: I won't debate the use and future of lemmatization here. Lemmas won't disappear from linguistics, language teaching or philological practice, regardless of what happens in NLP.
On Tue, 10 Oct 2023 at 00:15, Hugh Paterson III via Corpora <corpora@list.elra.info> wrote:
Greetings,
I am working on a project that uses lemmatization. I'm wondering how people have approached combining NIF and lemmatization. Are there any "blessed" extensions or ontologies? I'm not seeing nif:lemma as an option within the NIF ontology... though I am likely missing something.
Kind regards,
- Hugh
Corpora mailing list -- corpora@list.elra.info