Greetings,
Does anyone know of any descriptions of, or approaches to, using Ontolex/lemon with non-concatenative morphology? Is the assumption that Cv1Cv2C-shaped words will have a separate entry for each change in v1 and v2? If so, this radically increases the number of items in a dictionary compared with languages with affix-type morphology.
Any pointers appreciated,
Kind regards, Hugh
Dear Hugh,
This has been addressed in the context of the emerging OntoLex-Morph vocabulary (https://www.w3.org/community/ontolex/wiki/Morphology, https://github.com/ontolex/morph; most recent diagram under https://github.com/ontolex/morph/blob/master/doc/diagrams/Readme.md). Here, a morph:Morph object (a lexical entry in a lexical resource for morphemes; depending on the type of resource, this can be a morpheme or an allomorph of a morpheme) can be the object of a morph:involves property that connects it with a morph:Rule. This morph:Rule can have one or more morph:replacement properties. The morph:Replacement objects that these point to use regular expressions to formalize the source and target strings of the rule associated with that particular morph(eme). These use Perl/Java/SPARQL-style regex syntax, which includes support for capturing groups.
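To make the regular-expression idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the transliterated forms, the vowel melody and the names SOURCE, TARGET and apply_replacement are hypothetical and are not taken from the OntoLex-Morph vocabulary. Python's re module writes group references as \1 in the replacement string, where Java and SPARQL's REPLACE write $1.

```python
import re

# Hypothetical rule: swap the vowel melody a-a for u-i while keeping the
# three consonants, which are captured as groups 1, 2 and 3.
SOURCE = r"^([^aiu])a([^aiu])a([^aiu])$"  # C a C a C
TARGET = r"\1u\2i\3"                      # C u C i C


def apply_replacement(form: str, source: str = SOURCE, target: str = TARGET) -> str:
    """Rewrite form with the source/target pair; return it unchanged if source does not match."""
    return re.sub(source, target, form)


print(apply_replacement("katab"))   # kutib
print(apply_replacement("daras"))   # duris
print(apply_replacement("kitaab"))  # kitaab (pattern does not match)
```

A single source/target pair covers every base form of the same shape, so the vowel alternation is stated once rather than once per word.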
Note that this formalizes only the form side of morphemes, not the meaning side. However, a morph:Rule can also have a morph:grammaticalMeaning property to which such information can be added. Last week, Max Ionov and Mike Rosner described the application (and an extension) of this mechanism to Maltese in an LDK paper: Maxim Ionov, Mike Rosner, "Beyond Concatenative Morphology: Applying OntoLex-Morph to Maltese" (not online yet). We have also looked into other Semitic languages (and related phenomena such as umlaut in German or vowel harmony in Turkic), but only on individual examples. If anyone is interested in discussing this further, please join the biweekly OntoLex-Morph calls ;)
The OntoLex-Morph vocabulary is relatively advanced, and we are in the process of freezing it in order to prepare its publication. Finalization of the report is expected around the middle of next year.
Best, Christian
On Mon, 18 Sept 2023 at 15:31, Hugh Paterson III via Corpora <corpora@list.elra.info> wrote: [...]
_______________________________________________
Corpora mailing list -- corpora@list.elra.info
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to corpora-leave@list.elra.info
What I forgot to state is the most elementary aspect: an ontolex:LexicalEntry can be associated with a (non-concatenative) morph:Morph or a (non-concatenative) morph:Rule/morph:Replacement in the following ways:
- for word formation: the lexical entry (e.g., a lexinfo:Root) from which one or more forms are derived can be encoded as the vartrans:source of a morph:WordFormationRelation and an associated morph:WordFormationRule
- for inflection: the lexical entry can have an ontolex:morphologicalPattern relation pointing to a morph:Paradigm. Such paradigms are the morph:paradigm of morph:InflectionRules.
Both morph:InflectionRule and morph:WordFormationRule are morph:Rules and can thus be connected to a non-concatenative morpheme (or replacement) as described in my other email.
Our current real-world examples of non-concatenative morphology come from word formation only. I guess your use case is more in the area of inflection (because for word formation it would be practical to give a lexical sense, and then you'd need a LexicalEntry anyway), but the non-concatenative part of the specification (by means of regular expressions and capturing groups in morph:Replacement) is identical in both scenarios.
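The inflection case can be sketched roughly as follows. This is a toy model in plain Python, not an implementation of OntoLex-Morph: the class and field names (Replacement, Rule, LexicalEntry, source, target, paradigm), the paradigm label "CaCaC" and the transliterated forms are all assumptions made for illustration, and applying several replacements in sequence is likewise an assumption.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replacement:
    source: str  # regular expression matched against the base form
    target: str  # replacement string; \1, \2, ... refer to captured groups


@dataclass
class Rule:
    label: str     # stand-in for the grammatical meaning of the rule
    paradigm: str  # stand-in for the rule's link to its paradigm
    replacements: List[Replacement] = field(default_factory=list)

    def apply(self, form: str) -> str:
        # Assumption: replacements are applied one after another, in order.
        for rep in self.replacements:
            form = re.sub(rep.source, rep.target, form)
        return form


@dataclass
class LexicalEntry:
    base_form: str
    paradigm: str  # stand-in for the entry's link to a paradigm


def generate_forms(entry: LexicalEntry, rules: List[Rule]) -> Dict[str, str]:
    """Apply every rule that belongs to the entry's paradigm to its base form."""
    return {r.label: r.apply(entry.base_form) for r in rules if r.paradigm == entry.paradigm}


RULES = [
    Rule("active", "CaCaC"),  # no replacement: the base form is kept as-is
    Rule("passive", "CaCaC", [Replacement(r"^(.)a(.)a(.)$", r"\1u\2i\3")]),
]
ENTRIES = [LexicalEntry("katab", "CaCaC"), LexicalEntry("daras", "CaCaC")]

for entry in ENTRIES:
    print(entry.base_form, generate_forms(entry, RULES))
```

In this sketch each lexeme has a single entry, and the vowel alternations live in the rules shared by the paradigm, so the number of entries does not grow with the number of vowel patterns.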
Best, Christian
On Mon, 18 Sept 2023 at 16:45, Christian Chiarcos <christian.chiarcos@gmail.com> wrote: [...]
Dear Hugh
An alternative would be to use dictionaries (as in "{ }" in Python) to group the characters belonging to the consonant and vowel classes (or at least one of them) and then examine the sequences accordingly. This should yield more scientific insight into sequences than relying on larger spans of hard-coded information (at the "word"/"morph(eme)" level), especially when one calculates the transition probabilities and interprets them accordingly. (Remember the whitespace, if any, and use continuous texts/data, i.e. not just colonial data such as "word lists"!) What is the purpose, though, of your task that is supposed to be related to "morphology"?
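A minimal sketch of what such a grouping could look like, assuming a romanized toy text; the class inventory in CLASSES, the labels "V", "C", "_" and "?", and the function names are hypothetical choices for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical character classes for a romanized toy text.
CLASSES = {"V": set("aeiou"), "C": set("bcdfghjklmnpqrstvwxyz")}


def char_class(ch: str) -> str:
    """Map a character to its class label; whitespace is kept as its own class."""
    if ch.isspace():
        return "_"
    for name, members in CLASSES.items():
        if ch.lower() in members:
            return name
    return "?"  # anything not covered by the classes above


def transition_probabilities(text: str) -> dict:
    """Estimate P(next class | previous class) from adjacent characters in text."""
    labels = [char_class(ch) for ch in text]
    counts = defaultdict(Counter)
    for prev, nxt in zip(labels, labels[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for prev, c in counts.items()
    }


probs = transition_probabilities("katab kutib")
for prev, row in probs.items():
    print(prev, row)
```

The input here is a two-word string only to keep the example short; the same function accepts continuous text of any length.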
Best Ada
On Mon, Sep 18, 2023 at 6:07 PM Christian Chiarcos via Corpora <corpora@list.elra.info> wrote: [...]
Dear Ada,
I am afraid your suggestion is something that the Ontolex-lemon model does not support (which matters, since the question was specifically about that model for representing lexicographic information, not about general approaches to non-concatenative morphology). It is, though, very similar to the approach currently underlying the model (the one Christian described in his response).
While developing this model as a community effort, we try really hard to avoid Eurocentric views, but since the whole premise of OntoLex is to model data, we rely on existing resources rather than inventing them from scratch. So the model can represent word lists as well as computational lexicons and many other types of existing lexicographic data.
Best regards,
Max
On 18/09/2023 19:29, Ada Wan via Corpora wrote: [...]
Dear Max
Thanks for your message.
My suggestion works with all texts (regardless of how "prototypical" or "clean"/"dirty" they are; note that what needs adjustment is our acceptance of how real and frequent "prototypical/canonical/textbook linguistic phenomena" are) and with all "morphological" types*, and it takes just a few lines of code. I had read Hugh's question and Christian's replies.
*Please recall that despite how morphology has come to be an academic concentration, it is not a necessary method for the classification of language varieties. (There may be other decomposition methods that are more universal, but "linguistic morphology" or analysis thereof is not one.)
Thanks for your feedback nonetheless.
Best Ada
On Tue, Sep 19, 2023 at 1:02 PM Max Ionov via Corpora <corpora@list.elra.info> wrote: [...]