I found a workaround. In
pymusas/spacy_api/taggers/rule_based.py
change this line:
return RuleBasedTagger(name, pymusas_tags_token_attr, pymusas_mwe_indexes_attr, pos_attribute, lemma_attribute)
to this (the pymusas_mwe_indexes_attr argument is dropped):
return RuleBasedTagger(name, pymusas_tags_token_attr, pos_attribute, lemma_attribute)
The pymusas output now looks like this:
Text Lemma POS USAS Tags
the the DET ['Z5']
characteristics characteristic NOUN ['O4.1', 'A4.2+', 'N2']
of of ADP ['Z5']
the the DET ['Z5']
network network NOUN ['Q4.3', 'Y2', 'S1.1.1']
are be VERB ['A3+', 'Z5']
on on ADP ['M6', 'A1.1.1']
the the DET ['Z5']
table table NOUN ['Q2.2']
. . PUNCT ['Z99']
SPACE ['Z99']
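To see how close this gets to the web demo, the two tag lists can be compared token by token. Below is a minimal sketch that uses only rows quoted in this thread (the first five tokens). The function name diff_tags and the hard-coded lists are illustrative and are not part of PyMUSAS.

```python
# (text, USAS tags) rows copied from this thread; first five tokens only.
pymusas_rows = [
    ("the", ["Z5"]),
    ("characteristics", ["O4.1", "A4.2+", "N2"]),
    ("of", ["Z5"]),
    ("the", ["Z5"]),
    ("network", ["Q4.3", "Y2", "S1.1.1"]),
]
web_rows = [
    ("the", ["Z5"]),
    ("characteristics", ["O4.1", "A4.2+", "N2"]),
    ("of", ["Z5"]),
    ("the", ["Z5"]),
    ("network", ["S5+c", "Q4.3", "Y2"]),
]


def diff_tags(left, right):
    """Return (text, left_tags, right_tags) for every token whose tag lists differ.

    Hypothetical helper: it assumes both inputs are tokenised identically.
    """
    if len(left) != len(right):
        raise ValueError("inputs have different numbers of tokens")
    mismatches = []
    for (left_text, left_tags), (right_text, right_tags) in zip(left, right):
        if left_text != right_text:
            raise ValueError(f"token mismatch: {left_text!r} vs {right_text!r}")
        if left_tags != right_tags:
            mismatches.append((left_text, left_tags, right_tags))
    return mismatches


for text, ours, theirs in diff_tags(pymusas_rows, web_rows):
    print(f"{text}\tpymusas={ours}\tweb={theirs}")
```

On these rows only "network" differs: the web demo lists S5+c first, a tag that the pymusas output does not include at all.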
On Tue, Mar 7, 2023 at 12:03 AM Tony Berber-Sardinha tonycorpuslg@gmail.com wrote:
Dear all,
I'm using the Python implementation of the USAS tagger, pymusas.
I noticed that the output from pymusas is different from the output of the web demo version.
For example, the phrase:
'the characteristics of the network'
is tagged like this by pymusas:
the the DET ['Z5']
characteristics characteristic NOUN ['Df/A5.1+++mfnc']
of of ADP ['Df/A5.1+++mfnc']
the the DET ['Df/A5.1+++mfnc']
network network NOUN ['Df/A5.1+++mfnc']
That is, the same tag is applied to every word in the noun phrase.
On the web demo, however, it is tagged like this:
0000003 010 AT the Z5
0000003 020 NN2 characteristics O4.1 A4.2+ N2
0000003 030 IO of Z5
0000003 040 AT the Z5
0000003 050 NN1 network S5+c Q4.3 Y2
In this case, each word in the noun phrase receives its own tags.
Another example:
'on the table'
pymusas:
on on ADP ['N6']
the the DET ['N6']
table table NOUN ['N6']
web:
0000003 010 II on N6[i1.3.1 Z5
0000003 020 AT the N6[i1.3.2 Z5
0000003 030 NN1 table N6[i1.3.3 H5 Q1.2 N2
I'm wondering if it's possible for pymusas to generate output similar to the web demo's output. Specifically, I'd like to obtain individual tags for each word, rather than just the tag for the entire multiword expression.
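For comparing the two outputs token by token, it may help to turn the web demo's output into per-word records first. The following is only a sketch under assumptions drawn from the examples in this message: each record is taken to start with a 7-digit number followed by a 3-digit number, then the POS tag, the word, and the USAS tags. The names WebToken and parse_web_output are made up for illustration, and suffixes such as [i1.3.1 are kept verbatim rather than interpreted.

```python
import re
from typing import List, NamedTuple


class WebToken(NamedTuple):
    sentence_id: str
    token_id: str
    pos: str
    text: str
    tags: List[str]


# Assumption: a new record begins wherever "NNNNNNN NNN " appears.
RECORD_START = re.compile(r"(?=\b\d{7} \d{3} )")


def parse_web_output(raw: str) -> List[WebToken]:
    """Split web demo output into one WebToken per word (hypothetical helper)."""
    tokens = []
    for record in RECORD_START.split(raw):
        fields = record.split()
        if len(fields) < 4:
            # Skip empty or incomplete chunks, e.g. the text before the first record.
            continue
        sentence_id, token_id, pos, text, *tags = fields
        tokens.append(WebToken(sentence_id, token_id, pos, text, tags))
    return tokens


raw = (
    "0000003 010 II on N6[i1.3.1 Z5 "
    "0000003 020 AT the N6[i1.3.2 Z5 "
    "0000003 030 NN1 table N6[i1.3.3 H5 Q1.2 N2"
)
for token in parse_web_output(raw):
    print(f"{token.text}\t{token.pos}\t{token.tags}")
```

A word that itself looks like a 7-digit number followed by a 3-digit number would be split incorrectly, so this is suitable only for quick comparisons like the ones in this message.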
I've used the following Python code:
import spacy

# Input text (assumed here for completeness; substitute your own).
text = 'the characteristics of the network are on the table.'

# We exclude the following components as we do not need them.
nlp = spacy.load('en_core_web_sm', exclude=['parser', 'ner'])
# Load the English PyMUSAS rule based tagger in a separate spaCy pipeline
english_tagger_pipeline = spacy.load('en_dual_none_contextual')
# Adds the English PyMUSAS rule based tagger to the main spaCy pipeline
nlp.add_pipe('pymusas_rule_based_tagger', source=english_tagger_pipeline)

output_doc = nlp(text)

# Print one tab-separated row per token.
print('Text\tLemma\tPOS\tUSAS Tags')
for token in output_doc:
    print(f'{token.text}\t{token.lemma_}\t{token.pos_}\t{token._.pymusas_tags}')
Thank you in advance!
Tony Berber Sardinha