[Corpora-List] Re: USAS Tagger: same output for Python version and web demo?

9 Mar 2023

Dear Paul,
Thank you for your reply and for the details, I'll look at those links. And
thank you for making USAS available in Python!
Tony
On Thu, Mar 9, 2023 at 5:34 AM Rayson, Paul p.rayson@lancaster.ac.uk
wrote:
...
Hi Tony,
The web demo (http://ucrel-api.lancaster.ac.uk/usas/tagger.html) uses my
original C version of the English semantic tagger incorporating POS tagging
by CLAWS, whereas PyMUSAS is the fully open source version built on the
spaCy pipeline. The semantic lexicons for PyMUSAS have POS tags mapped from
the C7 tagset to those used in spaCy, and there are some differences with
how PyMUSAS implements MWEs. For further info on where the Python
implementation currently stands, please see the documentation and our open
issues, e.g. here:
https://ucrel.github.io/pymusas/
https://github.com/UCREL/pymusas/issues/26
https://github.com/UCREL/pymusas/issues/24
I'm also starting to add PyMUSAS web demo pages at
http://ucrel-api.lancaster.ac.uk/pymusas/tagger.html; for example, Welsh
went live last week.
Paul.
--
Paul Rayson
Director of UCREL and Professor of Natural Language Processing
Group Lead (SCC Data Science)
School of Computing and Communications, InfoLab21, Lancaster University,
Lancaster, LA1 4WA, UK.
Web: http://www.research.lancs.ac.uk/portal/en/people/Paul-Rayson/
Tel: +44 1524 510357
Contact me on Teams
https://teams.microsoft.com/l/chat/0/0?users=p.rayson@lancaster.ac.uk
From: Tony Berber-Sardinha via Corpora corpora@list.elra.info
Date: Tuesday, 7 March 2023 at 03:57
To: CORPORA New List corpora@list.elra.info
Subject: [External] [Corpora-List] Re: USAS Tagger: same output for
Python version and web demo?
I found a workaround. In
pymusas/spacy_api/taggers/rule_based.py
change this:
return RuleBasedTagger(name, pymusas_tags_token_attr,
                       pymusas_mwe_indexes_attr,
                       pos_attribute, lemma_attribute)

to this:
return RuleBasedTagger(name, pymusas_tags_token_attr,
                       pos_attribute, lemma_attribute)

The PyMUSAS output now looks like this:
Text Lemma POS USAS Tags
the the DET ['Z5']
characteristics characteristic NOUN ['O4.1', 'A4.2+', 'N2']
of of ADP ['Z5']
the the DET ['Z5']
network network NOUN ['Q4.3', 'Y2', 'S1.1.1']
are be VERB ['A3+', 'Z5']
on on ADP ['M6', 'A1.1.1']
the the DET ['Z5']
table table NOUN ['Q2.2']
. . PUNCT ['Z99']
SPACE ['Z99']
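
To check how closely the workaround output matches the web demo, the tag lists can be compared word by word. The sketch below is only an illustration: the tags are copied by hand from the outputs quoted in this thread (for "table", the MWE-marked N6 tag from the web demo is left out), and the helper name `compare` is hypothetical, not part of PyMUSAS.

```python
# Tags copied by hand from the outputs quoted in this thread.
# For 'table', the web demo's MWE-marked N6 tag is deliberately excluded.
pymusas_tags = {
    "characteristics": ["O4.1", "A4.2+", "N2"],
    "network": ["Q4.3", "Y2", "S1.1.1"],
    "table": ["Q2.2"],
}
web_tags = {
    "characteristics": ["O4.1", "A4.2+", "N2"],
    "network": ["S5+c", "Q4.3", "Y2"],
    "table": ["H5", "Q1.2", "N2"],
}


def compare(py, web):
    """Return (shared, only_pymusas, only_web) tags, each sorted."""
    py_set, web_set = set(py), set(web)
    return (
        sorted(py_set & web_set),
        sorted(py_set - web_set),
        sorted(web_set - py_set),
    )


for word in pymusas_tags:
    shared, only_py, only_web = compare(pymusas_tags[word], web_tags[word])
    print(f"{word}\tshared={shared}\tpymusas only={only_py}\tweb only={only_web}")
```

On these three words the two taggers agree fully on "characteristics", partly on "network", and not at all on "table", so the workaround yields per-word tags but not necessarily the same ones as the web demo.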

On Tue, Mar 7, 2023 at 12:03 AM Tony Berber-Sardinha <
tonycorpuslg@gmail.com> wrote:
Dear all
I'm using the Python implementation of the USAS tagger, PyMUSAS.
I noticed that the output from PyMUSAS is different from the web demo
version.
For example, the phrase:
'the characteristics of the network'
is tagged like this by pymusas:
the the DET ['Z5']
characteristics characteristic NOUN ['Df/A5.1+++mfnc']
of of ADP ['Df/A5.1+++mfnc']
the the DET ['Df/A5.1+++mfnc']
network network NOUN ['Df/A5.1+++mfnc']
That is, the same tag is applied to the whole noun phrase.
On the web demo, however, it is tagged like this:
0000003 010  AT      the                      Z5
0000003 020  NN2     characteristics          O4.1 A4.2+ N2
0000003 030  IO      of                       Z5
0000003 040  AT      the                      Z5
0000003 050  NN1     network                  S5+c Q4.3 Y2
In this case, each word in the noun phrase receives its own tag.
Or:
'on the table'
pymusas:
on on ADP ['N6']
the the DET ['N6']
table table NOUN ['N6']
web:
0000003 010  II      on                       N6[i1.3.1 Z5
0000003 020  AT      the                      N6[i1.3.2 Z5
0000003 030  NN1     table                    N6[i1.3.3 H5 Q1.2 N2
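
For a side-by-side comparison, the web demo lines can be split into fields. The sketch below assumes each line holds a sentence id, a word id, a CLAWS POS tag, the word, and then one or more USAS tags. It also assumes that a suffix such as `[i1.3.1` marks membership of a multiword expression (read here as MWE id, length, position), which is an interpretation of the output above rather than something stated in this thread. The function name `parse_web_line` is hypothetical.

```python
import re

# Assumed MWE marker appended to a tag, e.g. "N6[i1.3.1".
MWE_MARKER = re.compile(r"\[i\d+\.\d+\.\d+$")


def parse_web_line(line):
    """Split one web demo output line into (word, pos, tags, in_mwe)."""
    _sent_id, _word_id, pos, word, *tags = line.split()
    in_mwe = any(MWE_MARKER.search(tag) for tag in tags)
    # Strip the marker so the bare tag can be compared with PyMUSAS output.
    clean_tags = [MWE_MARKER.sub("", tag) for tag in tags]
    return word, pos, clean_tags, in_mwe


web_output = """\
0000003 010  II      on                       N6[i1.3.1 Z5
0000003 020  AT      the                      N6[i1.3.2 Z5
0000003 030  NN1     table                    N6[i1.3.3 H5 Q1.2 N2
"""

for line in web_output.splitlines():
    print(parse_web_line(line))
```

Read this way, the web demo lists the MWE tag first and then what look like the word's own tags, whereas the PyMUSAS output above shows only the MWE tag.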
I'm wondering if it's possible for pymusas to generate output similar to
the web demo's output. Specifically, I'd like to obtain individual tags for
each word, rather than just the tag for the entire multiword expression.
I've used the following Python code:
import spacy
# We exclude the following components as we do not need them.
nlp = spacy.load('en_core_web_sm', exclude=['parser', 'ner'])
# Load the English PyMUSAS rule based tagger in a separate spaCy pipeline
english_tagger_pipeline = spacy.load('en_dual_none_contextual')
# Adds the English PyMUSAS rule based tagger to the main spaCy pipeline
nlp.add_pipe('pymusas_rule_based_tagger', source=english_tagger_pipeline)
# The phrase to tag (one of the examples above)
text = 'the characteristics of the network'
output_doc = nlp(text)
print('Text\tLemma\tPOS\tUSAS Tags')
for token in output_doc:
    print(f'{token.text}\t{token.lemma_}\t{token.pos_}\t{token._.pymusas_tags}')
Thank you in advance!
Tony Berber Sardinha
