Hi Robert,
I am a big proponent of comparative annotations (such as paired comparisons or best-worst scaling) rather than the more commonly used rating scales. Comparative questions such as 'which item is more positive?' usually work much better than asking something like 'is this neutral or slightly positive or moderately positive or ...?'. Here is some work and scripts that may be of interest: http://saifmohammad.com/WebPages/BestWorst.html
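If it helps to see the idea in code, here is a minimal sketch of the simple counting approach for turning best-worst responses into real-valued scores. The items and annotations below are invented for illustration, and this is not the code from the page above:

# Minimal sketch of the counting procedure for best-worst scaling:
# each annotation records the items shown in one tuple plus which item
# the annotator chose as "best" (most positive) and which as "worst".
from collections import Counter

annotations = [
    {"items": ["great", "okay", "bad", "awful"], "best": "great", "worst": "awful"},
    {"items": ["great", "okay", "bad", "awful"], "best": "great", "worst": "bad"},
    {"items": ["okay", "bad", "awful", "great"], "best": "okay", "worst": "awful"},
]

best_counts, worst_counts, seen_counts = Counter(), Counter(), Counter()
for ann in annotations:
    best_counts[ann["best"]] += 1
    worst_counts[ann["worst"]] += 1
    for item in ann["items"]:
        seen_counts[item] += 1

# Score in [-1, 1]: proportion of times chosen best minus proportion chosen worst.
scores = {item: (best_counts[item] - worst_counts[item]) / n
          for item, n in seen_counts.items()}

for item, score in sorted(scores.items(), key=lambda x: -x[1]):
    print(f"{item}\t{score:.2f}")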
Another favorite is to intersperse a small percentage of hidden "gold" questions: items that have been pre-annotated by your team, say. They are usually simple items to annotate (not boundary cases). If an annotator gets a large percentage of these questions wrong, then perhaps they are not among your best annotators. However, always double-check whether the expert gold annotations are missing something. This is also discussed in the papers at the link above.
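Again just as a sketch, this is roughly what the gold-question check looks like; the data layout, field names, and the 0.8 threshold here are illustrative assumptions, not prescriptions:

# Sketch: flag annotators with low accuracy on the hidden gold items.
from collections import defaultdict

# (annotator_id, item_id, label) triples from the crowdsourcing platform
responses = [
    ("ann1", "gold_01", "positive"), ("ann1", "gold_02", "negative"),
    ("ann2", "gold_01", "negative"), ("ann2", "gold_02", "negative"),
]
gold_labels = {"gold_01": "positive", "gold_02": "negative"}

correct, total = defaultdict(int), defaultdict(int)
for annotator, item, label in responses:
    if item in gold_labels:          # only score the pre-annotated gold items
        total[annotator] += 1
        correct[annotator] += (label == gold_labels[item])

THRESHOLD = 0.8  # arbitrary cut-off; tune on your own data
for annotator in total:
    accuracy = correct[annotator] / total[annotator]
    if accuracy < THRESHOLD:
        print(f"{annotator}: {accuracy:.0%} on gold items -- review their work")

It is also worth looking at which gold items are missed most often, since that can point to problems with the gold labels themselves rather than with the annotators.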
Finally, this paper, just up on arXiv, might be of interest as well: Best Practices in the Creation and Use of Emotion Lexicons https://arxiv.org/abs/2210.07206
Cheers. -Saif
On Wed, Oct 12, 2022 at 6:44 PM Robert Fuchs via Corpora <corpora@list.elra.info> wrote:
Dear all
I'm looking for a guide or advice on crowd-sourcing linguistic annotations via platforms such as Mechanical Turk. I'm thinking of rating tasks such as evaluating positive and negative sentiment in sentences, or annotating concordances from a corpus for a certain property (e.g. deontic vs. epistemic meaning in modal verbs).
Specifically, I'm wondering
- How can I ensure that the annotations are of sufficient quality? I don't have a gold standard for all the data; after all, this is why I need the annotations. If I get all the data annotated by two or three independent annotators, I can ensure adequate quality. But then I might still get annotators who more or less submit random annotations (or start doing so after a while), or at least it would take me very long to find out who is doing so.
- How do I find out what remuneration is adequate?
- What is a good way to split up the data for annotation? Single annotation units or, say, 50 or 100 at a time? How do I deliver them effectively to the annotators?
Many thanks and best wishes, Robert
-- Prof. Dr. Robert Fuchs (JP) | Department of English Language and Literature/Institut für Anglistik und Amerikanistik | University of Hamburg | Überseering 35, 22297 Hamburg, Germany | Room 07076 | https://uni-hamburg.academia.edu/RobertFuchs | https://sites.google.com/view/rflinguistics/
Mailing list on varieties of English/World Englishes/ENL-ESL-EFL. Subscribe here: https://groups.google.com/forum/#!forum/var-eng/join Are you a non-native speaker of English? Please help us by taking this short survey on when and how you use the English language: https://lamapoll.de/englishusageofnonnativespeakers-1/