Hi Robert,

I'm not sure whether this goes beyond the scope of the original question, but if you have students and an opportunity to introduce annotation tasks into the curriculum as part of coursework, then I recommend considering 'class-sourcing', i.e. letting students do the annotation and also participate in developing the guidelines.

Students working in the field are already experts compared to crowd workers who are unfamiliar with the underlying theories, and in my experience student response to such projects has been very positive. They often report learning a lot from it and finding it more rewarding than assignments that do not have a practical outcome. I wrote a paper about this approach here:

https://link.springer.com/article/10.1007/s10579-016-9343-x

Best,
Amir

From: Saif Mohammad via Corpora <corpora@list.elra.info>
Sent: Thursday, October 13, 2022 9:09 PM
To: Robert Fuchs <robert.fuchs.dd@googlemail.com>
Cc: corpora@list.elra.info
Subject: [Corpora-List] Re: Query: Guide or advice for crowd-sourcing linguistic annotations

Hi Robert,

I am a big proponent of comparative annotations (such as paired comparisons or best-worst scaling) rather than the more commonly used rating scales. A comparative question such as 'Which item is more positive?' usually works much better than asking something like 'Is this neutral, or slightly positive, or moderately positive, or ...?'.
Here are some papers and scripts that may be of interest:
http://saifmohammad.com/WebPages/BestWorst.html
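
For readers who want a concrete picture of how best-worst responses become real-valued scores, here is a minimal Python sketch of the simple counting procedure commonly used with best-worst scaling: an item's score is the fraction of times it was chosen as best minus the fraction of times it was chosen as worst. The data structure and item names are made up for illustration; the scripts at the link above are the reference implementation.

```python
from collections import defaultdict

def bws_scores(annotations):
    """Simple counting procedure for best-worst scaling:
    score(item) = (#chosen best - #chosen worst) / #appearances.
    `annotations` is an iterable of (items_shown, best, worst) triples,
    where items_shown is the tuple presented to the annotator."""
    best = defaultdict(int)
    worst = defaultdict(int)
    seen = defaultdict(int)
    for items, chosen_best, chosen_worst in annotations:
        for item in items:
            seen[item] += 1
        best[chosen_best] += 1
        worst[chosen_worst] += 1
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}

# Toy example: three annotated 4-tuples of sentences rated for positivity.
annotations = [
    (("s1", "s2", "s3", "s4"), "s1", "s4"),
    (("s1", "s2", "s3", "s5"), "s2", "s5"),
    (("s2", "s3", "s4", "s5"), "s2", "s4"),
]
print(bws_scores(annotations))  # s1 and s2 score positive, s4 and s5 negative, s3 is 0.0
```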

Another favorite is to intersperse a small percentage of hidden "gold" questions: say, items that have been pre-annotated by your own team. These are usually simple items to annotate (not boundary cases). If an annotator gets a large percentage of these questions wrong, then perhaps they are not the best annotator for the task. However, always double-check whether the expert gold annotations themselves are missing something. This is also discussed in the papers at the link above.
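
As a rough illustration of the gold-question idea, here is a small Python sketch that flags annotators whose error rate on the hidden gold items is high. All names and thresholds (error rate, minimum number of gold items seen) are illustrative assumptions, not values from the papers above.

```python
def flag_annotators(responses, gold, max_error_rate=0.2, min_gold_seen=5):
    """Flag annotators whose error rate on hidden gold items exceeds a threshold.
    `responses` maps annotator_id -> {item_id: label}; `gold` maps item_id -> expected label.
    Thresholds are illustrative, not prescriptive."""
    flagged = []
    for annotator, answers in responses.items():
        gold_answers = {item: label for item, label in answers.items() if item in gold}
        if len(gold_answers) < min_gold_seen:
            continue  # too few gold items seen to judge this annotator yet
        errors = sum(1 for item, label in gold_answers.items() if label != gold[item])
        if errors / len(gold_answers) > max_error_rate:
            flagged.append(annotator)
    return flagged

# Hypothetical usage: ann_2 misses both gold items and gets flagged.
gold = {"g1": "positive", "g2": "negative"}
responses = {
    "ann_1": {"g1": "positive", "g2": "negative", "x17": "neutral"},
    "ann_2": {"g1": "negative", "g2": "positive", "x17": "positive"},
}
print(flag_annotators(responses, gold, min_gold_seen=2))  # ['ann_2']
```

As the paragraph above notes, a flagged annotator is a prompt to look closer, not an automatic rejection: the disagreement may reveal a gap in the gold labels themselves.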

Finally, this paper, just up on arXiv, might be of interest as well:
Best Practices in the Creation and Use of Emotion Lexicons
https://arxiv.org/abs/2210.07206

Cheers,
-Saif

On Wed, Oct 12, 2022 at 6:44 PM Robert Fuchs via Corpora <corpora@list.elra.info> wrote:
Dear all
I'm looking for a guide or advice on crowd-sourcing linguistic annotations via platforms such as Mechanical Turk. I'm thinking of rating tasks such as evaluating positive and negative sentiment in sentences, or annotating concordances from a corpus for a certain property (e.g. deontic vs. epistemic meaning in modal verbs).
Specifically, I'm wondering:
- How can I ensure that the annotations are of sufficient quality? I don't have a gold standard for all the data; after all, that is why I need the annotations. If I get all the data annotated by two or three independent annotators, I can check their agreement and ensure adequate quality (see the sketch below). But then I might still get annotators who more or less submit random annotations (or start doing so after a while), or at least it would take me a long time to find out who is doing so.
- How do I find out what remuneration is adequate?
- What is a good way to split up the data for annotation? Single annotation units or, say, 50 or 100 at a time? How do I deliver them effectively to the annotators?
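
As a rough sketch of the agreement check mentioned in the first question, here is a self-contained Python version of Cohen's kappa for two annotators (Krippendorff's alpha is the usual generalisation to more annotators or missing annotations). The data and labels are hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators
    who labelled the same items in the same order."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Expected agreement if both annotators labelled at their own base rates.
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(labels_a) | set(labels_b))
    return (observed - expected) / (1 - expected)

# Hypothetical example: two annotators labelling ten modal verb instances.
a = ["deontic", "epistemic", "deontic", "deontic", "epistemic",
     "deontic", "epistemic", "epistemic", "deontic", "deontic"]
b = ["deontic", "epistemic", "deontic", "epistemic", "epistemic",
     "deontic", "epistemic", "deontic", "deontic", "deontic"]
print(round(cohens_kappa(a, b), 2))  # 0.58
```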
Many thanks and best wishes
Robert