Dear all,
I'm looking for a guide or advice on crowd-sourcing linguistic annotations via platforms such as Mechanical Turk. I have in mind rating tasks such as evaluating positive and negative sentiment in sentences, or annotating concordances from a corpus for a certain property (e.g. deontic vs. epistemic meaning in modal verbs).
Specifically, I'm wondering:
- How can I ensure that the annotations are of sufficient quality? I don't have a gold standard for all the data; after all, that is why I need the annotations. If I have all the data annotated by two or three independent annotators, I can at least check agreement, but I might still get annotators who more or less submit random annotations (or start doing so after a while), and it could take me a long time to find out who is doing so (see the sketch below for the kind of check I have in mind).
- How do I find out what remuneration is adequate?
- What is a good way to split up the data for annotation? Single annotation units or, say, 50 or 100 at a time? And how do I deliver them effectively to the annotators?
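To make the first question more concrete, here is a rough sketch of the kind of check I had in mind for spotting annotators who answer more or less at random: compare each annotator's labels against the per-item majority label and flag anyone whose agreement rate falls below a threshold. The data layout (item, annotator, label triples), the thresholds, and the function name are just my assumptions for illustration, not anything a platform provides.

```python
from collections import Counter, defaultdict

def flag_suspect_annotators(triples, min_agreement=0.7, min_items=20):
    """Return annotators whose agreement with the per-item majority label
    falls below `min_agreement`, considering only annotators with at least
    `min_items` judgements. `triples` is an iterable of
    (item_id, annotator_id, label) tuples."""
    labels_by_item = defaultdict(list)       # item_id -> [label, ...]
    items_by_annotator = defaultdict(list)   # annotator_id -> [(item_id, label), ...]

    for item_id, annotator_id, label in triples:
        labels_by_item[item_id].append(label)
        items_by_annotator[annotator_id].append((item_id, label))

    # Majority label per item (ties are resolved arbitrarily; note that each
    # annotator's own label also counts towards the majority, which makes the
    # check slightly lenient).
    majority = {item: Counter(labels).most_common(1)[0][0]
                for item, labels in labels_by_item.items()}

    suspects = {}
    for annotator, judgements in items_by_annotator.items():
        if len(judgements) < min_items:
            continue  # too few judgements to assess this annotator
        agreed = sum(1 for item, label in judgements if label == majority[item])
        rate = agreed / len(judgements)
        if rate < min_agreement:
            suspects[annotator] = rate
    return suspects

if __name__ == "__main__":
    # Toy example: annotator "a3" disagrees with the others on every item.
    demo = [
        ("s1", "a1", "deontic"),   ("s1", "a2", "deontic"),   ("s1", "a3", "epistemic"),
        ("s2", "a1", "epistemic"), ("s2", "a2", "epistemic"), ("s2", "a3", "deontic"),
    ]
    print(flag_suspect_annotators(demo, min_agreement=0.7, min_items=2))
    # -> {'a3': 0.0}
```

I realise this only catches disagreement after the fact; what I'd really like advice on is whether people instead embed a small set of pre-annotated gold items into each batch, and how large that set needs to be.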
Many thanks and best wishes,
Robert