Hi Adam,

The GUM corpus (a.k.a. UD English-GUM) has manually annotated dependencies, which I think express the distinction you are talking about. Because GUM was not converted from a constituency treebank, an object that depends on only one of the conjuncts is attached to that conjunct, not arbitrarily to the first one.
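To make this attachment convention concrete, here is a minimal Python sketch over two hand-made CoNLL-U trees (hypothetical toy data, not taken from GUM). It assumes a simple heuristic: an obj whose head also has a conj dependent lying between the head and the object is read as shared, and any other obj as private to its own head. The names to_conllu, parse_conllu and classify_objects are illustrative only, not part of any GUM tooling.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Token:
    id: int
    form: str
    head: int
    deprel: str


def to_conllu(rows: list[tuple[str, int, str]]) -> str:
    """Build a minimal 10-column CoNLL-U sentence from (form, head, deprel) rows."""
    lines = []
    for i, (form, head, deprel) in enumerate(rows, start=1):
        lines.append("\t".join([str(i), form, "_", "_", "_", "_", str(head), deprel, "_", "_"]))
    return "\n".join(lines)


def parse_conllu(block: str) -> list[Token]:
    """Keep the ID, FORM, HEAD and DEPREL columns of one CoNLL-U sentence."""
    tokens = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        tokens.append(Token(int(cols[0]), cols[1], int(cols[6]), cols[7]))
    return tokens


def classify_objects(tokens: list[Token]) -> list[tuple[str, str]]:
    """Label each obj dependent as 'shared' or 'private' (a heuristic, not a UD rule)."""
    labels = []
    for obj in tokens:
        if obj.deprel.split(":")[0] != "obj":
            continue
        # Shared: the object's head has a conj dependent located between the
        # head and the object, as in [[saw and bought] a book].
        shared = any(
            t.head == obj.head
            and t.deprel.split(":")[0] == "conj"
            and obj.head < t.id < obj.id
            for t in tokens
        )
        labels.append((obj.form, "shared" if shared else "private"))
    return labels


# Toy trees: "I saw and bought a book ." vs. "I came and bought a book ."
SHARED = to_conllu([("I", 2, "nsubj"), ("saw", 0, "root"), ("and", 4, "cc"),
                    ("bought", 2, "conj"), ("a", 6, "det"), ("book", 2, "obj"), (".", 2, "punct")])
PRIVATE = to_conllu([("I", 2, "nsubj"), ("came", 0, "root"), ("and", 4, "cc"),
                     ("bought", 2, "conj"), ("a", 6, "det"), ("book", 4, "obj"), (".", 2, "punct")])

print(classify_objects(parse_conllu(SHARED)))   # [('book', 'shared')]
print(classify_objects(parse_conllu(PRIVATE)))  # [('book', 'private')]
```

With GUM-style attachment, "book" comes out as shared in the first tree and private in the second. On a treebank that attaches objects to the first conjunct by default, the same heuristic would mislabel private objects as shared, which is why manual attachment matters here.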

Examples like [[saw and bought] a book]:

https://gucorpling.org/annis/#_q=dG9rIC0-ZGVwIHRva19mdW5jPSJvYmoiICYgdG9rX2Z1bmM9ImNvbmoiICYgIzEgLT5kZXAgIzMgJiAjMy4qIzI&_c=R1VN&cl=5&cr=5&s=0&l=10

Examples like [came and [bought a book]]:

https://gucorpling.org/annis/#_q=dG9rX2Z1bmM9ImNvbmoiIC0-ZGVwIHRva19mdW5jPSJvYmoi&_c=R1VN&cl=5&cr=5&s=0&l=10
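The ANNIS links above carry their query in the URL fragment. Assuming the _q (query) and _c (corpus) values are URL-safe base64 with the trailing padding stripped, which is what these two links contain, a few lines of Python recover the AQL text. The helper names b64url_decode and annis_link_params are hypothetical, not part of ANNIS.

```python
from __future__ import annotations

import base64
from urllib.parse import parse_qs, urlsplit


def b64url_decode(value: str) -> str:
    """Decode URL-safe base64, restoring the '=' padding that the links omit."""
    padded = value + "=" * (-len(value) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


def annis_link_params(url: str) -> dict[str, str]:
    """Decode the query (_q) and corpus (_c) parameters from an ANNIS link fragment."""
    params = parse_qs(urlsplit(url).fragment)
    return {key: b64url_decode(params[key][0]) for key in ("_q", "_c") if key in params}


SHARED_LINK = ("https://gucorpling.org/annis/#_q=dG9rIC0-ZGVwIHRva19mdW5jPSJvYmoiICYgdG9rX2Z1bmM9"
               "ImNvbmoiICYgIzEgLT5kZXAgIzMgJiAjMy4qIzI&_c=R1VN&cl=5&cr=5&s=0&l=10")
PRIVATE_LINK = ("https://gucorpling.org/annis/#_q=dG9rX2Z1bmM9ImNvbmoiIC0-ZGVwIHRva19mdW5jPSJvYmoi"
                "&_c=R1VN&cl=5&cr=5&s=0&l=10")

for link in (SHARED_LINK, PRIVATE_LINK):
    print(annis_link_params(link))
```

The first link decodes to the query tok ->dep tok_func="obj" & tok_func="conj" & #1 ->dep #3 & #3.*#2 over the corpus GUM: a token (#1) with an obj dependent (#2) and a conj dependent (#3), where, reading .* as AQL indirect precedence, the conjunct precedes the object. The second decodes to tok_func="conj" ->dep tok_func="obj", a conjunct that governs its own object.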

The same is true of other UD datasets produced by Georgetown and collaborators, including UD English-GUMReddit, UD Hebrew-IAHLTWiki and UD Coptic-Scriptorium. If you want a large, high-accuracy dataset with the same properties, you may also want to take a look at AMALGUM. It is automatically parsed, so it contains some errors (LAS = 92.16, UAS = 94.25), but it has many examples of both constructions:

https://gucorpling.org/gum/amalgum.html

Hope this helps,
Amir
------------
Dr. Amir Zeldes
Assoc. Prof. of Computational Linguistics
Department of Linguistics
Georgetown University
1437 37th St. NW
Washington, DC 20057

https://gucorpling.org/amir

From: Adam Przepiórkowski via Corpora <corpora@list.elra.info>
Sent: Thursday, October 13, 2022 3:06 AM
To: corpora@list.elra.info
Subject: [Corpora-List] treebanks with good annotation of coordination? (any language, any syntactic schema)

Dear All,

I am looking for treebanks (of any kind: dependency, constituency, LFG, HPSG, …) with good – preferably manual – unambiguous annotation of coordinate structures, for any language.

A typical UD treebank does not have good annotation of coordination, because vanilla UD does not distinguish between dependents of single conjuncts, as in I [came and [bought a book]], and shared dependents of conjuncts, as in I [[saw and bought] a book]. Enhanced UD can in principle make this distinction, but many EUD treebanks are automatically converted from vanilla UD treebanks, so this information is also often unavailable or unreliable. On the other hand, many constituency treebanks (including PTB) do not have explicit information about the governors of coordinations (in I bought John and Mary interesting books, the governor of John and Mary is bought and not, say, books), and – perhaps surprisingly – it is often not easy to guess the governor. So I am looking for treebanks that wear both kinds of information – about shared dependents and about governors – on their sleeves.

Thanks, best,
Adam P.