Dear All,
I am looking for treebanks (of any kind; dependency, constituency, LFG, HPSG, …) with good – preferably manual – unambiguous annotation of coordinate structures, for any language.
A typical UD treebank does not have a good annotation of coordinations, because vanilla UD does not distinguish between dependents of single conjuncts, as in *I [came and [bought a book]]*, and shared dependents of conjuncts, as in *I [[saw and bought] a book]*. Enhanced UD can in principle make this distinction, but many EUD treebanks are automatically converted from vanilla UD treebanks, so this information is also often not available or not reliable. On the other hand, many constituency treebanks (including PTB) do not have explicit information about governors of coordinations (in *I bought John and Mary interesting books* the governor of *John and Mary* is *bought* and not, say, *books*), and – perhaps surprisingly – it is often not easy to guess the governor. So I am looking for treebanks that wear both kinds of information – about shared dependents and about governors – on their sleeves.
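To make the Enhanced UD point concrete, here is a minimal Python sketch of how shared dependents could be read off the DEPS column of a CoNLL-U file. The two toy sentences are hand-written for illustration and are not taken from any treebank; the names `parse_deps` and `shared_dependents` are made up for this example. It assumes no multiword tokens or empty nodes, and it splits on whitespace only because the toy rows contain no spaces inside fields (real CoNLL-U is tab-separated).

```python
# Toy CoNLL-U rows: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
# Hand-written for illustration; not taken from any treebank.
SAW_BOUGHT_EUD = """
1 I      I    PRON  _ _ 2 nsubj 2:nsubj|4:nsubj _
2 saw    see  VERB  _ _ 0 root  0:root          _
3 and    and  CCONJ _ _ 4 cc    4:cc            _
4 bought buy  VERB  _ _ 2 conj  2:conj:and      _
5 a      a    DET   _ _ 6 det   6:det           _
6 book   book NOUN  _ _ 2 obj   2:obj|4:obj     _
"""

CAME_BOUGHT_EUD = """
1 I      I    PRON  _ _ 2 nsubj 2:nsubj|4:nsubj _
2 came   come VERB  _ _ 0 root  0:root          _
3 and    and  CCONJ _ _ 4 cc    4:cc            _
4 bought buy  VERB  _ _ 2 conj  2:conj:and      _
5 a      a    DET   _ _ 6 det   6:det           _
6 book   book NOUN  _ _ 4 obj   4:obj           _
"""


def parse_deps(conllu):
    """Return {token_id: (form, [(head_id, relation), ...])} from DEPS."""
    tokens = {}
    for line in conllu.strip().splitlines():
        cols = line.split()
        if not cols or cols[0].startswith("#"):
            continue
        edges = []
        for item in cols[8].split("|"):
            head, _, rel = item.partition(":")
            edges.append((int(head), rel))
        tokens[int(cols[0])] = (cols[1], edges)
    return tokens


def shared_dependents(conllu):
    """Forms that bear the same relation to two conj-linked governors."""
    tokens = parse_deps(conllu)
    # (first conjunct, later conjunct) pairs taken from enhanced conj edges
    conj_pairs = {
        (head, tid)
        for tid, (_, edges) in tokens.items()
        for head, rel in edges
        if rel.split(":")[0] == "conj"
    }
    shared = []
    for _, (form, edges) in tokens.items():
        for first, later in conj_pairs:
            rels_first = {r for h, r in edges if h == first}
            rels_later = {r for h, r in edges if h == later}
            if rels_first & rels_later:
                shared.append(form)
                break
    return shared


print(shared_dependents(SAW_BOUGHT_EUD))
print(shared_dependents(CAME_BOUGHT_EUD))
```

In this toy data, *book* counts as shared only in the *saw and bought* sentence. The check is only as reliable as the DEPS column it reads, which is exactly the concern with automatically converted EUD treebanks.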
Thanks, best, Adam P.
Dear Adam,
The Redwoods treebank for English should have the information you seek:
https://github.com/delph-in/docs/wiki/RedwoodsTop
Emily
On Thu, Oct 13, 2022 at 12:07 AM Adam Przepiórkowski via Corpora < corpora@list.elra.info> wrote:
[...]
_______________________________________________
Corpora mailing list -- corpora@list.elra.info
https://urldefense.com/v3/__https://list.elra.info/mailman3/postorius/lists/...
To unsubscribe send an email to corpora-leave@list.elra.info
Hi Adam,
Aware of this problem, we added some features to SUD (Surface-Syntactic Universal Dependencies) for coordination:
- A feature Shared=Yes (or No) on shared dependents: http://universal.grew.fr/?custom=63495b6f8a644
- A feature @emb on embedded coordinations (either Peter and Bill or Tom): http://universal.grew.fr/?custom=63495b3e5743f
The annotation scheme is explained at https://surfacesyntacticud.github.io/guidelines/u/particular_phenomena/coord.... SUD treebanks are automatically converted into UD, but, as you mentioned, only a part of the information can be recovered in the UD=>SUD conversion.
The Shared feature is presented in our latest paper on SUD: Gerdes K., Guillaume B., Kahane S., Perrier G. (2021) Starting a new treebank? Go SUD! Proceedings of the 6th International Conference on Dependency Linguistics (DepLing), SyntaxFest, ACL, 11 p. https://aclanthology.org/2021.depling-1.4.pdf
Best Sy
Dear Adam,
We have looked at this problem for EWT (English Enhanced UD). See our EACL 2021 paper, "Coordinate Constructions in English Enhanced Universal Dependencies: Analysis and Computational Modeling" (https://aclanthology.org/2021.eacl-main.67/). We made the annotated data available; it should be in the UD repo, but to make sure, feel free to contact Stefan!
Best regards
Dr. Annemarie Friedrich
Natural Language Processing and Semantic Reasoning (CR/PJ-AI-R26)
Robert Bosch GmbH | Postfach 10 60 50 | 70049 Stuttgart | GERMANY | www.bosch.com
Tel. +49 711 811-49626 | Mobile +49 172 3008243 | Annemarie.Friedrich@de.bosch.com
From: Sylvain Kahane sylvain@kahane.fr
Sent: Friday, 14 October 2022 15:00
To: Corpora corpora@list.elra.info
Subject: [Corpora-List] Re: treebanks with good annotation of coordination? (any language, any syntactic schema)
[...]
Hi Adam,
The GUM corpus (a.k.a. UD English GUM) has manually annotated dependencies, which I think express the distinction you are talking about. If an object is a dependent of only one of the coordinate nodes, it will be attached to that node and not arbitrarily to the initial one, since GUM is not converted from a constituent treebank.
Examples like [[saw and bought] a book]:
https://gucorpling.org/annis/#_q=dG9rIC0-ZGVwIHRva19mdW5jPSJvYmoiICYgdG9rX2Z1bmM9ImNvbmoiICYgIzEgLT5kZXAgIzMgJiAjMy4qIzI&_c=R1VN&cl=5&cr=5&s=0&l=10
Examples like [came and [bought a book]]:
https://gucorpling.org/annis/#_q=dG9rX2Z1bmM9ImNvbmoiIC0-ZGVwIHRva19mdW5jPSJvYmoi&_c=R1VN&cl=5&cr=5&s=0&l=10
The same is true of other UD datasets produced by Georgetown/collaborators, including UD English-GUMReddit, UD Hebrew-IAHLTWiki and UD Coptic-Scriptorium. If you want a large, high-accuracy but automatically parsed dataset with the same properties, you may also want to take a look at AMALGUM, which has many examples of both constructions, though with some errors (LAS=92.16, UAS=94.25):
https://gucorpling.org/gum/amalgum.html
Hope this helps,
Amir
------------
Dr. Amir Zeldes
Assoc. Prof. of Computational Linguistics
Department of Linguistics
Georgetown University
1437 37th St. NW
Washington, DC 20057
https://gucorpling.org/amir
Hi Adam,
Just to clarify, I think your examples would actually receive different structures in (basic) UD:
*I [came and [bought a book]]* - "book" should attach to "bought", making it clear that it is not the object of "came"
*I [[saw and bought] a book]* - "book" should attach to "saw" as the first element of the coordination
The second case is formally ambiguous because, per the UD tree, "book" could properly be the object of "saw" and "bought" or just "saw" (but the latter would be weird given English syntax).
Another illustration of the ambiguity is with modifiers:
*I [recently sang] and [danced]*
*I recently [sang and danced]*
would be the same in basic UD, and are both valid interpretations in English.
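The attachments described above can be sketched in a few lines of Python. This is only an illustration: the head and relation values are hand-written to follow the description in this message rather than copied from a treebank, and the names `Token`, `CAME_BOUGHT`, `SAW_BOUGHT` and `ambiguous_dependents` are made up for the example. The function lists the dependents of a first conjunct, which are the ones a basic UD tree cannot mark as shared or private.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    id: int
    form: str
    head: int      # 0 means the token is the root
    deprel: str


# "I came and bought a book": "book" attaches to "bought"
CAME_BOUGHT = [
    Token(1, "I", 2, "nsubj"),
    Token(2, "came", 0, "root"),
    Token(3, "and", 4, "cc"),
    Token(4, "bought", 2, "conj"),
    Token(5, "a", 6, "det"),
    Token(6, "book", 4, "obj"),
]

# "I saw and bought a book": "book" attaches to "saw", the first conjunct
SAW_BOUGHT = [
    Token(1, "I", 2, "nsubj"),
    Token(2, "saw", 0, "root"),
    Token(3, "and", 4, "cc"),
    Token(4, "bought", 2, "conj"),
    Token(5, "a", 6, "det"),
    Token(6, "book", 2, "obj"),
]


def ambiguous_dependents(sentence):
    """Forms attached to a first conjunct, excluding the conj links themselves.

    In a basic UD tree each of these could belong to the first conjunct only
    or to the whole coordination; the tree does not say which.
    """
    first_conjuncts = {t.head for t in sentence if t.deprel == "conj"}
    return [
        t.form
        for t in sentence
        if t.head in first_conjuncts and t.deprel != "conj"
    ]


print(ambiguous_dependents(CAME_BOUGHT))
print(ambiguous_dependents(SAW_BOUGHT))
```

In the first sentence only the subject is flagged, since "book" hangs off the second conjunct; in the second sentence both the subject and "book" are flagged.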
Cheers, Nathan