Hello,
Last minute reminder!
We're writing with an updated call for proposals for the Law and Corpus Linguistics Conference to be held on Friday, October 13, 2023, with pre-conference workshops on Thursday, October 12. The conference will be held at Brigham Young University's J. Reuben Clark Law School in Provo, Utah.
We're pleased to announce that the keynote address at this year's conference will be given by D. Gordon Smith, Dean of the J. Reuben Clark Law School at Brigham Young University. Gordon has made tremendous contributions to the field of law and corpus linguistics, and we're excited to have him as this year's keynote speaker.
Proposals are invited for individual papers and panels. We're open to submissions on a broad range of topics, including but not limited to:
* applications of corpus linguistics to constitutional, statutory, contract, patent, trademark, probate, administrative, and criminal law in any state or nation;
* philosophical, normative, and pragmatic commentary on the use of corpus linguistics in the law;
* triangulation between corpus linguistics and other empirical methods in legal interpretation;
* corpus linguistic analysis of the law of countries other than the United States;
* the relationship between corpus linguistics and pragmatics (e.g. implicature, presupposition, sociolinguistic context);
* corpus-based analysis of legal discourse or topics;
* best practices in corpus design and corpus linguistic methods in legal settings.
We have a new proposal deadline of May 31, 2023. Proposals should include an abstract of no more than 750 words and complete contact information for presenters. Please send proposals to byulawcorpus@law.byu.edu. More information about the conference can be found at https://corpusconference.byu.edu/2023-home/.
Best,
BYU LCL 2023 Conference Organizing Committee (Thomas Lee, Jesse Egbert, Brett Hashimoto, James Heilpern and James Phillips)
We're recruiting!
We have several positions open at CLASP, University of Gothenburg, Sweden.
Deadline: 1 June 2023
Check the details at the link below:
https://gu-clasp.github.io/recruitment/
===================================
=== The 1st UniDive webinar for newcomers to
=== Universal Dependencies, PARSEME and Grew-match
=== Monday, 19 June 2023
=== Event page:
https://unidive.lisn.upsaclay.fr/doku.php?id=other-events:webinar-1
===================================
The UniDive <https://unidive.lisn.upsaclay.fr/>
COST Action (Universality, diversity and
idiosyncrasy in language technology) is happy to
announce its first webinar
<https://unidive.lisn.upsaclay.fr/doku.php?id=other-events:webinar-1>,
which will take place on *Monday, the 19th of June
2023* (online).
The objective of the webinar is to *train new
creators*, *maintainers* and *users* of Universal
Dependencies and PARSEME treebanks (annotated for
morphosyntax and multiword expressions), for the
sake of *language diversity* and
*universality*-driven language modelling.
Anyone interested in constructing and/or using
language corpora can join.
*Young* researchers and investigators, as well as
people working on *under-resourced* languages, are
particularly welcome.
To *register* for the event, please fill in this
form
<https://docs.google.com/forms/d/e/1FAIpQLSfw5yk1LH1y7X4tkhQWtto84aLShxIksYG…>
no later than *12 June 2023*.
The program is meant for *newcomers* to Universal
Dependencies <https://universaldependencies.org/>,
PARSEME
<https://gitlab.com/parseme/corpora/-/wikis>
and/or the Grew-match <https://match.grew.fr/>
treebank browser. It includes two parts:
* 9:00-12:00 CEST: /Introduction to Universal
Dependencies/ - by Marie-Catherine de Marneffe,
Joakim Nivre, and Daniel Zeman
* 14:00-17:00 CEST: /Introduction to PARSEME
and Grew-match/ - by Bruno Guillaume, Carlos
Ramisch and Agata Savary
More details about the schedule and the connection
link will be sent to the registered participants
after the deadline.
We also plan to record the tutorial and publish it
online after the event.
We are looking forward to meeting you online!
UniDive Core Group
The Institute of Artificial Intelligence invites applications for the position of a Doctoral or Postdoctoral Researcher (m/f/d) on the topic of Trustworthy Natural Language Processing (NLP) (Salary Scale 13 TV-L, 100 %) starting at the earliest possible date. The position is limited to a period of three years with the possibility of extension.
Tasks:
The goal of the offered position is to carry out innovative research in NLP, aiming at scientific publications at reputed international venues. The position is part of a collaborative project that explores how to employ AI to best support people in the sustainable and responsible use of resources. A focus is on the generation of factual and human-centered explanations, but we encourage the development of your own research directions in this context. We are looking for highly motivated candidates with a passion for creativity and learning who seek to make a positive impact through open and independent research in a young team.
Your Profile
• Completed academic degree (Master or comparable) in computer science, computational linguistics, artificial intelligence, or related disciplines
• Solid understanding of machine learning with hands-on experience, ideally in the context of NLP
• Proficient programming skills in Python
• Good scientific writing skills (e.g., shown by a very good master’s thesis) are expected
• Strong communication skills in English, both in oral and in written form
Our Offer
• Creative and innovative work in a diverse and international team
• Possibility to obtain a Ph.D. degree or to shape your Postdoc profile
• State-of-the-art research facilities, including top-notch computing clusters
• Participation in international scientific events and research collaborations
• Salary at the level of 100% of salary scale 13 according to the Collective Agreement for the Public Service of the Länder (TV-L)
Contact
If you have questions, please contact Maja Stahl (email: m.stahl@ai.uni-hannover.de). Further information about the NLP Group can be found at: https://www.ai.uni-hannover.de/en/institute/research-groups/nlp
Please submit your application with supporting documents (including a CV, a full set of transcripts, a brief statement of at most one page on why you are applying to the NLP Group, and possibly further qualifications) by June 25, 2023, as a single PDF file to office@ai.uni-hannover.de (subject: “[ai-nlp] Application”).
*** Last Call for Papers ***
19th IEEE eScience Conference (eScience 2023)
October 9-13, 2023, St. Raphael Resort, Limassol, Cyprus
https://www.escience-conference.org/2023/
(*** Submission Deadline Extension: June 19, 2023, AoE, FIRM! ***)
eScience 2023 provides an interdisciplinary forum for researchers, developers, and users of
eScience applications and enabling IT technologies. Its objective is to promote and encourage
all aspects of eScience and its associated technologies, applications, algorithms, and tools,
with a strong focus on practical solutions and open challenges. The conference welcomes
conceptualization, implementation, and experience contributions enabling and driving
innovation in data- and compute-intensive research across all disciplines, from the physical
and biological sciences to the social sciences, arts, and humanities; encompassing artificial
intelligence and machine learning methods; and targeting a broad spectrum of architectures,
including HPC, Cloud, and IoT.
The overarching theme of the eScience 2023 conference is “open eScience”. This year, the
conference is promoting four additional key topics:
• Computational Science for sustainable development
• FAIR
• Research Infrastructures for eScience
• Continuum Computing: Convergence between Cloud Computing and the Internet of Things
(IoT)
The conference is soliciting two types of contributions:
• Full papers (10 pages) presenting previously unpublished research achievements or
eScience experiences and solutions
• Posters (2 pages) showcasing early-stage results and innovations
Submitted papers should use the IEEE 8.5×11 manuscript guidelines: double-column text
using single-spaced 10-point font on 8.5×11-inch pages. Templates are available from
http://www.ieee.org/conferences_events/conferences/publishing/templates.html .
Submissions should be made via the Easy Chair system using the submission link:
https://easychair.org/conferences/?conf=escience2023 .
All submissions will be single-blind peer reviewed. Selected full papers will receive a slot for
an oral presentation. Accepted posters will be presented during a poster reception. Accepted
full papers and poster papers will be published in the conference proceedings. Rejected full
papers can be re-submitted for a poster presentation. At least one author of each accepted
paper or poster must register as an author at the full registration rate. Each author registration
can be applied to only one accepted submission.
AWARDS
eScience 2023 will host the following awards, which will be announced at the conference.
• Best Paper Award
• Best Student Paper Award
• Best Poster Award
• Best Student Poster Award
• Outstanding Early Career Contribution – this award is associated with poster submissions
and short presentations of attendees in their early career phase (i.e., postdoctoral researchers
and junior scientists).
KEY DATES
• Paper Submissions Due: June 19, 2023 (AoE) (FIRM!)
• Notification of Paper Acceptance: July 10, 2023
• Poster Submissions due: July 7, 2023 (AoE)
• Poster Acceptance Notification: July 24, 2023
• All Camera-ready Submissions due: August 14, 2023
• Author Registration Deadline: August 14, 2023
ORGANISATION
General Chair
• George Angelos Papadopoulos, University of Cyprus, Cyprus
Technical Program Co-Chairs
• Rafael Ferreira da Silva, Oak Ridge National Laboratory, USA
• Rosa Filgueira, University of St Andrews, UK
Organisation Committee
https://www.escience-conference.org/2023/organizers
Steering Committee
https://www.escience-conference.org/about/#steering-committee
Email contact: Technical-Program@eScience-conference.org
ReproNLP 2023: First Call for Participation
Background
Across Natural Language Processing (NLP), a growing body of work is
exploring the issue of reproducibility in machine learning contexts. The
field is currently far from having a generally agreed tool box of methods
for defining and assessing reproducibility. Reproducibility of the results of
human evaluation experiments is particularly under-addressed, which is of
concern for areas of NLP where human evaluation is common, including
MT, text generation and summarisation. More generally, human evaluations
provide the benchmarks against which automatic evaluation methods are
assessed across NLP.
We previously organised the First ReproGen Shared Task
<https://reprogen.github.io/2021/> on reproducibility of human evaluations
in NLG as part of Generation Challenges (GenChal) at INLG’21, and the Second
ReproGen Shared Task <https://reprogen.github.io/> at INLG’22, where we
extended the scope to encompass automatic evaluation methods as well as
human. This year we are expanding the scope of the shared task series to
encompass all NLP tasks, renaming it the ReproNLP Shared Task on
Reproducibility of Evaluations in NLP. This year we are focussing on human
evaluations, and the results session will be held at the 3rd Workshop on
Human Evaluation of NLP Systems (HumEval 2023) <https://humeval.github.io/>
.
As with the ReproGen shared tasks, our overall aim is (i) to shed light on
the extent to which past NLP evaluations have been reproducible, and (ii)
to draw conclusions regarding how NLP evaluations can be designed and
reported to increase reproducibility. If the task is run over several
years, we hope to be able to document an overall increase in levels of
reproducibility over time.
About ReproNLP
ReproNLP has three tracks, one an ‘unshared task’ in which teams attempt to
reproduce their own prior evaluation results (Track B below), the others
standard shared tasks in which teams repeat existing evaluation studies
with the aim of reproducing their results (Tracks A and C):
A. Legacy Reproducibility Track: For a shared set of selected evaluation
studies from ReproGen 2022 (see below), participants repeat one or more of
the studies, and attempt to reproduce their results. As in ReproGen,
participants will be given additional information and resources (beyond
what is in the published papers) about experimental details by the ReproNLP
organisers.
B. RYO Track: Reproduce Your Own previous evaluation results, and report
what happened. Unshared task.
C. ReproHum Track: For a shared set of selected evaluation studies
from the ReproHum
Project <https://reprohum.github.io/>, participants repeat one or more of
the studies, and attempt to reproduce their results, using information
provided by the ReproNLP organisers only, and following a common
reproduction approach.
Track A Papers
We have selected the papers listed below for inclusion in ReproNLP Track A
(these are the same papers as in Track A in ReproGen 2022). The authors
have agreed to evaluation studies from their papers as identified below
being used for Track A, and have provided the system outputs to be
evaluated and any reusable tools that were used in the original
evaluations. We also have available completed ReproGen Human Evaluation
Sheets which we will use as the standard for establishing similarity
between different human evaluation studies.
The papers and studies, with many thanks to the authors for supporting
ReproGen, are:
van der Lee et al. (2017): PASS: A Dutch data-to-text system for soccer,
targeted towards specific audiences:
https://www.aclweb.org/anthology/W17-3513.pdf [1 evaluation study; Dutch;
20 evaluators; 1 quality criterion; reproduction target: primary scores]
Dušek et al. (2018): Findings of the E2E NLG Challenge:
https://www.aclweb.org/anthology/W18-6539.pdf [1 human evaluation study;
English; MTurk; 2 quality criteria; reproduction target: primary scores]
Qader et al. (2018): Generation of Company descriptions using
concept-to-text and text-to-text deep models: dataset collection and
systems evaluation: https://www.aclweb.org/anthology/W18-6532.pdf [1 human
evaluation study; English; 19 evaluators; 4 quality criteria; reproduction
target: primary scores]
Santhanam & Shaikh (2019): Towards Best Experiment Design for Evaluating
Dialogue System Output: https://www.aclweb.org/anthology/W19-8610.pdf [3
evaluation studies differing in experimental design; English; 40
evaluators; 2 quality criteria; reproduction target: correlation scores
between 3 studies]
Nisioi et al. (2017): Exploring Neural Text Simplification Models:
https://aclanthology.org/P17-2014.pdf [one automatic evaluation study;
reproduction target: two automatic scores]; [one human evaluation study; 70
sentences; 9 system outputs; 4 quality criteria; reproduction target:
primary scores]
Track C Papers
The specific experiments listed and described below are currently the
subject of reproduction studies in the ReproHum project. The authors have
agreed to us using them in ReproNLP and have provided very detailed
information about the experiments. In some cases we have introduced
standardisations to the experimental design as noted in the detailed
instructions to participants which will be shared upon registration.
The papers and studies, with many thanks to the authors for supporting
ReproHum and ReproNLP, are:
Vamvas & Sennrich (2022): As Little as Possible, as Much as Necessary:
Detecting Over and Undertranslations with Contrastive Conditioning:
https://aclanthology.org/2022.acl-short.53.pdf [1 human evaluation study (of
2 in paper); English to German; 2 evaluators; 1 quality criterion; 1 system;
approx. 800 outputs; reproduction target: primary scores]
Lin et al. (2022): Other Roles Matter! Enhancing Role-Oriented Dialogue
Summarization via Role Interactions:
https://aclanthology.org/2022.acl-long.182.pdf [1 human evaluation study;
Chinese; 3 evaluators; 3 quality criteria; 200 outputs per system; 4
systems; reproduction target: primary scores]
Lux & Vu (2022): Language-Agnostic Meta-Learning for Low-Resource
Text-to-Speech with Articulatory Features:
https://aclanthology.org/2022.acl-long.472.pdf [1 human evaluation;
German; Student evaluators (34 responses); 1 quality criterion; 12 outputs
per system; 2 systems; reproduction target: primary scores]
Chakrabarty et al. (2022): It's not Rocket Science: Interpreting
Figurative Language in Narratives:
https://aclanthology.org/2022.tacl-1.34.pdf [2 human evaluation studies
(of 4 in paper); English; MTurk; 1 quality criterion; 25 outputs per
system, 5/8 systems (varies between idiom and simile studies); reproduction
target: primary scores]
Puduppully & Lapata (2021): Data-to-text Generation with Macro Planning:
https://aclanthology.org/2021.tacl-1.31.pdf [first human evaluation
(relative); English; MTurk; 2 quality criteria; 20 outputs per system; 5
systems, reproduction target: primary scores] [second human evaluation
(absolute); English; MTurk; 3 quality criteria; 80 outputs per system; 5
systems; reproduction target: primary scores]
Track A, B and C Instructions
Step 1. Fill in the registration form at https://forms.gle/dnf73tH3jcyBEBCX6,
indicating which of the above papers, or which of your own papers, you wish
to carry out a reproduction study for.
Step 2. After registration, the ReproNLP participants information will be
made available to you, plus data, tools and other materials for each of the
studies you have selected in the registration form.
Step 3. Carry out the reproduction, and submit a report of up to 8 pages
plus references and supplementary material including a completed ReproGen
Human Evaluation Sheet (HEDS) for each reproduction study, by 4 August 2023.
Step 4. The organisers will carry out a light-touch review of the evaluation
reports according to the following criteria:
- Evaluation sheet has been completed.
- Exact repetition of the study has been attempted and is described in the report.
- Report gives full details of the reproduction study, in accordance with the reporting guidelines provided.
- All tools and resources used in the study are publicly available.
Step 5. Present paper at the results meeting.
Reports will be included in the HumEval’23 proceedings, and results will be
presented at the workshop in September 2023. Full details and instructions
will be provided as part of the ReproNLP participants information.
Important Dates
Report submission deadline: 4 August 2023
Acceptance notification: 18 August 2023
Camera-ready reports: 25 August 2023
Workshop camera-ready proceedings ready: 31 August 2023
Presentation of results: 7 or 8 September 2023
All deadlines are 23:59 UTC-12.
Organisers
Anya Belz, ADAPT/DCU, Ireland
Craig Thomson, University of Aberdeen, UK
Ehud Reiter, University of Aberdeen, UK
Contact
anya.belz@adaptcentre.ie, c.thomson@abdn.ac.uk
https://repronlp.github.io
=====================================================================
Workshop on Multimodal, Multilingual Natural Language Generation
In conjunction with INLG/SIGDIAL 2023
=====================================================================
Prague, 12 September, 2023
https://synalp.gitlabpages.inria.fr/mmnlg2023/
======================================================================
We invite the submission of long and short papers for the first Workshop on Multimodal, Multilingual NLG (MM-NLG), which will be held in Prague, in conjunction with the joint meetings of the 16th International Conference on Natural Language Generation (INLG 2023) and the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDial 2023).
Workshop goals and topics
This event aims to bring together researchers working on text generation from multimodal input data. The workshop also emphasises multilinguality as an ongoing, open challenge for text generation methods, especially for languages which are relatively under-resourced.
We therefore invite papers on all topics related to text generation from multimodal inputs, multilingual text generation, or a combination of the two. We welcome submissions which focus on multimodal and/or multilingual generation in both dialogue and non-interactive settings.
NLG and multimodal inputs
==========================
By Multimodal NLG, we intend to capture a broad variety of input data types and formats from which text can be generated using neural, statistical or rule-based methods. For example, while several contemporary NLG models generate based on textual prompts or prefixes, others rely on structured inputs which can take the form of `flat' semantic representations, RDF triples, etc. In a different vein, vision-to-text models generate captions, paragraphs or short narratives from visual inputs such as images or video. Finally, there is a long tradition in data-to-text NLG which seeks to generate text from numerical or other, less structured inputs. The sheer diversity is also reflected in the broad range of datasets available for training and evaluating NLG models.
This workshop will provide a forum to discuss NLG research based on any input modality, fostering a debate on the directions in which the field has developed, and especially the relationship between different NLG tasks, as characterised by the variety of possible inputs, among others.
NLG and multilingual outputs
============================
As the field has become increasingly dominated by large, pretrained language models, it has become increasingly evident that not all languages are on a level playing field. For example, when training data is opportunistically sourced from the web, data for certain languages is often very limited, and highly noisy. On the other hand, developing curated multilingual data for under-represented languages is very challenging, as some recent efforts (for example, the BLOOM model) have shown.
This workshop will provide an opportunity for researchers to discuss challenges and report on recent work targeting NLG in multiple languages, including, but not limited to, data-lean scenarios, where transfer learning, few-shot and zero-shot approaches would be expected to play an important role.
Workshop format
===============
This one-day workshop will consist of an oral and a poster session, together with a special session. The oral session will feature talks by two invited speakers, as well as regular paper presentations.
The workshop will be hybrid. We encourage all participants to attend in person, but will provide online access for those who are unable to travel, or prefer not to.
Special session: WebNLG Challenge on Under-Resourced Languages
=========================================================
In line with the goals of MM-NLG, the workshop will include a special session dedicated to the recently launched, ongoing WebNLG 2023 Challenge, which focuses on generation for under-resourced languages in few-shot and zero-shot settings.
More info here: https://synalp.gitlabpages.inria.fr/webnlg-challenge/challenge_2023/
Submission formats
===================
We solicit two kinds of papers:
- Long papers must not exceed eight (8) pages of content, plus unlimited pages of ethical considerations, supplementary material statements, and references.
- Short papers must not exceed four (4) pages, plus unlimited pages of ethical considerations, supplementary material statements, and references.
Submissions should follow ACL Author Guidelines and policies for submission, review and citation, and be anonymised for double blind reviewing. See https://www.aclweb.org/adminwiki/index.php?title=ACL_Author_Guidelines
Please use ACL 2023 style files; LaTeX style files and Microsoft Word templates are available at https://2023.aclweb.org/calls/style_and_formatting
Authors must honour the ethical code set out in the ACL Code of Ethics, available at https://www.aclweb.org/portal/content/acl-code-ethics
If your work raises any ethical issues, you should include an explicit discussion of those issues. This will also be taken into account in the review process. You may find this checklist of use: https://aclrollingreview.org/responsibleNLPresearch/
Authors are strongly encouraged to ensure that their work is reproducible; see, e.g., the reproducibility checklist at https://2021.aclweb.org/calls/reproducibility-checklist/
Papers involving any kind of experimental results (human judgments, system outputs, etc.) should incorporate a data availability statement. Authors are asked to indicate whether the data is made publicly available. If the data is not made available, authors should provide a brief explanation why (e.g., because the data contains proprietary information). A statement guide is available on the INLG 2023 website.
Paper submission
=================
The workshop will only accept direct submissions. Submissions can be made to the MM-NLG START website: https://softconf.com/n/mmnlg2023/
Accepted papers will be published in the Workshop proceedings on the ACL Anthology.
Important dates
================
- Deadline for long and short papers: 16 July, 2023
- Notification of acceptance: 6 August, 2023
- Deadline for camera-ready papers: 14 August, 2023
- MM-NLG Workshop: 12 September, 2023
Organising committee
======================
Anya Belz, ADAPT, Dublin City University, Ireland
Claudia Borg, University of Malta, Malta
Liam Cripwell, CNRS/LORIA and Lorraine University, France
Aykut Erdem, Koc University, Turkey
Erkut Erdem, Hacettepe University, Turkey
Claire Gardent, CNRS/LORIA, France
Albert Gatt, Utrecht University, The Netherlands
John Judge, ADAPT, Dublin City University, Ireland
William Soto-Martinez, CNRS/LORIA and Lorraine University, France
Support and acknowledgements
============================
This workshop is a joint initiative which has received the support of the following projects:
- LT-Bridge funded by the EU Horizon 2020 Work Programme Spreading Excellence and Widening Participation (WIDESPREAD) 2018-2020 Grant No. 952194 https://lt-bridge.eu/
- The xNLG AI Chair on Multilingual, Multi-Source Text Generation funded by the French National Research Agency (Gardent; ANR-20-CHIA-0003), Meta and the Region Grand Est https://members.loria.fr/CGardent/xnlg.html
- Multi3Generation: Multimodal, Multi-task, Multi-Lingual Natural Language Generation COST Action CA18231 https://multi3generation.eu/
--
Senior Scientist at CNRS
LORIA, Nancy (France)
https://members.loria.fr/CGardent/
Dear colleague,
We are happy to announce the next webinar in the Language Technology
webinar series organized by the HiTZ research center (Basque Center for
Language Technology, http://hitz.eus). This will be the final webinar of
this academic year. You can check the videos of previous webinars and
the schedule for upcoming webinars here: http://www.hitz.eus/webinars
Next webinar:
* *Speaker*: Pascale Fung (The Hong Kong University of Science and
Technology)
* *Date*: Jun 1, 2023, 15:00 CET
Check past and upcoming webinars at the following url:
http://www.hitz.eus/webinars If you are interested in participating,
please complete this registration form:
http://www.hitz.eus/webinar_izenematea
If you cannot attend this seminar, but you want to be informed of the
following HiTZ webinars, please complete this registration form instead:
http://www.hitz.eus/webinar_info
Best wishes,
HiTZ Zentroa
Dear all,
Lancaster University (UK), in collaboration with the British Council, offers a £1,500 bursary on a competitive basis as a fees contribution towards a two-year MA programme in Corpus Linguistics (Distance) starting in October 2023.
The programme is part-time and online, allowing flexible study for anyone seeking to gain a qualification in corpus linguistics from one of the world leaders in the field.
More about the MA in Corpus Linguistics, including fees: https://www.lancaster.ac.uk/study/postgraduate/postgraduate-courses/corpus-…
Bursary application: https://tinyurl.com/42a2pw88
Best wishes,
Vaclav
Professor Vaclav Brezina
Professor in Corpus Linguistics
Department of Linguistics and English Language
ESRC Centre for Corpus Approaches to Social Science
Faculty of Arts and Social Sciences, Lancaster University
Lancaster, LA1 4YD
Office: County South, room C05
T: +44 (0)1524 510828
@vaclavbrezina
We are happy to announce the release of version 2.12 of SUD (Surface Syntactic Universal Dependencies, see https://surfacesyntacticud.github.io/)
244 treebanks are available (https://grew.fr/download/sud-treebanks-v2.12.tgz): 8 are native SUD corpora and 236 are automatically converted from UD v2.12. See https://surfacesyntacticud.github.io/data/ for details.
All 2.12 corpora of UD and SUD are available on Grew-match: https://universal.grew.fr
A set of “Universal tables”, giving a global view of the usage of features and dependency relations in UD and SUD treebanks, is available at https://tables.grew.fr
See the UD announcement <https://list.elra.info/mailman3/hyperkitty/list/corpora@list.elra.info/thre…> for more information about corpora and contributors.
SUD is characterized by distributional and functional criteria for headedness, and by syntactic relations corresponding to positional paradigms.
SUD offers several advantages for various studies, particularly in the areas of phrase structure, word order, and typology. For example, UD may present challenges in identifying noun phrases (NPs), since adpositions depend on nouns, or in studying subject-auxiliary order, since the subject is directly linked to the lexical verb.
It is important to note that the transformation from UD to SUD is accomplished using a universal Grew grammar that incorporates a set of heuristics. One such heuristic is that the functional words most distant from the lexical head dominate those nearest to it. While this heuristic has proven effective in many cases, there are exceptions. As a result, specific grammars have been developed for languages such as German and Wolof. We encourage you to report any issues on the SUD GitHub repository <https://github.com/surfacesyntacticud/guidelines/issues>, and we will be in touch to collaborate on the development of specific grammars if needed.
If you plan to develop a new UD treebank, you might consider starting with a native SUD treebank, especially if you are familiar with standard syntactic theories. If you already have a treebank in a different annotation scheme (including phrase-structure-based annotation), it may be simpler to first convert it to SUD and then to UD. In any case, feel free to contact us.
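To make the headedness difference concrete, here is a minimal, illustrative sketch (not taken from the SUD release; token indices and relation labels are simplified for exposition) contrasting how UD and SUD attach the prepositional phrase in "She lives in the house":

```python
# Toy dependency analyses of "She lives in the house".
# Token indices: 1=She 2=lives 3=in 4=the 5=house (0 = root).
# Each dict maps a token index to the index of its head.

# UD: the adposition "in" depends on the noun "house" (relation `case`),
# and "house" attaches to the verb as an oblique.
ud_heads = {1: 2, 2: 0, 3: 5, 4: 5, 5: 2}
ud_rels  = {1: "nsubj", 2: "root", 3: "case", 4: "det", 5: "obl"}

# SUD: the adposition is the head of the phrase; "in" attaches to the
# verb, and "house" depends on "in" (relation `comp:obj`). The label on
# the verb-"in" edge is simplified here; consult the SUD guidelines for
# the exact relation inventory.
sud_heads = {1: 2, 2: 0, 3: 2, 4: 5, 5: 3}
sud_rels  = {1: "subj", 2: "root", 3: "udep", 4: "det", 5: "comp:obj"}

def phrase_heads(heads, tokens):
    """Return the tokens in `tokens` whose head lies outside `tokens`."""
    return [t for t in tokens if heads[t] not in tokens]

pp = {3, 4, 5}  # the prepositional phrase "in the house"
print(phrase_heads(ud_heads, pp))   # the noun (5) heads the PP in UD
print(phrase_heads(sud_heads, pp))  # the adposition (3) heads the PP in SUD
```

This is why NPs are easier to delimit in SUD: the adposition is an external head rather than a dependent inside the noun phrase.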
References about SUD
Kim Gerdes, Bruno Guillaume, Sylvain Kahane, Guy Perrier. Starting a new treebank? Go SUD! Theoretical and practical benefits of the Surface-Syntactic distributional approach <https://hal.inria.fr/hal-03509136v1> in DepLing 2021 <http://depling.org/depling2021/>.
Kim Gerdes, Bruno Guillaume, Sylvain Kahane, Guy Perrier. Improving Surface-syntactic Universal Dependencies (SUD): surface-syntactic relations and deep syntactic features <https://hal.inria.fr/hal-02266003v1> in TLT 2019 <https://syntaxfest.github.io/syntaxfest19/tlt2019/tlt2019.html>.
Kim Gerdes, Bruno Guillaume, Sylvain Kahane, Guy Perrier. SUD or Surface-Syntactic Universal Dependencies: An annotation scheme near-isomorphic to UD <https://hal.inria.fr/hal-01930614v1> in UDW 2018 <https://universaldependencies.org/udw18/>.