2nd Call for Papers - CODI 2021: 2nd Workshop on Computational Approaches to Discourse
https://sites.google.com/view/codi-2021
Held in conjunction with EMNLP 2021, November 10-11, 2021
Update: Note that we accept double submissions, see the dedicated section below.
Important information: The CODI workshop is expected to be hybrid, following the EMNLP program chairs’ decision: “EMNLP 2021 is currently officially scheduled to be held in hybrid mode, online and in Punta Cana, in the Dominican Republic.” The two-day workshop will take place on November 10 and 11.
**Aims and scope**
The last five years have seen a dramatic improvement in the ability of NLP systems to understand and produce words and sentences. This development has created a renewed interest in discourse problems as researchers move towards the processing of long-form text and conversations. There is a surge of activity in discourse parsing, coherence models, text summarization, corpora for discourse level reading comprehension, and discourse related/aided representation learning, to name a few. At this juncture, we envision that a workshop that brings together discourse experts and upcoming researchers will catalyze the speed and knowledge needed to solve such problems, as well as serve as a forum for the discussion of suitable datasets and reliable evaluation methods.
The previous workshops on discourse in machine translation (DiscoMT), linking lexical, sentential and discourse semantics (LSDSem), discourse structure in natural language generation (DSNNLG), discourse parsing and treebanking (DISRPT) and coreference (CORBON/CRAC) have shown that there is considerable interest and success in bringing together the community working on specific problems. We believe that the discourse community will also benefit from a general forum where work ranging from corpus development/analysis to computational models and evaluation is discussed, and desiderata can be drawn for future progress.
The Workshop on Computational Approaches to Discourse (CODI) brings together researchers interested in all aspects of discourse and its computational modeling. The first CODI workshop was held at EMNLP 2020 and showcased diverse discourse research (see the papers presented: https://codi-workshop.github.io/accepted-papers/).
This year, the workshop will also host two shared tasks:
- CODI-CRAC - Anaphora Resolution in Dialogues: https://competitions.codalab.org/competitions/30312
- CODI-DISRPT2021 - Discourse Relation Parsing and Treebanking: https://sites.google.com/georgetown.edu/disrpt2021
Please visit the corresponding websites for more information.
**Topics of interest**
We welcome symbolic and probabilistic approaches, corpus development and analysis, as well as machine and deep learning approaches to discourse. We appreciate theoretical contributions as well as practical applications, including demos of systems and tools. The goal of the workshop is to provide a forum for the community of NLP researchers working on all aspects of discourse.
Topics of interest include, but are not limited to:
- discourse structure
- discourse connectives
- discourse relations
- annotation tools and schemes for discourse phenomena
- corpora annotated with discourse phenomena
- discourse parsing
- cross-lingual discourse processing
- cross-domain discourse processing
- anaphora and coreference resolution
- event coreference
- argument mining
- coherence modeling
- discourse and semantics
- discourse in applications such as machine translation, summarization, etc.
- evaluation methodology for discourse processing
**Submissions**
We solicit four categories of papers: (1) regular workshop papers, (2) demos, (3) extended abstracts and (4) shared task papers. Only regular workshop papers, shared task papers and demos will be included in the proceedings as archival publications.
Regular papers must describe original unpublished research. Long papers may consist of up to 8 pages of content, plus unlimited pages for references. Short papers can have up to 4 pages, plus unlimited pages for references.
Demo submissions may describe systems, tools, visualizations, etc., and may consist of up to 4 pages, plus unlimited pages for references.
Accepted long, short, and demo papers will be presented orally.
Extended abstracts can describe work in progress or work already published elsewhere. These may be two pages long (without references). Extended abstracts are non-archival. They will be presented orally, and included in the workshop program and handbook, but will not appear in the workshop proceedings.
For shared task papers, please refer to the corresponding websites:
- CODI-CRAC: https://competitions.codalab.org/competitions/30312#learn_the_details-overv…
- CODI-DISRPT2021: https://sites.google.com/georgetown.edu/disrpt2021/submission?authuser=0
Shared task papers will be presented during the dedicated sessions, either orally or as posters. They will be included in the workshop proceedings.
Final versions of all types of papers will be given one additional page of content.
**Double submission**
We allow for double submissions. Please indicate during submission to which other conference or workshop your work has been submitted.
We may also invite authors of papers accepted to conferences (e.g. EMNLP, ACL), including Findings, to present their work at the workshop. Please indicate during submission whether your paper has been accepted elsewhere (e.g. to ACL), or let us know by email upon notification that your paper has been accepted elsewhere (including Findings). These papers will not be part of the proceedings of the workshop.
**Submission website**
All submissions must follow the EMNLP 2021 formatting instructions described here: https://2021.emnlp.org/call-for-papers/style-and-formatting
Please submit your papers using the link indicated on our website.
**Important dates**
* 18 May, 2021: 1st Call for Workshop Papers
* 15 June, 2021: 2nd Call for Workshop Papers
* 5 July, 2021: Anonymity period starts
* 15 July, 2021: 3rd Call for Papers
* 5 Aug, 2021: Workshop Papers Due (long, short, demo, extended abstracts)
* 5 Sept, 2021: Notification of Acceptance
* 15 Sept, 2021: Camera-ready papers due
* Nov 10-11, 2021: Workshop Date
Please check the dedicated websites to get information about the deadlines for the two shared tasks.
All deadlines are 11.59 pm UTC -12h (“anywhere on Earth”).
Venue: COLING 2022
Location: Gyeongju, Republic of Korea
Date: October 16, 2022
Papers Due: July 17, 2022 (Sunday)
Website: https://sites.google.com/view/textgraphs2022
Workshop Description
For the past sixteen years, the workshops in the TextGraphs series have
published and promoted the synergy between the field of Graph Theory (GT)
and Natural Language Processing (NLP). The mix between the two started
small, with graph-theoretical frameworks providing efficient and elegant
solutions for NLP applications. Graph-based solutions initially focused on
single-document part-of-speech tagging, word sense disambiguation, and
semantic role labeling. They became progressively larger to include
ontology learning and information extraction from large text collections.
Nowadays, graph-based solutions also target Web-scale applications such as
information propagation in social networks, rumor proliferation,
e-reputation, multiple entity detection, language dynamics learning, and
future events prediction, to name a few.
We plan to encourage the description of novel NLP problems or applications
that have emerged in recent years, which can be enhanced with existing and
new graph-based methods. The sixteenth edition of the TextGraphs workshop
aims to extend the focus on graph-based representations for (1) integration
and joint training and use of transformer-based models for graphs and text
(such as Graph-BERT and BERT), and (2) domain-specific natural language
inference. Related to the former point, we would like to advance the
state-of-the-art natural language understanding facilitated with
large-scale language models like GPT-3 and linguistic relationships
represented by graph neural networks. Related to the latter point, we are
interested in addressing a challenging task contributing to mathematical
proof discovery. Furthermore, we also encourage research on applications of
graph-based methods in knowledge graphs to link them to related NLP
problems and applications.
TextGraphs-16 invites submissions on (but not limited to) the following
topics:
- Graph-based and graph-supported machine learning methods: Graph
embeddings and their combinations with text embeddings; Graph-based and
graph-supported deep learning (e.g., graph-based recurrent and recursive
networks); Probabilistic graphical models and structure learning methods
- Graph-based methods for Information Retrieval and Extraction: Graph-based
methods for word sense disambiguation; Graph-based strategies for semantic
relation identification; Encoding semantic distances in graphs; Graph-based
techniques for text summarization, simplification, and paraphrasing;
Graph-based techniques for document navigation and visualization
- New graph-based methods for NLP applications: Random walk methods in
graphs; Semi-supervised graph-based methods
- Graph-based methods for applications on social networks
- Graph-based methods for NLP and Semantic Web: Representation learning
methods for knowledge graphs; Using graph-based methods to populate
ontologies using textual data
Important dates
- Papers Due: July 17, 2022 (Sunday)
- Notification of Acceptance: August 28, 2022 (Sunday)
- Camera-ready papers due: September 11, 2022 (Sunday)
- Conference date: October 16, 2022
Submission
- We invite submissions of up to eight (8) pages, plus bibliography,
for long papers and up to four (4) pages, plus bibliography, for short papers.
- The COLING 2022 templates must be used; these are provided in LaTeX and
also Microsoft Word format. Submissions will only be accepted in PDF
format. Download the Word and LaTeX templates here:
https://coling2022.org/Cpapers.
- Submit papers by the end of the deadline day (timezone is UTC-12) via our
Softconf Submission Site: https://www.softconf.com/coling2022/TextGraphs-16/
Shared Task
We invite participation in the 1st Shared Task on Natural Language Premise
Selection associated with the 16th Workshop on Graph-Based Natural Language
Processing (TextGraphs 2022).
The task proposed this year is the Natural Language Premise Selection
(NLPS) (Ferreira et al., 2020a), inspired by the field of automated theorem
proving. The task of NLPS takes as input a mathematical statement, written
in natural language, and outputs a set of relevant sentences (premises)
that could support an end-user finding a proof for that mathematical
statement. The premises are composed of supporting definitions and
propositions that can act as explanations for the proof process:
https://codalab.lisn.upsaclay.fr/competitions/5692
Contact
Please direct all questions and inquiries to our official e-mail address
(textgraphsOC(a)gmail.com) or contact any of the organizers via their
individual emails. You can also join us on Facebook:
https://www.facebook.com/groups/900711756665369.
Organizers
- Dmitry Ustalov, Yandex
- Yanjun Gao, University of Wisconsin-Madison
- Abhik Jana, University of Hamburg
- Thien Huu Nguyen, University of Oregon
- Gerald Penn, University of Toronto
- Arti Ramesh, ETS AI Labs
- Alexander Panchenko, Skolkovo Institute of Science and Technology
- Mokanarangan Thayaparan, University of Manchester & Idiap Research
Institute
- Marco Valentino, University of Manchester & Idiap Research Institute
Call for Papers: 1st Workshop on Transcript Understanding
Venue: COLING 2022
Location: Gyeongju, Republic of Korea
Submission deadline: July 25, 2022
Submission Site: https://www.softconf.com/coling2022/TU
Workshop Website: https://tuworkshop.github.io
Overview:
Videos have become an omnipresent source of knowledge: courses, presentations, conferences, documentaries, livestreams, meeting recordings, vlogs. This has created a strong demand for transcript understanding. However, the varying quality of audio and video content shared online and the nature of speech and video transcripts pose many challenges to existing natural language processing technologies.
At the First Workshop on Transcript Understanding (TU@COLING2022), we aim to bring together researchers from various domains to make the best of the knowledge that all these videos contain. Researchers from related domains are invited to submit papers on recent advances, technologies, resources, tools, and challenges for Transcript Understanding.
Topics:
The TU workshop holds a research track and a shared task track. The research track aims to explore recent advances and remaining challenges in video transcript understanding. As this topic is a multi-modal subject, researchers from artificial intelligence, computer vision, speech processing, natural language processing, data mining, statistics, and other fields are invited to submit papers on recent advances, resources, tools, and challenges for video transcript understanding. To this end, the topics of the workshop include but are not limited to the following:
- Fundamental processing for video transcripts, such as punctuation restoration, chunking, parsing, and named entity recognition.
- Subtitle segmentation
- Text summarization and keyword extraction for transcripts
- Event extraction, intent detection, and slot filling
- Sentiment analysis for speech text processing
- Noisy text processing
- Fact-checking, evidence extraction
- Question-Answering extraction from transcripts
- Automatic Speech Recognition, and related systems such as speaker identification and filler word detection
- Multi-modal, multilingual video-speech-text processing
Important Dates
Papers Due: July 25, 2022 (Monday)
Notification of Acceptance: August 22, 2022 (Monday)
Camera-ready papers due: September 5, 2022 (Monday)
Workshop proceedings due: September 19, 2022 (Monday)
Workshop date: October 17, 2022
All deadlines are “anywhere on earth” (UTC-12)
Submissions:
Authors are invited to submit unpublished work that represents novel research. Papers should be written in English using the *ACL style. Authors can also submit supplementary materials, including technical appendices, source code, datasets, and multimedia appendices. All submissions, including the main paper and its supplementary materials, should be fully anonymized. For more information on formatting and anonymity guidelines, please refer to the COLING 2022 submission guidelines.
TU accepts both long papers (8 pages) and short papers (4 pages). Papers can include unlimited appendices and references. Upon acceptance, authors are given one additional page to address the reviewers' comments.
All papers will undergo double-blind peer review. Two reviewers with the same technical expertise will review each paper. Authors of accepted papers will present their work in either the oral or the poster session. All accepted papers will appear in the workshop proceedings, which will be published on CEUR-WS. Authors will keep the copyright of their papers published on CEUR-WS. The workshop proceedings will be indexed by DBLP.
Both research papers and shared task papers must be submitted using SoftConf at https://www.softconf.com/coling2022/TU/.
We look forward to seeing you all at the virtual conference.
TU@COLING2022 Organizers:
Franck Dernoncourt (Adobe Research, USA)
Thien Huu Nguyen (University of Oregon, USA)
Viet Dac Lai (University of Oregon, USA)
Amir Pouran Ben Veyseh (University of Oregon, USA)
https://sites.google.com/view/figlang2022/shared-tasks?authuser=0
Euphemism Detection Shared Task
Euphemisms are mild or indirect expressions used in place of harsher or
more offensive ones. Euphemisms are often used to mask profanity or refer
to taboo topics such as death, disability, sex, religion or personal
relationships in a polite way. Euphemisms are often ambiguous: their
literal and non-literal interpretation is context-dependent:
Asked to choose *between jobs* and the environment, a majority -- at least
in our warped, first-past-the-post system -- will pick jobs.
[non-euphemistic]
vs.
This summer, the budding talent agent was *between jobs* and free to
babysit pretty much any time. [euphemistic]
State-of-the-art language models perform well on many major NLP
benchmarks; however, it is unclear how such models perform on euphemisms.
Thus, we propose a euphemism detection task: given an input sentence,
identify whether the sentence contains a euphemism.
For more information about the shared task and to participate, visit
https://codalab.lisn.upsaclay.fr/competitions/5726.
*Important dates:*
- July 5, 2022: CodaLab competition is open; training data can be
  downloaded
- Aug 5, 2022: Test data can be downloaded and results submitted;
  performance will be tracked on CodaLab dashboard
- Aug 20, 2022: Last day for submitting predictions on test data
- Sept 7, 2022: Papers describing the systems are due
- Oct 9, 2022: Notification of acceptance
- TBD, 2022: Camera-ready papers due
- December 7 or 8, 2022: Workshop
--
**********************************************
Anna Feldman, Ph.D.
Professor of Linguistics and Computer Science
Graduate Program Coordinator & Chair of Linguistics
Montclair State University
http://www.purl.org/net/fa <http://www.purl.org/NET/fa>
Shared Task on Understanding Figurative Language at FigLang2022
We are happy to announce a new shared task on Understanding Figurative
Language as part of the Figurative Language Workshop (FigLang 2022) at
EMNLP 2022. In recent years, there have been several benchmarks dedicated
to figurative language understanding, which generally frame "understanding"
as a recognizing textual entailment task -- deciding whether one sentence
(premise) entails/contradicts another (hypothesis) (Chakrabarty et al 2021,
Stowe et al 2022). We introduce a new shared task for figurative language
understanding around this textual entailment paradigm, where the hypothesis
is a sentence containing the figurative language expression (e.g.,
metaphor, sarcasm, idiom, simile) and the premise is a literal sentence
containing the literal meaning. There are two important aspects of this
task: 1) the task requires generating not only the label
(entail/contradict) but also a plausible explanation for the
prediction; 2) the entail/contradict label and the explanation are related
to the meaning of the figurative language expression.
For more information about the shared task, including the link to the
datasets, evaluation metrics and scripts, and important dates, please visit
the shared task website (https://figlang2022sharedtask.github.io/).
Participants can use the following CodaLab link
(https://codalab.lisn.upsaclay.fr/competitions/5908) to participate in
the task and to submit predictions.
Important dates:
· July 10, 2022: CodaLab competition is open; training data can be
downloaded
· Aug 15, 2022: Test data (available only to registered participants)
can be downloaded and results submitted; performance will be tracked on
CodaLab dashboard
· Aug 20, 2022: Last day for submitting predictions on test data
· Sept 7, 2022: Papers describing the systems are due
· Oct 9, 2022: Notification of acceptance
· TBD, 2022: Camera-ready papers due
· December 8, 2022: Workshop at EMNLP 2022
Organizing Team
Tuhin Chakrabarty, Columbia University; tuhin.chakr(a)cs.columbia.edu
Arkadiy Saakyan, Columbia University; as5423(a)columbia.edu
Debanjan Ghosh, Educational Testing Service; dghosh(a)ets.org
Smaranda Muresan, Data Science Institute, Columbia
University; smara(a)columbia.edu
It is our pleasure to announce the first call for submissions for the
next issue of the journal Dialogue and Discourse. Submissions are
invited on all topics in the formal, computational, or
psycholinguistic study of dialogue and discourse.
Submissions received by August 1, 2022 will be considered for the next
regular issue. Later submissions will be slated for the next available
issue.
http://www.dialogue-and-discourse.org/cfps-current.shtml
Dialogue and Discourse (D&D http://www.dialogue-and-discourse.org/) is
the first peer-reviewed free open access journal dedicated exclusively
to work that deals with language "beyond the sentence". The journal
adopts an interdisciplinary perspective, accepting work from
Linguistics, Computer Science, Psychology, Sociology, Philosophy, and
other associated fields with an interest in formally, technically,
empirically or experimentally rigorous approaches. Descriptive papers
should make a substantial theoretical contribution to be considered.
We are committed to ensuring the highest editorial standards and
rigorous peer-review of all submissions, while granting open access to
all interested readers. D&D has published regular issues every year
since 2010, and occasionally special issues on common topics.
As of June 2022, D&D has published 99 papers, and the journal's
h-index is 26. D&D is endorsed by ACL SIGdial, ACL SemDial, and AMLaP.
D&D is indexed by Scopus and the European Reference Index for the
Humanities and Social Sciences.
Submissions are made via the online submission system at
http://www.dialogue-and-discourse.org/submission.shtml. Authors are
required to indicate if a submission is an extended version of one or
more previously published conference papers (to which we would expect
substantial additions); simultaneous submission to another venue is
prohibited. Submissions will undergo rigorous peer-review. Once
accepted and finalized, papers will appear online immediately, as part
of the current issue. Selected papers will furthermore be offered the
opportunity to present a poster at the following SIGDIAL Conference.
Dialogue and Discourse Editors
Issue Editor:
Ryuichiro Higashinaka (Volume 13, Issue 2)
Junyi Jessy Li (Volume 13, Issue 1)
Editor In Chief:
Barbara Di Eugenio, University of Illinois at Chicago, United States
Associate Editors:
Vera Demberg, Saarland University, Germany
Kallirroi Georgila, University of Southern California, United States
Jonathan Ginzburg, Université Paris-Diderot (Paris 7), France
Pat Healey, Queen Mary University London, United Kingdom
Ryuichiro Higashinaka, Nagoya University, Japan
Junyi Jessy Li, University of Texas at Austin, United States
Massimo Poesio, Queen Mary University London, United Kingdom
Manfred Stede, University of Potsdam, Germany
David R. Traum, University of Southern California, United States
Amir Zeldes, Georgetown University, United States
Full editorial board at: http://www.dialogue-and-discourse.org/editors.shtml
We are proud to announce the release of a new version of BabelNet
<https://babelnet.org/> and its APIs, *both Java and the brand-new Python
version*, developed jointly by the Sapienza NLP Group
<http://nlp.uniroma1.it> of the *Sapienza University of Rome* under the
supervision of prof. Roberto Navigli <https://www.diag.uniroma1.it/navigli/>
and Babelscape <http://babelscape.com/>, *a successful deep-tech
multilingual NLP Company* providing innovative solutions for multilingual
NLP.
BabelNet -- winner of the *prominent paper award 2017* from the Artificial
Intelligence Journal and the META prize 2015, and covered in media such as The
Guardian
<https://www.theguardian.com/news/2018/feb/23/oxford-english-dictionary-can-…>
and Time magazine
<http://wwwusers.di.uniroma1.it/~navigli/img/Redefining_the_modern_dictionar…>
-- is today’s *most far-reaching multilingual resource* which, according to
need, can be used as an *encyclopedic dictionary*, or a *semantic network*
or a huge *knowledge base/ontology*. It has been used by more than *1000
universities and research institutions*, enabling multilinguality in
several fields of AI and NLP, such as semantic search, Word Sense
Disambiguation, Semantic Role Labeling and image tagging.
BabelNet was created by means of the seamless integration and interlinking
of the largest multilingual Web encyclopedia - i.e., Wikipedia - with the
most popular computational lexicon of English - i.e., WordNet, and other
lexical resources such as Wiktionary, OmegaWiki, Wikidata, dozens of
wordnets, Wikiquote, GeoNames, and ImageNet. BabelNet provides *multilingual
synsets*, i.e., concepts and named entities lexicalized in many languages,
and connected with large amounts of semantic relations.
*Version 5.1* comes with the following features:
- *500 languages* and *22 million synsets* covered;
- *53 resources* linked and integrated;
- *Wikipedia* and *Wikidata* updated thanks to *BabelNet live*;
- *Open English WordNet* has been updated to version 2021;
- Added *Q-codes* identifiers (e.g.
https://www.hetop.eu/hetop/3CGP/?la=en&rr=CGP_QC_QD8);
- Added *string tags* from *Wikipedia labels*;
- *French wordnets cleaned up* by removing most potentially incorrect
translations;
- *Italian wordnet definitions* cleaned up;
- *General data cleanup* (glosses, senses, Named Entity vs. concept
labels);
- *Lemma casing corrected in 24 languages* (English, Italian, Spanish,
German, French, Dutch, Polish, Portuguese, Russian, Bulgarian, Czech,
Danish, Greek, Estonian, Finnish, Croatian, Hungarian, Lithuanian, Latvian,
Maltese, Romanian, Slovak, Slovenian, Swedish).
More statistics are available at: babelnet.org/statistics.
Kind regards,
The BabelNet team
--
==============================================
Roberto Navigli* - Professor*
Department of Computer, Control and Management Engineering
Sapienza University of Rome
Via Ariosto, 25
00185 Roma Italy
Phone: +39 06 77274109
Home Page: https://www.diag.uniroma1.it/navigli/
Sapienza NLP Group: http://nlp.uniroma1.it
Co-founder of Babelscape <https://babelscape.com>
==============================================
Dear corpora subscribers:
At NLPgo (https://www.nlpgo.com) we are organizing an online workshop for linguists titled "Learning to use the Terminal", which may be of interest to you.
> What is the Terminal?
> It is the usual black or white background application available in any operating system which allows the users to run commands. It is also called the command interpreter.
> What is it useful for?
> It is useful for many different things, among them applying different kinds of transformations to files. It therefore allows us to make some interesting corpus calculations that would otherwise be very difficult to make.
> Workshop content
> 1. Preparation: installing the necessary tools and setting up the working environment.
> 2. Basic concepts: file and folder structure, file types, character encodings (types, differences and compatibility problems).
> 3. Basic commands: show files available in a folder, change current folder, show file contents, copy and move files, column extraction and reordering, result sorting, etc.
> 4. Advanced commands and transformations: standard input/output/error, command chaining, finding file names, text finding and replacing, applying commands to several files at a time.
> 5. Regular expressions: advanced text search and replacement techniques.
> 6. Corpus specific tasks: Working with data from spreadsheets, texts (orthographic words) and Part-Of-Speech tagged texts (grammatical elements).
More details about the workshop are available on our web page and, specifically, on the workshop specific one: https://www.nlpgo.com/teaching/terminal
As you can see, it will be held from the 18th to the 22nd of July (in Spanish, Spain timezone).
If you want to register for the workshop, you can do so from the workshop web page linked above. In addition, if you are interested in receiving information about the workshops we organize and other useful information, you can subscribe to our newsletter here: https://www.palabrasbinarias.com/subscribe.
If you have any questions, don't hesitate to write to us through the contact form available on the workshop web page.
Thank you for your attention.
Best regards,
Mario Barcala
--
Mario Barcala
CEO at NLPgo
http://www.nlpgo.com
GPG key id: F1C15EB7
The University of Klagenfurt is pleased to announce the following open position at the Digital Age Research Center (D!ARC), employment to commence as soon as possible:
*PreDoc Scientist (in German: Universitätsassistent*in) (Doctoral Candidate) (all genders welcome)*
The Digital Age Research Center (D!ARC), founded in 2019 as an inter-faculty university centre, aims to shed light not only on the technological but also on the economic, legal, social, individual, behavioural and cultural aspects of the digital revolution. Over the next few years, it is set to develop a corresponding profile in research with a European claim to excellence as well as modules for the range of courses offered at the University of Klagenfurt.
Level of employment: 75 % (30 hours/week)
Minimum salary: € 32.116,-- per annum (gross); classification according to collective agreement: B1
Limited to: 4 years
Application deadline: July 27, 2022
Reference code: 191/22
Tasks and responsibilities:
• Contributing to research in a project in the area of computational linguistics and digital humanities
• Participation in research activities of D!ARC, which offers unique opportunities for interdisciplinary collaboration
• Teaching courses on computational linguistic topics
Prerequisites for the appointment:
• A Master’s degree completed at a domestic or foreign higher education institution in the field of computational linguistics, linguistics, computer science or a similar field, graded with success, and corresponding knowledge in the field
• Good programming skills (preferably Python)
• Proven expertise in:
    • Natural language processing
    • Digital humanities
    • Linguistics
    • Machine learning (particularly deep learning)
• Fluent in English and German, both spoken and written
Additional desired qualifications:
• Experience with web crawling and processing large amounts of textual data
• Instructing and supervising linguistic annotation
• Publications at scientific conferences and in journals in the field relating to the position
• Profound knowledge of publicly available tools and resources for natural language processing
• Experience in teaching at a university
• Experience in working in interdisciplinary research projects
• Social and communication skills, ability to work independently
Our offer:
The employment contract is concluded with a starting salary of € 2.294,-- gross per month (14 times a year; previous experience deemed relevant to the job can be recognised as described at https://jobs.aau.at/en/faq/). The University of Klagenfurt also offers:
• Personal and professional advanced training courses, management and career coaching
• Numerous attractive additional benefits, see also https://jobs.aau.at/en/the-university-as-employer/
• Diversity- and family-friendly university culture
• The opportunity to live and work in the attractive Alps-Adriatic region with a wide range of leisure activities in the spheres of culture, nature and sports
The application:
If you are interested in this position, please apply in German or English providing the usual documents:
• Letter of application
• Curriculum vitae
• Proof of all completed higher education programmes (certificates, supplements, if applicable)
• Other documentary evidence that may be relevant to this announcement (see prerequisites and desired qualifications)
The position is solely intended for the completion of a Doctorate. Applicants with a Doctorate or Ph.D. already completed in a related discipline are therefore ineligible for this position.
To apply, please select the position with the reference code 191/22 in the category “Scientific Staff” using the link “Apply for this position” in the job portal at http://jobs.aau.at/en/
Candidates must furnish proof that they meet the required qualifications by July 27, 2022 at the latest.
For further information on this specific vacancy, please contact Michael Wiegand (mailto:michael.wiegand@aau.at). General information about the university as an employer can be found at http://www.aau.at/jobs/en/information. At the University of Klagenfurt, recruitment and staff matters are accompanied not only by the authority responsible for the recruitment procedure but also by the Equal Opportunities Working Group (https://www.aau.at/en/university/organisation/representations-commissioners…) and, if necessary, by the Representative for Disabled Persons (https://www.aau.at/en/university/organisation/administration-and-management…).
!!!!!!!!!! The deadline for submission has been extended to July 20th
!!!!!!!!!! Upload Submissions Now
https://cmt3.research.microsoft.com/AMTA2022
The First Workshop on Corpus Generation and Corpus Augmentation for
Machine Translation (CoCo4MT)
https://sites.google.com/view/coco4mt
@ AMTA – 2022
The 15th biennial conference of the Association for Machine Translation
in the Americas
12-16 September 2022, Orlando, Florida, USA
INVITED TALKS
Jörg Tiedemann University of Helsinki
Julia Kreutzer Google Research
Maria Nadejde Amazon
SCOPE
It is a well-known fact that machine translation systems, especially
those that use deep learning, require massive amounts of data. Several
resources for languages are not available in their human-created format.
The types of resources available include monolingual and multilingual
corpora, translation memories, and lexicons. Parallel resources are
generally created for formal purposes, such as parliamentary
collections, while monolingual resources tend to come from more informal
situations. The quality and abundance of resources, including corpora,
used for formal purposes are generally higher than those used for
informal purposes. Additionally, corpora for low-resource languages,
that is, languages with fewer digital resources available, tend to be
less abundant and of lower quality.
CoCo4MT sets out to be the first workshop centered around research that
focuses on corpora creation, cleansing, and augmentation techniques
specifically for machine translation. We accept work that covers any
spoken language (including high-resource languages) but we are
specifically interested in submissions on languages with limited
existing resources (low-resource languages).
The goal of this workshop is to begin to close the gap in corpora
available for low-resource translation systems. Promoting high-quality
data for online systems that can be used by native speakers of
low-resource languages is of particular interest. Therefore, it will be
beneficial if the techniques presented in research papers include their
impact on the quality of MT output and how they can be used in the real
world.
CoCo4MT aims to encourage research on new and undiscovered techniques.
We hope that submissions will provide high-quality corpora that are
publicly available for download and can be used to increase machine
translation performance, thus encouraging new dataset creation for
multiple languages. This will, in turn, make the workshop a general
venue to consult for corpora needs in the future. The workshop’s success
will be measured by the following key performance indicators:
- Promotes the ongoing increase in quality of machine translation
systems when measured by standard measurements,
- Provides a meeting place for collaboration from several research areas
to increase the availability of commonly used corpora and new corpora,
- Drives innovation to address the need for higher quality and abundance
of low-resource language data.
TOPICS
We are highly interested in original research papers on the topics
below; however, we welcome all novel ideas that cover research on
corpora techniques.
- Difficulties with using existing corpora (e.g., political
considerations or domain limitations) and their effects on final MT
systems,
- Strategies for collecting new MT datasets (e.g., via crowdsourcing),
- Data augmentation techniques,
- Data cleansing and denoising techniques,
- Quality control strategies for MT data,
- Exploration of datasets for pretraining or auxiliary tasks for
training MT systems.
SUBMISSION INFORMATION
There is one type of submission in the workshop: Research, review and
position paper. The length of each paper should be at least four (4) and
not exceed ten (10) pages, plus unlimited pages for references.
Submissions should be formatted according to the official AMTA 2022
style templates (PDF, LaTeX, Word). Accepted papers will be published
online in the AMTA 2022 proceedings, which are included in the ACL
Anthology, and will be presented at the conference either orally or as a
poster.
Submissions must be anonymized and should be done using the official
conference management system
(https://cmt3.research.microsoft.com/AMTA2022). Scientific papers that
have been or will be submitted to other venues must be declared as such,
and must be withdrawn from the other venues if accepted and published at
CoCo4MT. The review will be double-blind.
We would like to encourage authors to cite papers written in ANY
language that are related to the topics, as long as both original
bibliographic items and their corresponding English translations are
provided.
Registration will be handled by the main conference. (To be announced)
IMPORTANT DATES
June 1, 2022 – Call for papers released
June 15, 2022 – Second call for papers
June 29, 2022 – Third and final call for papers
July 20, 2022 – Paper submissions due (updated extension!)
July 27, 2022 – Notification of acceptance
August 7, 2022 – Camera-ready due
August 31, 2022 – Video recordings due
September 16, 2022 – CoCo4MT workshop
CONTACT
CoCo4MT Workshop Organizers
coco4mt2022(a)googlegroups.com
ORGANIZING COMMITTEE (listed alphabetically)
Constantine Lignos Brandeis University
John E. Ortega New York University and University of Santiago de
Compostela (CITIUS)
Katharina Kann University of Colorado Boulder
Maja Popović ADAPT Centre at Dublin City University
Marine Carpuat University of Maryland
Shabnam Tafreshi University of Maryland
William Chen Carnegie Mellon University
PROGRAM COMMITTEE (listed alphabetically, tentative)
Abteen Ebrahimi University of Colorado Boulder
Adelani David Saarland University
Ananya Ganesh University of Colorado Boulder
Alberto Poncelas ADAPT Centre at Dublin City University
Amirhossein Tebbifakhr University of Trento
Anna Currey Amazon
Arturo Oncevay University of Edinburgh
Atul Kr. Ojha National University of Ireland Galway
Bharathi Raja Chakravarthi National University of Ireland Galway
Beatrice Savoldi University of Trento
Bogdan Babych Heidelberg University
Briakou Eleftheria University of Maryland
Dossou Bonaventure Mila Quebec AI Institute
Duygu Ataman New York University
Eleni Metheniti Université Toulouse - Paul Sabatier
Francis Tyers Indiana University
Jasper Kyle Catapang University of Birmingham
John E. Ortega New York University and USC - CITIUS
José Ramom Pichel Campos Universidade de Santiago de Compostela - CITIUS
Kalika Bali Microsoft
Koel Dutta Chowdhury Saarland University
Liangyou Li Huawei
Manuel Mager University of Stuttgart
Maria Art Antonette Clariño University of the Philippines Los Baños
Mathias Müller University of Zurich
Nathaniel Oco De La Salle University
Niu Xing Amazon
Pablo Gamallo Universidade de Santiago de Compostela - CITIUS
Rodolfo Joel Zevallos Salazar Universitat Pompeu Fabra
Rico Sennrich University of Zurich
Sangjee Dondrub Qinghai Normal University
Santanu Pal Saarland University
Sardana Ivanova University of Helsinki
Shantipriya Parida Silo AI
Surafel Melaku Lakew Amazon
Tommi A Pirinen University of Tromsø
Valentin Malykh Moscow Institute of Physics and Technology
--
*Shabnam Tafreshi, PhD*
*Assistant Research Scientist*
*Computational Linguistics, NLP*
*UMD: ARLIS @ College Park*