*********************************************************************
The Anthony C. Clarke Award for the 2022 EAMT Best Thesis
Submission deadline: March 3, 2023, 23:59 CEST
*********************************************************************
The European Association for Machine Translation (EAMT, http://www.eamt.org)
is an organization that serves the growing community of people interested
in MT and translation tools, including translators, users, developers, and
researchers of this increasingly viable technology.
The EAMT invites entries for its eleventh EAMT Best Thesis Award for a PhD
or equivalent thesis on a topic related to machine translation.
Previous years' winners can be found at https://eamt.org/best-thesis-award/.
* Eligibility *
Researchers who
- have completed a PhD (or equivalent) thesis on a relevant topic in a
European, African or Middle Eastern institution within calendar year 2022,
- have not previously won another international award for that thesis, and,
- are members of the EAMT at the time of submission,
are invited to submit their theses to the EAMT for consideration.
* Panel *
The submissions will be judged by a panel of experts who will be
specifically appointed, based on the EAMT 2023 program committee, and which
will be ratified by the Executive Board of the EAMT.
* Selection criteria *
Each thesis will be judged according to how challenging the problem was,
how relevant the results are for machine translation as a field, and the
strength of its impact in terms of scientific publications.
* Scope *
The scope of the thesis does not need to be confined to a technical area,
and applications are also invited from students who carried out their
research into commercial and management aspects of machine translation.
Possible areas of research include:
- development of machine translation or advanced computer-assisted
translation: methods, software or resources
- machine translation for less-resourced languages
- the use of these systems in professional environments (freelance
translators, translation agencies, localisation, etc.)
- the increasing impact of machine translation on non-professional Internet
users and its impact in communications, social networking, etc.
- spoken language translation
- the integration of machine translation and translation memory systems
- the integration of machine translation software in larger IT applications
- the evaluation of machine translation systems in real tasks such as those
above
- the cross-fertilisation between machine translation and other language
technologies
* Prize *
The winner will be announced on the 31st of March 2023 and will receive a
prize of €500, together with an inscribed certificate. The recipient of the
award will be required to briefly present their research at EAMT 2023, to be
held from 12th June to 15th June 2023 in Tampere, Finland. In order to
facilitate this, the EAMT will waive the winner's registration costs, and
will make available a travel bursary of €200 to enable the recipient of the
award to attend the said conference. The prize includes complimentary
membership in the EAMT for 2024.
* Submission *
Candidates will submit using EasyChair:
https://easychair.org/conferences/?conf=eamt2023 (Submission type: Thesis
Award), a single PDF file containing:
- a 2-page summary of your thesis in English, containing:
---> your full contact details,
---> the name and contact details of your supervisor(s),
- a copy of your CV in English (at most one page, plus a complete list of
publications directly related to the thesis)
- an electronic copy of your thesis
- optionally, an appendix with any other relevant information on the thesis
By submitting their work, authors
- agree that, in case they are granted the award, any subsequently
published version of the thesis should carry the citation "The Anthony C.
Clarke Award for the 2022 EAMT Best Thesis" and
- acknowledge the right of the EAMT to publicize the granting of the award.
For this year's Best Thesis Award, candidates are required to be
individual EAMT members at the time of submission. For EAMT memberships,
please visit: http://www.eamt.org/membership.php.
* Closing date *
Submission deadline: March 3, 2023, 23:59 CEST.
Award notification: March 31, 2023.
--
*Carolina Scarton*
Lecturer in Natural Language Processing
Department of Computer Science
University of Sheffield
http://staffwww.dcs.shef.ac.uk/people/C.Scarton/
Location: Cardiff, UK
Deadline for applications: 31st January 2023
Start date: as soon as possible
Duration: 30 months
Keywords: natural language processing, neurosymbolic AI, graph neural networks, commonsense reasoning
Details about the post
Applications are invited for a Research Associate post in the Cardiff University School of Computer Science & Informatics, to work on the EPSRC Open Fellowship project ReStoRe (Reasoning about Structured Story Representations), which is focused on story-level language understanding. The overall aim of this project is to develop methods for learning graph-structured representations of stories. For this post, the specific focus will be on developing common sense reasoning strategies, based on graph neural networks, to fill the gap between what is explicitly stated in a story and what a human reader would infer by “reading between the lines”. More details about the post and instructions on how to apply are available here:
https://www.jobs.ac.uk/job/CWM298/research-associate
Background about the ReStoRe project
When we read a story as a human, we build up a mental model of what is described. Such mental models are crucial for reading comprehension. They allow us to relate the story to our earlier experiences, to make inferences that require combining information from different sentences, and to interpret ambiguous sentences correctly. Crucially, mental models capture more information than what is literally mentioned in the story. They are representations of the situations that are described, rather than the text itself, and they are constructed by combining the story text with our commonsense understanding of how the world works.
The field of Natural Language Processing (NLP) has made rapid progress in the last few years, but the focus has largely been on sentence-level representations. Stories, such as news articles, social media posts or medical case reports, are essentially modelled as collections of sentences. As a result, current systems struggle with the ambiguity of language, since the correct interpretation of a word or sentence can often only be inferred by taking its broader story context into account. They are also severely limited in their ability to solve problems where information from different sentences needs to be combined. As a final example, current systems struggle to identify correspondences between related stories (e.g. different news articles about the same event), especially if they are written from a different perspective.
To address these fundamental challenges, we need a method to learn story-level representations that can act as an analogue to mental models. Intuitively, there are two steps involved in learning such story representations: first we need to model what is literally mentioned in the story, and then we need some form of commonsense reasoning to fill in the gaps. In practice, however, these two steps are closely interrelated: interpreting what is mentioned in the story requires a model of the story context, but constructing this model requires an interpretation of what is mentioned.
The solution that is proposed in this fellowship is based on representations called story graphs. These story graphs encode the events that occur, the entities involved, and the relationships that hold between these entities and events. A story can then be viewed as an incomplete specification of a story graph, similar to how a symbolic knowledge base corresponds to an incomplete specification of a possible world. The proposed framework will allow us to reason about textual information in a principled way. It will lead to significant improvements in NLP tasks where a commonsense understanding is required of the situations that are described, or where information from multiple sentences or documents needs to be combined. It will furthermore enable a step change in applications that directly rely on structured text representations, such as situational understanding, information retrieval systems for the legal, medical and news domains, and tools for inferring business insights from news stories and social media feeds.
UCLouvain is looking for:
a postdoctoral researcher in machine learning / natural language processing
- Full-time (100%) fixed-term contract of two years
- for the Centre de traitement automatique du langage (Cental) within the
Institut Langage & Communication (IL&C) in UCLouvain (Louvain-la-Neuve)
- Start date: as soon as possible
This postdoctoral position offer is part of a research project led by the
Cental (https://uclouvain.be/fr/instituts-recherche/ilc/cental) around
legal data processing.
Regarding the concrete application, the project aims at automating the
analysis of documents related to clinical trials (meeting minutes, legal
documents, contracts, ...) to assess their compliance with the GDPR. The
proposed solution should thus be flexible enough to, on the one hand, ensure
that the model(s) can be adapted to the various document types and, on the
other hand, limit the need for specialists' expertise in training data
annotation. Consequently, the scientific core of this project is directly
related to the question of few-shot learning, which we intend to address
through active learning and meta-learning.
The role of the hired postdoc will be to (1) develop the resources needed
for learning, (2) implement an architecture that incorporates active
learning and meta-learning, (3) evaluate the models and (4) implement the
components into a web service. The postdoc will also be required to
disseminate the results through scientific publications and/or reports.
Work environment:
CENTAL is part of the Institut Langage & Communication (
https://uclouvain.be/fr/instituts-recherche/ilc), in UCLouvain. This
university is located in Louvain-la-Neuve, Belgium (
https://uclouvain.be/fr/sites/louvain-la-neuve), a walkable city that offers
a pleasant and dynamic living environment. The research project will be
supervised by Patrick Watrin.
Required skills:
- A completed PhD in Computer Science, Machine Learning, NLP or a similar
domain.
- Excellent programming skills:
- Python
- TensorFlow/Keras or PyTorch
- Linux (server administration)
- Knowledge of the main supervised learning algorithms and deep learning
algorithms is required
- A good knowledge of the main NLP tools and algorithms is a plus
- Strong research track record (publications, conferences, etc.)
- Autonomy, teamwork, ability to understand and analyze needs,
adaptability
- Excellent command of the French language (at least C1) and good command
of English (at least B2)
Conditions:
- Fixed-term contract of one year, renewable once
- Salary based on experience, ranging from 4250€ to 4850€ (monthly, gross)
The position requires residency in Belgium. Candidates from outside the EU
are responsible for obtaining the appropriate visa and/or permits, with
support from UCLouvain.
How to apply:
- Deadline: February 15
- The application file should be sent electronically to Patrick Watrin (
patrick.watrin(a)uclouvain.be) and contain:
- A detailed resume showing the relevant qualifications and skills,
as well as scientific/academic experience and publications;
- A cover letter in French, describing your interest in the role,
how your profile matches the project's needs, etc.;
- A recommendation letter in French or in English.
The shortlisted candidates will be invited to participate in a remote video call
(details will be communicated in a timely manner).
The Autogramm project (https://autogramm.github.io/en) invites applications for a 3-year PhD position starting between now and October 2023. The position is funded by ANR (Agence Nationale de la Recherche), France.
Applications and questions can be sent to Sylvain Kahane <sylvain(a)kahane.fr>
Applications should include:
- Cover letter outlining interest in the position
- Names of two referees
- Curriculum Vitae (CV) with publications (if applicable)
- Copy of MA degree
- University grade sheet of at least the two last years
Today, we have databases concerning several dozen languages, including corpora annotated according to the same principle, thanks in particular to corpora annotated in interlinear gloss (IGT, see for example the Pangloss collection, https://pangloss.cnrs.fr) or with the Universal Dependencies annotation scheme (UD, https://universaldependencies.org and its SUD variant, https://surfacesyntacticud.github.io/). These databases allow typological studies and have several advantages:
- the results obtained are based directly on primary data (corpora) and not secondary data (grammars written by linguists). (This is only partially true, since the results still depend on the choices made by a linguist in selecting the corpus and annotating it; nevertheless, these choices are visible and can be discussed.)
- the results are reproducible as long as the data are freely accessible;
- the nature of the data allows for quantitative results: we will not say that a language is OV or VO, but that it has such and such a percentage of OV constructions, and we will be able to observe directly on the data which factors determine the distribution between OV and VO (Levshina 2019, Gerdes et al. 2019, Futrell et al. 2015). (See also https://typometrics.elizia.net/#/.)
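To make the quantitative view concrete, here is a minimal sketch of how a percentage of OV constructions could be read off a dependency-annotated corpus. It is an illustration only, not the Autogramm or typometrics pipeline. It assumes UD-style annotation in which direct objects carry the `obj` relation, treats any `obj` dependent preceding its head as OV, and uses a tiny invented sample with whitespace-separated columns (real CoNLL-U files are tab-separated). The function names `ov_vo_counts` and `ov_percentage` are hypothetical.

```python
from collections import Counter

# Invented toy sample in a CoNLL-U-like layout (NOT real treebank data).
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SAMPLE = """
1 she she PRON _ _ 2 nsubj _ _
2 reads read VERB _ _ 0 root _ _
3 books book NOUN _ _ 2 obj _ _

1 books book NOUN _ _ 3 obj _ _
2 she she PRON _ _ 3 nsubj _ _
3 reads read VERB _ _ 0 root _ _

1 he he PRON _ _ 2 nsubj _ _
2 eats eat VERB _ _ 0 root _ _
3 apples apple NOUN _ _ 2 obj _ _

1 we we PRON _ _ 2 nsubj _ _
2 like like VERB _ _ 0 root _ _
3 tea tea NOUN _ _ 2 obj _ _
"""


def ov_vo_counts(conllu_text):
    """Count objects that precede (OV) or follow (VO) their head."""
    counts = Counter()
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip sentence breaks and comment lines
        cols = line.split()
        token_id, head, deprel = cols[0], cols[6], cols[7]
        if not token_id.isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        if deprel.split(":")[0] == "obj":  # also accept subtypes such as obj:xxx
            counts["OV" if int(token_id) < int(head) else "VO"] += 1
    return counts


def ov_percentage(counts):
    """Share of OV among all object-head pairs, in percent."""
    total = counts["OV"] + counts["VO"]
    return 100.0 * counts["OV"] / total if total else float("nan")


counts = ov_vo_counts(SAMPLE)
print(counts["OV"], counts["VO"], ov_percentage(counts))  # 1 3 25.0
```

On this toy sample one of the four objects precedes its head, so the language would be described as 25% OV rather than simply as "VO". A real study would also have to decide, among other things, whether to restrict the count to verbal heads or to nominal objects.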
The goal of the thesis topic is to contribute to the development of quantitative typology by participating in the construction of a quantitative database on a large number of typologically diverse languages and by focusing on the exploitation of such a dataset (Levshina 2022). The originality of the project lies in the fact that we are working on quantitative data and not on categorical features like existing typological databases (see in particular the World Atlas of Language Structures Online, https://wals.info/, which gives access to data on more than 2500 languages).
The following questions can be studied:
- How to identify cross-linguistic regularities, such as quantitative entailment universals, from a set of corpora of world languages (see for example Gerdes et al. 2021)? How can we make inferences between quantitatively valued features?
- What quantitative information can be extracted from a corpus that is useful for a typological study? Which features require prior annotation of the data and what is the nature of the annotations needed (see for example the case of IGT for morphosyntactic features and treebanks for word order)?
- How to identify the typological signature of a language from an annotated corpus and determine what makes it special within a group of languages (see Bickel & Nichols 2002 and the AutoTyp project)?
- How to take into account the imbalance of a database that is not representative of the distribution of languages in the world, but includes a higher proportion of languages from certain regions or families (Indo-European languages, Semitic languages, East Asian languages, etc.) to the detriment of other regions or families (Papua New Guinea, Oceania, Sub-Saharan Africa, Amerindian languages, aboriginal languages)? (see Guzmán Naranjo & Becker 2022).
- How to solve the question of the commensurability of the categories used in the description of the different languages? How can we check the consistency of the data? This question can be addressed by studying the consistency of treebanks of the same language or language family. How to detect the presence of aberrations in some treebanks (categorization choices not conforming to the universal scheme, e.g. assignment of the subject relation in ergative languages, use of the ADJ category in languages without real adjectives, etc.)?
- How to visualize multidimensional quantitative data? Linguistic data pose many challenges.
The work will be conducted in collaboration with the members of the ANR Autogramm project (https://autogramm.github.io/), researchers in field linguistics, typology, formal linguistics and automatic language processing. It could lead, with the help of engineers, to the constitution of a typometric database accompanied by query and data visualization tools.
Bickel & Nichols 2002
Futrell et al. 2015
Gerdes et al. 2019
Gerdes et al. 2021
Guzmán Naranjo & Becker 2022
Levshina 2019
Levshina 2022
Hello All,
Happy New Year 2023! Sorry for cross-posting.
Please feel free to spread the word about the PhD position on "Computational
Journalism" in my group.
Computational Social Science group (https://css.cs.ut.ee/) is looking for
motivated researchers who are interested in working on topics
in computational journalism, especially understanding echo
chambers, bias in news media, and fairness in news media
applications (recommendation). We expect the candidate to know one or more
(if not all) of the following techniques and programming languages:
(i) Preferred programming languages: Python or R.
(ii) Exploratory data analysis: feature extraction, visualization, etc.
(iii) Machine learning and deep learning with some hands-on experience.
(iv) Social media analysis: this includes collecting data
from Twitter/Reddit and analyzing it for more insight. An ideal
candidate should also be mindful of what's going on on social media.
(v) Social network analysis and Natural Language Processing.
Program Benefits
================
The funding covers student fees (the tuition fee is waived) and a monthly
stipend of 2000 euros (gross salary) for 4 years.
Health insurance is provided.
Academic and industrial professional development including travel support.
Interaction with world-renowned external board members and speakers.
Travel grant for attending conferences and workshops.
Location of PhD study: Institute of Computer Science, University of Tartu,
Estonia.
The Institute of Computer Science is located in the University of Tartu Delta
Centre (https://delta.ut.ee/en/), a unique multidisciplinary
centre for digital technology, analytics and economic thought, bringing
together more than 2500 students, university teachers, scientists and R&D
staff from companies. In short, you will get an opportunity to work in a
diverse environment and collaborate with colleagues. The Delta Centre opened in
January 2020 and is one of the most modern centres of digital technology,
analytical and economic thought in the Nordic region.
University of Tartu is the leading higher education and research center in
Estonia, with more than 16000 students and 1800 academic staff. It is also
the highest ranked university in the Baltic States according to both the
Times Higher Education and the QS World University rankings. The University of
Tartu's Institute of Computer Science ranks 176-200 (according to Times
Higher Education), and hosts 750 Bachelors and Masters students and 60
doctoral students. The institute has a strong international orientation:
over 40% of graduate students and a quarter of academic and research staff
members are international. Graduate teaching in the institute is in English.
Estonia is famous for its e-approach and is home to many startups such as
Skype, Transferwise and Bolt, to name a few. Tartu, a university town, is the
second largest city in Estonia, is relatively inexpensive (compared to
neighbors like Sweden and Finland), and is surrounded by nature within
walking distance of the city.
The applicant should have:
- a master's degree in computer science, mathematics
or another relevant discipline,
- excellent programming skills,
- a good command of spoken and written English,
- a background in statistics/data mining/machine learning; experience in
social media analysis would be ideal. Knowledge of social network analysis
would be an additional advantage.
Applications with a CV (max. 2 pages), with experience in research
(publications) and knowledge of programming languages/tools, can be sent to
rajesh.sharma(a)ut.ee with the subject "PhD application".
If you have any queries, please do not hesitate to contact me.
Kind Regards
Rajesh Sharma,
Associate Professor
Head, Computational Social Science Group
Institute of Computer Science
University of Tartu, Estonia.
Group webpage https://css.cs.ut.ee/
Dear colleagues,
Happy new year! We are extending the deadline for this call to the 15th of February. At the request of some authors, we have also adapted the most recent JLM LaTeX template to be compatible with Overleaf; it can be found here: https://fr.overleaf.com/latex/templates/template-for-journal-of-language-mo…
Please find below the updated call:
-------------
We invite researchers in the broad area of computational morphology to submit their recent, unpublished work to a special issue of the Journal of Language Modelling <https://jlm.ipipan.waw.pl/index.php/JLM>.
Motivation:
Computational techniques have a long history of use in the study of morphology, where they have been used both for practical tasks such as the analysis and production of complex word forms and for theoretical ones such as structural and informational analysis of morphological systems. As both systems and datasets improve, these techniques are increasingly developed and evaluated on a typologically diverse array of languages, including many which are endangered or lack large-scale resources. Detailed comparisons across languages can help to reveal typological biases or assumptions within existing computational techniques [1, 2]. Alternatively, computational methods and analyses can also shed light on questions within linguistic typology [3, 4, 5, 6].
The goal of this special issue is to bring researchers from multiple communities together in exploring issues of linguistic typology across a wide range of different languages and phenomena. We encourage the submission of work on endangered or less-studied languages.
The Journal of Language Modelling is a free (for readers and authors alike) open-access peer-reviewed journal. All articles are peer-reviewed by at least 3 reviewers, usually including at least one member of the Editorial Board.
Topics of interest:
- Typological clustering or classification of languages
- Investigation of particular linguistic features which improve or detract from the performance of computational morphology tools
- Comparison of morphological structures (e.g., inflection classes, implicative networks) across typologically different languages
- Investigation of diachronic typological change using computational methods
- Creation, curation or analysis of typological databases via computational methods
Submissions:
The submissions should be journal papers, not proceedings papers, totalling 25-50 pages, excluding references.
Authors are advised to use the online manuscript submission for the journal. Make sure to select the special issue when asked to provide the article type. More information, including formatting instructions for authors, can be found on the journal's webpage at: https://jlm.ipipan.waw.pl/index.php/JLM/about/submissions. An adaptation of the LaTeX template for Overleaf can be found at: https://fr.overleaf.com/latex/templates/template-for-journal-of-language-mo….
Important dates:
Call for papers issued: 15/7/2022
Submissions due: 15/1/2023 (extended to 15/2/2023)
Author notification: Spring 2023
Guest editors:
Sacha Beniamine (University of Surrey)
Micha Elsner (The Ohio State University)
Katharina Kann (University of Colorado, Boulder)
References
[1] Ryan Cotterell, Christo Kirov, John Sylak-Glassman, David Yarowsky, Jason Eisner, and Mans Hulden. 2016. The SIGMORPHON 2016 Shared Task—Morphological Reinflection. In Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 10–22, Berlin, Germany. Association for Computational Linguistics.
[2] Huiming Jin, Liwei Cai, Yihui Peng, Chen Xia, Arya McCarthy, and Katharina Kann. 2020. Unsupervised morphological paradigm completion. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6696–6707, Online. Association for Computational Linguistics.
[3] Neil Rathi, Michael Hahn, and Richard Futrell. 2021. An Information-Theoretic Characterization of Morphological Fusion. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 10115–10120, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
[4] Parker, J., Reynolds, R., & Sims, A. (2022). Network Structure and Inflection Class Predictability: Modeling the Emergence of Marginal Detraction. In A. Sims, A. Ussishkin, J. Parker, & S. Wray (Eds.), Morphological Diversity and Linguistic Cognition (pp. 247-281). Cambridge: Cambridge University Press. DOI: 10.1017/9781108807951.010
[5] Guzmán Naranjo, Matías and Becker, Laura. Statistical bias control in typology. Linguistic Typology, to appear, 2021. DOI: 10.1515/lingty-2021-0002
[6] Sacha Beniamine. 2021. One lexeme, many classes: Inflection class systems as lattices. In Berthold Crysmann & Manfred Sailer (eds.), One-to-many relations in morphology, syntax, and semantics, 23--51. Berlin: Language Science Press. DOI: 10.5281/zenodo.4729789
Sinn und Bedeutung 28 will take place at Ruhr University Bochum (RUB) from September 5-8, 2023. The conference is jointly organized by the RUB Department of Linguistics, the Linguistic Data Science Lab, the Department of German Language and Literature, and the Departments of Philosophy I and II. The conference will feature a three-day main session (Sept. 6-8) and two parallel one-day special sessions on The Semantics and Pragmatics of Co-Speech / Co-Sign Communication and on Big Data in Semantics and Pragmatics (Sept. 5).
Conference Website: https://www.ruhr-uni-bochum.de/sub28/
Invited Speakers (main session):
— Dorothy Ahn (Rutgers University)
— Hazel Pearson (Queen Mary University of London)
— Graham Priest (City University of New York, University of Melbourne, RUB)
Invited Speakers (special sessions):
Semantics and Pragmatics of Co-Speech / Co-Sign Communication
— Cornelia Ebert (Goethe University Frankfurt)
Big Data in Semantics and Pragmatics
— Raquel Fernández (University of Amsterdam)
Call for Papers
We invite abstract submissions for talks or posters on topics pertaining to natural language semantics, pragmatics, the syntax-semantics interface, super semantics, philosophy of language, and psycho-/neurolinguistic investigations related to meaning. We specifically welcome submissions on the semantics of under-represented languages and phenomena.
Abstracts should contain original research that, at the time of submission, has neither been published nor accepted for publication. One person can submit at most one abstract as sole author and one abstract as co-author (or two co-authored abstracts) for the main session and special session combined.
Submissions must be anonymous and must not reveal the identity of the authors in any form.
Abstracts should fit two pages (letter size or A4 paper, 2.54cm or 1 inch margins on all sides, 12 point font, Times New Roman), with an additional third page used *exclusively* for the following elements: references (obligatory), large figures or tables, as many lines of text as there are lines of glosses and translations in non-English glossed examples.
Abstracts must be submitted in PDF format via EasyChair by Wednesday, March 15, 2023 (23:59 Central European Standard Time): https://easychair.org/conferences/?conf=sub28. EasyChair will open for submissions on January 15, 2023.
Note: Since Bochum gets very busy during the summer, we strongly recommend booking your accommodation as early as possible (with a cancellation option).
Important Dates:
— Submission deadline: March 15, 2023
— Notification of acceptance: May 30, 2023
— Special sessions: September 5, 2023
— Main session: September 6-8, 2023
Organizers:
— Kristina Liefke (RUB Philosophy II)
— Ralf Klabunde (RUB Linguistics, Linguistic Data Science Lab)
— Agata Renans (RUB Linguistics)
— Daniel Gutzmann (RUB German Language & Literature)
— Tatjana Scheffler (RUB German Language & Literature)
— Dolf Rami (RUB Philosophy I)
— Heinrich Wansing (RUB Philosophy I)
— Markus Werning (RUB Philosophy II)
Email: sub28(a)ruhr-uni-bochum.de
---
Jun.-Prof. Dr. Tatjana Scheffler (she/her)
GB 5/157
Ruhr-Universität Bochum
Fakultät für Philologie, Germanistik
Universitätsstraße 150
44780 Bochum
Germany
Mail: tatjana.scheffler(a)rub.de
Web: http://staff.germanistik.rub.de/digitale-forensische-linguistik/
Tel.: +49 234 32-21471
Apologies for cross-posting.
----------------------------------------
*The International Conference on Spoken Language Translation*
ACL – 20th IWSLT 2023
July 13-14, 2023 – Toronto, Canada
http://iwslt.org
The International Conference on Spoken Language Translation (IWSLT) is the
premier annual conference for all aspects of Spoken Language Translation.
Every year, the conference organizes and sponsors open evaluation campaigns
around key challenges in simultaneous and consecutive translation, under
real-time/low latency or offline conditions and under low-resource or
multilingual constraints. System descriptions and results from
participants’ systems and scientific papers related to key algorithmic
advances and best practices are presented.
IWSLT is the venue of SIGSLT, the Special Interest Group on Spoken
Language Translation of ACL, ISCA and ELRA. With a track record of 19
years, IWSLT benchmarks and proceedings serve as references for all
researchers and practitioners working on speech translation and related
fields.
In 2023, IWSLT will be co-located with ACL and will be run as a hybrid
meeting.
Important Dates
January 14, 2023: Release of shared task training and dev data
April 01-15, 2023: Evaluation period
April 24, 2023: Scientific paper submission deadline
May 22, 2023: Notification of acceptance
June 06, 2023: Camera-ready paper due
July 12, 2023: Pre-recorded video due
July 13-14, 2023: Conference
Evaluation
IWSLT 2023 features shared tasks <https://iwslt.org/2023/#shared-tasks>
that address the following focus areas:
– Speech translation of talks
– Speech-to-speech translation of multi-source data
– Speech dubbing of multi-source data
– Dialectal and Low-resource speech translation
– Formality control for SLT
Training and development data for each shared task will be prepared and
released by the respective organizers (for further information on this
initiative, please refer to the website). Participants will receive
instructions about how to submit their runs. The results of all tasks will
be collected and discussed in an overview paper that will be presented at
the conference. In addition, participants have the opportunity to present
their work through a system paper that will be published in the ACL
Proceedings.
Conference
IWSLT also invites submissions of scientific papers to be published in the
ACL Proceedings and presented either in oral or poster format. The
conference selects high-quality, original contributions on theoretical and
practical issues of spoken language translation research, technologies and
applications.
Contact
Please send an email to iwslt-evaluation-campaign(a)googlegroups.com if you
have any questions related to the shared tasks.
Thanks,
Marcello, Alex, Jan, Sebastian, Elizabeth, Atul
(IWSLT organisers)
Dear Corpora members,
Please find below a CFP for the next "Journées de la Linguistique de
Corpus".
* The 11th International Conference on Corpus Linguistics *
3-6 July 2023, Grenoble, France
* Call for Papers *
https://jlc2023.sciencesconf.org/
The International Conference on Corpus Linguistics (JLC), founded by
Geoffrey Williams in 2001 at the University of South Brittany, Lorient,
France, regularly draws together an interdisciplinary community whose
research focus is corpus linguistics. After seven gatherings in Lorient
and an interlude in Orleans in 2015 (8th International Conference on
Corpus Linguistics), the conference alighted in Grenoble in early July
2017 and in November 2019, organized by the LIDILEM Laboratory with
contributions from LIG, ILCEA4, Litt&Arts and the MSH-Alpes. Université
Grenoble Alpes is honored to host this international conference again
from July 3rd to July 6th 2023. JLC’23 is organized in
collaboration with other labs from French universities (Lyon,
Montpellier, Toulouse): DDL, ICAR, Praxiling, CLLE.
The objective of JLC'23 is to (re)unite a community that adopts various
approaches, be they methodological or disciplinary, to promote corpus
linguistics, and to contribute to the evolution of practices in the
field by building bridges between different approaches to digital
corpora. The participants are invited to share and compare their
knowledge of tools, experiences, and findings.
In the tradition of previous conferences, the JLC in Grenoble will offer
three days of presentations, guest speakers and discussion sessions
among the participants. Training sessions on tools and methods will be
organized over a half day.
This edition of the JLC will put a particular focus on corpora and
didactics. A part of the conference will be specifically dedicated to
this theme. We welcome papers that present and examine the use of corpora
in teaching, whether reports on real-world use, presentations of
methodological approaches for various audiences, or more theoretical
perspectives.
The conference will not be limited to this theme and is open to all
kinds of contributions on written, oral or multimodal corpora, including
but not limited to:
1. Linguistic approaches to corpora
2. Methods and tools
3. Variations, genres, and discourse
4. Applications and uses of corpora for teaching and learning,
translation, terminology...
Guest speakers include Florence Mourlhon-Dallies and another speaker to
be confirmed.
Submissions for a presentation or a demonstration, in French or English,
should not exceed three pages (excluding figures and bibliographic
references) and must be anonymous. They will undergo double peer review
by members of the scientific board. JLC2023 will use the SciencesConf
system to manage submissions. In addition to classic presentations, you
may also propose a demonstration (identical submission guidelines).
Publication: following the colloquium, authors are welcome to submit an
article. The submitted articles will be reviewed and published online as
a collection.
Timetable:
1. First CFP: November 2022
2. Submission deadline: *Friday February 3rd 2023*
3. Notification of acceptance: Mid-April 2023
4. Final submission version: Friday May 19th 2023
5. Registration begins: May 2023
Best regards
--
Marie-Paule Jacques
/Mobilized in defense of the public service of higher education and
research/
Maître de conférences HDR, Sciences du langage - Senior Lecturer in
Linguistics
INSPE and LIDILEM (Laboratoire de linguistique et didactique des langues
étrangères et maternelles)
Université Grenoble Alpes
BIONLP 2023 and Shared Tasks @ ACL 2023
https://aclweb.org/aclwiki/BioNLP_Workshop#SHARED_TASKS_2023
*Tentative* Important Dates
(All submission deadlines are 11:59 p.m. UTC-12:00, “anywhere on Earth”)
May 1, 2023: Workshop paper due date
June 15, 2023: Camera-ready papers due
BioNLP 2023 Workshop at ACL, July 13 OR 14, 2023, Toronto, Canada
Please watch for the updates!
SUBMISSION INSTRUCTIONS
-----------------------------------------
Two types of submissions are invited: full papers and short papers.
Full papers should not exceed eight (8) pages of text, plus unlimited
references. These are intended to be reports of original research. BioNLP
aims to be the forum for interesting, innovative, and promising work
involving biomedicine and language technology, whether or not yielding high
performance at the moment. This by no means precludes our interest in and
preference for mature results, strong performance, and thorough
evaluation. Both types of research and combinations thereof are
encouraged.
Short papers may consist of up to four (4) pages of content, plus unlimited
references. Appropriate short paper topics include preliminary results,
application notes, descriptions of work in progress, etc.
Electronic Submission
Submissions must be electronic and in PDF format, using the Softconf START
conference management system. Submissions need to be anonymous.
*The submission site will be announced shortly.*
Dual submission policy: papers may NOT be submitted to the BioNLP 2023
workshop if they are or will be concurrently submitted to another meeting
or publication.
WORKSHOP OVERVIEW AND SCOPE
---------------------------------------------------
The BioNLP workshop associated with the ACL SIGBIOMED special interest
group has established itself as the primary venue for presenting
foundational research in language processing for the biological and
medical domains. The workshop has run every year since 2002 and continues
to grow stronger. BioNLP welcomes and encourages work on languages other
than English, and inclusion and diversity. BioNLP truly encompasses the
breadth of the domain and brings together researchers in bio- and clinical
NLP from all over the world. The workshop will continue presenting work on
a broad and interesting range of topics in NLP. Interest in biomedical
language has broadened significantly due to the COVID-19 pandemic and
continues to grow: as access to information becomes easier and more people
generate and access health-related text, it becomes clearer that only
language technologies can enable and support adequate use of biomedical
text.
BioNLP 2023 will be particularly interested in language processing that
supports DEIA (Diversity, Equity, Inclusion and Accessibility). The work on
detection and mitigation of bias and misinformation continues to be of
interest. Research on languages other than English, particularly
under-represented languages, and on health disparities is always of
interest to BioNLP.
Other active areas of research include, but are not limited to:
- Tangible results of biomedical language processing applications
- Entity identification and normalization (linking) for a broad range of
semantic categories
- Extraction of complex relations and events
- Discourse analysis
- Anaphora/coreference resolution
- Text mining / Literature based discovery
- Summarization
- Text simplification
- Question Answering
- Resources and strategies for system testing and evaluation
- Infrastructures and pre-trained language models for biomedical NLP
(Processing and annotation platforms)
- Development of synthetic data & data augmentation
- Translating NLP research into practice
- Getting reproducible results
SHARED TASKS 2023
-------------------------------------
Shared Tasks on Summarization of Clinical Notes and Scientific Articles
The first task focuses on Clinical Text.
Task 1A. Problem List Summarization
Automatically summarizing patients’ main
problems from the daily care notes in the electronic health record can help
mitigate information and cognitive overload for clinicians and provide
augmented intelligence via computerized diagnostic decision support at the
bedside. The task of Problem List Summarization aims to generate a list of
diagnoses and problems in a patient’s daily care plan using input from the
provider’s progress notes during hospitalization. This task aims to promote
NLP model development for downstream applications in diagnostic decision
support systems that could improve efficiency and reduce diagnostic errors
in hospitals. This task will contain 768 hospital daily progress notes and
2783 diagnoses in the training set, and a new set of 300 daily progress
notes will be annotated by physicians as the test set. The annotation
methods and annotation quality have been reported previously. The goal
of this shared task is to attract future research efforts in building NLP
models for real-world decision support applications, where a system
generating relevant and accurate diagnoses will assist the healthcare
providers’ decision-making process and improve the quality of care for
patients.
Task 1B. Radiology report summarization
Radiology report summarization is a
growing area of research. Given the Findings and/or Background sections of
a radiology report, the goal is to generate a summary (called an Impression
section) that highlights the key observations and conclusions of the
radiology study.
The research area of radiology report summarization currently faces an
important limitation: most research is carried out on chest X-rays. To
address this limitation, we propose two datasets:
- A shared summarization task that includes six different modalities and
anatomies, totalling 79,779 samples, based on the MIMIC-III database.
- A shared summarization task on chest X-ray radiology reports with images
and a brand-new out-of-domain test set from Stanford.
SEE MORE at: https://vilmedic.app/misc/bionlp23/sharedtask
Task 2. Lay Summarization of Biomedical Research Articles
Biomedical
publications contain the latest research on prominent health-related
topics, ranging from common illnesses to global pandemics. This can often
result in their content being of interest to a wide variety of audiences
including researchers, medical professionals, journalists, and even members
of the public. However, the highly technical and specialist language used
within such articles typically makes it difficult for non-expert audiences
to understand their contents.
Abstractive summarization models can be used to generate a concise summary
of an article, capturing its salient points using words and sentences that
aren’t used in the original text. As such, these models have the potential
to help broaden access to highly technical documents when trained to
generate summaries that are more readable, containing more background
information and less technical terminology (i.e., a “lay summary”).
This shared task centers on the abstractive summarization of biomedical
research articles, with an emphasis on controllability and catering to
non-expert audiences. Through this task, we aim to help foster increased
research interest in controllable summarization that helps broaden access
to technical texts and progress toward more usable abstractive
summarization models in the biomedical domain.
For more information, see:
Main site: https://biolaysumm.org/
CodaLab page - subtask 1: https://codalab.lisn.upsaclay.fr/competitions/9541
CodaLab page - subtask 2: https://codalab.lisn.upsaclay.fr/competitions/9544
*Workshop Organizers*
Dina Demner-Fushman, US National Library of Medicine
Kevin Bretonnel Cohen, University of Colorado School of Medicine
Sophia Ananiadou, National Centre for Text Mining and University of
Manchester, UK
Jun-ichi Tsujii, National Institute of Advanced Industrial Science and
Technology, Japan