Fully Funded PostDoc Position in NLP / Scientific Document Analysis
The Tübingen AI Center at the University of Tübingen is looking for a motivated postdoctoral researcher interested in natural language processing for scientific document analysis. The researcher will be supervised by Prof. Andreas Geiger (University of Tübingen) and Iryna Gurevych (TU Darmstadt) and will have the opportunity to supervise Master and PhD students.
Description: The body of scientific literature is growing at an ever-increasing rate. As a result, it is increasingly difficult for researchers to keep up-to-date. This hinders scientific progress at large and leads to a suboptimal usage of resources including research funds, compute, energy and intellectual capacity. In this project, we plan to develop novel NLP methods and algorithms and to collect new datasets to advance research in scientific document processing. Research topics include:
* Efficient hierarchical and multi-modal document representations
* Structured intra- and inter-document models
* Distillation and adaptation of LLMs for scientific document analysis and generation
* Self-supervised learning with multi-scale pre-text tasks
* Explainable and grounded scientific document models
* Deployment of algorithms and collection of datasets (www.scholar-inbox.com)
Requirements: We are looking for candidates who hold a PhD degree and have published at top venues in the field (ACL, EMNLP, NAACL, TACL).
About Us: The University of Tübingen is one of Germany's excellence universities with an excellence cluster on machine learning, an ELLIS Unit and the Tübingen AI Center. Embedded in the interdisciplinary research environment of CyberValley, the Autonomous Vision Group conducts curiosity-driven fundamental research, providing researchers with access to unique research facilities and great research teams. Currently, 2 PhD students are working on this project. Our culture is international, inclusive and collaborative. We are looking forward to your application!
To apply, please send your application materials including your CV, research statement, transcripts and names of referees to: a.geiger(a)uni-tuebingen.de
Dear Colleagues,
Tomorrow is the last day for Early Bird registration for DHd2024 in Passau.
Late Bird registration will be open until 18.02.2024
Please register through conftool.
More information:
https://dhd2024.dig-hum.de/registrierung/
For the DHd2024 Team in Passau
Thomas Haider
Open-Rank Faculty Positions in Natural Language Processing @INSAIT in Sofia
Join a fantastic growing team of world-class researchers: the Institute for Computer Science, Artificial Intelligence and Technology (INSAIT) in Sofia, Bulgaria is looking to establish a strong profile in Natural Language Processing (NLP). We solicit applications for multiple full-time open-rank (both senior and junior) faculty positions in Natural Language Processing. The faculty will closely collaborate with the UKP Lab, Technical University of Darmstadt, Germany (Prof. Iryna Gurevych).
INSAIT was founded in 2022. Its mission is to become a high-profile computer science and AI research institution. It has recently become an ELLIS unit (https://ellis.eu/units/sofia). INSAIT is structured similarly to top U.S. and European research institutions (tenure-track and tenured faculty positions, PhD duration, etc.). This is a unique opportunity with outstanding working conditions with regard to facilities, packages, resources, and salaries.
More information about the opening and the application process can be found here:
https://cra.org/job/institute-for-computer-science-artificial-intelligence-…
https://ellis.eu/jobs/open-tenure-track-and-tenured-ai-faculty-positions-at…
For questions, please send a message to Iryna Gurevych (iryna dot gurevych at tu-darmstadt dot de)
Dear corpora-list,
We would like to draw your attention to the Thematic Track "AI in
Digital Humanities, Computational Social Sciences and Economics
Research (AI-HuSo)", which will take place at FedCSIS 2024 in Belgrade
from September 8-11, 2024.
The event aims to bring together research from various disciplines in
the humanities, social sciences and economics, focusing on the use of
computational methods, machine learning and AI.
Further information on the topics and the deadlines can be found here:
https://2024.fedcsis.org/thematic/ai-huso
Contact: ai-huso(a)fedcsis.org
=======================================================================
AI in Digital Humanities, Computational Social Sciences and Economics
Research (AI‑HuSo)
=======================================================================
Belgrade, Serbia, 8–11 September, 2024
This thematic track is dedicated to the computational study of Social
Sciences, Economics and Humanities, covering subjects such as
education, labour market, history, religious studies,
theology, cultural heritage, and informative predictions for
decision-making and behavioural-science perspectives. While digital
methods and AI have been emerging topics in these fields for several
decades, this thematic track is not only limited to discoveries in
these domains, but also dedicated to the reflections of these methods
and results within the field of computer science. Thus, we are in
particular interested in interdisciplinary exchange and dissemination
with a clear focus on computational and AI methods.
Since there is a clear methodological overlap between these three
domains and often similar algorithms and AI approaches are considered,
we see this thematic track as a place for interdisciplinary learning,
discussing a joint toolbox as a support for scholars from these fields
with human and context-aware agents.
The aim of this thematic track is thus to bridge the gap between
scientific domains, foster interdisciplinary exchange and discuss how
research questions from other domains challenge current computer
science. In particular, we are interested in communications between
researchers from different fields of computer science, social
sciences, economics, humanities, and practitioners from different
fields.
Topics
The list of topics includes, but is not limited to:
- AI and computational approaches for the interdisciplinary work of
the social sciences, economics, and humanities: report on theoretical,
methodological, experimental, and applied research.
- AI and computational approaches for linking data from different
digital resources, including online social networks, web and data
mining, Knowledge Graphs, Ontologies.
- AI and computational methods for text mining and textual
analysis, for example texts within social sciences, digital literary
studies, computational stylistics and stylometry.
- Text encoding, computational linguistics, annotation guidelines,
OCR for humanities, economics, and social sciences.
- Network analysis, including social and historical network analysis.
While we encourage submissions from a broad background, every year we
also encourage submissions to two special topics. In 2024 these will be:
- Ethical and philosophical considerations of AI in society and research.
- Sociological challenges for AI in society, e.g., labour market,
education or media.
In general, applications of interest include, but are not limited to,
the following:
- Labour market research and qualification, including
behavioral-science perspectives.
- Education: Digital methods and systems, e-learning, adult education, etc.
- Contributions to the application of technology to culture,
history, and societal issues: For example, computational text
analysis, analytics and visualization, databases, etc.
- In particular, we welcome submissions which focus on a critical
reflection of digital methods in the humanities, economics and social
sciences within computer science.
- Linking of digital resources, a discussion of data sets, their
quality and reliability, combining quantitative and qualitative data,
anonymization and data protection.
We are happy to announce a new special issue of the Lexique journal on
“Démonette: a French Derivational Database”, edited by Nabil Hathout and
Fiammetta Namer. The issue focuses on the results of the ANR Démonext
"Derivation in Extension" project (2018-2022) and on uses of some of
these results.
https://www.peren-revues.fr/lexique/942
The issue features the following articles:
* Nabil Hathout et Fiammetta Namer
Démonette : une base de données dérivationnelle du français
* Fiammetta Namer, Nabil Hathout, Dany Amiot, Lucie Barque, Olivier
Bonami, Gilles Boyé, Basilio Calderone, Julie Cattini, Georgette Dal,
Alexander Delaporte, Guillaume Duboisdindien, Achille Falaise, Natalia
Grabar, Pauline Haas, Frédérique Henry, Mathilde Huguin, Nyoman
Juniarta, Loïc Liégeois, Stéphanie Lignon, Lucie Macchi, Grigoriy
Manucharian, Caroline Masson, Fabio Montermini, Nadejda Okinina, Franck
Sajous, Daniele Sanacore, Mai Thi Tran, Juliette Thuilier, Yannick
Toussaint et Delphine Tribout
Démonette-2, a derivational database for French with broad lexical
coverage and fine-grained morphological descriptions
* Mathilde Huguin, Lucie Barque, Pauline Haas et Delphine Tribout
Typage sémantique des noms dans la ressource morphologique Démonette
* Basilio Calderone, Nabil Hathout et Olivier Bonami
Phonolette: a grapheme-to-phoneme converter for French
* Nabil Hathout, Fiammetta Namer, Olivier Bonami, Georgette Dal et
Stéphanie Lignon
Generation of exercises for derivational morphology using the Démonette
database
* Stéphanie Caët, Caroline Masson, Loïc Liégeois, Lucie Macchi,
Christine Da Silva-Genest et Nadejda Okinina
Explorer des corpus oraux à l’aide de la base de données Démonette-2 :
usage de mots construits dans des interactions adulte(s)-enfant(s)
* Frédérique Brin-Henry et Fiammetta Namer
Mesurer la similarité morphologique entre mot produit et mot attendu
chez les adultes avec aphasie : étude pilote
* Guillaume Duboisdindien, Julie Cattini et Georgette Dal
Améliorer les compétences lexicales dans le cadre d’un Trouble
Développemental du Langage avec la base Démonette-2
* Guillaume Duboisdindien et Georgette Dal
Programme de recherche participative DEMONEXT : partenariat et
co-construction des savoirs entre chercheurs et orthophonistes
* Bernard Fradin
Repères critiques sur « Les familles dérivationnelles : comment ça
marche ? »
* Michel Roché
Les familles dérivationnelles : comment ça marche ?
Best regards,
Fiammetta Namer et Nabil Hathout
--
CLLE, CNRS & Université de Toulouse Jean Jaurès
Maison de la Recherche. F-31058 Toulouse cedex 9
Tél. (+33) 561-504-013. Nabil.Hathout(a)univ-tlse2.fr
http://nabil.hathout.free.fr/
The body of scientific literature is growing at an ever-increasing rate. Especially in the field of artificial intelligence (AI), the number of publications is growing every month with currently more than 300 new papers every day. As a result, it is increasingly difficult for researchers to keep an overview of the current state-of-the-art. While a number of tools simplify literature retrieval today, none of them is able to deliver accurate personalized recommendations with high recall on a daily basis.
This motivated us to develop scholar-inbox.com, a personal paper recommendation system which enables researchers to stay up-to-date with the most relevant progress by delivering personal suggestions directly to your inbox - free of charge. Scholar Inbox learns which papers you like and dislike and recommends papers to you that are similar to papers you like based on neural language processing techniques. Scholar Inbox focuses entirely on the written contents and hence also suggests relevant papers that don't receive social media hype.
https://www.scholar-inbox.com/
https://sites.google.com/view/avg-blog/scholar-inbox/
We hope that you will find this tool as useful as we find it ourselves!
Andreas Geiger & Scholar Inbox Team
University of Tübingen
Apologies for cross-posting.
---------------------------------------------------------------------------
*1st Workshop on NLP for Indigenous Languages of Lusophone Countries
(ILLC-NLP 2024) -- 2nd CFP*
*January 17, 2024: Papers submission due (extended)*
February 01, 2024: Notification of Acceptance
March 12, 2024: Workshop
Workshop website: https://sites.google.com/view/illc-nlp-2024/home
Co-located with PROPOR 2024 <https://propor2024.citius.gal/> in Santiago
de Compostela
——————————————————————————————————
*Overview and goals:*
The workshop aims to explore, discuss, and enhance the development of
resources, methods, and applications of NLP for indigenous languages,
especially those spoken in, or that have influenced languages spoken in,
countries where Portuguese is currently the official language. We hope to
contribute to the preservation and promotion of these languages.
This is one of several initiatives to expand knowledge and research
in NLP for underrepresented languages. We encourage the participation of
everyone who shares an interest in preserving and enriching the linguistic
and cultural heritage of indigenous languages in a broad sense. This way,
we welcome the submission of works including languages from all
Portuguese-speaking nations, like those of African origin in Angola,
Mozambique, and the Atlantic islands, as well as minority languages in
Portugal.
*Submissions*:
ILLC-NLP seeks submissions under the following categories:
- Full papers: 8 pages + unlimited references
- Short papers (work in progress, innovative ideas/proposals, research
ideas): 4 pages + unlimited references
- Submissions should be written in English. At submission time,
papers must be in PDF format only. For the final versions, authors of
accepted papers will be given one extra content page to consider the
reviews. Authors of accepted papers will be requested to send the source
files to produce the proceedings. All submitted papers must conform to the
official ACL style guidelines (Latex
<https://github.com/acl-org/acl-style-files/tree/master/latex> or Word
<https://github.com/acl-org/acl-style-files/tree/master/word>).
Both long and short papers will be published in the ACL Anthology.
Submission site: https://easychair.org/conferences/?conf=illcnlp2024
Reviewing format: At least two reviewers will evaluate each submission.
The reviewing format will be single-blind.
Please help us spread the word about this event by sharing this call with
your contacts and institutions. Your participation and support are crucial
for the success of this workshop.
Sincerely,
Aline Paes, Aline Villavicencio, Claudio Pinhanez, Edward Gow-Smith, Paulo
Rodrigo Cavalin (Workshop organisers)
-------------------------------------------------------------------------------------------------
*Profa. Dra. Aline Paes (she/her)*
*Associate professor - Computer Science (Artificial Intelligence)*
Institute of Computing / Universidade Federal Fluminense (IC/UFF)
Member of CE-PLN <https://sites.google.com/view/ce-pln/inicio> and BPLN
<https://brasileiraspln.com/>
CNPq PQ-2 and FAPERJ JCNE
__________________________________________________________
url: www.ic.uff.br/~alinepaes
Av Gal Milton Tavares de Souza, S/N, Computing Building, Office 504
São Domingos, Niterói, RJ, Brazil. ZIP 24210-346
-------------------------------------------------------------------------------------------------
****Please do not feel any pressure to respond out of your own regular
working hours. Remember that this is supposed to be an asynchronous tool***
We invite proposals for tasks to be run as part of SemEval-2025
<https://semeval.github.io/SemEval2025/>. SemEval (the International
Workshop on Semantic Evaluation) is an ongoing series of evaluations of
computational semantics systems, organized under the umbrella of SIGLEX
<https://siglex.org/>, the Special Interest Group on the Lexicon of the
Association for Computational Linguistics.
SemEval tasks explore the nature of meaning in natural languages: how to
characterize meaning and how to compute it. This is achieved in practical
terms, using shared datasets and standardized evaluation metrics to
quantify the strengths and weaknesses of possible solutions. SemEval tasks
encompass a broad range of semantic topics from the lexical level to the
discourse level, including word sense identification, semantic parsing,
coreference resolution, and sentiment analysis, among others.
For SemEval-2025 <https://semeval.github.io/SemEval2025/cft>, we welcome
tasks that can test an automatic system for the semantic analysis of text
(e.g., intrinsic semantic evaluation, or an application-oriented
evaluation). We especially encourage tasks for languages other than
English, cross-lingual tasks, and tasks that develop novel applications of
computational semantics. See the websites of previous editions of SemEval
to get an idea about the range of tasks explored, e.g. SemEval-2020
<http://alt.qcri.org/semeval2020/> and SemEval-2021-/2023/2024
<https://semeval.github.io/>.
We strongly encourage proposals based on pilot studies that have already
generated initial data, evaluation measures and baselines. In this way, we
can avoid unforeseen challenges down the road that may delay the task.
In case you are not sure whether a task is suitable for SemEval, please
feel free to get in touch with the SemEval organizers at
semevalorganizers(a)gmail.com to discuss your idea.
=== Task Selection ===
Task proposals will be reviewed by experts, and reviews will serve as the
basis for acceptance decisions. Everything else being equal, more
innovative new tasks will be given preference over task reruns. Task
proposals will be evaluated on:
- Novelty: Is the task on a compelling new problem that has not been
explored much in the community? Is the task a rerun, but covering
substantially new ground (new subtasks, new types of data, new languages,
etc.)?
- Interest: Is the proposed task likely to attract a sufficient number
of participants?
- Data: Are the plans for collecting data convincing? Will the resulting
data be of high quality? Will annotations have meaningfully high
inter-annotator agreements? Have all appropriate licenses for use and
re-use of the data after the evaluation been secured? Have all
international privacy concerns been addressed? Will the data annotation be
ready on time?
- Evaluation: Is the methodology for evaluation sound? Is the necessary
infrastructure available or can it be built in time for the shared task?
Will research inspired by this task be able to evaluate in the same manner
and on the same data after the initial task?
- Impact: What is the expected impact of the data in this task on future
research beyond the SemEval Workshop?
- Ethical: The data must be compliant with privacy policies. For example:
a) avoid personally identifiable information (PII); tasks aimed at
identifying specific people will not be accepted,
b) avoid medical decision making (comply with HIPAA; do not try to
replace medical professionals, especially in anything related to
mental health).
These examples are representative, not exhaustive.
=== New Tasks vs. Task Reruns ===
We welcome both new tasks and task reruns. For a new task, the proposal
should address whether the task would be able to attract participants.
Preference will be given to novel tasks that have not received much
attention yet.
For reruns of previous shared tasks (whether or not the previous task was
part of SemEval), the proposal should address the need for another
iteration of the task. Valid reasons include: a new form of evaluation
(e.g. a new evaluation metric, a new application-oriented scenario), new
genres or domains (e.g. social media, domain-specific corpora), or a
significant expansion in scale. We further discourage carrying over a
previous task and just adding new subtasks, as this can lead to the
accumulation of too many subtasks. Evaluating on a different dataset with
the same task formulation, or evaluating on the same dataset with a
different evaluation metric, typically should not be considered a separate
subtask.
=== Task Organization ===
We welcome people who have never organized a SemEval task before, as well
as those who have. Apart from providing a dataset, task organizers are
expected to:
- Verify the data annotations have sufficient inter-annotator agreement
- Verify licenses for the data allow its use in the competition and
afterwards. In particular, text that is publicly available online is not
necessarily in the public domain; unless a license has been provided, the
author retains all rights associated with their work, including copying,
sharing and publishing. For more information, see:
https://creativecommons.org/faq/#what-is-copyright-and-why-does-it-matter
- Resolve any potential security, privacy, or ethical concerns about the
data
- Commit to make the data available after the task
- Provide task participants with format checkers and standard scorers.
- Provide task participants with baseline systems to use as a starting
point (in order to lower the obstacles to participation). A baseline system
typically contains code that reads the data, creates a baseline response
(e.g. random guessing, majority class prediction), and outputs the
evaluation results. Whenever possible, baseline systems should be written
in widely used programming languages and/or should be implemented as a
component for standard NLP pipelines.
- Create a mailing list and website for the task and post all relevant
information there.
- Create a CodaLab or other similar competition for the task and upload
the evaluation script.
- Manage submissions on CodaLab or a similar competition site.
- Write a task description paper to be included in SemEval proceedings,
and present it at the workshop.
- Manage participants’ submissions of system description papers, manage
participants’ peer review of each others’ papers, and possibly shepherd
papers that need additional help in improving the writing.
- Review other task description papers.
- Define Roles for each Organizer:
- Lead Organizer - main point of contact, expected to ensure
deliverables are met on time and participate in contributing to
task duties (see below).
- Co-Organizers - provide significant contributions to ensuring the
task runs smoothly. Examples include maintaining communication with
task participants, preparing data, creating and running evaluation
scripts, and leading paper reviewing and acceptance.
- Advisory Organizers - more of a supervisory role; may not contribute
to detailed tasks but will provide guidance and support.
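The baseline requirement in the organizer duties above (read the data, produce a baseline response, report an evaluation result) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the function names are hypothetical, labels are plain strings, and accuracy stands in for whatever metric a task actually defines. It is not an official SemEval scorer or baseline.

```python
from collections import Counter
import random


def majority_class_baseline(train_labels, n_test):
    """Predict the most frequent training label for every test item."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return [most_common_label] * n_test


def random_baseline(label_set, n_test, seed=0):
    """Predict a uniformly random label per test item (seeded for reproducibility)."""
    rng = random.Random(seed)
    labels = sorted(label_set)  # fixed order so a given seed gives the same output
    return [rng.choice(labels) for _ in range(n_test)]


def accuracy(gold, predicted):
    """Fraction of predictions that match the gold labels exactly."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty test set")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


if __name__ == "__main__":
    # Toy data, invented for this example.
    train = ["pos", "neg", "pos", "neutral", "pos"]
    gold = ["pos", "neg", "pos", "pos"]
    print("majority:", accuracy(gold, majority_class_baseline(train, len(gold))))
    print("random:  ", accuracy(gold, random_baseline(set(train), len(gold))))
```

Shipping something of this size alongside the data gives participants a working end-to-end example of the input and output formats, and a floor against which to compare their own systems.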
=== Important dates ===
- Task proposals due March 31, 2024 (Anywhere on Earth)
- Task selection notification May 18, 2024
=== Preliminary timetable ===
- Sample data ready July 15, 2024
- Training data ready September 1, 2024
- Evaluation data ready December 1, 2024 (internal deadline; not for public
release)
- Evaluation starts January 10, 2025
- Evaluation ends by January 31, 2025 (latest date; task organizers may
choose an earlier date)
- Paper submission due February 2025
- Notification to authors March 2025
- Camera-ready due April 2025
- SemEval workshop Summer 2025 (co-located with a major NLP conference)
Tasks that fail to keep up with crucial deadlines (such as the dates for
having the task and CodaLab website up and dates for uploading sample,
training, and evaluation data) or that diverge significantly from the
proposal may be cancelled at the discretion of SemEval organizers. While
consideration will be given to extenuating circumstances, our goal is to
provide sufficient time for the participants to develop strong and
well-thought-out systems. Cancelled tasks will be encouraged to submit
proposals for the subsequent year’s SemEval. To reduce the risk of tasks
failing to meet the deadlines, we are unlikely to accept multiple tasks
with overlap in the task organizers.
=== Submission Details ===
The task proposal should be a self-contained document of no longer than 3
pages (plus additional pages for references). All submissions must be in
PDF format, following the ACL template
<https://github.com/acl-org/acl-style-files>.
Each proposal should contain the following:
- Overview
- Summary of the task
- Why this task is needed and which communities would be interested
in participating
- Expected impact of the task
- Data & Resources
- How the training/testing data will be produced. Please discuss whether
existing corpora will be re-used.
- Details of copyright, so that the data can be used by the research
community both during the SemEval evaluation and afterwards
- How much data will be produced
- How data quality will be ensured and evaluated
- An example of what the data would look like
- Resources required to produce the data and prepare the task for
participants (annotation cost, annotation time, computation time, etc.)
- Assessment of any concerns with respect to ethics, privacy, or
security (e.g. personally identifiable information of private
individuals; potential for systems to cause harm)
- Pilot Task (strongly recommended)
- Details of the pilot task
- What lessons were learned and how these will impact the task design
- Evaluation
- The evaluation methodology to be used, including clear evaluation
criteria
- For Task Reruns
- Justification for why a new iteration of the task is needed (see
criteria above)
- What will differ from the previous iteration
- Expected impact of the rerun compared with the previous iteration
- Task organizers
- Names, affiliations, email addresses
- (optional) brief description of relevant experience or expertise
- (if applicable) years and task numbers of any SemEval tasks you
have run in the past
- Role of each organizer
Proposals will be reviewed by an independent group of area experts who may
not have familiarity with recent SemEval tasks, and therefore all proposals
should be written in a self-explanatory manner and contain sufficient
examples.
*The submission webpage is:* SemEval2025 Task Proposal Submission
<https://openreview.net/group?id=aclweb.org/ACL/2024/Workshop/SemEval>
For further information on this initiative, please refer to
https://semeval.github.io/SemEval2025/cft
=== Chairs ===
Atul Kr. Ojha, Insight SFI Centre for Data Analytics, DSI, University of
Galway
A. Seza Doğruöz, Ghent University
Giovanni Da San Martino, University of Padua
Harish Tayyar Madabushi, The University of Bath
Sara Rosenthal, IBM Research AI
Aiala Rosá, Universidad de la República - Uruguay
Contact: semevalorganizers(a)gmail.com