Hi everyone
City, University of London is looking for a postdoctoral NLP research fellow to work on the VISION (Violence, Health and Society) project, developing methods for extracting information on violence from public sector records. Details below. Please circulate to anyone you think might be interested.
https://www.city.ac.uk/about/jobs/apply/details.html?nPostingId=1579&nPosti…
best wishes
--
Angus
> [Apologies for cross-posting]
> ======================================================================
> EXTENDED DEADLINE TO **JULY 31**
> ======================================================================
>
> SIMBig 2023 - 10th International Conference on Information Management and Big Data
> Where: Instituto Politécnico Nacional, Mexico DF, MEXICO
> When: October 18 - 20, 2023
> Website: https://simbig.org/SIMBig2023/
>
> ======================================================================
>
> OVERVIEW
> ----------------------------------
>
> SIMBig 2023 seeks to present new methods of Artificial Intelligence (AI), Data Science, Machine Learning, Natural Language Processing, Semantic Web, and related fields, for analyzing, managing, and extracting insights and patterns from large volumes of data.
>
>
> KEYNOTE SPEAKERS (to be confirmed)
> ----------------------------------
>
> Mona Diab, Meta AI, USA
> Huan Liu, Arizona State University, USA
>
> and more to be announced soon...
>
> IMPORTANT DATES
> ----------------------------------
>
> July 31, 2023 (extended from July 24, 2023) --> Full papers and short papers due
> August 28, 2023 --> Notification of acceptance
> September 10, 2023 --> Camera-ready versions
> October 18 - 20, 2023 --> Conference held in Mexico DF, Mexico
>
> PUBLICATION
> ----------------------------------
>
> All accepted papers of SIMBig 2023 (including special tracks) will be published in the Springer CCIS series <https://www.springer.com/series/7899> (to be confirmed).
>
> The best papers of SIMBig 2023 (including special tracks) will be selected and invited to submit extended versions for publication in the Springer SN Computer Science journal <https://www.springer.com/journal/42979>.
>
> TOPICS OF INTEREST
> ----------------------------------
>
> SIMBig 2023 has a broad scope. We invite contributions on theory and practice, including but not limited to the following technical areas:
>
> Artificial Intelligence
> Big/Massive Data
> Data Science
> Machine Learning
> Deep Learning
> Natural Language Processing
> Semantic Web
> Data-driven Software Engineering
> Data-driven Software Adaptation
> Healthcare Informatics
> Biomedical Informatics
> Data Privacy and Security
> Information Retrieval
> Ontologies and Knowledge Representation
> Social Networks and Social Web
> Information Visualization
> OLAP and Business Intelligence
> Crowdsourcing
>
> SPECIAL TRACKS
> ----------------------------------
>
> SIMBig 2023 proposes six special tracks in addition to the main conference:
>
> ANLP <https://simbig.org/SIMBig2023/en/anlp.html> - Applied Natural Language Processing
> DISE <https://simbig.org/SIMBig2023/en/dise.html> - Data-Driven Software Engineering
> EE-AI-HPC <https://simbig.org/SIMBig2023/en/eeaihpc.html> - Efficiency Enhancement for AI and High-Performance Computing
> SNMAM <https://simbig.org/SIMBig2023/en/snmam.html> - Social Network and Media Analysis and Mining
>
> CONTACT
> ----------------------------------
>
> SIMBig 2023 General Chairs
>
> Juan Antonio Lossio-Ventura, National Institutes of Health, USA (juan.lossio(a)nih.gov <mailto:juan.lossio@nih.gov>)
> Hugo Alatrista-Salas, Pontificia Universidad Católica del Perú, Peru (halatrista(a)pucp.pe <mailto:halatrista@pucp.pe>)
HASOC 2023 at the 15th meeting of the Forum for Information Retrieval Evaluation (FIRE)
We are excited to announce the 5th edition of HASOC, consisting of four interesting shared tasks. We invite you to participate.
Task 1 focuses on identifying hate speech, offensive language, and profanity in different languages using natural language processing techniques.
* Task 1A involves identifying hate and offensive content in Sinhala, a low-resource Indo-Aryan language spoken mainly in Sri Lanka. Participants classify tweets as Hate and Offensive (HOF) or Not Hate and Offensive (NOT). The training set for this task is based on the Sinhala Offensive Language Detection dataset, which contains 10,000 tweets.
* Task 1B focuses on identifying hate and offensive content in Gujarati, another low-resource Indo-Aryan language spoken by approximately 50 million people in India. Similarly, participants need to classify tweets into HOF or NOT categories. The training set for this task consists of around 200 tweets.
For more details, please visit the Task 1 page <https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F16…>.
Task 2, Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL), addresses the challenge of identifying hate speech and offensive content in code-mixed conversations on social media. Code-mixed text includes multiple languages within a single conversation. The task is divided into two subtasks.
* In Task 2a, participants need to perform binary classification on conversational tweets with tree-structured data. They must determine whether a tweet, comment, or reply contains hate speech, offensive language, or profanity (HOF) or is non-hate and offensive (NOT). The classification should consider both the individual content and support for hate expressed in the parent tweet.
* Task 2b involves the classification of conversational tweets with tree-structured data into specific forms of hate. Participants must identify if the tweet, comment, or reply contains standalone hate (SHOF), contextual hate (CHOF) that supports hate expressed in the parent, or if it is non-hate (NONE).
For more details, please visit the Task 2 webpage <https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F16…>.
Task 3 aims to detect hateful spans within a sentence already considered hateful. A hate span is a set of contiguous tokens that together communicate the explicit hatefulness in a sentence.
* For instance, in the statement "Women ... Can't live with them... Can't shoot them," the portion "Can't shoot them" is considered a hateful span. This shared task aims to extract all such spans from a hateful text.
* The input texts are all in English. Detecting hateful spans is cast as a sequence labeling problem: every token in the sequence has been manually annotated to mark the start and end of hateful spans. This is done with BIO tagging, where 'B' marks the beginning of a hate span, 'I' marks the continuation of a hate span, and 'O' marks a non-hate token. The task is then to learn the correct sequence of BIO tags for a given sentence. For example, for the sentence above, the preprocessed sentence and its tag sequence are "women can't live with them can't shoot them" → "O O O O O B I I". An "I" tag cannot occur on its own; it is always preceded by either an "I" or a "B". Conversely, a "B" tag can be immediately followed by an "O" when the span consists of a single word.
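As a rough illustration of the BIO scheme described above, the following Python sketch converts token-level spans into B/I/O tags, checks the rule that an "I" must follow a "B" or another "I", and decodes tags back into spans. It assumes whitespace tokenization and non-overlapping spans given as (start, end) token indices with an exclusive end; the function names are hypothetical and are not part of the HASOC data release or scorer.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive (assumption)


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Tag the first token of each span B, the rest I; all other tokens get O."""
    tags = ["O"] * num_tokens
    for start, end in spans:
        if not 0 <= start < end <= num_tokens:
            raise ValueError(f"invalid span {(start, end)} for {num_tokens} tokens")
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def is_valid_bio(tags: List[str]) -> bool:
    """An 'I' may only follow a 'B' or another 'I'; a 'B' may be followed by 'O'."""
    prev = "O"
    for tag in tags:
        if tag not in ("B", "I", "O"):
            return False
        if tag == "I" and prev == "O":
            return False
        prev = tag
    return True


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a valid BIO sequence back into (start, end) spans."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag in ("B", "O") and start is not None:
            spans.append((start, i))  # close the span that was open
            start = None
        if tag == "B":
            start = i
        # an "I" simply extends the currently open span
    if start is not None:
        spans.append((start, len(tags)))
    return spans


if __name__ == "__main__":
    tokens = "women can't live with them can't shoot them".split()
    tags = spans_to_bio(len(tokens), [(5, 8)])  # span = "can't shoot them"
    print(" ".join(tags))  # O O O O O B I I
    print(is_valid_bio(tags), bio_to_spans(tags))  # True [(5, 8)]
```

Keeping the span representation separate from the tag sequence makes it easy to validate system output before scoring: a predicted sequence that fails `is_valid_bio` contains an orphan "I" that has no span start.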
For more details, please visit the Task 3 webpage <https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F16…>.
Task 4 aims to detect hate speech in Bengali, Bodo, and Assamese. It is a binary classification task. Each dataset (one per language) consists of a list of sentences with their corresponding class: hate or offensive (HOF) or not hate (NOT). The data is primarily collected from Twitter, Facebook, and YouTube comments.
The Macro F1 score will be the yardstick of the task. Team rank will be determined based on the Macro F1 score of the first part.
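The macro F1 score is the unweighted mean of the per-class F1 scores, so the minority class counts as much as the majority class. The Python sketch below computes it for the HOF/NOT labels without external libraries; it only illustrates the metric, and the official HASOC scorer may differ in details such as how an undefined F1 is handled. When both labels occur in the data, it should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")`.

```python
from typing import Sequence, Tuple


def macro_f1(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    labels: Tuple[str, ...] = ("HOF", "NOT"),
) -> float:
    """Unweighted mean of per-class F1; a class with no TP, FP or FN scores 0."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    per_class = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        # F1 = 2TP / (2TP + FP + FN), equal to 2PR / (P + R) when defined
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    # a majority-class baseline on an imbalanced toy set
    y_true = ["HOF"] + ["NOT"] * 9
    y_pred = ["NOT"] * 10
    print(round(macro_f1(y_true, y_pred), 3))  # 0.474
```

In this toy example, predicting NOT for every item gives 90% accuracy but a macro F1 of only 9/19 (about 0.47), because the HOF class contributes an F1 of 0.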
For more details, please visit the Task 4 webpage <https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F16…>.
Registration for all four tasks is open on our registration page <https://url6.mailanyone.net/scanner?m=1qOOEH-000CON-4S&d=4%7Cmail%2F14%2F16…>.
We believe that your expertise and contribution will be invaluable in advancing the state of the art in hate speech classification. We encourage you to participate in these shared tasks and contribute to the research community.
Regards,
HASOC organizing team
Apologies for multiple posting
***********************************
*Machine Translation for Indian Languages (MTIL) 2023*
We invite all IR and NLP researchers and enthusiasts to participate in the
MTIL track (https://mtilfire.github.io/mtil/2023/) held in conjunction with
the Forum for Information Retrieval Evaluation (FIRE) 2023
(http://fire.irsi.res.in/).
Indian languages have many linguistic complexities. Though some Indian
languages share syntactic similarities, some possess intricate
morphological structures. At the same time, some Indian languages are
low-resource. Therefore, machine translation models must address these
unique challenges when translating between Indian languages.
The MTIL track consists of two tasks:
1. *General Translation Task (Task 1):* Task participants should build a
   machine translation model to translate sentences of the following language
   pairs:
   1. Hindi-Gujarati
   2. Hindi-Kannada
   3. Kannada-Hindi
   4. Hindi-Odia
   5. Odia-Hindi
   6. Hindi-Punjabi
   7. Punjabi-Hindi
   8. Hindi-Sindhi
   9. Urdu-Kashmiri
   10. Telugu-Hindi
   11. Hindi-Telugu
   12. Urdu-Hindi
   13. Hindi-Urdu
2. *Domain Specific Translation Task (Task 2)*: Task participants will
   build machine translation models for the Governance and Healthcare domains.
   1. Healthcare:
      a. Hindi-Gujarati
      b. Kannada-Hindi
      c. Hindi-Odia
      d. Odia-Hindi
      e. Hindi-Punjabi
      f. Kannada-Hindi
   2. Governance:
      a. Hindi-Gujarati
      b. Kannada-Hindi
      c. Hindi-Odia
      d. Odia-Hindi
      e. Hindi-Punjabi
      f. Kannada-Hindi
*Dataset:*
The primary source of parallel language pairs is the Bharat Parallel Corpus
Collection (BPCC), released by AI4Bharat (https://ai4bharat.iitm.ac.in/bpcc).
Participants are encouraged to add datasets of their choice, including
parallel corpora and monolingual datasets, to train their models.
More information on registration and participation in the track can be
found here: https://mtilfire.github.io/mtil/2023/
This track is organized in association with BHASHINI
(https://bhashini.gov.in/).
*Organisers*
- Prasenjit Majumder, DAIICT Gandhinagar, India and TCG CREST, Kolkata, India
- Arafat Ahsan, IIIT-Hyderabad, India
- Asif Ekbal, IIT-Patna, India
- Saran Pandian, DAIICT Gandhinagar, India
- Ramakrishna Appicharla, IIT-Patna, India
- Surupendu Gangopadhyay, DAIICT Gandhinagar, India
- Ganesh Epili, DAIICT Gandhinagar, India
- Dreamy Pujara, DAIICT Gandhinagar, India
- Misha Patel, DAIICT Gandhinagar, India
- Aayushi Patel, DAIICT Gandhinagar, India
- Bhargav Dave, DAIICT Gandhinagar, India
- Mukesh Jha, DAIICT Gandhinagar, India
We are seeking a highly motivated and talented individual to join our research team as a Postdoctoral Researcher in the field of Natural Language Processing. The position offers an exciting opportunity to investigate the computational and algorithmic aspects underlying modern Artificial Intelligence systems, with a specific focus on the algorithmic and application aspects of NLP-based technologies. The successful candidate will work closely with Prof. Debora Nozza, Prof. Dirk Hovy, and the MilaNLP lab.
Your profile:
- A Ph.D. in Computer Science, Computational Linguistics/NLP, Machine Learning, Data Science, or a related field.
- Excellent programming skills in Python.
- Fluency in spoken and written English. Knowledge of Italian is NOT a requirement.
- Knowledge of current neural network models and implementation tools for neural networks (e.g., PyTorch).
- A record of publications in top-tier venues in the field of NLP/Computational Linguistics.
Position Details:
- Starting date: October 1, 2023, or any time thereafter
- Duration: 2 years
- Deadline: September 1, 2023
- Salary: Competitive. Applicants from outside Italy may qualify for a researcher taxation scheme.
How to apply:
Go to the Bocconi postdoc job market page https://jobmarket.unibocconi.eu/?type=a&urlBack=/wps/wcm/connect/Bocconi/Si… and search for “Natural language processing”, where you can also find the official job description. Candidates should attach publications and a cover letter to their application.
Online interviews will take place during September 2023. Please contact debora.nozza(at)unibocconi.it if you have any questions.
Dear colleagues,
Our research group TurkuNLP at the University of Turku, Finland, has an opening for *a postdoc position in corpus linguistics or NLP*.
The position is part of the research project "Massively Multilingual Modeling of Registers in Web-Scale Data" (MMMReg), which is funded by the Academy of Finland. The project aims to explore language use in the digital world at a massively multilingual scale using neural networks. The specific focus of the project is on web registers, such as news, blogs, and how-to pages. The primary goals of the project are to analyze the linguistic characteristics of web registers across languages and to develop machine learning methods for modeling registers in large web datasets at a massively multilingual scale.
The position is for one year, starting on September 1, 2023.
The closing date for applications is August 7, 2023 (UTC+3).
For more information on the position, please visit https://www.utu.fi/en/university/come-work-with-us/open-vacancies
Do not hesitate to get in touch if you have any questions!
Best regards,
Veronika Laippala
TurkuNLP, University of Turku, Finland
Third Workshop DL4LD 2023
Deep learning for linguistic linked data: Addressing Deep Learning,
Relation Extraction, and Linguistic Data with a Case Study on The Bigger
Analogy Test Set (BATS) https://vecto.space/projects/BATS/
Venue: Vienna, Austria, University of Vienna & online
Website: http://dl4ld2023.mruni.eu/
Date: 13 September 2023
The COST Action CA18209 NexusLinguarum (https://nexuslinguarum.eu)
invites you to attend the Third Workshop on Deep Learning for Linguistic
Linked Data: Addressing Deep Learning, Relation Extraction, and
Linguistic Data with a Case Study on The Bigger Analogy Test Set (BATS)
(DL4LD 2023), organized as part of LDK 2023 (http://2023.ldk-conf.org).
We are glad to announce that the program features one keynote talk, by
Assistant Professor Michael Cochez, and seven oral presentations.
DL4LD 2023 will be a hybrid event (in-person and online) open to anyone
interested in the topic. Online participation is still possible but
requires prior registration. The registration form for online participation
(https://ldk-registration.univie.ac.at) will be open until September 3, 2023.
We are very much looking forward to seeing you in Vienna or online.
Program available here: http://dl4ld2023.mruni.eu/?page_id=323
The European Chapter of the ACL (EACL), the North American Chapter of the
ACL - Human Language Technologies (NAACL-HLT), the Association for
Computational Linguistics (ACL), and the Conference on Empirical Methods in
Natural Language Processing (EMNLP) invite proposals for tutorials in
conjunction with EACL 2024, NAACL-HLT 2024, ACL 2024, and EMNLP 2024. We
seek proposals in all areas of computational linguistics, broadly conceived
to include related disciplines. We invite proposals for two types of
tutorials:
Cutting-edge: tutorials that cover advances in newly emerging areas not
previously covered in any EACL/NAACL-HLT/ACL/EMNLP related tutorial (see
the list of tutorials in the past 4 years).
Introductory to computational linguistics/NLP: tutorials that provide
introductions to related fields that are potentially relevant for the
computational linguistics community (e.g., linguistics, bioinformatics,
machine learning techniques, large language models for non-English
languages).
In both cases, the aim of a tutorial is primarily to help understand a
scientific problem, its tractability, and its theoretical and practical
implications. Presentations of particular technological solutions or
systems are welcome, provided that they serve as illustrations of broader
scientific considerations.
Tutorials will be held at one of the following conference venues:
* EACL 2024 in Malta on March 17-22, 2024
* NAACL-HLT 2024 in Mexico City on June 16-21, 2024
* ACL 2024 in Bangkok on August 12-17, 2024
* EMNLP 2024 (location TBD), late 2024
Important Dates
EACL/NAACL-HLT/ACL/EMNLP 2024 shared dates:
Proposal submission deadline: September 1, 2023
Notification of acceptance: October 1, 2023
All deadlines are 11:59PM UTC-12:00 (“anywhere on Earth”).
Fee Waiving
Up to 3 instructors per tutorial can have their registration fees waived
for the main conference and any subset of co-located tutorials and
workshops.
Diversity And Inclusion
To foster a truly inclusive culture in our field, we particularly
encourage submissions from members of underrepresented groups in
computational linguistics, e.g., researchers from any demographic or
geographic minority and researchers with disabilities, among others. The
overall diversity of the tutorial organizers and potential audience will be
taken into account to ensure that the conference program is varied and
balanced.
Tutorial proposals should describe, and will be evaluated according to, how
the tutorial contributes to topics promoting diversity (e.g., working on
minority languages or groups), participation diversity (e.g., coordinating
with social affinity groups, providing subsidies, making a promotional plan
for the tutorial), and representation diversity among tutorial presenters.
For more information or advice, organizers may consult resources such as
the BIG directory, Black in AI, Disability in AI, Indigenous AI, LatinX in
AI, Masakhane, 500 Queer Scientists, and Women-in-ML’s directory.
Submission Details
Proposals should use the ACL paper submission format. Authors can download
the LaTeX or Word template or use the Overleaf template. Proposals should
not exceed 4 pages of content (plus unlimited pages for references), should
be submitted as PDF documents, and should contain the following:
* A title and authors, affiliations, and contact information.
* A brief description of the tutorial content and its relevance to the
computational linguistics community.
* Type of the tutorial: cutting-edge vs introductory.
* A brief description of the target audience and any prerequisite
background attendees are expected to have. Here are some examples:
** Math: e.g., “Understand derivatives and integrals as found in
introductory calculus”
** Linguistics: e.g., “Be able to parse and generate text with dependency
grammars”
** Machine Learning: e.g., “Understand ‘classical’ supervised methods such
as SVM and perceptron”
** Other areas: e.g., “Familiarity with word2vec”
** Programming or other tools: e.g., “Knowledge of PyTorch and Unix command
line tools”
* An outline of the tutorial structure and content and how it will be covered
in a three-hour slot. In exceptional cases, six-hour tutorial slots are
available. These time limits do not include coffee breaks; e.g., a
three-hour tutorial in fact occupies a 3.5-hour slot, and a six-hour
tutorial occupies a 7-hour slot.
* Explain how the tutorial includes other people’s work. We recommend that
the tutorial covers work by the presenters as well as by other researchers.
The submission should explain how this breadth is ensured. Tutorials should
not be “self-invited talks”.
* Diversity considerations, e.g., use of multilingual data, indications of
how the described methods scale up to various languages or domains,
participation of both senior and junior instructors, demographic and
geographical diversity of the instructors, plans for how to diversify
audience participation, etc.
* Reading list. Work that you expect the audience to read before the
tutorial can be indicated by an asterisk. Recommended papers should provide
breadth of authorship and include work by other authors, and work from
other disciplines is welcome if relevant.
* For each tutorial presenter, a one-paragraph statement of their research
interests and areas of expertise for the tutorial topic, as well as
experience in instructing an international audience.
* An estimate of the audience size for the tutorial. If the same or a
similar tutorial has been given before, include information on where any
previous version of the tutorial was given and how many attendees the
tutorial attracted.
* A note specifying which venue(s) (EACL/NAACL-HLT/ACL/EMNLP) would be
acceptable and/or preferable. Include a description of any constraints that
might make the tutorial compatible with only one of these events,
logistically, thematically, or otherwise.
* A description of special requirements for technical equipment.
* We intend to make tutorial presentation materials publicly available
(e.g., tutorial slides, captioned video recording, as well as software,
data, or other resources as applicable) in the ACL Anthology. If any of
your tutorial materials cannot be shared, please explain.
* An ethics statement that discusses ethical considerations related to the
topics of the tutorial.
* OPTIONAL: We welcome proposals on special conference themes. If your
tutorial proposal aligns with the special theme of a conference, then
please explain.
* OPTIONAL: We invite tutorial instructors to include pedagogical material
that the audience can bring into classrooms or similar spaces of
discussion, to bring attention to the tutorial topic (e.g., a hands-on
exercise, discussion questions, a demo, or an assignment). If you would
like to provide this, then please explain.
Tutorial proposals should be submitted online using the softconf system at
the following link:
https://softconf.com/n/acl-tutorials2024.
Proposals will be reviewed jointly by the Tutorial Co-Chairs of the
conferences and by a group of external experts.
Evaluation Criteria
Each tutorial proposal will be evaluated according to its clarity and
preparedness, novelty or timely character of the topic, instructors’
experience, likely audience interest, open access of the tutorial
instructional material, and diversity and inclusion.
Instructor Responsibilities
Accepted tutorial presenters will be notified by October 1st, 2023. They
must then provide abstracts of their tutorials for inclusion in the
conference registration material by the specific conference deadlines. The
description should be in two formats: (a) an ASCII version that can be
included in email announcements and published on the conference website,
and (b) a PDF version for inclusion in the electronic proceedings (detailed
instructions will be provided). Tutorial speakers must provide tutorial
materials (e.g., slides, relevant list of papers) at least one month prior
to the date of the tutorial, depending on the final venue. The final
submitted tutorial materials must minimally include copies of the course
slides and a bibliography for the material covered in the tutorial. After
the conference, the presenters will be invited to update their slides in
the ACL Anthology (if needed).
Tutorial Chairs
EACL
Sharid Loáiciga <sharid.loaiciga(a)gu.se>
Mohsen Mesgar <mohsen.mesgar(a)bosch.com>
NAACL-HLT
Rui Zhang <rmz5227(a)psu.edu>
Nathan Schneider <nathan.schneider(a)georgetown.edu>
Snigdha Chaturvedi <snigdha(a)cs.unc.edu>
ACL
Luis Chiruzzo <luis.chiruzzo(a)gmail.com>
Hung-yi Lee <tlkagkb93901106(a)gmail.com>
Leonardo Ribeiro <leonardofribeiro(a)gmail.com>
EMNLP
TBA
Dear all,
With apologies for cross posting: The New Directions in Analyzing Text as
Data (TADA) meeting is a leading forum for interdisciplinary research on
the study of politics, society, and culture through computational analysis
of documents. TADA 2023 invites applications for research presentations on
new work related to text-as-data methods and applications. Our programs
from past meetings (e.g., 2022) <https://tada2022.org/> demonstrate this
community’s history of bringing together researchers, practitioners, and
scholars from many fields.
*Key Dates*
Submission deadline: August 4th
Notification of acceptance: August 28th
Registration opens: September 5th
Papers circulated to discussants: October 26th
Conference: November 9th and 10th, Amherst, MA
*Abstract Submissions*
Proposals to present work are due August 4, and consist of a brief,
300-word abstract in text format rather than a full paper. TADA 2023 is a
non-archival conference; there are no formal proceedings, and papers
presented at the conference will not be distributed publicly by the
conference. Presenters are expected to provide a paper to their discussant
two weeks before the conference. We welcome any work, so long as it hasn’t
been previously presented at a TADA conference. We also welcome individuals
to volunteer to serve as discussants.
*Link for
submissions: https://docs.google.com/forms/d/e/1FAIpQLSfpsWgM44dfn3HRrQVq3uGXstBvRN6rbuO…
<https://docs.google.com/forms/d/e/1FAIpQLSfpsWgM44dfn3HRrQVq3uGXstBvRN6rbuO…>*
In addition to oral presentations and posters, TADA 2023 will have a
doctoral consortium. We have limited funding to cover travel and lodging
expenses for PhD students, who will be matched with experienced mentors
from complementary fields to offer critiques of specific work and to
provide guidance on how to do effective interdisciplinary work.
Send questions for the organizers to: info(a)tada2023.org.
Thank you,
Heather Froehlich
--
Dr Heather Froehlich
w // http://hfroehli.ch
t // @heatherfro
Hi all, I'm doing analysis on a corpus of tweets from institutions. For n-gram analysis, the corpus is quite unusual in that it contains many exactly repeated or very similar tweets, which produce long recurring strings of 9, 10 or more words. Naturally, the overlapping substrings make accurate counting and classifying difficult. Does anyone know of any approaches or software that can count and classify n-grams in such circumstances? I am aware of the approaches outlined by Buerki (2017) and O'Donnell (2011), but these do not seem practical given the excessive length of the n-grams in the corpus. Does anyone know of any accessible methods or packages?
Any input much appreciated.