BeyondGender: A Multifaceted Bilingual Dataset for Practical Sexism Detection

  • Xuan Luo
  • Li Yang
  • Han Zhang
  • Geng Tu
  • Qianlong Wang
  • Keyang Ding
  • Chuang Fan
  • Jing Li
  • Ruifeng Xu

Research output: Journal article publication › Conference article › Academic research › peer-review

Abstract

Sexism affects both women and men, yet research often overlooks misandry and suffers from overly broad annotations that limit AI applications. To address this, we introduce BeyondGender, a dataset meticulously annotated according to the latest definitions of misogyny and misandry. It features innovative multifaceted labels encompassing aspects of sexism, gender, phrasing, misogyny, and misandry. The dataset includes 6.0K English and 1.7K Chinese sexism instances, alongside 13.4K non-sexism examples. Our evaluations of masked language models and large language models reveal that they detect misogyny in English and misandry in Chinese more effectively, with F1-scores of 0.87 and 0.62, respectively. However, they frequently misclassify hostile and mild comments, underscoring the complexity of sexism detection. Parallel corpus experiments suggest promising data augmentation strategies to enhance AI systems for nuanced sexism detection, and our dataset can be leveraged to improve value alignment in large language models.

Original language: English
Pages (from-to): 24750-24758
Number of pages: 9
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 39
Issue number: 23
Publication status: Published, 11 Apr 2025
Event: 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025, Philadelphia, United States
Duration: 25 Feb 2025 to 4 Mar 2025

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 5 - Gender Equality

ASJC Scopus subject areas

  • Artificial Intelligence
