
Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA

Karthik Reddy Kanjula, Surya Guthikonda, Nahid Alam, Shayekh Bin Islam

2025-05-15

Summary

This paper examines harmful or toxic content in the LLaVA dataset, which is used to train AI models that understand both images and text, and explains how to reduce that toxicity to make the data safer.

What's the problem?

The problem is that large datasets used to train AI often contain toxic or inappropriate content, like hate speech or offensive images, which can cause the AI to learn and repeat harmful behaviors or ideas.

What's the solution?

The researchers carefully examined the LLaVA dataset to find out what kinds of toxic content were present and how common they were. They then developed and tested ways to filter out or reduce this harmful material, creating a cleaner and safer version of the dataset that anyone can use.
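To make the filtering idea concrete, here is a minimal sketch of caption-level toxicity filtering. This is not the authors' exact pipeline: it uses the open-source Detoxify classifier as a stand-in for whatever detectors they employed, and the threshold value and the "image"/"caption" field names are assumptions for illustration.

```python
# Minimal sketch: drop image-text pairs whose captions score as toxic.
# Detoxify is a stand-in classifier; threshold and field names are assumed.
from detoxify import Detoxify

detector = Detoxify("original")  # pretrained English toxicity model

def filter_pairs(pairs, threshold=0.5):
    """Keep only pairs whose caption toxicity score falls below the threshold."""
    kept = []
    for pair in pairs:
        scores = detector.predict(pair["caption"])  # per-category scores
        if scores["toxicity"] < threshold:
            kept.append(pair)
    return kept

dataset = [
    {"image": "img_001.jpg", "caption": "A dog playing in the park"},
    {"image": "img_002.jpg", "caption": "An offensive caption would go here"},
]
clean = filter_pairs(dataset)
print(f"kept {len(clean)} of {len(dataset)} pairs")
```

A real pipeline for an image-text dataset would also score the images themselves (for example with an image-safety classifier), since a benign caption can accompany a harmful image.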

Why it matters?

This matters because safer training data leads to AI systems that are less likely to say or show harmful things, making them more trustworthy and appropriate for use in schools, businesses, and public spaces.

Abstract

A detailed analysis of toxicity in the LLaVA image-text pretraining dataset identifies common types of harmful content and proposes strategies to mitigate it, resulting in a refined open-source dataset.