Deceptive Humor: A Synthetic Multilingual Benchmark Dataset for Bridging Fabricated Claims with Humorous Content

Sai Kartheek Reddy Kasu, Shankar Biradar, Sunil Saumya

2025-03-21

Summary

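This paper introduces the Deceptive Humor Dataset (DHD), a synthetic collection of humorous comments built on fabricated claims and misinformation. The collection covers English, Telugu, Hindi, Kannada, Tamil, and their code-mixed variants, so it can serve as a multilingual benchmark.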

What's the problem?

Misinformation is widespread, and humor can be used to spread or disguise it, so it is important to understand how the two interact.

What's the solution?

The researchers generated a set of humorous comments based on fabricated claims and labeled each comment with a satire level (1 to 3) and one of five humor categories, so that others can study the connection between humor and deception.

Why it matters?

The dataset gives researchers a basis for studying how humor affects the perception and spread of misinformation, and for building and benchmarking models that detect deceptive humor across languages.

Abstract

This paper presents the Deceptive Humor Dataset (DHD), a novel resource for studying humor derived from fabricated claims and misinformation. In an era of rampant misinformation, understanding how humor intertwines with deception is essential. DHD consists of humor-infused comments generated with the ChatGPT-4o model from false narratives that incorporate fabricated claims and manipulated information. Each instance is labeled with a Satire Level, ranging from 1 for subtle satire to 3 for high-level satire, and classified into one of five distinct Humor Categories: Dark Humor, Irony, Social Commentary, Wordplay, and Absurdity. The dataset spans multiple languages, including English, Telugu, Hindi, Kannada, Tamil, and their code-mixed variants (Te-En, Hi-En, Ka-En, Ta-En), making it a valuable multilingual benchmark. By introducing DHD, we establish a structured foundation for analyzing humor in deceptive contexts, paving the way for a new research direction that explores how humor not only interacts with misinformation but also influences its perception and spread. We establish strong baselines for the proposed dataset, providing a foundation for future research to benchmark and advance deceptive humor detection models.
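
The labeling scheme described in the abstract can be sketched as a small record type. This is only an illustrative sketch: the label values (satire levels, humor categories, language codes) come from the abstract, but the class name, field names, and validation logic are assumptions and do not reflect the dataset's actual file format or column names.

```python
from dataclasses import dataclass

# Label values as listed in the abstract.
SATIRE_LEVELS = (1, 2, 3)  # 1 = subtle satire, 3 = high-level satire
HUMOR_CATEGORIES = (
    "Dark Humor", "Irony", "Social Commentary", "Wordplay", "Absurdity",
)
LANGUAGES = (
    "English", "Telugu", "Hindi", "Kannada", "Tamil",
    "Te-En", "Hi-En", "Ka-En", "Ta-En",  # code-mixed variants
)


@dataclass(frozen=True)
class DeceptiveHumorExample:
    """Hypothetical record for one labeled comment (field names are assumed)."""
    comment: str
    language: str
    satire_level: int
    humor_category: str

    def __post_init__(self):
        # Reject any label that falls outside the scheme described in the abstract.
        if self.language not in LANGUAGES:
            raise ValueError(f"unknown language: {self.language!r}")
        if self.satire_level not in SATIRE_LEVELS:
            raise ValueError(f"satire level must be 1, 2, or 3: {self.satire_level!r}")
        if self.humor_category not in HUMOR_CATEGORIES:
            raise ValueError(f"unknown humor category: {self.humor_category!r}")


# Placeholder text only; this is not an actual instance from the dataset.
example = DeceptiveHumorExample(
    comment="<comment text>",
    language="Te-En",
    satire_level=2,
    humor_category="Irony",
)
```

Under these assumptions, the two labels define two separate classification tasks over the same comments: a three-way satire-level prediction and a five-way humor-category prediction.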
