D-HUMOR: Dark Humor Understanding via Multimodal Open-ended Reasoning
Sai Kartheek Reddy Kasu, Mohammad Zia Ur Rehman, Shahid Shafi Dar, Rishi Bharat Junghare, Dhanvin Sanjay Namboodiri, Nagendra Kumar
2025-09-09
Summary
This paper focuses on the difficulty computers have understanding dark humor, specifically in memes found online. It introduces a new collection of memes labeled for whether they contain dark humor, what the humor is *about* (like gender or mental health), and how strong the dark humor is, and then proposes a new method for computers to detect this type of humor.
What's the problem?
Dark humor is tricky for computers because it relies on understanding things that aren't directly stated, such as social context, sensitive topics, and shared cultural knowledge. Existing tools aren't very good at recognizing it in memes, which combine images and text, because they struggle to grasp the subtle cues needed to 'get' the joke. On top of that, there simply weren't enough labeled examples of dark humor memes available for researchers to build and test better systems.
What's the solution?
The researchers created a dataset of over 4,000 memes specifically labeled for dark humor. Then, they built a system that works in a few steps. First, a powerful AI model tries to *explain* the meme, almost like it's trying to figure out the joke itself, and it refines this explanation by considering what the original creator might have been thinking. Next, the system analyzes the text in the meme, the image, and the AI’s explanation, combining all this information to determine if the meme contains dark humor, what it's targeting, and how intense it is. They use a special network that pays attention to how these different pieces of information relate to each other.
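The final fusion step, where text, image, and reasoning features attend to one another, can be sketched with plain single-head cross-attention. Everything below (function names, dimensions, the projection-free form, and mean-pooling) is an illustrative assumption, not the paper's exact Tri-stream Cross-Reasoning Network:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, context):
    """Single-head scaled dot-product attention, no learned projections.

    query: (n_q, d), context: (n_c, d) -> (n_q, d)
    """
    d = query.shape[-1]
    scores = query @ context.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ context

def tri_stream_fuse(text, image, reasoning):
    """Let each stream attend to the other two, then pool and concatenate.

    Each input is a (tokens, d) feature matrix (e.g. from a text encoder,
    a vision transformer, and the VLM's reasoning); output is (3 * d,).
    """
    t = cross_attention(text, np.vstack([image, reasoning]))
    v = cross_attention(image, np.vstack([text, reasoning]))
    r = cross_attention(reasoning, np.vstack([text, image]))
    return np.concatenate([t.mean(axis=0), v.mean(axis=0), r.mean(axis=0)])
```

A classifier head would then map the fused vector to the three task outputs (dark humor, target, intensity); in practice the attention would use learned query/key/value projections rather than raw features.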
Why it matters?
This work is important because it helps us build better tools for understanding online content. This is useful for things like content moderation – identifying potentially harmful or offensive memes – and for improving AI’s ability to understand human communication in general. By releasing the dataset and code, the researchers are helping other scientists build on their work and create even more sophisticated systems for understanding humor and other complex forms of expression online.
Abstract
Dark humor in online memes poses unique challenges due to its reliance on implicit, sensitive, and culturally contextual cues. To address the lack of resources and methods for detecting dark humor in multimodal content, we introduce a novel dataset of 4,379 Reddit memes annotated for dark humor, target category (gender, mental health, violence, race, disability, and other), and a three-level intensity rating (mild, moderate, severe). Building on this resource, we propose a reasoning-augmented framework that first generates structured explanations for each meme using a Large Vision-Language Model (VLM). Through a Role-Reversal Self-Loop, the VLM adopts the author's perspective to iteratively refine its explanations, ensuring completeness and alignment. We then extract textual features from both the OCR transcript and the self-refined reasoning via a text encoder, while visual features are obtained using a vision transformer. A Tri-stream Cross-Reasoning Network (TCRNet) fuses these three streams (text, image, and reasoning) via pairwise attention mechanisms, producing a unified representation for classification. Experimental results demonstrate that our approach outperforms strong baselines across three tasks: dark humor detection, target identification, and intensity prediction. The dataset, annotations, and code are released to facilitate further research in multimodal humor understanding and content moderation. Code and dataset are available at: https://github.com/Sai-Kartheek-Reddy/D-Humor-Dark-Humor-Understanding-via-Multimodal-Open-ended-Reasoning
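The Role-Reversal Self-Loop described in the abstract can be sketched as a simple generate-critique-refine loop. Here `explain` and `critique` are hypothetical stand-ins for two prompted VLM calls: one produces an explanation of the meme, the other adopts the author's perspective and returns feedback, or `None` once the explanation is judged complete:

```python
def role_reversal_refine(meme, explain, critique, max_rounds=3):
    """Iteratively refine a meme explanation via role-reversal feedback.

    explain(meme, feedback)  -> explanation string (a VLM call in practice)
    critique(meme, explanation) -> feedback string, or None when satisfied
    """
    explanation = explain(meme, feedback=None)
    for _ in range(max_rounds):
        feedback = critique(meme, explanation)
        if feedback is None:          # author's perspective is satisfied
            break
        # regenerate, conditioning on the author-perspective feedback
        explanation = explain(meme, feedback=feedback)
    return explanation
```

The `max_rounds` cap is an assumed safeguard against non-converging critiques; the paper's actual stopping criterion may differ.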