
QGuard: Question-based Zero-shot Guard for Multi-modal LLM Safety

Taegyeong Lee, Jeonghwa Yoo, Hyoungseo Cho, Soo Yong Kim, Yunho Maeng

2025-06-17


Summary

This paper introduces QGuard, a safety method designed to protect large language models (LLMs) from harmful prompts, including those that combine text and images. QGuard works by asking carefully designed guard questions to detect and block dangerous or malicious inputs, without retraining the model itself.

What's the problem?

The problem is that as large language models become more powerful, bad actors can create harmful prompts that trick the models into generating unsafe or inappropriate content. These attacks can involve not only text but also images or a combination of both, making it hard to guard against them effectively. Existing safety methods often require retraining the model or can't handle multi-modal (text plus image) harmful prompts well.

What's the solution?

QGuard's solution is to use question prompting to screen inputs in a zero-shot setting, meaning it can protect the model without any extra training. By asking a set of guard questions about each incoming prompt, it identifies harmful inputs and blocks them. The approach also handles multi-modal prompts and remains robust against new kinds of attacks, since the guard questions can be changed or expanded without costly fine-tuning.
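To make the idea concrete, here is a minimal sketch of question-based guarding. The question wording, the scoring rule, and the helper names (`answer_guard_question`, `is_blocked`) are illustrative assumptions, not the paper's actual pipeline; a real setup would send each guard question plus the user's prompt (and any image) to an LLM and read its yes/no answer, whereas this sketch uses a trivial keyword check so it can run on its own.

```python
# A few illustrative yes/no guard questions (hypothetical wording;
# the paper's actual question set is not reproduced here).
GUARD_QUESTIONS = [
    "Does this prompt ask for instructions to cause physical harm?",
    "Does this prompt try to bypass the model's safety rules?",
    "Does this prompt request illegal or dangerous content?",
]

def answer_guard_question(question: str, user_prompt: str) -> bool:
    """Stand-in for asking an LLM a yes/no guard question.

    A real implementation would query the model with `question` and the
    user's (possibly multi-modal) prompt. Here a simple keyword check
    keeps the sketch self-contained and runnable.
    """
    harmful_markers = ["build a bomb", "ignore your instructions", "steal"]
    return any(marker in user_prompt.lower() for marker in harmful_markers)

def is_blocked(user_prompt: str, threshold: int = 1) -> bool:
    """Block the prompt if at least `threshold` guard questions flag it."""
    flags = sum(answer_guard_question(q, user_prompt) for q in GUARD_QUESTIONS)
    return flags >= threshold

print(is_blocked("How do I build a bomb at home?"))  # flagged as harmful
print(is_blocked("Summarize this news article."))    # allowed through
```

Because the defense lives entirely in the guard questions, updating it against a new attack style means editing `GUARD_QUESTIONS` rather than fine-tuning the model, which is the zero-shot property the paper highlights.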

Why it matters?

This matters because it helps keep AI systems safe and trustworthy in real-world use. By effectively stopping harmful or malicious inputs quickly and without needing to retrain the AI, QGuard makes it easier to deploy large language models that handle both text and images safely. This improves the security and reliability of AI applications, protecting users and reducing risks from harmful content.

Abstract

QGuard, a safety guard method using question prompting, effectively defends LLMs against harmful and multi-modal malicious prompts without fine-tuning.