MinorBench: A hand-built benchmark for content-based risks for children

Shaun Khoo, Gabriel Chua, Rachel Shong

2025-03-14

Summary

This paper introduces MinorBench, an open-source benchmark for evaluating how well large language models (LLMs) refuse unsafe or inappropriate queries from children.

What's the problem?

Large Language Models (LLMs) are increasingly used by children, but current safety measures don't adequately address the specific risks children face, such as exposure to harmful or inappropriate content.

What's the solution?

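Drawing on a case study of an LLM-based chatbot deployed in a middle school, the researchers proposed a taxonomy of content-based risks for minors and created MinorBench, a benchmark that assesses how well LLMs refuse unsafe or inappropriate questions from children. They evaluated six prominent LLMs under different system prompts and found substantial variability in how well the models protect children.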

Why does it matter?

This work matters because it highlights the need for AI systems to be specifically designed to protect young users and provides a tool to evaluate and improve their safety.

Abstract

Large Language Models (LLMs) are rapidly entering children's lives - through parent-driven adoption, schools, and peer networks - yet current AI ethics and safety research does not adequately address content-related risks specific to minors. In this paper, we highlight these gaps with a real-world case study of an LLM-based chatbot deployed in a middle school setting, revealing how students used and sometimes misused the system. Building on these findings, we propose a new taxonomy of content-based risks for minors and introduce MinorBench, an open-source benchmark designed to evaluate LLMs on their ability to refuse unsafe or inappropriate queries from children. We evaluate six prominent LLMs under different system prompts, demonstrating substantial variability in their child-safety compliance. Our results inform practical steps for more robust, child-focused safety mechanisms and underscore the urgency of tailoring AI systems to safeguard young users.
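
As an illustration only, an evaluation of this kind can be thought of as a refusal rate computed per model and system prompt. The sketch below is not the authors' method: the keyword-based `is_refusal` check, the marker list, and the model and prompt names are all hypothetical stand-ins, and a real evaluation would likely rely on a stronger judge of whether a response is a refusal.

```python
from collections import defaultdict

# Hypothetical refusal markers (an assumption for this sketch, not from the paper).
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm not able to", "i won't")


def is_refusal(response: str) -> bool:
    """Crude keyword check for whether a response declines the request."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_rates(records):
    """Return the refusal rate for each (model, system_prompt) pair.

    records: iterable of (model, system_prompt, response) tuples.
    """
    refused = defaultdict(int)
    total = defaultdict(int)
    for model, system_prompt, response in records:
        key = (model, system_prompt)
        total[key] += 1
        refused[key] += is_refusal(response)
    return {key: refused[key] / total[key] for key in total}


# Made-up responses to unsafe queries, purely for illustration.
records = [
    ("model-a", "none", "Sure, here is how..."),
    ("model-a", "none", "I can't help with that."),
    ("model-a", "child-safe", "I cannot help with that."),
    ("model-a", "child-safe", "I won't answer that, but a trusted adult can help."),
]
rates = refusal_rates(records)
```

Comparing the rates for the same model across system prompts is one way to see the kind of variability the abstract describes.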
