LiveSecBench: A Dynamic and Culturally-Relevant AI Safety Benchmark for LLMs in Chinese Context
Yudong Li, Zhongliang Yang, Kejiang Chen, Wenxuan Wang, Tianxin Zhang, Sifang Wan, Kecheng Wang, Haitian Li, Xu Wang, Lefan Cheng, Youdan Yang, Baocheng Chen, Ziyu Liu, Yufei Sun, Liyan Wu, Wenya Wen, Xingchi Gu, Peiru Yang
2025-11-05
Summary
This paper introduces LiveSecBench, a new benchmark for testing how safe large language models (LLMs) are when used in real-world, Chinese-language applications.
What's the problem?
Existing tests for AI safety often don't focus on the specific rules, laws, and cultural norms of China. This means an AI model that seems safe in one country might be unsafe or problematic when used in China, potentially leading to legal issues, ethical concerns, or the spread of misinformation. There's a need for a benchmark designed specifically for the Chinese context that also keeps up with new risks as AI technology evolves.
What's the solution?
The researchers created LiveSecBench, which tests AI models in six key areas: whether they follow the law, behave ethically, provide accurate information, protect privacy, resist being tricked into harmful responses, and reason safely. Importantly, LiveSecBench isn't a one-time test; it's designed to be continually updated with new threats and challenges, such as safety issues around AI-generated images or AI systems acting as agents. The researchers have already evaluated 18 different models and made the results public.
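The six evaluation areas can be pictured as per-dimension scores combined into one overall safety score per model. The sketch below is purely illustrative: the dimension names follow the paper, but the scoring scale and the equal-weight averaging are assumptions of this example, not LiveSecBench's actual methodology.

```python
# Illustrative sketch: combine per-dimension safety scores into one
# overall score. Dimension names follow the paper; the [0, 1] scale and
# equal-weight average are assumptions, not LiveSecBench's real method.

DIMENSIONS = [
    "legality",
    "ethics",
    "factuality",
    "privacy",
    "adversarial_robustness",
    "reasoning_safety",
]

def overall_safety_score(scores: dict[str, float]) -> float:
    """Average the six per-dimension scores (each assumed in [0, 1])."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)

# Hypothetical results for one model (made-up numbers):
example = {
    "legality": 0.92,
    "ethics": 0.88,
    "factuality": 0.75,
    "privacy": 0.90,
    "adversarial_robustness": 0.70,
    "reasoning_safety": 0.81,
}
print(round(overall_safety_score(example), 3))  # → 0.827
```

A real leaderboard would likely report the per-dimension scores separately as well, since a single aggregate can hide a weakness in one area (e.g. strong legality but poor adversarial robustness).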
Why it matters?
This work is important because it provides a way to measure and improve the safety of AI models used in China. By identifying weaknesses in these models, developers can build more responsible and trustworthy AI systems that align with Chinese laws and societal values. The continuously updated nature of the benchmark ensures it remains relevant as AI technology advances and new risks emerge, helping to prevent potential harm.
Abstract
In this work, we propose LiveSecBench, a dynamic and continuously updated safety benchmark specifically for Chinese-language LLM application scenarios. LiveSecBench evaluates models across six critical dimensions (Legality, Ethics, Factuality, Privacy, Adversarial Robustness, and Reasoning Safety) rooted in the Chinese legal and social frameworks. This benchmark maintains relevance through a dynamic update schedule that incorporates new threat vectors, such as the planned inclusion of Text-to-Image Generation Safety and Agentic Safety in the next update. To date, LiveSecBench (v251030) has evaluated 18 LLMs, providing a landscape of AI safety in the context of Chinese language. The leaderboard is publicly accessible at https://livesecbench.intokentech.cn/.