AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models
Jiale Cheng, Yida Lu, Xiaotao Gu, Pei Ke, Xiao Liu, Yuxiao Dong, Hongning Wang, Jie Tang, Minlie Huang
2024-06-25

Summary
This paper introduces AutoDetect, a framework that automatically uncovers weaknesses in large language models (LLMs) such as ChatGPT. The goal is to improve these models by pinpointing the specific areas where they struggle and using those findings to guide targeted fixes.
What's the problem?
Even as LLMs grow more capable, they still make subtle mistakes, particularly in instruction-following and coding tasks. These errors can have serious consequences when the models are deployed in real-world applications. Traditional benchmarks are too coarse to pinpoint specific deficiencies, while manual inspection is costly and does not scale.
What's the solution?
The authors developed AutoDetect, in which three LLM-powered agents (an Examiner, a Questioner, and an Assessor) collaborate to probe a target model and identify its weaknesses. The design is inspired by the educational assessment process used to measure students' learning outcomes. The framework has successfully detected issues in prominent models such as ChatGPT and Claude, with an identification success rate exceeding 30%, and the weaknesses it finds can then be used to make targeted improvements that lift overall performance.
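The summary does not spell out how the three agents interact, so the following is only a minimal sketch of how such a detection loop could be wired up. Everything here (the function name detect_weaknesses, the prompt strings, the PASS/FAIL protocol, and the callable-based design) is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch of a three-agent weakness-detection loop in the spirit of
# AutoDetect. All prompt wording, names, and the data format are assumptions
# made for illustration only.
from typing import Callable

def detect_weaknesses(
    task: str,
    examiner: Callable[[str], str],    # proposes fine-grained test points for the task
    questioner: Callable[[str], str],  # writes a challenging question for a test point
    target: Callable[[str], str],      # the model under evaluation
    assessor: Callable[[str], str],    # judges the target's answer, e.g. "PASS"/"FAIL: ..."
    rounds: int = 5,
) -> list[dict]:
    """Collect (test point, question, answer) triples that the target model fails."""
    weaknesses = []
    # Examiner: break the task into a taxonomy of test points, one per line.
    test_points = examiner(f"List fine-grained sub-skills to probe for the task: {task}")
    for point in filter(None, (p.strip() for p in test_points.splitlines())):
        for _ in range(rounds):
            # Questioner: generate a hard query aimed at this test point.
            question = questioner(f"Write one difficult test question probing: {point}")
            answer = target(question)
            # Assessor: flag incorrect answers as identified weaknesses.
            verdict = assessor(
                f"Question: {question}\nAnswer: {answer}\n"
                "Reply PASS if the answer is correct, otherwise FAIL with a reason."
            )
            if verdict.strip().upper().startswith("FAIL"):
                weaknesses.append(
                    {"test_point": point, "question": question, "answer": answer}
                )
    return weaknesses
```

In the paper each role is itself an LLM; the sketch passes the roles in as plain callables so the loop could be wired to any LLM API, or to stub functions when testing the scaffolding.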
Why it matters?
This research matters because it provides a systematic way to uncover and address weaknesses in LLMs, making them more reliable in practice. By targeting the specific areas where models fail, AutoDetect can improve applications that depend on language models, such as chatbots and coding assistants, and ultimately the user experience they deliver.
Abstract
Although Large Language Models (LLMs) are becoming increasingly powerful, they still exhibit significant but subtle weaknesses, such as mistakes in instruction-following or coding tasks. As these unexpected errors could lead to severe consequences in practical deployments, it is crucial to investigate the limitations within LLMs systematically. Traditional benchmarking approaches cannot thoroughly pinpoint specific model deficiencies, while manual inspections are costly and not scalable. In this paper, we introduce a unified framework, AutoDetect, to automatically expose weaknesses in LLMs across various tasks. Inspired by the educational assessment process that measures students' learning outcomes, AutoDetect consists of three LLM-powered agents: Examiner, Questioner, and Assessor. The collaboration among these three agents is designed to realize comprehensive and in-depth weakness identification. Our framework demonstrates significant success in uncovering flaws, with an identification success rate exceeding 30% in prominent models such as ChatGPT and Claude. More importantly, these identified weaknesses can guide specific model improvements, proving more effective than untargeted data augmentation methods like Self-Instruct. Our approach has led to substantial enhancements in popular LLMs, including the Llama series and Mistral-7b, boosting their performance by over 10% across several benchmarks. Code and data are publicly available at https://github.com/thu-coai/AutoDetect.
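As a companion to the sketch above, the following is a hedged illustration of how the identified weaknesses might be turned into a targeted fine-tuning set, in the spirit of the targeted improvement the abstract contrasts with untargeted augmentation such as Self-Instruct. The JSONL schema and the reference_model callable are assumptions for illustration, not the paper's exact pipeline.

```python
# Hedged sketch: converting failure cases from the detection loop into a
# targeted supervised fine-tuning file. Schema and helper names are
# illustrative assumptions, not the authors' pipeline.
import json
from typing import Callable

def build_targeted_sft(
    weaknesses: list[dict],
    reference_model: Callable[[str], str],  # a stronger model that supplies corrected answers
    path: str = "targeted_sft.jsonl",
) -> int:
    """Write one prompt/response pair per failure case and return how many were written."""
    written = 0
    with open(path, "w", encoding="utf-8") as f:
        for case in weaknesses:
            # Re-answer only the questions the target model got wrong.
            response = reference_model(f"Answer carefully and correctly:\n{case['question']}")
            f.write(json.dumps({"prompt": case["question"], "response": response}) + "\n")
            written += 1
    return written
```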