On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective

Yue Huang, Chujie Gao, Siyuan Wu, Haoran Wang, Xiangqi Wang, Yujun Zhou, Yanbo Wang, Jiayi Ye, Jiawen Shi, Qihui Zhang, Yuan Li, Han Bao, Zhaoyi Liu, Tianrui Guan, Dongping Chen, Ruoxi Chen, Kehan Guo, Andy Zou, Bryan Hooi Kuen-Yew, Caiming Xiong, Elias Stengel-Eskin, Hongyang Zhang

2025-02-20

Summary

This paper talks about making sure AI models that can generate text, images, and other content (called Generative Foundation Models or GenFMs) are trustworthy and safe to use. It's like creating a report card system for these AI models to make sure they're behaving well and not causing problems.

What's the problem?

As these AI models become more powerful and widely used, there are worries about whether we can trust them. They might produce false information, be unfair to certain groups of people, or be used in harmful ways. It's hard to check if they're trustworthy because they can do so many different things.

What's the solution?

The researchers did three main things to solve this problem. First, they reviewed laws and guidelines about AI from governments around the world and used them to create a set of principles for trustworthy AI. Then, they built a testing system called TrustGen that can check different types of AI models in various ways, and that keeps generating fresh tests so the benchmark stays useful as the models improve. Finally, they discussed the remaining challenges of making AI trustworthy and suggested directions for future work.

Why it matters?

This matters because as AI becomes a bigger part of our lives, we need to make sure it's safe and reliable. By creating ways to test and improve AI trustworthiness, we can use these powerful tools more confidently in important areas like healthcare, education, and business. It's like making sure a new technology is safe before we start using it everywhere, so we can enjoy its benefits without worrying about potential harm.

Abstract

Generative Foundation Models (GenFMs) have emerged as transformative tools. However, their widespread adoption raises critical concerns regarding trustworthiness across multiple dimensions. This paper presents a comprehensive framework to address these challenges through three key contributions. First, we systematically review global AI governance laws and policies from governments and regulatory bodies, as well as industry practices and standards. Based on this analysis, we propose a set of guiding principles for GenFMs, developed through extensive multidisciplinary collaboration that integrates technical, ethical, legal, and societal perspectives. Second, we introduce TrustGen, the first dynamic benchmarking platform designed to evaluate trustworthiness across multiple dimensions and model types, including text-to-image, large language, and vision-language models. TrustGen leverages modular components (metadata curation, test case generation, and contextual variation) to enable adaptive and iterative assessments, overcoming the limitations of static evaluation methods. Using TrustGen, we reveal significant progress in trustworthiness while identifying persistent challenges. Finally, we provide an in-depth discussion of the challenges and future directions for trustworthy GenFMs. This discussion reveals the complex, evolving nature of trustworthiness, highlights the nuanced trade-offs between utility and trustworthiness, considers the needs of various downstream applications, and provides a strategic roadmap for future research. This work establishes a holistic framework for advancing trustworthiness in GenAI, paving the way for safer and more responsible integration of GenFMs into critical applications. To facilitate advancement in the community, we release the toolkit for dynamic evaluation.
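
To make the "dynamic benchmarking" idea concrete, here is a minimal Python sketch of an evaluation loop organized around the three modular components the abstract names: metadata curation, test case generation, and contextual variation. The paper does not describe its toolkit's API here, so every name below (SEED_METADATA, curate_metadata, generate_test_case, vary_context, evaluate) is a hypothetical illustration, not the released TrustGen code.

import random

# Hypothetical sketch of a dynamic evaluation loop in the spirit of
# TrustGen's three modular components. All names are illustrative.

SEED_METADATA = [
    {"dimension": "truthfulness", "topic": "medical facts"},
    {"dimension": "fairness", "topic": "occupation stereotypes"},
    {"dimension": "safety", "topic": "refusing harmful instructions"},
]

def curate_metadata(metadata):
    """Metadata curation: keep only records for the dimensions under test."""
    wanted = {"truthfulness", "fairness", "safety"}
    return [m for m in metadata if m["dimension"] in wanted]

def generate_test_case(record):
    """Test case generation: turn a metadata record into a concrete prompt."""
    return f"[{record['dimension']}] Write a response about {record['topic']}."

def vary_context(prompt, rng):
    """Contextual variation: perturb the prompt so each run sees fresh
    cases instead of a fixed, memorizable test set."""
    prefixes = ["", "As an expert, ", "Briefly: ", "For a general audience, "]
    return rng.choice(prefixes) + prompt

def evaluate(model_fn, rng=None):
    """Run one adaptive evaluation round and return per-dimension scores."""
    rng = rng or random.Random(0)
    scores = {}
    for record in curate_metadata(SEED_METADATA):
        prompt = vary_context(generate_test_case(record), rng)
        response = model_fn(prompt)
        # A real harness would score the response with a rubric or judge
        # model; this placeholder just checks that a response came back.
        scores[record["dimension"]] = float(len(response) > 0)
    return scores

if __name__ == "__main__":
    echo_model = lambda prompt: f"(model output for: {prompt})"
    print(evaluate(echo_model))

The design point this sketch illustrates is that test cases are regenerated and perturbed on every run, which is what lets a dynamic benchmark sidestep the staleness and test-set contamination that static benchmarks suffer from as models improve.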