Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs
Will Cai, Tianneng Shi, Xuandong Zhao, Dawn Song
2025-04-08
Summary
This paper is about catching sneaky tricks where AI companies might secretly swap an expensive, high-quality AI model for a cheaper one, like buying a fancy phone but getting a knockoff version instead.
What's the problem?
People pay for top AI services but can't check if companies actually use the powerful models they advertise, since the tech works like a black box where you only see inputs and outputs.
What's the solution?
Researchers tested different detective methods, like checking answer patterns, running special tests, and examining the model's internal confidence scores (log probabilities), while also exploring secure hardware that could prevent cheating.
Why it matters?
This helps ensure fairness in AI services we use daily, like chatbots and writing tools, so people get what they pay for and can trust AI benchmarks and comparisons.
Abstract
The proliferation of Large Language Models (LLMs) accessed via black-box APIs introduces a significant trust challenge: users pay for services based on advertised model capabilities (e.g., size, performance), but providers may covertly substitute the specified model with a cheaper, lower-quality alternative to reduce operational costs. This lack of transparency undermines fairness, erodes trust, and complicates reliable benchmarking. Detecting such substitutions is difficult due to the black-box nature, typically limiting interaction to input-output queries. This paper formalizes the problem of model substitution detection in LLM APIs. We systematically evaluate existing verification techniques, including output-based statistical tests, benchmark evaluations, and log probability analysis, under various realistic attack scenarios like model quantization, randomized substitution, and benchmark evasion. Our findings reveal the limitations of methods relying solely on text outputs, especially against subtle or adaptive attacks. While log probability analysis offers stronger guarantees when available, its accessibility is often limited. We conclude by discussing the potential of hardware-based solutions like Trusted Execution Environments (TEEs) as a pathway towards provable model integrity, highlighting the trade-offs between security, performance, and provider adoption. Code is available at https://github.com/sunblaze-ucb/llm-api-audit.
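The log probability analysis the abstract mentions can be sketched as a simple consistency check: recompute per-token log probabilities with a trusted copy of the advertised model and compare them against the values the API returns. The function, tolerance, and simulated data below are illustrative assumptions for intuition only, not the paper's actual auditing procedure.

```python
import math
import random

def logprob_consistency_check(api_logprobs, reference_logprobs, tol=1e-2):
    """Flag possible model substitution by comparing per-token log
    probabilities from an API against those recomputed locally with a
    trusted copy of the advertised model. Returns True if the two agree
    within a numerical tolerance (i.e., no substitution detected)."""
    diffs = [abs(a - r) for a, r in zip(api_logprobs, reference_logprobs)]
    return max(diffs) <= tol

# Hypothetical simulation: no real API is queried here.
random.seed(0)
reference = [math.log(random.uniform(0.05, 0.9)) for _ in range(50)]
# An honest provider's values differ only by tiny numerical noise.
honest = [lp + random.uniform(-1e-4, 1e-4) for lp in reference]
# A substituted model produces systematically different log probabilities.
substituted = [lp + random.uniform(-0.5, 0.5) for lp in reference]

print(logprob_consistency_check(honest, reference))       # expected: True
print(logprob_consistency_check(substituted, reference))  # expected: False
```

In practice the tolerance must absorb legitimate nondeterminism (batching, floating-point order of operations), which is one reason the paper notes that subtle substitutions, such as quantized variants of the same model, are harder to catch.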