Locket: Robust Feature-Locking Technique for Language Models

Lipeng He, Vasisht Duddu, N. Asokan

2025-10-15

Summary

This paper introduces Locket, a new system designed to let companies like OpenAI charge for specific abilities of their chatbots, like advanced math or coding, instead of just offering different subscription levels.

What's the problem?

Currently, chatbot companies make money through tiered subscriptions, but a finer-grained scheme in which people pay only for the features they want is thought to be more economically viable. To do this, providers need a way to 'lock' certain features for users who have not paid and 'unlock' them when someone pays. Existing methods for doing this, such as password-locked models, are not robust or scalable: they can be bypassed, and they do not work well when there are many features and users.

What's the solution?

The researchers created Locket, which 'attaches' small modules, called adapters, to the main language model using a novel merging approach. These adapters make the model refuse requests for locked features while largely preserving its performance on features the user *has* paid for. Locket is designed to be robust, so people can't easily trick it into unlocking features or share credentials to get around it, and it can handle many features and users at the same time.
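To make the adapter idea concrete, here is a minimal sketch of how a provider might choose and attach refusal adapters for one user. It is an illustration under stated assumptions, not Locket's actual method: the names `Adapter`, `locked_features`, `merge_adapters`, and `weights_for_user` are hypothetical, the weights are toy 2x2 matrices, and the merge is a plain element-wise sum standing in for the paper's novel merging approach, which this summary does not describe.

```python
from dataclasses import dataclass

# Hypothetical illustration only: every name below is an assumption,
# and the naive additive merge is NOT Locket's merging approach.

Matrix = list[list[float]]


@dataclass(frozen=True)
class Adapter:
    feature: str   # feature this adapter makes the model refuse, e.g. "math"
    delta: Matrix  # toy weight update with the same shape as the base weights


def locked_features(all_features: set[str], paid: set[str]) -> set[str]:
    """Features the user has not paid for and must therefore be refused."""
    return all_features - paid


def merge_adapters(base: Matrix, adapters: list[Adapter]) -> Matrix:
    """Naive merge: add every adapter's delta to a copy of the base weights."""
    merged = [row[:] for row in base]
    for adapter in adapters:
        for i, row in enumerate(adapter.delta):
            for j, value in enumerate(row):
                merged[i][j] += value
    return merged


def weights_for_user(
    base: Matrix, adapters_by_feature: dict[str, Adapter], paid: set[str]
) -> Matrix:
    """Attach refusal adapters only for the features this user has not unlocked."""
    locked = locked_features(set(adapters_by_feature), paid)
    selected = [adapters_by_feature[name] for name in sorted(locked)]
    return merge_adapters(base, selected)


# Toy data: two lockable features and a user who has paid for "math" only.
base_weights = [[1.0, 0.0], [0.0, 1.0]]
adapters = {
    "math": Adapter("math", [[0.5, 0.0], [0.0, 0.0]]),
    "coding": Adapter("coding", [[0.0, 0.0], [0.0, 0.25]]),
}
math_only = weights_for_user(base_weights, adapters, paid={"math"})
all_paid = weights_for_user(base_weights, adapters, paid={"math", "coding"})
```

In this sketch a user who has paid for everything gets the unmodified base weights, while a user who has paid only for "math" gets the base weights plus the "coding" refusal adapter. The base weights are never mutated, so one base model can serve many users with different adapter combinations.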

Why it matters?

Locket could change how chatbot companies make money. Instead of a one-size-fits-all subscription, users could pay for exactly what they need, potentially making chatbots more affordable and accessible while also increasing revenue for the companies providing them. It provides a more flexible and potentially profitable business model for advanced AI services.

Abstract

Chatbot providers (e.g., OpenAI) rely on tiered subscription schemes to generate revenue, offering basic models for free users, and advanced models for paying subscribers. However, a finer-grained pay-to-unlock scheme for premium features (e.g., math, coding) is thought to be more economically viable for the providers. Such a scheme requires a feature-locking technique (FLoTE) which is (i) effective in refusing locked features, (ii) utility-preserving for unlocked features, (iii) robust against evasion or unauthorized credential sharing, and (iv) scalable to multiple features and users. However, existing FLoTEs (e.g., password-locked models) are not robust or scalable. We present Locket, the first robust and scalable FLoTE to enable pay-to-unlock schemes. Locket uses a novel merging approach to attach adapters to an LLM for refusing unauthorized features. Our comprehensive evaluation shows that Locket is effective (100% refusal on locked features), utility-preserving (≤ 7% utility degradation in unlocked features), robust (≤ 5% attack success rate), and scales to multiple features and clients.
