Explain other AI papers

Better Language Model Inversion by Compactly Representing Next-Token Distributions

Murtaza Nazir, Matthew Finlayson, John X. Morris, Xiang Ren, Swabha Swayamdipta

2025-06-23

Summary

This paper introduces a new method called Prompt Inversion from Logprob Sequences (PILS), which recovers the hidden prompts given to a language model by compactly representing the model's next-word probability distributions.

What's the problem?

The problem is that it is very difficult to recover the original hidden instructions or prompts given to a language model just by looking at its outputs, and previous inversion techniques had low success rates.

What's the solution?

The researchers observed that a model's next-word probability outputs, despite having one entry per vocabulary word, are confined to a much lower-dimensional space. This lets them compress long sequences of these outputs into compact representations that an inversion model can use. With PILS, they recover hidden prompts far more accurately than prior methods, and the approach generalizes well to new types of prompts.
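The low-dimensional structure comes from how language models produce their outputs: logits are a linear map (the unembedding matrix) applied to a hidden state, so every logit vector lies in a subspace whose dimension is at most the hidden size, not the vocabulary size. The toy sketch below illustrates this with made-up dimensions (`d_model`, `vocab`, and the matrix `W` are all illustrative stand-ins, not the paper's actual setup):

```python
import numpy as np

# Toy dimensions; real models have d_model in the thousands and vocab ~100k+.
rng = np.random.default_rng(0)
d_model, vocab = 8, 100

# Hypothetical unembedding matrix mapping hidden states to vocabulary logits.
W = rng.normal(size=(vocab, d_model))

# Every next-token logit vector is W @ h for some hidden state h, so all
# logit vectors live in a subspace of dimension at most d_model, even
# though each one has `vocab` entries.
hiddens = rng.normal(size=(50, d_model))  # 50 different contexts
logits = hiddens @ W.T                    # shape (50, vocab)

# The rank of the stacked logits reveals the low-dimensional structure.
rank = int(np.linalg.matrix_rank(logits))
print(rank)  # at most d_model = 8, far below vocab = 100

# Compact representation: project each logit vector back onto the
# d_model-dimensional subspace via the pseudoinverse of W. This stores
# d_model numbers per step instead of `vocab`, with no information lost.
compact = logits @ np.linalg.pinv(W).T    # shape (50, d_model)
reconstructed = compact @ W.T
print(np.allclose(reconstructed, logits))
```

This is only a linear-algebra illustration of why the compression in PILS is possible, not the paper's actual inversion pipeline, which additionally trains a model to map the compressed sequences back to prompts.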

Why it matters?

This matters because it shows how much information language models leak through their outputs alone, which has significant implications for privacy and security and can help improve the accountability of AI systems.

Abstract

A new method called Prompt Inversion from Logprob Sequences (PILS) recovers hidden prompts in language models by analyzing the low-dimensional subspace of the model's next-token probabilities, achieving higher recovery rates and better generalization than previous methods.