Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space

Hengli Li, Chenxi Li, Tong Wu, Xuekai Zhu, Yuxuan Wang, Zhaoxin Yu, Eric Hanchen Jiang, Song-Chun Zhu, Zixia Jia, Ying Nian Wu, Zilong Zheng

2025-05-20

Summary

This paper introduces LatentSeek, an approach that improves the reasoning of large language models by letting them adjust their internal representations for each specific question at test time, while they are working on it.

What's the problem?

Although language models are trained on large amounts of data, they can struggle with new or tricky questions because they cannot easily adapt their reasoning to each individual question in real time.

What's the solution?

To solve this, the researchers designed a method that lets the model adjust its internal representations (its 'latent space') separately for each question it faces, using a technique called policy gradient. This helps the model reason more effectively and improves its results across various benchmarks.
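
To make the idea concrete, here is a minimal toy sketch of a policy-gradient update applied to a latent vector at test time. It is not the authors' implementation. Everything in it is an assumption made for illustration: the "model" is a small fixed matrix `W` that maps a latent vector `z` to scores over five possible answers, the reward is a hypothetical 0/1 signal for hitting a chosen target answer, and the helper names (`softmax`, `logits_from_latent`, `seek_latent`) are invented here. Only `z` is updated; `W` stays fixed throughout.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_from_latent(W, z):
    """Toy stand-in for a model head: scores = W @ z."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in W]


def seek_latent(W, target, steps=200, lr=0.1, seed=0):
    """REINFORCE-style search over the latent vector z for one instance.

    Assumption: reward is 1.0 when the sampled answer equals `target`,
    otherwise 0.0. W is never modified; only z changes.
    """
    rng = random.Random(seed)
    z = [0.0] * len(W[0])  # start from a neutral latent (uniform policy)
    for _ in range(steps):
        probs = softmax(logits_from_latent(W, z))
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = 1.0 if action == target else 0.0
        # Gradient of log pi(action | z) with respect to z is
        # W^T (one_hot(action) - probs); scale it by the reward.
        for j in range(len(z)):
            grad_j = sum(
                W[k][j] * ((1.0 if k == action else 0.0) - probs[k])
                for k in range(len(probs))
            )
            z[j] += lr * reward * grad_j
    return z


# Hypothetical setup: 5 candidate answers, 8-dimensional latent.
weight_rng = random.Random(42)
W = [[0.5 * weight_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
target = 3

before = softmax(logits_from_latent(W, [0.0] * 8))[target]
z_star = seek_latent(W, target)
after = softmax(logits_from_latent(W, z_star))[target]
print(f"P(target) before: {before:.3f}, after: {after:.3f}")
```

In this toy setting the probability of the target answer rises as `z` is updated, even though the fixed weights never change. The real method works on a full language model and its own reward signal, neither of which this sketch attempts to reproduce.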

Why does it matter?

This matters because it means AI can become more flexible and accurate when answering questions or solving problems, making it more useful for real-world situations where every question might be a little different.

Abstract

LatentSeek enhances LLM reasoning using test-time instance-level adaptation in latent space, improving performance across various benchmarks.
