DLLM-Searcher: Adapting Diffusion Large Language Model for Search Agents
Jiahao Zhao, Shaoxuan Xu, Zhongxiang Sun, Fengqi Zhu, Jingyang Ou, Yuling Shi, Chongxuan Li, Xiao Zhang, Jun Xu
2026-02-11
Summary
This paper focuses on making search agents, which use large language models to find information, faster and more effective. It explores how a newer type of language model called a diffusion large language model (dLLM) can be used to improve these agents.
What's the problem?
Currently, search agents built using a common method called 'ReAct' are slow because they have to complete each step – thinking, using a tool like a search engine, and waiting for the tool's response – one after another. Because every reasoning round repeats this wait, the delays compound into high end-to-end latency. Also, existing dLLMs aren't very good at the complex reasoning and tool use that search agents require, limiting their potential.
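To make the serial bottleneck concrete, here is a minimal Python sketch of a plain ReAct loop, assuming a toy model and tool; `generate`, `run_tool`, and the `Step` format are hypothetical stand-ins, not the paper's interfaces.

```python
import time
from dataclasses import dataclass

@dataclass
class Step:
    text: str
    tool_call: str | None   # None means the model emitted a final answer

def generate(context: str) -> Step:
    """Stand-in for one LLM decoding pass (hypothetical)."""
    time.sleep(0.5)                        # decoding time
    return Step("Thought: need evidence. ", "search('dLLM agents')")

def run_tool(call: str) -> str:
    """Stand-in for a search-engine round trip (hypothetical)."""
    time.sleep(1.0)                        # network / retrieval latency
    return "Observation: ... "

def react(question: str, max_steps: int = 3) -> str:
    context = question
    for _ in range(max_steps):
        step = generate(context)           # 1) think, emit an action
        if step.tool_call is None:
            return step.text               # 2) final answer: stop
        # 3) the agent blocks here: no thinking happens during the wait
        context += step.text + run_tool(step.tool_call)
    return context
```

Each round pays the decoding time and the tool latency back to back; P-ReAct's goal, described next, is to overlap the two.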
What's the solution?
The researchers developed a framework called DLLM-Searcher. They improved the dLLM's ability to reason and use tools through a two-stage post-training pipeline: supervised fine-tuning on agent trajectories (Agentic SFT), followed by a variance-reduced preference optimization method (Agentic VRPO) that stabilizes training. They also created a new agent paradigm called 'Parallel-Reasoning and Acting' (P-ReAct), which has the model decode the tool call first, dispatch it immediately, and keep thinking and planning while the tool's response is pending – essentially working on multiple parts of the problem at the same time.
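The sketch below illustrates the overlap P-ReAct exploits, under the assumption that the tool call has already been decoded ahead of the remaining reasoning; `search`, `continue_thinking`, and `p_react_step` are illustrative names, not the paper's code.

```python
import asyncio

async def search(query: str) -> str:
    """Stand-in for a search-engine call (hypothetical)."""
    await asyncio.sleep(1.0)                  # simulated retrieval latency
    return f"Observation for {query!r}. "

async def continue_thinking(context: str) -> str:
    """Stand-in for the dLLM decoding further reasoning tokens."""
    await asyncio.sleep(0.8)                  # simulated decoding time
    return "Thought: while waiting, plan how to verify the result. "

async def p_react_step(context: str) -> str:
    # In P-ReAct the tool_call span is decoded first; here we pretend
    # that already happened and we hold the extracted query.
    query = "dLLM search agents"
    # Overlap: the tool round trip and the remaining reasoning run
    # concurrently instead of serially, hiding the tool's latency.
    observation, thoughts = await asyncio.gather(
        search(query),
        continue_thinking(context),
    )
    return context + thoughts + observation

print(asyncio.run(p_react_step("Question: ... ")))
```

In the serial loop the two waits add up; here the cheaper of the two is hidden entirely behind the other.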
Why it matters?
This work is important because it addresses a key bottleneck in search agent performance – speed. By leveraging the strengths of dLLMs and introducing P-ReAct, the researchers achieved performance comparable to existing agents with an inference speedup of roughly 15%. This makes these agents more practical for real-world applications where quick responses are crucial.
Abstract
Recently, Diffusion Large Language Models (dLLMs) have demonstrated unique efficiency advantages, enabled by their inherently parallel decoding mechanism and flexible generation paradigm. Meanwhile, despite the rapid advancement of Search Agents, their practical deployment is constrained by a fundamental limitation, 1) the Latency Challenge: the serial execution of multi-round reasoning, tool calling, and tool-response waiting under the ReAct agent paradigm induces severe end-to-end latency. Intuitively, dLLMs can leverage their distinctive strengths to optimize the operational efficiency of agents under the ReAct paradigm. In practice, however, existing dLLM backbones face 2) the Agent Ability Challenge: existing dLLMs exhibit remarkably weak reasoning and tool-calling capabilities, preventing these advantages from being effectively realized. In this paper, we propose DLLM-Searcher, an optimization framework for dLLM-based Search Agents. To solve the Agent Ability Challenge, we design a two-stage post-training pipeline encompassing Agentic Supervised Fine-Tuning (Agentic SFT) and Agentic Variance-Reduced Preference Optimization (Agentic VRPO), which enhances the backbone dLLM's information-seeking and reasoning capabilities. To mitigate the Latency Challenge, we leverage the flexible generation mechanism of dLLMs and propose a novel agent paradigm termed Parallel-Reasoning and Acting (P-ReAct). P-ReAct guides the model to prioritize decoding of tool_call instructions, thereby allowing the model to keep thinking while waiting for the tool's return. Experimental results demonstrate that DLLM-Searcher achieves performance comparable to mainstream LLM-based search agents, and that P-ReAct delivers approximately 15% inference acceleration. Our code is available at https://anonymous.4open.science/r/DLLM-Searcher-553C
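As a rough illustration of 'prioritize decoding tool_call instructions' in diffusion-decoding terms, the toy below biases the unmasking order so the tool_call span is filled in first; the span positions, per-step budget, and confidence stand-in are all assumptions for illustration, not the paper's implementation.

```python
import random

# Toy sketch of a biased unmasking schedule: a masked-diffusion decoder
# fills in PER_STEP positions per step, and positions inside the
# tool_call span outrank ordinary positions, so the tool call is
# complete (and can be dispatched) before the rest of the text.
random.seed(0)
SEQ_LEN, PER_STEP = 12, 3
TOOL_SPAN = set(range(4, 7))                  # hypothetical tool_call slots
tokens: list[str | None] = [None] * SEQ_LEN   # None == still masked
dispatched = False

for step in range(1, SEQ_LEN // PER_STEP + 1):
    masked = [i for i, t in enumerate(tokens) if t is None]
    # Priority bias: tool_call positions first, ties broken by a
    # random stand-in for the model's per-position confidence.
    order = sorted(masked, key=lambda i: (i in TOOL_SPAN, random.random()),
                   reverse=True)
    for i in order[:PER_STEP]:
        tokens[i] = f"tok{i}"
    if not dispatched and TOOL_SPAN <= {i for i, t in enumerate(tokens) if t}:
        print(f"step {step}: tool_call fully decoded, dispatch tool early")
        dispatched = True
print("all tokens decoded")
```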