Stanford's DeLM Slashes Multi-Agent Costs by 50% Without a Controller

AI engineers are currently hitting a ceiling known as the orchestrator bottleneck. As teams scale their multi-agent workflows to handle more complex reasoning, they typically introduce a central manager model to delegate tasks and synthesize results. While this provides a sense of control, it introduces a punishing token tax. Every single interaction must route through the central hub, causing prompt lengths to balloon and latency to spike. The industry has largely accepted this overhead as the price of coherence, believing that without a central brain, a swarm of agents would inevitably descend into chaos.

The Architecture of Decentralization

Stanford University has challenged this assumption with the introduction of the Decentralized Language Model, or DeLM. Rather than relying on a top-down hierarchy, DeLM implements a system where agents coordinate through a shared environment. The framework is built around two primary mechanisms: a shared context and a task queue. The task queue holds a set of pending sub-tasks that agents can claim independently, while the shared context acts as a curated repository for verified findings, partial results, and documented failure cases.
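The two mechanisms can be pictured with a minimal sketch. It is an illustration under stated assumptions, not DeLM's actual code: the names SharedContext, TaskQueue, agent_step, and run are hypothetical, the model call is replaced by a placeholder string, and agents take turns in a single thread rather than running in parallel.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SharedContext:
    # Hypothetical store; the article names these three kinds of entries.
    findings: list = field(default_factory=list)
    partial_results: list = field(default_factory=list)
    failures: list = field(default_factory=list)


@dataclass
class TaskQueue:
    # Hypothetical queue of pending sub-tasks.
    pending: deque = field(default_factory=deque)

    def claim(self) -> Optional[str]:
        """Hand out the next pending sub-task, or None if nothing is left."""
        return self.pending.popleft() if self.pending else None


def agent_step(name: str, queue: TaskQueue, context: SharedContext) -> bool:
    """One agent claims one task on its own; no controller assigns it."""
    task = queue.claim()
    if task is None:
        return False
    # Placeholder for real work: a model call that reads context.findings.
    context.findings.append(f"{name} finished {task}")
    return True


def run(agents, queue, context):
    """Let agents take turns claiming work until the queue is empty."""
    # The list forces every agent to take a step in each round.
    while any([agent_step(name, queue, context) for name in agents]):
        pass
    return context


queue = TaskQueue(deque(["doc A", "doc B", "doc C"]))
context = run(["agent-1", "agent-2"], queue, SharedContext())
print(context.findings)
```

The point of the design is that work is pulled rather than pushed: each agent asks the queue for its next task, so no model has to hold a global plan in its prompt.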

The operational pipeline follows a fixed sequence designed to preserve accuracy without central oversight. It begins with initialization, followed by parallel execution, in which agents independently process tasks from the queue while reading the shared context. To keep the shared context from filling with noise, DeLM uses a compression and verification step. Agents condense their findings into Gists, which are high-density summaries of essential information. Only Gists that pass an evidence-based verification step are committed to the shared context for other agents to use. The process concludes when a final agent checks whether further work is required and, if not, returns the final answer.
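The commit gate described above might look like the following sketch. The article does not say how verification works, so the check used here (the quoted evidence must appear verbatim in the cited source) is an assumption chosen for illustration, and Gist, verify, and commit_verified are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gist:
    claim: str      # condensed finding
    evidence: str   # quoted span the claim is supposed to rest on
    source_id: str  # document the evidence is said to come from


def verify(gist, documents):
    """Assumed check: the evidence must appear verbatim in its source."""
    source = documents.get(gist.source_id, "")
    return bool(gist.evidence) and gist.evidence in source


def commit_verified(gists, documents, shared_context):
    """Append only gists that pass verification; return the rejected ones."""
    rejected = []
    for gist in gists:
        if verify(gist, documents):
            shared_context.append(gist)
        else:
            rejected.append(gist)
    return rejected


documents = {"d1": "The service times out after 30 seconds."}
good = Gist("Timeout is 30s", "times out after 30 seconds", "d1")
bad = Gist("Timeout is 60s", "times out after 60 seconds", "d1")

shared_context = []
rejected = commit_verified([good, bad], documents, shared_context)
print(len(shared_context), len(rejected))
```

Only the supported Gist reaches the shared context here; the unsupported one is returned to the caller, which could log it as a failure case.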

This decentralized approach was tested across four high-performance model families: GPT-5.4, Claude Sonnet, Gemini Flash, and DeepSeek-V4-Pro. In the LongBench-v2 Multi-Doc QA benchmark, which evaluates the ability to handle long-context problems in real-world scenarios, DeLM achieved the highest accuracy across all four model families. By allowing agents to navigate complex multi-document queries independently, the system proved more accurate than traditional single-model approaches.

Breaking the Orchestrator Myth

The prevailing wisdom in agentic design is that a high-intelligence central controller is mandatory for complex problem solving. DeLM's results suggest otherwise. By shifting the focus from control to context management, the framework achieves superior results with significantly fewer resources. In evaluations using SWE-bench Verified, which measures the ability to solve real-world software engineering issues, DeLM outperformed the strongest existing baseline by 10.5 percent. More importantly, it achieved this performance gain while reducing the cost per task by approximately 50 percent.

This efficiency stems from the elimination of routing overhead. In a centralized system, the orchestrator must re-process the history of every sub-agent to maintain a global view, so its token usage climbs steeply as agents and steps are added. DeLM replaces this with an unfolding structure, where agents share Gists and expand into detailed content only when necessary. This architecture is particularly well suited to test-time scaling, where models are given more compute at inference time to reason through a problem. In scenarios like simultaneous debugging, where multiple errors must be analyzed in parallel, DeLM can add reasoning capacity without the extra cost a manager model imposes.
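A toy cost model makes the routing overhead concrete. Everything in it is an assumption made for illustration, not a DeLM measurement: the function names are hypothetical, each agent is assumed to produce one report (or one Gist) per round, the orchestrator is assumed to re-read the full history every round, and every agent is assumed to read every committed Gist every round.

```python
def orchestrator_input_tokens(agents, tokens_per_report, rounds):
    """Tokens a central model reads if it re-reads all reports each round."""
    total = 0
    history = 0
    for _ in range(rounds):
        history += agents * tokens_per_report  # new reports join the history
        total += history                       # the full history is re-read
    return total


def gist_input_tokens(agents, tokens_per_gist, rounds):
    """Tokens read when every agent reads every committed gist each round."""
    total = 0
    committed = 0
    for _ in range(rounds):
        total += agents * committed            # each agent reads the context
        committed += agents * tokens_per_gist  # then commits one gist
    return total


# Made-up sizes: 4 agents, 2,000-token reports, 100-token gists, 5 rounds.
print(orchestrator_input_tokens(4, 2000, 5))  # 120000
print(gist_input_tokens(4, 100, 5))           # 16000
```

Under these assumptions both totals grow with the square of the number of rounds, so any saving comes from Gists being much shorter than full reports, not from a different growth rate. The actual ratio depends on sizes and read patterns that this sketch does not model.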

Furthermore, the system excels in multi-document QA tasks that require maintaining a global view while investigating fragmented clusters of evidence. Because agents can independently verify and commit Gists to the shared context, the system maintains a coherent global state without requiring a single model to hold the entire conversation history in its active window. The result is a system that scales its reasoning capabilities without scaling its bill.

The efficiency of a complex AI workflow is not determined by the intelligence of the model in charge, but by the precision of the shared context it leaves behind. The shift toward decentralized coordination suggests that the next leap in agentic productivity will come from optimizing how information is shared, not how it is commanded.
