Every developer using large language models today is familiar with the last-mile problem. You prompt an AI to generate a complex function, and it returns a block of code that is ninety percent correct. The remaining ten percent, however, consists of subtle logic flaws or hallucinated library methods that require a human to hunt down and fix manually. This cycle of AI generation followed by human correction has become the standard operating procedure for modern software engineering. But a fundamental shift is occurring in how these tools operate, moving the burden of correction from the human developer to the AI itself.
The Architecture of Non-Deterministic Loops
Boris Cherny, a key figure behind the development of Claude Code, observes that the industry is moving through three distinct evolutionary stages of code production. Two years ago, the primary driver was the human programmer writing source code. This transitioned into the current era where agents are tasked with writing the code based on human prompts. Now, we are entering a third phase: the agent loop, where one agent directs another agent to iterate on and refine the code until it reaches a specific standard of quality.
This transition is not merely a change in workflow but a departure from classical control flow. In traditional programming, a loop or recursive call is deterministic: it follows fixed rules and needs an explicit stop condition, or it will spin forever or overflow the stack. Agent loops, by contrast, are non-deterministic. The exit condition is not hard-coded. Instead, the sub-agent uses its own reasoning to judge when the output is complete enough: it evaluates its own work, decides whether the goal has been met, and either terminates the loop or triggers another iteration.
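The contrast between the two kinds of loop can be sketched in a few lines of Python. This is a minimal illustration, not how Claude Code is implemented: `revise` and `is_done` are hypothetical stand-ins for model calls, and the `max_iterations` cap is a safety net added here, not something the approach described above prescribes.

```python
from typing import Callable, Tuple


def deterministic_sum(n: int) -> int:
    """Classical loop: the stop condition (i > n) is fixed in the code."""
    total = 0
    i = 1
    while i <= n:
        total += i
        i += 1
    return total


def agent_loop(
    task: str,
    revise: Callable[[str, str], str],
    is_done: Callable[[str, str], bool],
    max_iterations: int = 20,
) -> Tuple[str, int]:
    """Agent-style loop: a judgement call, not a counter, ends the loop.

    `revise` and `is_done` are hypothetical placeholders for model calls.
    `max_iterations` is an assumed safety cap so the sketch always halts.
    """
    draft = ""
    for iteration in range(1, max_iterations + 1):
        draft = revise(task, draft)
        if is_done(task, draft):  # the reviewer decides when to stop
            return draft, iteration
    return draft, max_iterations


# Toy stand-ins so the sketch runs without a model.
def fake_revise(task: str, draft: str) -> str:
    return draft + "x"


def fake_is_done(task: str, draft: str) -> bool:
    return len(draft) >= 3


result = agent_loop("demo task", fake_revise, fake_is_done)
```

With a real model behind `is_done`, the number of iterations is no longer knowable in advance, which is the practical meaning of "non-deterministic" here.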
In practice, Cherny uses these loops to handle high-level architectural tasks that are typically too complex for a single prompt. He deploys agents that run continuously, with one agent dedicated to identifying ways to improve the overall code architecture and another searching for redundant abstractions that can be consolidated. These agents do not just suggest changes in a chat window; they operate like human contributors, submitting Pull Requests to the codebase. Because the codebase is constantly evolving, the agents keep running indefinitely, refining the system in real time.
Test-Time Compute and the Token Burn
This approach is a practical implementation of a concept known as test-time compute. While most efforts to improve a model focus on pre-training or fine-tuning, test-time compute increases the amount of computation performed during the inference phase to solve a problem. Noam Brown, a researcher at OpenAI, recently noted that providing sufficient compute at the inference stage allows modern models to solve nearly any problem, provided the task can be framed as a hill-climbing exercise.
In the context of coding, hill-climbing means the model does not need to find the perfect solution in a single leap. Instead, it makes a small improvement, evaluates the result, and moves incrementally toward a higher peak of correctness. The agent loop is the engine for this process, allowing the model to bounce between drafting and reviewing until it hits a threshold of accuracy.
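As a rough illustration of hill-climbing, the sketch below keeps a candidate only when it scores higher than the current best and stops once a threshold is reached. Everything in it is an assumption made for the example: `propose` stands in for a model drafting a revision, `score` stands in for a reviewer or test suite, and the toy target string simply makes the loop runnable without a model.

```python
from typing import Callable, Tuple


def hill_climb(
    initial: str,
    propose: Callable[[str], str],
    score: Callable[[str], float],
    threshold: float,
    max_steps: int = 50,
) -> Tuple[str, float, int]:
    """Accept a candidate only if it improves on the current best score."""
    best, best_score = initial, score(initial)
    steps = 0
    while best_score < threshold and steps < max_steps:
        candidate = propose(best)           # draft a small change
        candidate_score = score(candidate)  # review the result
        if candidate_score > best_score:    # keep only improvements
            best, best_score = candidate, candidate_score
        steps += 1
    return best, best_score, steps


# Toy problem: "correctness" is the fraction of characters matching TARGET.
TARGET = "sorted"


def toy_score(candidate: str) -> float:
    matches = sum(a == b for a, b in zip(candidate, TARGET))
    return matches / len(TARGET)


def toy_propose(candidate: str) -> str:
    # Fix the first wrong character, mimicking one small improvement.
    for i, (a, b) in enumerate(zip(candidate, TARGET)):
        if a != b:
            return candidate[:i] + b + candidate[i + 1:]
    return candidate


best, best_score, steps = hill_climb("??????", toy_propose, toy_score, threshold=1.0)
```

In a coding setting, `score` might be the fraction of passing tests; the loop only works when such a signal exists, which is why the framing as a hill-climbing exercise matters.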
However, this leap in capability comes with a significant economic trade-off. A standard Q&A chatbot is a predictable cost center; one prompt leads to one response. An agent loop is an open-ended expenditure. Because the agents iterate until the problem is solved, token consumption has no fixed ceiling: it grows with every iteration, and faster than linearly when each pass re-reads the accumulated context. For the end user, this can mean a much higher cost per task. For a company like Anthropic, which sells tokens, this structural shift creates a powerful incentive to encourage agentic workflows, as the volume of tokens consumed per task rises sharply.
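A back-of-the-envelope model shows how quickly the bill grows. The sketch assumes that every iteration re-sends the full accumulated context as input and appends a fixed-size response; it ignores prompt caching, context compaction, and any difference between input and output pricing. The token counts are made-up illustrative figures, not measurements.

```python
def single_shot_tokens(prompt_tokens: int, response_tokens: int) -> int:
    """One prompt, one response."""
    return prompt_tokens + response_tokens


def loop_tokens(prompt_tokens: int, response_tokens: int, iterations: int) -> int:
    """Total tokens when each iteration re-reads the whole context so far."""
    total = 0
    context = prompt_tokens
    for _ in range(iterations):
        total += context + response_tokens  # input read plus output written
        context += response_tokens          # the response joins the context
    return total


# Illustrative figures only: 1,000-token prompt, 500-token responses.
one_shot = single_shot_tokens(1_000, 500)       # 1,500 tokens
ten_iterations = loop_tokens(1_000, 500, 10)    # 37,500 tokens
twenty_iterations = loop_tokens(1_000, 500, 20) # 125,000 tokens
```

Under these assumptions, ten iterations cost 25 times a single response, and doubling to twenty iterations more than triples the total. The growth is roughly quadratic in the iteration count, which is steep without being literally exponential.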
There is also a technical risk known as context drift. When a model runs in a loop for too long, it can lose sight of the original objective, becoming trapped in a cycle of irrelevant refinements. To combat this, developers employ a technique called the Ralph Loop. Named after the Simpsons character Ralph Wiggum, this method forces the model to periodically stop and summarize every action it has taken so far. The model then asks itself whether the original goal has been achieved. If the answer is no, the model is bounced back into the execution phase with a refreshed understanding of its trajectory.
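The checkpointing pattern as described in this article can be sketched as follows. This is an illustration of the description above, not a reference implementation of any particular tool: `act`, `summarize`, and `goal_met` are hypothetical stand-ins for model calls, and the `checkpoint_every` and `max_steps` parameters are assumptions introduced for the example.

```python
from typing import Callable, List, Tuple


def checkpointed_loop(
    goal: str,
    act: Callable[[str, str], str],
    summarize: Callable[[List[str]], str],
    goal_met: Callable[[str, str], bool],
    checkpoint_every: int = 5,
    max_steps: int = 100,
) -> Tuple[bool, str, int]:
    """Run actions, pausing periodically to summarize and re-check the goal."""
    history: List[str] = []
    summary = ""
    for step in range(1, max_steps + 1):
        history.append(act(goal, summary))
        if step % checkpoint_every == 0:
            summary = summarize(history)  # stop and recap everything so far
            if goal_met(goal, summary):   # ask: is the original goal achieved?
                return True, summary, step
            history = [summary]           # resume from the compact recap
    return False, summary, max_steps


# Toy stand-ins so the sketch runs without a model.
def fake_act(goal: str, summary: str) -> str:
    return "fix"


def fake_summarize(history: List[str]) -> str:
    return ",".join(history)


def fake_goal_met(goal: str, summary: str) -> bool:
    return summary.count("fix") >= 6


done, final_summary, steps_used = checkpointed_loop(
    "remove dead code", fake_act, fake_summarize, fake_goal_met, checkpoint_every=3
)
```

Replacing the detailed history with the summary at each checkpoint is one way to give the model the "refreshed understanding" mentioned above; the original goal is passed to every call so it is never summarized away.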
This bouncing mechanism helps keep the agent from wandering aimlessly through the codebase. By combining the raw power of test-time compute with a structured summary loop, the system can stay on target even across hundreds of iterations. The result is a system that no longer requires a human to act as the primary debugger, replacing manual oversight with a self-correcting AI ecosystem.
The industry is now moving toward a critical evaluation of the return on investment for these loops. The primary question is no longer whether an agent loop can produce better code than a single prompt, but whether the marginal gain in software quality justifies the steep increase in token costs.




