Modern DevOps pipelines are drowning in data. In a typical high-scale environment, thousands of Continuous Integration (CI) logs stream in every day, creating a terabyte-scale haystack of telemetry that must be sifted to find the needle of a critical failure. For a long time, the industry standard for automating this was brute force: feed the entire log into the most powerful Large Language Model (LLM) available and hope for a correct diagnosis. But as the volume of data scales, this approach becomes a financial liability, turning a productivity tool into a massive cost center.
The Triager Pattern and Model Tiering
To solve the cost-performance trade-off, a new architectural approach leverages a hierarchical agent design using Anthropic's model family. The system implements a Triager pattern that splits the workload between Claude Opus, the high-reasoning frontier model, and Claude Haiku, the lightweight, high-speed model. The logic is rooted in a simple observation of CI failures: approximately 80% of all failures are either known issues already being tracked or simple duplicate errors.
Previously, teams relied on Claude Sonnet (version 4.0) as a middle-ground solution for categorizing these errors, but the cost-to-performance ratio remained inefficient. The new architecture replaces this with a strict tiering system in which Haiku acts as the first line of defense: it reads incoming logs and searches for existing error signatures to determine whether the problem is a duplicate. If the issue is recognized, the process terminates immediately. Only when Haiku identifies a genuinely new or complex failure is the task escalated to Opus 4.6. Because roughly 80% of failures never reach the most expensive model, the system maintains a lower operational cost than the previous Sonnet 4.0 environment, despite using the more powerful Opus model for the final analysis.
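The routing logic itself is small. Below is a minimal sketch of the tiered triage in Python; the `Failure` type, the `haiku_match` and `opus_diagnose` callables, and the signature-matching stub are illustrative stand-ins for the real model calls, not the production implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Failure:
    job_id: str
    log_excerpt: str

def triage(
    failure: Failure,
    known_signatures: set,
    haiku_match: Callable[[str, set], Optional[str]],
    opus_diagnose: Callable[[str], str],
) -> str:
    """Route a CI failure through the cheap tier first."""
    # Tier 1: Haiku checks the excerpt against tracked error signatures.
    signature = haiku_match(failure.log_excerpt, known_signatures)
    if signature is not None:
        # ~80% of failures are known issues or duplicates and stop here,
        # so the expensive model below is never invoked for them.
        return f"{failure.job_id}: duplicate of tracked issue '{signature}'"
    # Tier 2: only genuinely new or complex failures escalate to Opus.
    return opus_diagnose(failure.log_excerpt)

if __name__ == "__main__":
    report = triage(
        Failure("job-42", "OOMKilled while running step build-image"),
        known_signatures={"OOMKilled"},
        # Stand-ins for the real model calls:
        haiku_match=lambda text, sigs: next((s for s in sigs if s in text), None),
        opus_diagnose=lambda text: "new failure: escalate for root-cause analysis",
    )
    print(report)  # -> job-42: duplicate of tracked issue 'OOMKilled'
```

The essential property is that the expensive call sits behind the early return: in the common case, Opus is never invoked at all.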
Decoupling Reasoning from Data Retrieval
The true shift in this architecture is not just the choice of models, but how those models interact with data. Traditionally, developers would include the entire log file within the prompt. This creates two problems: heavy token consumption and a cognitive phenomenon known as anchoring. When an LLM is presented with a massive block of raw text, it often fixates on the first few error messages it encounters, leading to biased judgments and missed root causes.
To eliminate this, the agents no longer read raw logs directly. Instead, they interact with ClickHouse, an open-source columnar database designed for real-time analytical processing. The agents use a SQL interface to query only the specific data points they need. By utilizing materialized views, the agents can first analyze high-level metrics, such as failure rates and execution times, before deciding which specific log lines to retrieve. If the SQL results are too broad, the agent iteratively refines its query or uses the GitHub CLI to pull additional context from the repository.
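In practice this maps onto a two-step query flow: one query against an aggregated materialized view, then a narrow follow-up against the raw log table. The sketch below uses the clickhouse-connect Python client; the host and the table and view names (`ci_job_stats_mv`, `ci_raw_logs`) are hypothetical placeholders, not the system's actual schema.

```python
import clickhouse_connect

client = clickhouse_connect.get_client(host="clickhouse.internal")  # placeholder host

# Step 1: read the aggregated materialized view, never the raw logs,
# to decide where to look.
hotspots = client.query(
    """
    SELECT job_name, failure_rate, avg_duration_s
    FROM ci_job_stats_mv              -- hypothetical materialized view
    WHERE window_start >= now() - INTERVAL 1 DAY
    ORDER BY failure_rate DESC
    LIMIT 10
    """
).result_rows

# Step 2: pull only the error lines for the single worst offender.
worst_job = hotspots[0][0]
errors = client.query(
    """
    SELECT timestamp, line
    FROM ci_raw_logs                  -- hypothetical raw-log table
    WHERE job_name = {job:String} AND level = 'ERROR'
    ORDER BY timestamp
    LIMIT 200
    """,
    parameters={"job": worst_job},
).result_rows
```

If the second result set is still too broad, the agent tightens the WHERE clause and tries again rather than loading more text into the prompt.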
This separation of concerns extends to the relationship between Opus and Haiku. In this design, Opus never touches the raw logs. Instead, Opus functions as the orchestrator, formulating hypotheses and spawning short-lived Haiku sub-agents to execute specific investigative tasks. For instance, if Opus suspects a build failure is related to a specific dependency conflict, it commands a Haiku agent to extract only the relevant error messages. Once the Haiku agent returns its result, the sub-agent is immediately destroyed.
This prevents Opus's context window from being polluted by thousands of lines of irrelevant log noise, preserving its reasoning capacity for the actual diagnosis. By limiting sub-agent generation to a single level of depth, the system also prevents recursive cost spirals. The data shows the efficiency of this split: Haiku processes 65% of all input tokens but accounts for only 36% of the total cost, and the orchestration as a whole has cut operational expenses by 50%.
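A compact way to express this contract is to make the depth limit explicit and scope each sub-agent's lifetime to a single function call. The sketch below is illustrative only; `call_haiku` and `call_opus` stand in for real model invocations.

```python
from typing import Callable, List

MAX_SUBAGENT_DEPTH = 1  # sub-agents may not spawn sub-agents of their own

def run_subtask(task: str, call_haiku: Callable[[str], str], depth: int = 0) -> str:
    """Execute one investigative task in a short-lived Haiku sub-agent."""
    if depth >= MAX_SUBAGENT_DEPTH:
        # A sub-agent re-entering here with depth=1 is rejected outright,
        # which caps the fan-out at a single level.
        raise RuntimeError("recursive sub-agent spawning is disallowed")
    # The sub-agent lives only for the duration of this call; whatever
    # raw log text it reads is discarded along with it.
    return call_haiku(task)

def orchestrate(
    hypotheses: List[str],
    call_haiku: Callable[[str], str],
    call_opus: Callable[[str], str],
) -> str:
    # Opus never touches raw logs: it sees only the compact summaries
    # returned by the ephemeral sub-agents, one per hypothesis.
    findings = [run_subtask(h, call_haiku) for h in hypotheses]
    return call_opus("Diagnose the failure given these findings:\n" + "\n".join(findings))
```

Because `run_subtask` returns only a summary string, nothing but the distilled finding ever enters Opus's context.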
Intelligence in an AI system is no longer a function of model size, but a result of how effectively a complex problem is decomposed and assigned to the right tool.