Most developers building AI agents spend their first week not on logic, but on plumbing. They wrestle with framework selection, struggle to connect model clients, write tedious tool adapters, and figure out how to pass UI states without breaking the session. This preparatory phase is a known bottleneck in the industry, where the actual intelligence of the agent is often sidelined by the sheer volume of boilerplate code required to make it functional. The friction is so high that many teams abandon complex agentic workflows before they even reach the prototyping stage.
The Architecture of the Configurable Generalist Agent Harness
IBM addresses this systemic inefficiency with the release of CUGA, the Configurable Generalist Agent harness. Rather than forcing developers to build the orchestration layer from scratch, CUGA absorbs the repetitive tasks of planning, execution loops, tool calling, and state management into the harness itself. This shift allows developers to focus exclusively on defining the tools the agent should use and the specific tasks it must perform. The entry barrier is intentionally low, requiring only a simple package installation to get started.
pip install cuga
CUGA is implemented as a single-file structure based on FastAPI, which streamlines the transition from development to deployment. To accelerate adoption, IBM provides cuga-apps, a collection of 24 example applications ranging from movie recommendation engines to an IBM Cloud architecture advisor. These examples are available via a live gallery, allowing developers to see the agents in action before diving into the code. Because each app encapsulates the agent configuration, tools, and prompts within a single file, any developer familiar with FastAPI can modify the logic and deploy a customized agent almost instantly.
This architectural decision moves the burden of orchestration away from the developer and into the system. When a team decides to switch their underlying model, they no longer need to rewrite the control logic. CUGA handles the sequence of planning before action, mixing tool calls with generated code to minimize execution errors. The effectiveness of this approach is reflected in recent industry benchmarks. Between July 2025 and February 2026, CUGA secured the top position on the AppWorld benchmark, and it similarly ranked first on the WebArena benchmark from February to September 2025. These results suggest that systematic state management at the harness level is more effective for complex task completion than the traditional approach of aggressive prompt tuning.
CodeAct and the Shift Toward Self-Correcting Loops
The core technical differentiator in CUGA is its departure from simple inference-based tool calling. Instead of relying solely on the model's ability to predict the correct API call, CUGA employs a method called CodeAct. In this paradigm, the model does not just trigger a tool; it writes the actual code required to process the tool's output and determines the next step based on the execution result. This creates a tight loop where reasoning and execution are fused, allowing the model to verify its own output in real-time and pivot its strategy if the result is unexpected.
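The loop described above can be illustrated with a toy sketch. This is not CUGA's implementation: stub_model stands in for an LLM, get_weather is a hypothetical tool, and exec runs the generated code without any sandbox, which a real harness would not do. It only shows the shape of the cycle, in which the model emits code, the harness runs it, and the observed output drives the next step.

```python
import contextlib
import io

def get_weather(city):
    """Hypothetical tool: returns a canned forecast."""
    return {"city": city, "temp_c": 21}

def stub_model(history):
    """Stand-in for an LLM: returns the next code snippet, or None when done."""
    if not history:
        # First turn: call the tool and print what came back.
        return "report = get_weather('Zurich')\nprint(report['temp_c'])"
    if len(history) == 1:
        # Second turn: build on the variable created in the first turn.
        return "answer = 'warm' if report['temp_c'] > 15 else 'cold'\nprint(answer)"
    return None

def codeact_loop(model, tools, max_steps=5):
    namespace = dict(tools)  # variables persist between steps
    history = []             # (code, observation) pairs fed back to the model
    for _ in range(max_steps):
        code = model(history)
        if code is None:
            break
        buffer = io.StringIO()
        try:
            # Unsandboxed exec, for illustration only.
            with contextlib.redirect_stdout(buffer):
                exec(code, namespace)
            observation = buffer.getvalue().strip()
        except Exception as exc:
            # Failures become observations the model can react to.
            observation = f"error: {type(exc).__name__}: {exc}"
        history.append((code, observation))
    return history, namespace

history, namespace = codeact_loop(stub_model, {"get_weather": get_weather})
```

Because the namespace survives between steps, the second snippet can reuse report without calling the tool again.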
To solve the common problem of data loss during long-horizon tasks, IBM introduced Variable-tracking. This feature ensures that critical data is maintained across sequences of 20 steps or more, preventing the agent from falling into a loop of re-deriving the same information. This is paired with a Reflection stage. If the harness detects that an execution result is incorrect or fails to meet the goal, it triggers a re-planning phase. By referencing a list of variables recorded by the harness rather than forcing the model to remember every state in its context window, CUGA significantly reduces the cognitive load on the LLM.
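A minimal sketch of how variable tracking and reflection might fit together, under stated assumptions: VariableStore, run_with_reflection, and the plan format are hypothetical names invented for illustration and are not taken from CUGA. The point is that values live in a harness-side store, the goal check reads from that store, and a failed check triggers a new plan that reuses what was already recorded.

```python
class VariableStore:
    """Hypothetical harness-side record of named values produced during a run."""

    def __init__(self):
        self._values = {}
        self._origin = {}

    def record(self, name, value, step):
        self._values[name] = value
        self._origin[name] = step

    def get(self, name):
        return self._values[name]

    def summary(self):
        """Compact listing for a prompt: name, type, and originating step only."""
        return [
            f"{name}: {type(value).__name__} (step {self._origin[name]})"
            for name, value in self._values.items()
        ]

def run_with_reflection(plan, goal_met, replan, max_replans=2):
    store = VariableStore()
    step_no = 0
    for attempt in range(max_replans + 1):
        for name, compute in plan:
            step_no += 1
            store.record(name, compute(store), step_no)
        if goal_met(store):           # reflection: check the result against the goal
            return store, attempt
        plan = replan(store)          # re-plan using the recorded variables
    raise RuntimeError("goal not met after re-planning")

# First plan has a bug: it sums only the first two prices.
plan = [
    ("prices", lambda s: [12.0, 8.5, 30.0]),
    ("total", lambda s: sum(s.get("prices")[:2])),
]
goal_met = lambda s: s.get("total") == sum(s.get("prices"))
# The new plan reuses the stored prices instead of fetching them again.
replan = lambda s: [("total", lambda s2: sum(s2.get("prices")))]

store, replans = run_with_reflection(plan, goal_met, replan)
```

Only the summary lines, not the full values, would need to appear in the model's context, which is the saving the paragraph above describes.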
Stability in these loops is maintained through a strict data contract for tool returns. CUGA requires all inline tools to return a standardized JSON envelope that explicitly distinguishes between success and failure.
{"ok": true, "data": {...}}
{"ok": false, "code": "...", "error": "..."}
When a tool returns a raw stack trace, the agent's plan typically collapses. However, by providing a standardized error message, the CUGA planner can recognize the failure, find a workaround, or correct the input parameters. This structural requirement transforms a simple error into a signal for recovery, drastically increasing the agent's resilience.
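One way to guarantee that every tool honors this contract is to wrap it in a decorator. The sketch below is an assumption about how that could be done, not CUGA's own code: the enveloped decorator, the get_quote tool, and the error codes not_found and internal_error are all hypothetical.

```python
import functools

def enveloped(func):
    """Wrap a tool so it always returns the {"ok": ...} envelope, never raises."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return {"ok": True, "data": func(*args, **kwargs)}
        except KeyError as exc:
            # Map an expected failure to a short, stable code the planner can act on.
            return {"ok": False, "code": "not_found",
                    "error": f"unknown key: {exc.args[0]}"}
        except Exception as exc:
            # Anything else becomes a one-line message instead of a stack trace.
            return {"ok": False, "code": "internal_error",
                    "error": f"{type(exc).__name__}: {exc}"}
    return wrapper

PRICES = {"AAPL": 190.0}  # canned data for the example

@enveloped
def get_quote(symbol):
    """Hypothetical tool: look up a price in an in-memory table."""
    return {"symbol": symbol, "price": PRICES[symbol]}

good = get_quote("AAPL")
bad = get_quote("ZZZZ")
```

Here good carries the payload under data, while bad carries a code and a one-line error that a planner can use to retry with a different symbol.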
Achieving Frontier Performance with Open Weights
Because the harness handles the heavy lifting of state management and error recovery, CUGA does not require expensive frontier APIs to function at a high level. The hosted apps run on gpt-oss-120b, an open-weights model. This allows organizations to maintain higher levels of data privacy and reduce operational costs without sacrificing performance. Developers can switch among three inference modes (Fast, Balanced, and Accurate) via a configuration file to trade latency against precision, all without touching the core workflow code.
Execution safety is handled through a flexible sandbox environment. Developers can choose between Local, Docker/Podman, or E2B cloud environments. While Local is sufficient for initial development, Docker and E2B cloud provide the isolation necessary for production, ensuring that agent-generated code cannot compromise the host system. The integration with E2B cloud is particularly notable, as it provides an immediate code execution environment, freeing developers to spend their time on tool definition and prompt optimization rather than infrastructure management.
This division of labor, in which the harness manages planning, reflection, and variable tracking, offloads much of the bookkeeping from the model. While most agent frameworks rely on the inherent self-correction capabilities of models like GPT-4, CUGA implements these as system-level functions. This is why a comparatively small open model can complete complex workflows that would typically require a massive proprietary model.
Expanding the Ecosystem via MCP and Multi-Agent Coordination
CUGA simplifies the integration of external capabilities by treating OpenAPI, the Model Context Protocol (MCP), and LangChain functions as uniform bindings. This means developers no longer need to write custom adapters for every new API; as long as the tool follows a standard specification, it can be plugged into the harness immediately.
Currently, IBM provides seven public MCP servers hosted on IBM Code Engine, offering a total of 36 tools. These include essential utilities for web search, Wikipedia access, arXiv paper retrieval, geocoding, weather data, and financial market quotes. These general-purpose tools can be integrated with a single line of code.
load_tools(["web"])
This hybrid approach allows teams to pull stateless, general functions from shared MCP servers while defining specialized, proprietary business logic as internal Python functions. The system also extends into unstructured data processing through Docling-based RAG (Retrieval-Augmented Generation), enabling the agent to accurately parse and utilize content from PDFs, audio files, and videos.
For highly complex enterprise tasks, CUGA implements an A2A (Agent-to-Agent) architecture. Instead of one monolithic agent attempting to handle every variable, a primary agent can delegate specific sub-tasks to specialized agents. This modularity increases execution accuracy by breaking down a massive objective into smaller, manageable units of work, each handled by an agent optimized for that specific domain.
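The delegation pattern can be sketched in a few lines. This example does not implement any agent-to-agent wire protocol and does not use CUGA's API: the Coordinator class, the two specialist functions, and the no_agent error code are hypothetical, and plain functions stand in for full agents.

```python
def billing_agent(task):
    """Hypothetical specialist: a plain function stands in for a full agent."""
    return f"billing handled: {task}"

def search_agent(task):
    """Hypothetical specialist for lookup tasks."""
    return f"search handled: {task}"

class Coordinator:
    """Primary agent that routes each sub-task to the specialist for its domain."""

    def __init__(self):
        self._specialists = {}

    def register(self, domain, agent):
        self._specialists[domain] = agent

    def delegate(self, subtasks):
        results = []
        for domain, task in subtasks:
            agent = self._specialists.get(domain)
            if agent is None:
                # Report the gap in a structured form rather than raising.
                results.append({"ok": False, "code": "no_agent",
                                "error": f"no specialist for {domain}"})
            else:
                results.append({"ok": True, "data": agent(task)})
        return results

coordinator = Coordinator()
coordinator.register("billing", billing_agent)
coordinator.register("search", search_agent)
results = coordinator.delegate([
    ("search", "find invoice 17"),
    ("billing", "refund invoice 17"),
    ("legal", "review contract"),
])
```

The third sub-task has no registered specialist, so the coordinator returns a structured failure that a planner could handle like any other tool error.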
Implementation Standards for Production Environments
For practitioners moving toward deployment, CUGA provides a streamlined path via the create_llm factory function. This function automatically connects the agent to the appropriate model based on environment variables, ensuring that the code remains portable across different environments.
llm = create_llm()
CUGA supports a wide array of providers, including OpenAI, Anthropic, watsonx, LiteLLM, and Ollama. The support for Ollama and LiteLLM is particularly critical for teams operating in on-premises environments or those seeking to minimize API costs. By simply changing an environment variable, a developer can switch from a commercial API to a local open model to benchmark performance differences without altering a single line of application logic.
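To make the idea of an environment-driven factory concrete, here is a sketch of the general pattern. It is not the create_llm function itself: the function create_llm_config, the LLMConfig type, the variable names SKETCH_LLM_PROVIDER and SKETCH_LLM_MODEL, and the defaults are all invented for illustration, and the real variable names should be taken from CUGA's documentation.

```python
import os
from dataclasses import dataclass

# Provider names taken from the list in the text above.
SUPPORTED = {"openai", "anthropic", "watsonx", "litellm", "ollama"}

@dataclass(frozen=True)
class LLMConfig:
    provider: str
    model: str

def create_llm_config(env=None):
    """Resolve provider and model from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    provider = env.get("SKETCH_LLM_PROVIDER", "ollama").lower()
    if provider not in SUPPORTED:
        raise ValueError(f"unsupported provider: {provider}")
    model = env.get("SKETCH_LLM_MODEL", "gpt-oss-120b")
    return LLMConfig(provider, model)

# Passing a dict instead of os.environ keeps the example deterministic.
hosted = create_llm_config({"SKETCH_LLM_PROVIDER": "OpenAI",
                            "SKETCH_LLM_MODEL": "some-model"})
local = create_llm_config({})
```

Application code that only ever calls the factory stays unchanged when the environment switches from a hosted provider to a local one.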
One of the most practical features for developers is the reliance on Python docstrings for tool selection. Rather than manually constructing complex JSON schemas to describe a tool's purpose, the agent uses the natural-language docstring at the top of the Python function body to determine whether the tool is appropriate for the current task and how to map the arguments. This means that adding a new capability or refining an existing one is as simple as updating a docstring in the code.
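The standard library already exposes what a harness would need to do this. The sketch below is an assumption about the general technique, not CUGA's internals: describe_tool and convert_currency are hypothetical, and the output format is invented. It derives a tool description from the docstring and signature using the inspect module.

```python
import inspect

def describe_tool(func):
    """Build a tool description from a function's docstring and signature."""
    params = {}
    for name, param in inspect.signature(func).parameters.items():
        annotation = param.annotation
        if annotation is inspect.Parameter.empty:
            type_name = "any"
        else:
            type_name = getattr(annotation, "__name__", "any")
        params[name] = {
            "type": type_name,
            # A parameter without a default must be supplied by the caller.
            "required": param.default is inspect.Parameter.empty,
        }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": params,
    }

def convert_currency(amount: float, to: str, source: str = "USD") -> float:
    """Convert an amount of money from one currency to another."""
    raise NotImplementedError  # the body is irrelevant to the description

spec = describe_tool(convert_currency)
```

Editing the docstring changes spec["description"] with no other code touched, which is the workflow the paragraph above describes.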
The ultimate value of CUGA lies in the speed of verification. The goal is no longer to find the largest possible model, but to determine how quickly a complex task can be validated using the most efficient model available. By providing a robust harness that handles the systemic risks of agentic workflows—such as state drift and execution failure—IBM has shifted the focus from model parameters to architectural reliability.
The era of spending a week on API plumbing is over. By integrating CodeAct execution with systemic reflection and variable tracking, CUGA allows developers to move from a concept to a functioning, self-correcting agent in a fraction of the time. The challenge for the modern AI engineer is no longer about scaling the model, but about leveraging a harness that ensures the model's output is actually actionable.



