The modern developer experience with AI agents is defined by a widening gap between execution speed and observability. An agent like Claude Code moves far faster than a human can monitor in real time: it executes commands, calls tools, and iterates through complex file structures in seconds, leaving behind a trail of .jsonl files. These logs are exhaustive, capturing every turn, every tool invocation, and every token spent. Yet for most engineers they are effectively write-only data. A single session can easily exceed 4,000 lines of raw JSON, which turns diagnosing an unexpected access to a production environment, or a sudden spike in context window usage, into a grueling exercise in manual pattern matching.

The Architecture of Agent Observability

Her, whose name comes from the Marathi word for detective, transforms these dense JSON archives into human-readable intelligence. The workflow begins when a user uploads a session file to the Her interface. Rather than simply reformatting the text, Her reconstructs the narrative of the session, stripping away the structural noise of JSON to isolate the agent's intent and the actual outcomes of its actions. This process is tuned to surface high-risk behaviors that otherwise hide in plain sight within massive logs: Her automatically flags the execution of deployment tools, modifications to production environment settings, and the accidental exposure of secrets. Each flag is not a vague warning but a precise pointer to the exact turn in the log, so developers can cross-verify it immediately without scrolling through thousands of lines of JSON.
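The source does not publish Her's rule set, but the idea of flags that carry a turn pointer can be sketched in Python. This is a minimal sketch assuming each turn has already been reduced to a `(turn_id, command_text)` pair; the `RULES` patterns and the `flag_turns` name are hypothetical illustrations, not Her's actual detectors.

```python
import re

# Hypothetical rules: each maps a flag category to a regex applied to the
# text of a tool call. A real rule set would be far larger and tuned.
RULES = {
    "deployment": re.compile(
        r"\b(kubectl\s+apply|terraform\s+apply|vercel\s+deploy|fly\s+deploy)\b"
    ),
    "production_env": re.compile(r"\b(prod|production)\b", re.IGNORECASE),
    "secret_exposure": re.compile(
        r"(AKIA[0-9A-Z]{16}|sk-[A-Za-z0-9]{20,}|-----BEGIN [A-Z ]*PRIVATE KEY-----)"
    ),
}


def flag_turns(turns):
    """Return (turn_id, category, matched_text) for every rule hit.

    `turns` is a list of (turn_id, command_text) pairs. The turn id is
    carried through unchanged so each flag points at an exact place in the log.
    """
    flags = []
    for turn_id, text in turns:
        for category, pattern in RULES.items():
            match = pattern.search(text)
            if match:
                flags.append((turn_id, category, match.group(0)))
    return flags
```

Because each hit keeps its `turn_id`, a reviewer can jump straight to that point in the raw log to confirm the flag.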

Beyond security, Her provides a granular audit of resource consumption. It tracks how many tokens are consumed by specific tools, sub-agents, skills, and MCP (Model Context Protocol) servers. In complex agentic workflows, a single inefficient sub-agent can consume as much as half of the total execution time and budget. Her surfaces such bottlenecks as exact counts rather than estimates, turning the log from a debugging artifact into a quantitative basis for measuring efficiency. This makes Her usable as an audit log in enterprise environments, where managing the operational cost of AI agents is as critical as the code they produce. By pinpointing where tokens are wasted, developers can reduce infrastructure costs and refine their agent prompts based on empirical data.
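Once every event is attributed to a source, per-source token accounting is a plain aggregation. The sketch below assumes a hypothetical flattened event schema with a `source` label per event (a tool, sub-agent, skill, or MCP server); how Her attributes tokens to sources in the raw log is not described in the source, and `token_breakdown` is an illustrative name.

```python
from collections import defaultdict


def token_breakdown(events):
    """Sum tokens per source and report each source's share of the total.

    `events` is an iterable of dicts with a hypothetical schema:
    {"source": str, "input_tokens": int, "output_tokens": int}.
    Missing token fields count as zero.
    """
    totals = defaultdict(int)
    for event in events:
        totals[event["source"]] += event.get("input_tokens", 0) + event.get(
            "output_tokens", 0
        )
    grand_total = sum(totals.values())
    # Heaviest source first, so the bottleneck is the first row.
    # sorted() is stable, so ties keep first-seen order.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [
        (source, tokens, tokens / grand_total if grand_total else 0.0)
        for source, tokens in ranked
    ]
```

Reporting a share alongside the raw count makes a sub-agent that dominates the budget stand out immediately.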

Solving the Hallucination Problem in Auditing

One of the most significant risks of using AI to analyze AI is the introduction of probabilistic errors. If an analysis tool reports a token count or a risk assessment that changes every time the page is refreshed, it is useless for auditing. Her addresses this by strictly decoupling data extraction from language generation. It uses a dual-engine structure in which a rule-based, deterministic engine handles all numerical calculations and log analysis, extracting the hard facts from the .jsonl file with fixed logic that does not rely on probability.
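A minimal sketch of the deterministic side follows, assuming a record shape loosely modeled on Anthropic Messages API content blocks (`tool_use` blocks and a `usage` object). The actual Claude Code log schema may differ, and `extract_facts` is a hypothetical name. What it illustrates is the property the paragraph relies on: the same input text always produces the same facts.

```python
import json
from collections import Counter


def extract_facts(jsonl_text):
    """Deterministically derive session facts from raw .jsonl text.

    Assumed (hypothetical) record shape:
    {"message": {"content": [...blocks...], "usage": {...}}}.
    No sampling and no model: identical input gives identical output.
    """
    tool_calls = Counter()
    total_tokens = 0
    records = 0
    bad_lines = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad_lines.append(lineno)  # report the line, never guess its content
            continue
        if not isinstance(record, dict):
            bad_lines.append(lineno)
            continue
        records += 1
        message = record.get("message") or {}
        usage = message.get("usage") or {}
        total_tokens += usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
        content = message.get("content")
        if isinstance(content, list):  # user turns may carry plain strings
            for block in content:
                if isinstance(block, dict) and block.get("type") == "tool_use":
                    tool_calls[block.get("name", "?")] += 1
    return {
        "records": records,
        "total_tokens": total_tokens,
        "tool_calls": dict(sorted(tool_calls.items())),
        "unparseable_lines": bad_lines,
    }
```

Malformed lines are listed by line number rather than skipped silently, so the report can say exactly what it could not read.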

Once the deterministic engine has established the facts, the Nemotron-Mini-4B-Instruct model takes over, and its role is limited strictly to translating those pre-verified facts into natural English sentences. Because the core analysis happens outside the model's probabilistic generation process, the integrity of the data does not depend on which model writes the final report. The numbers are never produced by the model, which provides the level of reliability that security audits require.
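The source describes the model's role only as rephrasing verified facts. One way to enforce that contract, sketched here under our own assumptions, is to hand the model a fixed fact list and reject any draft containing a number that is absent from the facts, falling back to a plain template. `build_prompt`, `numbers_are_grounded`, and `narrate` are hypothetical names; `generate` stands in for any text model, including Nemotron-Mini-4B-Instruct. The post-generation check is an added safeguard, not something the source says Her does.

```python
import re


def build_prompt(facts):
    """Render verified facts as a fixed list the model may only rephrase."""
    lines = [f"- {key}: {value}" for key, value in sorted(facts.items())]
    return (
        "Rewrite the following verified facts as plain English sentences. "
        "Do not add, drop, or change any number.\n" + "\n".join(lines)
    )


def numbers_are_grounded(report, facts):
    """True if every number in `report` also appears among the fact values.

    Thousands separators are stripped, so "48,210" matches the fact 48210.
    """
    allowed = {str(v) for v in facts.values() if isinstance(v, (int, float))}
    found = [n.replace(",", "") for n in re.findall(r"\d[\d,]*(?:\.\d+)?", report)]
    return all(n in allowed for n in found)


def narrate(facts, generate):
    """Ask any text generator for prose; fall back to a template if it invents numbers."""
    draft = generate(build_prompt(facts))
    if numbers_are_grounded(draft, facts):
        return draft
    return "; ".join(f"{key}: {value}" for key, value in sorted(facts.items()))
```

The number check is deliberately crude: it would misread a comma-separated list written without spaces, such as `12,15`, as a single number.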

The infrastructure is hosted on Hugging Face Spaces using ZeroGPU. The frontend is a React application served by a Gradio server. When a file is uploaded, the Gradio server runs the deterministic engine first, then the Nemotron model, and pushes the resulting natural-language reconstruction to the React frontend. The choice of a lightweight 4B-parameter model keeps inference latency and operational costs low while still producing a readable narrative report.

Security is built into the deployment through a closed-loop design with no external API calls. All analysis runs on the allocated GPU infrastructure, so no data is sent to third-party AI providers. Session files are uploaded to private, volatile namespaces assigned to each user and are automatically purged when the session ends. Enterprises can therefore analyze logs containing internal configurations or secrets without risking leakage to external model providers.
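The purge-on-exit behavior of a volatile, per-user workspace can be illustrated with the Python standard library. This sketch covers only local disk on a single machine; it assumes nothing about how Hugging Face Spaces actually isolates users, and `session_workspace` is a hypothetical helper.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def session_workspace(root=None):
    """Yield a private per-session directory and delete it when the session ends.

    tempfile.mkdtemp creates the directory so that only the creating user
    can read, write, or enter it; the finally clause removes it even if
    the analysis raises.
    """
    path = Path(tempfile.mkdtemp(prefix="her-session-", dir=root))
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)
```

Tying the directory's lifetime to a context manager means no code path can leave an uploaded log behind by forgetting a cleanup call.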

From Static Logs to Interactive Intelligence

For too long, generating logs has cost almost nothing while reading them has been prohibitively expensive. Developers have had to track turns manually to understand why an agent chose a particular path or invoked a certain tool. Her shifts this paradigm from static reading to interactive analysis. Through the Ask Her copilot, users can query the session record in natural language. If a developer asks why a specific tool was used at a certain point, the copilot answers strictly from the session evidence and links directly to the turn where the action occurred. This replaces tedious keyword searching with a conversational interface.
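The source does not say how Ask Her retrieves evidence. As a stand-in, the sketch below ranks turns by simple word overlap with the question and returns anchors to the matching turns; a production copilot would more likely use embeddings and pass the retrieved turns to a model. `find_evidence`, the stopword list, and the `#turn-N` anchor format are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"why", "did", "the", "a", "an", "is", "was", "it", "of", "to", "what"}


def _terms(text):
    """Lowercased word set with stopwords removed."""
    return set(re.findall(r"[a-z0-9_]+", text.lower())) - STOPWORDS


def find_evidence(question, turns, k=2):
    """Rank turns by word overlap with the question; return the top k with anchors.

    `turns` is a list of (turn_id, text) pairs. Turns with no overlap are
    dropped, so an empty result means the log holds no supporting evidence.
    """
    q = _terms(question)
    scored = []
    for turn_id, text in turns:
        overlap = len(q & _terms(text))
        if overlap:
            scored.append((overlap, turn_id))
    # Highest overlap first; earlier turns win ties.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(turn_id, f"#turn-{turn_id}") for _, turn_id in scored[:k]]
```

Returning an empty list when nothing matches is what keeps an answer grounded: the copilot can report that the log holds no evidence instead of guessing.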

This analytical capability extends beyond single sessions to a project-wide view. By uploading multiple session files, users can analyze the broader context of a project and detect recurring failure patterns or inefficient tool-calling paths that span multiple interactions. Fragmented data from individual sessions is thus integrated into a cohesive project-level dataset, enabling optimization across the whole system rather than one session at a time.
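Cross-session pattern detection reduces to counting in how many sessions each failure appears. This sketch assumes failures have already been extracted per session as hypothetical `(tool, error_kind)` tuples; `recurring_failures` and `min_sessions` are illustrative names.

```python
from collections import defaultdict


def recurring_failures(sessions, min_sessions=2):
    """Find (tool, error_kind) pairs that occur in at least `min_sessions` sessions.

    `sessions` maps a session name to a list of failure tuples. A failure
    repeated inside one session still counts as one session.
    """
    seen_in = defaultdict(set)
    for name, failures in sessions.items():
        for failure in failures:
            seen_in[failure].add(name)
    return {
        failure: sorted(names)
        for failure, names in seen_in.items()
        if len(names) >= min_sessions
    }
```

Counting sessions rather than raw occurrences keeps one noisy session from masquerading as a project-wide pattern.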

To further reduce the friction of analysis, Her incorporates a built-in database of CLI tools from Homebrew, npm, and PyPI. When an agent executes a command using a tool the developer may not be familiar with, Her automatically matches the tool name against its internal database to provide a concise description of its function. This eliminates the need to leave the analysis environment to search external documentation, ensuring that the debugging flow remains uninterrupted.
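Matching a command against a tool index mostly comes down to finding which program the command actually runs. The sketch below skips leading `VAR=value` assignments, flags, path prefixes, and common launchers such as `sudo`, `npx`, and `uvx`. `TOOL_INDEX` is a hand-written, hypothetical three-entry excerpt, not data pulled from Homebrew, npm, or PyPI, and `describe_command` is an illustrative name.

```python
import shlex

# Hypothetical excerpt of a tool-description index (hand-written entries).
TOOL_INDEX = {
    "jq": ("homebrew", "Command-line JSON processor"),
    "ruff": ("pypi", "Python linter and formatter"),
    "prettier": ("npm", "Opinionated code formatter"),
}

# Launchers to look through when finding the real program.
WRAPPERS = {"sudo", "npx", "uvx", "env", "time"}


def describe_command(command):
    """Return (tool, registry, description) for the program a shell command runs.

    Unknown tools return (name, None, None); an empty command returns all None.
    """
    for token in shlex.split(command):
        if token.startswith("-") or "=" in token:
            continue  # skip flags and leading VAR=value assignments
        name = token.rsplit("/", 1)[-1]  # /usr/local/bin/jq -> jq
        if name in WRAPPERS:
            continue
        registry, description = TOOL_INDEX.get(name, (None, None))
        return name, registry, description
    return None, None, None
```

This version does not understand launcher options that take an argument (for example `sudo -u root jq`), which a fuller implementation would need to handle.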

Ultimately, the transition to transparent agent operations is a prerequisite for moving AI agents from experimental scripts to production-grade software. By making the opaque reasoning process visible and quantifying the cost of every action, Her lowers the economic and psychological barriers to agent adoption. The ability to audit an agent's behavior with deterministic certainty lets organizations delegate more authority to AI while maintaining a rigorous safety net. The result is less time spent debugging and a clear path toward token cost optimization, so that the efficiency gains of AI agents are not erased by the cost of managing them.