Why D&B Rebuilt Its 642 Million Record Database for AI Agents

A developer building a supply chain automation tool recently hit a wall that no amount of prompt engineering could fix. The goal was simple: have an AI agent identify real-time risk factors and complex corporate ownership structures across a global network. However, the agent repeatedly failed to retrieve the correct data or stumbled on ambiguous entity matches. The problem was not the model's reasoning capability but the architecture of the data it was querying. Most commercial databases were built for humans: analysts who have the patience to wait for a slow SQL query to return and the intuition to manually correct a near-match. AI agents have neither. They need sub-second responses and exact entity matches, so a fragmented data architecture is not just an inconvenience but a fatal bottleneck.

The Scale of 642 Million Records and 100 Billion Quality Checks

Dun & Bradstreet (D&B) manages one of the most extensive corporate datasets in existence, with a history spanning over 180 years. Today, that repository contains 642 million company records. The growth has been aggressive: five years ago, the count sat at approximately 300 million, meaning the database has more than doubled in that time. This is not merely a horizontal expansion of entries but a vertical deepening of data. Each individual record contains up to 11,000 fields, a level of data density that supports fine-grained corporate analysis.

Maintaining this scale requires an industrial-grade validation engine. D&B performs roughly 100 billion data quality checks every month as records move through the system. This rigorous verification is why 200,000 customers worldwide rely on the platform for credit scoring and risk management. However, the underlying structure—the Commercial Graph used to visualize relationships and risk profiles—was designed for the human eye. A credit analyst can tolerate a few seconds of lag and can use cognitive heuristics to resolve an entity that is almost, but not quite, the right company. Humans bridge the gap of imperfect data with intuition.

AI agents cannot bridge that gap. For an agent, the existing system was a collection of fragmented silos, each built for different markets or use cases and stitched together with custom integrations. While a human analyst could navigate this fragmentation using SQL (Structured Query Language) or a predefined interface, an agent could not. The most critical failure point was latency. In an agentic workflow, responses must arrive in under a second to maintain the chain of thought and execution. The legacy fragmented architecture created bottlenecks that put such speeds out of reach when querying 642 million records.

To solve this, D&B undertook a fundamental redesign, migrating its fragmented databases to a cloud-native infrastructure and completely rewriting the underlying schemas. They implemented a Data Fabric layer (an architecture that virtualizes distributed data for unified management) to standardize records across different markets while adhering to local compliance laws. The result is a single, unified Knowledge Graph that tracks 642 million companies and billions of interconnections. This was not a simple migration of storage but a restructuring of the data into a form that machines can read and interpret with minimal delay.

From Static Links to Dynamic Networks via MCP and A2A

Legacy corporate graphs relied on static connections, such as a simple link between a CEO and a company. For a human analyst, this snapshot was sufficient. AI agents, however, require dynamic relationship networks that can track the history of personnel movements and evolving corporate hierarchies in real time. The transition to a cloud-based Data Fabric allows the system to normalize market-specific records and track these shifts as they happen, providing the foundation for a living Knowledge Graph.

To achieve the required sub-second latency, D&B abandoned direct SQL access for its agents. Querying 642 million records with 11,000 fields per record via traditional SQL in a fragmented environment is too slow for real-time agentic loops. Instead, they introduced a structured access layer based on the Model Context Protocol (MCP). MCP does more than just deliver data; it packages information with the necessary context and provides the agent with a specific set of tools and skills to route itself to the optimal record. Behind every query, an Entity Resolution engine operates to ensure the agent identifies the exact, verified entity rather than a similar-sounding company, effectively eliminating a primary source of hallucinations.

The challenge extends beyond a single agent's search capabilities to the problem of identity persistence in multi-agent workflows. In a typical enterprise chain, a credit-check agent, a KYC (Know Your Customer) agent, and a third-party risk agent must work in sequence. If the credit agent references Company A, but the risk agent accidentally shifts to a similarly named Company B, the entire workflow collapses. To prevent this divergence, D&B implemented a business verification agent based on Google's A2A (Agent2Agent) protocol. This agent acts as a digital handshake, serving as a persistent reference point regardless of the orchestration tool being used. It ensures that every agent in the chain is referencing the exact same entity, allowing the workflow to complete without human intervention.
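A rough sketch of the identity-persistence idea: pin one verified entity ID at the start of the chain and reject any step that reports a different one. This does not implement the A2A protocol or D&B's verification agent; `WorkflowContext`, `run_chain`, and the step functions are hypothetical names used only to show the check.

```python
from dataclasses import dataclass, field

class EntityMismatchError(Exception):
    """Raised when an agent drifts to a different entity than the pinned one."""

@dataclass
class WorkflowContext:
    pinned_entity_id: str
    results: dict = field(default_factory=dict)

    def record(self, agent_name: str, entity_id: str, result: str) -> None:
        # Every step must report the entity it actually used.
        if entity_id != self.pinned_entity_id:
            raise EntityMismatchError(
                f"{agent_name} used {entity_id}, expected {self.pinned_entity_id}"
            )
        self.results[agent_name] = result

def run_chain(entity_id: str, steps: list) -> dict:
    """Run (name, step) pairs in order; each step returns (entity_id_used, result)."""
    ctx = WorkflowContext(pinned_entity_id=entity_id)
    for agent_name, step in steps:
        used_id, result = step(entity_id)
        ctx.record(agent_name, used_id, result)
    return ctx.results

# Hypothetical agents: two stay on the pinned entity, one drifts to a lookalike.
def credit_agent(entity_id):
    return entity_id, "credit ok"

def kyc_agent(entity_id):
    return entity_id, "kyc ok"

def drifting_risk_agent(entity_id):
    return "E-002", "risk low"  # wrong company with a similar name
```

With `credit_agent` and `kyc_agent` the chain completes; swapping in `drifting_risk_agent` stops the chain with an `EntityMismatchError` instead of producing a report that mixes two companies.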

This reliability is further reinforced by a new identity model called KYA (Know Your Agent). Traditional authentication is designed for humans, but agents require a different framework. Under the KYA model, an agent must be mapped to a verified IP address and registered with a unique access key to be recognized as an authenticated identity. This process defines not just who the agent is, but which organization it belongs to and exactly which data permissions it holds.
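The KYA check described here can be sketched as a lookup that ties an access key to an organization, a set of verified IP addresses, and a set of permissions. The registry contents, key format, permission strings, and the `authorize` function are assumptions made for illustration, not D&B's actual scheme. The IP addresses come from ranges reserved for documentation.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str
    organization: str
    allowed_ips: frozenset
    permissions: frozenset

# Hypothetical registry keyed by access key.
AGENTS = {
    "key-credit-123": AgentIdentity(
        agent_id="credit-check-agent",
        organization="Example Bank",
        allowed_ips=frozenset({ipaddress.ip_address("192.0.2.10")}),
        permissions=frozenset({"credit:read"}),
    ),
}

def authorize(access_key: str, source_ip: str, permission: str):
    """Return the agent's identity if key, IP, and permission all check out, else None."""
    identity = AGENTS.get(access_key)
    if identity is None:
        return None  # unknown access key
    try:
        ip = ipaddress.ip_address(source_ip)
    except ValueError:
        return None  # malformed address
    if ip not in identity.allowed_ips:
        return None  # key presented from an unverified address
    if permission not in identity.permissions:
        return None  # agent is known but lacks this data permission
    return identity
```

A successful call returns the full identity, so the caller learns which organization the agent belongs to as well as whether the request is allowed. A production system would also need key rotation, secret storage, and audit logging, none of which are shown.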

Finally, D&B addressed the issue of trust through the implementation of Data Lineage. In high-stakes environments like supply chain finance or credit scoring, a probabilistic answer from an LLM is unacceptable; an error can lead to direct financial loss. Enterprise agents must provide the exact location of the dataset used to reach a conclusion. D&B's infrastructure now ensures that every answer generated by an agent includes a traceable path back to the original source. This lineage system is not a superficial guardrail added at the end, but a core part of the infrastructure that allows users to verify the origin of the data with a single click.
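One way to picture lineage as part of the infrastructure rather than an afterthought is to make it impossible to construct an answer without its sources. The sketch below assumes a hypothetical `SourceRef` and `TracedAnswer` and an invented path format; it is not D&B's lineage system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceRef:
    dataset: str
    record_id: str
    field_name: str

@dataclass(frozen=True)
class TracedAnswer:
    value: object
    sources: tuple  # tuple of SourceRef

    def __post_init__(self):
        # An answer with no lineage is rejected at construction time.
        if not self.sources:
            raise ValueError("answer has no lineage; refusing to return it")

def trace(answer: TracedAnswer) -> list:
    """Render each source as a path a reviewer could follow back to the record."""
    return [f"{s.dataset}/{s.record_id}#{s.field_name}" for s in answer.sources]

# Usage with invented dataset and record names.
answer = TracedAnswer(
    value="parent company: E-001",
    sources=(SourceRef("ownership", "E-002", "parent_id"),),
)
paths = trace(answer)
```

Because the check lives in the answer type itself, every code path that produces an answer has to supply sources, which is the difference between core lineage and a guardrail bolted on at the end.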

This level of agentic infrastructure is only possible when the foundational data work is complete. Many Chief Data Officers (CDOs) and Chief Information Officers (CIOs) currently find that their AI initiatives are stalled by non-standardized, fragmented data. When data is not normalized, agents produce ambiguous entity matches, which inevitably lead to hallucinations. Without a unified data fabric, layers like KYA or data lineage cannot function. The core of agent-ready data infrastructure is not about expanding storage, but about securing a refined, machine-readable structure that can be verified in real time.
