The current enterprise AI gold rush is largely defined by a fixation on surface-level choices. Most organizations spend their first year of adoption obsessing over which frontier model to license or how to craft the perfect system prompt to minimize hallucinations. They treat AI agents like standalone appliances, plugging them into existing workflows and hoping for a productivity miracle. For a global pharmaceutical giant operating at the scale of Merck, however, building agents one at a time is a recipe for technical debt. The industry is discovering that the bottleneck for AI is no longer the intelligence of the model but the friction of the data pipeline.

The High Cost of Regulatory Friction and Legacy Code

In the pharmaceutical industry, the gap between a creative marketing draft and a published campaign is often measured in months. This delay is not due to a lack of creativity but to the weight of regulatory compliance. Every vaccine advertisement or drug brochure must be meticulously cross-referenced against a fragmented web of regional and national laws. A campaign targeting Georgia in the United States faces entirely different legal constraints than one targeting Canada. Historically, this required human reviewers to manually audit every line, and a single error discovered late in the process would send the entire project back to the drafting stage, restarting the review cycle.

Merck attacked this bottleneck by deploying AI agents that generate regulatory-compliant drafts with 99% accuracy. By shifting the human role from primary creator to final supervisor, the company sped up the deployment of marketing materials by up to 80%, collapsing review cycles from months to days. This shift changes the division of labor: the AI handles the rule-bound rigor of compliance while the human provides the strategic sign-off.

This efficiency extends deeper into the core of the business: drug discovery. The process of analyzing molecular structures and disease states to determine whether a specific condition is druggable typically spans several years. By integrating AI agents directly into the research workflow, Merck reduced specific research cycle durations by 33%. In practical terms, this is equivalent to shortening the overall drug development timeline by one year, which implies a baseline of roughly three years for the affected cycles. For a patient awaiting a life-saving treatment, this year of reclaimed time is the most critical metric of success. It allows researchers to move away from the drudgery of data aggregation and toward high-value hypothesis testing and strategic validation.

Beyond the lab and the marketing office, Merck applied this agentic approach to the most tedious part of IT: legacy modernization. Mapping the architecture of an old application and documenting its data interactions typically costs hundreds of thousands of dollars and takes months of manual auditing. Merck replaced this with a prompt-based agent system. These agents autonomously analyze API endpoints and network paths, verify authentication and authorization protocols, and map the hidden connections between complex systems. The agents do not just document; they execute. They write `Terraform` configurations to define infrastructure as code and port legacy `JavaScript` to `Python`, turning a multi-month modernization project into an automated background process.

The Multi-Cloud Plumbing and the A2A Framework

While the results are impressive, the secret to Merck's success is not the agents themselves but what Sean Finnerty, Vice President of Digital Platforms, calls a plumbing-first strategy. Most companies build AI agents as isolated silos. Finnerty argues that adding agents one by one creates a fragmented ecosystem of technical debt that eventually halts innovation. Instead, Merck focused on the pipes: the underlying infrastructure that allows data to flow seamlessly between models and sources regardless of where they reside.

This is a massive logistical challenge. Merck operates an environment consisting of 2,500 AWS accounts, numerous Azure subscriptions, and a Google Cloud Platform (GCP) integration. Within this sprawl lie petabytes of structured and unstructured data, ranging from Oracle and SQL databases to scattered Excel sheets and audio recordings of phone calls. If an agent has to jump between an AWS console and an Azure portal to find a single piece of context, the latency and permission hurdles make the AI useless. Merck solved this by unifying data access paths, ensuring that the infrastructure is transparent to the agent.

To manage this, Merck utilizes Databricks and Amazon Redshift, integrating them through the Model Context Protocol (MCP) and an Agent-to-Agent (A2A) communication framework. MCP provides a standardized way for models to access external data, meaning that if a data source changes, the model does not need to be retrained or reconfigured. A2A allows specialized agents to collaborate, handing off tasks to one another based on their specific roles. This architecture ensures that developers are not locked into a single cloud provider; they can shift workloads between AWS and GCP based on which environment is optimal for the specific task. Only after this plumbing was installed could Merck safely register thousands of agents and maintain strict security controls over their toolsets.
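The two ideas in this paragraph, a uniform data interface and role-based handoff between agents, can be illustrated with a minimal sketch. It does not use the real MCP or A2A SDKs and is not Merck's implementation. Every name in it (`DataSource`, `InMemorySource`, `Registry`, `hand_off`, the `label-rules` key, and the sample record) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol


class DataSource(Protocol):
    """Hypothetical uniform interface standing in for an MCP-style data tool."""

    def fetch(self, key: str) -> str: ...


@dataclass
class InMemorySource:
    """Toy backend; a real one might wrap a Redshift or Databricks query."""

    name: str
    records: Dict[str, str]

    def fetch(self, key: str) -> str:
        return self.records.get(key, "")


@dataclass
class Agent:
    role: str
    handle: Callable[[str, DataSource], str]


@dataclass
class Registry:
    """Maps a role name to the agent registered for it."""

    agents: Dict[str, Agent] = field(default_factory=dict)

    def register(self, agent: Agent) -> None:
        self.agents[agent.role] = agent

    def hand_off(self, roles: List[str], task: str, source: DataSource) -> str:
        """Pass the task through each role in order; each output feeds the next."""
        result = task
        for role in roles:
            result = self.agents[role].handle(result, source)
        return result


def retrieve(key: str, source: DataSource) -> str:
    # Retriever role: look up context through the uniform interface only.
    return source.fetch(key)


def draft(context: str, source: DataSource) -> str:
    # Drafter role: placeholder for a model call that writes from the context.
    return f"DRAFT: {context}"


registry = Registry()
registry.register(Agent("retriever", retrieve))
registry.register(Agent("drafter", draft))

# Hypothetical sample data, identical in both toy backends.
rules = {"label-rules": "no efficacy claims"}
on_aws = InMemorySource("aws-warehouse", rules)
on_gcp = InMemorySource("gcp-warehouse", rules)

# Same agents, different backend: the result does not depend on where the data lives.
print(registry.hand_off(["retriever", "drafter"], "label-rules", on_aws))
print(registry.hand_off(["retriever", "drafter"], "label-rules", on_gcp))
```

Because the agents only ever call `fetch`, swapping one backend for another requires no change to the agents themselves, which is the property the article attributes to the plumbing layer.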

To combat the persistent problem of hallucinations, Merck implemented a system of mutual verification between models. Rather than relying on a single prompt to ensure accuracy, the company designed a cross-verification loop. For example, a result generated by Anthropic's Claude is sent to Microsoft Copilot for a secondary review. The system assigns a confidence score to the output; if the score falls below a set threshold, the response is filtered out. By putting each output through three iterative rounds of questioning and review, Merck significantly reduced the amount of garbage data in the final output, so that agents do not suggest functions that do not exist in the actual code.
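A cross-verification loop of this kind might look roughly like the sketch below. It is an illustration under stated assumptions, not Merck's system: the generator and reviewer are toy stand-ins for calls to two different models, the `0.8` threshold is invented, and the names `cross_verify`, `toy_generate`, `toy_review`, and `KNOWN_FUNCTIONS` are hypothetical. Only the three-round limit and the idea of filtering on a confidence score come from the article.

```python
from typing import Callable, Optional, Tuple

# (prompt, feedback from the previous round) -> draft
Generator = Callable[[str, Optional[str]], str]
# (prompt, draft) -> (confidence in [0, 1], feedback for the generator)
Reviewer = Callable[[str, str], Tuple[float, str]]


def cross_verify(
    prompt: str,
    generate: Generator,
    review: Reviewer,
    threshold: float = 0.8,  # assumed value; the article does not give one
    max_rounds: int = 3,     # the article mentions three iterative loops
) -> Optional[str]:
    """Return the first draft the reviewer scores at or above the threshold.

    Returns None when no draft passes within max_rounds, i.e. the
    response is filtered out.
    """
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        draft = generate(prompt, feedback)
        score, feedback = review(prompt, draft)
        if score >= threshold:
            return draft
    return None


# Toy stand-ins: the reviewer checks that the function named in the draft
# actually exists, mirroring the "functions that do not exist" failure mode.
KNOWN_FUNCTIONS = {"load_orders", "save_orders"}


def toy_generate(prompt: str, feedback: Optional[str]) -> str:
    # First attempt names a nonexistent function; it corrects after feedback.
    return "call load_orders" if feedback else "call fetch_orders"


def toy_review(prompt: str, draft: str) -> Tuple[float, str]:
    name = draft.split()[-1]
    if name in KNOWN_FUNCTIONS:
        return 1.0, ""
    return 0.0, f"{name} does not exist"


print(cross_verify("read the orders table", toy_generate, toy_review))
```

In this toy run the first draft is rejected, the reviewer's feedback is passed back to the generator, and the second draft passes. A generator that never corrects itself yields `None` after three rounds.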

This architectural philosophy is being mirrored in other sectors, such as at Mastercard. In their experiments with transaction dispute automation, Mastercard uses a similar hybrid approach. Deterministic data, such as whether a card was reported lost, is handled by a rigid, rule-based agent. Probabilistic data, such as the nuance of a customer's complaint, is analyzed by a generative agent. By separating the logic based on the nature of the data and tying them together through a unified pipeline, they have automated complex dispute processes that previously required heavy human intervention.
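The split between deterministic and probabilistic handling can be sketched as a small router. This is a hypothetical illustration, not Mastercard's design: the `Dispute` fields, the decision labels, the rules-first ordering, and the keyword match standing in for a generative model are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Dispute:
    card_reported_lost: bool  # deterministic fact from a system of record
    complaint: str            # free text that needs interpretation


def rule_agent(dispute: Dispute) -> Optional[str]:
    """Deterministic path: return a decision only when a hard rule applies."""
    if dispute.card_reported_lost:
        return "approve"
    return None


def generative_agent(dispute: Dispute) -> str:
    """Probabilistic path: placeholder for a model call that reads the complaint.

    A keyword match stands in for the model so the sketch stays runnable.
    """
    text = dispute.complaint.lower()
    return "escalate" if "never received" in text else "needs_review"


def resolve(dispute: Dispute) -> Tuple[str, str]:
    """Try the rule-based agent first; fall back to the generative agent.

    Returns (decision, which path produced it).
    """
    decision = rule_agent(dispute)
    if decision is not None:
        return decision, "rule"
    return generative_agent(dispute), "generative"


print(resolve(Dispute(True, "My card was stolen last week.")))
print(resolve(Dispute(False, "I never received the item I paid for.")))
```

Running the rules first keeps hard facts out of the model's hands entirely, so the generative path only sees cases that genuinely require interpretation. Other orderings, such as running both agents and reconciling their outputs, would fit the same description.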

When operating thousands of AI agents simultaneously, the primary bottleneck is rarely the inference speed of the model. Instead, the system fails at the point of data transport. The intelligence of the agent is capped by the quality of the plumbing that feeds it.

Scaling AI in the enterprise is no longer a challenge of choosing the smartest model but of building the most robust delivery system. Infrastructure capacity sets the ceiling on how far a business can scale its AI.