The initial promise of the generative AI era was a drastic reduction in development costs. For many engineering teams, the early wins felt like magic: a well-tuned prompt or a successful fine-tuning run could replace weeks of manual coding. However, as these projects move from experimental notebooks to production environments, a harsh reality has set in. The cost of building a model is a one-time hurdle, but the cost of running that model at scale, the production inference phase, is a relentless, resource-heavy marathon. The industry is now hitting a wall where the ability to innovate is limited less by algorithmic brilliance than by the sheer availability of power and silicon.
The Shift Toward the AI Factory
AI services are rapidly migrating from the laboratory phase of model development to a production phase characterized by the continuous generation of massive token volumes. This shift has birthed the concept of the AI Factory, a specialized data center designed not just for storage or general computing, but for the industrial-scale execution of AI workloads. To operate these services commercially, providers require multi-tenant accelerated computing resources. This architecture allows multiple users to share high-performance compute resources in isolated environments, ensuring that response times remain low even as concurrent user counts spike. The economic pressure is now centered on two factors: the speed of deployment and the optimization of operational costs while maintaining high utilization rates.
Historically, building this level of infrastructure was a capital-intensive nightmare. The entry barrier was a vertical wall of upfront investment that excluded all but the wealthiest hyperscalers. NVIDIA is now dismantling this barrier by introducing a new computing access model tailored for startups, model builders, enterprises, research institutions, and regional AI players. By restructuring the economic incentives of infrastructure procurement, NVIDIA is allowing entities with limited liquid capital to access massive compute clusters. This move effectively shifts the financial risk away from the model builder or agent platform provider, allowing them to scale their services without the looming threat of bankruptcy caused by premature infrastructure over-investment.
This evolution mirrors the actual lifecycle of an AI model, which flows from initial training to post-training, fine-tuning, and finally, mass inference. The core of NVIDIA's new approach is providing commercial flexibility during the transition from a pilot project to a commercial product. As user bases grow, compute demand can climb steeply, because every additional user adds a continuous stream of inference requests. In this environment, the ability to secure reliable accelerated computing resources has become the primary variable determining global AI competitiveness.
Financial Engineering as a Catalyst for Compute
When a single high-performance data center GPU costs tens of thousands of dollars, scaling to a cluster of tens of thousands of GPUs can require hundreds of millions to billions of dollars in initial capital. To solve this, NVIDIA has implemented a combination of revenue-sharing and credit-support models. Revenue-sharing allows the costs of infrastructure to be offset by a percentage of the actual revenue generated by the AI services running on that hardware. Credit-support provides the financial cushioning necessary to lower the initial procurement burden. Under this framework, AI cloud providers no longer shoulder the entire financial risk alone; instead, they align their economic interests with NVIDIA.
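To see why the upfront purchase is the sticking point, the hardware bill can be estimated with back-of-the-envelope arithmetic. The sketch below is illustrative only: the unit price is an assumed round figure rather than a quoted NVIDIA price, the function name is hypothetical, and the result covers GPUs alone, leaving out networking, power, cooling, and buildings.

```python
def cluster_hardware_cost_usd(gpu_count: int, unit_price_usd: int) -> int:
    """GPU-only purchase cost; excludes networking, power, cooling, and buildings."""
    if gpu_count < 0 or unit_price_usd < 0:
        raise ValueError("inputs must be non-negative")
    return gpu_count * unit_price_usd


# Hypothetical round figure for one data center GPU, not a quoted price.
ASSUMED_UNIT_PRICE_USD = 30_000

for gpu_count in (1_000, 10_000, 40_000):
    cost = cluster_hardware_cost_usd(gpu_count, ASSUMED_UNIT_PRICE_USD)
    print(f"{gpu_count:>6} GPUs -> ${cost / 1e9:.2f} billion")
```

Under that assumed price, the GPU bill alone passes one billion dollars at 40,000 units, before any facility costs are counted.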
The operational mechanics function as a partnership-based service sales structure. An AI cloud provider procures NVIDIA infrastructure and sells it as a cloud service to AI-native companies or general enterprise clients. NVIDIA, in turn, moves beyond the traditional hardware vendor role. While it still earns standard product revenue from the sale, it now captures a portion of the cloud revenue generated by the supported compute capacity. This transforms the relationship from a simple transactional sale into a structural integration where NVIDIA's profits are directly tied to the utilization rates and service revenue of the hardware.
This strategy creates a recurring, usage-linked earnings stream for NVIDIA. Unlike a one-time hardware sale, this model ensures a steady flow of income that scales with the actual consumption of AI resources. As more customers perform computations and consume tokens, NVIDIA's revenue grows in tandem. This creates a virtuous cycle: the lower barrier to entry accelerates hardware adoption, which increases the total volume of AI workloads, which in turn drives further demand for more hardware.
For the AI cloud provider, this reduces the risk associated with infrastructure expansion. For NVIDIA, it secures market share and establishes a sustainable, long-term revenue model. By linking profit to actual usage metrics rather than just shipping boxes, NVIDIA has turned infrastructure procurement from a purchase problem into a business model design problem, effectively using financial engineering to accelerate the physical deployment of technology.
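The usage-linked structure described above can be made concrete with a small calculation. This is a minimal sketch under stated assumptions: the 10% share, the monthly revenue figures, and the function name are all hypothetical, since the actual contract terms are not given in this article.

```python
def vendor_usage_income(monthly_service_revenue_usd, share_bps):
    """Vendor's share of cloud revenue for each month, in whole dollars.

    share_bps is a hypothetical revenue share in basis points
    (1,000 bps = 10%); real terms are not public.
    """
    if not 0 <= share_bps <= 10_000:
        raise ValueError("share_bps must be between 0 and 10,000")
    return [revenue * share_bps // 10_000 for revenue in monthly_service_revenue_usd]


# Hypothetical ramp: service revenue grows as utilization of the cluster rises.
monthly_revenue_usd = [2_000_000, 4_000_000, 8_000_000, 12_000_000]

income = vendor_usage_income(monthly_revenue_usd, share_bps=1_000)
for month, amount in enumerate(income, start=1):
    print(f"month {month}: ${amount:,}")
print(f"total: ${sum(income):,}")
```

The point of the sketch is the shape of the income rather than the numbers: the vendor's earnings track the provider's service revenue month by month instead of arriving as one payment at the time of sale.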
Bypassing the Physicality of Data Center Construction
To understand the impact of this shift, one must consider the traditional timeline of building a large-scale AI data center. The legacy process is a linear sequence of high-risk dependencies: selecting a site, negotiating power procurement, constructing the physical shell, and finally entering the bring-up phase where hardware is installed and configured. Power procurement is the most volatile variable, often involving years of negotiations with local grids, the installation of new substations, and grueling environmental impact assessments. For a startup, this timeline is an eternity, and the capital locked in during this period represents a massive financial risk.
NVIDIA's new model bypasses these physical bottlenecks by providing immediate access to full-stack accelerated computing. Instead of waiting for a building to be constructed, companies can tap into optimized compute resources where the hardware, networking, and software stacks are already integrated. This includes the pre-optimization of CUDA libraries and the underlying software environment, meaning the time between the decision to scale and the first actual computation shrinks from months or years to a small fraction of that.
This removes the temporal gap and capital risk that previously stifled AI-native firms. Many of these companies previously struggled with creditworthiness or lacked the collateral required for the massive loans needed to build GPU clusters. By replacing the upfront purchase requirement with revenue-sharing and credit support, NVIDIA has decoupled the ability to compute from the ability to build real estate. AI companies can now ignore the logistics of power grids and land zoning and focus entirely on the essential stages of development: training and inference.
Scaling to the 360 MW AI Factory
Real-world applications of this model are already manifesting in massive deployments. Sharon AI, for instance, has opted for this path to bypass physical constraints, deploying up to 40,000 NVIDIA Grace Blackwell GB300 GPUs. The GB300, which integrates the CPU and GPU on a single board to maximize data transfer efficiency, provides a level of compute power that would be nearly impossible for a single entity to secure in a short window through traditional means. This enables the creation of Sovereign AI: independent computing infrastructure that allows a nation or region to operate without total reliance on external providers.
An even more ambitious project is taking shape in Batam, Indonesia, where Permas is constructing the DSX AI Factory campus. This is not a traditional data center but a factory-style facility dedicated to AI operations. The campus boasts a power capacity of 360 MW, with plans to scale up to 170,000 NVIDIA GPUs. To put this in perspective, 360 MW is enough to power dozens of standard data centers simultaneously. A cluster of 170,000 GPUs represents one of the densest concentrations of compute power on Earth, providing the necessary foundation to handle real-time inference requests for the world's largest models.
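The two campus figures can be combined into a rough power budget per GPU. The sketch below simply divides the stated capacity by the planned GPU count. It assumes the full 360 MW is available at the full 170,000-GPU build-out, which the article does not confirm, and the function name is hypothetical.

```python
def facility_power_per_gpu_kw(campus_power_mw: float, gpu_count: int) -> float:
    """Average facility power available per GPU, in kilowatts.

    This is an all-in budget: it also has to cover CPUs, networking,
    storage, and cooling, so it is not the draw of the GPU chip itself.
    """
    if gpu_count <= 0:
        raise ValueError("gpu_count must be positive")
    return campus_power_mw * 1000 / gpu_count


per_gpu_kw = facility_power_per_gpu_kw(360, 170_000)
print(f"{per_gpu_kw:.2f} kW per GPU")
```

The result is a little over 2 kW of facility power per GPU, a campus-level average rather than a hardware specification.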
This massive influx of resources meets the urgent needs of AI-native firms like Baseten, Fireworks AI, and Together AI. These companies operate in a high-pressure pipeline: they move from initial training to post-training, then to fine-tuning, and finally into the stage of agentic inference. Agentic inference is significantly more compute-intensive than simple chat responses, as it requires the AI to reason, plan, and execute complex multi-step tasks autonomously. When a pilot service transitions to full production, the resulting traffic spike can be catastrophic if the infrastructure cannot scale quickly enough. The commercial flexibility provided by NVIDIA's new model allows these firms to expand their hardware footprint in real time as their user base grows.
By eliminating the traditional path of site selection, power negotiation, and construction, these entities are jumping straight into full-stack accelerated computing. The cases of Sharon AI and Permas demonstrate that the capital barrier to entry has been fundamentally lowered. AI companies are no longer forced to be real estate developers or power brokers; they can return to being software architects and model researchers, focusing on performance and service implementation while the infrastructure scales invisibly in the background.




