Varya Slashes Video AI Costs to $0.005 Per Second

The current state of generative video is defined by a frustrating trade-off between cinematic quality and prohibitive operational costs. For most creators and developers, the experience of using high-end video AI involves long queues, expensive credit bundles, and a waiting game that often lasts minutes for a mere five seconds of footage. This economic barrier has kept high-fidelity video generation as a luxury tool for well-funded studios rather than a utility for the broader developer community. The tension lies in the massive compute requirements of diffusion models, where every single frame demands an immense amount of GPU power, making real-time or low-cost iteration nearly impossible.

The Infrastructure of the India AI Mission

Avataar AI is challenging this status quo with the release of Varya, a video generation model designed to dismantle the cost barriers of the medium. Unlike the prevailing industry trend of keeping high-performance weights behind proprietary APIs, Varya is being released as an open-weight model. This distribution is handled through the AI Kosh portal, the central repository for AI models and datasets operated by the Indian government. By providing the model weights and the training data used to build them, Avataar AI allows developers to host the model on their own infrastructure and modify the architecture to suit specific needs, shifting control from the provider to the end user.

This open-source strategy is a direct result of the India AI Mission, a $1.2 billion initiative aimed at bolstering the nation's sovereign AI capabilities. Avataar AI is one of 12 startups selected for this program, which provides subsidized GPU computing resources to alleviate the hardware shortages that typically stifle AI startups. The agreement is a strategic exchange: the Indian government absorbs the massive infrastructure costs, and in return, the resulting models must be made public to ensure the technology serves as a public good.

Beyond the open-weight release, Avataar AI is positioning Varya to capture both the enterprise and ecosystem markets. The company plans to offer specialized versions of the model to corporate clients while pursuing partnerships with established generative tools. Potential integrations with platforms like Higgsfield and Adobe Firefly suggest a strategy where Varya acts as the efficient engine powering a wider array of creative interfaces. To ensure the model is not just fast but relevant, Avataar AI trained Varya on curated datasets specifically reflecting Indian cultural contexts, including local festivals, traditional clothing, architecture, and cuisine, ensuring a level of cultural accuracy that global general-purpose models often miss.

The Distillation Leap from 50 Steps to 4

The most significant breakthrough in Varya is not the data it was trained on, but how it processes information. The primary bottleneck in video AI is the number of inference steps required to turn random noise into coherent footage. Most high-quality models, including Alibaba's open-source Wan 2.2, which serves as the foundation for Varya, require roughly 50 steps of computation to generate a clip. Because each step requires at least one full pass through the network, compute time grows roughly linearly with the step count, and that is what drives the high price tags of commercial video services.
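The role of the step count can be illustrated with a toy sampling loop. This is a minimal sketch, not Varya's or Wan 2.2's code: `ToyDenoiser`, `generate`, and `count_passes` are hypothetical names, and the arithmetic inside the denoiser is only a placeholder for a real network's forward pass. The point is simply that the number of network calls equals the number of steps.

```python
import random


class ToyDenoiser:
    """Hypothetical stand-in for a video diffusion network.

    It only counts how often it is called; the "denoising" is a placeholder
    that shrinks the values toward zero.
    """

    def __init__(self):
        self.calls = 0

    def __call__(self, latent, step):
        self.calls += 1
        shrink = step / (step + 1)  # reaches 0.0 on the final step (step == 0)
        return [value * shrink for value in latent]


def generate(denoiser, num_steps, size=8, seed=0):
    """Start from random noise and refine it num_steps times."""
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in range(size)]
    for step in reversed(range(num_steps)):
        latent = denoiser(latent, step)  # one network pass per step
    return latent


def count_passes(num_steps):
    """Return how many network passes a run with num_steps steps makes."""
    denoiser = ToyDenoiser()
    generate(denoiser, num_steps)
    return denoiser.calls


if __name__ == "__main__":
    for steps in (50, 4):
        print(f"{steps} steps -> {count_passes(steps)} network passes")
```

In a real model each of those passes is the expensive part, which is why cutting the step count is such a direct lever on cost.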

Avataar AI used a technique known as distillation to compress the knowledge of the larger Wan 2.2 model into a leaner, more efficient version. By distilling the model, they reduced the required inference steps from 50 down to just 4. This is not a marginal improvement but a fundamental shift in efficiency. When generating a 5-second 720p clip on an NVIDIA H200 GPU, the difference is stark: while the original Wan 2.2 takes 1,230 seconds to complete the task, Varya finishes the same job in 45 seconds. By those figures, generation is roughly 27 times faster, well beyond a 10x improvement.
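The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers in the article; the constant and function names are illustrative, and the per-step values fold in any fixed overhead (such as encoding and decoding), which the article does not break out.

```python
# Figures as reported in the article (5-second 720p clip, one NVIDIA H200).
WAN_STEPS = 50
VARYA_STEPS = 4
WAN_SECONDS = 1230.0
VARYA_SECONDS = 45.0
CLIP_SECONDS = 5.0


def step_reduction():
    """How many times fewer inference steps the distilled model uses."""
    return WAN_STEPS / VARYA_STEPS


def wall_clock_speedup():
    """How many times faster the reported end-to-end generation is."""
    return WAN_SECONDS / VARYA_SECONDS


def seconds_per_step(total_seconds, steps):
    """Average wall-clock time per step, including any fixed overhead."""
    return total_seconds / steps


def gpu_seconds_per_output_second(total_seconds):
    """GPU time spent for each second of finished video."""
    return total_seconds / CLIP_SECONDS


if __name__ == "__main__":
    print(f"step reduction:     {step_reduction():.1f}x")
    print(f"wall-clock speedup: {wall_clock_speedup():.1f}x")
    print(f"Wan 2.2 s/step:     {seconds_per_step(WAN_SECONDS, WAN_STEPS):.2f}")
    print(f"Varya s/step:       {seconds_per_step(VARYA_SECONDS, VARYA_STEPS):.2f}")
    print(f"Varya GPU s per video s: {gpu_seconds_per_output_second(VARYA_SECONDS):.1f}")
```

By these numbers the wall-clock speedup (about 27x) is larger than the step reduction alone (12.5x). The article does not say what accounts for the difference. Even at 45 seconds, each second of video still takes about 9 seconds of GPU time, so generation is not yet real time.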

This technical optimization translates directly into a disruptive pricing model. While industry leaders like Google's Veo, Kling AI, Luma AI, and Runway typically charge $0.10 or more per second of generated video, Varya's planned hosting service is priced at $0.005 per second (approximately ₹0.48). By cutting the compute overhead by more than an order of magnitude, Avataar AI can offer the service at about one-twentieth of a $0.10-per-second rate. The shift suggests that the next frontier of AI competition is not necessarily about increasing model size, but about the aggressive optimization of inference.
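To make the pricing concrete, the sketch below works out the cost of a 5-second clip at both rates and the revenue per GPU-hour that the planned price would imply. It rests on assumptions the article does not state: that the hosted service runs at the reported 45 seconds per clip, with one clip at a time on one fully utilized GPU. The function names are illustrative.

```python
COMPETITOR_PRICE_PER_SECOND = 0.10  # USD, "or more" according to the article
VARYA_PRICE_PER_SECOND = 0.005      # USD, planned hosting price
CLIP_SECONDS = 5.0
VARYA_GPU_SECONDS_PER_CLIP = 45.0   # reported H200 time for one 5-second clip


def clip_cost(price_per_second, clip_seconds=CLIP_SECONDS):
    """Price charged to the user for one clip."""
    return price_per_second * clip_seconds


def price_ratio():
    """How many times cheaper the planned price is than a $0.10 rate."""
    return COMPETITOR_PRICE_PER_SECOND / VARYA_PRICE_PER_SECOND


def implied_revenue_per_gpu_hour(price_per_second, gpu_seconds_per_clip):
    """Revenue from one GPU producing clips back to back for an hour.

    Assumes one clip at a time and 100% utilization (an idealization).
    """
    clips_per_hour = 3600.0 / gpu_seconds_per_clip
    return clips_per_hour * clip_cost(price_per_second)


if __name__ == "__main__":
    print(f"clip at $0.10/s:  ${clip_cost(COMPETITOR_PRICE_PER_SECOND):.3f}")
    print(f"clip at $0.005/s: ${clip_cost(VARYA_PRICE_PER_SECOND):.3f}")
    print(f"price ratio:      {price_ratio():.0f}x")
    revenue = implied_revenue_per_gpu_hour(
        VARYA_PRICE_PER_SECOND, VARYA_GPU_SECONDS_PER_CLIP
    )
    print(f"implied revenue per GPU-hour: ${revenue:.2f}")
```

Under these assumptions a 5-second clip costs $0.025 instead of $0.50, and a single GPU would bring in about $2.00 per hour. Whether that covers hardware costs depends on details the article does not give, such as batching, utilization, and the subsidized compute mentioned earlier.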

This approach demonstrates a pivot in AI development strategy. Rather than attempting to build a massive foundation model from scratch, a process that requires billions of dollars and astronomical energy consumption, Avataar AI focused on distilling an existing open-source model and curating niche, high-quality data. By attacking the cost of inference, they have moved video AI from a high-cost experimental tool toward a commercially viable utility.

Efficiency is now the primary metric for the democratization of generative video.
