For the past two months, a mysterious entity known as Owl Alpha has been quietly dominating the OpenRouter leaderboards. Developers across the globe were feeding their most complex prompts into this anonymous engine, watching it handle massive codebases and intricate logic with a precision that rivaled the industry's most guarded proprietary models. It was a rare moment of blind testing at scale, where the community judged a model solely on its output rather than its pedigree. This week, the mask finally came off, revealing that the engine behind the performance is LongCat-2.0, a massive open-source project from the Chinese delivery giant Meituan.

The Architecture of a 1.6 Trillion Parameter Giant

Meituan has officially released LongCat-2.0 on GitHub and Hugging Face, positioning it as a powerhouse for enterprise-grade coding and agentic workflows. At its core, LongCat-2.0 is a Mixture-of-Experts (MoE) model boasting a staggering 1.6 trillion total parameters. Despite this scale, the model is designed for efficiency. It utilizes a sparsity optimization strategy where only a fraction of the network is active for any given token. On average, the model activates 48 billion parameters per token, with a dynamic range that fluctuates between 33 billion and 56 billion depending on the complexity of the query. To further eliminate the computational waste typically found in ultra-large models, Meituan implemented a Zero-Compute Experts framework, which routes simple execution elements through lightweight sub-networks to avoid idle operation overhead.

One of the most provocative aspects of LongCat-2.0 is its origin story. While the AI world remains locked in a fierce struggle for Nvidia H100s, Meituan completed the full training of this model on a cluster of over 50,000 Chinese-made ASICs (Application-Specific Integrated Circuits). This serves as a critical proof of concept that trillion-parameter models can be realized without total reliance on the standard GPU supply chain. The model is released under the MIT license, making it immediately viable for commercial integration, and it ships with a native context window of 1 million tokens.

Meituan is also introducing a disruptive API pricing model designed to encourage the use of long-context windows. The company has implemented a cache-hit system where requests that trigger a context cache hit are processed entirely for free. For non-cached requests, the standard pricing is set at $0.75 per million input tokens and $2.95 per million output tokens. To accelerate adoption, a limited-time promotion has further reduced these costs to $0.30 per million input tokens and $1.20 per million output tokens. The scale of the model's previous stealth phase as Owl Alpha is evident in its traffic data, having processed an average of 10.1 trillion tokens per month, with daily throughput reaching 559 billion tokens.

From Chatbot to Autonomous Agent

The true distinction of LongCat-2.0 lies in how it handles the memory and reasoning bottlenecks that plague most long-context models. To maintain stability across its 1-million-token window, Meituan developed LongCat Sparse Attention (LSA). This technology solves memory fragmentation and computational costs through three distinct indexing mechanisms. First, Streaming-aware Indexing (SI) combines hardware-aligned sequential data reads with dynamic random selection to maximize High Bandwidth Memory (HBM) utilization. Second, Cross-Layer Indexing (CLI) leverages the fact that attention saliency remains stable across adjacent hidden layers, allowing a single indexing pass to guide inference across multiple layers. Finally, Hierarchical Indexing (HI) employs a two-stage scoring process, using block-level recall to filter candidates before performing fine-grained token selection.

To enhance its grasp of local token relationships and reduce memory I/O bottlenecks, the model integrates an N-gram embedding module. By adding 135 billion parameters to a 5-gram token combination framework, Meituan expanded the embedding space by approximately 100 times. This allows the model to capture nuanced local patterns in code and text, significantly increasing the speed of large-batch inference.

However, the most significant leap is found in the post-training optimization. Rather than using a single reward function, Meituan employed Multi-Teacher Optimization via Mixture of Specialized Experts (MOPD). This approach splits the optimization process into three specialized expert clusters. Agent Experts focus on structural execution, such as tool calling, API parameter parsing, and self-correction loops. Reasoning Experts handle multi-step logic, Chain-of-Thought processing, and STEM problem-solving. Interaction Experts ensure instruction following, factual grounding, and the maintenance of safety guardrails.

This specialized training manifests in the benchmarks. In the SWE-bench Pro evaluation, which tests a model's ability to resolve real-world GitHub issues, LongCat-2.0 scored 59.5, surpassing GPT-5.5's score of 58.6. The model also demonstrated high proficiency in other technical environments, scoring 70.8 on Terminal-Bench 2.1, 77.3 on SWE-bench Multilingual, and 73.2 on FORTE, a simulator for corporate workflows.

For the developer, this means LongCat-2.0 should not be viewed as a conversational assistant, but as a specialized tool for agentic tasks. The MOPD-trained agent layers provide a level of precision in API manipulation and self-correction that general-purpose models often lack. When combined with the free context caching, the model becomes an economically viable engine for analyzing entire repositories or building autonomous agents that must reference massive amounts of documentation without incurring prohibitive costs.

By decoupling high-end model performance from specific hardware vendors and proprietary silos, LongCat-2.0 shifts the landscape of open-source AI toward autonomous, agent-driven engineering.