Cohere Command A+ Shifts to Apache 2.0 to Scale Sovereign AI

Enterprise AI developers have long operated under a frustrating paradox. On one hand, the industry promises the democratization of intelligence through open weights; on the other, the fine print of licenses like CC-BY-NC often turns those weights into a gilded cage, forbidding commercial use or imposing restrictive terms that make legal departments shudder. This tension is compounded by the token tax: tokenizers split non-English text into more tokens than equivalent English text, driving up latency and operational costs for global firms. The developer community has been waiting, with cautious anticipation, for a frontier-class model that is truly open and computationally efficient enough to run entirely outside a cloud provider's ecosystem.

The Convergence of 218B Parameters and Apache 2.0

Cohere has responded to this tension by releasing Command A+, a model that fundamentally alters the accessibility of high-parameter AI. The most immediate shock to the system is the license. By adopting the Apache 2.0 license, Cohere has stripped away the commercial restrictions that plagued the Command R series. This move allows any entity, from a solo developer to a Fortune 500 corporation, to modify, distribute, and commercially deploy the model without the legal friction of non-commercial clauses. It is a strategic pivot toward Sovereign AI, enabling organizations to maintain absolute control over their data and model weights within their own secure infrastructure.

Under the hood, Command A+ employs a decoder-only Sparse Mixture-of-Experts (MoE) Transformer architecture. While the model has a total parameter count of 218 billion (218B), it activates only 25 billion (25B) parameters on any single forward pass. This design allows the model to retain the broad knowledge base and nuanced reasoning of a very large model while keeping the inference speed and cost profile of a much smaller one. By routing each token only to the most relevant expert networks, Cohere has largely decoupled model capacity from per-token compute cost.
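The routing idea can be illustrated with a toy top-k gate. This is a minimal sketch and not Cohere's implementation: the expert count, the value of k, the gate scores, and the toy expert functions are all assumptions chosen for illustration. Only the 218B total and 25B active figures come from the article.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    top = order[:k]
    weights = softmax([gate_scores[i] for i in top])
    return list(zip(top, weights))

def moe_forward(x, experts, gate_scores, k):
    """Weighted sum over the selected experts only; the others are never evaluated."""
    out = 0.0
    for idx, weight in route_top_k(gate_scores, k):
        out += weight * experts[idx](x)
    return out

# Hypothetical toy setup: 8 experts, 2 active per token.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
gate_scores = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]

selected = route_top_k(gate_scores, k=2)
y = moe_forward(3.0, experts, gate_scores, k=2)

# Fraction of parameters active per forward pass, from the article's figures.
active_fraction = 25 / 218
print(selected, y, round(active_fraction, 3))
```

In this toy setup only two of the eight expert functions run for the input, which is the same mechanism that lets a 218B-parameter model do roughly 11% of the dense compute on each pass.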

This efficiency extends to the model's linguistic processing. Cohere has implemented an optimized tokenizer supporting 48 languages natively, specifically targeting the inefficiencies of non-European scripts. The results are concrete: token efficiency for Korean has improved by 16%, Japanese by 18%, and Arabic by 20%. For an enterprise running millions of API calls, these percentages translate directly into lower operational expenditure (OPEX) and faster response times. Furthermore, the model supports a native multimodal architecture for text and images, paired with a 128K context window, allowing it to ingest massive corporate documents or complex charts in a single session without losing coherence.
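To make the cost effect concrete, here is a rough back-of-the-envelope sketch. It assumes that "improved by X%" means the same text encodes into X% fewer tokens, which the article does not state explicitly. The workload size and the price per million tokens are made-up placeholders, not Cohere pricing.

```python
# Improvements quoted in the article, per language.
IMPROVEMENT = {"Korean": 0.16, "Japanese": 0.18, "Arabic": 0.20}

def tokens_after(tokens_before, improvement):
    # Assumption: an X% efficiency gain means X% fewer tokens for the same text.
    return tokens_before * (1 - improvement)

def monthly_cost(tokens, price_per_million):
    """Cost of a token volume at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million

# Hypothetical workload: 2 billion tokens per month at $1.00 per million tokens.
baseline_tokens = 2_000_000_000
price = 1.00

for lang, gain in IMPROVEMENT.items():
    before = monthly_cost(baseline_tokens, price)
    after = monthly_cost(tokens_after(baseline_tokens, gain), price)
    print(f"{lang}: {before:.2f} -> {after:.2f} (saves {before - after:.2f})")
```

Under these assumptions the saving scales linearly with volume, and the same reduction in token count also shortens generation time for a given response.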

Erasing the Quantization Tax with W4A4 and MoE

Until now, the primary barrier to running 200B+ parameter models on-premises was the quantization tax. Traditionally, compressing a model to 4-bit or 8-bit precision to save memory resulted in a noticeable degradation of reasoning capabilities, forcing developers to choose between hardware affordability and intelligence. Cohere has sidestepped this trade-off using a W4A4 (Weight 4-bit, Activation 4-bit) quantization format combined with Quantization-Aware Distillation. Instead of blanket compression, Cohere uses a hybrid approach: the critical attention paths are maintained at high precision, while the MoE expert networks are compressed to 4-bit. This preserves the quality of the reasoning process while sharply reducing the memory footprint.
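The hybrid policy can be sketched with a simple symmetric 4-bit quantizer. This is an illustration under stated assumptions, not Cohere's W4A4 recipe: it covers weights only (no activation quantization and no Quantization-Aware Distillation), uses a single scale per weight list, and the layer-name prefixes are hypothetical.

```python
def quantize_int4(values):
    """Symmetric 4-bit quantization: integers in [-8, 7] plus one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map 4-bit integers back to approximate float values."""
    return [qi * scale for qi in q]

def quantize_layer(name, weights):
    """Hybrid policy: attention weights untouched, everything else round-tripped via int4.

    The "attention" prefix is a hypothetical naming convention for this sketch.
    """
    if name.startswith("attention"):
        return weights
    q, scale = quantize_int4(weights)
    return dequantize(q, scale)

weights = [0.9, -0.33, 0.12, 0.05]
q, scale = quantize_int4(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

The rounding error of each weight is bounded by half the scale, which is why schemes like this tend to keep sensitive paths at higher precision and accept the error only where the model tolerates it.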

The hardware implications are significant. Command A+ can be deployed on a single NVIDIA Blackwell B200 GPU or two NVIDIA H100 GPUs, a stark contrast to the multi-GPU clusters typically required for models of this scale. In low-concurrency environments, the model generates 375 tokens per second with a Time-to-First-Token (TTFT) of 113 milliseconds. Compared with its predecessor, Command A Reasoning, output speed has increased by up to 63% and latency has decreased by 17%. This shifts the conversation from whether a company can afford the hardware to how quickly it can deploy the agent.
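A quick arithmetic check shows why 4-bit weights matter for this hardware claim. The sketch below is a rough estimate under simplifying assumptions: it counts weight storage only (ignoring the higher-precision attention paths, the KV cache, and activations), assumes the 80 GB variant of the H100, and treats the quoted throughput as constant over a whole response.

```python
TOTAL_PARAMS = 218e9  # total parameter count quoted in the article
GB = 1e9

def weight_bytes(params, bits):
    """Storage needed for `params` weights at `bits` bits each."""
    return params * bits / 8

def response_seconds(n_tokens, ttft_s=0.113, tokens_per_s=375):
    """Approximate wall time: time to first token plus steady-state generation."""
    return ttft_s + n_tokens / tokens_per_s

fp16_gb = weight_bytes(TOTAL_PARAMS, 16) / GB  # 436 GB at 16-bit
int4_gb = weight_bytes(TOTAL_PARAMS, 4) / GB   # 109 GB at 4-bit
two_h100_gb = 2 * 80                           # assumes 80 GB per H100

print(f"16-bit weights: {fp16_gb:.0f} GB, 4-bit weights: {int4_gb:.0f} GB")
print(f"Fits in two H100s at 4-bit: {int4_gb < two_h100_gb}")
print(f"500-token response: {response_seconds(500):.2f} s")
```

At 16-bit the weights alone would exceed the combined memory of two 80 GB cards several times over, while the 4-bit figure leaves headroom, which is consistent with the deployment claim above.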

This technical optimization has triggered a massive leap in benchmark performance, particularly in reasoning and agentic tasks. On τ²-Bench Telecom, which measures complex reasoning in a domain-specific context, the score surged from 37% to 85%. The model's ability to operate within a terminal environment, measured by Terminal-Bench Hard, climbed from a negligible 3% to 25%. Most impressively, on the AIME 25 mathematics benchmark, Command A+ jumped from 57% to 90%, placing it in direct competition with closed-source frontier models and DeepSeek V4 Pro. These are not just incremental gains; they represent a qualitative shift in the model's ability to handle multi-step logic and tool use.

To solve the persistent problem of hallucinations in enterprise settings, Cohere introduced native Grounding Spans. Rather than simply summarizing a source, the model uses specific tags to mark the exact range of the original text that supports its answer. In high-stakes industries like law, finance, or medicine, this traceability transforms the AI from a black-box generator into a verifiable research tool. By integrating this at the model level, Cohere has reduced the complexity of designing reliable AI agents, removing the need for developers to spend hundreds of hours engineering prompts to force the model to cite its sources.
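On the consuming side, span-level grounding lets an application check each claim against its source mechanically. The sketch below assumes a hypothetical tag format (`<span doc="..." start="..." end="...">`) with character offsets. The article does not describe the actual tag syntax, so the format, the function names, and the sample document are all invented for illustration.

```python
import re

# Hypothetical tag format, invented for this sketch; not Cohere's actual output schema.
SPAN_RE = re.compile(r'<span doc="(\w+)" start="(\d+)" end="(\d+)">(.*?)</span>', re.S)

def extract_spans(answer):
    """Return (doc_id, start, end, claim) tuples for every grounded claim."""
    return [(doc, int(s), int(e), claim) for doc, s, e, claim in SPAN_RE.findall(answer)]

def verify(answer, documents):
    """Look up the cited character range for each claim and flag invalid ranges."""
    report = []
    for doc, start, end, claim in extract_spans(answer):
        text = documents.get(doc, "")
        valid = 0 <= start < end <= len(text)
        evidence = text[start:end] if valid else None
        report.append({"claim": claim, "doc": doc, "evidence": evidence, "valid": valid})
    return report

documents = {"contract": "The lease term is 24 months. Rent is due on the first of each month."}
answer = 'The lease runs for <span doc="contract" start="0" end="28">two years</span>.'

report = verify(answer, documents)
for item in report:
    print(item["claim"], "<-", item["evidence"])
```

A reviewer or an automated check can then compare each claim with the exact supporting text instead of re-reading the whole source document.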

Command A+ effectively signals the end of the era where enterprises had to trade sovereignty for performance.
