For years, the directive from defense security officers has been a blunt instrument: halt the adoption of any AI model that cannot guarantee absolute compliance with security regulations. In the high-stakes environment of national security, where a single data leak can compromise an entire operation, the performance of a Large Language Model is secondary to the integrity of the perimeter. Government agencies have long been trapped between the desire for the transformative productivity of generative AI and the reality of closed-box proprietary systems that offer little transparency and too many vulnerabilities. This tension created a stagnant gap where the public sector watched the private sector accelerate, unable to bridge the divide between cutting-edge intelligence and rigid security mandates.

The Architecture of Sovereign Intelligence

AWS is addressing this deadlock by integrating OpenAI GPT OSS and NVIDIA Nemotron open-weight models into Amazon Bedrock within the AWS GovCloud (US) region. This deployment is not a mere addition of tools but a strategic alignment of high-performance AI with the most stringent regulatory frameworks in the world. The available model lineup is tiered to handle diverse operational needs, featuring OpenAI's `gpt-oss-120b` and `gpt-oss-20b`, alongside NVIDIA Nemotron Nano 9B v2, Nano 12B v2, Nano 30B, and Super 120B.

The OpenAI GPT OSS series functions as a text-to-text engine optimized for complex reasoning, agentic workflows, and developer-centric tasks. The `gpt-oss-120b` variant, with its 120 billion parameters, is designed for general-purpose production environments where high-level cognitive reasoning is non-negotiable. Conversely, the `gpt-oss-20b` model focuses on reducing latency, making it the ideal choice for specialized tasks or local environment optimizations where speed is the primary metric. Both models in the GPT OSS family provide a 128K token context window, allowing agencies to ingest and analyze massive volumes of documentation in a single pass, with a maximum output capacity of 16K tokens.

Because these are open-weight models, the underlying learned parameters are accessible for evaluation. This allows organizations to independently audit the model architecture and run proprietary benchmarks to verify behavior before a single prompt is ever sent in a live environment. This capability is housed within the AWS GovCloud (US) infrastructure, which is physically located in the United States and managed exclusively by U.S. citizens. The environment meets FedRAMP High standards and complies with DoD SRG Impact Levels 2, 4, and 5. Furthermore, it satisfies the International Traffic in Arms Regulations (ITAR) and the Criminal Justice Information Services (CJIS) frameworks, ensuring that the defense and intelligence communities can operate without compromising legal or security mandates.

The Zero Trust Pivot and API Flexibility

While the presence of the models is significant, the true shift lies in the implementation of Zero Operator Access. In a traditional cloud setup, there is always a theoretical risk of administrative access. AWS has eliminated this by ensuring that zero operators—not from AWS, not from the model provider, and not even from the customer's own administrative team—can physically access inference prompts or the resulting outputs. This is achieved through Model Deployment Account isolation, a technique that completely separates the account executing the model from the account managing the infrastructure. When combined with the physical isolation of GovCloud, the risk of data exfiltration is effectively neutralized at the architectural level.

To make this secure environment usable for developers, AWS has implemented a dual-endpoint system. The first is `bedrock-mantle`, an OpenAI-compatible HTTPS API endpoint. This allows developers to use existing OpenAI Python and TypeScript SDKs to call Chat Completions and Responses APIs. The strategic advantage here is seamless migration; teams can transition their existing AI application code to these secure open-weight models with almost zero modification. The second path is the `bedrock-runtime` endpoint, which utilizes the AWS SDK to call Converse and InvokeModel APIs. This path is specifically optimized for interactive interfaces and allows for the direct integration of Amazon Bedrock Guardrails, enabling real-time filtering of model responses to ensure they adhere to strict policy and security guidelines.

This infrastructure removes the operational burden of GPU provisioning, accelerator optimization, and complex deployment cycles. By treating the models as managed services via API, government agencies can shift their focus from the physics of hardware to the logic of application. The distinction between NVIDIA Nemotron and OpenAI GPT OSS further refines this utility. Nemotron is positioned as the efficiency leader, offering Small Language Models (SLMs) and LLMs that maximize resource utilization. It is designed for specialized agent AI systems where precision and low computational cost are paramount, especially in resource-constrained environments. OpenAI GPT OSS, meanwhile, handles the heavy lifting of logical deduction and complex developer workflows, allowing operators to balance cost and performance by selecting the model size that matches the complexity of the task.

This open-weight approach transforms the security model from one of trust to one of verification. By reviewing model cards and executing independent benchmarks, security teams can apply Zero Trust principles to AI. Instead of trusting a provider's claim that a model is safe, the agency verifies the model's behavior against its own representative workloads. In practice, this means a defense agency might use Nemotron for automated security control evaluations where transparency is key, while deploying `gpt-oss-120b` for multi-document synthesis and contract analysis where deep logical connectivity across vast datasets is required.

Optimizing Inference through Strategic Tiering

To prevent the common pitfall of over-provisioning expensive resources, Amazon Bedrock introduces three distinct service tiers tailored to workload characteristics. The Standard tier operates on an on-demand basis, where users pay only for the tokens they consume. This is the most efficient choice for experimental workloads or tasks with unpredictable usage patterns, as it eliminates fixed infrastructure costs. For mission-critical, user-facing applications where latency directly impacts operational effectiveness, the Priority tier ensures that requests are routed with higher precedence to minimize response times.

For non-time-sensitive tasks, such as large-scale batch summarization or exhaustive model evaluation, the Flex tier provides a low-cost alternative. By prioritizing cost reduction over immediate speed, the Flex tier allows agencies to process massive datasets without exhausting their budgets. This tiered approach allows operators to strategically distribute their workloads based on urgency and budget, optimizing the balance between inference performance and fiscal responsibility.

Crucially, all these processes remain strictly within the AWS GovCloud (US) boundary. To maintain absolute data residency, AWS has disabled global cross-region inference for these models. While cross-region inference typically increases throughput by distributing requests across global commercial regions, it introduces the risk of data leaving the U.S. physical border. By restricting all requests and responses to a single region or specific geo cross-region options managed by U.S. citizens, AWS ensures that the data residency requirements of the most sensitive government missions are met without exception. This creates a blueprint for sovereign AI, where the power of the world's most advanced models is harnessed without sacrificing the physical and legal boundaries of national security.

This shift toward open-weight, isolated AI environments signals the end of the era where government agencies had to choose between security and innovation. By combining Zero Operator Access with the transparency of open weights, the infrastructure now supports a regime of continuous verification, ensuring that the AI tools used to protect national interests are as secure as the data they process.