The 300K ops/sec Rust Engine Built by AI Agents in 4 Weeks

The current era of AI-assisted coding has largely been defined by the snippet. Most developers use LLMs to generate a boilerplate function, debug a stubborn regex, or write a unit test for a simple edge case. However, a persistent ceiling remains: the transition from a functional prototype to a production-ready system. In the world of distributed systems, where a single race condition or a subtle logic error in a consensus algorithm can lead to catastrophic data corruption, the industry has remained skeptical about whether AI agents can handle the architectural heavy lifting. This week, a project emerged that shatters that ceiling, demonstrating that the combination of high-reasoning models and a rigorous verification framework can build a complex, high-performance system in a fraction of the traditional development time.

The 130,000-Line Sprint to Distributed Consensus

The project in question is a Rust-based multi-Paxos distributed consensus engine, designed to replicate the functionality of the Replicated State Library (RSL) used in Azure services. The scale of the achievement shows in the raw metrics. While the entire project spanned three months, the core implementation of 100,000 lines of Rust code was completed in just four weeks. By the end of the build, the codebase had grown to over 130,000 lines. In a traditional engineering environment, a system of this complexity, which requires deep expertise in memory safety, network concurrency, and the mathematical nuances of the Paxos algorithm, would typically occupy a team of senior engineers for many months, if not years.

This velocity was not the result of a single tool, but a strategic orchestration of AI agents. The developer utilized Claude Code and Codex CLI as the primary drivers for terminal-based execution and code generation. These were supplemented by GitHub Copilot, Augment Code, Kiro, and Trae to handle auxiliary tasks. To maintain this pace, the developer treated AI rate limits as a primary engineering bottleneck. This involved subscribing to the Anthropic Max plan at $100 per month to unlock higher throughput and maintaining two separate ChatGPT Plus accounts to rotate usage and bypass daily message caps. The development environment was intentionally lean; VS Code was relegated to a secondary role for diffing and fine-tuning, while the bulk of the interaction happened asynchronously within the terminal to maximize the AI's autonomy.

However, speed without stability is a liability in distributed systems. To prevent the project from collapsing under its own weight, the developer implemented a massive verification layer. Test code makes up over 65% of the total codebase and comprises more than 1,300 individual test cases. The suite was not limited to simple unit tests. It formed a multi-tiered defense: minimal integration tests involving only a proposer and an acceptor, full-scale integration tests with multiple active replicas, and aggressive failure-injection tests designed to simulate network partitions and node crashes. This layered approach is what allowed the AI-generated code to be held to the level of integrity a consensus engine requires.
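
As a rough illustration of the failure-injection tier, the sketch below models a cluster in which nodes can be crashed and recovered, and checks whether a proposal can still gather a majority. The `Cluster` type and its methods are hypothetical names invented for this example; they are not taken from the project, and a real test would drive actual replicas over a simulated network.

```rust
use std::collections::HashSet;

/// Hypothetical in-memory cluster used only for this sketch.
pub struct Cluster {
    size: usize,
    /// Node ids that are currently unreachable (crashed or partitioned away).
    down: HashSet<usize>,
}

impl Cluster {
    pub fn new(size: usize) -> Self {
        Cluster { size, down: HashSet::new() }
    }

    /// Inject a failure: the node stops acknowledging proposals.
    pub fn crash(&mut self, node: usize) {
        self.down.insert(node);
    }

    /// Undo an injected failure.
    pub fn recover(&mut self, node: usize) {
        self.down.remove(&node);
    }

    /// A proposal commits only if a strict majority of nodes acknowledge it.
    pub fn try_commit(&self) -> bool {
        let acks = (0..self.size).filter(|n| !self.down.contains(n)).count();
        acks > self.size / 2
    }
}
```

With five nodes, crashing two still leaves a majority of three, so `try_commit()` returns `true`; crashing a third leaves two reachable nodes and `try_commit()` returns `false` until one of them recovers.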

From Code Generation to Architectural Optimization

The real shift in this project occurs when the focus moves from writing lines of code to ensuring correctness and squeezing out performance. The most dangerous aspect of using AI for complex algorithms like Paxos is the risk of "silent failures": code that compiles and runs but produces subtly incorrect results under specific conditions. To guard against this, the developer adopted code contracts, a methodology in which pre-conditions, post-conditions, and invariants are explicitly defined for every critical function. The developer had the GPT-5 High model draft these contracts, which act as a precise, checkable statement of how the code must behave.
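
To make the idea of pre-conditions, post-conditions, and invariants concrete, here is a minimal sketch of a single-slot Paxos acceptor with its contracts written as `debug_assert!` checks. The `Acceptor` type, its fields, and the specific contracts are assumptions made for illustration; the project's actual contracts are not shown in the source.

```rust
/// Hypothetical single-slot Paxos acceptor (names are illustrative only).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Acceptor {
    /// Highest ballot this acceptor has promised not to undercut.
    promised: u64,
    /// Ballot and value of the last accepted proposal, if any.
    accepted: Option<(u64, String)>,
}

impl Acceptor {
    /// Invariant: an accepted ballot never exceeds the promised ballot.
    fn invariant(&self) -> bool {
        self.accepted.as_ref().map_or(true, |(b, _)| *b <= self.promised)
    }

    /// Phase 1. Returns the previously accepted proposal on success,
    /// or the already-promised ballot on rejection.
    pub fn prepare(&mut self, ballot: u64) -> Result<Option<(u64, String)>, u64> {
        debug_assert!(self.invariant(), "invariant broken on entry");
        let old_promised = self.promised;
        let result = if ballot > self.promised {
            self.promised = ballot;
            Ok(self.accepted.clone())
        } else {
            Err(self.promised)
        };
        // Post-condition: the promised ballot never decreases.
        debug_assert!(self.promised >= old_promised);
        debug_assert!(self.invariant(), "invariant broken on exit");
        result
    }

    /// Phase 2. Accepts the proposal unless a higher ballot was promised.
    pub fn accept(&mut self, ballot: u64, value: String) -> bool {
        // Pre-condition: ballot 0 is reserved for "nothing promised yet".
        debug_assert!(ballot > 0, "pre-condition: ballot must be nonzero");
        debug_assert!(self.invariant(), "invariant broken on entry");
        let ok = ballot >= self.promised;
        if ok {
            self.promised = ballot;
            self.accepted = Some((ballot, value));
        }
        // Post-condition: on success, the accepted ballot is exactly `ballot`.
        debug_assert!(!ok || self.accepted.as_ref().map(|(b, _)| *b) == Some(ballot));
        debug_assert!(self.invariant(), "invariant broken on exit");
        ok
    }
}
```

Because `debug_assert!` is compiled out of release builds by default, checks written this way run during testing without adding overhead to an optimized binary, which is one plausible way to keep contracts active only in the testing phase.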

These contracts were then converted into runtime asserts during the testing phase, creating an alarm that fires the moment an invariant is violated. To push this further, the developer employed property-based testing. Instead of relying only on hand-written static test cases, the AI was tasked with producing tests that generate tens of thousands of random inputs to stress the contracts, uncovering edge cases that a human architect would likely have missed. The approach revives the ideas behind .NET Code Contracts, with the reasoning capabilities of modern LLMs used to draft the contracts.
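
The following is a small, dependency-free sketch of what a property-based check can look like. It tests one property that Paxos relies on, namely that any two majority quorums share at least one node, against randomly generated clusters. The generator, the function names, and the choice of property are all assumptions for illustration; a real project would more likely use a crate such as `proptest` or `quickcheck`, and the source does not say which was used.

```rust
/// Minimal xorshift64 generator so the sketch needs no external crates.
pub struct XorShift(u64);

impl XorShift {
    pub fn new(seed: u64) -> Self {
        XorShift(seed.max(1)) // the state must never be zero
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Correct quorum size: the smallest strict majority of `n` nodes.
pub fn majority(n: u32) -> u32 {
    n / 2 + 1
}

/// Deliberately buggy quorum size (off by one for even `n`).
pub fn off_by_one_quorum(n: u32) -> u32 {
    (n + 1) / 2
}

/// Pick a random set of exactly `k` of the nodes 0..n, encoded as a bitmask.
fn random_quorum(rng: &mut XorShift, n: u32, k: u32) -> u64 {
    let mut mask = 0u64;
    while mask.count_ones() < k {
        mask |= 1u64 << (rng.next_u64() % n as u64);
    }
    mask
}

/// Property: any two quorums share at least one node.
/// Returns the first counterexample (n, quorum_a, quorum_b), or None.
pub fn check_quorum_intersection(
    seed: u64,
    trials: u32,
    quorum_size: fn(u32) -> u32,
) -> Option<(u32, u64, u64)> {
    let mut rng = XorShift::new(seed);
    for _ in 0..trials {
        let n = (rng.next_u64() % 15) as u32 + 1; // cluster size 1..=15
        let k = quorum_size(n);
        let a = random_quorum(&mut rng, n, k);
        let b = random_quorum(&mut rng, n, k);
        if a & b == 0 {
            return Some((n, a, b));
        }
    }
    None
}
```

Calling `check_quorum_intersection(42, 10_000, majority)` finds no counterexample, since two strict majorities always overlap. Passing `off_by_one_quorum` instead lets the random search surface a pair of disjoint "quorums" in an even-sized cluster, which is the kind of subtle bug a handful of fixed test cases can miss.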

This work followed a Spec-Driven Development (SDD) strategy, which took the place of traditional, heavyweight documentation. Rather than maintaining static design documents that quickly become obsolete, the developer used the Spec Kit tool. With the `/specify` command, the AI generated lightweight specifications containing core user stories and acceptance criteria. This was followed by a self-criticism loop using the `/clarify` command, in which the AI critiqued its own design and proposed improvements before a single line of code was written. The work was then broken down into units of a single user story, described as the "sweet spot" for AI control, so that the model did not lose context to an oversized task. This allowed for the rapid implementation of the complex Azure RSL features while maintaining a high degree of reliability.

The most dramatic evidence of the AI's capability, however, was the performance optimization phase. The initial implementation achieved a throughput of 23K ops/sec. Over the following three weeks, the developer ran a profiling loop: identifying bottlenecks through execution analysis, having the AI propose optimization strategies, and verifying the results. This iterative process produced a roughly 13x improvement (300K / 23K ≈ 13), peaking at 300K ops/sec. The AI didn't just suggest better syntax; it implemented systems-level optimizations. These included zero-copy techniques that avoid unnecessary memory movement and the removal of synchronization locks to reduce thread contention. Rust's memory safety guarantees helped ensure that these aggressive optimizations did not introduce memory corruption or segmentation faults.
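
The two optimization styles mentioned here can be sketched in a few lines of Rust. The example below is an assumption about what such changes might look like, not the project's code: `LogEntry`, `fan_out`, and `count_commits` are hypothetical names. The first part shares a payload between replicas through a reference-counted `Arc<[u8]>` instead of copying the bytes, and the second replaces a mutex-protected counter with an atomic.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical log entry whose payload is shared rather than copied.
#[derive(Clone)]
pub struct LogEntry {
    pub slot: u64,
    pub payload: Arc<[u8]>,
}

/// Build one outgoing entry per replica. Each clone bumps a reference
/// count; the payload bytes themselves are not copied.
pub fn fan_out(entry: &LogEntry, replicas: usize) -> Vec<LogEntry> {
    (0..replicas).map(|_| entry.clone()).collect()
}

/// Count committed operations from several threads without taking a lock.
pub fn count_commits(threads: u64, per_thread: u64) -> u64 {
    let committed = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let committed = Arc::clone(&committed);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // Atomic increment: no mutex, so no lock contention.
                    committed.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    // Joining every thread makes all increments visible to this load.
    committed.load(Ordering::Relaxed)
}
```

After `fan_out`, `Arc::ptr_eq` confirms that every copy of the entry points at the same allocation. Converting a `Vec<u8>` into an `Arc<[u8]>` does copy the bytes once at construction, so the saving comes from the fan-out that follows.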

This project signals a fundamental change in the role of the software architect. The developer is no longer the primary writer of code, but the definer of contracts and the orchestrator of verification loops. The system has already integrated pipelining and Non-Volatile Memory (NVM) support, using persistent-log techniques validated in the PoWER Never Corrupts (OSDI 2025) paper. While RDMA support remains a pending goal, the overall architecture, based on the lean Jay Lorch Design, indicates that AI can now navigate the entire lifecycle of a high-performance system, from initial specification to extreme optimization.
