This week, developers are treating “AI coding agents” less like code generators and more like teammates, and GStack is the clearest example yet.
Section 1
GStack is an open-source AI coding workflow that Garry Tan, the Y Combinator (YC) CEO and an engineer by training, published as a skill-pack system for Claude Code. Tan frames GStack as something you don’t just “use to write code,” but as something that behaves like a virtual software team made up of distinct roles: CEO, designer, engineer, and QA. In his announcement, he says the project is built around a set of 23 skill packs that collectively cover the full software development lifecycle.
Tan also shared early traction metrics. He says that within three weeks of launch, GStack recorded more GitHub stars than Ruby on Rails, and that it currently has 70,000+ stars.
The core design choice is “thin wrapper, thick skills.” Instead of introducing a complicated runtime or a heavyweight orchestration layer, GStack runs through a Markdown-based, structured prompt system. Tan’s description emphasizes that the skills execute on top of Claude Code’s existing slash-command framework, meaning the workflow is designed to feel native to the tool rather than forcing developers into a separate environment.
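To make “thin wrapper, thick skills” concrete, here is a minimal sketch of what such a wrapper could look like. The runSkill and callModel names are invented for illustration, but the shape matches Tan’s description: the runtime only resolves a slash command to a Markdown skill file and hands its contents to the model. (Claude Code does look up project-level custom slash commands under .claude/commands.)

```typescript
// Minimal "thin wrapper" sketch: the runtime adds no logic of its own;
// all behavior lives in Markdown skill files (the "thick skills").
// runSkill and callModel are invented names, not GStack's actual code.
import { readFile } from "node:fs/promises";
import { join } from "node:path";

// Claude Code resolves project-level custom slash commands from this directory.
const SKILLS_DIR = ".claude/commands";

// Stand-in for whatever actually sends a prompt to the model session.
async function callModel(prompt: string): Promise<string> {
  return `model output for: ${prompt.slice(0, 40)}...`;
}

// Resolve "/plan" -> .claude/commands/plan.md, load it, pass it through.
async function runSkill(slashCommand: string, context: string): Promise<string> {
  const name = slashCommand.replace(/^\//, "");
  const skillPrompt = await readFile(join(SKILLS_DIR, `${name}.md`), "utf8");
  // Nothing clever happens here: prompt = skill instructions + prior output.
  return callModel(`${skillPrompt}\n\n---\nContext from previous step:\n${context}`);
}
```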
The 23 skills are organized as a sprint lifecycle that Tan describes as: “thinking → planning → building → review → testing → deployment → retrospection.” Each skill’s output is intended to become the next skill’s input, so the workflow behaves like a chain of steps rather than a single prompt that produces a patch. Tan’s pitch is that this structure turns a chat-based coding assistant into a repeatable process.
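Read as code, that lifecycle is just a fold over stages. The sketch below shows the pattern, reusing the hypothetical runSkill signature from the previous sketch; the stage command names are placeholders, not GStack’s actual 23 skills.

```typescript
// The lifecycle as a chain: each skill's output becomes the next skill's input.
// Stage names here are illustrative placeholders, not GStack's real commands.
type Skill = (command: string, context: string) => Promise<string>;

async function runSprint(runSkill: Skill, idea: string): Promise<string> {
  const lifecycle = ["/think", "/plan", "/build", "/review", "/test", "/deploy", "/retro"];
  let artifact = idea;
  for (const step of lifecycle) {
    artifact = await runSkill(step, artifact); // output of step N is input to step N+1
  }
  return artifact; // e.g. a retrospective document at the end of the sprint
}
```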
He also highlights specific skills that mirror how product and engineering teams operate. The Office Hours skill, for example, is described as simulating the way YC partners pressure-test ideas. At the idea stage, it forces the user to answer pointed questions such as “What is the strongest evidence that someone actually wants this?” to refine product direction. Tan says it then evaluates the business model and feasibility as part of the same flow.
For review, GStack includes an adversarial review capability. Tan describes it as automatically validating design documents and attempting to catch issues like missing failure handling, insufficient privacy considerations, and unresolved handoffs in two-step authentication flows. The goal is not only to check for correctness, but to run multi-stage scrutiny and attempt fixes when problems are detected.
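One plausible shape for that loop, with the caveat that GStack’s internals may differ entirely: run a set of independent checks, attempt a fix when they find problems, and re-check, bounded by a retry budget. Every name below is hypothetical.

```typescript
// Sketch of a multi-stage adversarial review loop (illustrative, not GStack's code).
// Each check returns findings; the loop attempts a fix and re-checks, up to a budget.
interface Finding { stage: string; issue: string; }

type Check = (doc: string) => Promise<Finding[]>;
type Fixer = (doc: string, findings: Finding[]) => Promise<string>;

async function adversarialReview(
  doc: string,
  checks: Check[],   // e.g. failure handling, privacy, auth handoffs
  attemptFix: Fixer,
  maxRounds = 3,
): Promise<{ doc: string; remaining: Finding[] }> {
  for (let round = 0; round < maxRounds; round++) {
    const findings = (await Promise.all(checks.map((c) => c(doc)))).flat();
    if (findings.length === 0) return { doc, remaining: [] };
    doc = await attemptFix(doc, findings); // try to repair, then re-run all checks
  }
  const remaining = (await Promise.all(checks.map((c) => c(doc)))).flat();
  return { doc, remaining }; // unresolved issues surface for human review
}
```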
Where this gets concrete is in how GStack positions itself against the typical “AI writes code” pattern. Tan’s framing is that older tools often focus on one slice of the work—writing or reviewing—while GStack emphasizes connecting the entire pipeline from idea validation through deployment.
Section 2
So what is actually different about GStack, beyond the marketing language of “teamwork” and “skills”? The twist is that GStack changes the unit of work from “a code response” to “a process that can be executed, verified, and iterated,” and it does that by attaching QA and cross-model review to the same slash-command workflow.
First, GStack pushes the workflow past static correctness. Tan says developers feel the shift when the agent doesn’t stop at producing code, but instead verifies behavior in a real browser. He describes a Playwright-based browser QA skill that uses the /qa command to open an actual Chromium browser and perform actions like clicking, typing, and capturing screenshots. In his account, this is paired with regression test generation: the system creates regression tests, commits them, and uses the results as part of the loop.
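For readers who haven’t used Playwright, the core of such a /qa step is small. This sketch shows the kind of actions Tan describes (launching a real Chromium instance, typing, clicking, screenshotting) using Playwright’s standard TypeScript API; the URL, selectors, and flow are invented for illustration.

```typescript
// Sketch of what a Playwright-backed /qa step might do (illustrative only;
// the selectors and flow are hypothetical, not GStack's actual skill).
import { chromium } from "playwright";

async function qaSmokeTest(baseUrl: string): Promise<void> {
  const browser = await chromium.launch({ headless: false }); // a real Chromium window
  const page = await browser.newPage();
  await page.goto(baseUrl);
  await page.fill("#email", "qa@example.com");      // type into a form field
  await page.click("button[type=submit]");          // click like a user would
  await page.screenshot({ path: "qa-result.png" }); // capture evidence for the loop
  await browser.close();
}
```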
Tan also claims this approach helps avoid friction he attributes to older Chrome MCP-style tooling, describing it as slow and prone to context issues when used through a CLI wrapper. In other words, the “difference” is not only that GStack adds testing, but that it integrates testing in a way that is meant to be responsive enough to run as part of an agent workflow.
Second, GStack changes how review happens by making it cross-model rather than single-model. Tan says GStack supports eight AI coding agents at once, not just Claude Code. The list he provides includes Claude Code, OpenAI Codex CLI, Cursor, and OpenCode, with the overall point being that the workflow is designed to avoid vendor lock-in.
He describes a specific mechanism for cross-model review using the /codex command. The idea is that Claude and OpenAI Codex CLI each provide an independent review, and the workflow is structured so that problems one model misses can be caught by the other. The underlying argument is one of redundancy: if the workflow routes the same work through multiple independent reviewers, then the failure modes of any single model are less likely to slip through.
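Mechanically, cross-model review reduces to fan-out and union. The sketch below illustrates only that logic; the Reviewer functions stand in for whatever Claude Code and Codex CLI invocations GStack actually wires up.

```typescript
// Sketch of cross-model review: independent reviewers, union of findings.
// The reviewer functions are stand-ins for real Claude / Codex CLI invocations.
type Reviewer = (diff: string) => Promise<string[]>; // returns issue descriptions

async function crossModelReview(
  diff: string,
  reviewers: Reviewer[], // e.g. [claudeReview, codexReview]
): Promise<string[]> {
  const results = await Promise.all(reviewers.map((r) => r(diff)));
  // Union the findings: an issue one model misses can still be caught by another.
  return [...new Set(results.flat())];
}
```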
Third, GStack changes throughput by running work in parallel across many PRs per day. Tan says he runs 10–15 Claude Code sessions concurrently and that the workflow can process up to 50 PRs per day. He describes the underlying structure as a worktree model, where each task proceeds on an independent branch in its own working directory. That matters because it reframes the agent from “one developer at a time” into something closer to a team with parallel lanes.
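If “worktree” here means git worktrees, which is a natural reading even though the source doesn’t spell it out, the per-task lane looks roughly like this; the directory layout and naming are invented.

```typescript
// Sketch of the worktree pattern: one branch and one working directory per task,
// so multiple agent sessions can proceed in parallel without stepping on each other.
// Directory names and the session launcher are hypothetical.
import { execSync } from "node:child_process";

function createTaskLane(taskId: string): string {
  const dir = `../lanes/${taskId}`;
  // `git worktree add -b <branch> <path>` creates a new branch checked out
  // in its own directory, leaving the main checkout untouched.
  execSync(`git worktree add -b ${taskId} ${dir}`, { stdio: "inherit" });
  return dir; // an agent session can now run inside this directory
}

// e.g. 10-15 concurrent sessions, each in its own lane:
// ["task-101", "task-102"].map(createTaskLane);
```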
Finally, GStack changes adoption friction with a team installation mode. Tan says the project provides a ./setup --team option. He describes what that means operationally: when sessions start, the workflow auto-updates, and it doesn’t add separate files into the project repository. His argument is that this reduces the overhead of rolling the system out to a team.
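Tan doesn’t detail the internals of the team mode in the material summarized here, but the two properties he names (auto-update at session start, no files added to the project repo) suggest a user-level install refreshed by a session hook. The sketch below is one guess at that shape; the paths, repoUrl parameter, and update mechanism are all assumptions.

```typescript
// Sketch of a repo-clean team install: skills live in a user-level directory
// (outside the project), and a session-start hook refreshes them. The paths
// and update mechanism are assumptions, not GStack's documented behavior.
import { execSync } from "node:child_process";
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

const SKILLS_HOME = join(homedir(), ".claude", "commands"); // user-level, not in the repo

function refreshSkillsOnSessionStart(repoUrl: string): void {
  if (existsSync(join(SKILLS_HOME, ".git"))) {
    execSync("git pull --ff-only", { cwd: SKILLS_HOME, stdio: "inherit" }); // auto-update
  } else {
    execSync(`git clone ${repoUrl} ${SKILLS_HOME}`, { stdio: "inherit" }); // first run
  }
}
```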
The result is that GStack’s “skill packs” are not just a set of clever prompts. They are a structured, role-based workflow that tries to enforce causality across steps: planning outputs feed building, building outputs feed review, review outputs feed testing, and testing outputs feed deployment and retrospection. That is why the project reads like a process framework rather than a coding chatbot.
Section 3
GStack is positioned as fully open source under the MIT license, and Tan says it can be used without separate costs or subscriptions. He also notes that community contributions are active, which matters because skill-pack workflows tend to evolve as teams discover edge cases.
Still, Tan is upfront about the limitations, and they’re important because they explain where the “process” approach can break down.
The first limitation is that the workflow is opinionated. Tan says the system bakes in a particular way of working, which means it may not match every team’s culture or engineering practices. In other words, the framework can be a fit problem, not just a tooling problem.
The second limitation concerns claims of scale and verifiability. Tan cites a claim that the workflow produced 600,000+ lines of code over 60 days, but he acknowledges that the claim is difficult to verify. More broadly, he separates the quality of AI-generated code from maintainability: even if the agent produces working code, that doesn’t automatically guarantee long-term readability, testability, or maintainability.
The third limitation is the relationship between model intelligence and workflow structure. Tan’s framing is that GStack compensates for model limitations with process structure rather than solving fundamental model weaknesses. He describes the system as supplying direction when a model is capable but can’t find the right path on its own, and he argues it doesn’t replace the need for good engineering judgment.
This is where the story resolves into a clear thesis about the bottleneck in AI coding. Tan pushes the idea that the biggest bottleneck may not be “model intelligence” but “process absence.” If teams don’t have a repeatable workflow for planning, review, testing, and deployment, then even a capable model can produce inconsistent outcomes.
GStack’s approach therefore doesn’t primarily ask teams to switch models; it asks them to attach the roles of a software team to the workflow first, using skill packs that run inside Claude Code’s slash-command ecosystem.
Where this leads is straightforward: the next wave of AI coding tools will likely compete less on raw code generation and more on how reliably they enforce an end-to-end engineering process.