The scene at Seoul's Sebitseom Island on February 28 was familiar to anyone who has watched a hackathon's final hours. Developers hunched over laptops, demo videos rendering in the background, the low hum of last-minute debugging. But one table drew the judges' attention differently. When a judge asked, "Did you mix in any external APIs?" the solo developer answered flatly: "I started from a blank slate and used only the Gemini API." The Google engineers at the tech desk exchanged glances. His name is Minsoo Jang, and he had just taken first place out of 1,515 applicants, 219 participants, and 111 submitted projects.
7 Hours, 8 Photos, One API
Jang's service is called GeminiSpace. The workflow is deceptively simple: take eight photos covering a 360-degree view of a room with a smartphone, feed them to Google's multimodal AI model Gemini, and the system returns a 2D floor plan and a 3D voxel map of the space. Once the map is generated, users can ask natural-language questions like "How do I get from the refrigerator to the door?" and the AI reasons through the spatial layout to answer.
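GeminiSpace's code is not public, but the photos-to-floor-plan step maps onto a single multimodal `generateContent` request. The sketch below assembles such a request body for the Gemini REST API; the prompt wording, the ASCII-grid output format, and the model name are illustrative assumptions, not GeminiSpace's actual design, and sending the request is left to any HTTP client.

```python
# Sketch: pack eight room photos plus a floor-plan prompt into one
# Gemini generateContent request body. The payload shape (inline_data
# parts with base64 image bytes) follows the public REST API; the
# prompt and grid format are assumptions for illustration.
import base64
import json

PROMPT = (
    "These 8 photos cover a 360-degree view of one room. "
    "Return a top-down floor plan as a 16x16 ASCII grid: "
    "'#' = wall, '.' = open floor, 'F' = refrigerator, 'D' = door."
)

def build_request(photos: list[bytes],
                  model: str = "gemini-2.0-flash") -> tuple[str, str]:
    """Return (url_path, json_body) for a single multimodal call."""
    parts = [
        {"inline_data": {"mime_type": "image/jpeg",
                         "data": base64.b64encode(p).decode("ascii")}}
        for p in photos
    ]
    parts.append({"text": PROMPT})  # images first, instruction last
    body = json.dumps({"contents": [{"role": "user", "parts": parts}]})
    path = f"/v1beta/models/{model}:generateContent"
    return path, body
```

A follow-up question like "How do I get from the refrigerator to the door?" would then go out as a second turn in the same conversation, with the generated grid in context, leaving the spatial reasoning entirely to the model.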
Jang built the entire thing in seven hours. For text reasoning, he used Gemini 3 Flash. For image generation, he tapped the Nano Banana model. His development environment ran on a Google AI Pro subscription, and he switched between three platforms depending on the stage of development: Gemini Web, Google AI Studio, and Google Antigravity. Ten minutes before the submission deadline, he uploaded a demo video that included an intro generated by Google's Veo model and AI-generated text-to-speech. The entire pipeline — from concept to demo — fit inside a single afternoon.
What Used to Require Heavy Engineering
For robots to navigate indoors or perform specific tasks, they need a map. The traditional approach involves deterministic software engineering: simultaneous localization and mapping (SLAM) algorithms, sensor fusion, point cloud processing. All of it is time-consuming and requires specialized expertise. Jang works as a robotics engineer by day. On his way to the hackathon venue, he formed a hypothesis: "What if Gemini could generate a map from just a few smartphone photos?"
The hypothesis held. With a smartphone camera and a single Gemini API call, anyone can now build spatial AI. Jang described Gemini as a Visual-Language-Action (VLA) model: "The multimodal capability of Gemini is everything for this product," he said. The shift is not incremental. It collapses what used to be a multi-week engineering sprint into a single afternoon of prompt engineering and API orchestration.
The Real Shift Is Domain Knowledge
The deeper change Jang experienced is not about the model's raw capability — it is about what becomes valuable when the model is already capable. "We have entered an era where genuine domain knowledge is the most important asset," Jang said. His point is that the barrier to building functional AI products has dropped so low that the differentiating factor is no longer who can write the most optimized code, but who understands the problem deeply enough to know what to ask the model.
Jang does not describe himself as a traditional software engineer. He calls himself someone who chats with AI every day and pulls the ideas in his head into reality. The winning prize included a mentoring session with the founder of the Google AI Futures Fund, a program that invests in early-stage AI startups. For Jang, the win is a signal that the solo developer armed with domain expertise and a willingness to iterate with AI tools can now compete with teams of engineers.
The takeaway is not that Jang is exceptional — it is that the conditions for exceptional work have changed. When a single developer can build spatial AI in seven hours using only one API, the bottleneck is no longer engineering resources. It is the quality of the question you ask the model.