The dream of a fully autonomous multi-agent society often hits a wall of cold, hard mathematics. For developers attempting to simulate complex interactions, where dozens of agents trade, negotiate, and react in real time, the reliance on frontier models like GPT-4 or Claude 3.5 creates an unsustainable bottleneck. Every tick of the simulation clock triggers a cascade of API calls, so cost and latency grow with every agent and every tick, turning a real-time world into a slow-motion slideshow. The industry has long sought a middle ground: a model small enough to run in a single batch on a GPU, yet capable enough to maintain the illusion of agency.
The Architecture of Thousand Token Wood
This challenge led to the creation of Thousand Token Wood, a minimal economic ecosystem powered by five instances of the Qwen2.5-3B model. The simulation establishes a closed loop where five forest-dwelling agents trade five types of goods using pebbles as the primary medium of exchange. To achieve the necessary throughput for real-time interaction, the infrastructure leverages vLLM, a high-performance LLM inference engine, deployed on the Modal cloud GPU platform. The entire state of the forest economy is monitored through a Gradio web interface, allowing observers to track asset accumulation and trade patterns as they unfold.
By selecting a 3B parameter model, the system can process the turns of all five agents in a single batch GPU call. This architectural choice avoids the agent-by-agent waiting that comes with calling a larger model once per agent, enabling a fluid simulation of gossip, resource hoarding, and panic reactions. On a technical level, Qwen2.5-3B demonstrated a remarkable strength in structural adherence, achieving 100% reliability in generating valid JSON outputs across all calls. This formatting consistency is critical for any autonomous system where a single misplaced bracket can crash the entire simulation loop.
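A minimal sketch of the single-batch tick might look like the following. The prompt wording, the agent names other than Oona, and the `generate_batch` callable are all assumptions made for illustration; with vLLM, `generate_batch` could wrap `LLM.generate`, which accepts a list of prompts, but the project's actual wiring is not shown in this article.

```python
import json
from typing import Callable, Dict, List

Inventory = Dict[str, int]


def build_prompt(name: str, inventory: Inventory) -> str:
    """Build one agent's prompt (hypothetical wording, not the project's)."""
    return (
        f"You are {name}, a trader in the forest. "
        f"Inventory: {json.dumps(inventory, sort_keys=True)}. "
        'Reply with JSON like {"action": "buy", "good": "honey", "qty": 1}.'
    )


def run_tick(
    states: Dict[str, Inventory],
    generate_batch: Callable[[List[str]], List[str]],
) -> Dict[str, str]:
    """Collect every agent's prompt and send them in ONE batched call."""
    names = list(states)
    prompts = [build_prompt(name, states[name]) for name in names]
    outputs = generate_batch(prompts)  # single call covers all agents
    if len(outputs) != len(prompts):
        raise ValueError("backend must return one output per prompt")
    return dict(zip(names, outputs))
```

Because the backend is passed in as a plain callable, the same loop can be exercised with a stub during testing and pointed at a real inference engine in deployment.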
The Intelligence Gap and the Shift to Systemic Design
However, the project revealed a stark divergence between formatting capability and economic reasoning. While Qwen2.5-3B could perfectly wrap its thoughts in JSON, its internal logic often faltered. In early tests, agents frequently committed logical errors, such as attempting to purchase goods they were already producing in abundance. The model functioned as a reliable data formatter but an unreliable economist, unable to independently derive the principles of scarcity or exchange value.
Rather than attempting to solve this by scaling up to a larger model—which would reintroduce the latency and cost issues—the solution was to shift the burden of intelligence from the model to the system. This approach replaces high-level reasoning with structural constraints and precision prompting. Instead of asking the model to figure out what it needs, the system now calculates the agent's current deficits and injects a precise list of missing goods directly into the prompt. By providing the answer in the input, the model is only required to use its formatting strength to execute the action, effectively bypassing its reasoning limitations.
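The deficit-injection idea can be sketched as below. The function names, the per-round `needs` table, and the prompt wording are hypothetical; the only part taken from the text is that the system, not the model, works out which goods are missing and states them in the prompt.

```python
from typing import Dict


def compute_deficits(inventory: Dict[str, int], needs: Dict[str, int]) -> Dict[str, int]:
    """Return each good whose stock is below the per-round need, with the shortfall."""
    return {
        good: need - inventory.get(good, 0)
        for good, need in needs.items()
        if inventory.get(good, 0) < need
    }


def build_trade_prompt(name: str, inventory: Dict[str, int], needs: Dict[str, int]) -> str:
    """Inject the precomputed shopping list so the model only has to format an order."""
    deficits = compute_deficits(inventory, needs)
    if deficits:
        missing = ", ".join(
            f"{good} (short by {amount})" for good, amount in sorted(deficits.items())
        )
        task = f"You are missing: {missing}. Place a buy order for one of these goods."
    else:
        task = "You are missing nothing this round. You may sell surplus goods."
    return f"You are {name}. {task} Answer in JSON only."
```

For example, an agent holding five honey and no firewood, with needs of two honey, three firewood, and one berry, would be told it is short three firewood and one berry, and would never be prompted to buy the honey it already has.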
To ensure the economy did not stagnate, a scarcity design was implemented. In initial runs, agents reached a state of self-sufficiency where production exceeded consumption, causing the market to fall silent after a single settlement. By intentionally setting consumption rates higher than production rates, the system forces agents into a state of perpetual deficit. This artificial hunger creates a mandatory motive for trade; when one agent controls a critical resource like firewood, others must compete for it to survive, creating a natural accumulation of wealth for the supplier.
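To make the scarcity rule concrete, here is a small sketch of one round of production and consumption for a single agent, with no trading. The specific rates and the `step` function are illustrative assumptions, not the project's actual parameters.

```python
from typing import Dict, Tuple


def step(
    stock: Dict[str, int],
    production: Dict[str, int],
    consumption: Dict[str, int],
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Apply one round of production then consumption.

    Returns the new stock and a dict of goods the agent could not fully consume.
    """
    new_stock: Dict[str, int] = {}
    shortfall: Dict[str, int] = {}
    for good in sorted(set(stock) | set(production) | set(consumption)):
        available = stock.get(good, 0) + production.get(good, 0)
        used = consumption.get(good, 0)
        new_stock[good] = max(available - used, 0)
        if used > available:
            shortfall[good] = used - available
    return new_stock, shortfall


# Hypothetical agent: produces firewood, but also burns through honey it cannot make.
stock = {"firewood": 0, "honey": 4}
production = {"firewood": 3}
consumption = {"firewood": 1, "honey": 2}
history = []
for _ in range(3):
    stock, shortfall = step(stock, production, consumption)
    history.append((dict(stock), dict(shortfall)))
```

In this toy run the agent piles up firewood while its honey runs out by the third round, which is the kind of standing deficit that gives it a reason to trade.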
To stabilize the simulation, a JSON parse-and-repair layer was introduced. If a model produces a malformed response that cannot be repaired, the system executes a no-op (no operation) rather than allowing the simulation to crash. Furthermore, agent wellbeing was moved from a cumulative metric to a mean-reversion model. Previously, chronic resource shortages led to a death spiral where all agents converged to zero wellbeing. The new design allows wellbeing to recover as soon as food and warmth are provided, shifting the pressure of survival from a health bar to the volatility of goods and prices.
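Both stabilizers can be sketched in a few lines. The repair strategy (keeping only the text between the first `{` and the last `}`), the allowed action names, and the reversion rate are assumptions chosen for illustration; the article does not specify how the real layer repairs output or how quickly wellbeing recovers.

```python
import json
from typing import Any, Dict

ALLOWED_ACTIONS = ("buy", "sell", "noop")  # hypothetical action vocabulary


def parse_action(raw: str) -> Dict[str, Any]:
    """Parse a model reply, attempt one simple repair, and fall back to a no-op."""
    candidates = [raw]
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        # Repair attempt: drop any chatter around the outermost braces.
        candidates.append(raw[start : end + 1])
    for text in candidates:
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and data.get("action") in ALLOWED_ACTIONS:
            return data
    return {"action": "noop"}  # never let a bad reply crash the loop


def update_wellbeing(current: float, fed: bool, warm: bool, rate: float = 0.5) -> float:
    """Move wellbeing part of the way toward a target set by this round's needs.

    The target is the fraction of needs met, so wellbeing recovers as soon as
    food and warmth return instead of accumulating past damage forever.
    """
    target = (int(fed) + int(warm)) / 2
    return current + rate * (target - current)
```

With these assumed settings, an agent at zero wellbeing that is fed and warm climbs to 0.5 after one round and 0.75 after two, rather than staying pinned at the bottom.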
Simulating Market Shocks and Price Volatility
One of the most compelling results of this systemic approach is the emergence of non-scripted market crashes. Through a feature called Wood Legend, the system injects historical economic shocks—reimagined as forest folklore—into the agents' environment. Events like Tulip Mania and the South Sea Bubble are translated into the Great Acorn Mania and the Hollow Log Trading Company. These are not mere flavor text; they act as external shock variables that influence agent decision-making.
This was most evident during a scenario based on the 1929 Bank Run, centered on an agent named Oona the Owl. When a rumor spread that Oona's vault was empty, the other agents reacted with systemic panic. Oona, attempting to secure pebbles quickly, began dumping honey into the market in massive quantities. This sudden supply glut caused the market reference price for honey to plummet from 10 to 3. This price collapse happened without any explicit command from the developer to sell; it was an emergent behavior resulting from the interaction between the injected rumor and the agents' drive for liquidity.
This volatility is powered by a drift-based pricing mechanism. In early iterations, agents simply mirrored the reference price provided by the system, leading to a frozen market. The current system calculates the volume of unfilled buy and sell orders at the end of each round to automatically adjust the reference price. Unmet demand drives prices up, while excess inventory drives them down. This ensures that the market forms genuine trends based on scarcity and reaches equilibrium only when trade is balanced.
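One way to express the drift rule is shown below. The exact formula, the `sensitivity` constant, and the price floor are assumptions; the article only states that unfilled buy volume pushes the reference price up and unfilled sell volume pushes it down.

```python
def drift_price(
    price: float,
    unfilled_buy: int,
    unfilled_sell: int,
    sensitivity: float = 0.1,
    floor: float = 1.0,
) -> float:
    """Adjust the reference price from the end-of-round order imbalance."""
    total = unfilled_buy + unfilled_sell
    if total == 0:
        return price  # every order was matched, so the price holds
    # Imbalance lies in [-1, 1]: +1 is pure unmet demand, -1 is pure excess supply.
    imbalance = (unfilled_buy - unfilled_sell) / total
    return max(floor, price * (1 + sensitivity * imbalance))


# Hypothetical honey glut: several rounds in which only sell orders go unfilled.
honey_price = 10.0
trajectory = [honey_price]
for _ in range(5):
    honey_price = drift_price(honey_price, unfilled_buy=0, unfilled_sell=8)
    trajectory.append(honey_price)
```

Under this sketch a sustained one-sided glut lowers the price every round, and the price stops moving once unfilled buy and sell volume are equal, which matches the equilibrium behavior described above.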
For practitioners deploying Small Language Models (SLMs) in agentic workflows, the lesson is clear: do not rely on the model's inherent intelligence to manage complex logic. The most efficient path to a stable autonomous system is to treat the SLM as a high-speed formatting engine and move the reasoning into the system architecture. By combining precision input data, structural constraints, and a robust reward system, it is possible to simulate complex economic interactions and market volatility at a fraction of the cost of frontier models.




