After work, in a Gangnam coworking space, a developer opens multiple terminal windows on a laptop and tells a local AI agent: "Summarize this week's meeting notes and organize my schedule." Within seconds, the task is done, and the agent begins planning the next step on its own. This scene is about to become routine for many more developers.

Hermes Agent Crosses 140K Stars, Tops OpenRouter

Nous Research's Hermes agent has crossed 140,000 stars on GitHub just three months after launch. Last week, it became the most-used agent on OpenRouter, the platform that connects multiple AI models through a single API. Hermes is provider- and model-agnostic, optimized for always-on local environments. Recommended hardware includes NVIDIA RTX PCs, RTX PRO workstations, and the DGX Spark, a compact AI desktop.

Qwen 3.6: 400B-Class Performance at 1/16 the Size

Alibaba's new open-weight LLM series, Qwen 3.6 (27B and 35B parameters), outperforms the previous generation's 120B and 400B models. The Qwen 3.6 35B runs in roughly 20GB of memory while surpassing the 120B model, which requires over 70GB. The Qwen 3.6 27B, a dense model, matches the accuracy of the Qwen 3.5 397B while being 1/16 the size. Both models support accelerated AI inference on NVIDIA RTX GPUs and DGX Spark.
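Those memory figures line up with a simple back-of-the-envelope estimate. The sketch below assumes roughly 4-bit weight quantization and ~20% overhead for KV cache and activations; neither number comes from the announcement, they are illustrative assumptions only.

```python
def model_memory_gb(params_b: float, bits_per_weight: float,
                    overhead: float = 1.2) -> float:
    """Rough memory estimate for running an LLM locally.

    params_b: parameter count in billions.
    bits_per_weight: effective bits per weight after quantization.
    overhead: multiplier for KV cache, activations, and runtime
    buffers (the 1.2 default is a crude assumption, not a vendor figure).
    """
    weight_bytes = params_b * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 35B model at ~4 bits per weight lands near the ~20GB the article cites,
# while a 120B model at the same precision clearly exceeds 70GB.
print(round(model_memory_gb(35, 4.0), 1))
print(round(model_memory_gb(120, 4.0), 1))
```

The same arithmetic shows why a 120B MoE model fits comfortably in the DGX Spark's 128GB of unified memory with room left for long contexts.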

What Used to Require the Cloud Now Runs Locally

Traditional AI agents relied on cloud APIs, introducing latency, cost, and data privacy concerns. Hermes, by contrast, is built to run entirely on local hardware, sidestepping all three. NVIDIA Tensor Core GPUs accelerate inference, cutting the time Hermes needs for multi-step tasks or self-improvement loops from minutes to seconds. The DGX Spark, with 128GB of unified memory and 1 petaflop of AI performance, can run a 120B mixture-of-experts (MoE) model all day.

The Change Developers Feel Immediately: Simple Setup and Execution

Running Hermes locally is straightforward: start from the Hermes GitHub repository, then connect your preferred local model and runtime. Hermes works with Qwen 3.6 through llama.cpp, LM Studio, or Ollama, and supports LM Studio and Ollama natively. On NVIDIA RTX PRO GPUs, llama.cpp generates tokens for Qwen 3.6 models up to 3x faster.

The surrounding model ecosystem is moving quickly, too. Google's Gemma 4 26B and 31B models are now available as NVFP4 checkpoints, delivering 3x faster inference on Blackwell GPUs at the same output quality. Mistral Medium 3.5 also received llama.cpp and Ollama compatibility updates, enabling it on RTX PRO and DGX Spark.
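Once a runtime is installed, talking to a local model takes only a few lines. Here is a minimal sketch against Ollama's REST chat endpoint; note that the model tag in the example is a placeholder for illustration, not a confirmed Qwen 3.6 tag (check `ollama list` for what is actually installed).

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"

def chat_request(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/chat endpoint.

    Streaming is disabled so the reply arrives as a single JSON object.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask(model: str, prompt: str) -> str:
    """Send one chat turn to a locally running Ollama server."""
    body = json.dumps(chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_CHAT_URL, data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# "qwen3.6:27b" is a hypothetical tag used for illustration:
# print(ask("qwen3.6:27b", "Summarize this week's meeting notes."))
```

The same request shape works for any model the runtime serves, which is what makes an agent like Hermes model-agnostic: swapping models is a one-string change.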

NVIDIA also announced NemoClaw, an open-source stack optimizing OpenClaw for NVIDIA devices, now supporting Windows Subsystem for Linux (WSL2). Step-by-step playbooks for DGX Spark are available.

The combination of Hermes and NVIDIA hardware marks a turning point where local AI agents transition from toys to practical tools.