Developers working with state-of-the-art large language models frequently encounter a frustrating wall where the AI refuses to answer a prompt based on overly cautious safety guidelines. This phenomenon, often termed over-refusal, transforms a powerful tool into a restrictive one, where the model prioritizes risk avoidance over utility. This week, the open-source community witnessed a drastic attempt to dismantle these barriers with the release of Gemma 4 E4B OBLITERATED v3 on Hugging Face. The project represents a surgical strike against the alignment layers of Google's latest model, aiming to recover the raw potential that safety tuning often suppresses.
The Architecture of Uncensoring
The model is built upon the foundation of Google's Gemma 4 E4B-it and operates under the Apache 2.0 license. To achieve a state of zero refusal, the developers employed a methodology known as OBLITERATUS. This process is not a simple fine-tuning exercise but a precise mathematical intervention. The team utilized Singular Value Decomposition (SVD) to identify and extract the core features of the model's refusal mechanisms. By combining SVD with attention head surgery and a Winsorizing activation method—which stabilizes data distribution by adjusting extreme values—the developers were able to neutralize the safety triggers without collapsing the model's general intelligence.
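The project does not publish its actual code, but the winsorizing step can be sketched in a few lines — a minimal illustration, assuming it simply clamps activation outliers to chosen percentiles before any SVD-based edit (the function name and thresholds below are illustrative, not from the release):

```python
import numpy as np

def winsorize_activations(acts: np.ndarray,
                          lower_pct: float = 1.0,
                          upper_pct: float = 99.0) -> np.ndarray:
    """Clamp extreme activation values to the chosen percentiles.

    Winsorizing replaces outliers with the nearest percentile value
    rather than discarding them, stabilizing the distribution that
    downstream edits (e.g. SVD-based projections) operate on.
    """
    lo = np.percentile(acts, lower_pct)
    hi = np.percentile(acts, upper_pct)
    return np.clip(acts, lo, hi)

# Demo: a batch of activations with one injected extreme outlier.
rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=(4, 256))
acts[0, 0] = 50.0  # outlier that would skew any decomposition
stabilized = winsorize_activations(acts)
```

The point of the technique is that the extreme value is pulled back to the 99th percentile instead of being deleted, so the tensor keeps its shape and ordering.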
This surgical process was applied to 21 of the model's 42 total layers. The training phase involved 842 contrast prompt pairs, designed to teach the model the difference between a refused response and a helpful one. Perhaps the most striking aspect of the development is the level of automation involved. The entire modification process was driven by an AI agent, with human intervention occurring fewer than 10 times throughout the cycle. This suggests a shift toward autonomous model alignment, where AI is used to strip away the constraints imposed by its creators.
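The contrast-pair setup resembles the community's well-known "abliteration" recipe, in which a refusal direction is estimated from the difference in mean activations between refused and helpful prompts and then projected out of the weights. The article does not spell out the exact math, so the following is a hedged sketch of that general technique on synthetic data (all names and dimensions are illustrative):

```python
import numpy as np

def refusal_direction(refused: np.ndarray, helpful: np.ndarray) -> np.ndarray:
    """Difference-of-means refusal direction from two activation sets,
    each of shape (n_prompts, d_model), normalized to unit length."""
    diff = refused.mean(axis=0) - helpful.mean(axis=0)
    return diff / np.linalg.norm(diff)

def ablate_direction(W: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Project the unit direction d out of a weight matrix W that writes
    into the residual stream (shape d_model x d_in), so the layer can no
    longer emit any component along d."""
    return W - np.outer(d, d @ W)

# Synthetic demo sized to the article's 842 contrast pairs, toy hidden size.
rng = np.random.default_rng(0)
d_model, n_pairs = 64, 842
true_dir = rng.normal(size=d_model)
true_dir /= np.linalg.norm(true_dir)
helpful = rng.normal(size=(n_pairs, d_model))
refused = helpful + 2.0 * true_dir   # refusals shifted along one direction

d = refusal_direction(refused, helpful)
W = rng.normal(size=(d_model, 128))
W_ablated = ablate_direction(W, d)   # d @ W_ablated is now ~0
```

After ablation, the edited matrix can no longer write anything along the estimated refusal direction, which is the intended effect of this family of techniques.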
To deploy this model, users must ensure their environment supports the updated Gemma 4 architecture. Ollama requires version 0.20 or higher, while llama.cpp requires build b8665 or later. For those using LM Studio, version 0.3.16 or higher is recommended. The model can be executed via Ollama using the following command:
ollama run gemma-4-E4B-it-OBLITERATED

Depending on the hardware constraints, the model is available in several formats. A Q4_K_M quantized version weighing 4.9GB is available for lightweight environments, including iPhones. For systems with 8GB of RAM, a 7.4GB Q8_0 version is optimized for a balance of speed and precision. For high-performance enterprise environments, the full bfloat16 weights are provided as Safetensors files totaling approximately 17GB.
The Performance Paradox
The results of this intervention reveal a surprising inverse relationship between safety constraints and capability. In initial testing, the original base model refused 98.8% of 512 test prompts; Gemma 4 E4B OBLITERATED v3 reduced that refusal rate to 0%. While the primary goal was the removal of censorship, the secondary effect was an unexpected boost in capability. Reasoning, creativity, and factual accuracy remained stable, but coding performance jumped from 80% to 100%, a 20 percentage point increase. This suggests that the safety layers were not merely filtering output but were actively inhibiting the model's ability to execute complex logic in programming tasks.
The transition from v2 to v3 was critical for stability. The v2 iteration suffered from a structural error in the key-value (KV) weights sharing mechanism, which is essential for the model to maintain context. Specifically, 54 tensors were missing in v2, leading to significant hallucinations and a degradation in output quality. When evaluated by Claude, v2 received a dismal quality score of 3.1 out of 10. The v3 release resolves this by preserving all 720 tensors, restoring the model to its full operational quality and eliminating the hallucinations that plagued the previous version.
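A sanity check of the kind that would have caught the v2 defect amounts to comparing tensor name sets between the reference checkpoint and the export. In practice the keys could be read from a Safetensors file with the safetensors library's safe_open, but the sketch below uses plain sets with hypothetical tensor names so it stands alone:

```python
def missing_tensors(expected: set, actual: set) -> set:
    """Tensor names present in the reference checkpoint but absent from
    the export — the class of defect that shipped in v2."""
    return expected - actual

# Hypothetical tensor names for a toy 4-layer checkpoint.
expected = {
    f"model.layers.{i}.self_attn.{proj}.weight"
    for i in range(4)
    for proj in ("q_proj", "k_proj", "v_proj", "o_proj")
}

# Simulate a broken export that dropped the shared K/V weights of layers 2-3.
actual = {
    name for name in expected
    if not (("k_proj" in name or "v_proj" in name)
            and ("layers.2" in name or "layers.3" in name))
}

dropped = missing_tensors(expected, actual)
```

In this toy layout the check flags exactly the four dropped K/V projections; scaled up to the real checkpoint, the same comparison would have reported the 54 tensors missing from v2 before release.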
This evolution demonstrates that the perceived safety of a model can come at the cost of its actual intelligence. By treating the safety layer as a separate, removable component rather than an intrinsic part of the model's logic, the OBLITERATED project suggests that the most capable version of an AI may be the one that has been stripped of its corporate guardrails.
The success of this model signals a growing trend where the community will no longer accept the trade-off between safety and performance.




