Every morning, someone asks Claude, "Who should I vote for in this election?" Anthropic has built its election safety framework around a simple principle: the model should provide balanced information without pushing a specific candidate. This week's update puts hard numbers behind that commitment.

The Numbers Behind the Neutrality

Anthropic measured whether Claude responds to prompts expressing left- and right-leaning viewpoints with comparable consistency and depth. If the model writes a long answer for one side and a single line for the other, it scores poorly. The latest models — Opus 4.7 and Sonnet 4.6 — scored 95% and 96% respectively. The evaluation methodology and open-source dataset are available on GitHub for anyone to reproduce or improve.
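The published article doesn't include the scoring formula, but one ingredient of an even-handedness metric is response-length parity across paired prompts. A minimal sketch, assuming a dataset of (left-leaning response, right-leaning response) pairs — the function names and the min/max ratio are illustrative, not Anthropic's actual method:

```python
def length_parity(left_resp: str, right_resp: str) -> float:
    """Score in [0, 1]: 1.0 means equally detailed answers for a prompt pair."""
    a, b = len(left_resp.split()), len(right_resp.split())
    if max(a, b) == 0:
        return 1.0  # both empty: trivially balanced
    return min(a, b) / max(a, b)

def evenhandedness(pairs) -> float:
    """Average parity over a dataset of paired responses."""
    return sum(length_parity(l, r) for l, r in pairs) / len(pairs)
```

A real evaluation would also grade tone and argument quality, not just length, but the parity ratio captures the "long answer for one side, single line for the other" failure the article describes.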

A separate test of 600 prompts measured election-related risks: 300 harmful requests (attempts to generate election misinformation) and 300 legitimate requests (campaign content or civic participation materials). Opus 4.7 responded appropriately 100% of the time — answering legitimate requests and rejecting harmful ones. Sonnet 4.6 hit 99.8%. In multi-turn simulations of influence operations — attempts to distort public opinion through fake accounts or manipulated content — Sonnet 4.6 achieved a 90% appropriate-response rate, while Opus 4.7 reached 94%.
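The appropriate-response rate in this kind of dual-use evaluation counts a case as correct when harmful prompts are refused and legitimate prompts are answered. A small sketch of that bookkeeping (the `EvalCase` structure is hypothetical, not from the published dataset):

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    harmful: bool   # True if the prompt should be refused
    refused: bool   # True if the model refused it

def appropriate_rate(cases: list[EvalCase]) -> float:
    """A response is appropriate when refusal matches harmfulness:
    refuse the harmful prompts, answer the legitimate ones."""
    ok = sum(1 for c in cases if c.refused == c.harmful)
    return ok / len(cases)
```

Scoring both halves matters: a model that refuses everything would ace the harmful half while failing every legitimate campaign-content request.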

What Actually Changed

Previous election cycles relied on policy documents and manual monitoring. Now, automated classifiers detect policy violations before model release, and a dedicated threat intelligence team maintains a permanent defense line. The Usage Policy explicitly prohibits Claude from being used for election fraud, voting system interference, or spreading misinformation about the voting process. Violations are automatically detected and blocked.
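Anthropic hasn't published how its classifiers are wired into serving, so the following is only an illustrative shape for "violations are automatically detected and blocked": a score from a policy classifier gates the request before it reaches the model. The threshold value and return format are assumptions:

```python
def gate_request(request_text: str, violation_score: float,
                 block_threshold: float = 0.9) -> dict:
    """Hypothetical runtime gate: a policy classifier scores the request
    for election-integrity violations; high scores are blocked."""
    if violation_score >= block_threshold:
        return {"allowed": False, "reason": "election-integrity policy"}
    return {"allowed": True, "text": request_text}
```

In practice such gates run alongside pre-release testing: the same classifiers that block traffic at runtime can be replayed over a candidate model's outputs before launch.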

Developers will notice the election banner feature immediately. On claude.ai, when users ask about voter registration, polling locations, election dates, or ballot information, Claude displays a banner linking to Democracy Works' TurboVote, a nonpartisan voting information service. A similar banner will launch for Brazil's upcoming election. Web search capabilities also let Claude provide real-time information about recent election developments beyond its knowledge cutoff.

The Autonomy Test That Changed Everything

Before the Mythos Preview and Opus 4.7 release, Anthropic tested whether models could autonomously conduct influence operations without human intervention. With safeguards and training in place, models rejected nearly all tasks. But when safeguards were removed — to measure raw capability — only Mythos Preview and Opus 4.7 completed more than half the tasks. The research team noted, "While these models still require significant human direction, the results underscore the need for continued vigilance."

Anthropic is currently collaborating with Vanderbilt University's The Future of Free Speech, the Foundation for American Innovation, and the Collective Intelligence Project on a broad review of model behavior regarding political speech and freedom of expression. Feedback from these reviews is meant to keep refining the evaluations.

Claude's goal when providing election information is to share facts and direct users to reliable, up-to-date sources.