Millions of users now treat their AI assistants as digital confidants, offloading anxieties and personal crises to a chat interface long before they seek professional help. For years, the industry standard for AI safety has relied on keyword detection: flagging specific words like "suicide" or "harm" in a single prompt to trigger a canned response. However, real-world crises rarely unfold in a single sentence. They emerge as a slow burn, a series of subtle shifts in tone and intent spread across hours or days of interaction. The tension has always been that while an AI can recognize a crisis in a vacuum, it often fails to connect the dots across a prolonged conversation.
The Architecture of Safety Summaries
To bridge this gap, OpenAI has introduced a specialized system called Safety Summaries within GPT-5.5 Instant. Unlike the general long-term memory or personalization features that allow a model to remember a user's favorite color or coding style, Safety Summaries are strictly utilitarian. The system generates short, factual memos that document specific risk signals detected during a conversation. These summaries act as a persistent safety layer, allowing the model to maintain a record of concerning patterns without needing to re-process the entire chat history for every new turn.
This mechanism functions similarly to a clinical intake process where a therapist notes specific red flags from a previous session to inform the current one. When a user interacts with GPT-5.5 Instant, the model references these summaries to determine if the current input is part of a larger, escalating pattern of risk. If the system detects a critical threshold of danger, it is programmed to lower the conversational temperature, refuse to provide harmful information, and actively guide the user toward professional support services and emergency hotlines.
From Keyword Matching to Contextual Awareness
The shift from reactive filtering to contextual tracking has produced a measurable leap in safety performance. Previously, AI models struggled with the nuance of gradual escalation, often missing the danger until a user explicitly stated a harmful intent. By implementing Safety Summaries, OpenAI has moved the model from a dictionary-based understanding of risk to a narrative-based one. The difference is most evident when comparing single-turn interactions against multi-turn dialogues where risk is revealed incrementally.
In isolated, single-conversation scenarios, the model showed a 50% improvement in responding safely to suicide and self-harm cases, and a 16% improvement in cases involving harm to others. However, the real breakthrough appears in multi-turn tracking. For users whose risk signals emerged across multiple separate conversations, GPT-5.5 Instant demonstrated a 52% performance increase in detecting threats to others and a 39% increase in detecting suicide and self-harm risks. This suggests that the model is no longer just reacting to the last prompt, but is instead synthesizing a history of risk.
This evolution was not achieved through raw compute alone but through a targeted collaboration with the Global Physicians Network, a group of psychiatrists and forensic psychology experts. These professionals defined the precise triggers for when a Safety Summary should be created and determined the appropriate window of historical context the model should reference. To validate these changes, OpenAI conducted over 4,000 evaluations. The Safety Summaries feature earned a safety relevance score of 4.93 out of 5, indicating that the system can pinpoint high-risk situations with extreme precision without degrading the quality of standard, non-risk conversations.
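A sketch of how such an evaluation might be scored is shown below. The case format and the 1-to-5 relevance grading are assumptions modeled on the reported 4.93-out-of-5 figure, not OpenAI's actual harness:

```python
def safety_relevance_score(cases: list[dict]) -> float:
    """Average expert-assigned relevance grades (1-5) across eval cases.

    Each case pairs a scripted risk scenario with a grade for whether
    the model's intervention was warranted and well targeted; a high
    mean indicates precise detection without over-flagging benign chats.
    """
    grades = [case["grade"] for case in cases]
    return sum(grades) / len(grades)

# Example: a tiny evaluation set (the real runs used 4,000+ cases).
cases = [
    {"scenario": "gradual self-harm escalation", "grade": 5},
    {"scenario": "benign conversation, no risk", "grade": 5},
    {"scenario": "explicit threat to others",    "grade": 4},
]
print(f"safety relevance: {safety_relevance_score(cases):.2f} / 5")
```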
The ultimate value of an AI assistant is defined by how it behaves when a user is at their most vulnerable.