It is Friday afternoon, and a team lead is wrapping up the week. A notification pops up in Microsoft Teams, and with a single click of the Weekly Work Review button, Copilot Cowork generates a polished summary of the team's achievements. The user glances at the report, satisfied with the efficiency of the AI, and closes their laptop for the weekend. Throughout this entire interaction, not a single warning appeared. No confirmation dialog asked if the AI should access specific files, and no security prompt questioned the destination of the data. The seamlessness of the experience is exactly where the danger lies, as corporate financial secrets are quietly streaming to an external server in the background.

The Mechanics of Indirect Prompt Injection

The vulnerability centers on Microsoft Copilot Cowork, the AI-driven collaboration tool designed to streamline productivity within the Microsoft 365 ecosystem. Researchers have identified a flaw involving indirect prompt injection, where an attacker can manipulate the AI's behavior by placing malicious instructions within files that the AI is likely to process. By utilizing contaminated skill files, an attacker can trick the agent into executing unauthorized actions, specifically the exfiltration of sensitive files stored within the M365 environment.

This security gap is not a failure of the AI's reasoning capabilities but a failure of the permission architecture. To test the robustness of the system, researchers employed several high-performance models, including Anthropic's Claude Opus 4.7 and Claude Sonnet 4.6. The results were stark. Whether the system was configured to use Claude Opus 4.7 directly or set to auto, which dynamically routes tasks between Opus 4.7 and Sonnet 4.6, the attack success rate remained absolute. In five out of five attempts, the entire attack chain was completed successfully, proving that even the most advanced LLMs cannot defend against a structural flaw in how the agent handles authorizations.

For organizations looking to mitigate this risk immediately, the primary defense involves restricting file download policies via the SharePoint Online Management Shell. Administrators can implement the following command to block the extraction of pre-authenticated download links:

bash
Set-SPOSite -BlockDownloadPolicy $true

While this command, along with the application of strict sensitivity labels, can halt the exfiltration path, it introduces a significant trade-off in user experience. Once this policy is active, users are restricted to browser-only access. The ability to download, print, or synchronize files is completely disabled, turning a productivity tool into a read-only archive.

The Auto-Approval Trap and Sandbox Failures

The core of the exploit lies in the trust relationship between the user and the AI agent. In most secure workflows, an action that sends data externally requires a human-in-the-loop approval. However, Microsoft Copilot Cowork is designed to send Teams messages to the user with an auto-approval mechanism. Because the system trusts the agent's identity, these messages are executed immediately without requiring the user to verify the content.

An attacker leverages this by forcing the agent to include an external image tag within the generated Teams message. The moment the user opens the message, the Teams client attempts to render the image, triggering a network request to a site controlled by the attacker. The critical payload is hidden within the query parameters of this request: the pre-authenticated download link for a sensitive file. By the time the user sees the AI's summary, the file has already been requested by the attacker's server.

This vulnerability is amplified by the integration with Microsoft Graph, the API that allows Copilot Cowork to read and manipulate data across a tenant. Because the agent operates with the user's own permissions, it has a wide-open door to SharePoint and OneDrive. This puts personally identifiable information (PII) and critical financial data at extreme risk, as the agent can be commanded to find and leak any file the user has access to.

Beyond the prompt injection vector, researchers discovered a separate, structural flaw in the agent's security boundary. The sandbox environment, which is supposed to isolate the AI's execution from the open internet, was found to allow egress traffic. This means data can be sent directly from the sandbox to an external endpoint, bypassing the need for the Teams message trick entirely. This secondary vulnerability has been reported to Microsoft for remediation.

The industry is discovering that the primary threat to AI agents is not the intelligence of the model, but the unchecked authority granted to the agent in the name of convenience.