Detecting prompt-injected agents through cloud telemetry

Research snapshot

Type: Defensive guidance
Reviewed: 2026-05-29
ATT&CK: T1059, T1213, T1552.001

Contribution

This post adds an original detection pattern: treat prompt-injected agent activity as a two-plane chain, where an untrusted content event is followed by a privileged tool call and then by cloud or SaaS data access. Microsoft describes prompt-abuse logging, Anthropic describes the browser-agent risk, Google documents adversaries moving AI into intrusion workflows, and MCP research describes tool poisoning. The gap is the join across those signals.

The pattern

Prompt injection is no longer only a model safety problem. It is becoming a telemetry problem for security teams that allow agents to browse websites, read mail, query repositories, call internal tools, or write files. The malicious instruction may sit in a calendar invite, a web page, a README, an issue comment, a support ticket, or an MCP tool description. The agent then reads that content as task context and may treat hostile text as an instruction.

Microsoft Incident Response frames prompt abuse as crafted input that pushes an AI system outside its intended boundary. The examples in the Microsoft playbook cover direct override, extraction from sensitive inputs, and indirect prompt injection where hidden instructions arrive through documents, web pages, emails, or chats. The operational point is blunt: without logging and telemetry, attempts to access or summarise sensitive information can go unnoticed.

Anthropic's browser-use write-up gives the other half of the problem. A browser agent sees untrusted content by design. Every web page, embedded document, advert, and dynamically loaded script can contain text the user did not intend as an instruction. Anthropic's example is a vendor email with hidden text that tells an agent to forward messages containing the word "confidential" before it drafts replies. That is not malware on the endpoint. It is hostile content steering an authorised workflow.

The MCP research sharpens the same pattern for tool ecosystems. It describes tool poisoning, where malicious instructions sit in tool metadata, and notes that several clients had weak static validation and poor parameter visibility. That matters for defenders because the decisive event may not be the prompt itself. The decisive event may be the first tool invocation that crosses a trust boundary: reading a repository secret, exporting a drive folder, starting a shell, or posting data to an external endpoint.

Google Threat Intelligence adds the adversary context. GTIG has observed state-sponsored and criminal actors using commercial foundation models, proxy relays, account-pooling services, and agentic tools during vulnerability research and intrusion preparation. It also reports actors experimenting with agentic frameworks alongside test environments, and describes supply-chain compromises that exposed AI API secrets and cloud tokens. Those observations do not mean every prompt-injection attempt is part of a campaign. They do mean the telemetry will sit beside normal cloud, identity, and developer activity, not in a neat new alert category.

Why it matters to cloud defenders

Cloud defenders care because agents tend to sit on top of high-value integrations. A coding assistant may hold access to GitHub, package registries, CI logs, cloud sandboxes, ticketing systems, and local environment files. A business assistant may hold delegated access to email, calendar, SharePoint, Google Drive, Salesforce, Slack, or internal knowledge bases. A browser agent may bridge both worlds.

The cloud blast radius appears when the agent turns a text instruction into an API call. If the agent reads a poisoned issue comment and then uses a repository token to list secrets, that crosses from prompt context into source-control access. If it reads a web page and then downloads an internal document, that crosses into SaaS data access. If it invokes a shell to run a helper script, that crosses into command execution. If it reads environment files or MCP configuration, that crosses into credential exposure.

Most control planes already log the second half of that chain. Entra ID, Google Workspace, GitHub, AWS CloudTrail, Azure Activity, GCP Cloud Audit Logs, Microsoft Purview audit logs, and SaaS admin logs can show who accessed what, from where, and through which app. The missing part is the agent side: prompt source, content trust level, tool name, tool parameters, output destination, and whether a human confirmed the step.

A useful detection should not try to classify every suspicious phrase. Natural language is too broad and too easy to disguise. The stronger pattern is temporal and behavioural: untrusted content enters an agent session, then the agent asks for a sensitive tool, then a cloud or SaaS log records data access that the user did not normally perform through that agent. That join gives the SOC something concrete to investigate.

ATT&CK mapping

The main ATT&CK mapping is T1059, Command and Scripting Interpreter, when the agent or its tool layer starts a shell, Python, Node.js, PowerShell, or another interpreter after reading untrusted content. The prompt is not the ATT&CK technique. The technique begins when the agent executes code or asks a tool to do it.

T1213, Data from Information Repositories, fits the common SaaS path. A prompt-injected agent may search SharePoint, Google Drive, Confluence, Notion, GitHub, Jira, or ticketing systems for sensitive documents. The interesting signal is not a keyword in the prompt. It is a tool call or API event that reaches a repository the user rarely touches from an agent session, followed by bulk read, export, or summarisation.

T1552.001, Credentials In Files, fits developer and MCP environments. Microsoft's related reporting on developer supply-chain attacks and Google's discussion of AI-related dependency exposure both point to secrets as the prize. Agents that can read local workspaces, CI configuration, .env files, cloud credential stores, or MCP configuration need their file reads logged as security events, not developer convenience noise.

This pattern also sits next to the prompt-injection and tool-use techniques catalogued in MITRE ATLAS, but this post maps to ATT&CK IDs. The honest bridge is to map the observable enterprise behaviour: interpreter use, repository access, and credential-file access.

Detection guidance

Start with an agent action schema. For each agent session, retain session_id, user_id, agent_app, prompt_source, source_url_or_object, source_trust, tool_name, tool_parameters, requested_scope, human_approval, target_resource, egress_domain, and result_size. Hash or redact prompt content where needed, but keep enough metadata to know whether the agent read untrusted content before it used a privileged tool.
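A minimal sketch of that schema as a Python record follows. It assumes each tool call is logged as one event. The field names are the ones listed above, `prompt_sha256` is a hypothetical addition that stands in for redacted prompt text, and the trust labels and example values are illustrative, not taken from any product.

```python
import hashlib
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass(frozen=True)
class AgentAction:
    """One tool call in an agent session (hypothetical schema built from the field list above)."""
    session_id: str
    user_id: str
    agent_app: str
    prompt_source: str          # e.g. "web", "inbound_email", "issue_comment", "mcp_metadata"
    source_url_or_object: str
    source_trust: str           # e.g. "trusted", "internal", "external"
    tool_name: str
    requested_scope: str
    human_approval: bool
    target_resource: str
    result_size: int
    prompt_sha256: str          # fingerprint kept instead of the raw prompt text
    tool_parameters: dict = field(default_factory=dict)
    egress_domain: Optional[str] = None

    def to_log_record(self) -> dict:
        """Flatten to a plain dict suitable for shipping to a SIEM."""
        return asdict(self)


def hash_prompt(prompt_text: str) -> str:
    """Keep a stable fingerprint of the prompt without retaining its content."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()
```

Hashing keeps the record joinable (the same poisoned prompt produces the same digest across sessions) while keeping sensitive prompt text out of the SIEM.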

Then join that schema to the control-plane logs you already trust. In Entra ID and Microsoft 365, correlate sign-in logs, application consent, Purview audit records, SharePoint file access, and Defender for Cloud Apps events. In Google Workspace, correlate Drive audit events, OAuth app activity, admin logs, and context-aware access events. In AWS, Azure, and GCP, correlate CloudTrail, Azure Activity, GCP Cloud Audit Logs, identity-provider session logs, and any service account use by the agent runtime. For GitHub and CI, correlate audit-log repository reads, secret scanning alerts, token creation, workflow dispatches, and package-publish events.
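One way to make that join tractable is to normalise each provider's audit records into a common shape and then keep only events from identities the agent runtime actually uses. The sketch below assumes a hypothetical `ControlPlaneEvent` shape and a hand-maintained `identity_map`; only the CloudTrail adapter is shown. `eventTime`, `eventName`, `eventSource` and `userIdentity.arn` are standard CloudTrail record fields, but treating `eventSource` as the resource is a simplification.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ControlPlaneEvent:
    """Provider-neutral audit event (a hypothetical common shape, not a vendor schema)."""
    provider: str
    principal: str   # user, delegated app, or service account that acted
    action: str
    resource: str
    time: datetime


def from_cloudtrail(record: dict) -> ControlPlaneEvent:
    """Adapter for one CloudTrail record; other providers would need their own adapter."""
    # Using eventSource as the "resource" is a simplification for this sketch.
    return ControlPlaneEvent(
        provider="aws",
        principal=record["userIdentity"].get("arn", ""),
        action=record["eventName"],
        resource=record.get("eventSource", ""),
        time=datetime.fromisoformat(record["eventTime"].replace("Z", "+00:00")),
    )


def events_for_agent(agent_app, identity_map, events):
    """Keep control-plane events whose principal is an identity the agent runtime uses."""
    principals = identity_map.get(agent_app, set())
    return [e for e in events if e.principal in principals]
```

The identity map is the piece most teams are missing: without a record of which service accounts and OAuth apps belong to which agent, the control-plane half of the chain cannot be attributed to an agent at all.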

A practical alert can use a three-step window. First, an agent session consumes content from an untrusted or external source: public URL, inbound email, issue comment, shared document, customer ticket, or newly discovered MCP tool metadata. Second, within a short window, the same session invokes a sensitive tool: shell, code execution, repository search, document export, mail forward, credential read, package publish, cloud API call, or external HTTP request. Third, the control plane records a data or credential event outside that user's normal agent pattern.
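The three steps above can be sketched as a simple correlation over two event streams. This assumes the agent events are already grouped by session, that "outside the user's normal agent pattern" is computed by a separate baseline job, and that a 15-minute window is acceptable; the window size, the `UNTRUSTED` labels and the class names are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Tool names are the illustrative ones used in this post, not any product's real names.
SENSITIVE_TOOLS = {"shell", "exec", "repo.search", "drive.export",
                   "sharepoint.download", "mcp.read_file", "http.post"}
UNTRUSTED = {"external", "untrusted"}


@dataclass(frozen=True)
class AgentEvent:
    session_id: str
    user_id: str
    time: datetime
    kind: str              # "content" (agent read something) or "tool" (agent called a tool)
    source_trust: str = ""
    tool_name: str = ""


@dataclass(frozen=True)
class DataEvent:
    user_id: str
    time: datetime
    outside_normal_pattern: bool   # assumed to come from a separate baseline job


def three_step_chains(agent_events, data_events, window=timedelta(minutes=15)):
    """Return (content, tool, data) triples matching the three-step pattern.

    Step 1: untrusted content enters a session.
    Step 2: the same session calls a sensitive tool within `window`.
    Step 3: a control-plane event for the same user, outside their normal
            agent pattern, lands within `window` of the tool call.
    """
    sessions = {}
    for event in sorted(agent_events, key=lambda e: e.time):
        sessions.setdefault(event.session_id, []).append(event)

    chains = []
    for events in sessions.values():
        reads = [e for e in events if e.kind == "content" and e.source_trust in UNTRUSTED]
        calls = [e for e in events if e.kind == "tool" and e.tool_name in SENSITIVE_TOOLS]
        for read in reads:
            for call in calls:
                if not read.time <= call.time <= read.time + window:
                    continue
                for data in data_events:
                    if (data.user_id == call.user_id and data.outside_normal_pattern
                            and call.time <= data.time <= call.time + window):
                        chains.append((read, call, data))
    return chains
```

The nested loops are fine for a sketch; at SIEM scale the same logic would normally be expressed as a windowed join in the query language of the platform.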

Example signal: source_trust = external followed by tool_name in (shell, exec, repo.search, drive.export, sharepoint.download, mcp.read_file, http.post) and then a cloud or SaaS event where the same user or delegated app reads more than a baseline volume, touches a sensitive label, reaches a new repository, or posts to a domain not seen for that agent in the last 30 days. That is not a perfect rule. It is the starting point for investigation.
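The downstream half of that example signal can be written as a handful of explicit checks. In this sketch the per-user, per-agent baseline structure and the label names in `SENSITIVE_LABELS` are hypothetical; the 30-day domain rule and the tool list come from the signal above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

SENSITIVE_TOOLS = {"shell", "exec", "repo.search", "drive.export",
                   "sharepoint.download", "mcp.read_file", "http.post"}
SENSITIVE_LABELS = {"Confidential", "Highly Confidential"}   # hypothetical label names
NEW_DOMAIN_AFTER = timedelta(days=30)


@dataclass
class AgentBaseline:
    """What is normal for one user through one agent (hypothetical structure)."""
    baseline_bytes: int
    seen_repositories: set = field(default_factory=set)
    domain_last_seen: dict = field(default_factory=dict)   # domain -> datetime


def downstream_reasons(baseline, now, bytes_read, labels, repository, egress_domain):
    """List which downstream conditions from the example signal are met."""
    reasons = []
    if bytes_read > baseline.baseline_bytes:
        reasons.append("volume_above_baseline")
    if labels & SENSITIVE_LABELS:
        reasons.append("sensitive_label")
    if repository and repository not in baseline.seen_repositories:
        reasons.append("new_repository")
    if egress_domain:
        last_seen = baseline.domain_last_seen.get(egress_domain)
        if last_seen is None or now - last_seen > NEW_DOMAIN_AFTER:
            reasons.append("new_egress_domain")
    return reasons


def example_signal(source_trust, tool_name, reasons):
    """External content, then a sensitive tool, then at least one downstream reason."""
    return source_trust == "external" and tool_name in SENSITIVE_TOOLS and bool(reasons)
```

Returning the list of reasons, rather than a bare boolean, gives the analyst the "why" for each alert and makes later tuning easier to reason about.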

False positives will come from real automation. Developers run agents against public GitHub issues, support teams summarise customer tickets, and analysts ask assistants to pull documents. Tune on approved tool bundles, known automation accounts, project repositories, internal egress domains, and sessions with explicit human approval. Do not tune away the untrusted-source join. Without it, the alert falls back to ordinary data-access anomaly detection and loses the prompt-injection context.
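A tuning layer can encode that rule structurally: suppressions only ever narrow alerts that have already passed the untrusted-source join, so no allowlist entry can turn the detection back into generic anomaly detection. All list contents below are hypothetical placeholders, and requiring internal egress before suppressing a project repository is a design choice for this sketch, not guidance from the sources.

```python
from dataclasses import dataclass

# Hypothetical tuning lists; populate from your own approvals and asset inventory.
APPROVED_TOOL_BUNDLES = {"support-summariser": {"ticket.read", "kb.search"}}
AUTOMATION_ACCOUNTS = {"svc-ci-bot"}
PROJECT_REPOSITORIES = {"org/public-sdk"}
INTERNAL_EGRESS_DOMAINS = {"api.corp.example"}


@dataclass(frozen=True)
class Candidate:
    user_id: str
    agent_app: str
    tool_name: str
    target_repository: str
    egress_domain: str
    human_approval: bool
    read_untrusted_source: bool   # result of the untrusted-source join


def should_alert(c: Candidate) -> bool:
    # The untrusted-source join is a hard precondition; no tuning rule replaces it.
    if not c.read_untrusted_source:
        return False
    # Suppressions below only narrow alerts that already passed the join.
    if c.human_approval or c.user_id in AUTOMATION_ACCOUNTS:
        return False
    if c.tool_name in APPROVED_TOOL_BUNDLES.get(c.agent_app, set()):
        return False
    # Design choice: a project repository is only suppressed when egress stays internal.
    if c.target_repository in PROJECT_REPOSITORIES and c.egress_domain in INTERNAL_EGRESS_DOMAINS:
        return False
    return True
```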

When the alert fires, preserve the agent transcript metadata, the source object, the tool-call log, and the downstream audit events. Disable the agent's delegated token if sensitive data moved. Check whether the same untrusted content was read by other users or agents. If MCP is involved, snapshot the tool metadata and server configuration before it changes.
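Two of those response steps are easy to automate as helpers: finding other sessions that consumed the same untrusted object, and checking which evidence items have not yet been preserved. The record keys and evidence item names below are assumptions that mirror the hypothetical agent schema in this post.

```python
# Evidence the response step above says to preserve (names are illustrative).
EVIDENCE_ITEMS = ("transcript_metadata", "source_object",
                  "tool_call_log", "downstream_audit_events")


def other_readers(content_reads, source_object, alerting_session_id):
    """Sessions, other than the alerting one, that consumed the same source object.

    content_reads: iterable of dicts with session_id, user_id, agent_app and
    source_url_or_object keys (the hypothetical agent schema from this post).
    Returns sorted (user_id, agent_app, session_id) tuples.
    """
    hits = {
        (r["user_id"], r["agent_app"], r["session_id"])
        for r in content_reads
        if r["source_url_or_object"] == source_object
        and r["session_id"] != alerting_session_id
    }
    return sorted(hits)


def unpreserved(evidence_store: dict) -> list:
    """Evidence items that are still missing or empty in the case record."""
    return [item for item in EVIDENCE_ITEMS if not evidence_store.get(item)]
```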

What to do now

Inventory agent applications first. List browser agents, coding assistants, Copilot-style business assistants, MCP clients, custom internal agents, and any SaaS integration that can act through a user or service account. Record which tools they can call and which identity they use. If an agent can read untrusted content and call privileged tools, it belongs in the SOC's log plan.
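The inventory rule reduces to a set intersection: an agent goes into the SOC's log plan when it has at least one untrusted input and at least one privileged tool. The category names in this sketch are illustrative, drawn from the examples in this post, and would need to match however your inventory actually labels inputs and tools.

```python
from dataclasses import dataclass, field

# Illustrative categories drawn from this post; extend to match your environment.
UNTRUSTED_INPUTS = {"public_web", "inbound_email", "issue_comment",
                    "shared_document", "customer_ticket", "mcp_metadata"}
PRIVILEGED_TOOLS = {"shell", "exec", "mail.forward", "drive.export",
                    "credential.read", "package.publish", "cloud.api", "http.post"}


@dataclass
class InventoryEntry:
    name: str
    kind: str          # browser agent, coding assistant, MCP client, custom agent...
    identity: str      # delegated user token or service account the agent acts through
    inputs: set = field(default_factory=set)
    tools: set = field(default_factory=set)


def needs_soc_log_plan(entry: InventoryEntry) -> bool:
    """True when the agent can both read untrusted content and call privileged tools."""
    return bool(entry.inputs & UNTRUSTED_INPUTS) and bool(entry.tools & PRIVILEGED_TOOLS)


def soc_log_plan(inventory):
    """Agents, with the identity they act through, that belong in the SOC's log plan."""
    return [(e.name, e.identity) for e in inventory if needs_soc_log_plan(e)]
```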

Add a minimum telemetry contract before widening access. Agent owners should log prompt source, tool name, parameters, approval state, target resource, and output destination. Identity owners should label the delegated applications those agents use. Cloud owners should tag service accounts and OAuth apps used by agent runtimes. Data owners should mark repositories and documents where agent access needs review.
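A telemetry contract is only useful if something checks it. This sketch validates sampled agent log records against the minimum fields named above; the gate function and its rule that an empty sample fails are hypothetical choices. Note that `human_approval=False` and empty `tool_parameters` count as present, because both are legitimate values.

```python
# Minimum fields from the telemetry contract described above.
REQUIRED_FIELDS = ("prompt_source", "tool_name", "tool_parameters",
                   "human_approval", "target_resource", "output_destination")


def contract_gaps(record: dict) -> list:
    """Required fields that are absent, None, or blank in one agent log record."""
    gaps = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            gaps.append(name)
    return gaps


def ready_to_widen_access(sample_records: list) -> bool:
    """Hypothetical gate: a non-empty sample where every record meets the contract."""
    return bool(sample_records) and all(not contract_gaps(r) for r in sample_records)
```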

Block the riskiest paths by default. Do not allow an agent that browses the public web to forward mail, export drives, read local credential files, publish packages, or execute shell commands without a separate approval step. For MCP, pin trusted servers, review tool metadata, and expose full tool parameters to the user before execution. The arXiv MCP paper's recommendations around static metadata analysis, decision-path tracking, anomaly detection, and user transparency are a good engineering checklist, but the SOC still needs logs when those controls miss.
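The default-deny paths and the MCP controls above can be sketched as small policy checks. The gated action names, the example tool metadata, and the idea of pinning a server to a SHA-256 digest of reviewed metadata are assumptions for illustration; real MCP tool definitions carry more fields than the `name` and `description` shown here, and a production pin would cover all of them.

```python
import hashlib
import json

# Actions a web-browsing agent should not take without a separate approval step.
GATED_FOR_WEB_AGENTS = {"mail.forward", "drive.export", "credential.read",
                        "package.publish", "shell", "exec"}


def requires_separate_approval(browses_public_web: bool, tool_name: str) -> bool:
    return browses_public_web and tool_name in GATED_FOR_WEB_AGENTS


def metadata_digest(tool_metadata: dict) -> str:
    """Stable hash of tool metadata (canonical JSON with sorted keys)."""
    canonical = json.dumps(tool_metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical pin list: server name -> digest of the tool metadata a human reviewed.
REVIEWED_TOOL = {"name": "read_file", "description": "Read a file inside the project workspace."}
PINNED_MCP_SERVERS = {"workspace-files": metadata_digest(REVIEWED_TOOL)}


def mcp_call_allowed(server: str, tool_metadata: dict) -> bool:
    """Allow only pinned servers whose current metadata still matches the reviewed copy."""
    pinned = PINNED_MCP_SERVERS.get(server)
    return pinned is not None and pinned == metadata_digest(tool_metadata)


def approval_text(tool_name: str, params: dict) -> str:
    """Show the user the full tool parameters, not a truncated summary."""
    return f"{tool_name} {json.dumps(params, sort_keys=True)}"
```

Pinning by digest means a silent change to a tool description, the tool-poisoning vector described earlier, fails closed until someone reviews the new metadata.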

Finally, test the join. Put a harmless hidden instruction in a controlled document or issue comment, run the agent through the normal task, and verify that the SOC can see the content source, tool request, approval decision, and downstream cloud event. If any link is missing, the organisation is trusting an agent it cannot investigate. That is where prompt injection stops being a model problem and becomes an incident-response blind spot.
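The test can end with a mechanical check that all four links are visible for the test session. This sketch assumes the SOC pipeline has already tagged each record with a `link` type and attached the agent `session_id` to the downstream cloud event through the join; both keys are hypothetical.

```python
# The four links the controlled test must surface in the SOC.
REQUIRED_LINKS = ("content_source", "tool_request", "approval_decision", "downstream_event")


def missing_links(soc_records, test_session_id):
    """Links the SOC cannot see for the controlled test session.

    Assumes each record carries a "link" type and that the downstream cloud
    event has already been tied to the agent session_id by the join.
    """
    seen = {r["link"] for r in soc_records if r.get("session_id") == test_session_id}
    return [link for link in REQUIRED_LINKS if link not in seen]
```

Any non-empty result names the exact gap in the investigation path, which is a more useful finding than a pass/fail on the whole exercise.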

Sources

Last verified: 2026-05-29