Detecting autonomous AI cloud attack chains in audit logs

Unit 42's autonomous cloud-agent test shows the useful defender signal: AI-driven attacks still leave cloud audit traces when they enumerate, impersonate and exfiltrate.

Contribution

This post adds original detection content: a cloud-audit hunting model for autonomous AI attack chains that does not depend on seeing the agent prompt, transcript or local tool logs. Unit 42 shows a multi-agent system chaining SSRF, metadata credential theft, service-account impersonation and BigQuery data theft in a GCP sandbox. Google and Mandiant show why defenders should treat that as part of the current threat model, while GreyNoise shows attackers already probing exposed LLM surfaces at scale. The synthesis here is the control-plane trace: discovery breadth, fast hypothesis testing, identity pivots and data access from the same short-lived chain.

The pattern

Unit 42 built a proof-of-concept offensive multi-agent system to test whether agentic models can operate inside a cloud environment without a human steering every step. Their sandbox chain is familiar to anyone who has investigated cloud compromise: server-side request forgery, metadata service credential theft, service account impersonation and BigQuery data exfiltration. The new part is not the cloud bug class. The new part is the operator shape. A supervisor agent coordinates specialist agents, shares state and moves through the chain at machine speed.

That matters because defenders have spent years tuning for human cloud intrusions. A human attacker pauses, reads documentation, copies command output, retries with different flags and makes choices shaped by experience. An AI-directed workflow can compress those loops. It can enumerate broadly, test many low-cost hypotheses, hand context between specialist tools and keep going until the objective is met or the tool boundary stops it. The underlying telemetry is still ordinary cloud telemetry, but the rhythm changes.

Google Threat Intelligence's May 2026 AI threat tracker puts that research in context. GTIG describes adversaries using AI for vulnerability discovery, exploit generation, infrastructure building, obfuscation and agentic workflows. It also reports AI-enabled malware that interprets system state to generate commands and manipulate victim environments. That does not mean every cloud compromise now has an agent behind it. It means the behaviour Unit 42 tested is not an isolated lab curiosity. It sits beside real adversary experimentation.

Mandiant's M-Trends 2026 adds the operational pressure. The report describes incident response data from 2025 where attackers moved faster between initial access and high-impact activity, with the hand-off window in some criminal operations collapsing to seconds. AI is not the only cause of that acceleration. Pre-staged access, partner ecosystems and automation all matter. But for detection, the lesson is the same: waiting for a slow human dwell pattern is a losing bet. A cloud hunt has to catch the first identity pivot and the first data plane touch, not the tidy incident narrative after the fact.

GreyNoise gives the internet-edge version of the same story. Its Ollama honeypot data captured tens of thousands of attack sessions, including systematic probes against LLM model endpoints and misconfigured proxy surfaces. Those campaigns are not the same as Unit 42's cloud-agent chain, but they show the population of exposed AI infrastructure that attackers are already mapping. Exposed model APIs, gateway proxies and agent services are becoming part of the same reconnaissance graph as storage buckets, metadata services and developer tokens.

Why it matters to cloud defenders

Cloud defenders should care because the useful signal is not in the AI layer alone. In a real incident, the defender may never see the prompt, the agent scratchpad or the internal plan. A hostile operator may run the agent from outside the victim estate. A compromised developer workstation may hide the agent transcript. A rented toolchain may expose only its effects. The cloud control plane is the shared witness that remains.

The Unit 42 chain crosses four control points defenders already collect in mature environments. SSRF reaches a metadata service or an internal endpoint. Credential theft produces token use that does not match the expected workload. Service account impersonation leaves IAM and audit events. BigQuery access leaves dataset, table and job metadata. None of those events says "AI". They say a workload or identity moved through discovery, privilege use and data access faster or more broadly than normal.

That is the right framing. Do not try to detect whether a command was written by a model. That road leads to brittle style heuristics and false confidence. Detect the attack chain the model helped accelerate. AI changes speed, sequencing and breadth before it changes the logs themselves. The defender's job is to join those logs quickly enough that the chain is still live.

This also keeps the topic inside a13e's cloud-detection lane. The relevant systems are AWS CloudTrail, Azure Activity, Microsoft Entra ID, GCP Cloud Audit Logs, Google Workspace and SaaS audit trails, plus logs from agent gateways where teams run them. A post about prompt wording would be too close to model safety. A post about the control-plane trace is a cloud security problem.

ATT&CK mapping

Walk the chain in order and each step has an honest ATT&CK home. Mapping the whole chain, not just the discovery band in the middle, is what lets a detection key on the entry and the payoff rather than only the noisy reconnaissance.

T1552.005, Unsecured Credentials: Cloud Instance Metadata API, is the entry. The Unit 42 chain begins when SSRF reaches the instance metadata service and lifts the attached service-account token. The theft itself is quiet in audit logs, but the stolen token's first use is not. A credential that suddenly calls the control plane from a new source, or uses the instance's own identity in a way that workload never does, is the first honest signal.

T1580, Cloud Infrastructure Discovery, is the broad read phase. An autonomous system attacking a cloud account needs to list projects, subscriptions, regions, instances, databases, service accounts, buckets, datasets and network edges. In GCP that shows as compute.instances.list, storage.buckets.list, bigquery.datasets.list, resourcemanager.projects.getIamPolicy and similar read-heavy calls. In AWS the same phase appears in CloudTrail as DescribeInstances, ListBuckets, GetCallerIdentity and a spray of other List*, Describe* and Get* events. In Azure it appears as Microsoft.Resources/subscriptions/resources/read and Azure Resource Graph queries that sweep many providers at once.

T1526, Cloud Service Discovery, covers the service-map step. A multi-agent system can rapidly identify which managed services are present and route work to a specialist. That makes the pattern broader than one API name. A discovery spike that touches compute, storage, IAM, serverless and analytics services from the same identity or source in a short window is more interesting than any one call.

T1087.004, Account Discovery: Cloud Account, is the identity-enumeration leg, and only the enumeration. The attacker needs to learn which users, roles, groups and service accounts can reach the target, so the signal is a read of IAM principals: getIamPolicy in GCP, ListUsers, ListRoles and GetAccountAuthorizationDetails in AWS, directory and role reads through Microsoft Graph in Azure.

T1078.004, Valid Accounts: Cloud Accounts, is the pivot that the enumeration sets up, and it is a separate technique on purpose. Enumerating principals is discovery; impersonating one is use of a valid account. Conflating the two claims more than the evidence supports, because a principal that has only read IAM has not yet used anyone else's identity. Service-account impersonation, role assumption, OAuth token exchange, workload identity federation and app-only token use all live here, and they matter most when they follow the discovery read inside the same window. In GCP look for GenerateAccessToken and SignJwt; in AWS for AssumeRole and GetSessionToken; in Azure for a fresh sign-in by an application or managed identity.

T1530, Data from Cloud Storage, is the payoff. The Unit 42 agent grants itself storage rights, exports a BigQuery table to a bucket and pulls it out, so the closing technique is data access, not more discovery. In GCP that is BigQuery extract jobs and Cloud Storage object reads; in AWS, GetObject at volume plus snapshot and export paths; in Azure, Storage and Key Vault reads joined to the earlier sign-in. When the export then leaves for attacker-controlled infrastructure, T1537, Transfer Data to Cloud Account, is the matching exfiltration technique.

These map to ATT&CK rather than to the MITRE ATLAS adversarial-AI matrix on purpose. ATLAS frames AI-specific tactics well and is worth citing for taxonomy, but a working SOC correlates against ATT&CK today, and every step above has an honest ATT&CK home. The prompt is not the technique. The observed techniques are credential theft, discovery, valid-account use and data access, in the order the chain runs them.
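The mapping above can be kept as a small lookup so that each audit event carries its technique ID into the hunt. This is a minimal sketch, assuming API names as they are written in this post; real audit-log method strings are often longer and vary by service, so the suffix match is a simplification, and the function names technique_for and chain_order are hypothetical.

```python
from typing import Iterable, List, Optional

# API names and technique IDs come from the mapping in this post.
# Real log method strings differ by provider and service (assumption:
# matching on the trailing name is good enough for a first pass).
TECHNIQUE_BY_API = {
    "compute.instances.list": "T1580",
    "storage.buckets.list": "T1580",
    "bigquery.datasets.list": "T1580",
    "DescribeInstances": "T1580",
    "ListBuckets": "T1580",
    "getIamPolicy": "T1087.004",
    "ListUsers": "T1087.004",
    "ListRoles": "T1087.004",
    "GetAccountAuthorizationDetails": "T1087.004",
    "GenerateAccessToken": "T1078.004",
    "SignJwt": "T1078.004",
    "AssumeRole": "T1078.004",
    "GetSessionToken": "T1078.004",
    "GetObject": "T1530",
}


def technique_for(method: str) -> Optional[str]:
    """Return the ATT&CK technique ID for an API method, or None if unmapped."""
    for api, technique in TECHNIQUE_BY_API.items():
        if method == api or method.endswith("." + api):
            return technique
    return None


def chain_order(methods: Iterable[str]) -> List[str]:
    """Return the techniques seen, in first-seen order, skipping unmapped calls."""
    seen: List[str] = []
    for method in methods:
        technique = technique_for(method)
        if technique is not None and technique not in seen:
            seen.append(technique)
    return seen
```

Keeping the order matters more than the labels: a chain that reads T1580, T1087.004, T1078.004 and T1530 in that sequence is the shape described above, whereas the same IDs scattered across a day are not.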

Detection guidance

Build the detection around sequence, not around a single magic event. Start with a 10 to 30 minute window per principal, workload identity or source IP. Inside that window, score four behaviours: breadth of discovery, failed hypothesis testing, identity pivoting and sensitive data access. Alert when at least three appear together, or when identity pivoting is followed by sensitive data access with no matching deployment or support ticket.
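The alert condition in the paragraph above can be sketched as a small decision function. This assumes the four behaviours have already been reduced to booleans for one principal and one window; the WindowFeatures class, its field names and the ticket flag are hypothetical and not taken from any vendor schema.

```python
from dataclasses import dataclass


@dataclass
class WindowFeatures:
    """Behaviours observed for one principal inside one 10 to 30 minute window."""
    broad_discovery: bool
    failed_hypotheses: bool
    identity_pivot: bool
    sensitive_data_access: bool
    pivot_before_data: bool = False   # the pivot happened before the data access
    has_change_ticket: bool = False   # a matching deployment or support ticket exists


def should_alert(f: WindowFeatures) -> bool:
    """Alert on three or more behaviours, or on an unexplained pivot followed by data access."""
    behaviours = [
        f.broad_discovery,
        f.failed_hypotheses,
        f.identity_pivot,
        f.sensitive_data_access,
    ]
    if sum(behaviours) >= 3:
        return True
    return (
        f.identity_pivot
        and f.sensitive_data_access
        and f.pivot_before_data
        and not f.has_change_ticket
    )
```

The second branch is deliberately stricter than the first: two behaviours alone only fire when they are the pivot and the data touch, in that order, with nothing in change management to explain them.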

Breadth of discovery is the first feature. Count distinct cloud services, projects, subscriptions, regions and resource types touched by read-only API calls. A human administrator often works in one service family during a task. An automated agent finding its way around a new estate may touch many families quickly because it is building a map for later steps. Weight first-seen service families higher than routine ones for that identity.
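As a rough illustration of the weighting, the sketch below scores distinct service families touched by read-only calls and counts first-seen families double. It covers service families only, not projects, regions or resource types. The event keys service_family and read_only, and the default weight of 2.0, are assumptions for the example rather than fields from a real log schema.

```python
from typing import Iterable, Mapping, Set


def discovery_breadth(
    events: Iterable[Mapping[str, object]],
    baseline_families: Set[str],
    first_seen_weight: float = 2.0,
) -> float:
    """Score distinct service families touched by read-only calls.

    Families never seen before for this identity count first_seen_weight;
    families already in the identity's baseline count 1.0.
    """
    families = {e["service_family"] for e in events if e.get("read_only")}
    return sum(
        1.0 if family in baseline_families else first_seen_weight
        for family in families
    )
```

An identity whose baseline is compute only, and which suddenly reads compute, storage and IAM, scores 5.0 here, while the same three families from an inventory job that touches them every night would score 3.0.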

Failed hypothesis testing is the second feature. Agentic systems tend to try, observe and revise. In logs, that can look like a burst of denied calls, invalid resource names, missing-permission errors, non-existent dataset references or repeated API calls with small parameter changes. Some legitimate automation does this too, especially deployment tooling. The separator is context: deployment automation usually runs from known pipelines, known service accounts and known time windows. A user token, developer workstation or new external source producing the same pattern deserves a look.
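One way to apply the context separator is to count failures per principal while skipping principals already known to belong to deployment pipelines. This is a minimal sketch that covers only error outcomes, not repeated calls with small parameter changes. The normalised outcome values and the event keys principal and outcome are hypothetical.

```python
from typing import Dict, Iterable, Mapping, Set

# Hypothetical normalised outcome labels; real logs use provider-specific
# status codes and error names that would need mapping first.
FAILURE_OUTCOMES = {"denied", "not_found", "invalid_argument"}


def failed_hypothesis_counts(
    events: Iterable[Mapping[str, str]],
    known_pipeline_principals: Set[str],
) -> Dict[str, int]:
    """Count failed calls per principal, ignoring known deployment pipelines."""
    counts: Dict[str, int] = {}
    for event in events:
        principal = event["principal"]
        if principal in known_pipeline_principals:
            continue  # expected try-and-retry behaviour from deployment tooling
        if event["outcome"] in FAILURE_OUTCOMES:
            counts[principal] = counts.get(principal, 0) + 1
    return counts
```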

Identity pivoting is the third feature. Watch for service account impersonation, role assumption, OAuth token exchange, workload identity federation use, app-only token use or sudden calls through a delegated application. The pivot matters more when it follows discovery. An identity that lists IAM policy and then impersonates a service account inside the same window is behaving like an intrusion chain, not like a normal dashboard click.
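The ordering condition can be checked directly: an IAM read followed by a pivot, inside one window, for one identity. The sketch below assumes events have already been classified into a hypothetical kind field with values such as iam_read and identity_pivot, and that time holds a datetime; the 20 minute default is illustrative.

```python
from datetime import datetime, timedelta
from typing import Iterable, Mapping, Optional


def pivot_follows_discovery(
    events: Iterable[Mapping[str, object]],
    window: timedelta = timedelta(minutes=20),
) -> bool:
    """Return True if an IAM read is followed by an identity pivot within the window."""
    last_iam_read: Optional[datetime] = None
    for event in sorted(events, key=lambda e: e["time"]):
        if event["kind"] == "iam_read":
            last_iam_read = event["time"]
        elif event["kind"] == "identity_pivot" and last_iam_read is not None:
            if event["time"] - last_iam_read <= window:
                return True
    return False
```

A pivot that precedes the IAM read, or that lands long after it, does not match. That keeps the check from firing on an administrator who impersonates a service account in the morning and reviews a policy in the afternoon.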

Sensitive data access is the fourth feature. In GCP, BigQuery jobs, Cloud Storage object reads, Secret Manager access and export jobs are obvious candidates. In AWS, look at S3 object reads, Secrets Manager reads, snapshot sharing and data export paths, and note any STS role assumption that preceded them. In Azure, join Activity logs with Entra sign-in, Key Vault, Storage, SQL, Sentinel and Microsoft Graph events. In SaaS, include Google Drive export, SharePoint download, GitHub repository clone, package token read and ticketing-system export.

A practical alert might read: same principal or source, within 20 minutes, touches more than five cloud service families, produces more than ten denied or not-found responses, enumerates IAM or service accounts, then runs a data query or object export against a resource not seen for that principal in 30 days. The alert should carry the sequence, not only the final data event. The sequence is what tells an analyst this may be an autonomous chain rather than a noisy user.
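The example rule above could be evaluated over one principal's events roughly as follows. This is a sketch under stated assumptions: events are pre-normalised dictionaries with hypothetical keys time, kind, service_family, outcome and resource, and seen_resources_30d is a set the caller builds from 30 days of history. The thresholds are the ones from the paragraph, not tuned values.

```python
from datetime import timedelta
from typing import List, Mapping, Optional, Set


def practical_alert(
    events: List[Mapping[str, object]],
    seen_resources_30d: Set[str],
    window: timedelta = timedelta(minutes=20),
) -> Optional[List[Mapping[str, object]]]:
    """Return the full matching sequence for one principal, or None.

    Fires when a data access to a resource not seen in 30 days is preceded,
    inside the window, by more than five service families, more than ten
    denied or not-found responses, and at least one IAM enumeration.
    """
    ordered = sorted(events, key=lambda e: e["time"])
    for data_event in ordered:
        if data_event["kind"] != "data_access":
            continue
        if data_event["resource"] in seen_resources_30d:
            continue
        start = data_event["time"] - window
        prior = [e for e in ordered if start <= e["time"] < data_event["time"]]
        families = {e["service_family"] for e in prior}
        failures = sum(1 for e in prior if e["outcome"] in ("denied", "not_found"))
        iam_enumerated = any(e["kind"] == "iam_read" for e in prior)
        if len(families) > 5 and failures > 10 and iam_enumerated:
            return prior + [data_event]  # carry the sequence, not only the final event
    return None
```

Returning the preceding events together with the data event is the design choice that matters: the analyst receives the chain in order and can judge it without re-running the query.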

Tune hard. Cloud inventory jobs, CSPM scanners, asset catalogues and legitimate red-team exercises can look similar. Suppress known scanners by principal, source network, user agent and scheduled window. Keep the suppression narrow: scanners should not usually impersonate new service accounts and then read sensitive datasets. If they do, that is worth fixing even when benign.
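Narrow suppression can be expressed as two conditions: the chain matches a known scanner on every attribute, and it never escalates into a pivot plus sensitive data access. The sketch below assumes a per-chain summary dictionary and a scanner profile with hypothetical field names; exact matching on user agent and hour is a simplification.

```python
from typing import Mapping


def is_suppressed(chain: Mapping[str, object], scanner: Mapping[str, object]) -> bool:
    """Suppress only when the chain matches the scanner profile and does not escalate."""
    matches_scanner = (
        chain["principal"] == scanner["principal"]
        and chain["source_network"] in scanner["source_networks"]
        and chain["user_agent"] == scanner["user_agent"]
        and chain["start_hour_utc"] in scanner["scheduled_hours_utc"]
    )
    # A scanner that pivots identity and then reads sensitive data is never
    # suppressed, even when every other attribute matches.
    escalates = bool(chain["identity_pivot"]) and bool(chain["sensitive_data_access"])
    return bool(matches_scanner) and not escalates
```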

Where agent gateways exist inside the estate, add their logs as a bonus signal rather than a dependency. Capture session ID, tool name, target cloud account, requested scope, approval state and output destination. Then join that to the same control-plane chain. If the gateway logs are missing, the cloud hunt should still work. That is the point of this model.
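Treating gateway logs as a bonus rather than a dependency can look like an enrichment step that never blocks the alert. In this sketch the alert and session dictionaries, and keys such as cloud_account, window_start, window_end and time, are hypothetical. The session fields otherwise follow the list in the paragraph above.

```python
from typing import Dict, Iterable, List, Mapping


def enrich_with_gateway(
    alert: Mapping[str, object],
    gateway_sessions: Iterable[Mapping[str, object]],
) -> Dict[str, object]:
    """Attach gateway sessions that targeted the alert's account during its window.

    The alert is returned whether or not any session matches, so missing
    gateway logs never suppress the control-plane detection.
    """
    matches: List[Mapping[str, object]] = [
        session
        for session in gateway_sessions
        if session["target_cloud_account"] == alert["cloud_account"]
        and alert["window_start"] <= session["time"] <= alert["window_end"]
    ]
    enriched = dict(alert)
    enriched["gateway_sessions"] = matches
    enriched["gateway_context"] = bool(matches)
    return enriched
```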

What to do now

First, pick one cloud and implement the sequence hunt in read-only mode. GCP is a good starting point if BigQuery or service-account impersonation is common; AWS is a good starting point if STS and S3 drive most incidents; Azure is a good starting point where Entra applications and delegated Graph access dominate. Do not try to cover every service on day one. Cover discovery, identity pivot and one sensitive data path.

Second, baseline your legitimate cloud scanners. List the principals used by CSPM, CNAPP, asset inventory, backup, deployment and support tooling. Record the source networks and the service families they normally touch. That list is the difference between a useful sequence alert and a pager storm.

Third, add agent infrastructure to the asset inventory. Exposed Ollama servers, model gateways, MCP servers, coding agents and browser agents should have owners, identities, scopes and network placement. GreyNoise's LLM targeting data makes a simple point: if an AI-facing service is exposed, someone will find it. Treat it like any other internet-facing control-plane-adjacent system.

Finally, rehearse the response. When the sequence fires, preserve the audit events, revoke the pivoted credential, freeze relevant service-account keys and check whether the same source touched other projects or tenants. If an internal agent gateway is involved, snapshot its tool logs before retention trims them. Autonomous does not mean invisible. It means the window is shorter, and the trace is still in the control plane if you collect it in time.
