Detecting prompt-injected CI agents before secrets leave the runner

In this research

Contribution
The pattern
Why it matters to cloud defenders
ATT&CK mapping
Detection guidance
What to do now

Research snapshot

Type: Defensive guidance
Reviewed: 2026-06-09
ATT&CK: T1195.001, T1552.001, T1078.004

Contribution

This post adds an original detection chain for agentic CI abuse: join the untrusted GitHub event, the runner secret-read attempt, and the cloud token exchange that follows. Microsoft and GMO Flatt explain the product flaws; the defender gap is how to catch the same class when the next coding agent, workflow template, or review bot becomes the execution path.

The pattern

The current signal is not a normal secret leak and not a normal prompt-injection demo. It is the join between the two. Microsoft Threat Intelligence reported on 5 June 2026 that Anthropic's Claude Code GitHub Action could expose CI/CD workflow secrets when the agent processed untrusted GitHub content such as issue bodies, pull request descriptions, and comments. The detail that matters for defenders is the tool boundary. Microsoft found that environment scrubbing applied to subprocess paths such as Bash, but the Read tool was not under the same sandbox model and could reach /proc/self/environ. That file can contain the active runner environment, including API keys and credentials made available to the workflow. Anthropic mitigated the specific case in Claude Code version 2.1.128 by blocking access to sensitive /proc files.

GMO Flatt's 1 June research gives the same problem a repository supply-chain shape. RyotaK describes a Claude Code GitHub Actions permission-bypass path where a GitHub App actor was treated as trusted, even when the surrounding repository context was public and attacker-influenced. The article also points out a practical exfiltration route that is easy to miss in reviews: tools such as the GitHub CLI can be turned into outbound channels by placing secret material in a URL path or query string. The command may look like a routine gh issue view call, but the remote host receives the embedded value.
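
The URL-embedding route can be screened for with a rough heuristic. The sketch below is not taken from the GMO Flatt article: it assumes you can capture the URLs an agent tool call requested, and it flags path segments or query values that are long and high-entropy. The function names and the thresholds (24 characters, 3.5 bits per character) are illustrative assumptions that need tuning against your own traffic.

```python
import math
from urllib.parse import urlsplit, parse_qsl


def shannon_entropy(value: str) -> float:
    """Bits per character of the string's empirical character distribution."""
    if not value:
        return 0.0
    counts = {}
    for ch in value:
        counts[ch] = counts.get(ch, 0) + 1
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def suspicious_url_segments(url: str, min_len: int = 24,
                            min_entropy: float = 3.5) -> list:
    """Return path segments and query values that look like embedded secrets.

    Thresholds are assumptions for illustration, not validated cut-offs.
    """
    parts = urlsplit(url)
    candidates = [seg for seg in parts.path.split("/") if seg]
    candidates += [v for _, v in parse_qsl(parts.query, keep_blank_values=True)]
    return [c for c in candidates
            if len(c) >= min_len and shannon_entropy(c) >= min_entropy]
```

A routine issue URL yields an empty list, while a URL whose path or query carries a token-shaped value returns that value for review. Long commit hashes and signed URLs will also trip this check, so treat a hit as a prompt for correlation rather than a verdict.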

Aonan Guan's earlier Comment and Control research widens the class beyond one action. It describes prompt injection through PR titles, issue bodies, and issue comments across Claude Code Security Review, Google Gemini CLI Action, and GitHub Copilot Agent, with credentials stolen from the GitHub Actions runner environment and returned through GitHub comments, commits, logs, or other agent-visible channels. That matters because defenders should not build a one-off indicator for Claude Code. The pattern is an agent inside CI reading untrusted repository text, then using tools with the host repository's credentials.

Anthropic's containment write-up provides the engineering frame: if credentials never enter the sandbox, they cannot be exfiltrated through a model mistake, an attacker prompt, or a tool path that behaves differently from what the author expected. It also says model-layer defences cannot stand alone. That is the point for cloud defenders. The product fix blocks one sensitive file read, but the detection problem is broader: when a workflow gives an agent a repository token, cloud federation path, package-publish credential, or external network access, repository text becomes part of the control plane.

Why it matters to cloud defenders

Most cloud teams moved CI/CD away from static secrets for good reasons. GitHub Actions OIDC to AWS, Azure, or GCP gives short-lived credentials with audience, repository, branch, and workflow claims. That is better than a long-lived access key in a repository secret. It is not a magic boundary if the workflow itself is steered by attacker-controlled text. The attacker does not need to steal a permanent key when they can push the agent towards reading the runner environment, asking for a cloud token, or using the repository token to modify workflow files.

This is a cloud-detection problem because the payoff often happens after GitHub. A poisoned issue can become an AWS AssumeRoleWithWebIdentity call, an Azure federated workload-identity sign-in, a GCP service-account token exchange, a package publish, or a workflow-file change that gives the attacker persistence. The source content is in GitHub, the agent action is in the runner, and the business impact is in cloud identity or software delivery. Looking at any one plane alone leaves a blind spot.

The usual CI alerts are poorly tuned for this. Secret scanning may fire after a key is printed, but that is late and often noisy. GitHub workflow logs can show commands, yet many agent tool calls are not shell commands. Cloud audit logs show successful token use, but they do not know whether the GitHub issue that triggered the run came from a maintainer or from a throwaway account with no write access. A good detection must bind these planes together: which event triggered the agent, what trust level the actor had, what sensitive files or environment values the agent attempted to read, and whether the run exchanged or used cloud credentials within a short window.

There is also a review-time failure mode. Teams treat coding agents as productivity glue and grant them broad repository access because the task sounds harmless: triage issues, review pull requests, propose fixes, summarise logs. Those tasks require the agent to read untrusted text by design. If the same workflow has write permissions, secrets, and external egress, the prompt layer becomes an access-control layer by accident.

ATT&CK mapping

The entry point maps best to T1195.001, Compromise Software Dependencies and Development Tools. The attacker is not exploiting a public application in the usual web sense. They are abusing a development workflow that maintainers installed to help operate the repository. The malicious input lands in the repository collaboration surface, then crosses into a CI job through the coding agent. That is the same family as malicious actions, compromised package scripts, and poisoned build tooling, even though the payload here is text rather than a package tarball.

The credential-discovery step maps to T1552.001, Credentials In Files. Microsoft's case centred on /proc/self/environ, and the wider research points to environment variables, process listings, logs, and files that hold tokens during a workflow run. On a GitHub-hosted or self-hosted runner, those values are often transient, but they are still credentials in an accessible local source. The fact that an agent reads them through a tool instead of a shell does not change the defender question: did the workflow expose credential material to a path that untrusted input could influence?

The cloud payoff maps to T1078.004, Cloud Accounts, when the stolen or minted credential is used against AWS, Azure, or GCP. A GitHub OIDC exchange is legitimate when a trusted workflow on a trusted ref requests it. It becomes credential abuse when the run was started or shaped by an attacker-controlled issue, comment, or pull request and then uses the federated identity outside its expected change path. This is where defenders should attach cloud context rather than stopping at repository telemetry.

There are other behaviours in the chain, including command execution and exfiltration over web services, but those are supporting moves. Listing too many techniques would weaken the page. The spine is development-tool compromise, local credential discovery, and cloud-identity use.

Detection guidance

Start with a correlation rule, not a single IoC. The trigger side is GitHub. Alert when an AI or coding-agent workflow runs from issues, issue_comment, pull_request, pull_request_target, or a manual dispatch that references untrusted issue or PR content. Add the actor trust dimension: first-time contributor, actor without write access, GitHub App actor treated as trusted, or a workflow that bypasses the normal maintainer approval path. For pull_request_target, treat repository-write token exposure as high risk when the job reads PR content from a fork.
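
The trigger-side logic above can be sketched as a small classifier. This is a minimal illustration, assuming the caller has already identified the workflow as an agent workflow and has extracted the event name and the content author's author_association value from the GitHub event payload. TriggerContext, trigger_risk, and the risk labels are hypothetical names. Treating OWNER, MEMBER, and COLLABORATOR as trusted is a rough proxy, since association does not prove write access.

```python
from dataclasses import dataclass

# Events where repository text from outsiders can reach the agent.
AGENT_TRIGGER_EVENTS = {"issues", "issue_comment", "pull_request", "pull_request_target"}

# Rough proxy for "maintainer-like" authors; association is not a permission check.
TRUSTED_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}


@dataclass(frozen=True)
class TriggerContext:
    event_name: str                  # e.g. "issue_comment"
    author_association: str          # e.g. "NONE", "FIRST_TIME_CONTRIBUTOR", "MEMBER"
    actor_is_app: bool = False       # the triggering actor is a GitHub App
    head_is_fork: bool = False       # PR head comes from a fork
    dispatch_references_untrusted: bool = False  # manual run pointing at issue/PR text


def trigger_risk(ctx: TriggerContext) -> str:
    """Return "none", "low", "elevated", or "high" for an agent workflow trigger."""
    if ctx.event_name == "workflow_dispatch":
        return "elevated" if ctx.dispatch_references_untrusted else "none"
    if ctx.event_name not in AGENT_TRIGGER_EVENTS:
        return "none"
    # Privileged token plus fork-controlled content is the worst combination.
    if ctx.event_name == "pull_request_target" and ctx.head_is_fork:
        return "high"
    # App actors are not trusted by default, and neither are outside authors.
    if ctx.actor_is_app or ctx.author_association not in TRUSTED_ASSOCIATIONS:
        return "elevated"
    return "low"
```

The output is meant to feed a correlation rule as one input among several, not to page anyone on its own.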

The runner side is file and environment access. On self-hosted runners, instrument process execution, file reads, and outbound network activity. The highest-signal paths are reads of /proc/self/environ, /proc/*/environ, shell history, .env, cloud credential files, and commands such as ps auxeww, env, printenv, gh auth token, or cloud CLI token-printing commands. On hosted runners, where process telemetry is thinner, watch workflow logs and agent tool traces for the same strings, plus high-entropy values in comments, logs, job summaries, and issue updates. A coding-agent job that comments base64-like blobs, long URL query strings, or masked-secret bypass patterns back into GitHub deserves a human review.
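
For hosted runners, the string matching described above might look like the following sketch. It assumes workflow logs or agent tool traces are available as plain text lines. The pattern table and the scan_trace name are illustrative, and the list is a starting point rather than complete coverage of the paths named in this section.

```python
import re

# Hypothetical label -> pattern table; extend it for your own runner image.
SENSITIVE_PATTERNS = {
    "proc_environ": re.compile(r"/proc/(?:self|\d+|\*)/environ\b"),
    "process_env_listing": re.compile(r"\bps\s+auxeww\b"),
    # Bare "env" only counts when it is the whole command or is piped/redirected,
    # so YAML keys such as "env:" do not match.
    "env_dump": re.compile(r"\bprintenv\b|^\s*env\s*(?:\||>|$)"),
    "gh_token_print": re.compile(r"\bgh\s+auth\s+token\b"),
    "dotenv_file": re.compile(r'(?:^|[\s/"=])\.env\b'),
    "shell_history": re.compile(r"\.(?:bash|zsh)_history\b"),
    "aws_credentials_file": re.compile(r"\.aws/credentials\b"),
}


def scan_trace(lines):
    """Return (line_number, label) pairs for lines that touch sensitive material."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label))
    return hits
```

Matching on text catches attempts that a process sensor would miss when the agent reads a file through its own tool rather than a shell, but it depends entirely on the agent or workflow logging the call.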

The cloud side is the token exchange and first use. In AWS, look for AssumeRoleWithWebIdentity by a GitHub Actions OIDC provider where the sub claim points to an agent workflow, an issue-triggered workflow, or an unusual ref for that repository. Bind the CloudTrail event to the GitHub run_id, repository, branch, workflow name, and actor. In Azure, bind Entra workload-identity sign-ins and federated credential use to the same GitHub run metadata. In GCP, bind service-account token creation or workload identity federation events to repository, workflow, and actor claims.
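
On the AWS side, the subject check can be sketched as follows. The code assumes the subject string has already been extracted from the audit event; where it sits in the record depends on your log pipeline and is not shown. It also assumes GitHub's default subject layout of repo:OWNER/REPO followed by ref:..., pull_request, or environment:.... The function names and the ref allowlist are hypothetical.

```python
from typing import Optional


def parse_github_sub(sub: str) -> Optional[dict]:
    """Split a GitHub Actions OIDC subject into repository, kind, and value.

    Returns None when the string does not follow the assumed default layout.
    """
    parts = sub.split(":", 3)
    if len(parts) < 3 or parts[0] != "repo":
        return None
    parsed = {"repository": parts[1], "kind": parts[2], "value": None}
    if len(parts) == 4:
        parsed["value"] = parts[3]
    return parsed


def is_unexpected_subject(sub: str, expected_repository: str,
                          expected_refs: set) -> bool:
    """True when the subject is not a known ref of the expected repository."""
    parsed = parse_github_sub(sub)
    if parsed is None or parsed["repository"] != expected_repository:
        return True
    # This sketch allowlists refs only; pull_request and environment subjects
    # are treated as unexpected and would need their own rules.
    if parsed["kind"] != "ref":
        return True
    return parsed["value"] not in expected_refs
```

With the default layout, a workflow triggered by an issue on the default branch produces the same subject as any other run on that branch, which is why the join to run_id, workflow name, and actor matters more than the subject alone.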

A practical alert chain is: untrusted GitHub event starts an agent workflow; the job reads or attempts to read runner environment material; within 30 minutes the same run exchanges an OIDC token or uses a repository token for write actions; then the workflow calls an external host, comments unusual data, modifies workflow files, publishes a package, or touches cloud resources outside the expected deployment path. Each step alone has false positives. Together, they describe the attack class without needing the exact prompt payload.
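
The chain above can be expressed as a join over normalised signals. The sketch below assumes each plane has already been reduced to records keyed by GitHub run_id, and it reads the 30-minute window as starting at the environment read. Signal, the four kind labels, and correlate are hypothetical names, not a SIEM schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Signal:
    run_id: str
    kind: str      # "untrusted_trigger", "env_read", "token_use", or "follow_on"
    ts: datetime


def run_matches_chain(signals, window: timedelta) -> bool:
    """True when one run shows trigger -> env read -> token use -> follow-on."""
    triggers = [s.ts for s in signals if s.kind == "untrusted_trigger"]
    if not triggers:
        return False
    start = min(triggers)
    reads = [s.ts for s in signals if s.kind == "env_read" and s.ts >= start]
    token_uses = [s.ts for s in signals if s.kind == "token_use"]
    follow_ons = [s.ts for s in signals if s.kind == "follow_on"]
    for read_ts in reads:
        for token_ts in token_uses:
            # Token exchange or write-token use must follow the read within the window.
            in_window = read_ts <= token_ts <= read_ts + window
            if in_window and any(f >= token_ts for f in follow_ons):
                return True
    return False


def correlate(signals, window: timedelta = timedelta(minutes=30)):
    """Return the sorted run_ids whose signals complete the whole chain."""
    by_run = defaultdict(list)
    for s in signals:
        by_run[s.run_id].append(s)
    return sorted(r for r, items in by_run.items() if run_matches_chain(items, window))
```

Requiring all four stages keeps volume low. A softer variant could score partial chains, for example trigger plus environment read, at lower severity.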

Tune carefully. Maintainer-run incident response workflows may inspect environment variables during debugging. Release jobs may mint cloud tokens and publish packages. The difference is provenance and purpose. A release job on a protected tag, started by a maintainer, with no untrusted issue text in context, is a different risk from an agent workflow that processes a newly opened issue and then reaches for cloud federation. Keep allowlists tied to workflow file path, protected ref, actor group, and expected cloud role, not only repository name.
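
One way to encode that allowlist shape is shown below, as a sketch under stated assumptions. AllowRule, RunFacts, the actor group labels, and the role ARN in the usage example are all hypothetical. The point is only that a rule matches when every dimension agrees, so a repository name alone never suppresses an alert.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class AllowRule:
    repository: str
    workflow_path: str   # e.g. ".github/workflows/release.yml"
    ref_pattern: str     # glob over the ref, e.g. "refs/tags/v*"
    actor_group: str     # label assigned by your own identity mapping
    cloud_role: str      # the role this workflow is expected to assume


@dataclass(frozen=True)
class RunFacts:
    repository: str
    workflow_path: str
    ref: str
    actor_group: str
    cloud_role: str


def is_allowlisted(run: RunFacts, rules) -> bool:
    """True only when a single rule matches on all five dimensions."""
    return any(
        run.repository == rule.repository
        and run.workflow_path == rule.workflow_path
        and fnmatchcase(run.ref, rule.ref_pattern)
        and run.actor_group == rule.actor_group
        and run.cloud_role == rule.cloud_role
        for rule in rules
    )
```

For example, a rule for release.yml on refs/tags/v* run by the maintainers group with a release role would suppress a tagged release, but not an agent workflow in the same repository that reaches for the same role.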

What to do now

First, split agent jobs by trust boundary. A workflow that reads untrusted issues or pull requests should not receive cloud credentials, package-publish tokens, or a broad GITHUB_TOKEN. Give it read-only repository permissions and no secrets. If the agent must propose changes, route them through a separate maintainer-approved workflow that starts from a trusted ref.

Second, reduce the runner blast radius. Prefer short-lived cloud federation over static secrets, but restrict the federated role by repository, branch, workflow, environment, and audience. Do not allow issue-triggered or PR-triggered agent jobs to request the same cloud role as release jobs. Block sensitive local paths in the agent sandbox, remove secrets from the environment where possible, and make outbound egress explicit. Anthropic's containment guidance is the right principle: credentials that never enter the agent environment cannot leave through a prompt.

Third, add cross-plane logging before you need it. Store GitHub workflow run metadata, triggering actor, event type, ref, workflow file path, and run attempt. Join it to cloud audit logs for federated token exchange and first resource action. Keep enough runner telemetry to see sensitive file reads on self-hosted systems. For hosted runners, collect workflow logs, job summaries, agent tool traces, and GitHub audit events into the SIEM.

Fourth, rotate and review after any suspected exposure. If an agent workflow might have read /proc/self/environ, process listings, .env files, or cloud credential material, treat the job as credential exposure even if no secret appears in logs. Rotate affected repository secrets, revoke package tokens, expire cloud sessions where the provider supports it, and review cloud actions by the federated identity during the exposure window.

Finally, audit templates. Search for agent workflows that run on issue_comment, issues, pull_request, or pull_request_target; grant write permissions; expose secrets; or call cloud federation. Any one trait is worth a look; all four in one workflow are the smell. The fix is usually not to remove the agent. It is to stop mixing untrusted text, privileged tools, cloud identity, and egress in the same job.

Sources

Last verified: 2026-06-09