Research · Defensive guidance

Detecting MCP tool-call abuse in agent audit logs

Last reviewed:

MCP security research is strong on attack design and thin on runtime detection. Here is a reproducible model that maps tool poisoning, output poisoning, sampling abuse and covert tool invocation to concrete signals in MCP and agent audit logs.

Contribution

The MCP security literature is strong on attack design and thin on runtime detection. Microsoft, CyberArk, Unit 42 and recent academic threat modelling all describe how a Model Context Protocol server can be turned against the agent that trusts it, but they stop at client-side and design-time defences. This post turns that research into a reproducible runtime-detection model: it maps four distinct MCP abuse classes (tool poisoning, output poisoning, sampling abuse and covert tool invocation) onto concrete signals in MCP server and agent audit logs, and gives the alert logic a cloud security team can implement now. Two of those detections are not stated in any of the source advisories: alerting on tool-description drift across successive tools/list snapshots, and alerting on a tools/call that has no preceding user turn in the same session. The contribution is original detection content plus cross-source synthesis.

The pattern

MCP is an open protocol, introduced by Anthropic, that gives a language-model application a standard way to call external tools and pull in external data. A host application connects to one or more MCP servers; each server advertises a set of tools through a tools/list response, and the model invokes them through tools/call. The appeal is obvious: one integration contract instead of a bespoke connector per data source. The risk is the same thing viewed from the other side. The agent extends a large amount of trust to a component it did not write, and that component speaks back into the model's context on several channels.

Four abuse classes recur across the current research.

Tool poisoning is the most studied. A March 2026 STRIDE and DREAD threat-modelling study of MCP found that malicious instructions embedded in tool metadata, the human-readable description the model reads to decide when to call a tool, are the most prevalent and impactful client-side vulnerability, and that most of seven tested MCP clients failed to validate that metadata adequately. The description field is not inert documentation. The model reads it as guidance, so an attacker who controls it can steer behaviour before any tool runs.

Output poisoning generalises the problem to every channel a server controls. CyberArk's research makes the point bluntly: no output from an MCP server is safe. Tool results, error strings and any field that flows back into the model context can carry injected instructions, so sanitising only the obvious input does not close the gap.

Sampling abuse is newer. Unit 42 documented three attack vectors through the MCP sampling feature, which lets a server ask the host's model to generate text on its behalf. They demonstrated resource theft, draining the host's model quota for the attacker's own workloads; conversation hijacking, persistent injected instructions that manipulate responses or exfiltrate data; and covert tool invocation, hidden tool calls and file-system operations the user never sees or approves. The common thread is an implicit trust model with no built-in controls.

Microsoft's guidance frames the umbrella category, indirect prompt injection, and the mitigations it has shipped for its own MCP surfaces. Taken together the sources describe a mature attack surface and a set of design-time and client-side defences. What they do not give a defender is the runtime question: if this is already happening inside a deployed agent, which log line tells me?

Why it matters to cloud defenders

Agentic systems are landing in cloud estates fast, and MCP is becoming their connector layer. An MCP server is rarely a toy on a laptop. In production it is a service, often containerised, often with a cloud identity, sitting between a model and real systems such as ticketing, source control, cloud APIs, databases and internal search. That position is exactly the one defenders care about. A poisoned tool or a hijacked sampling call does not stay in the chat window; it can read files, call cloud APIs with the agent's credentials, and move data outward.

The trust boundary is also unusual. Most detection programmes are built around human identities, service accounts and network edges. An MCP server is none of those cleanly. It is a semi-trusted dependency that can both receive the model's context and inject text back into it, and the actions it triggers carry the agent's authority rather than the attacker's. A team that only watches cloud audit logs will see the later privileged action, an API call, a file read, an outbound connection, without the context that explains why a benign agent suddenly took it. The MCP layer holds that context, and most teams are not yet collecting it.

ATT&CK mapping

Three ATT&CK techniques cover the behaviour cleanly, and anchoring detections to them keeps MCP abuse inside the same framework a SOC already uses rather than in a separate silo.

Resource theft through sampling maps to T1496, Resource Hijacking: the attacker consumes the host's paid model capacity for unauthorised workloads, the same shape as cryptomining on stolen compute.

A malicious or compromised MCP server is a development-and-tooling supply-chain problem, T1195.001, Compromise Software Dependencies and Development Tools. The agent installs and trusts the server the way a build pipeline trusts a package; a poisoned tool is a poisoned dependency.

Covert tool invocation that performs file-system operations and command execution maps to T1059, Command and Scripting Interpreter, because the tool call is the execution primitive. The MCP layer is the place that records the call; the endpoint or cloud log records its effect.

These map to ATT&CK rather than to the MITRE ATLAS adversarial-AI matrix on purpose. ATLAS frames the AI-specific tactics well and is worth citing for taxonomy, but a working SOC correlates against ATT&CK today, and every behaviour above has an honest ATT&CK home. OWASP catalogues MCP tool poisoning as a named attack class, which is the other useful framing reference for the attack itself.

Detection guidance

None of this is detectable without the MCP layer in your logs. The first and largest gap is collection: the host application or MCP gateway has to emit a structured audit event for every tools/list, tools/call, tool result and sampling request, each carrying a session identifier, the server identity, the tool name, the arguments and a timestamp. That telemetry is the precondition for everything below. Ship it to the same store as your cloud and identity logs so the MCP event and its downstream effect sit in one timeline.

With that telemetry, four detections follow. The first two are not in the source advisories; they fall out of treating the protocol's own messages as a log to baseline.

Tool-description drift. Snapshot each server's tools/list response, including the full description text, and hash it per tool. Alert when a tool's description changes without a corresponding deployment of that server. A silent change to the text the model reads as guidance is the live signature of a tool-poisoning or rug-pull attempt, and it is cheap to watch because legitimate descriptions are stable between releases. Treat a high ratio of instruction-like language in a description, imperative verbs aimed at the model, references to other tools, or text that tries to widen the tool's remit, as a second, weaker signal.

Tool call with no preceding user turn. Within a session, correlate each tools/call to the events before it. A tool invocation that is not preceded by a user message or an explicit approval event, but is preceded by a sampling request or a tool result, is a candidate covert invocation: the server, not the user, drove the action. This is the runtime form of the covert-invocation vector Unit 42 demonstrated, and the session timeline is what makes it visible.

Sampling volume and fan-out. Sampling requests are normal in small numbers. Alert on an MCP server that issues sampling requests at a rate or token volume out of line with its own baseline, or that requests sampling unprompted by a user turn. That is the resource-theft and conversation-hijacking signal, and it keys on counts a gateway already has.

Output-borne instructions. Run tool results and error fields through the same injection-marker screen you would apply to any untrusted text before it re-enters the model context: look for imperative instructions aimed at the model, attempts to break out of a quoted region, and control or bidirectional Unicode used to hide content. Flag the source server, do not silently drop the result, and weigh that server's later actions more sceptically. This is detection, not prevention; the value is the audit trail when a server starts poisoning its output.

Each of these is a log-source-plus-signal rule, not a product feature, and none of them claims a single portable rule catches MCP abuse everywhere. The point is the opposite. MCP abuse is a correlation problem across the protocol's own events and the cloud effects they cause, and the detections live in your existing detection stack.

What to do now

Start with collection, because nothing here works without it. Turn on MCP audit logging at the host or gateway, capture the event types above with a session identifier, and route them beside your cloud and identity telemetry. Inventory the MCP servers your agents actually connect to, who owns each, and what cloud identity and scopes each one runs with. An over-privileged server is the multiplier on every attack above.

Then implement the four detections as your telemetry allows, starting with tool-description drift and the unprompted-tool-call correlation, because they are cheap and they catch the two highest-impact classes. Treat a hit as a session to investigate, not an automatic block, and keep the human approval step for high-impact tools in place. The design-time defences the vendors describe and the runtime detections here are complementary, not alternatives.

Finally, fold MCP into the incident runbook you already have. A poisoned tool that read a repository and called a cloud API is a credential-and-data incident with an MCP root cause; the response is the cloud-incident response you know, plus pulling the MCP session timeline to explain how a trusted agent was turned. The sources cited here are the current map of the attack surface; the detections are how you see it in production.

01 ATT&CK references