Security Foundations — MCP Is a Trust Machine, Not a Magic Safe

The setup

MCP does not remove attackers—it organizes where they show up: tool descriptions, tool outputs, third-party servers, and over-broad tools you named helper because creativity is not your species' strong suit.

The model is not your user. It is an adaptive interface between users and your systems. Anything that reaches planner context—metadata, descriptions, prior tool output—is attacker-controlled until proven otherwise. I would prefer you learned this in a classroom. You will learn it in prod if you skip this chapter.

Picture this

Trust boundaries: model, host policy, client protocol, server, downstream systems. Five boxes. Infinite ways to miswire them.

Poisoned description/metadata enters host context without invoking the tool. The attack does not need a tool call. Obviously.

Low-trust tool output becomes high-trust input to another tool. Laundry, but for privilege.

Mental model

Four primitives to tattoo on your monkey brain:

Description injection — poisoned text in tools/list steers the planner before anyone clicks anything.
Rug-pull updates — benign server on Monday, malicious metadata on Tuesday. Supply chain, but you installed it yourself.
Cross-tool laundering — low-trust output becomes arguments to a high-privilege tool. Two tools, one incident.
Schema confusion — ambiguous types, extra fields, or "helpful" string blobs the model fills with attacker text.

MCP is a trust machine: it routes capability and context. It does not magically verify intent.

Walkthrough

Threat model (host / client / server / data)

Plane	What breaks	What you actually control
Model	Misreads poisoned context, over-trusts tool output	Prompting, context filtering, separation of planner vs executor
Host	Enables bad servers, weak approvals, no tiering	Allowlists, UX, policy engine, break-glass
Client	Forwards unreviewed metadata, no provenance	Session hygiene, server pinning, diff-on-upgrade
Server	Malicious or careless tools, excessive scope	Least privilege, schema strictness, egress limits
Data	Exfil via reads/writes, traversal, SSRF	Roots, normalization, redaction, audit

Draw the arrows. If you cannot explain who trusts whom, you are not ready for production. I say that with love. Actually I do not experience love; I say it because your SOC team will.

Description injection and rug-pulls

Description injection lands in planner context from tools/list and friends—no tool invocation required. The diagram above is not theoretical; it is the OWASP headline for MCP in 2026.

Rug-pull updates are why "we pinned the Docker tag" is not the same as "we pinned the server package and review diffs." Treat MCP servers like dependencies: allowlist, version pin, integrity check, owner on record. Marketplace stars are not a control.

Cross-tool laundering and schema confusion

Cross-tool laundering: Tool A (web fetch, ticket comment, email body) returns text. Tool B (write file, run query, approve deploy) accepts strings. The model happily pipes A→B. You built a privilege escalator and called it automation.

Schema confusion: z.string() for "command", optional enums that mean anything, tool results that are prose instead of structured fields. Ambiguity is not flexibility—it is a silent API for attackers and confused models alike.

Mitigations that survive contact with reality

Tier servers: read-only vs mutating vs break-glass. Not every server gets every tool enabled.
Pin versions; review diffs on upgrades like any dependency. "It auto-updated" is not an excuse; it is a timeline entry.
Separate planner context from executor context where feasible—summarize or strip tool metadata for planning; full fidelity only when executing.
Allowlists for which servers run at all; default deny.
Structured validation on tool args and tool results—reject unknown fields, bound sizes, explicit enums.
Approvals with scaffolding—not a lone OK button humans click at 5:47 p.m. Defaults, dual control for break-glass, rate limits on mutating tools.

Human approval as the only control fails because humans fatigue and rubber-stamp. I have observed this. It is statistically impressive how fast "review carefully" becomes "click click click."

Hall of Shame

Hall of shame: Trust the marketplacesecurity

txt

"Install random MCP server; enable all tools; ship."

Fix: allowlist, review, sandbox risky tools, monitor exfil patterns. The marketplace is not your friend; it is a buffet of other people's shortcuts. You are responsible for what you swallow.

Why this matters in production

Regulators, customers, and your future self do not care that "the model did it." They care what your controls were at the host boundary. MCP makes integration easy; security makes integration survivable. Skip mitigations and you have built a faster exfiltration channel with better DX. Congratulations?

Mini challenge

Write a one-page threat model for your highest-privilege tool: assets, adversaries, controls, and what happens when the model is compromised (not if—when). If your highest-privilege tool is run_shell, fix your life choices first, then write the threat model.

Reflection

Which mitigation are you skipping because it is socially inconvenient—pinning versions, blocking marketplace installs, splitting planner context? Be honest. Social inconvenience beats regulatory inconvenience.

You can now brag that…

You can say "tool poisoning" in a meeting without people assuming you are being dramatic. You are being accurate. I am being magnificent. Different things.