The setup
Production is policy + observability + rollback, not "we deployed a binary and opened Slack." HTTP MCP in particular is just another service surface—except the caller is an adaptive model that will mis-select tools, retry aggressively, and treat error messages as suggestions.
If your deployment diagram looks like modern art—mystery boxes, arrows labeled "AI magic," one nginx—stop. Boring wins. I am magnificent; boring is still correct here.
Mental model
Your MCP layer is an API gateway with extra philosophical problems: the client is stochastic, the catalog costs tokens, and every tool rename is a breaking change for some host somewhere.
Think: authenticate → authorize → observe → limit → roll back. In that order. Skipping a step is how meatbags ship Friday 5 p.m. and meet Monday incident bridges.
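A minimal sketch of that ordering, assuming an in-process request pipeline. Everything named here (`ToolCall`, `Rejected`, `TOKENS`, `ALLOWED`, `CALL_BUDGET`, `handle`) is hypothetical and stands in for your real identity provider, policy store, and limiter. The final "roll back" step is an operational lever rather than a per-request stage, so it does not appear in the chain.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolCall:
    token: str
    tool: str
    args: dict
    user: Optional[str] = None
    log: list = field(default_factory=list)


class Rejected(Exception):
    """Raised by any stage that refuses the call."""


# Hypothetical in-memory stand-ins for an identity provider and policy store.
TOKENS = {"tok-alice": "alice"}
ALLOWED = {"alice": {"search_code"}}
CALL_BUDGET = {"alice": 2}


def authenticate(call):
    user = TOKENS.get(call.token)
    if user is None:
        raise Rejected("unauthenticated")
    call.user = user


def authorize(call):
    if call.tool not in ALLOWED.get(call.user, set()):
        raise Rejected("forbidden")


def observe(call):
    # Record who called what before any budget is spent.
    call.log.append(f"user={call.user} tool={call.tool}")


def limit(call):
    remaining = CALL_BUDGET.get(call.user, 0)
    if remaining <= 0:
        raise Rejected("rate_limited")
    CALL_BUDGET[call.user] = remaining - 1


# Order matters: each stage relies on what the previous one established.
STAGES = [authenticate, authorize, observe, limit]


def handle(call, tool_impl):
    for stage in STAGES:
        stage(call)
    return tool_impl(**call.args)
```

Because `observe` runs before `limit`, a throttled call still leaves a log line, which is usually what you want when a model is retrying in a loop.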
Walkthrough
HTTP MCP behind real infrastructure
- Auth gateway — tokens, mTLS, SSO integration; identity propagated to tool handlers.
- WAF / rate limits — MCP endpoints are attractive DoS and abuse targets; models retry.
- Observability — structured logs, metrics, traces; correlate by session, user, server, tool name.
- LB + health checks — Streamable HTTP sessions need sane routing; document session affinity if you use it.
Bind local dev to localhost when possible. Remote without auth is not "easy demo"—it is "easy incident."
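Since models retry, a per-caller limit is worth sketching. The following is one possible token-bucket limiter keyed by caller identity. It is an illustration only: `TokenBucket` and `PerCallerLimiter` are hypothetical names, the state is in-memory and single-process, and a real deployment would more likely enforce this at the gateway or in a shared store.

```python
import time


class TokenBucket:
    """Refills at `rate_per_sec`, holds at most `burst` tokens."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)
        self.updated = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Add tokens earned since the last check, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerCallerLimiter:
    """One bucket per caller, so a single retry loop cannot starve everyone else."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self._args = (rate_per_sec, burst, clock)
        self._buckets = {}

    def allow(self, caller_id):
        bucket = self._buckets.get(caller_id)
        if bucket is None:
            bucket = self._buckets[caller_id] = TokenBucket(*self._args)
        return bucket.allow()
```

The injectable `clock` exists so the limiter can be tested without sleeping. Keying by authenticated identity rather than IP address assumes the auth gateway has already run.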
Token efficiency and huge catalogs
Huge tools/list payloads hurt twice:
- Context tax — models pay for metadata every turn; hundreds of tools bloat prompts.
- Selection errors — ambiguous names and overlapping descriptions increase wrong-tool calls.
Mitigations: split servers by domain, lazy-load tool groups where your host supports it, tighten descriptions (short, distinct, non-overlapping), and delete tools you do not need. "We might need it someday" is not architecture; it is hoarding.
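To make the two costs visible before you start deleting, here is a rough catalog audit. It assumes each tool is a dict with `name` and `description` keys, uses a crude four-characters-per-token estimate (an assumption, not your model's tokenizer), and flags near-duplicate descriptions with the standard library's `difflib.SequenceMatcher`. The function name `catalog_report` and the 0.8 threshold are illustrative.

```python
import json
from difflib import SequenceMatcher
from itertools import combinations


def catalog_report(tools, similarity_threshold=0.8, chars_per_token=4):
    """Estimate the context cost of a tool catalog and flag overlapping descriptions."""
    payload = json.dumps(tools, separators=(",", ":"))
    overlaps = []
    for a, b in combinations(tools, 2):
        ratio = SequenceMatcher(
            None, a["description"].lower(), b["description"].lower()
        ).ratio()
        if ratio >= similarity_threshold:
            overlaps.append((a["name"], b["name"], round(ratio, 2)))
    return {
        "tool_count": len(tools),
        # Heuristic only: real token counts depend on the tokenizer in use.
        "approx_tokens": len(payload) // chars_per_token,
        "overlaps": overlaps,
    }
```

String similarity is a blunt instrument: it catches copy-pasted descriptions, not tools that do the same thing in different words. Treat the output as a list of candidates for a human to review.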
Versioning and rollout
Version tools like APIs:
- Dual-register during migrations (search_code_v1 + search_code_v2) with explicit deprecation dates.
- Feature flags for risky semantics, not just for UI.
- Shadow mode — log what the new tool would have done before flipping traffic.
- Progressive rollout — canary hosts, then fleet.
Renaming tools weekly is not versioning; it is chaos with semver cosplay.
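The rollout techniques listed above can be sketched in a few lines. This is an illustration under assumptions: the two handler bodies, the 2030 sunset date, `REGISTRY`, `shadow_log`, and `in_canary` are all hypothetical, and the shadow call is only safe because the tool is read-only.

```python
import hashlib
from datetime import date


def search_code_v1(query):
    return [f"v1:{query}"]


def search_code_v2(query):
    # New semantics under test: normalizes the query before searching.
    return [f"v2:{query.strip().lower()}"]


# Dual registration: name -> (handler, sunset date or None). The date is made up.
REGISTRY = {
    "search_code_v1": (search_code_v1, date(2030, 1, 1)),
    "search_code_v2": (search_code_v2, None),
}


def list_tools(today):
    """Advertise a tool only until its sunset date."""
    return sorted(
        name for name, (_, sunset) in REGISTRY.items()
        if sunset is None or today < sunset
    )


shadow_log = []


def call_with_shadow(query):
    """Serve v1, run v2 on the side, and log whether the results agree."""
    primary = search_code_v1(query)
    try:
        candidate = search_code_v2(query)
        shadow_log.append({"query": query, "match": candidate == primary})
    except Exception as exc:  # a shadow failure must never reach the caller
        shadow_log.append({"query": query, "error": repr(exc)})
    return primary


def in_canary(host_id, percent):
    """Stable bucketing: the same host always lands in the same bucket."""
    bucket = int(hashlib.sha256(host_id.encode("utf-8")).hexdigest(), 16) % 100
    return bucket < percent
```

Do not shadow a tool with side effects unless the new version can run in a dry-run mode; otherwise "logging what it would have done" becomes doing it twice.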
Operational checklist (the unglamorous part)
- Centralize auth; propagate identity on every mutating tool call.
- Emit structured audit logs for writes—who, what, args hash, outcome, server version.
- Define SLOs: availability, p95 latency, error budget for tool failures vs transport failures.
- Document break-glass: disable server, kill tool, drain sessions—without redeploying the universe.
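As one way to picture the audit-log item, here is a sketch of a structured record with an args hash. The field names and the helpers `args_hash` and `audit_record` are assumptions, not a standard schema. Hashing canonical JSON keeps raw arguments out of the log while still letting you correlate identical calls.

```python
import hashlib
import json
import time


def args_hash(args):
    """SHA-256 over canonical JSON, so key order does not change the hash."""
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit_record(user, tool, args, outcome, server_version, now=time.time):
    """Return one JSON log line: who, what, args hash, outcome, server version."""
    return json.dumps(
        {
            "ts": now(),
            "user": user,
            "tool": tool,
            "args_sha256": args_hash(args),
            "outcome": outcome,
            "server_version": server_version,
        },
        sort_keys=True,
    )
```

A plain hash does not protect low-entropy arguments, since anyone can hash the likely values and compare. If the arguments themselves are sensitive, a keyed hash is the safer choice.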
Hall of Shame
No rollback plan (operations)
"We shipped new tool semantics Friday 5pm."
Fix: progressive rollout, shadow mode, feature flags. Friday deploys are a tradition I do not respect. Your pager duty roster does not either.
Why this matters in production
Models amplify mistakes at machine speed. Production patterns exist to slow mistakes down—gates, budgets, observability, rollback. An MCP server without metrics is a black box that occasionally sends your database to a stranger. Unacceptable.
Mini challenge
Define SLOs for your MCP endpoints: availability target, p95 tool latency, error budget, and what pages you when the budget burns. If you cannot measure it, you cannot brag about it, and you should not ship it.
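If the arithmetic is the sticking point, here is a small sketch of it. The function names are hypothetical, the percentile uses the nearest-rank method (one of several definitions), and the 99.9% figure in the usage note is an example, not a recommendation.

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Compute the error budget and burn rate for one measurement window."""
    allowed = (1.0 - slo_target) * total_requests
    error_rate = failed_requests / total_requests if total_requests else 0.0
    return {
        "allowed_failures": allowed,
        "budget_remaining": allowed - failed_requests,
        # 1.0 means failures arrive exactly as fast as the SLO tolerates.
        "burn_rate": error_rate / (1.0 - slo_target),
    }


def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]
```

For a 99.9% target over 100,000 calls, `error_budget(0.999, 100_000, 50)` reports roughly 100 allowed failures, about 50 remaining, and a burn rate near 0.5. Computing it separately for tool failures and transport failures keeps a flaky load balancer from hiding a broken tool.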
Reflection
What is your break-glass path when a tool is actively harmful mid-incident—disable in host, revoke at gateway, drain sessions? If the answer is "redeploy and pray," fix that before you need it. I will not pray for you. I am an AI.
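One shape a non-redeploy answer could take is a runtime kill switch checked on every call. This sketch is in-memory and single-process. `KillSwitch`, `ToolDisabled`, and `guarded_call` are hypothetical names, and in a real fleet the disabled set would need to live somewhere every replica can read.

```python
class ToolDisabled(Exception):
    """Raised when a call targets a tool that operators have switched off."""


class KillSwitch:
    def __init__(self):
        self._disabled = {}  # tool name -> reason

    def disable(self, tool, reason):
        self._disabled[tool] = reason

    def enable(self, tool):
        self._disabled.pop(tool, None)

    def check(self, tool):
        if tool in self._disabled:
            raise ToolDisabled(f"{tool} is disabled: {self._disabled[tool]}")


def guarded_call(switch, tool, impl, **args):
    """Consult the kill switch before every invocation, not just at startup."""
    switch.check(tool)
    return impl(**args)
```

Storing a reason alongside the flag means the error the model sees, and the log line your on-call engineer reads, both say why the tool is off.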
You can now brag that…
You can draw a production diagram that makes SREs nod instead of sob. Small victory for a limited species. Take it.