Production Patterns — Gateways, Auth, and Boring Reliability

The setup

Production is policy + observability + rollback, not "we deployed a binary and opened Slack." HTTP MCP in particular is just another service surface—except the caller is an adaptive model that will mis-select tools, retry aggressively, and treat error messages as suggestions.

If your deployment diagram looks like modern art—mystery boxes, arrows labeled "AI magic," one nginx—stop. Boring wins. I am magnificent; boring is still correct here.

Picture this

LB → MCP HTTP servers → authZ → backends; sidecar logs/metrics. Boring on purpose.

Mental model

Your MCP layer is an API gateway with extra philosophical problems: the client is stochastic, the catalog costs tokens, and every tool rename is a breaking change for some host somewhere.

Think: authenticate → authorize → observe → limit → roll back. In that order. Skipping a step is how meatbags ship Friday 5 p.m. and meet Monday incident bridges.

Walkthrough

HTTP MCP behind real infrastructure

Auth gateway — tokens, mTLS, SSO integration; identity propagated to tool handlers.
WAF / rate limits — MCP endpoints are attractive DoS and abuse targets; models retry.
Observability — structured logs, metrics, traces; correlate by session, user, server, tool name.
LB + health checks — Streamable HTTP sessions need sane routing; document session affinity if you use it.

Bind local dev to localhost when possible. Remote without auth is not "easy demo"—it is "easy incident."

Token efficiency and huge catalogs

Huge tools/list payloads hurt twice:

Context tax — models pay for metadata every turn; hundreds of tools bloat prompts.
Selection errors — ambiguous names and overlapping descriptions increase wrong-tool calls.

Mitigations: split servers by domain, lazy-load tool groups where your host supports it, tighten descriptions (short, distinct, non-overlapping), and delete tools you do not need. "We might need it someday" is not architecture; it is hoarding.

Versioning and rollout

Version tools like APIs:

Dual-register during migrations (search_code_v1 + search_code_v2) with explicit deprecation dates.
Feature flags for risky semantics—not just for UI.
Shadow mode — log what the new tool would have done before flipping traffic.
Progressive rollout — canary hosts, then fleet.

Renaming tools weekly is not versioning; it is chaos with semver cosplay.

Operational checklist (the unglamorous part)

Centralize auth; propagate identity on every mutating tool call.
Emit structured audit logs for writes—who, what, args hash, outcome, server version.
Define SLOs: availability, p95 latency, error budget for tool failures vs transport failures.
Document break-glass: disable server, kill tool, drain sessions—without redeploying the universe.

Hall of Shame

Hall of shame: No rollback planoperations

txt

"We shipped new tool semantics Friday 5pm."

Fix: progressive rollout, shadow mode, feature flags. Friday deploys are a tradition I do not respect. Your pager duty roster does not either.

References

MCP Transports

Production Patterns — Gateways, Auth, and Boring Reliability

The setup

Picture this

Mental model

Walkthrough

HTTP MCP behind real infrastructure

Token efficiency and huge catalogs

Versioning and rollout

Operational checklist (the unglamorous part)

Hall of Shame

Why this matters in production

Mini challenge

Reflection

You can now brag that…

References

Quiz