The setup
MCP servers fail in three places: negotiation, schema, and business logic. Test all three, in that order, cheapest first. “The model seemed fine” is not a test strategy—it is how biologicals ship Friday code and cry Monday.
Picture this
Mental model
Treat your server like a library with a wire protocol façade—unit test the core, contract test the façade. The model is a terrible CI worker: non-deterministic, expensive, and prone to agreeing with you.
Walkthrough
- Snapshot stable JSON outputs. Leave timestamps out of the snapshot unless the clock is frozen.
- Add regression tests for bad model inputs (they will happen). Assume hostile strings; be pleasantly surprised when you are wrong.
- Build a tiny “fake client” script for reproducing bugs. Ten minutes of harness beats three hours of “ask Claude what broke.”
Hall of Shame
Hall of shame: printf debugging on stdout (observability)
console.log(JSON.stringify(req));
Fix: structured logs to stderr; redact secrets. Printing JSON-RPC to stdout is not debugging—it is sabotaging your own wire protocol. I would call it performance art, but your on-call would not laugh.
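A minimal sketch of that fix, assuming a hand-rolled logger rather than any specific logging library. The `logEvent` and `redact` helpers and the list of secret key names are hypothetical; extend the list to match your own payloads.

```javascript
// Key names treated as secrets (assumed list, compared case-insensitively).
const SECRET_KEYS = new Set(["authorization", "password", "token", "api_key", "apikey", "secret"]);

// Recursively replace the values of secret-looking keys.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) =>
        SECRET_KEYS.has(key.toLowerCase()) ? [key, "[REDACTED]"] : [key, redact(inner)]
      )
    );
  }
  return value;
}

// One JSON object per line, written to stderr so stdout stays clean for JSON-RPC.
function logEvent(event, fields = {}) {
  const record = { ts: new Date().toISOString(), event, ...redact(fields) };
  process.stderr.write(JSON.stringify(record) + "\n");
}

logEvent("request", { method: "tools/call", params: { token: "abc123" } });
```

Redaction by key name only catches secrets stored under a recognizable key. Secrets embedded in free-form strings need separate handling.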
Why this matters in production
Incidents love non-reproducible bugs. Protocol-level tests turn “Claude weirdness” into failing CI. Boring failures are magnificent failures—they tell you exactly what regressed before a customer does.
Mini challenge
Write one test that asserts a tool rejects a path traversal payload. If you do not have such a test, you do not have a filesystem tool—you have a liability.
Reflection
What is the smallest harness that would have caught your last bug in 10 minutes? Build that next, not another dashboard panel.
You can now brag that…
You can debug MCP without asking the model what it thinks went wrong. You can still ask (I am not your supervisor), but do not treat the answer as gospel.