MCP Mastery
Chapter 17

Anti-Patterns And Failure Modes

Name the traps before they show up wearing a KPI badge.

Eval Toolkit: 2026.05
Observability: trace-first
Python: 3.11
Reviewed: 2026-05-17

This chapter helps you avoid 8 common eval-writing mistakes.

The setup

Most eval failures are not exotic. Teams reuse leaked examples, optimize one aggregate metric, trust model judges without calibration, block CI with flaky tests, or move thresholds after seeing results. The classics endure because they are convenient.
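To make the "one aggregate metric" trap concrete, here is a minimal sketch of how an overall score can hide a failing slice. The result format (`slice`, `passed`), the slice names, and the data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def accuracy_by_slice(results):
    """Return (overall accuracy, {slice name: accuracy}).

    Each result is assumed to be a dict like {"slice": str, "passed": bool}.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for r in results:
        totals[r["slice"]] += 1
        passes[r["slice"]] += int(r["passed"])
    overall = sum(passes.values()) / sum(totals.values())
    per_slice = {name: passes[name] / totals[name] for name in totals}
    return overall, per_slice


# Hypothetical run: 90 easy cases pass, 8 of 10 hard cases fail.
results = (
    [{"slice": "short_prompts", "passed": True}] * 90
    + [{"slice": "long_prompts", "passed": True}] * 2
    + [{"slice": "long_prompts", "passed": False}] * 8
)

overall, per_slice = accuracy_by_slice(results)
print(f"overall: {overall:.2f}")
for name, acc in per_slice.items():
    print(f"{name}: {acc:.2f}")
```

In this made-up run the aggregate sits above 0.9 while the `long_prompts` slice passes only 2 of 10 cases, which is the kind of gap a single headline number tends to bury.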

Picture this

Good, bad, and ugly paths for eval failure-mode recognition.

Mental model

For every eval artifact, ask: what decision does this inform, what failure can it miss, how can it be gamed, and what would make the score non-comparable next month?
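One way to approach the "non-comparable next month" question is to fingerprint whatever defines the eval and compare fingerprints before comparing scores. This is a sketch under assumptions: the config keys (`dataset_version`, `judge_model`, `prompt_template`, `threshold`) are hypothetical, and a real setup may need to cover more inputs.

```python
import hashlib
import json


def eval_fingerprint(config: dict) -> str:
    """Hash a canonical JSON form of the eval config (key order does not matter)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def comparable(config_a: dict, config_b: dict) -> bool:
    """Two scores are treated as comparable only if their configs match exactly."""
    return eval_fingerprint(config_a) == eval_fingerprint(config_b)


# Hypothetical configs for two runs a month apart.
may_run = {
    "dataset_version": "v3",
    "judge_model": "judge-a",
    "prompt_template": "rubric-2",
    "threshold": 0.9,
}
june_run = dict(may_run, judge_model="judge-b")  # the judge was swapped

print(comparable(may_run, dict(reversed(list(may_run.items())))))
print(comparable(may_run, june_run))
```

Storing the fingerprint next to each score makes a silent change of judge, dataset, or threshold visible at the moment someone tries to plot the two numbers on one chart.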

Good

The good version has an anti-pattern review checklist. Before launch, reviewers look for leakage, threshold drift, missing slices, judge bias, unowned failures, and stale datasets.
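Three of the checklist items (leakage, threshold drift, stale datasets) lend themselves to automation. The following is a minimal sketch, not a complete review. The function names, the 180-day staleness limit, and the sample data are assumptions. The leakage check only catches duplicates that match after lowercasing and whitespace normalization, so paraphrased leaks would still slip through.

```python
import hashlib
from datetime import date


def _fingerprint(text: str) -> str:
    """Hash text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def leaked_examples(eval_inputs, training_inputs):
    """Return eval inputs whose normalized text also appears in training data."""
    train_prints = {_fingerprint(t) for t in training_inputs}
    return [e for e in eval_inputs if _fingerprint(e) in train_prints]


def review_eval(eval_inputs, training_inputs, registered_threshold,
                current_threshold, dataset_date, today, max_age_days=180):
    """Return a list of human-readable findings; an empty list means no flags."""
    findings = []
    leaked = leaked_examples(eval_inputs, training_inputs)
    if leaked:
        findings.append(f"leakage: {len(leaked)} eval example(s) found in training data")
    if current_threshold != registered_threshold:
        findings.append(
            f"threshold drift: registered {registered_threshold}, now {current_threshold}"
        )
    age_days = (today - dataset_date).days
    if age_days > max_age_days:
        findings.append(f"stale dataset: {age_days} days old (limit {max_age_days})")
    return findings


# Hypothetical inputs for illustration only.
findings = review_eval(
    eval_inputs=["What is 2+2?", "Summarize the memo."],
    training_inputs=["what is   2+2?"],
    registered_threshold=0.90,
    current_threshold=0.85,
    dataset_date=date(2025, 1, 1),
    today=date(2026, 5, 17),
)
for finding in findings:
    print(finding)
```

Missing slices, judge bias, and unowned failures are left out here because they usually need human judgment or labeled data rather than a mechanical check.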

Bad

The bad version says "we have evals" as if the noun itself protects users. A broken eval is not a guardrail; it is a decorative traffic cone.

Ugly

The ugly reality is incentives. Bad evals often survive because they make launches easier. Fixing them may lower scores, slow releases, and create uncomfortable conversations. That is the work.

Artifact to produce

Maintain an eval anti-pattern register: smell, likely hidden failure, detection method, and replacement pattern.
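The register can start as a small typed record with the four fields named above. The sketch below is one possible shape. The class name, the `search` helper, and the two sample entries are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AntiPattern:
    smell: str            # what a reviewer notices first
    hidden_failure: str   # what the eval is likely to miss
    detection: str        # how to confirm the smell
    replacement: str      # the pattern to use instead


# Illustrative entries only; a real register reflects a team's own history.
REGISTER = [
    AntiPattern(
        smell="Thresholds moved after seeing results",
        hidden_failure="Regressions pass because the bar was lowered",
        detection="Diff the current threshold against the pre-registered one",
        replacement="Commit thresholds before the run; change them only via review",
    ),
    AntiPattern(
        smell="Single aggregate metric",
        hidden_failure="A small, important slice fails while the average looks fine",
        detection="Report per-slice scores next to the aggregate",
        replacement="Gate on the worst slice as well as the average",
    ),
]


def search(register, keyword):
    """Return entries whose smell or hidden failure mentions the keyword."""
    kw = keyword.lower()
    return [
        entry for entry in register
        if kw in entry.smell.lower() or kw in entry.hidden_failure.lower()
    ]


for entry in search(REGISTER, "threshold"):
    print(f"{entry.smell} -> {entry.replacement}")
```

Keeping the entries frozen and in version control means changes to the register go through the same review as changes to the evals themselves.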

Anti-pattern review

Question | Why it matters
Which incentive does this eval create? | Bad incentives make bad systems look good.
How could this score be gamed? | Gaming reveals weak design.
What hidden failure would still pass? | Anti-patterns survive in blind spots.

Chapter takeaway

Every eval has a smell test. If the smell is "launch justification," open a window and review the design.

Quiz

  1. What is the danger of saying "we have evals" without reviewing their design?

  2. Which is the bad version of eval failure-mode recognition?

  3. What should the ugly reality change about your process?