The setup
A rollout is an eval with users attached. Offline scores reduce uncertainty before exposure; online experiments reveal what offline suites missed. The bridge between them is a pre-registered rollout rule.
Picture this
Mental model
Define gates: offline pass, shadow traffic check, internal dogfood, small canary, larger canary, full launch. Each stage needs metrics, duration, stop conditions, and an owner who can actually stop it.
Good
The good version sets thresholds before the run, separates primary and guardrail metrics, and documents what happens on fail, hold, or expand. It respects sample size and does not declare victory after the first friendly hour.
Bad
The bad version watches dashboards live and launches when the line looks nice. If it dips, someone changes the window. This is not experimentation; it is astrology with SQL.
Ugly
The ugly reality is pressure. Sales wants the feature, support fears tickets, and leadership wants certainty. A written rollout rule protects the team from improvising governance during panic.
Artifact to produce
Write a rollout plan: stage, exposure, primary metric, guardrails, minimum sample, pass/hold/rollback rule, and communication channel.
Rollout review
| Question | Why it matters |
|---|---|
| What threshold was set before the rollout? | Pre-registration reduces metric shopping. |
| What sample or duration is required? | Early noise is not a launch oracle. |
| Who can pause or roll back? | A stop rule without authority is decorative. |
Chapter takeaway
A canary is not a vibe check. It is a controlled exposure with a boringly explicit escape hatch.