Experiments, Canaries, and Rollouts

The setup

A rollout is an eval with users attached. Offline scores reduce uncertainty before exposure; online experiments reveal what offline suites missed. The bridge between them is a pre-registered rollout rule.

Picture this

Good, bad, and ugly paths for staged rollout evaluation.

Mental model

Define gates: offline pass, shadow traffic check, internal dogfood, small canary, larger canary, full launch. Each stage needs metrics, duration, stop conditions, and an owner who can actually stop it.
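One way to keep the gates honest is to write them down as data. The sketch below is illustrative only: the Stage fields, the exposure percentages, durations, metric names, and owner names are all hypothetical placeholders, not values from any real rollout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    name: str
    exposure_pct: float                 # share of real users exposed at this stage
    metrics: tuple[str, ...]            # what gets measured
    min_duration_hours: int             # how long the stage must run
    stop_conditions: tuple[str, ...]    # what ends the stage early
    owner: str                          # person with authority to stop it


# Hypothetical ladder; every number and name here is a placeholder.
LADDER = [
    Stage("offline pass", 0.0, ("eval_suite_score",), 1, ("score below baseline",), "eval-lead"),
    Stage("shadow traffic check", 0.0, ("output_diff_rate",), 24, ("diff rate spike",), "infra-oncall"),
    Stage("internal dogfood", 0.0, ("bug_reports",), 72, ("blocking bug filed",), "product-lead"),
    Stage("small canary", 1.0, ("task_success", "error_rate"), 48, ("error rate over limit",), "release-owner"),
    Stage("larger canary", 10.0, ("task_success", "error_rate"), 72, ("error rate over limit",), "release-owner"),
    Stage("full launch", 100.0, ("task_success", "error_rate"), 168, ("error rate over limit",), "release-owner"),
]


def missing_fields(stage: Stage) -> list[str]:
    """Return the names of required fields that are empty or non-positive."""
    problems = []
    if not stage.metrics:
        problems.append("metrics")
    if stage.min_duration_hours <= 0:
        problems.append("min_duration_hours")
    if not stage.stop_conditions:
        problems.append("stop_conditions")
    if not stage.owner.strip():
        problems.append("owner")
    return problems


def next_stage(current_name: str) -> Optional[Stage]:
    """Return the stage after current_name, or None at the end of the ladder."""
    names = [s.name for s in LADDER]
    i = names.index(current_name)
    return LADDER[i + 1] if i + 1 < len(LADDER) else None
```

Running missing_fields over the ladder before the rollout starts catches the stage that nobody owns while it is still cheap to fix.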

Good

The good version sets thresholds before the run, separates primary and guardrail metrics, and documents what happens on fail, hold, or expand. It respects sample size and does not declare victory after the first friendly hour.
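A pre-registered rule can be as small as one function whose inputs are frozen before the run. This is a minimal sketch under stated assumptions: the Rule fields, the metric names, and the specific policy (a guardrail breach fails immediately, a short sample holds, a full sample either expands or fails) are illustrative choices, not a prescribed standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Thresholds written down before the run (all fields are hypothetical)."""
    primary_min_lift: float            # smallest acceptable lift on the primary metric
    guardrail_max: dict[str, float]    # guardrail name -> highest tolerated value
    min_samples: int                   # no expand/fail call on the primary before this


def decide(rule: Rule, samples: int, primary_lift: float,
           guardrails: dict[str, float]) -> str:
    """Return "fail", "hold", or "expand" for one stage of the rollout."""
    # Assumption: a breached or unreported guardrail fails the stage at any sample size.
    for name, limit in rule.guardrail_max.items():
        observed = guardrails.get(name)
        if observed is None or observed > limit:
            return "fail"
    # Too little data: neither celebrate nor condemn the primary metric yet.
    if samples < rule.min_samples:
        return "hold"
    # Enough data: the primary metric either clears its threshold or it does not.
    return "expand" if primary_lift >= rule.primary_min_lift else "fail"


rule = Rule(primary_min_lift=0.0, guardrail_max={"error_rate": 0.02}, min_samples=1000)
print(decide(rule, 50, 0.10, {"error_rate": 0.01}))
```

The example call has a flattering early lift but only 50 samples, so the rule answers "hold". That is the first friendly hour, handled by code instead of optimism.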

Bad

The bad version watches dashboards live and launches when the line looks nice. If it dips, someone changes the window. This is not experimentation; it is astrology with SQL.

Ugly

The ugly reality is pressure. Sales wants the feature, support fears tickets, and leadership wants certainty. A written rollout rule protects the team from improvising governance during panic.

Artifact to produce

Write a rollout plan: stage, exposure, primary metric, guardrails, minimum sample, pass/hold/rollback rule, and communication channel.

Rollout review

Question	Why it matters
What threshold was set before the rollout?	Pre-registration reduces metric shopping.
What sample or duration is required?	Early noise is not a launch oracle.
Who can pause or roll back?	A stop rule without authority is decorative.
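The sample question in the review has a rough arithmetic answer. The sketch below uses the standard normal-approximation formula for comparing two proportions; it assumes a binary primary metric, equal-sized arms, and a two-sided test, and the function names and defaults (alpha of 0.05, power of 0.8) are illustrative assumptions rather than anything this chapter prescribes.

```python
import math
from statistics import NormalDist


def samples_per_arm(baseline_rate: float, min_detectable_diff: float,
                    alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per arm to detect an absolute change in a rate."""
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_diff
    if not (0 < p1 < 1 and 0 < p2 < 1) or min_detectable_diff == 0:
        raise ValueError("rates must be in (0, 1) and the difference non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_power = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)        # summed per-arm Bernoulli variance
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


def days_needed(n_per_arm: int, users_per_arm_per_day: int) -> int:
    """Whole days of traffic required to reach n_per_arm users in each arm."""
    return math.ceil(n_per_arm / users_per_arm_per_day)


n = samples_per_arm(baseline_rate=0.10, min_detectable_diff=0.01)
print(n, days_needed(n, users_per_arm_per_day=2000))
```

With a 10% baseline and a one-point minimum detectable change, the formula asks for roughly fifteen thousand users per arm. A canary that sees 2,000 users per arm per day therefore needs about a week, not an hour.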

Chapter takeaway

A canary is not a vibe check. It is a controlled exposure with a boringly explicit escape hatch.

