The setup
Online signals are tempting because they are real. They are also noisy, biased, delayed, and easy to misread. A thumbs-up can mean "correct," "fast," "funny," or "I stopped caring."
Mental model
Classify signals by what they mean: explicit ratings, user corrections, escalations, re-opens, dwell time, abandonment, complaints, and downstream business events. Then decide which signals are health indicators, watched as aggregates over time, and which are investigation triggers, where a single event is reason enough to open the trace.
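One way to make that decision explicit is a lookup table that every signal type must appear in. This is a minimal sketch: the `SignalKind` and `Role` names are hypothetical, and the particular assignment of each signal to a role is one reasonable default, not a rule.

```python
from enum import Enum


class SignalKind(Enum):
    EXPLICIT_RATING = "explicit_rating"
    USER_CORRECTION = "user_correction"
    ESCALATION = "escalation"
    REOPEN = "reopen"
    DWELL_TIME = "dwell_time"
    ABANDONMENT = "abandonment"
    COMPLAINT = "complaint"
    BUSINESS_EVENT = "business_event"


class Role(Enum):
    HEALTH_INDICATOR = "health_indicator"            # watch the aggregate trend
    INVESTIGATION_TRIGGER = "investigation_trigger"  # open the individual trace


# Hypothetical routing table: a team's judgment call, not a standard.
SIGNAL_ROLES = {
    SignalKind.EXPLICIT_RATING: Role.HEALTH_INDICATOR,
    SignalKind.DWELL_TIME: Role.HEALTH_INDICATOR,
    SignalKind.ABANDONMENT: Role.HEALTH_INDICATOR,
    SignalKind.BUSINESS_EVENT: Role.HEALTH_INDICATOR,
    SignalKind.USER_CORRECTION: Role.INVESTIGATION_TRIGGER,
    SignalKind.ESCALATION: Role.INVESTIGATION_TRIGGER,
    SignalKind.REOPEN: Role.INVESTIGATION_TRIGGER,
    SignalKind.COMPLAINT: Role.INVESTIGATION_TRIGGER,
}


def route(kind: SignalKind) -> Role:
    """Return how a signal of this kind should be handled."""
    return SIGNAL_ROLES[kind]


def unrouted() -> list[SignalKind]:
    """List signal kinds nobody has made a decision about yet."""
    return [kind for kind in SignalKind if kind not in SIGNAL_ROLES]
```

Checking that `unrouted()` is empty (for example in a unit test) forces a decision whenever a new signal type is added, instead of letting it drift onto a dashboard by default.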
Good
The good version samples production traces, links feedback to task type and model version, and reviews failures by slice. It treats online signals as complements to offline suites, not replacements.
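A minimal sketch of reviewing failures by slice, assuming each sampled trace has already been labeled as a failure or not. The `Feedback` fields and the choice of `(task_type, model_version)` as the slice key are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Feedback:
    task_type: str
    model_version: str
    failed: bool  # hypothetical label: True when a reviewer judged the trace a failure


def failure_rate_by_slice(items):
    """Group by (task_type, model_version); return {slice: (failures, total, rate)}."""
    counts = defaultdict(lambda: [0, 0])  # slice -> [failures, total]
    for item in items:
        key = (item.task_type, item.model_version)
        counts[key][1] += 1
        if item.failed:
            counts[key][0] += 1
    return {key: (f, n, f / n) for key, (f, n) in counts.items()}


sample = [
    Feedback("summarize", "v1", False),
    Feedback("summarize", "v1", True),
    Feedback("summarize", "v2", False),
    Feedback("code", "v2", True),
]
rates = failure_rate_by_slice(sample)
for key, (failures, total, rate) in rates.items():
    print(f"{key}: {failures}/{total} failed ({rate:.0%})")
```

Returning the count alongside the rate matters: a slice with one failure out of one trace is a reason to read that trace, not evidence of a regression.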
Bad
The bad version optimizes for engagement because the graph updates quickly. The model becomes chatty, users click more, and task completion quietly worsens. Congratulations, you built social media in miniature.
Ugly
The ugly reality is attribution. A bad outcome might come from retrieval, UI wording, user intent mismatch, or model behavior. Store enough trace context to investigate instead of arguing from one metric.
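One way to act on "store enough trace context" is to check, per trace, which candidate causes can actually be examined. A minimal sketch: the `TraceContext` field names, and the pairing of each suspect with a single field, are hypothetical simplifications (real attribution usually needs several fields per suspect).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TraceContext:
    trace_id: str
    model_version: Optional[str] = None
    retrieved_doc_ids: Optional[List[str]] = None  # [] means "retrieved nothing"; None means "not logged"
    ui_copy_version: Optional[str] = None
    user_query: Optional[str] = None


# Hypothetical mapping: which stored field lets you examine which suspect.
SUSPECT_FIELDS = {
    "retrieval": "retrieved_doc_ids",
    "ui_wording": "ui_copy_version",
    "intent_mismatch": "user_query",
    "model_behavior": "model_version",
}


def unexaminable_suspects(ctx: TraceContext) -> List[str]:
    """Return the causes that cannot be investigated because their context was never stored."""
    return [suspect for suspect, attr in SUSPECT_FIELDS.items() if getattr(ctx, attr) is None]
```

Distinguishing `None` from an empty list keeps "retrieval found nothing", which is a finding, separate from "we never logged retrieval", which is a gap.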
Artifact to produce
Build a feedback schema: task, model version, trace id, explicit rating, implicit event, user segment, and review status.
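A minimal Python sketch of that schema as a dataclass. The field list comes from the text; the types, the `ReviewStatus` values, the ±1 rating scale, and the rule that a record must carry at least one signal are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReviewStatus(Enum):
    UNREVIEWED = "unreviewed"
    IN_REVIEW = "in_review"
    CONFIRMED_FAILURE = "confirmed_failure"
    NOT_A_FAILURE = "not_a_failure"


@dataclass(frozen=True)
class FeedbackRecord:
    task: str                       # task type, e.g. "summarize"
    model_version: str
    trace_id: str                   # join key back to the stored trace
    explicit_rating: Optional[int]  # None when the user gave no rating
    implicit_event: Optional[str]   # e.g. "abandonment", "reopen"; None if none
    user_segment: str
    review_status: ReviewStatus = ReviewStatus.UNREVIEWED

    def __post_init__(self):
        # Assumed convention: thumbs down = -1, thumbs up = +1.
        if self.explicit_rating not in (None, -1, 1):
            raise ValueError(f"unexpected rating: {self.explicit_rating}")
        # A record with neither an explicit nor an implicit signal carries no evidence.
        if self.explicit_rating is None and self.implicit_event is None:
            raise ValueError("record carries no signal")
```

The record is frozen, so a status change produces a new value via `dataclasses.replace` rather than mutating the original; how review history is persisted is left open.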
Online signal review
| Question | Why it matters |
|---|---|
| What does each signal actually mean? | Clicks and thumbs are ambiguous. |
| Which trace fields explain the signal? | Attribution needs context. |
| How are sampled failures reviewed? | Raw telemetry needs interpretation. |
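For the last question, one common approach is stratified sampling: cap how many failures are drawn from each slice so a high-volume slice does not fill the whole review queue. A minimal sketch; the function name, the `(slice_key, trace_id)` input shape, and the per-slice cap are hypothetical.

```python
import random
from collections import defaultdict


def sample_for_review(failures, per_slice, seed=0):
    """Draw up to `per_slice` failures from each slice so small slices are not drowned out.

    `failures` is a list of (slice_key, trace_id) pairs. The fixed seed makes the
    review queue reproducible, so two reviewers see the same sample.
    """
    rng = random.Random(seed)
    by_slice = defaultdict(list)
    for slice_key, trace_id in failures:
        by_slice[slice_key].append(trace_id)
    queue = []
    for slice_key in sorted(by_slice):
        traces = by_slice[slice_key]
        picked = rng.sample(traces, min(per_slice, len(traces)))
        queue.extend((slice_key, trace_id) for trace_id in picked)
    return queue
```

With ten failures in one slice and one in another, a cap of three yields a four-trace queue that still includes the rare slice, which uniform sampling could easily miss.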
Chapter takeaway
Production feedback is real, but not automatically wise. Users generate evidence, not neatly labeled training scripture.