Reporting And Decision Records

The setup

Eval reports are not trophies. They should say what was tested, what passed, what failed, what is unknown, and what decision follows. A useful report can be read three months later during an incident without requiring tribal memory.

Picture this

[Figure: good, bad, and ugly paths for eval decision records.]

Mental model

Use a decision record: context, options, eval evidence, risks accepted, decision, owner, expiry or review date. Scores belong inside this story, not floating alone in a dashboard.
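One way to keep scores inside the story is to give the record a fixed shape. The sketch below is a minimal, hypothetical schema: the class name, field names, and sample values are assumptions chosen to mirror the list above, not a standard format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DecisionRecord:
    """Hypothetical schema; fields mirror the mental model above."""
    context: str
    options: list[str]
    eval_evidence: dict[str, float]  # metric name -> score
    risks_accepted: list[str]
    decision: str
    owner: str
    review_date: date  # the "expiry or review date"

    def is_due_for_review(self, today: date) -> bool:
        # Evidence is treated as stale from the review date onward.
        return today >= self.review_date


# Illustrative values only; none of these come from a real launch.
record = DecisionRecord(
    context="Replace the ranking model for search",
    options=["ship", "ship behind a flag", "hold"],
    eval_evidence={"accuracy": 0.91},
    risks_accepted=["Lower accuracy on long queries"],
    decision="ship behind a flag",
    owner="search-team-lead",
    review_date=date(2025, 6, 1),
)
```

Because owner and review_date have no defaults, a record cannot be constructed without naming who holds the decision and when it is revisited.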

Good

The good version includes slice results, representative failures, a comparison against thresholds, caveats, and a recommended action. It names who accepted the residual risk and when the decision should be revisited.
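The threshold comparison can be sketched as a small function that labels each slice. The function name, slice names, and numbers are made up for illustration. Slices with a threshold but no score are reported as "unknown" rather than silently dropped, on the assumption that a report should say what was not tested.

```python
def compare_slices(
    scores: dict[str, float], thresholds: dict[str, float]
) -> dict[str, str]:
    """Label every slice that has a threshold as pass, fail, or unknown."""
    results = {}
    for slice_name, threshold in thresholds.items():
        score = scores.get(slice_name)
        if score is None:
            results[slice_name] = "unknown"  # no eval data for this slice
        elif score >= threshold:
            results[slice_name] = "pass"
        else:
            results[slice_name] = "fail"
    return results


# Illustrative numbers only.
scores = {"short_queries": 0.94, "long_queries": 0.81}
thresholds = {"short_queries": 0.90, "long_queries": 0.85, "non_english": 0.85}
slice_results = compare_slices(scores, thresholds)
```

Iterating over the thresholds rather than the scores is a deliberate choice here: it makes an untested slice visible instead of letting it vanish from the report.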

Bad

The bad version says "evals look good" in a launch doc. It has no dataset version, no threshold, and no examples. Truly a monument to confidence over content.
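A cheap guard against this kind of report is a check that refuses to call it complete. The sketch below assumes a report is a plain dictionary and that the keys dataset_version, thresholds, and examples exist by convention; both are assumptions for illustration.

```python
# Hypothetical key names for the three things the bad version lacks.
REQUIRED_FIELDS = ("dataset_version", "thresholds", "examples")


def missing_evidence(report: dict) -> list[str]:
    """Return required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not report.get(name)]


vague_report = {"summary": "evals look good"}
gaps = missing_evidence(vague_report)
```

Empty values count as missing, so a report with an empty examples list fails the check just like one with no examples key at all.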

Ugly

The ugly reality is political pressure. Decision records help teams separate "we did not know" from "we knew and accepted it." Those are very different incident conversations.

Artifact to produce

Write an eval decision record with: summary, versions, thresholds, pass/fail by slice, top failures, decision, owner, and follow-up date.
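As a starting point, the record can be rendered from structured data so that every section is present by construction. This is a sketch under stated assumptions: the dictionary keys, the output layout, and all sample values are hypothetical.

```python
def render_record(record: dict) -> str:
    """Render a decision record as plain text, one section per line group."""
    versions = ", ".join(f"{k}={v}" for k, v in record["versions"].items())
    lines = [f"Summary: {record['summary']}", f"Versions: {versions}"]
    lines.append("Results by slice:")
    for name, (score, threshold) in record["slices"].items():
        status = "PASS" if score >= threshold else "FAIL"
        lines.append(f"  {name}: {score:.2f} vs threshold {threshold:.2f} -> {status}")
    lines.append("Top failures:")
    lines.extend(f"  - {failure}" for failure in record["top_failures"])
    lines.append(f"Decision: {record['decision']}")
    lines.append(f"Owner: {record['owner']}")
    lines.append(f"Follow-up date: {record['follow_up']}")
    return "\n".join(lines)


# Illustrative values only.
example = {
    "summary": "New ranking model, evaluated on the search eval set",
    "versions": {"dataset": "v3", "model": "candidate-12"},
    "slices": {"short_queries": (0.94, 0.90), "long_queries": (0.81, 0.85)},
    "top_failures": ["Long queries with negation return off-topic results"],
    "decision": "Ship behind a flag; hold full rollout",
    "owner": "search-team-lead",
    "follow_up": "2025-06-01",
}
text = render_record(example)
```

A missing key raises a KeyError instead of producing a partial record, which keeps an incomplete report from looking finished.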

Decision-record review

Question	Why it matters
What decision did the report recommend?	Reports should produce action.
What risk was accepted explicitly?	Accepted risk should not become surprise risk.
When must the decision be revisited?	Eval evidence expires.

Chapter takeaway

A good eval report is boring in the best way: versions, thresholds, failures, decision, owner. Almost like future readers matter.

References

Architecture decision records