MCP Mastery
c-02-pairwise-harness

Pairwise Harness

Compare outputs pairwise without letting order bias quietly win.

mid · python · ~35 min

README

# Pairwise Harness

Implement the tiny eval helper in `src/pairwise.py` until `tests/validate.py` passes.

## Good / Bad / Ugly

- **Good**: passes shuffled and edge-case examples without network calls.
- **Bad**: hard-codes the sample fixture and calls it done. Cute. No.
- **Ugly**: production data is partial, so validation must reject ambiguous or incomplete rows before any scoring happens.
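
As a rough illustration of the three cases above, here is a minimal sketch of one common way to keep order bias out of a pairwise comparison: ask the judge twice with the positions swapped, and only accept a winner when both passes agree. Everything named here is an assumption made for the example, not the challenge's actual API: the row keys `id`, `a`, and `b`, the `judge` callable and its `"first"` / `"second"` / `"tie"` return values, and the function names `validate_row` and `compare_pair`.

```python
from typing import Callable, Dict

# Hypothetical judge: sees two texts and answers "first", "second", or "tie".
Judge = Callable[[str, str], str]

REQUIRED_KEYS = ("id", "a", "b")  # assumed row schema, not taken from the challenge
ALLOWED_VERDICTS = {"first", "second", "tie"}


def validate_row(row: Dict[str, str]) -> None:
    """Reject partial rows before any scoring happens."""
    for key in REQUIRED_KEYS:
        value = row.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"row has no usable {key!r} field: {row!r}")


def compare_pair(row: Dict[str, str], judge: Judge) -> str:
    """Return "a", "b", or "tie"; a verdict that flips with order counts as a tie."""
    validate_row(row)
    forward = judge(row["a"], row["b"])   # candidate a shown first
    backward = judge(row["b"], row["a"])  # candidate b shown first
    if forward not in ALLOWED_VERDICTS or backward not in ALLOWED_VERDICTS:
        raise ValueError(f"judge returned an unknown verdict: {forward!r}, {backward!r}")
    if forward == "first" and backward == "second":
        return "a"  # a wins in both positions
    if forward == "second" and backward == "first":
        return "b"  # b wins in both positions
    return "tie"    # disagreement or explicit tie: no position-independent winner
```

Under these assumptions, a judge that always answers `"first"` yields `"tie"` for every row, which is the point: a preference that exists only because of position never reaches the score.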

Run with `npm run challenge -- pairwise-harness --track eval-writing`.

Hints

  • Good: passes under shuffled examples and explicit edge cases.
  • Bad: hard-codes the sample rows. Very brave, very detectable.
  • Ugly: production data is partial; validate before scoring.

Acceptance

  • `npm run challenge -- pairwise-harness --track eval-writing` exits 0
  • Validator exercises good, bad, and ugly cases
  • Implementation avoids network calls and nondeterminism
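
The shuffling and determinism criteria can be checked locally with something like the sketch below. It is not the real `tests/validate.py`, whose contents are not shown here. It assumes a hypothetical harness entry point that takes a list of rows and returns a `{row id: verdict}` mapping, and it uses a reversal plus seeded shuffles so the check itself stays deterministic.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, str]
# Hypothetical harness signature: rows in, {row id: verdict} out.
Harness = Callable[[List[Row]], Dict[str, str]]


def is_order_invariant(run: Harness, rows: Sequence[Row],
                       seeds: Sequence[int] = (0, 1, 2)) -> bool:
    """True if per-row verdicts are identical under reversal and seeded shuffles."""
    baseline = run(list(rows))
    orderings = [list(reversed(rows))]  # reversal always reorders two or more rows
    for seed in seeds:
        shuffled = list(rows)
        random.Random(seed).shuffle(shuffled)  # seeded, so reruns see the same orders
        orderings.append(shuffled)
    return all(run(order) == baseline for order in orderings)
```

A harness that derives each verdict from the row's own content passes this check. One that hands out verdicts by position in the sample fixture fails as soon as the rows are reversed.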