Harness Engineering for AI Agents
Module 3
Verification & Failure Modes
Covers the evaluate → revise → verify loop, the decimalized 5-dimension rubric, harness-side verification, a failure taxonomy from 1,500 logged runs, and the 3-persona evaluation panel.
Readings
1. Why External Verification (10 min)
2. The Wiggum Loop (15 min)
3. The Evaluation Rubric (12 min)
4. Pre-evaluation Summarization (10 min)
5. Failure Taxonomy (15 min)
Quizzes
Labs