Verification harnesses as agent guardrails
A verification harness is a runnable pass/fail check the agent can execute on its own output. Without one, "looks done" is the only stop signal the agent has, and a human becomes the verification loop. Providing a check the agent can read and iterate against closes that loop autonomously — widely treated as the single highest-leverage practice for unattended agent reliability.
Make the check runnable and deterministic
- The harness MUST be a single command (or short script) the agent can invoke and that returns a machine-readable signal: a process exit code, a test summary, a diff against a fixture, or a screenshot compared to a target.
- The check MUST be deterministic: the same input produces the same pass/fail. Flaky, time-, or network-dependent checks MUST NOT gate completion — they teach the agent to retry randomly or suppress the signal.
- The agent MUST show evidence (the command run and its output), not assert success. Reviewing evidence is faster than re-running the check.
- Wire the command into the project so the agent can discover it (e.g., a documented `verify`/`lint`/`test` entry point referenced in agent-readable docs).
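As an illustration of the points above, here is a minimal sketch of a single-command harness in Python. The function name `run_checks` and the `(label, argv)` check format are assumptions made for this example, not part of any existing tool. It runs each command, prints the command and exit code as evidence, and reduces everything to one pass/fail integer.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple

# Hypothetical shape for a check: a human-readable label plus the argv to run.
Check = Tuple[str, Sequence[str]]


def run_checks(checks: List[Check]) -> int:
    """Run each check in order; return 0 only if every check exits 0."""
    failed = []
    for label, argv in checks:
        result = subprocess.run(argv, capture_output=True, text=True)
        status = "PASS" if result.returncode == 0 else "FAIL"
        # Evidence: the exact command that ran and its exit code.
        print(f"[{status}] {label}: {' '.join(argv)} (exit {result.returncode})")
        if result.returncode != 0:
            failed.append(label)
            # Print only the last 20 lines of output to keep the report short.
            tail = (result.stdout + result.stderr).strip().splitlines()[-20:]
            if tail:
                print("\n".join(tail))
    print(f"verify: {len(checks) - len(failed)}/{len(checks)} checks passed")
    return 1 if failed else 0
```

A project script would typically end with `sys.exit(run_checks(CHECKS))`, where `CHECKS` is the project's own list of commands, so that the process exit code is the machine-readable signal the agent and any gate can act on.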
Optimize failure output for an LLM reader
The agent acts on the harness's stdout/stderr, so error messages are part of the interface.
- Failures SHOULD state what was expected, what was observed, and the location (file:line) — not just a stack trace or a bare assertion.
- Each failure SHOULD name a concrete next step ("rename `x` to `y`", "add a test for the logged-out case"), because an actionable message turns into a correct edit; a vague one turns into a guess.
- Output SHOULD be concise; dumping thousands of lines floods the agent's context and degrades subsequent reasoning.
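One possible shape for such messages, sketched in Python. The `Failure` record, its field names, and the `max_shown` cap are hypothetical choices for this example; the point is only that each failure carries location, expected, observed, and a next step, and that the report is truncated rather than dumped in full.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Failure:
    path: str        # file where the problem was found
    line: int        # line number within that file
    expected: str    # what the check required
    observed: str    # what the check actually saw
    next_step: str   # one concrete action the agent can take


def format_failure(f: Failure) -> str:
    """Render one failure as a short block with location first."""
    return (
        f"FAIL {f.path}:{f.line}\n"
        f"  expected: {f.expected}\n"
        f"  observed: {f.observed}\n"
        f"  next step: {f.next_step}"
    )


def format_report(failures: List[Failure], max_shown: int = 10) -> str:
    """Render at most max_shown failures and summarize the rest in one line."""
    shown = [format_failure(f) for f in failures[:max_shown]]
    hidden = len(failures) - len(shown)
    if hidden > 0:
        shown.append(f"... {hidden} more failure(s) not shown")
    return "\n".join(shown)
```

Capping the report trades completeness for context budget: the agent fixes the first batch, re-runs the check, and sees the next batch.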
Escalate enforcement: prompt text is advisory, gates are deterministic
Instructions in a prompt, CLAUDE.md, or skill are advisory — an agent can and will sometimes ignore them. Per Anthropic's Claude Code best practices, hooks and external gates "are deterministic and guarantee the action happens." Choose the weakest level that holds:
| Level | Mechanism | Enforcement |
|---|---|---|
| In-prompt | "run the tests and fix failures" in the same message | Advisory — best for one-off tasks |
| Per-session | An evaluator re-checks a stated condition after each turn | Stronger — keeps an unattended run on target |
| Deterministic gate | A Stop hook / pre-commit hook / CI job that blocks completion until the check passes | Hard — does not depend on the agent choosing to comply |
- Any check that protects correctness for an unattended or merged change MUST be enforced by a deterministic gate (CI, pre-commit hook, or Stop hook) and MUST NOT rely on prompt instructions alone.
- Gates that can loop forever SHOULD cap retries and surface to a human on repeated failure rather than spinning.
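The retry cap can be sketched as a small loop. This is an assumption-laden illustration, not the API of any hook system: `check` and `attempt_fix` are hypothetical callables standing in for "run the harness" and "let the agent try another edit", and the returned strings are arbitrary labels.

```python
from typing import Callable


def run_gate(
    check: Callable[[], bool],
    attempt_fix: Callable[[], None],
    max_attempts: int = 3,
) -> str:
    """Run check up to max_attempts times; return "passed" or "escalate"."""
    for attempt in range(1, max_attempts + 1):
        if check():
            return "passed"
        # Only try another fix if there is still an attempt left to verify it.
        if attempt < max_attempts:
            attempt_fix()
    # The cap was reached without a pass: hand off to a human instead of spinning.
    return "escalate"
```

The gate itself stays deterministic: the same sequence of check results always yields the same outcome, and the worst case is bounded by `max_attempts` runs of the check.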
Gate on correctness, not taste
- Gates MUST fail only on correctness or stated requirements (broken build, failing test, violated contract). They MUST NOT block on style nitpicks that an autoformatter or an optional, non-blocking linter can handle.
- An adversarial reviewer or "find the gaps" check SHOULD be told to flag only gaps affecting correctness or the stated spec; chasing every speculative finding drives over-engineering.
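A rough sketch of how a gate might separate blocking findings from advisory ones. The `Finding` record and the set of blocking kinds are hypothetical and would be defined per project; only blocking kinds influence the exit code, while everything else is reported without failing the gate.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical categories: only these kinds are allowed to fail the gate.
BLOCKING_KINDS = {"build", "test", "contract"}


@dataclass
class Finding:
    kind: str      # e.g. "test", "contract", "style"
    message: str   # short description of the problem


def evaluate(findings: List[Finding]) -> Tuple[int, List[str]]:
    """Return (exit_code, report_lines); only blocking kinds affect the exit code."""
    blocking = [f for f in findings if f.kind in BLOCKING_KINDS]
    advisory = [f for f in findings if f.kind not in BLOCKING_KINDS]
    # Blocking findings are listed first so they are not buried under nitpicks.
    lines = [f"BLOCKING [{f.kind}] {f.message}" for f in blocking]
    lines += [f"advisory [{f.kind}] {f.message}" for f in advisory]
    return (1 if blocking else 0), lines
```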
Build it incrementally
- Start with the cheapest check that catches a real failure, then add cases as concrete failure patterns emerge. Do NOT pre-build an exhaustive suite up front (YAGNI).
- When a new class of agent error appears, encode a check for it so the harness — not a human — catches the recurrence next time.
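One way to keep the harness easy to grow is a small registry where each newly observed failure pattern becomes one function. The sketch below is illustrative only: the `register` decorator, the `CHECKS` list, and the conflict-marker check are invented for this example, with the check standing in for whatever failure a real project actually observes.

```python
from typing import Callable, List, Optional

# A check inspects some text and returns an error message, or None if it passes.
Check = Callable[[str], Optional[str]]

CHECKS: List[Check] = []


def register(check: Check) -> Check:
    """Add a check to the harness; used as a decorator."""
    CHECKS.append(check)
    return check


@register
def no_conflict_markers(text: str) -> Optional[str]:
    # Hypothetical example of a check added after a run left conflict markers behind.
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith(("<<<<<<<", ">>>>>>>")):
            return f"line {number}: merge conflict marker; resolve the conflict and delete the marker"
    return None


def run_all(text: str) -> List[str]:
    """Run every registered check and collect the failure messages."""
    messages = []
    for check in CHECKS:
        message = check(text)
        if message is not None:
            messages.append(message)
    return messages
```

Starting with one registered check and adding another only when a new failure pattern shows up keeps the suite proportional to the problems actually seen.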