Testing
24 documents
Agent evaluation and safety
Gate every agent/LLM release on two checks: a quality eval gate and a safety gate, both run in CI on every model and prompt change.
Comprehensive unit testing
Prioritize unit tests over integration tests. Test state transitions, edge cases, and serialization round-trips. Every im...
Contract testing for services
Verify that independently deployed services agree on the shape of their interactions before they ship.
Database testing
Prescriptive rules for testing SQLite-backed code: in-memory vs file-based databases, test isolation strategies, migration testing, sync logic testing, and conflict resolution testing.
Design-Time Data
Use design-time data contexts for visual preview testing, to verify that the UI renders correctly without running the app.
Eval-driven development for agent behavior
When the unit under test is agent or LLM behavior, build a calibrated eval harness instead of relying on ordinary unit tests.
Flaky Test Prevention
Flaky tests destroy confidence. Quarantine them immediately — fix or delete, never ignore.
Flaky test quarantine lifecycle
Detect, quarantine with an owner and deadline, fix the root cause, then return flaky tests to the gating suite.
Fuzz testing
Fuzz untrusted-input parsing surfaces — especially in memory-unsafe code — with coverage-guided fuzzers; adopt elsewhere only on measured need.
Groundedness and hallucination checks
Evaluate RAG groundedness with verified citations, calibrated LLM judges, and retrieval metrics, and abstain when context is insufficient.
Linting from day one
Run linting as part of your automated test and verification suite — it catches bugs that unit tests miss.
Mutation Testing
Mutation testing validates that your tests actually catch bugs — not just achieve coverage.
Post-generation verification
Every generated artifact MUST be verified:
Previews
All UI components MUST include preview declarations for rapid visual verification during development. Previews should...
Properties of Good Tests
From Kent Beck's Test Desiderata — tests should be:
Property-Based Testing
When to use: parsers, serializers, data transformers, encoders/decoders, validators — anything
Security Testing
Run security scans as part of post-generation verification (agenticdevelopercookbook://guidelines/testing/post-generation-verification). These a...
Snapshot testing discipline
Keep snapshots small and deterministic, and deliberately review every snapshot diff before accepting it.
Test Data
**Construct what you need, per test.** Large shared fixture files SHOULD be avoided.
Test Doubles
Use [Martin Fowler's taxonomy](https://martinfowler.com/bliki/TestDouble.html):
Test Pyramid
Projects SHOULD follow the Google SWE Book ratio: **80% unit / 15% integration / 5% E2E**.
The Testing Workflow
The recommended Claude Code testing workflow, combining all tools:
Tool-call evaluation
Evaluate agent tool use for selection, argument correctness, ordering, and stopping against a fixed trajectory suite with error-recovery cases.
Unit Test Patterns
**Structure — Arrange, Act, Assert (AAA):**
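The AAA structure can be sketched as a minimal Python `unittest` example. `Counter` and its `increment` method are hypothetical, invented only to give the tests something to exercise. The point is the three visually separated phases in each test.

```python
import unittest


class Counter:
    """Hypothetical unit under test, defined here only for illustration."""

    def __init__(self, start: int = 0) -> None:
        self.value = start

    def increment(self, step: int = 1) -> int:
        # Validate before mutating, so a rejected call leaves state unchanged.
        if step <= 0:
            raise ValueError("step must be positive")
        self.value += step
        return self.value


class CounterTest(unittest.TestCase):
    def test_increment_adds_step_to_current_value(self) -> None:
        # Arrange: build exactly the state this test needs.
        counter = Counter(start=5)

        # Act: perform the single behavior under test.
        result = counter.increment(step=3)

        # Assert: check only the outcome of the Act step.
        self.assertEqual(result, 8)
        self.assertEqual(counter.value, 8)

    def test_increment_rejects_non_positive_step(self) -> None:
        # Arrange
        counter = Counter()

        # Act and Assert: here the expected exception is the outcome.
        with self.assertRaises(ValueError):
            counter.increment(step=0)

        # The rejected call must not have changed state.
        self.assertEqual(counter.value, 0)
```

Run it with `python -m unittest <file>.py`. Keeping a single Act per test means a failure points at one behavior rather than a sequence of them.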