Testing

24 documents

Agent evaluation and safety

Gate every agent/LLM release on two checks: a quality eval gate and a safety gate, both run in CI on every model and prompt change.

Comprehensive unit testing

Prioritize unit tests over integration tests. Test state transitions, edge cases, serialization round-trips. Every im...

csharp, python, typescript

Contract testing for services

Verify that independently deployed services agree on the shape of their interactions before they ship.

Database testing

Prescriptive rules for testing SQLite-backed code: in-memory vs file-based databases, test isolation strategies, migration testing, sync logic testing, and conflict resolution testing.

sqlite, postgresql

Design-Time Data

Use design-time data contexts for visual preview testing — verify UI renders correctly without running the app.

csharp, windows

Eval-driven development for agent behavior

When the unit under test is agent or LLM behavior, build a calibrated eval harness instead of relying on ordinary unit tests.

Flaky Test Prevention

Flaky tests destroy confidence. Quarantine them immediately — fix or delete, never ignore.

typescript, web

Flaky test quarantine lifecycle

Detect, quarantine with an owner and deadline, fix the root cause, then return flaky tests to the gating suite.

Fuzz testing

Fuzz untrusted-input parsing surfaces — especially in memory-unsafe code — with coverage-guided fuzzers; adopt elsewhere only on measured need.

Groundedness and hallucination checks

Evaluate RAG groundedness with verified citations, calibrated LLM judges, and retrieval metrics, and abstain when context is insufficient.

Linting from day one

Run linting as part of your automated test and verification suite — it catches bugs that unit tests miss.

csharp, kotlin, swift, typescript, web

Mutation Testing

Mutation testing validates that your tests actually catch bugs — not just achieve coverage.

csharp, kotlin, python, swift, typescript

Post-generation verification

Every generated artifact MUST be verified:

ios, kotlin, typescript

Previews

All UI components MUST include preview declarations for rapid visual verification during development. Previews should...

kotlin, swift

Properties of Good Tests

From Kent Beck's Test Desiderata — tests should be:

Property-Based Testing

When to use: parsers, serializers, data transformers, encoders/decoders, validators — anything

csharp, kotlin, python, swift, typescript

Security Testing

Run security scans as part of post-generation verification (agenticdevelopercookbook://guidelines/testing/post-generation-verification). These a...

csharp, kotlin, python, swift, typescript, web

Snapshot testing discipline

Keep snapshots small and deterministic, and deliberately review every snapshot diff before accepting it.

Test Data

**Construct what you need, per test.** Large shared fixture files SHOULD be avoided.

Test Doubles

Use [Martin Fowler's taxonomy](https://martinfowler.com/bliki/TestDouble.html):

csharp, kotlin, python, swift, typescript, web

Test Pyramid

Projects SHOULD follow the Google SWE Book ratio: **80% unit / 15% integration / 5% E2E**.

The Testing Workflow

The recommended Claude Code testing workflow, combining all tools:

python, swift, typescript, web

Tool-call evaluation

Evaluate agent tool use for selection, argument correctness, ordering, and stopping against a fixed trajectory suite with error-recovery cases.

Unit Test Patterns

**Structure — Arrange, Act, Assert (AAA):**