Choosing a primary datastore

This guideline governs the primary datastore for a server backend — the system of record. It is distinct from the cookbook's embedded/local-storage guidance (e.g., on-device SQLite). Pick the store from data shape, query patterns, and measured scale.

Strong default

  • For a new server backend, the agent SHOULD default to a relational database (PostgreSQL is the recommended default).
  • A non-relational primary store MUST be justified by a specific, stated requirement (a query pattern, scale ceiling, or access shape the relational option cannot meet), not by a default preference, hype, or "it scales better."
  • Caveat: "just use Postgres" essays are common, but they are advocacy, not consensus. Treat the relational default as a strong starting point that a real requirement can override — not as a universal law.

Why relational is the default

  • One store covers more than it appears. A modern relational engine absorbs many "specialist" needs in-place: PostgreSQL handles semi-structured data with jsonb (and SQL/JSON JSON_TABLE as of v17), full-text search, geospatial via PostGIS, and vector similarity via the pgvector extension. Reaching for a second engine before exhausting these is premature.
  • Transactions and integrity. ACID transactions, foreign keys, and constraints are hard to bolt on later and easy to lose with an eventually-consistent store.
  • Optionality (optimize-for-change). A relational schema with normalized data is the easiest base to migrate out of later; a denormalized document model is far harder to reshape.

Decision criteria

Choose a specialist primary store only when a requirement below is concrete and, where possible, measured:

Need Signal that justifies it Candidate store
Relational, transactional, mixed queries Default — entities with relationships, joins, integrity Relational (Postgres)
Schema-fluid documents, per-record variance Shapes genuinely differ per record AND joins are rare Document store
Simple key lookups, cache, session, ultra-low latency Access is get/set by key at high throughput Key-value store
Full-text / faceted search at scale Relevance ranking, fuzzy match beyond DB full-text Search engine
High-frequency time-stamped writes + time-range queries Metrics/telemetry/IoT with retention and downsampling Time-series store
Deep multi-hop relationship traversal Queries are mostly graph walks (friend-of-friend, paths) Graph store
Semantic / embedding similarity Vector search is core; pgvector insufficient at scale Vector store

Rules

  • The agent MUST record the chosen store and the one-line requirement that justifies it (in an ADR or the plan).
  • The agent MUST NOT introduce a second datastore "for scale" without a measured limit on the first; YAGNI applies.
  • The agent SHOULD prefer a single relational store with extensions over polyglot persistence until a specific store earns its operational cost.
  • For semi-structured data, the agent SHOULD evaluate jsonb in the relational store before adopting a separate document database.
  • The agent SHOULD keep derived stores (search index, cache, vector index) as secondary projections of the relational system of record, not as the source of truth.
  • When scale is asserted as the reason for a non-relational store, the agent MUST cite an expected workload (writes/sec, dataset size, read pattern), not a generic claim.
version
1.0.0
tags
database, architecture, backend
author
Mike Fullerton
modified
2026-06-09

Change History

Version Date Author Summary
1.0.0 2026-06-09 Mike Fullerton Initial creation