Kubernetes workloads
Guidance for running production workloads on Kubernetes. Adopt Kubernetes only when measured scale and operational needs justify its complexity; once committed, declare resources and health, harden pods, and roll out safely.
Adopt only when the need is measured
- Kubernetes adds substantial operational surface (control plane, networking, RBAC, upgrades). You SHOULD NOT default to it. Prefer a managed PaaS, container service, or single VM until you have a measured need — multi-service orchestration, autoscaling, or multi-team self-service (per YAGNI).
- When the need justifies it, SHOULD use a managed control plane (e.g., EKS, GKE, AKS) rather than self-hosting, to shift undifferentiated operational load.
Resources: requests and limits
- Every container MUST declare CPU and memory `requests` (for scheduling) and memory `limits` (to bound usage). Without requests the scheduler cannot place pods predictably; without a memory limit a leak can evict neighbors.
- Set `requests.memory == limits.memory` for predictable, non-burstable memory. For CPU, MAY omit `limits.cpu` to avoid throttling latency-sensitive workloads, but always set `requests.cpu`.
- SHOULD assign a `PriorityClass` (via `priorityClassName`) to critical workloads so they survive node pressure.
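The resource rules above can be checked mechanically. The following is a minimal sketch, assuming container specs are available as plain Python dicts (for example, parsed from a manifest); the function name `lint_resources` and the sample container `web` are hypothetical, and quantities are compared as literal strings rather than parsed.

```python
def lint_resources(container: dict) -> list:
    """Return findings for one container spec given as a plain dict.

    Assumption: quantities are compared as literal strings, so "256Mi"
    and "0.25Gi" would be reported as different even though they are equal.
    """
    resources = container.get("resources", {})
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    findings = []

    # CPU and memory requests are both required for predictable scheduling.
    for key in ("cpu", "memory"):
        if key not in requests:
            findings.append(f"missing requests.{key}")

    # A memory limit is required; limits.cpu is deliberately not checked.
    if "memory" not in limits:
        findings.append("missing limits.memory")
    elif "memory" in requests and requests["memory"] != limits["memory"]:
        findings.append("requests.memory != limits.memory")

    return findings


# Hypothetical container: CPU request only, memory request equal to limit.
web = {
    "name": "web",
    "resources": {
        "requests": {"cpu": "250m", "memory": "256Mi"},
        "limits": {"memory": "256Mi"},
    },
}

print(lint_resources(web))
print(lint_resources({"name": "bare"}))
```

A check like this could run in CI against rendered manifests; an admission policy is another place to enforce the same rules.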
Health probes
| Probe | Purpose | Note |
|---|---|---|
| `startupProbe` | Gate slow-starting apps before other probes run | SHOULD use for apps with long init |
| `readinessProbe` | Remove pod from Service endpoints when not ready | MUST define; failing it stops traffic without a restart |
| `livenessProbe` | Restart a wedged container | SHOULD define; keep it cheap and distinct from readiness |
- Each probe MUST be lightweight and dependency-free where possible — a liveness probe that checks a database will cascade failures.
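As an illustration of the three probes, here is a minimal sketch that builds probe stanzas as Python dicts. The helper `make_probes` and the endpoint paths `/healthz` and `/readyz` are assumptions for this example, not something the guidance prescribes. It relies on the fact that a startup probe allows roughly `failureThreshold * periodSeconds` for the application to start.

```python
import math


def make_probes(port: int, max_startup_seconds: int, period_seconds: int = 10) -> dict:
    """Build startup, readiness, and liveness probe stanzas for one container.

    Assumptions: the app serves a cheap, dependency-free /healthz endpoint
    and a separate /readyz endpoint (both hypothetical paths).
    """
    # The startup probe tolerates failureThreshold * periodSeconds of init time.
    failure_threshold = math.ceil(max_startup_seconds / period_seconds)

    def http_get(path: str) -> dict:
        return {"httpGet": {"path": path, "port": port}}

    return {
        "startupProbe": {
            **http_get("/healthz"),
            "periodSeconds": period_seconds,
            "failureThreshold": failure_threshold,
        },
        # Readiness uses its own endpoint so it can fail without a restart.
        "readinessProbe": {**http_get("/readyz"), "periodSeconds": period_seconds},
        # Liveness stays cheap and does not touch downstream dependencies.
        "livenessProbe": {**http_get("/healthz"), "periodSeconds": period_seconds},
    }


probes = make_probes(port=8080, max_startup_seconds=300)
print(probes["startupProbe"]["failureThreshold"])  # 300 s / 10 s per attempt = 30
```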
Rollout and availability
- SHOULD use the default `RollingUpdate` strategy with explicit `maxUnavailable` and `maxSurge`. Set `minReadySeconds` so new pods prove healthy before old ones retire.
- MUST define a `PodDisruptionBudget` for any workload that needs availability during voluntary disruptions (node drains, upgrades).
- SHOULD spread replicas with `topologySpreadConstraints` across nodes and zones.
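To see what explicit `maxUnavailable` and `maxSurge` values mean in practice, the sketch below works through the arithmetic for percentage values: Kubernetes rounds `maxUnavailable` down and `maxSurge` up when converting a percentage to a pod count. The function names are hypothetical, and the sketch ignores edge cases such as both values resolving to zero.

```python
def resolve(value, replicas: int, round_up: bool) -> int:
    """Convert an absolute count or a percentage string such as "25%" to pods."""
    if isinstance(value, int):
        return value
    percent = int(value.rstrip("%"))
    scaled = replicas * percent
    # Integer arithmetic avoids floating-point surprises when rounding.
    return -(-scaled // 100) if round_up else scaled // 100


def rollout_bounds(replicas: int, max_unavailable, max_surge) -> dict:
    """Return the pod-count envelope a rolling update stays within."""
    unavailable = resolve(max_unavailable, replicas, round_up=False)  # rounds down
    surge = resolve(max_surge, replicas, round_up=True)               # rounds up
    return {
        "min_available": replicas - unavailable,
        "max_total": replicas + surge,
    }


print(rollout_bounds(10, "25%", "25%"))
# {'min_available': 8, 'max_total': 13}
```

With 10 replicas and 25% for both settings, at least 8 pods stay available and at most 13 exist at once, which is useful when sizing node headroom and when choosing a `PodDisruptionBudget` that does not conflict with the rollout.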
Pod security
- Pods SHOULD run with a hardened `securityContext`: `runAsNonRoot: true`, `readOnlyRootFilesystem: true`, `allowPrivilegeEscalation: false`, drop all Linux capabilities, and set a `seccompProfile` of `RuntimeDefault`.
- SHOULD enforce baseline guarantees at the namespace level with Pod Security Admission (`restricted` profile), which is stable as of Kubernetes 1.25.
- MUST NOT mount the default ServiceAccount token unless the workload calls the API server; set `automountServiceAccountToken: false` otherwise.
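The hardening settings above can be expressed as data and checked against a container spec. This is a minimal sketch under the assumption that specs are plain Python dicts; `HARDENED_CONTEXT`, `missing_hardening`, and the namespace name `payments` are hypothetical names used only for illustration. The namespace label shown is the one Pod Security Admission reads to enforce a profile.

```python
# Container-level securityContext matching the guidance above.
HARDENED_CONTEXT = {
    "runAsNonRoot": True,
    "readOnlyRootFilesystem": True,
    "allowPrivilegeEscalation": False,
    "capabilities": {"drop": ["ALL"]},
    "seccompProfile": {"type": "RuntimeDefault"},
}

# Namespace that asks Pod Security Admission to enforce the restricted profile.
# The namespace name is a placeholder.
namespace = {
    "apiVersion": "v1",
    "kind": "Namespace",
    "metadata": {
        "name": "payments",
        "labels": {"pod-security.kubernetes.io/enforce": "restricted"},
    },
}


def missing_hardening(container: dict) -> list:
    """List hardened settings that are absent or set differently."""
    ctx = container.get("securityContext", {})
    return sorted(
        key for key, wanted in HARDENED_CONTEXT.items() if ctx.get(key) != wanted
    )


partial = {"name": "api", "securityContext": {"runAsNonRoot": True}}
print(missing_hardening(partial))
```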
Organization and scaling
- SHOULD isolate workloads with namespaces and apply the recommended `app.kubernetes.io/*` labels for selection and tooling.
- SHOULD scale stateless workloads with a `HorizontalPodAutoscaler` driven by CPU, memory, or custom metrics; pair it with cluster autoscaling so capacity follows demand.
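For intuition about how a `HorizontalPodAutoscaler` reacts to a metric, the sketch below applies the documented core formula, `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, with the default 10% tolerance band. It is a simplification: the function name and parameters are hypothetical, and real HPA behavior also accounts for pod readiness, missing metrics, and stabilization windows.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA scaling decision for a single metric."""
    ratio = current_metric / target_metric
    # Inside the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas at 90% average CPU against a 60% target scale out to 6.
print(desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10))  # 6
```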
Security and isolation guidance here is engineering practice, not a compliance certification; validate against your own regulatory and threat-model requirements.