Kubernetes workloads

Guidance for running production workloads on Kubernetes. Adopt Kubernetes only when measured scale and operational needs justify its complexity; once committed, declare resources and health, harden pods, and roll out safely.

Adopt only when the need is measured

Kubernetes adds substantial operational surface (control plane, networking, RBAC, upgrades). You SHOULD NOT default to it. Prefer a managed PaaS, container service, or single VM until you have a measured need — multi-service orchestration, autoscaling, or multi-team self-service (per YAGNI).
When the need justifies it, SHOULD use a managed control plane (e.g., EKS, GKE, AKS) rather than self-hosting, to shift undifferentiated operational load.

Resources: requests and limits

Every container MUST declare CPU and memory requests (for scheduling) and memory limits (to bound usage). Without requests the scheduler cannot place pods predictably; without a memory limit a leak can evict neighbors.
Set requests.memory == limits.memory for predictable, non-burstable memory. For CPU, MAY omit limits.cpu to avoid throttling latency-sensitive workloads, but always set requests.cpu.
SHOULD assign a priorityClass to critical workloads so they survive node pressure.

Health probes

Probe	Purpose	Note
`startupProbe`	Gate slow-starting apps before other probes run	SHOULD use for apps with long init
`readinessProbe`	Remove pod from Service endpoints when not ready	MUST define; failing it stops traffic without a restart
`livenessProbe`	Restart a wedged container	SHOULD define; keep it cheap and distinct from readiness

Each probe MUST be lightweight and dependency-free where possible — a liveness probe that checks a database will cascade failures.

Rollout and availability

SHOULD use the default RollingUpdate strategy with explicit maxUnavailable and maxSurge. Set minReadySeconds so new pods prove healthy before old ones retire.
MUST define a PodDisruptionBudget for any workload that needs availability during voluntary disruptions (node drains, upgrades).
SHOULD spread replicas with topologySpreadConstraints across nodes and zones.

Pod security

Pods SHOULD run with a hardened securityContext: runAsNonRoot: true, readOnlyRootFilesystem: true, allowPrivilegeEscalation: false, drop all Linux capabilities, and set a seccompProfile of RuntimeDefault.
SHOULD enforce baseline guarantees at the namespace level with Pod Security Admission (restricted profile) as of Kubernetes 1.25+.
MUST NOT mount the default ServiceAccount token unless the workload calls the API server.

Organization and scaling

SHOULD isolate workloads with namespaces and apply the recommended app.kubernetes.io/* labels for selection and tooling.
SHOULD scale stateless workloads with a HorizontalPodAutoscaler driven by CPU, memory, or custom metrics; pair it with cluster autoscaling so capacity follows demand.

Security and isolation guidance here is engineering practice, not a compliance certification; validate against your own regulatory and threat-model requirements.

Change History

Version	Date	Author	Summary
1.0.0	2026-06-09	Mike Fullerton	Initial creation