What is OOMKilled, and how do you diagnose and prevent it?

Question

Accepted Answer

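OOMKilled means the Linux kernel's OOM killer terminated a process in the container, either because the container exceeded its memory cgroup limit (set by resources.limits.memory) or because the node itself ran out of memory. The container's last state shows the reason OOMKilled, normally with exit code 137 (128 + SIGKILL).

I diagnose it with kubectl describe pod, the previous container's logs (kubectl logs --previous), memory metrics, application memory profiles, and node events, then either fix the leak or resize requests and limits.

OOMKilled can come from a real leak, bad sizing, a sudden load spike, a large allocation at startup, or sidecar overhead that was left out of planning. Use container_memory_working_set_bytes, application heap metrics, and previous logs to distinguish a leak from legitimate sizing: a working set that keeps climbing under steady load suggests a leak, while one that plateaus close to the limit suggests the limit is simply too low.

Health and resources are production controls, not just YAML fields; wrong settings cause outages, noisy restarts, bad rollouts, or wasted capacity. Requests affect scheduling and node capacity planning; memory limits set the cgroup ceiling that triggers the kill; readiness affects traffic; liveness affects restart behavior. Validate settings with real load, startup timing, memory profiles, and deployment rollout behavior.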

Technical explanation
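
The answer's advice to use container_memory_working_set_bytes to tell a leak from legitimate sizing can be sketched as a simple trend check. This is a rough heuristic, not an established algorithm: the function names, the labels it returns, and the 10% / 90% thresholds are all assumptions chosen for illustration, and the samples are expected to have been exported already as (timestamp in seconds, bytes) pairs.

```python
def slope_bytes_per_sec(samples):
    """Least-squares slope of memory (bytes) over time (seconds)."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den if den else 0.0


def classify_memory(samples, limit_bytes, growth_frac=0.10, headroom_frac=0.90):
    """Label a window of working-set samples.

    growth_frac and headroom_frac are illustrative thresholds (assumptions),
    not Kubernetes defaults. Tune them against real workloads.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    window = samples[-1][0] - samples[0][0]
    fitted_growth = slope_bytes_per_sec(samples) * window
    peak = max(m for _, m in samples)
    if fitted_growth > growth_frac * limit_bytes:
        return "leak-suspect"   # working set keeps climbing across the window
    if peak > headroom_frac * limit_bytes:
        return "undersized"     # flat, but sitting close to the limit
    return "ok"                 # flat, with headroom


MIB = 1024 ** 2
LIMIT = 512 * MIB

# Hypothetical one-minute samples for three containers.
climbing = [(i * 60, (200 + 20 * i) * MIB) for i in range(10)]
plateau = [(i * 60, 480 * MIB) for i in range(10)]
steady = [(i * 60, 200 * MIB) for i in range(10)]
```

A "leak-suspect" result only says the working set grew over the window. Cache warm-up or rising traffic look the same, so it should be confirmed with application heap metrics or a memory profile before anything is called a leak.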

Hands-on example
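
As a minimal sketch of the first diagnostic step, the snippet below scans a Pod object (the JSON that kubectl get pod -o json returns) for containers whose current or last state was terminated with reason OOMKilled. The helper name find_oom_kills and the sample pod are hypothetical. The field names (status.containerStatuses, state, lastState, terminated.reason, exitCode, restartCount) are the ones the Pod status uses.

```python
def find_oom_kills(pod):
    """Return one record per container state that was terminated as OOMKilled."""
    hits = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for cs in statuses:
        # "state" is the current state; "lastState" is the previous run,
        # which is where a restarted container records its OOM kill.
        for key in ("state", "lastState"):
            terminated = cs.get(key, {}).get("terminated")
            if terminated and terminated.get("reason") == "OOMKilled":
                hits.append({
                    "container": cs["name"],
                    "where": key,
                    "exit_code": terminated.get("exitCode"),
                    "restarts": cs.get("restartCount", 0),
                })
    return hits


# Hypothetical, trimmed-down pod status used only for illustration.
SAMPLE_POD = {
    "metadata": {"name": "api-7d9c"},
    "status": {
        "containerStatuses": [
            {
                "name": "api",
                "restartCount": 4,
                "state": {"running": {}},
                "lastState": {
                    "terminated": {"reason": "OOMKilled", "exitCode": 137}
                },
            },
            {
                "name": "sidecar",
                "restartCount": 0,
                "state": {"running": {}},
                "lastState": {},
            },
        ]
    },
}
```

Against a live cluster, the same function could be fed with json.load(sys.stdin) after piping kubectl get pod NAME -o json into the script. A hit in lastState together with a rising restartCount is the usual sign of a repeated OOM kill, and kubectl logs --previous then shows what the container was doing just before it died.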
