How do you safely drain and cordon a node for maintenance?

Question

Accepted Answer

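To maintain a node safely, I cordon it first so the scheduler stops placing new Pods on it, drain it while respecting DaemonSets and PodDisruptionBudgets (PDBs), perform the maintenance, verify that the node reports Ready, and only then uncordon it. During the drain I watch that replacement Pods become Ready on other nodes and that no disruption budget is exhausted. Pass --ignore-daemonsets to kubectl drain: DaemonSet Pods are owned by the DaemonSet controller, which ignores the unschedulable mark and would recreate them on the same node, so drain refuses to proceed past them unless the flag is set. Check PDBs before maintenance: drain evicts Pods through the Eviction API, which honors PDBs, so a budget that currently allows zero disruptions blocks the eviction and leaves the drain (or a rolling node upgrade) stalled midway.

If something goes wrong, troubleshooting starts from state and events: kubectl get, describe, logs, logs --previous, and events, followed by node, container runtime, and network checks. Classify the failure as scheduling, image pull, runtime, application, or traffic routing before changing anything, so you do not fix the wrong layer. Operational commands such as drain and rollback must respect PDBs, probes, and each workload's tolerance for disruption.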
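The maintenance sequence and the triage order described above can be sketched as a few shell functions. This is a minimal sketch rather than a definitive runbook: the function names, the KUBECTL override, the 300s timeout, and the node name worker-1 in the usage note are assumptions made for illustration. The kubectl subcommands and flags are standard ones, but check them against your kubectl version.

```shell
#!/bin/sh
# Hypothetical helpers for node maintenance; names and defaults are illustrative.
set -eu

# Assumption of this sketch: KUBECTL can be overridden (e.g. KUBECTL=echo)
# to print the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"
TIMEOUT="300s"   # assumed value; tune to your workloads' shutdown time

pre_checks() {
  # $1 = node name
  # An ALLOWED DISRUPTIONS value of 0 means evictions for that workload are refused.
  $KUBECTL get pdb --all-namespaces
  # List the Pods that the drain will have to move.
  $KUBECTL get pods --all-namespaces --field-selector "spec.nodeName=$1" -o wide
}

start_maintenance() {
  # $1 = node name
  # Stop new Pods from being scheduled onto the node.
  $KUBECTL cordon "$1"
  # Evict Pods through the Eviction API; DaemonSet Pods are skipped.
  # Pods using emptyDir volumes make drain refuse unless
  # --delete-emptydir-data is added, which discards that data.
  $KUBECTL drain "$1" --ignore-daemonsets --timeout="$TIMEOUT"
}

finish_maintenance() {
  # $1 = node name
  # Only re-enable scheduling once the node reports Ready again.
  $KUBECTL wait --for=condition=Ready "node/$1" --timeout="$TIMEOUT"
  $KUBECTL uncordon "$1"
}

triage_pod() {
  # $1 = namespace, $2 = pod name
  # State first, then events, then logs, in the order given above.
  $KUBECTL get pod "$2" -n "$1" -o wide
  $KUBECTL describe pod "$2" -n "$1"
  $KUBECTL logs "$2" -n "$1"
  # --previous fails when the container has never restarted; ignore that.
  $KUBECTL logs "$2" -n "$1" --previous || true
  $KUBECTL get events -n "$1" --sort-by=.lastTimestamp
}
```

A typical run would call pre_checks worker-1, then start_maintenance worker-1, do the maintenance work, and end with finish_maintenance worker-1. Setting KUBECTL=echo first turns the same calls into a dry run that only prints the commands. Cordoning before the drain is redundant in a strict sense, since kubectl drain marks the node unschedulable itself, but keeping it as a separate step matches the sequence in the answer and lets you pause between the two.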
