What is etcd, and why is it critical to back it up?
Kubernetes, Docker, Helm & Podman · Intermediate level
Answer
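etcd is the strongly consistent, distributed key-value store that holds Kubernetes cluster state: every API object is persisted there. Losing etcd without a backup, or restoring a stale or wrong snapshot, can mean losing cluster objects such as Deployments, Secrets, ConfigMaps, and RBAC rules, so reliable backups and tested restore procedures are critical.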
Technical explanation
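etcd performance directly affects API server responsiveness, because every write to cluster state must be persisted in etcd. Slow disks, quorum loss, or compaction and defragmentation problems can surface as cluster-wide instability.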
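Quorum loss is easier to reason about with the arithmetic written out. etcd uses Raft, so a cluster of n voting members needs a majority (floor(n/2) + 1) to commit writes. The sketch below is only illustrative; the function names are hypothetical and not part of any etcd tooling.

```python
def quorum(members: int) -> int:
    """Smallest majority of a Raft cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many members can fail while the cluster still has quorum."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} members: quorum={quorum(n)}, "
              f"tolerates {failure_tolerance(n)} failure(s)")
```

A four-member cluster tolerates one failure, the same as a three-member cluster, which is why odd member counts are the usual recommendation.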
Managed Kubernetes hides etcd operations, but platform teams still need to understand backup guarantees and disaster recovery options.
Kubernetes internals follow a watch-and-reconcile model over API objects stored in etcd.
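The reconcile half of that model can be sketched as a toy, in-memory loop. This is not the Kubernetes client API: `Store` is a hypothetical stand-in for etcd that maps an object name to a desired replica count, and `reconcile` plays the role of a controller.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy stand-in for etcd: object name -> desired replica count."""
    desired: dict = field(default_factory=dict)


def reconcile(store: Store, actual: dict) -> list:
    """Drive `actual` toward the stored desired state.

    Returns the actions taken. Idempotent: once converged, a second
    call returns an empty list.
    """
    actions = []
    for name, want in store.desired.items():
        have = actual.get(name, 0)
        if have != want:
            actions.append(f"scale {name}: {have} -> {want}")
            actual[name] = want
    # Anything still running that is no longer in the store is removed.
    for name in sorted(set(actual) - set(store.desired)):
        actions.append(f"delete {name}")
        del actual[name]
    return actions


if __name__ == "__main__":
    store = Store({"web": 3})
    running = {"web": 1, "old-job": 2}
    print(reconcile(store, running))  # first pass makes changes
    print(reconcile(store, running))  # second pass is a no-op
```

In this toy model the store is the only record of desired state, so an empty or stale store leaves the loop with nothing correct to converge toward. That is the intuition behind treating etcd backups as critical.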
Extending Kubernetes safely requires schema validation, idempotent controllers, finalizers, ownership, and observable status conditions.
Backup and restore procedures are part of the control-plane design, not an afterthought.
Hands-on example
1. Use a disposable kubeadm or kind-based lab to inspect etcd health and backup requirements. Do not practice destructive control-plane work on production.
2. Inspect API objects and controller behavior with kubectl get -w, events, status fields, and logs from the relevant controller.
3. For backup/restore topics, create a snapshot, restore into a separate environment, and verify objects and workloads after recovery.
4. Document the failure scenario, recovery steps, and validation commands.
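The health check, snapshot, and restore steps above could be scripted roughly as follows. This is a sketch under assumptions: the endpoint and certificate paths are kubeadm's usual defaults, the snapshot and restore paths are placeholders, and `etcdutl` is assumed to be available (it ships with etcd 3.5 and later). The script only prints the commands so they can be reviewed before anything is run in the lab.

```python
import shlex

# Assumptions: kubeadm's default etcd endpoint and certificate paths.
ENDPOINT = "https://127.0.0.1:2379"
PKI_DIR = "/etc/kubernetes/pki/etcd"


def etcdctl(*args, endpoint=ENDPOINT, pki_dir=PKI_DIR):
    """Build an etcdctl command line with TLS client flags."""
    return [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={pki_dir}/ca.crt",
        f"--cert={pki_dir}/server.crt",
        f"--key={pki_dir}/server.key",
        *args,
    ]


def backup_restore_plan(snapshot, restore_dir):
    """Return the lab commands in the order they would be run."""
    return [
        # Confirm the member is healthy before taking a snapshot.
        etcdctl("endpoint", "health"),
        # Take the snapshot from the live member.
        etcdctl("snapshot", "save", snapshot),
        # Inspect the snapshot file (hash, revision, key count, size).
        ["etcdutl", "snapshot", "status", snapshot, "--write-out=table"],
        # Restore into a fresh data directory, never over the live one.
        ["etcdutl", "snapshot", "restore", snapshot,
         f"--data-dir={restore_dir}"],
        # Once etcd in the separate environment uses the restored
        # directory, verify that the expected objects are back.
        ["kubectl", "get", "pods", "--all-namespaces"],
    ]


if __name__ == "__main__":
    plan = backup_restore_plan("/var/backups/etcd-snapshot.db",
                               "/var/lib/etcd-restore")
    for cmd in plan:
        print(shlex.join(cmd))
```

Keeping the plan as data makes it easy to paste the exact commands into the write-up that step 4 asks for.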