How would you back up and restore an etcd cluster?
Kubernetes, Docker, Helm & Podman · Intermediate level
Answer
For etcd backup, I take a snapshot with etcdctl snapshot save (or the managed provider's backup mechanism), store it securely outside the cluster, and regularly test the restore in a non-production cluster. I do not trust a backup until a restore from it has been rehearsed.
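As a rough illustration of the snapshot step, the sketch below builds and runs an etcdctl snapshot save command from Python. The certificate paths follow the usual kubeadm layout, and the backup directory, the function names, and the endpoint are assumptions made for this example; adjust them to your environment.

```python
import os
import subprocess
from datetime import datetime, timezone


def build_snapshot_save_cmd(snapshot_path,
                            endpoint="https://127.0.0.1:2379",
                            pki_dir="/etc/kubernetes/pki/etcd"):
    """Return the etcdctl argv that saves a snapshot from one etcd member.

    The default endpoint and PKI directory assume a kubeadm-style control plane.
    """
    return [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={pki_dir}/ca.crt",
        f"--cert={pki_dir}/server.crt",
        f"--key={pki_dir}/server.key",
        "snapshot", "save", snapshot_path,
    ]


def snapshot_filename(now=None, backup_dir="/var/backups/etcd"):
    """Build a UTC-timestamped snapshot path (backup_dir is a hypothetical choice)."""
    now = now or datetime.now(timezone.utc)
    return f"{backup_dir}/etcd-{now:%Y%m%dT%H%M%SZ}.db"


def take_snapshot(snapshot_path):
    """Run etcdctl; raises CalledProcessError if the snapshot command fails."""
    env = dict(os.environ, ETCDCTL_API="3")
    subprocess.run(build_snapshot_save_cmd(snapshot_path), check=True, env=env)
    return snapshot_path
```

Calling take_snapshot(snapshot_filename()) on a control-plane node would write one timestamped file per run. Timestamped names sort chronologically, which keeps later retention logic simple.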
Technical explanation
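A consistent etcd restore recreates the cluster from the snapshot: the restore writes a fresh data directory, and every member is restored from the same snapshot file so that all members start from identical state. It does not merge old state into a live cluster.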
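A minimal sketch of the restore side, assuming etcd 3.5 or later, where snapshot restore is provided by etcdutl (older releases exposed it as etcdctl snapshot restore). The member names, peer URLs, data directory, and cluster token are hypothetical values; the helper names are introduced only for this example.

```python
def initial_cluster_string(members):
    """Format {name: peer_url} as the value expected by --initial-cluster."""
    return ",".join(f"{name}={url}" for name, url in sorted(members.items()))


def build_restore_cmd(snapshot_path, member_name, members, data_dir, cluster_token):
    """Return the etcdutl argv that restores one member from a snapshot.

    Run the result once per member, always with the same snapshot file and the
    same cluster token, so the members form one new logical cluster.
    """
    return [
        "etcdutl", "snapshot", "restore", snapshot_path,
        f"--name={member_name}",
        f"--data-dir={data_dir}",
        f"--initial-cluster={initial_cluster_string(members)}",
        f"--initial-cluster-token={cluster_token}",
        f"--initial-advertise-peer-urls={members[member_name]}",
    ]


# Hypothetical three-member cluster used only to show the call shape.
members = {
    "etcd-1": "https://10.0.0.1:2380",
    "etcd-2": "https://10.0.0.2:2380",
    "etcd-3": "https://10.0.0.3:2380",
}
for name in members:
    argv = build_restore_cmd("/var/backups/etcd/snapshot.db", name, members,
                             "/var/lib/etcd-restored", "restore-rehearsal")
    print(" ".join(argv))
```

The sketch only prints the commands. In a rehearsal you would run each one on the matching node, then point etcd at the new data directory before starting it.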
A snapshot contains every API object in the cluster, including Secrets, so snapshot encryption, access control, retention, and restore runbooks are as important as the snapshot command itself.
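Retention and integrity checks can be scripted alongside the snapshot job. The sketch below is one possible approach, not a prescribed one: it records a SHA-256 checksum for each snapshot file and selects old snapshots for deletion. It assumes snapshot file names embed a sortable timestamp, and the keep count of 7 is an arbitrary example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshots_to_prune(names, keep=7):
    """Return the snapshot names to delete, keeping the `keep` newest.

    Assumes names sort chronologically (for example etcd-20240102T030405Z.db).
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    newest_first = sorted(names, reverse=True)
    return newest_first[keep:]


def prune(backup_dir, keep=7):
    """Delete all but the newest `keep` .db files in backup_dir; return deleted names."""
    directory = Path(backup_dir)
    doomed = snapshots_to_prune([p.name for p in directory.glob("*.db")], keep)
    for name in doomed:
        (directory / name).unlink()
    return doomed
```

Storing the checksum next to the snapshot lets a restore runbook verify the file before using it. Encryption and access control are left to the storage layer in this sketch.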
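Kubernetes follows a watch-and-reconcile model over API objects stored in etcd, so after a restore the controllers reconcile the cluster toward the state recorded in the snapshot, and objects created after the snapshot was taken are lost.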
Extending Kubernetes safely requires schema validation, idempotent controllers, finalizers, ownership, and observable status conditions.
Backup and restore procedures are part of the control-plane design, not an afterthought.
Hands-on example
1. Use a disposable kubeadm or kind-based lab for this exercise: take an etcd snapshot and restore it in a throwaway cluster. Do not practice destructive control-plane work on production.
2. Inspect API objects and controller behavior with kubectl get -w, events, status fields, and logs from the relevant controller.
3. Create a snapshot, restore it into a separate environment, and verify that the objects and workloads present before the snapshot exist and are healthy after recovery.
4. Document the failure scenario, recovery steps, and validation commands.
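The verification in the steps above can be made repeatable with a small script. This is a sketch under stated assumptions: it captures an inventory of objects with kubectl get ... -o json before the snapshot, captures it again on the restored cluster, and reports differences. The function names and the choice of resource kinds are hypothetical.

```python
import json
import subprocess


def inventory_keys(kubectl_list):
    """Turn a parsed `kubectl get -o json` list into sorted 'namespace/name' keys."""
    items = kubectl_list.get("items", [])
    return sorted(
        f"{item['metadata'].get('namespace', '')}/{item['metadata']['name']}"
        for item in items
    )


def list_objects(kind, kubeconfig):
    """Query one resource kind across all namespaces; needs kubectl on PATH."""
    result = subprocess.run(
        ["kubectl", "--kubeconfig", kubeconfig, "get", kind,
         "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return inventory_keys(json.loads(result.stdout))


def diff_inventory(before, after):
    """Report objects lost by the restore and objects that appeared unexpectedly."""
    before_set, after_set = set(before), set(after)
    return {
        "missing": sorted(before_set - after_set),
        "unexpected": sorted(after_set - before_set),
    }
```

In a lab run you might call list_objects("deployments", kubeconfig) against the source cluster just before the snapshot and against the restored cluster afterwards, then record the output of diff_inventory in the runbook as the validation result.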