How do you design for disaster recovery? Explain RTO and RPO and the DR strategies.
AWS · Advanced level
Answer
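DR design starts with two business-defined targets. The recovery time objective (RTO) is the maximum acceptable delay between a disruption and the restoration of service. The recovery point objective (RPO) is the maximum acceptable data loss, measured as the time between the last recoverable data point and the disruption. Backup and restore, pilot light, warm standby, and multi-site active/active offer increasing readiness, cost, and complexity in exchange for lower RTO and RPO.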
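A minimal sketch of how the two targets narrow the choice, assuming the strategies are tried from cheapest to most expensive. The achievable RTO/RPO values in `STRATEGIES` and the function `choose_strategy` are hypothetical illustrations, not AWS limits; real values come from testing your own workload.

```python
from datetime import timedelta

# (name, assumed achievable RTO, assumed achievable RPO), ordered cheapest first.
# These values are illustrative assumptions, not AWS-published limits.
STRATEGIES = [
    ("backup and restore", timedelta(hours=4), timedelta(hours=1)),
    ("pilot light", timedelta(minutes=30), timedelta(minutes=10)),
    ("warm standby", timedelta(minutes=5), timedelta(minutes=1)),
    ("multi-site active/active", timedelta(seconds=30), timedelta(seconds=1)),
]


def choose_strategy(rto_target: timedelta, rpo_target: timedelta) -> str:
    """Return the cheapest strategy whose assumed RTO and RPO both meet the targets."""
    for name, achievable_rto, achievable_rpo in STRATEGIES:
        if achievable_rto <= rto_target and achievable_rpo <= rpo_target:
            return name
    raise ValueError("No strategy meets these targets; revisit them with the business owner.")


# Example: an 8-hour RTO and a 2-hour RPO can be met by the cheapest option.
print(choose_strategy(timedelta(hours=8), timedelta(hours=2)))  # backup and restore
```

Picking the cheapest strategy that meets both targets mirrors the cost trade-off: under these assumed values, a 15-minute RTO with a 5-minute RPO lands on warm standby.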
Technical explanation
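RTO is driven by how much of the recovery environment already exists and how automated failover is. RPO is driven by how often data is backed up or how far replication lags. Lowering either target therefore costs more: pre-provisioned capacity, continuous data replication, automated provisioning and failover, and runbooks tested in regular DR exercises.
- Backup and restore: data and configuration are backed up (ideally to another Region); after a disaster, infrastructure is redeployed from code and the data is restored. Lowest cost; RTO and RPO are typically measured in hours.
- Pilot light: data is replicated continuously and core infrastructure such as databases runs in the recovery Region, while application servers stay switched off or undeployed until failover. RTO and RPO are typically in the tens of minutes.
- Warm standby: a scaled-down but fully functional copy of production runs in the recovery Region and is scaled up on failover. RTO and RPO are typically minutes.
- Multi-site active/active: the workload serves traffic from multiple Regions at once, so recovery is mostly a matter of rerouting traffic. RTO and RPO approach zero, at the highest cost and complexity.
Availability design should start from business impact, RTO/RPO targets, dependency mapping, and failure-mode testing, not just from deploying resources across multiple AZs.
Stateless compute, resilient data stores, health checks, rollback, and backups provide resilience; game days prove that it actually works.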
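Dependency mapping also bounds RTO: a component cannot be restored before the components it depends on, so the workload's recovery time is at least the longest chain of restore times through the map. The sketch below uses a hypothetical dependency map with made-up restore durations in minutes, and assumes independent components are restored in parallel; `DEPENDENCIES` and `estimated_rto` are illustrative names.

```python
from functools import lru_cache

# Hypothetical dependency map: component -> (restore time in minutes, dependencies).
# The durations are placeholders; replace them with values measured in DR tests.
DEPENDENCIES = {
    "network": (10, []),
    "database": (45, ["network"]),
    "cache": (5, ["network"]),
    "app": (15, ["database", "cache"]),
    "dns_cutover": (5, ["app"]),
}


def estimated_rto(components: dict) -> int:
    """Length in minutes of the longest restore chain (the critical path).

    Assumes components with no dependency between them are restored in
    parallel and that the map contains no cycles.
    """

    @lru_cache(maxsize=None)
    def ready_at(name: str) -> int:
        minutes, deps = components[name]
        # A component can start restoring only once all its dependencies are up.
        return minutes + max((ready_at(dep) for dep in deps), default=0)

    return max(ready_at(name) for name in components)


print(estimated_rto(DEPENDENCIES))  # 75: network -> database -> app -> dns_cutover
```

If the business RTO were 60 minutes, the database restore is the step to shorten, for example by keeping a running replica as in pilot light; with a 5-minute database step, the estimate drops to 35 minutes.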
Hands-on example
1. Draw the workload dependency map, then define target RTO/RPO with the business owner.
2. Implement multi-AZ or multi-Region components required by those targets, including data replication and automated provisioning.
3. Run a game day that simulates instance failure, AZ impairment, database failover, a backup restore, or regional failover, depending on scope.
4. Measure the actual recovery time and data loss, and update the architecture or runbook if either target is missed.
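Step 4 can be as simple as recording three timestamps during the game day and comparing the derived values with the targets. In this sketch, `GameDayResult`, `missed_targets`, and the sample timestamps are hypothetical; in practice the timestamps would come from your monitoring, replication metrics, and runbook log.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class GameDayResult:
    disruption_start: datetime      # when the injected failure began
    last_recovered_write: datetime  # newest write present in the recovered data
    service_restored: datetime      # when health checks passed after failover

    @property
    def measured_rto(self) -> timedelta:
        return self.service_restored - self.disruption_start

    @property
    def measured_rpo(self) -> timedelta:
        # Writes made after the last recovered write and before the disruption are lost.
        return max(self.disruption_start - self.last_recovered_write, timedelta(0))


def missed_targets(result: GameDayResult, rto_target: timedelta, rpo_target: timedelta) -> list:
    """Return the targets the exercise missed; an empty list means both were met."""
    missed = []
    if result.measured_rto > rto_target:
        missed.append("RTO")
    if result.measured_rpo > rpo_target:
        missed.append("RPO")
    return missed


# Invented sample timestamps from a hypothetical database-failover game day.
run = GameDayResult(
    disruption_start=datetime(2024, 5, 1, 10, 0),
    last_recovered_write=datetime(2024, 5, 1, 9, 57),
    service_restored=datetime(2024, 5, 1, 10, 22),
)
print(run.measured_rto, run.measured_rpo)  # 0:22:00 0:03:00
print(missed_targets(run, timedelta(minutes=15), timedelta(minutes=5)))  # ['RTO']
```

A non-empty result is the trigger for the last part of step 4: change the architecture (for example, more pre-provisioned capacity) or fix the runbook, then repeat the exercise.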