
How do you design for disaster recovery? Explain RTO, RPO, and the main DR strategies.

AWS · Advanced level

Answer

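DR design starts with two targets agreed with the business. RTO (Recovery Time Objective) is the maximum acceptable time from a disruption until the service is usable again. RPO (Recovery Point Objective) is the maximum acceptable data loss, measured as the time between the last recoverable data point and the disruption.

The four common strategies offer increasing readiness, cost, and complexity in exchange for lower RTO and RPO:

Backup and restore: data is backed up to another location and infrastructure is rebuilt only after a disaster. RTO and RPO are typically measured in hours.

Pilot light: data is replicated continuously and core resources such as databases are always on, while application servers are switched off or not yet running until failover. Typically tens of minutes.

Warm standby: a scaled-down but fully functional copy of production runs in the recovery site and is scaled up on failover. Typically minutes.

Multi-site active/active: the full workload runs in multiple sites or Regions and serves traffic simultaneously. Near-zero RTO and RPO.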
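Choosing a strategy amounts to picking the cheapest tier that meets both targets. The sketch below shows that selection logic; the per-strategy RTO/RPO numbers are illustrative assumptions, not AWS-published limits, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    rto_min: float  # assumed achievable recovery time, in minutes
    rpo_min: float  # assumed worst-case data loss window, in minutes


# Ordered from cheapest / least ready to most expensive / most ready.
# The numbers are illustrative assumptions, not AWS-published limits.
STRATEGIES = [
    Strategy("backup-restore", rto_min=8 * 60, rpo_min=24 * 60),
    Strategy("pilot light", rto_min=60, rpo_min=30),
    Strategy("warm standby", rto_min=15, rpo_min=5),
    Strategy("multi-site active/active", rto_min=1, rpo_min=0.1),
]


def cheapest_strategy(target_rto_min: float, target_rpo_min: float) -> str:
    """Return the cheapest strategy whose assumed RTO and RPO both meet the targets."""
    for strategy in STRATEGIES:
        if strategy.rto_min <= target_rto_min and strategy.rpo_min <= target_rpo_min:
            return strategy.name
    raise ValueError("No strategy meets these targets; revisit them with the business owner.")


print(cheapest_strategy(target_rto_min=8 * 60, target_rpo_min=60))  # pilot light
print(cheapest_strategy(target_rto_min=5, target_rpo_min=1))        # multi-site active/active
```

Note that RPO alone can push the choice up a tier: in the first call an 8-hour RTO would allow backup and restore, but the 1-hour RPO rules out daily backups.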

Technical explanation

Lower RTO/RPO targets require more pre-provisioned capacity, continuous data replication, automation, and regularly tested runbooks and DR exercises, all of which raise cost.
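A rough calculation shows why replication and pre-provisioning drive the numbers down. This is a simplified sketch with hypothetical inputs: worst-case RPO for periodic copies is one full interval plus the time the copy takes to become restorable, and RTO is modeled as provisioning plus data restore plus traffic cutover. Treating asynchronous replication as an "interval" equal to its lag is an approximation.

```python
def worst_case_rpo_min(interval_min: float, copy_delay_min: float = 0.0) -> float:
    """Worst-case data loss for copies taken every interval_min that need copy_delay_min
    to become restorable at the DR site: a failure just before the next copy lands
    loses one full interval plus the copy delay."""
    return interval_min + copy_delay_min


def estimated_rto_min(provision_min: float, data_gb: float,
                      restore_gb_per_min: float, cutover_min: float) -> float:
    """Rough recovery time: build infrastructure, restore data, then shift traffic.
    Pre-provisioning shrinks provision_min; continuous replication shrinks data_gb."""
    return provision_min + data_gb / restore_gb_per_min + cutover_min


# Hypothetical backup and restore: daily snapshots, 30-minute cross-Region copy,
# 500 GB restored at 5 GB/min, 60 minutes to provision, 10 minutes to cut over.
print(worst_case_rpo_min(24 * 60, 30))    # 1470
print(estimated_rto_min(60, 500, 5, 10))  # 170.0

# Hypothetical warm standby: replication lag of about 1 minute, capacity already
# running, so only a short scale-up and the cutover remain.
print(worst_case_rpo_min(1))              # 1.0
print(estimated_rto_min(5, 0, 5, 10))     # 15.0
```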

Availability design should start from business impact, RTO/RPO, dependency mapping, and failure-mode testing, not only from deploying resources in multiple AZs.
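Dependency mapping matters because end-to-end recovery time is set by the slowest chain of dependencies, not by any single tier. The sketch below assumes a hypothetical workload in which a component can start recovering only after all its dependencies are healthy, and independent components recover in parallel; the result is the longest path through the dependency graph.

```python
from graphlib import TopologicalSorter

# Hypothetical workload: component -> (own recovery minutes, dependencies)
COMPONENTS = {
    "dns": (5, []),
    "database": (30, []),
    "cache": (10, []),
    "api": (10, ["database", "cache"]),
    "frontend": (5, ["api", "dns"]),
}


def recovery_finish_times(components):
    """Earliest minute each component is healthy again, processing dependencies first."""
    graph = {name: deps for name, (_, deps) in components.items()}
    finish = {}
    for name in TopologicalSorter(graph).static_order():
        own_minutes, deps = components[name]
        start = max((finish[d] for d in deps), default=0)  # wait for slowest dependency
        finish[name] = start + own_minutes
    return finish


finish = recovery_finish_times(COMPONENTS)
print(finish["frontend"])  # 45: database (30) -> api (10) -> frontend (5)
```

Here the database sits on the critical path, so shortening its failover helps the workload RTO far more than speeding up the cache or DNS.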

Stateless compute, resilient data stores, health checks, rollback, and backups provide resilience; game days and restore tests are what prove it actually works.
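Health checks typically drive failover only after several consecutive failures, so a single transient error does not flip traffic. The class below is a hypothetical sketch of that thresholding idea, not the API of Route 53 or any other AWS service.

```python
class HealthTracker:
    """Hypothetical tracker: flip state only after `failure_threshold` consecutive
    probes disagree with the current state, to avoid flapping on a single blip."""

    def __init__(self, failure_threshold: int = 3):
        self.threshold = failure_threshold
        self.healthy = True
        self.streak = 0  # consecutive probes that disagree with the current state

    def record(self, probe_ok: bool) -> bool:
        if probe_ok == self.healthy:
            self.streak = 0
        else:
            self.streak += 1
            if self.streak >= self.threshold:
                self.healthy = probe_ok
                self.streak = 0
        return self.healthy


def route(primary: HealthTracker) -> str:
    """Send traffic to the secondary site only while the primary is marked unhealthy."""
    return "primary" if primary.healthy else "secondary"


tracker = HealthTracker(failure_threshold=3)
for ok in [True, False, False, True, False, False, False]:
    tracker.record(ok)
print(route(tracker))  # secondary
```

The isolated pair of failures in the middle is absorbed; only the final run of three consecutive failures triggers failover.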


Hands-on example

1. Draw the workload dependency map, then define target RTO/RPO with the business owner.

2. Implement multi-AZ or multi-Region components required by those targets, including data replication and automated provisioning.

3. Run a game day: instance failure, AZ impairment, database failover, restore test, or regional failover depending on scope.

4. Measure actual recovery time/data loss and update the architecture or runbook if targets are missed.
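Step 4 can be made concrete by recording timestamps during the game day and comparing them with the targets. This is a minimal sketch with hypothetical names and times; the RPO calculation assumes writes were arriving continuously up to the moment of failure, so it approximates rather than measures lost data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class GameDayResult:
    failure_injected_at: datetime
    service_restored_at: datetime
    last_recovered_write_at: datetime  # newest committed write still present after recovery

    @property
    def actual_rto(self) -> timedelta:
        return self.service_restored_at - self.failure_injected_at

    @property
    def actual_rpo(self) -> timedelta:
        # Approximation: assumes writes were arriving continuously up to the failure.
        return self.failure_injected_at - self.last_recovered_write_at


def missed_targets(result: GameDayResult, target_rto: timedelta, target_rpo: timedelta) -> list[str]:
    """List the targets this exercise missed; an empty list means both were met."""
    findings = []
    if result.actual_rto > target_rto:
        findings.append(f"RTO missed: {result.actual_rto} > {target_rto}")
    if result.actual_rpo > target_rpo:
        findings.append(f"RPO missed: {result.actual_rpo} > {target_rpo}")
    return findings


# Hypothetical database-failover game day.
result = GameDayResult(
    failure_injected_at=datetime(2024, 5, 14, 10, 0),
    service_restored_at=datetime(2024, 5, 14, 10, 42),
    last_recovered_write_at=datetime(2024, 5, 14, 9, 58),
)
print(missed_targets(result, target_rto=timedelta(minutes=30), target_rpo=timedelta(minutes=5)))
# ['RTO missed: 0:42:00 > 0:30:00']
```

A non-empty list is the trigger for the last part of step 4: change the architecture or the runbook, then rerun the exercise.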
