What is your approach to capacity planning?
Resume & Behavioral · Intermediate level
Answer
Capacity planning starts with demand and service promises, not just instance size. I review traffic trends, peak events, growth forecasts, SLOs, dependency limits, and failure scenarios such as losing an AZ or a downstream service becoming slow. Then I model headroom across compute, memory, network, queues, caches, database connections, storage, and third-party limits. I validate the model with load tests and production telemetry, and I set alerts that fire before saturation turns into customer impact.
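To make "model headroom across dimensions" concrete, here is a minimal sketch that compares observed peak usage against a limit for several resources and reports the tightest one. The resource names and numbers are hypothetical, chosen only for illustration; real values would come from your own telemetry.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    peak_usage: float  # observed peak, in the resource's own unit
    limit: float       # hard or practical limit, in the same unit

    @property
    def headroom(self) -> float:
        """Fraction of the limit still unused at peak."""
        return 1.0 - self.peak_usage / self.limit


def binding_constraint(resources):
    """Return the resource with the least headroom at peak."""
    return min(resources, key=lambda r: r.headroom)


# Hypothetical peak measurements and limits (assumptions, not real data).
resources = [
    Resource("cpu_cores", 48, 96),
    Resource("memory_gib", 210, 384),
    Resource("db_connections", 430, 500),
    Resource("queue_depth", 2000, 10000),
]

tightest = binding_constraint(resources)
print(f"{tightest.name}: {tightest.headroom:.0%} headroom")
```

With these illustrative numbers, CPU still has half its capacity free while database connections are close to their limit, which is the kind of constraint a CPU-only view would miss.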
Technical explanation
Capacity is multi-dimensional; CPU alone is not enough.
Plan for peak, growth, and degraded-mode scenarios, not just average traffic.
Capacity decisions should preserve latency, error rate, and availability SLOs.
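One way to reason about degraded-mode sizing is to ask how many instances keep peak traffic under a target utilization even after a zone is lost. The sketch below assumes traffic spreads evenly across surviving zones and that each instance handles a known request rate; the function name, parameters, and figures are illustrative assumptions.

```python
import math


def instances_needed(peak_rps, rps_per_instance, target_utilization,
                     zones, zones_lost=1):
    """Total instances, spread evenly across zones, so that the surviving
    zones can serve peak_rps at or below target_utilization."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    if not 0 <= zones_lost < zones:
        raise ValueError("zones_lost must be >= 0 and less than zones")

    # Instances required to carry the peak at the target utilization.
    serving = math.ceil(peak_rps / (rps_per_instance * target_utilization))
    # Those instances must all fit in the zones that remain.
    per_zone = math.ceil(serving / (zones - zones_lost))
    return per_zone * zones


# Hypothetical service: 12,000 rps peak, 400 rps per instance, 50% target.
normal = instances_needed(12000, 400, 0.5, zones=3, zones_lost=0)
degraded = instances_needed(12000, 400, 0.5, zones=3, zones_lost=1)
print(normal, degraded)
```

In this example, tolerating the loss of one of three zones raises the fleet from 60 to 90 instances, which shows why degraded-mode planning costs more than sizing for the healthy peak alone.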
Hands-on example
1. Collect 30-90 days of request rate, latency, CPU, memory, DB connections, queue depth, cache hit rate, and error rate.
2. Forecast expected growth and known events, then add risk-based headroom.
3. Run load tests to find saturation points and autoscaling lag.
4. Create alerts for capacity thresholds, rapid growth, queue backlog, database connection exhaustion, and autoscaling failure.
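The forecasting and alerting steps above can be sketched as a simple "days until threshold" estimate. This assumes steady compound daily growth, which real traffic rarely follows exactly, and the 80% alert fraction and sample figures are hypothetical.

```python
import math


def days_until_threshold(current, limit, daily_growth_rate,
                         alert_fraction=0.8):
    """Days until usage reaches alert_fraction * limit under compound growth.

    Returns 0 if usage is already at or past the threshold, and None if
    usage is not growing.
    """
    threshold = limit * alert_fraction
    if current >= threshold:
        return 0
    if daily_growth_rate <= 0:
        return None
    # Solve current * (1 + g) ** days >= threshold for days.
    days = math.log(threshold / current) / math.log1p(daily_growth_rate)
    return math.ceil(days)


# Hypothetical: 400 of 1,000 DB connections in use, growing 1% per day.
print(days_until_threshold(400, 1000, 0.01))
```

An estimate like this helps decide whether a capacity change can wait for the next planning cycle or needs to be scheduled now; it should be rechecked against load-test results rather than trusted on its own.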
More Resume & Behavioral interview questions
- Your title is Senior DevOps / SRE Lead - how do you personally define the difference between DevOps and SRE?
- Tell me about a typical day in your current role at Intuit.
- What does the 99.99% availability SLA you operate translate to in allowed downtime per month, and how do you track it?
- Tell me about the most business-critical incident you have owned end to end.
- Walk me through the Redis-to-Valkey migration: why migrate, what was your plan, and what could have gone wrong?
- How did you design and validate the rollback strategy for the RDS PostgreSQL and MySQL upgrades?
- What does 'minimal downtime' mean precisely for your data-store upgrades - did you achieve zero downtime, and how?
- Describe the Istio service-mesh enablement you led: what problem did it solve and how did you roll it out safely?