At VGS you reduced AWS spend by 25% - what specifically did you change and how did you avoid hurting reliability?

Accepted Answer

I treat cost optimization as reliability-aware engineering, not blind cutting. I start by building visibility into spend by service, account, tag, environment, and usage pattern, then look for over-provisioned compute, idle resources, storage growth, data-transfer and NAT gateway costs, and commitment opportunities. The common levers are rightsizing, autoscaling, non-production schedules, storage lifecycle policies, data-transfer reduction, and Savings Plans or Reserved Instances for stable usage.

Every change has to preserve SLOs and headroom, so I validate it with utilization data, load testing, canaries, and post-change monitoring. Cost and reliability are evaluated together: a cheaper system that misses its SLOs is not a win. I measure before and after on spend, utilization, latency, saturation, error rate, and incident count, and I keep a rollback path ready for each change.
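The headroom check and the before/after comparison described above can be sketched in a few lines. This is an illustrative sketch only, not the tooling used at VGS: the names `Workload`, `can_rightsize`, and `should_rollback` are hypothetical, the 60% utilization ceiling and 10% regression tolerance are assumed values, and the projection assumes CPU demand stays constant and scales linearly with vCPU count, which real workloads may not do.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass(frozen=True)
class Workload:
    """Hypothetical record of one service's current size and observed CPU use."""
    name: str
    vcpus: int
    cpu_util_pct: list[float]  # utilization samples (0-100) at the current size


def p95(samples: list[float]) -> float:
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return quantiles(samples, n=20, method="inclusive")[-1]


def projected_util(workload: Workload, target_vcpus: int) -> float:
    # Assumption: the same CPU demand is spread over fewer (or more) vCPUs.
    return p95(workload.cpu_util_pct) * workload.vcpus / target_vcpus


def can_rightsize(workload: Workload, target_vcpus: int,
                  max_util_pct: float = 60.0) -> bool:
    # Only approve a resize if projected p95 utilization keeps the headroom.
    # The 60% ceiling is an assumed value; set it from the service's own SLOs.
    return projected_util(workload, target_vcpus) <= max_util_pct


def should_rollback(before: dict[str, float], after: dict[str, float],
                    tolerance: float = 0.10) -> bool:
    # Flag a rollback if any tracked metric (where lower is better, such as
    # p95 latency or error rate) regressed by more than the tolerance.
    return any(after[k] > before[k] * (1 + tolerance) for k in before)


# Example: an 8-vCPU service that sits at a steady 20% CPU.
api = Workload(name="api", vcpus=8, cpu_util_pct=[20.0] * 100)
halve_ok = can_rightsize(api, target_vcpus=4)      # projects to 40% p95
quarter_ok = can_rightsize(api, target_vcpus=2)    # projects to 80% p95

baseline = {"p95_latency_ms": 100.0, "error_rate": 0.01}
after_change = {"p95_latency_ms": 120.0, "error_rate": 0.01}
rollback = should_rollback(baseline, after_change)
```

In this sketch the resize to 4 vCPUs passes the headroom check while the resize to 2 vCPUs does not, and the 20% latency regression trips the rollback flag. In practice the projection would be confirmed with load tests and a canary before the change is rolled out widely.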
