What does the 99.99% availability SLA you operate translate to in allowed downtime per month, and how do you track it?

Question

Accepted Answer

A 99.99% availability target leaves a 0.01% error budget: the service can be unavailable for at most 0.01% of the measurement window. For a 30-day month, that is 43,200 minutes x 0.0001 = 4.32 minutes of allowed downtime; over a 365-day year it is about 52.6 minutes.

I track it through user-facing SLIs, not just host uptime: successful request rate, critical journey success, latency thresholds, and sometimes synthetic checks. Availability should be user-centric, because a pod can be running while a critical API or user journey is failing.

I also watch error-budget burn so we can react early instead of finding out at month end that the service missed its target. Burn-rate alerting is key for four-nines services because a short, severe incident can consume the monthly budget quickly: a full outage lasting just 4.32 minutes uses all of it.
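The downtime figures in the answer can be checked with a few lines of Python. This is a minimal sketch of the arithmetic only; the function name `allowed_downtime_minutes` is made up for illustration, and it assumes the whole window counts toward the target (no excluded maintenance periods).

```python
def allowed_downtime_minutes(slo: float, window_days: float) -> float:
    """Minutes of full unavailability the target permits over the window."""
    window_minutes = window_days * 24 * 60
    # The error budget is whatever fraction of the window the target does not cover.
    return window_minutes * (1.0 - slo)


for label, days in [("30-day month", 30), ("365-day year", 365)]:
    minutes = allowed_downtime_minutes(0.9999, days)
    print(f"{label}: {minutes:.2f} minutes")
# 30-day month: 4.32 minutes
# 365-day year: 52.56 minutes
```

The same function shows how steeply the budget shrinks with each added nine: passing `0.999` instead of `0.9999` gives ten times the allowance.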

Hands-on example
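One way to make error-budget burn concrete is to compute a request-based availability SLI over a lookback window and divide the observed error rate by the budgeted error rate. The sketch below is illustrative rather than a description of any particular monitoring stack: the `WindowCounts` shape, the sample counts, and the choice to treat a window with no traffic as fully available are all assumptions.

```python
from dataclasses import dataclass

SLO = 0.9999            # four-nines target from the answer
WINDOW_HOURS = 30 * 24  # 30-day measurement window


@dataclass
class WindowCounts:
    """Request counts over one lookback window (hypothetical shape)."""
    total: int
    failed: int


def availability_sli(counts: WindowCounts) -> float:
    """Fraction of requests that succeeded in the window."""
    if counts.total == 0:
        return 1.0  # assumption: no traffic is treated as no failures
    return (counts.total - counts.failed) / counts.total


def burn_rate(counts: WindowCounts, slo: float = SLO) -> float:
    """Observed error rate divided by the error rate the target allows.

    1.0 means the budget is being spent exactly on schedule.
    """
    error_rate = 1.0 - availability_sli(counts)
    return error_rate / (1.0 - slo)


def hours_to_exhaustion(rate: float, window_hours: float = WINDOW_HOURS) -> float:
    """Hours until a full window's budget is gone if this rate is sustained."""
    return float("inf") if rate <= 0 else window_hours / rate


# Made-up counts for the last hour: 0.15% of requests failed.
last_hour = WindowCounts(total=1_000_000, failed=1_500)
rate = burn_rate(last_hour)
print(f"SLI: {availability_sli(last_hour):.5f}")
print(f"burn rate: {rate:.1f}x")
print(f"budget exhausted in: {hours_to_exhaustion(rate):.0f} hours")
```

With these sample counts the burn rate comes out at about 15x, which would use up a 30-day budget in roughly 48 hours if sustained. The threshold at which a burn rate should page someone is a policy decision the answer does not specify, so none is hard-coded here.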