Interview Observability

What is an error budget, and how do you use it to balance reliability and velocity? [Basic]

Answer

An error budget is the allowed amount of unreliability under an SLO. It balances reliability and delivery speed: when the budget is healthy, teams can ship normally; when it is being burned too fast, reliability work takes priority.

Technical explanation

For a 99.9 percent availability SLO, the error budget is 0.1 percent of eligible requests or time in the SLO window.
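The arithmetic can be sketched in a few lines. This is a minimal illustration, not a standard API: the function names, the 28-day window, and the figure of 10 million requests are all assumptions chosen for the example.

```python
# Minimal sketch: turning an SLO target into an error budget.
# Function names and sample inputs are hypothetical, chosen for illustration.

def error_budget_fraction(slo_target: float) -> float:
    """Allowed fraction of failures, e.g. 0.999 -> roughly 0.001."""
    return 1.0 - slo_target


def budget_minutes(slo_target: float, window_days: int) -> float:
    """Time-based budget: allowed minutes of unavailability in the window."""
    window_minutes = window_days * 24 * 60
    return error_budget_fraction(slo_target) * window_minutes


def budget_requests(slo_target: float, eligible_requests: int) -> float:
    """Request-based budget: allowed failed requests in the window."""
    return error_budget_fraction(slo_target) * eligible_requests


if __name__ == "__main__":
    # Assumed inputs: 99.9 percent SLO, 28-day window, 10 million requests.
    print(round(budget_minutes(0.999, 28), 2))
    print(round(budget_requests(0.999, 10_000_000)))
```

Under these assumptions, a 99.9 percent SLO over 28 days allows about 40.32 minutes of downtime, or about 10,000 failed requests out of 10 million.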

Error budgets turn reliability from an opinion into an engineering control loop.

They help avoid both extremes: reckless feature velocity and over-investment in unnecessary reliability.

Hands-on example

Create a 28-day SLO for checkout. A burn rate of 1x means the budget would be used up exactly at the end of the window, so while the burn rate stays below 1x, continue normal releases. If a deployment consumes 30 percent of the 28-day budget in one hour, freeze risky releases, roll back, and require a post-incident reliability fix before releases resume.
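The release policy above can be sketched as a small decision function. This is only an illustration under stated assumptions: burn rate is taken as the fraction of budget consumed divided by the fraction of the window elapsed, the freeze threshold is derived from the "30 percent in one hour" case, and the intermediate "investigate" outcome is a hypothetical addition for burn rates the example does not cover.

```python
# Minimal sketch of a burn-rate-based release policy.
# All names and the "investigate" band are hypothetical, not a standard API.

WINDOW_HOURS = 28 * 24  # 28-day SLO window = 672 hours


def burn_rate(budget_fraction_consumed: float,
              hours_elapsed: float,
              window_hours: float = WINDOW_HOURS) -> float:
    """Budget consumed relative to the pace that would exhaust it
    exactly at the end of the window (1.0 means exactly on pace)."""
    return budget_fraction_consumed / (hours_elapsed / window_hours)


# Assumed freeze threshold: 30 percent of the budget burned in one hour.
FREEZE_BURN_RATE = burn_rate(0.30, 1.0)


def release_decision(rate: float) -> str:
    """Map a burn rate to a release action."""
    if rate < 1.0:
        return "continue normal releases"
    if rate >= FREEZE_BURN_RATE:
        return "freeze risky releases and roll back"
    # Not specified in the example; assumed middle ground.
    return "investigate"


if __name__ == "__main__":
    rate = burn_rate(0.30, 1.0)
    print(round(rate, 1), release_decision(rate))
```

Consuming 30 percent of a 28-day budget in one hour works out to a burn rate of roughly 201.6x, far above the 1x pace, which is why the example treats it as a freeze-and-roll-back event rather than routine noise.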
