What is an error budget, and how do you use it to balance reliability and velocity? [Basic]
Answer
An error budget is the allowed amount of unreliability under an SLO. It balances reliability and delivery speed: when the budget is healthy, teams can ship normally; when it is being burned too fast, reliability work takes priority.
Technical explanation
For a 99.9 percent availability SLO, the error budget is 0.1 percent of eligible requests or time in the SLO window.
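A minimal Python sketch of that arithmetic follows. It assumes the SLO target is given as a fraction (0.999) and the window as a `timedelta`. The function names are illustrative and do not come from any library, and the 10 million request count is an assumed traffic figure. `Fraction` keeps `1 - 0.999` exactly 1/1000 instead of a binary floating-point approximation.

```python
from datetime import timedelta
from fractions import Fraction


def error_budget_fraction(slo_target: float) -> Fraction:
    """Share of eligible requests (or time) allowed to be bad under the SLO."""
    # str() keeps the decimal the caller typed, so 0.999 becomes exactly 999/1000.
    target = Fraction(str(slo_target))
    if not 0 < target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1 - target


def time_budget(slo_target: float, window: timedelta) -> timedelta:
    """Allowed bad time in the window for a time-based SLI."""
    window_us = window // timedelta(microseconds=1)  # window length in whole microseconds
    return timedelta(microseconds=int(window_us * error_budget_fraction(slo_target)))


def request_budget(slo_target: float, eligible_requests: int) -> int:
    """Allowed failed requests in the window for a request-based SLI (rounded down)."""
    return int(eligible_requests * error_budget_fraction(slo_target))


if __name__ == "__main__":
    print(time_budget(0.999, timedelta(days=28)))  # 0:40:19.200000
    print(request_budget(0.999, 10_000_000))       # 10000 (assumed traffic volume)
```

Over a 28-day window, a 99.9 percent target therefore allows about 40 minutes of bad time. Over a 30-day window, the same target allows 43.2 minutes.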
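Error budgets turn reliability from a matter of opinion into an engineering control loop. Measured budget consumption, rather than debate, decides whether the next release ships or reliability work comes first.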
They help avoid both extremes: reckless feature velocity and over-investment in unnecessary reliability.
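One way to picture that control loop is a function that turns measured traffic into a remaining-budget signal and then into a release decision. This sketch makes several assumptions. Request counts are assumed to come from your metrics system. The policy is a hypothetical example of an error budget policy, not a standard: ship while budget remains, and freeze risky releases once it is exhausted.

```python
from fractions import Fraction


def budget_remaining(slo_target: float, eligible: int, bad: int) -> float:
    """Fraction of the window's error budget still unspent (negative if overspent)."""
    allowed_bad = eligible * (1 - Fraction(str(slo_target)))
    if allowed_bad <= 0:
        raise ValueError("need eligible traffic and an SLO target below 100%")
    return float(1 - Fraction(bad) / allowed_bad)


def release_decision(remaining: float) -> str:
    """Hypothetical policy: the remaining budget, not opinion, gates releases."""
    if remaining > 0:
        return "ship normally"
    return "freeze risky releases; prioritize reliability work"


if __name__ == "__main__":
    # Assumed numbers: 10M eligible requests, so 99.9% allows 10,000 failures.
    for bad in (4_000, 12_000):
        left = budget_remaining(0.999, 10_000_000, bad)
        print(f"{bad} failures -> {left:.0%} budget left -> {release_decision(left)}")
```

With these assumed numbers, 4,000 failures leave 60 percent of the budget and releases continue. 12,000 failures overspend the budget by 20 percent, which triggers the freeze.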
Hands-on example
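Create a 28-day SLO for the checkout service. While the burn rate stays below 1x, meaning the budget would last the full window, continue normal releases. Suppose a deployment consumes 30 percent of the 28-day budget in one hour. Because one hour is 1/672 of the window, that is a burn rate of about 200x. In that case, freeze risky releases, roll back, and require a post-incident reliability fix before resuming releases.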
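The burn rate in this scenario can be checked with a few lines of Python. Burn rate is the share of the budget consumed divided by the share of the window elapsed, so 1x means the budget would run out exactly at the end of the window. The `fast_burn_threshold` default and the middle "investigate" action below are hypothetical team choices, not standard values. Tune them to your own error budget policy.

```python
from datetime import timedelta


def burn_rate(budget_consumed: float, elapsed: timedelta, window: timedelta) -> float:
    """Budget share consumed divided by window share elapsed (1.0 = sustainable pace)."""
    if elapsed <= timedelta(0):
        raise ValueError("elapsed must be positive")
    return budget_consumed * (window / elapsed)  # timedelta / timedelta -> float


def release_action(rate: float, fast_burn_threshold: float = 10.0) -> str:
    # Only the "below 1x -> normal releases" rule comes from the example;
    # the fast-burn threshold and the middle action are hypothetical.
    if rate < 1.0:
        return "continue normal releases"
    if rate >= fast_burn_threshold:
        return "freeze risky releases and roll back"
    return "investigate before the next risky release"


if __name__ == "__main__":
    window = timedelta(days=28)  # 672 hours
    rate = burn_rate(0.30, timedelta(hours=1), window)
    print(f"burn rate ~{rate:.1f}x -> {release_action(rate)}")
    # burn rate ~201.6x -> freeze risky releases and roll back
```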
More Observability interview questions
- What is observability, and how is it different from traditional monitoring? [Basic]
- What are the three pillars of observability (metrics, logs, traces)? [Basic]
- What is the difference between monitoring and observability in practice? [Basic]
- What are the four golden signals of monitoring? [Basic]
- What is the difference between the USE method and the RED method? [Basic]
- When would you use the USE method versus the RED method? [Basic]
- What is an SLI, an SLO, and an SLA, and how do they relate? [Basic]
- How do you choose good SLIs for a service? [Basic]