How would you build an SLO dashboard and tie alerts to error-budget burn? [Advanced]
Answer
An SLO dashboard should show SLI value, SLO target, remaining error budget, burn rate across multiple windows, incident links, and the service signals needed to explain budget burn. Alerts should be based on error-budget burn, not unrelated infrastructure thresholds.
Technical explanation
The top of the dashboard should answer: are users impacted, how fast is budget burning, and how much budget remains?
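Two of those three headline numbers, burn rate and remaining budget, can be derived from good and total request counts. The following is a minimal sketch, assuming a request-based availability SLI; the function names and the 99.9% target in the usage lines are illustrative and not taken from any particular tool.

```python
def error_budget_remaining(good: float, total: float, slo_target: float) -> float:
    """Fraction of the error budget left, using counts over the whole SLO period.

    A negative result means the budget is already overspent.
    """
    if total == 0:
        return 1.0  # no traffic, so no budget has been spent
    allowed_bad = (1.0 - slo_target) * total
    actual_bad = total - good
    return 1.0 - actual_bad / allowed_bad


def burn_rate(good: float, total: float, slo_target: float) -> float:
    """Observed error ratio in a window divided by the error ratio the SLO allows.

    A value of 1.0 spends the budget exactly over the SLO period.
    """
    if total == 0:
        return 0.0
    error_ratio = 1.0 - good / total
    return error_ratio / (1.0 - slo_target)


if __name__ == "__main__":
    # 1,000,000 requests this period, 500 failed: roughly half the budget is left.
    print(round(error_budget_remaining(999_500, 1_000_000, 0.999), 3))
    # 10,000 requests in the last hour, 144 failed: burning roughly 14x too fast.
    print(round(burn_rate(9_856, 10_000, 0.999), 1))
```

Budget remaining is computed over the full compliance period, while burn rate is computed per short window, which is why the dashboard shows both.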
Supporting rows should show latency, error rate, traffic, saturation, dependencies, and recent deploys.
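Define the SLI and burn-rate expressions once as recording rules, and have both the dashboard panels and the alert rules query those recorded series, so the two can never disagree.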
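One way to keep the math in a single place is to generate the rules from one template. The sketch below assumes Prometheus-style recording rules, a counter named `http_requests_total` with a `code` label, and a 0.1% error budget; all three are assumptions made for illustration, as are the `slo:` record names.

```python
import json

WINDOWS = ["5m", "1h", "6h", "3d"]


def slo_recording_rules(error_budget: float = 0.001, windows=WINDOWS) -> list:
    """Build good/total/error_ratio/burn_rate recording rules for each window."""
    rules = []
    for w in windows:
        good = f"slo:good_requests:rate{w}"
        total = f"slo:total_requests:rate{w}"
        ratio = f"slo:error_ratio:rate{w}"
        burn = f"slo:burn_rate:rate{w}"
        rules += [
            # Assumed metric and label names; treat any non-5xx response as good.
            {"record": good,
             "expr": f'sum(rate(http_requests_total{{code!~"5.."}}[{w}]))'},
            {"record": total,
             "expr": f"sum(rate(http_requests_total[{w}]))"},
            # Later rules reuse the recorded series instead of repeating the query.
            {"record": ratio, "expr": f"1 - ({good} / {total})"},
            {"record": burn, "expr": f"{ratio} / {error_budget}"},
        ]
    return rules


if __name__ == "__main__":
    group = {"groups": [{"name": "slo_rules", "rules": slo_recording_rules()}]}
    print(json.dumps(group, indent=2))
```

Dashboard panels and alert expressions would then reference only the `slo:burn_rate:*` and `slo:error_ratio:*` series, never the raw counter.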
Hands-on example
Create recording rules for good_requests, total_requests, error_ratio, and burn_rate over 5m, 1h, 6h, and 3d windows. Then build Grafana panels that show 28-day compliance, budget remaining, fast-burn page status, and slow-burn ticket status, with links to Splunk and traces.
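The fast-burn and slow-burn statuses can be sketched as a multi-window check: a long window shows the burn is significant, and a short window confirms it is still happening. The pairing of windows (1h with 5m, 3d with 6h) and the budget fractions (2% in one hour pages, 10% in three days opens a ticket) are assumptions borrowed from common multi-window burn-rate practice, not values stated above; `alert_status` and `burn_threshold` are hypothetical names.

```python
from typing import Mapping

SLO_PERIOD_HOURS = 28 * 24  # 28-day compliance window


def burn_threshold(budget_fraction: float, window_hours: float,
                   period_hours: float = SLO_PERIOD_HOURS) -> float:
    """Burn rate that spends `budget_fraction` of the budget within `window_hours`."""
    return budget_fraction * period_hours / window_hours


# Assumed policy: page at 2% of budget per hour, ticket at 10% per three days.
FAST_BURN = burn_threshold(0.02, 1)
SLOW_BURN = burn_threshold(0.10, 72)


def alert_status(burn: Mapping[str, float]) -> str:
    """Classify burn rates keyed by window ("5m", "1h", "6h", "3d")."""
    # Both windows must exceed the threshold, so a resolved spike stops alerting.
    if burn["1h"] > FAST_BURN and burn["5m"] > FAST_BURN:
        return "page"
    if burn["3d"] > SLOW_BURN and burn["6h"] > SLOW_BURN:
        return "ticket"
    return "ok"


if __name__ == "__main__":
    print(alert_status({"5m": 20.0, "1h": 15.0, "6h": 3.0, "3d": 0.5}))
```

With a 28-day period these thresholds come out near 13.4 and 0.93; a 30-day period would give the more commonly quoted 14.4 and 1.0.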