Interview Observability

How would you build an SLO dashboard and tie alerts to error-budget burn? [Advanced]

Answer

An SLO dashboard should show SLI value, SLO target, remaining error budget, burn rate across multiple windows, incident links, and the service signals needed to explain budget burn. Alerts should be based on error-budget burn, not unrelated infrastructure thresholds.

Technical explanation

The top of the dashboard should answer: are users impacted, how fast is budget burning, and how much budget remains?
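The three headline numbers can be derived from plain request counts. The following is a minimal sketch, not a definitive implementation: the function name `slo_status`, its parameters, and the `SloStatus` type are hypothetical, and it assumes a request-based SLI where "good" means a request that met the objective.

```python
from dataclasses import dataclass


@dataclass
class SloStatus:
    sli: float               # fraction of good requests over the SLO window
    budget_remaining: float  # fraction of error budget still unspent (negative if overspent)
    burn_rate: float         # 1.0 means budget is being spent at exactly the sustainable pace


def slo_status(good: int, total: int, slo_target: float,
               recent_good: int, recent_total: int) -> SloStatus:
    """Compute headline SLO numbers from request counts.

    good/total cover the full SLO window (for example 28 days);
    recent_good/recent_total cover a short lookback (for example 1 hour).
    """
    allowed_error = 1.0 - slo_target  # the error budget as a ratio
    # With no traffic there are no failures, so treat the SLI as fully met.
    sli = good / total if total else 1.0
    recent_sli = recent_good / recent_total if recent_total else 1.0
    return SloStatus(
        sli=sli,
        budget_remaining=1.0 - (1.0 - sli) / allowed_error,
        burn_rate=(1.0 - recent_sli) / allowed_error,
    )


status = slo_status(good=999_500, total=1_000_000, slo_target=0.999,
                    recent_good=9_900, recent_total=10_000)
```

With these made-up counts, half of the budget is already spent over the window, and the recent error ratio of 1% against a 0.1% allowance means the budget is currently burning at roughly ten times the sustainable pace.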

Supporting rows should show latency, error rate, traffic, saturation, dependencies, and recent deploys.

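Use recording rules so the dashboard and the alerts read the same precomputed series. That keeps their math identical and avoids a panel and a page disagreeing about the same incident.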
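One way to keep the math in a single place is to generate the rule definitions from one function and load the result into the rule file. This is a sketch under stated assumptions: the metric `http_requests_total` with a `code` label, the `slo:` rule names, and the helper `recording_rules` are all hypothetical. Only the `record`/`expr` keys and the PromQL syntax follow the Prometheus rule format.

```python
WINDOWS = ["5m", "1h", "6h", "3d"]


def recording_rules(slo_target: float, windows=WINDOWS) -> list[dict]:
    """Build Prometheus-style recording rules (record/expr pairs) per window.

    Assumes a counter named http_requests_total whose `code` label holds the
    HTTP status; both names are placeholders for whatever the service exports.
    """
    budget = 1 - slo_target  # allowed error ratio
    rules = []
    for w in windows:
        good = f"slo:good_requests:rate{w}"
        total = f"slo:total_requests:rate{w}"
        ratio = f"slo:error_ratio:rate{w}"
        burn = f"slo:burn_rate:rate{w}"
        rules += [
            # Good requests: everything that is not a 5xx response.
            {"record": good,
             "expr": f'sum(rate(http_requests_total{{code!~"5.."}}[{w}]))'},
            {"record": total,
             "expr": f"sum(rate(http_requests_total[{w}]))"},
            {"record": ratio, "expr": f"1 - {good} / {total}"},
            # Burn rate: observed error ratio divided by the allowed ratio.
            {"record": burn, "expr": f"{ratio} / {budget:.6g}"},
        ]
    return rules


rules = recording_rules(slo_target=0.999)
```

Dashboard panels and alert expressions would then both query the `slo:burn_rate:*` series rather than re-deriving the ratio from raw counters.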

Hands-on example

Create recording rules for good_requests, total_requests, error_ratio, and burn_rate over 5m, 1h, 6h, and 3d windows. Then build Grafana panels that show 28-day compliance, remaining budget, fast-burn page status, and slow-burn ticket status, with links to Splunk and traces.
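The fast-burn page and slow-burn ticket statuses can be decided by pairing a long window with a short one, so an alert fires only while the burn is both significant and still happening. This is a minimal sketch: the function `classify_burn` is hypothetical, and the default thresholds (14.4 for paging, 1.0 for ticketing) are assumptions in the style of commonly cited multi-window burn-rate guidance for a 30-day SLO, so they would need tuning for a 28-day window.

```python
def classify_burn(burn: dict[str, float],
                  page_threshold: float = 14.4,
                  ticket_threshold: float = 1.0) -> str:
    """Map burn rates per window to an alert status.

    `burn` maps window names ("5m", "1h", "6h", "3d") to burn rates.
    Threshold defaults are illustrative assumptions, not fixed standards.
    """
    # Fast burn: the 1h window shows heavy burn and the 5m window
    # confirms it is still ongoing, so a human should be paged.
    if burn["1h"] > page_threshold and burn["5m"] > page_threshold:
        return "page"
    # Slow burn: budget is draining faster than sustainable over 3d,
    # and the 6h window confirms it has not already recovered.
    if burn["3d"] > ticket_threshold and burn["6h"] > ticket_threshold:
        return "ticket"
    return "ok"


fast = classify_burn({"5m": 20.0, "1h": 16.0, "6h": 4.0, "3d": 0.8})
slow = classify_burn({"5m": 0.5, "1h": 1.2, "6h": 1.5, "3d": 1.3})
recovered = classify_burn({"5m": 0.1, "1h": 15.0, "6h": 3.0, "3d": 0.9})
```

In the third case the 1h window is still above the paging threshold, but the 5m window shows the burn has stopped, so no page is sent.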
