How would you build an SLO dashboard and tie alerts to error-budget burn? [Advanced]

Question

Accepted Answer

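An SLO dashboard should show the current SLI value, the SLO target, the remaining error budget, the burn rate over multiple windows, links to incidents, and the service signals needed to explain why budget is burning. The top row should answer three questions: are users impacted, how fast is the budget burning, and how much budget remains? Supporting rows should show latency, error rate, traffic, saturation, dependencies, and recent deploys.

Alerts should fire on error-budget burn, not on unrelated infrastructure thresholds. Burn rate is the observed error ratio divided by the error ratio the SLO allows, so a burn rate of 1 spends the budget exactly over the SLO period. Evaluating it over several windows lets a fast burn page someone while a slow burn only opens a ticket. Define the SLI and burn-rate expressions once as recording rules, and have both the dashboard panels and the alert rules query those recorded series, so dashboard and alert math are identical.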
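To make the burn-rate arithmetic concrete, here is a minimal Python sketch of the calculations a dashboard and its alerts would share. It is an illustration only, not a monitoring-system API: the `Window` type, the function names, and the sample counts are hypothetical. The 14.4 threshold paired with 1h and 5m windows is an assumed value, commonly cited for a 30-day SLO because a 14.4x burn sustained for one hour spends about 2% of the budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Error and request counts observed over one lookback window (hypothetical type)."""
    name: str
    errors: int
    requests: int


def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail, e.g. about 0.001 for a 99.9% SLO."""
    return 1.0 - slo_target


def burn_rate(window: Window, slo_target: float) -> float:
    """Observed error ratio divided by the allowed error ratio.

    A value of 1.0 means the budget would be spent exactly at the end
    of the SLO period; higher values spend it proportionally faster.
    """
    if window.requests == 0:
        return 0.0  # no traffic in the window, so nothing was burned
    return (window.errors / window.requests) / error_budget(slo_target)


def budget_remaining(period: Window, slo_target: float) -> float:
    """Fraction of the period's error budget still unspent (negative if overspent)."""
    if period.requests == 0:
        return 1.0
    allowed_errors = period.requests * error_budget(slo_target)
    return 1.0 - period.errors / allowed_errors


def should_alert(long_window: Window, short_window: Window,
                 slo_target: float, threshold: float) -> bool:
    """Fire only when both the long and the short window exceed the threshold.

    The short window makes the alert stop soon after the burn stops.
    """
    return (burn_rate(long_window, slo_target) > threshold
            and burn_rate(short_window, slo_target) > threshold)


# Sample data (made up for illustration): a 99.9% SLO over 30 days.
SLO = 0.999
one_hour = Window("1h", errors=200, requests=10_000)    # 2% of requests failing
five_min = Window("5m", errors=20, requests=1_000)      # 2% of requests failing
month = Window("30d", errors=3_000, requests=6_000_000)

print(f"1h burn rate:     {burn_rate(one_hour, SLO):.1f}")
print(f"page on-call:     {should_alert(one_hour, five_min, SLO, threshold=14.4)}")
print(f"budget remaining: {budget_remaining(month, SLO):.0%}")
```

In a real setup the same three quantities would normally be computed once in the monitoring backend and reused by both panels and alerts, which is the point of the recording rules mentioned above. The sketch only shows why a single shared definition keeps the numbers consistent.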
