Interview Observability

How do you design alerts that page a human only when action is required? [Advanced]

Answer

I design human-page alerts around user impact, urgency, ownership, and required action. If no immediate human action is needed, the signal should become a ticket, dashboard annotation, or automated remediation rather than a page.

Technical explanation

Use SLO burn-rate alerts for page-worthy service symptoms.
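A minimal sketch of a multi-window burn-rate check, to make the idea concrete. The `Window` type, the function names, and the default threshold of 14.4 (a commonly cited fast-burn value for a 30-day budget) are illustrative assumptions, not part of the original answer; in practice the same condition is usually written as an alerting rule in the monitoring system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Request counts observed over one lookback window (hypothetical type)."""
    total: int
    errors: int


def burn_rate(window: Window, slo_target: float) -> float:
    """Error-budget spend rate; 1.0 means the budget lasts exactly the SLO period."""
    if window.total == 0:
        return 0.0  # no traffic, so no budget is being spent
    error_budget = 1.0 - slo_target
    return (window.errors / window.total) / error_budget


def should_page(long_w: Window, short_w: Window, slo_target: float,
                threshold: float = 14.4) -> bool:
    """Page only if both windows burn above the threshold (threshold is an assumed default)."""
    return (burn_rate(long_w, slo_target) >= threshold
            and burn_rate(short_w, slo_target) >= threshold)
```

Requiring the short window as well keeps the page from firing after the problem has already recovered, which is one way to tie the page to action that is still needed.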

Require every page to include service, severity, owner, runbook, dashboard, and recent-change links.
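One way to enforce this is a check that runs in CI or in the alert pipeline before a rule is allowed to page. The sketch below assumes each alert is a flat dictionary of labels and annotations; the field names (`runbook_url`, `dashboard_url`, `recent_changes_url`) are hypothetical and should be mapped to whatever the team already uses.

```python
# Field names are illustrative assumptions, not a standard schema.
REQUIRED_PAGE_FIELDS = (
    "service",
    "severity",
    "owner",
    "runbook_url",
    "dashboard_url",
    "recent_changes_url",
)


def missing_page_fields(alert: dict) -> list:
    """Return the required fields that are absent or blank on a paging alert."""
    return [
        field
        for field in REQUIRED_PAGE_FIELDS
        if not str(alert.get(field, "")).strip()
    ]
```

A rule with a non-empty result could be rejected, or downgraded to a ticket until the missing links are filled in.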

Tune alerts with historical page reviews and remove alerts that do not lead to action.
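A page review can be as simple as counting how often each alert led to action. The sketch below assumes the page history is available as `(alert_name, action_taken)` pairs; the function name and both cutoffs (`min_pages`, `max_actionable_ratio`) are hypothetical values to tune per team.

```python
from collections import Counter


def removal_candidates(pages, min_pages=5, max_actionable_ratio=0.2):
    """Return alert names that paged often but rarely led to action.

    pages: iterable of (alert_name, action_taken) tuples from past incidents.
    """
    total = Counter()
    acted = Counter()
    for alert_name, action_taken in pages:
        total[alert_name] += 1
        if action_taken:
            acted[alert_name] += 1
    return sorted(
        name
        for name, count in total.items()
        if count >= min_pages and acted[name] / count <= max_actionable_ratio
    )
```

Alerts on the resulting list are candidates for demotion to tickets or for removal, subject to a human review of why no action was taken.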

Hands-on example

Convert CPUHigh pages into tickets unless CPU saturation is shown to cause SLO burn. Keep pages for a high checkout burn rate, a payment dependency outage, and data-loss risk. Add an Alertmanager inhibition rule so that pod-level alerts do not page while the service-level SLO alert for the same service is already firing.
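The inhibition logic can be modeled in a few lines of Python to check the intended behavior before writing the real configuration. In Alertmanager itself this is expressed declaratively under `inhibit_rules`, using `source_matchers`, `target_matchers`, and `equal`; the `scope` and `kind` labels and the alert names below are assumptions made for illustration.

```python
def apply_inhibition(firing):
    """Drop pod-level alerts for services that already have a firing SLO alert.

    firing: list of label dicts; the "scope" and "kind" labels are hypothetical.
    """
    # Services whose service-level SLO burn alert is currently firing.
    inhibiting_services = {
        alert["service"]
        for alert in firing
        if alert.get("scope") == "service" and alert.get("kind") == "slo_burn"
    }
    # Keep everything except pod-level alerts on those services.
    return [
        alert
        for alert in firing
        if not (
            alert.get("scope") == "pod"
            and alert.get("service") in inhibiting_services
        )
    ]
```

Matching on the shared `service` label plays the role of `equal` in an Alertmanager rule: a pod alert is suppressed only when the SLO alert for the same service is firing, so pod alerts for unrelated services still get through.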
