How do you choose good SLIs for a service? [Basic]
Answer
Good SLIs are user-centric, measurable, attributable, and hard to game. I choose SLIs that represent the experience users actually care about, which, depending on the service, means availability, latency, correctness, freshness, or durability.
Technical explanation
For synchronous APIs, good SLIs are the proportion of requests that succeed and the proportion that complete below a latency threshold.
For pipelines, good SLIs include freshness, completeness, and processing delay.
Avoid SLIs that only measure internals, such as pod count or CPU, unless the user impact is direct and proven.
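As a rough illustration of the pipeline SLIs mentioned above, the sketch below computes freshness (age of the last successful run) and completeness (records processed versus records expected). The function names, the 15-minute freshness target, and the sample numbers are assumptions made for this example, not values from any particular system.

```python
from datetime import datetime, timedelta, timezone


def freshness_seconds(last_success: datetime, now: datetime) -> float:
    """Age of the newest successfully processed data, in seconds."""
    return (now - last_success).total_seconds()


def is_fresh(last_success: datetime, now: datetime, max_age: timedelta) -> bool:
    """True when the data is no older than the (assumed) freshness target."""
    return now - last_success <= max_age


def completeness(processed: int, expected: int):
    """Fraction of expected records that were processed; None if nothing was expected."""
    if expected == 0:
        return None
    return processed / expected


# Hypothetical sample values for illustration only.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_success = datetime(2024, 1, 1, 11, 50, tzinfo=timezone.utc)

age = freshness_seconds(last_success, now)
fresh = is_fresh(last_success, now, max_age=timedelta(minutes=15))
ratio = completeness(processed=9950, expected=10000)
print(age, fresh, ratio)
```

Both values describe what a downstream consumer experiences (how stale and how complete the data is), which is why they tend to make better SLIs than worker CPU or pod count.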
Hands-on example
For an order API, define a good event as a POST /orders request that returns 2xx within 750 ms. Exclude requests rejected with a client 4xx validation error from the eligible total, because they reflect caller mistakes rather than service health. In Prometheus, create a numerator for good requests and a denominator for total eligible requests, then graph the ratio by service and environment.
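The good/eligible split for the order API can be sketched in plain Python before it is translated into Prometheus queries. This is a minimal sketch under stated assumptions: the `Request` record, the helper names, and the sample data are hypothetical, "within 750 ms" is read as less than or equal to 750 ms, and 4xx responses are dropped from the denominator rather than counted as bad.

```python
from dataclasses import dataclass

LATENCY_THRESHOLD_MS = 750  # threshold taken from the example above


@dataclass(frozen=True)
class Request:
    """Hypothetical request record; real data would come from metrics or logs."""
    method: str
    path: str
    status: int
    duration_ms: float


def is_eligible(r: Request) -> bool:
    """Denominator: POST /orders requests, excluding client 4xx responses."""
    is_order_post = r.method == "POST" and r.path == "/orders"
    is_client_error = 400 <= r.status < 500
    return is_order_post and not is_client_error


def is_good(r: Request) -> bool:
    """Numerator: eligible requests that returned 2xx within the threshold."""
    return (
        is_eligible(r)
        and 200 <= r.status < 300
        and r.duration_ms <= LATENCY_THRESHOLD_MS
    )


def order_sli(requests):
    """Ratio of good to eligible requests; None when there is no eligible traffic."""
    eligible = [r for r in requests if is_eligible(r)]
    if not eligible:
        return None
    return sum(1 for r in eligible if is_good(r)) / len(eligible)


sample = [
    Request("POST", "/orders", 201, 120.0),  # good: 2xx and fast
    Request("POST", "/orders", 201, 900.0),  # eligible but too slow
    Request("POST", "/orders", 422, 15.0),   # excluded: client validation error
    Request("POST", "/orders", 503, 40.0),   # eligible but failed
    Request("GET", "/orders", 200, 30.0),    # not eligible: different method
]
sli = order_sli(sample)
print(f"order SLI: {sli:.3f}")
```

In Prometheus the same ratio is usually built from request counters or latency histogram buckets; the metric and label names depend on how the service is instrumented, so they are left out here.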
More Observability interview questions
- What is observability, and how is it different from traditional monitoring? [Basic]
- What are the three pillars of observability (metrics, logs, traces)? [Basic]
- What is the difference between monitoring and observability in practice? [Basic]
- What are the four golden signals of monitoring? [Basic]
- What is the difference between the USE method and the RED method? [Basic]
- When would you use the USE method versus the RED method? [Basic]
- What is an SLI, an SLO, and an SLA, and how do they relate? [Basic]
- How do you set an SLO target, and why not just aim for 100%? [Basic]