What is observability, and how is it different from traditional monitoring? [Basic]
Answer
Observability is the ability to understand the internal state of a system from the signals it emits. Traditional monitoring tells me whether known checks are healthy; observability lets me ask new questions during unknown failure modes using metrics, logs, traces, events, and context.
Technical explanation
Monitoring is usually built around predefined dashboards and thresholds such as CPU greater than 80 percent or HTTP 5xx greater than 2 percent.
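To make the contrast concrete, here is a minimal sketch of a threshold-style check of the kind described above. The limits (80 percent CPU, 2 percent HTTP 5xx) come from the text; the metric names, the `Threshold` type, and the `breached` helper are hypothetical and stand in for whatever an alerting tool would actually evaluate.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Threshold:
    metric: str   # hypothetical metric name
    limit: float  # the check fires when the observed value is strictly greater


# Limits taken from the examples in the text; names are illustrative only.
THRESHOLDS = [
    Threshold("cpu_percent", 80.0),
    Threshold("http_5xx_percent", 2.0),
]


def breached(sample: Dict[str, float]) -> List[str]:
    """Return the metrics in `sample` that exceed their predefined limit.

    A metric missing from the sample is treated as 0.0, so it never fires.
    """
    return [
        t.metric
        for t in THRESHOLDS
        if sample.get(t.metric, 0.0) > t.limit
    ]


if __name__ == "__main__":
    print(breached({"cpu_percent": 91.5, "http_5xx_percent": 0.4}))
```

A check like this can only answer the questions that were written down in advance, which is the limitation that observability is meant to address.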
Observability focuses on debuggability: high-quality telemetry, useful dimensions, service ownership, correlation IDs, and enough context to explain why something is happening.
In SRE terms, monitoring is commonly treated as one part of observability rather than a competing practice. A mature platform uses both: alerts for known user-impacting symptoms and exploratory telemetry for investigation.
Hands-on example
For a checkout service, expose request_count, request_duration, and error_count metrics, emit structured JSON logs that carry trace_id and order_id_hash, and propagate W3C trace context on every outbound call. When latency spikes, start from the SLO alert, open the latency dashboard, jump to the slow traces for the checkout-to-payment call, then inspect only the logs correlated with those trace IDs.
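The logging and propagation steps in this example could be sketched as follows, using only the Python standard library. The `traceparent` layout (version, 32-hex trace ID, 16-hex parent ID, 2-hex flags) follows the W3C Trace Context format for version 00; the function names, the log field names other than trace_id and order_id_hash, and the truncated unsalted SHA-256 hash are assumptions made for illustration. A real service would more likely rely on a tracing library such as OpenTelemetry than parse headers by hand.

```python
import hashlib
import json
import re
import secrets
from typing import Optional

# Strict match for the four-field version-00 layout of the traceparent header.
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header: str) -> Optional[dict]:
    """Parse a traceparent header; return None if it is malformed."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the specification.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return {"version": version, "trace_id": trace_id,
            "parent_id": parent_id, "flags": flags}


def child_traceparent(ctx: dict) -> str:
    """Build the header for an outbound call: same trace ID, new span ID."""
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    return f"00-{ctx['trace_id']}-{span_id}-{ctx['flags']}"


def hash_order_id(order_id: str) -> str:
    """Illustrative only: truncated, unsalted SHA-256 of the order ID."""
    return hashlib.sha256(order_id.encode("utf-8")).hexdigest()[:16]


def log_line(event: str, trace_id: str, order_id: str, duration_ms: float) -> str:
    """Render one structured log record as a single JSON line."""
    record = {
        "event": event,
        "trace_id": trace_id,
        "order_id_hash": hash_order_id(order_id),  # raw order ID is never logged
        "duration_ms": duration_ms,
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    ctx = parse_traceparent(incoming)
    if ctx is not None:
        print(child_traceparent(ctx))  # header to send to the payment service
        print(log_line("checkout.payment_call", ctx["trace_id"], "order-123", 840.0))
```

Because every log line carries the same trace_id as the spans, the last step of the investigation becomes a simple filter on that field instead of a search across all checkout logs.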