How do you instrument a service so that an on-call engineer can debug it without code changes? [Advanced]
Answer
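To make a service debuggable without code changes, I instrument it with standardized metrics, structured logs, and distributed traces, tie them together with correlation IDs, and attach deployment metadata, dependency spans, and runbook links. The goal is predictable telemetry on every request path, so on-call engineers can answer questions from existing signals instead of shipping new instrumentation.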
Technical explanation
Metrics should cover RED (rate, errors, duration) for every endpoint, plus dependency health, queue depth, resource saturation, and business-critical counters.
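As a rough illustration of the RED part, the sketch below keeps request counts and latency buckets in memory and renders them in a Prometheus-style text format. The metric names, bucket boundaries, and the `RedMetrics` class are assumptions made for this example; a real service would more likely use a metrics client library than hand-rolled counters.

```python
from collections import defaultdict

# Hypothetical latency bucket boundaries in seconds; tune them per service.
BUCKETS = (0.05, 0.1, 0.25, 0.5, 1.0, 2.5)


class RedMetrics:
    """In-memory RED metrics (rate, errors, duration) keyed by route."""

    def __init__(self):
        self.requests = defaultdict(int)          # (route, status) -> count
        self.duration_buckets = defaultdict(int)  # (route, le) -> cumulative count
        self.duration_sum = defaultdict(float)    # route -> total seconds
        self.duration_count = defaultdict(int)    # route -> observations

    def observe(self, route, status, seconds):
        """Record one finished request."""
        self.requests[(route, status)] += 1
        self.duration_sum[route] += seconds
        self.duration_count[route] += 1
        # Histogram buckets are cumulative: count every bucket the value fits in.
        for le in BUCKETS:
            if seconds <= le:
                self.duration_buckets[(route, le)] += 1

    def error_ratio(self, route):
        """Share of requests for a route that ended in a 5xx status."""
        total = sum(n for (r, _), n in self.requests.items() if r == route)
        errors = sum(
            n for (r, s), n in self.requests.items() if r == route and s >= 500
        )
        return errors / total if total else 0.0

    def render(self):
        """Render all series as text, one sample per line."""
        lines = []
        for (route, status), n in sorted(self.requests.items()):
            lines.append(
                f'http_requests_total{{route="{route}",status="{status}"}} {n}'
            )
        name = "http_request_duration_seconds"
        for route in sorted(self.duration_count):
            for le in BUCKETS:
                count = self.duration_buckets[(route, le)]
                lines.append(f'{name}_bucket{{route="{route}",le="{le}"}} {count}')
            total = self.duration_count[route]
            lines.append(f'{name}_bucket{{route="{route}",le="+Inf"}} {total}')
            lines.append(f'{name}_sum{{route="{route}"}} {self.duration_sum[route]}')
            lines.append(f'{name}_count{{route="{route}"}} {total}')
        return "\n".join(lines) + "\n"


metrics = RedMetrics()
metrics.observe("/checkout", 200, 0.03)
metrics.observe("/checkout", 500, 0.7)
print(metrics.render())
```

Labeling by route template and status code keeps cardinality bounded while still letting on-call split error rate and latency per endpoint.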
Logs should be structured (for example, JSON), sampled so that errors are always kept and only high-volume success logs are dropped, and include trace_id, service, version, tenant tier, and error code.
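One possible way to get those fields onto every log line is a JSON formatter that reads the current trace ID from a context variable. This is a minimal sketch using only the Python standard library; the `STATIC_FIELDS` values, the field names `tenant_tier` and `error_code`, and the example trace ID are illustrative assumptions.

```python
import contextvars
import io
import json
import logging

# Set once per request by middleware; "-" marks logs emitted outside a request.
trace_id_var = contextvars.ContextVar("trace_id", default="-")

# Hypothetical values; in practice these would come from build or deploy metadata.
STATIC_FIELDS = {"service": "checkout", "version": "1.4.2"}


class JsonFormatter(logging.Formatter):
    """Format each record as one JSON object with correlation fields."""

    def format(self, record):
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            "trace_id": trace_id_var.get(),
            **STATIC_FIELDS,
        }
        # Optional per-call fields passed through logging's `extra` argument.
        for key in ("tenant_tier", "error_code"):
            value = getattr(record, key, None)
            if value is not None:
                payload[key] = value
        return json.dumps(payload)


def build_logger(stream=None):
    """Return a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("service")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


# Usage: capture one log line in memory and parse it back.
buffer = io.StringIO()
logger = build_logger(buffer)
trace_id_var.set("4bf92f3577b34da6a3ce929d0e0e4736")  # example value only
logger.error(
    "payment declined",
    extra={"tenant_tier": "enterprise", "error_code": "PAYMENT_DECLINED"},
)
record = json.loads(buffer.getvalue())
print(record["trace_id"], record["error_code"])
```

Because the trace ID lives in a context variable, handlers deeper in the call stack do not need to pass it around explicitly for it to appear in their logs.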
Traces should use meaningful, low-cardinality span names and carry attributes that help debugging, but must not contain sensitive data such as credentials or personal information.
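The naming and redaction rules can be sketched independently of any tracing SDK. The helpers below are hypothetical: `span_name` builds a name from the route template rather than the raw URL, and `safe_attributes` redacts keys that look sensitive before the attributes are handed to whatever tracing library the service uses. The key pattern is an assumption and would need review against the service's actual data.

```python
import re

# Hypothetical pattern for attribute keys that should never reach the tracing backend.
SENSITIVE_KEY = re.compile(
    r"(password|secret|token|authorization|cookie|email|card)", re.IGNORECASE
)


def span_name(method, route_template):
    """Build a low-cardinality span name from the route template, not the raw URL."""
    return f"{method.upper()} {route_template}"


def safe_attributes(raw):
    """Return a copy of the attributes with sensitive values redacted."""
    cleaned = {}
    for key, value in raw.items():
        if SENSITIVE_KEY.search(key):
            cleaned[key] = "[REDACTED]"
        elif isinstance(value, (str, int, float, bool)):
            cleaned[key] = value
        else:
            # Keep attribute values to simple scalar types.
            cleaned[key] = str(value)
    return cleaned


name = span_name("get", "/orders/{order_id}")
attributes = safe_attributes(
    {
        "http.route": "/orders/{order_id}",
        "tenant.tier": "gold",
        "user.email": "jane@example.com",
        "auth_token": "abc123",
    }
)
print(name, attributes)
```

Using the template `/orders/{order_id}` instead of `/orders/8123` keeps span names groupable, while the order ID itself can still travel as a non-sensitive attribute if needed.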
Hands-on example
Implementation example: add OpenTelemetry (OTel) auto-instrumentation, a Prometheus /metrics endpoint, JSON logging middleware, trace_id injection into logs, deployment annotations, health and readiness endpoints, and dashboards generated from service templates. Validate the setup by running a failure drill before production launch.
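For the health and readiness endpoints with deployment metadata, a minimal standard-library sketch might look like the following. The paths `/healthz` and `/readyz`, the environment variable names, and the `check_dependencies` placeholder are all assumptions for illustration; the real checks depend on the service's dependencies.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical environment variable names, typically set by the deploy pipeline.
BUILD_INFO = {
    "service": os.environ.get("SERVICE_NAME", "example-service"),
    "version": os.environ.get("SERVICE_VERSION", "unknown"),
    "commit": os.environ.get("GIT_COMMIT", "unknown"),
}


def check_dependencies():
    """Placeholder: replace with real pings to the database, queue, and so on."""
    return {"database": True, "queue": True}


def handle(path, dependency_check=check_dependencies):
    """Map a probe path to an HTTP status code and a JSON-serializable body."""
    if path == "/healthz":
        # Liveness: the process is up; deliberately no dependency checks here.
        return 200, {"status": "ok", **BUILD_INFO}
    if path == "/readyz":
        # Readiness: only accept traffic when every dependency is reachable.
        deps = dependency_check()
        ready = all(deps.values())
        body = {
            "status": "ready" if ready else "not_ready",
            "dependencies": deps,
            **BUILD_INFO,
        }
        return (200 if ready else 503), body
    return 404, {"error": "not_found"}


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = handle(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Start the probe server; call this from the service entry point."""
    HTTPServer(("0.0.0.0", port), ProbeHandler).serve_forever()
```

During a failure drill, taking a dependency down should flip `/readyz` to 503 and name the failing dependency in the response body, while the version and commit fields tell on-call which build is running.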