How do you instrument a service so that an on-call engineer can debug it without code changes? [Advanced]

Question

Accepted Answer

I instrument a service with standardized metrics, structured logs, distributed traces, correlation IDs, deployment metadata, dependency spans, and runbook links, so that on-call can debug it without code changes. The goal is predictable telemetry on every request path. Metrics should cover RED (request rate, errors, duration), dependency health, queue depth, resource saturation, and business-critical counters. Logs should be structured, sampled so that volume stays manageable without dropping errors, and include trace_id, service, version, tenant tier, and error code. Traces should use meaningful span names and attributes but must not carry sensitive data.
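As a rough illustration of the logging part, here is a minimal sketch using only the Python standard library. It emits one JSON object per log line and attaches the fields listed in the answer (trace_id, service, version, tenant tier, error code). The service name, version string, error code value, and the helpers `build_logger` and `handle_request` are hypothetical names made up for this example; a real service would more likely take the trace ID from its tracing library and the version from build metadata.

```python
import contextvars
import io
import json
import logging
import uuid

# Hypothetical deployment metadata; normally injected at build or deploy time.
SERVICE = "checkout"
VERSION = "1.4.2"

# Correlation ID for the request currently being handled.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render every record as a single JSON object with fixed fields."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            "service": SERVICE,
            "version": VERSION,
            "trace_id": trace_id_var.get(),
            # Per-call fields passed through logging's `extra` argument.
            "tenant_tier": getattr(record, "tenant_tier", None),
            "error_code": getattr(record, "error_code", None),
        }
        return json.dumps(payload)


def build_logger(stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("svc")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger, tenant_tier, incoming_trace_id=None):
    """Reuse the caller's trace ID if present, otherwise mint a new one."""
    trace_id = incoming_trace_id or uuid.uuid4().hex
    token = trace_id_var.set(trace_id)
    try:
        # Simulated failure path: log a stable, searchable error code.
        logger.error(
            "payment provider timeout",
            extra={"tenant_tier": tenant_tier, "error_code": "DEP_TIMEOUT"},
        )
    finally:
        trace_id_var.reset(token)
    return trace_id
```

Because every line carries the same keys, an on-call engineer can filter by `trace_id` or `error_code` in the log backend without anyone adding new log statements first.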
