What is distributed tracing, and what is a span and a trace? [Advanced]
Answer
Distributed tracing follows a request across service boundaries. A trace is the full end-to-end request journey; a span is one timed operation within that journey, such as an HTTP handler, database call, queue publish, or downstream RPC.
Technical explanation
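Each span carries a trace ID, its own span ID, a parent span ID (empty on the root span), a name, start and end timestamps, attributes, events, and a status. The shared trace ID groups spans into one trace, and the parent span IDs arrange them into a tree.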
Traces reveal dependency latency, fan-out, retries, and where errors occur in a call chain.
Tracing is most valuable when service names, routes, status codes, and error attributes follow consistent conventions.
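As a rough illustration of the span fields listed above, here is a minimal standard-library sketch in Python. It is not the API of any tracing library: the Span class and the start_span and end_span helpers are hypothetical names made up for this example, and the attribute key is only an example of a consistent naming convention.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Hypothetical in-memory span record, for illustration only."""
    name: str
    trace_id: str                    # shared by every span in the same trace
    span_id: str                     # unique to this span
    parent_span_id: Optional[str]    # None marks the root span
    start_ns: int
    end_ns: Optional[int] = None     # filled in when the span ends
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)
    status: str = "UNSET"


def start_span(name, parent=None, attributes=None):
    """Start a span; a child inherits its parent's trace ID."""
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    return Span(
        name=name,
        trace_id=trace_id,
        span_id=secrets.token_hex(8),
        parent_span_id=parent.span_id if parent else None,
        start_ns=time.time_ns(),
        attributes=dict(attributes or {}),
    )


def end_span(span, status="OK"):
    """Record the end timestamp and the final status."""
    span.end_ns = time.time_ns()
    span.status = status


# A root span for the request and one child span for a downstream call.
root = start_span("POST /orders", attributes={"http.request.method": "POST"})
child = start_span("payment authorize", parent=root)
end_span(child)
end_span(root)
```

In a real system the trace ID and parent span ID travel between services in request headers, so a span started in a downstream service can attach to the same tree.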
Hands-on example
Example: a trace for a checkout request contains the spans ingress -> checkout POST /orders -> inventory reserve -> payment authorize -> database insert -> Kafka publish. If p95 latency rises, the waterfall view of a slow trace shows which span dominates; in this example, payment authorize consumes most of the time.
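The checkout example can be sketched as a small Python script that prints a text waterfall and finds the span with the largest self time (its own duration minus the time spent in its children). The timings, the parent/child layout, and the function names are all assumptions invented for illustration, and the self-time calculation assumes child spans run one after another rather than in parallel.

```python
from collections import defaultdict

# Hypothetical spans for one slow checkout trace:
# (name, parent name, start offset in ms, end offset in ms)
SPANS = [
    ("ingress", None, 0, 900),
    ("checkout POST /orders", "ingress", 10, 890),
    ("inventory reserve", "checkout POST /orders", 20, 80),
    ("payment authorize", "checkout POST /orders", 90, 750),
    ("database insert", "checkout POST /orders", 760, 820),
    ("Kafka publish", "checkout POST /orders", 830, 870),
]


def self_times(spans):
    """Return each span's duration minus the total duration of its direct children."""
    child_total = defaultdict(int)
    for _, parent, start, end in spans:
        if parent is not None:
            child_total[parent] += end - start
    return {
        name: (end - start) - child_total[name]
        for name, _, start, end in spans
    }


def render_waterfall(spans, width=45):
    """Render one text row per span, offset and scaled to the trace length."""
    total = max(end for _, _, _, end in spans)
    rows = []
    for name, _, start, end in spans:
        offset = start * width // total
        length = max(1, (end - start) * width // total)
        rows.append(f"{name:<24}{' ' * offset}{'#' * length} {end - start} ms")
    return rows


times = self_times(SPANS)
slowest = max(times, key=times.get)

print("\n".join(render_waterfall(SPANS)))
print(f"largest self time: {slowest} ({times[slowest]} ms)")
```

With these made-up numbers the last line reports payment authorize with 660 ms of self time, while ingress and checkout POST /orders have long total durations but little self time because they mostly wait on their children.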