
How do you debug high tail latency introduced after enabling the mesh?

Istio & Service Mesh · Advanced level

Answer

To debug high tail latency after enabling the mesh, I compare before/after latency at each hop: client, ingress gateway, source proxy, destination proxy, and application. I look for retries, connection-pool limits, mTLS CPU cost, DNS issues, telemetry overhead, EnvoyFilter cost, and downstream saturation.

Technical explanation

Tail latency is often amplified by retries, queueing, or connection limits rather than average proxy overhead.
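A rough way to see why retries weigh on the tail is to work through the arithmetic. The sketch below is illustrative only and is not Istio's retry implementation: it assumes attempts fail independently with a fixed probability and ignores backoff between attempts, and the function names are hypothetical.

```python
def expected_attempts(failure_rate: float, max_retries: int) -> float:
    """Average upstream attempts per request.

    Attempt k+1 only happens if the first k attempts all failed,
    so the expected count is the sum of failure_rate**k for k = 0..max_retries.
    """
    return sum(failure_rate ** k for k in range(max_retries + 1))


def worst_case_latency_ms(per_try_timeout_ms: float, max_retries: int) -> float:
    """Upper bound on caller-visible latency when every attempt times out.

    Assumption: no backoff between attempts, so this understates the real bound.
    """
    return per_try_timeout_ms * (max_retries + 1)
```

Under these assumptions, a 500 ms per-try timeout with 2 retries lets a single request take up to 1500 ms, and at a 50% failure rate each request produces 1.75 upstream attempts on average, which adds load exactly when the backend is already struggling.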

Separate application latency from proxy-added latency using access logs, traces, and metrics from both source and destination.
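One possible way to do this split per request is sketched below. It assumes the access log exposes Envoy's total request duration (`%DURATION%`) and the upstream service time (`%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%`), both in milliseconds, and that they have already been parsed into pairs. The helper names and the sample numbers are hypothetical.

```python
def percentile(values, pct: int):
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(values)
    # Integer ceiling of pct/100 * n avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def proxy_overhead_ms(records):
    """Per-request time not accounted for by the upstream.

    records: iterable of (duration_ms, upstream_service_time_ms) pairs
    taken from one proxy's access log.
    """
    return [duration - upstream for duration, upstream in records]


# Hypothetical sample: most requests add a few ms, one request stalls in the proxy.
sample = [(12, 10), (15, 11), (40, 12), (250, 30), (13, 11)]
overhead = proxy_overhead_ms(sample)
p50_overhead = percentile(overhead, 50)
p99_overhead = percentile(overhead, 99)
```

A small median overhead combined with a large p99 overhead points at queueing or throttling in the proxy rather than at the application itself.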

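Check CPU throttling on the istio-proxy container; a tight CPU limit can leave the sidecar throttled during bursts, which shows up as sharp p99 latency jumps even when average CPU usage looks low.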
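To put a number on throttling, the cgroup `cpu.stat` counters for the sidecar container can be turned into a ratio. This is a minimal sketch that assumes you have already read the file's text from the istio-proxy container (its location varies by cgroup version) and that it contains the `nr_periods` and `nr_throttled` fields. The function name is hypothetical.

```python
def throttle_ratio(cpu_stat_text: str) -> float:
    """Fraction of CFS enforcement periods in which the container was throttled."""
    stats = {}
    for line in cpu_stat_text.splitlines():
        parts = line.split()
        # Each cpu.stat line is "<name> <integer>"; skip anything else.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0  # no CPU limit set, or no periods recorded yet
    return stats.get("nr_throttled", 0) / periods
```

A ratio that climbs during the same windows as the p99 spikes is a strong hint that the proxy's CPU limit, not the application, is the bottleneck.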

Hands-on example

Debug steps:

$ kubectl top pod -n app --containers

$ istioctl proxy-config clusters deploy/frontend -n app | grep backend

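PromQL: compare p99 of istio_request_duration_milliseconds by source and destination workload, using the reporter label to separate the client-side and server-side proxy views.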

Temporarily disable new retries or filters in staging to isolate the regression.
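The PromQL comparison in the steps above can be sketched as a small query builder. It assumes Istio's standard histogram `istio_request_duration_milliseconds` and its `reporter`, `source_workload`, and `destination_workload` labels are being scraped. The function name and the 5m default window are illustrative choices, not part of Istio.

```python
def p99_query(reporter: str, window: str = "5m") -> str:
    """Build a PromQL p99 latency query for one side of the connection.

    reporter: "source" for the client-side proxy view,
              "destination" for the server-side proxy view.
    """
    return (
        "histogram_quantile(0.99, "
        "sum by (le, source_workload, destination_workload) "
        f'(rate(istio_request_duration_milliseconds_bucket{{reporter="{reporter}"}}[{window}])))'
    )


# Run both queries and compare: a gap between them is time spent
# outside the destination proxy and application (network, source proxy).
source_view = p99_query("source")
destination_view = p99_query("destination")
```

If the source-reported p99 is much higher than the destination-reported p99 for the same workload pair, the extra time is being spent before the request reaches the destination sidecar.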
