What metrics would you alert on for the mesh itself?

Question

Accepted Answer

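I alert on three groups of signals. On the control plane: control-plane availability, proxy sync status (proxies that are stale or not converging), xDS push errors, config rejected by proxies, sidecar injection failures, and certificate expiration. On the data plane: mTLS or authorization failures, high proxy CPU or memory, and abnormal request latency added at the proxy layer. At the gateways: overall gateway health and the 5xx error rate. Control-plane alerts tell us whether the mesh can accept configuration changes and support scaling events, because a new pod needs an injected sidecar, a certificate, and pushed config before it can serve traffic. Data-plane alerts tell us whether user traffic is affected. Gateway alerts need special attention because gateways are shared choke points: one unhealthy gateway can degrade every service behind it.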
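
A minimal Python sketch of how a few of these alerts could be evaluated, assuming metrics have already been scraped and pre-aggregated into flat snapshots taken at the start and end of an evaluation window. All metric keys, alert names, and thresholds here are hypothetical. In Istio the underlying series would come from metrics such as `pilot_total_xds_rejects`, `sidecar_injection_failure_total`, and `istio_requests_total`, but names and labels should be checked against your Istio version. In production this logic would normally live in Prometheus alerting rules rather than application code; the sketch only shows the control-plane / data-plane / gateway split and why gateway errors are alerted on as a ratio.

```python
from dataclasses import dataclass

# Hypothetical snapshot format: pre-aggregated metric key -> value at one instant.
Snapshot = dict[str, float]


@dataclass(frozen=True)
class Alert:
    name: str
    plane: str  # "control", "data", or "gateway"
    detail: str


def increase(before: Snapshot, after: Snapshot, key: str) -> float:
    """Counter increase across the window; a drop is treated as a counter reset."""
    start, end = before.get(key, 0.0), after.get(key, 0.0)
    return end if end < start else end - start


def evaluate(
    before: Snapshot,
    after: Snapshot,
    now_s: float,
    max_gateway_5xx_ratio: float = 0.01,
    min_cert_days: float = 30.0,
    max_authz_denials: float = 10.0,
    max_proxy_mem_ratio: float = 0.9,
) -> list[Alert]:
    alerts: list[Alert] = []

    # Control plane: rejected config, failed pushes, or failed injection mean changes never reach traffic.
    # (Hypothetical keys; e.g. pilot_total_xds_rejects, sidecar_injection_failure_total in Istio.)
    for key, name in (("xds_rejects_total", "XdsConfigRejected"),
                      ("xds_push_errors_total", "XdsPushErrors"),
                      ("injection_failures_total", "SidecarInjectionFailed")):
        n = increase(before, after, key)
        if n > 0:
            alerts.append(Alert(name, "control", f"{n:.0f} new events"))

    # Certificate expiry: alert well before the root cert lapses and mTLS breaks mesh-wide.
    expiry = after.get("root_cert_expiry_timestamp_seconds")
    if expiry is not None:
        days_left = (expiry - now_s) / 86400
        if days_left < min_cert_days:
            alerts.append(Alert("RootCertExpiringSoon", "control", f"{days_left:.1f} days left"))

    # Gateway: a shared choke point, so alert on the 5xx ratio rather than raw counts.
    total = increase(before, after, "gateway_requests_total")
    errors = increase(before, after, "gateway_requests_5xx_total")
    if total > 0 and errors / total > max_gateway_5xx_ratio:
        alerts.append(Alert("GatewayHigh5xx", "gateway", f"5xx ratio {errors / total:.2%}"))

    # Data plane: mTLS/authorization denials above a tolerated baseline, and proxy memory pressure.
    denials = increase(before, after, "authz_denials_total")
    if denials > max_authz_denials:
        alerts.append(Alert("AuthzDenials", "data", f"{denials:.0f} denied requests"))
    mem, limit = after.get("proxy_memory_bytes"), after.get("proxy_memory_limit_bytes")
    if mem is not None and limit and mem / limit > max_proxy_mem_ratio:
        alerts.append(Alert("ProxyMemoryHigh", "data", f"{mem / limit:.0%} of limit"))

    return alerts


if __name__ == "__main__":
    now = 1_700_000_000.0
    before = {"xds_rejects_total": 4.0, "gateway_requests_total": 10_000.0,
              "gateway_requests_5xx_total": 20.0}
    after = {"xds_rejects_total": 6.0, "gateway_requests_total": 20_000.0,
             "gateway_requests_5xx_total": 270.0,
             "root_cert_expiry_timestamp_seconds": now + 10 * 86400}
    for alert in evaluate(before, after, now_s=now):
        print(alert.plane, alert.name, alert.detail)
```

With these sample snapshots the sketch reports two control-plane alerts (two rejected xDS updates, root certificate 10 days from expiry) and one gateway alert (a 2.50% 5xx ratio). Alerting on the gateway ratio instead of the raw 5xx count keeps the threshold meaningful as traffic volume changes.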
