How does Prometheus handle high cardinality, and why is it a problem? [Intermediate]
Answer
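Prometheus copes poorly with high cardinality. Every unique combination of metric name and label values is stored as its own time series, so labels with many distinct values multiply the series count. The result is higher memory, disk, and CPU usage, slower queries, higher remote-write cost, and, in the worst case, an unstable Prometheus server.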
Technical explanation
Cardinality is the number of unique time series, not just the number of metric names.
Labels whose values are unbounded, such as user_id, request_id, session_id, full URL, IP address, or order ID, add a new series for every new value they take.
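To make the multiplication concrete, here is a minimal sketch of the worst-case arithmetic: the maximum series count for one metric is the product of the distinct values of each label. The label names and counts below are made-up numbers for illustration, not measurements.

```python
from math import prod


def max_series(label_value_counts: dict[str, int]) -> int:
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(label_value_counts.values())


# Hypothetical label cardinalities for a metric such as http_requests_total.
bounded = {"method": 5, "status": 8, "endpoint": 40}
# The same metric after someone adds a per-user label (assumed 100,000 users).
unbounded = {**bounded, "user_id": 100_000}

print(max_series(bounded))    # 1600
print(max_series(unbounded))  # 160000000
```

In practice not every combination occurs, so the real count is usually lower, but the bound shows why a single unbounded label dominates everything else.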
Prevention is better than cleanup: enforce metric naming and label-review standards before production.
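One way a label-review standard could be enforced is a small check in CI that flags risky label names before a metric ships. The sketch below is an assumption about how such a check might look; the denylist, the function name, and the declared metrics are all hypothetical, and a real team would substitute its own rules.

```python
# Hypothetical denylist; a real one would come from the team's own standards.
RISKY_LABELS = {"user_id", "request_id", "session_id", "url", "ip", "order_id"}


def risky_labels(metric_labels: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return, per metric, the declared label names that are on the denylist."""
    findings = {}
    for metric, labels in metric_labels.items():
        bad = sorted(set(labels) & RISKY_LABELS)
        if bad:
            findings[metric] = bad
    return findings


# Hypothetical metric declarations collected from the codebase.
declared = {
    "http_requests_total": ["method", "status", "user_id"],
    "queue_depth": ["queue"],
}

print(risky_labels(declared))  # {'http_requests_total': ['user_id']}
```

A name-based check only catches the obvious cases; a label with an innocent name can still carry unbounded values, so it complements review rather than replacing it.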
Hands-on example
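Run topk(20, count by (__name__)({__name__=~'.+'})) to list the 20 metric names with the most series. This query touches every series, so use it sparingly on large servers. To count the distinct values of one label on one metric, use count(count by (label_name) (metric_name)). PromQL has no single query that ranks every label by cardinality, so for that use the TSDB status page (or the /api/v1/status/tsdb endpoint), promtool tsdb analyze, or Mimirtool. Remove offending labels in the instrumentation, or drop them at scrape time with metric_relabel_configs, before the extra series add to memory and retention cost.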
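The TSDB status data can also be read programmatically. The sketch below ranks entries from the JSON returned by Prometheus's /api/v1/status/tsdb endpoint, using its seriesCountByMetricName and labelValueCountByLabelName fields. The helper names and the sample values are assumptions for illustration; the fetch function is defined but not called, so the example runs without a server.

```python
import json
from urllib.request import urlopen


def fetch_tsdb_status(base_url: str) -> dict:
    """Fetch TSDB statistics from a Prometheus server, e.g. http://localhost:9090."""
    with urlopen(f"{base_url}/api/v1/status/tsdb") as resp:
        return json.load(resp)["data"]


def top_entries(stats: dict, key: str, n: int = 3) -> list[tuple[str, int]]:
    """Return the n largest (name, value) pairs from one list in the stats."""
    ranked = sorted(stats.get(key, []), key=lambda e: e["value"], reverse=True)
    return [(e["name"], e["value"]) for e in ranked[:n]]


# Made-up sample shaped like the "data" object of the API response.
sample = {
    "seriesCountByMetricName": [
        {"name": "http_requests_total", "value": 120000},
        {"name": "up", "value": 40},
        {"name": "http_request_duration_seconds_bucket", "value": 900000},
    ],
    "labelValueCountByLabelName": [
        {"name": "user_id", "value": 85000},
        {"name": "method", "value": 5},
    ],
}

# Metrics with the most series, then the label with the most distinct values.
print(top_entries(sample, "seriesCountByMetricName", 2))
print(top_entries(sample, "labelValueCountByLabelName", 1))
```

Against a live server, top_entries(fetch_tsdb_status("http://localhost:9090"), "seriesCountByMetricName") would give the same ranking from real data, assuming the server is reachable at that address.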
More Observability interview questions
- What is observability, and how is it different from traditional monitoring? [Basic]
- What are the three pillars of observability (metrics, logs, traces)? [Basic]
- What is the difference between monitoring and observability in practice? [Basic]
- What are the four golden signals of monitoring? [Basic]
- What is the difference between the USE method and the RED method? [Basic]
- When would you use the USE method versus the RED method? [Basic]
- What is an SLI, an SLO, and an SLA, and how do they relate? [Basic]
- How do you choose good SLIs for a service? [Basic]