How do you prevent a single noisy service from blowing up observability costs for everyone? [Advanced]
Answer
I keep one noisy service from inflating shared observability costs by combining quotas, ownership tags, ingestion limits, cardinality policies, sampling, retention tiers, and review gates. Each service's telemetry cost must also be visible to the team that owns it, so the team generating the spend is the one that sees it.
Technical explanation
Each telemetry stream should include team/service/cost-center attributes so chargeback or showback is possible.
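To make the tagging idea concrete, here is a minimal showback sketch. The attribute names (`team`, `service`, `cost_center`), the record shape, and the `unattributed` bucket are assumptions for illustration, not any particular backend's schema.

```python
from collections import defaultdict

# Hypothetical attribute names; a real platform would pick its own convention.
REQUIRED = ("team", "service", "cost_center")


def missing_attributes(attrs: dict) -> list:
    """Return the required ownership attributes that are absent or empty."""
    return [key for key in REQUIRED if not attrs.get(key)]


def showback(records: list) -> dict:
    """Sum ingested bytes per (team, service).

    `records` is a list of (attributes, size_in_bytes) pairs. Data without
    full ownership tags is grouped under "unattributed" so it stays visible.
    """
    totals = defaultdict(int)
    for attrs, size in records:
        if missing_attributes(attrs):
            key = ("unattributed", attrs.get("service") or "unknown")
        else:
            key = (attrs["team"], attrs["service"])
        totals[key] += size
    return dict(totals)
```

Keeping untagged data in its own bucket, rather than discarding it, gives the platform team a number to chase down when ownership tags are missing.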
Collectors and backends should enforce limits on bytes, points per second, spans per second, and label cardinality.
Noisy services should be throttled or routed to shorter retention rather than degrading the platform for everyone.
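One way to enforce a per-service limit such as spans per second is a token bucket per service, sketched below. The class names, the `(rate, capacity)` limit format, and the explicit `now` argument are assumptions chosen to keep the example small and testable; real collectors expose their own limiter settings.

```python
class TokenBucket:
    """Allows `rate` items per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full burst allowance
        self.last = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class IngestLimiter:
    """One bucket per service, so a noisy service only exhausts its own budget."""

    def __init__(self, limits: dict, default: tuple) -> None:
        self.limits = limits      # service -> (rate, capacity), hypothetical format
        self.default = default    # limit applied to services without an entry
        self.buckets = {}

    def admit(self, service: str, now: float) -> bool:
        bucket = self.buckets.get(service)
        if bucket is None:
            rate, capacity = self.limits.get(service, self.default)
            bucket = TokenBucket(rate, capacity, now)
            self.buckets[service] = bucket
        return bucket.allow(now)
```

Because the buckets are independent, rejecting items from one service never consumes another service's allowance. A rejected item could be dropped, sampled, or routed to a shorter-retention tier, depending on platform policy.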
Hands-on example
Create per-team telemetry budgets. If service=catalog exceeds its metrics cardinality budget, the collector drops the labels that are not on its allow-list and sends a warning to the owning team. If the service's logs exceed their daily quota, DEBUG logs are dropped first, errors are preserved, and the owning team receives a cost report.
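The two budget rules can be sketched roughly as follows. The allow-list contents, the function and class names, and the `hard_factor` second tier (dropping everything below ERROR once usage is well past quota) are assumptions for illustration; the source only states that DEBUG goes first and errors are preserved.

```python
# Hypothetical per-service label allow-list.
ALLOWED_LABELS = {"catalog": {"service", "route", "status"}}


def enforce_labels(service: str, labels: dict, series_count: int, budget: int):
    """Return (labels, warning). Under budget, labels pass through unchanged."""
    if series_count <= budget:
        return labels, None
    allowed = ALLOWED_LABELS.get(service, set())
    kept = {k: v for k, v in labels.items() if k in allowed}
    dropped = sorted(set(labels) - set(kept))
    warning = None
    if dropped:
        warning = f"{service}: over cardinality budget, dropped labels {dropped}"
    return kept, warning


class LogQuota:
    """Daily log budget in bytes: DEBUG is shed first, ERROR is always kept."""

    def __init__(self, daily_bytes: int, hard_factor: float = 1.5) -> None:
        self.daily_bytes = daily_bytes
        self.hard_bytes = daily_bytes * hard_factor  # assumed second tier
        self.used = 0

    def admit(self, level: str, size: int) -> bool:
        if level == "ERROR":
            keep = True                    # errors are preserved
        elif self.used >= self.hard_bytes:
            keep = False                   # far over quota: only errors pass
        elif self.used >= self.daily_bytes:
            keep = level != "DEBUG"        # over quota: drop DEBUG first
        else:
            keep = True
        if keep:
            self.used += size
        return keep
```

For example, `enforce_labels("catalog", {"service": "catalog", "user_id": "42"}, series_count=12000, budget=10000)` would keep only `service` and return a warning naming `user_id`. The `used` counter is also the figure that would feed the owning team's cost report.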