What is the difference between metrics-based and log-based alerting, and what are the cost implications? [Advanced]
Answer
Metrics-based alerting evaluates pre-aggregated numeric time series and is usually cheaper, faster, and more reliable for paging. Log-based alerting searches event data and is useful for rare conditions or specific error patterns, but it can be more expensive and noisy at scale.
Technical explanation
Metrics are compact and purpose-built for alert evaluation, making them ideal for SLO burn, latency, traffic, and saturation.
Logs carry richer context, but alerting on them requires ingesting high volumes of events and running a search over them on every evaluation.
Use log alerts sparingly for conditions that cannot be represented safely as metrics, such as specific audit violations or unique fatal error signatures.
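To make the cost gap concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it is a hypothetical assumption (series count, scrape interval, event rate, event size, and the 16 bytes per uncompressed sample), and the two helper functions are illustrative, not part of any vendor's pricing model. It only compares how much raw data one alert evaluation has to read.

```python
def metric_eval_samples(series_count: int, window_s: int, scrape_interval_s: int) -> int:
    """Samples read by one evaluation of a metric rule over a lookback window."""
    return series_count * (window_s // scrape_interval_s)


def log_eval_bytes(events_per_s: float, avg_event_bytes: int, window_s: int) -> float:
    """Raw bytes a log search must cover for one evaluation over the same window."""
    return events_per_s * avg_event_bytes * window_s


# Hypothetical workload: 50 time series scraped every 15 s, 5-minute window.
samples = metric_eval_samples(series_count=50, window_s=300, scrape_interval_s=15)

# Assumption: 16 bytes per uncompressed sample (8-byte timestamp + 8-byte float).
metric_bytes = samples * 16

# Hypothetical log stream: 2,000 events/s at 500 bytes each, same 5-minute window.
log_bytes = log_eval_bytes(events_per_s=2000, avg_event_bytes=500, window_s=300)

ratio = log_bytes / metric_bytes
print(f"metric rule reads ~{metric_bytes} bytes per evaluation")
print(f"log search covers ~{log_bytes:.0f} bytes per evaluation")
print(f"ratio: ~{ratio:.0f}x")
```

Real systems compress samples and index logs, so actual ratios differ, but the shape of the comparison is why broad log searches on a tight schedule tend to dominate alerting cost.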
Hands-on example
Example: page on Prometheus error-budget burn for the checkout service. Create a separate, lower-volume Splunk alert for a specific security pattern, such as repeated admin login failures from one IP. Do not search all application logs every minute and page on generic 'error' matches.
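The error-budget burn condition can be sketched in plain Python to show the arithmetic behind the page. This is a minimal illustration, not a Prometheus rule: the function names, the 99.9% SLO target, and the request counts are hypothetical. The 14.4 threshold with a long and a short window follows the commonly cited multi-window burn-rate pattern and is assumed here rather than taken from the text above.

```python
def burn_rate(errors: int, total: int, slo_target: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    if total == 0:
        return 0.0  # no traffic, nothing burned
    error_ratio = errors / total
    budget = 1.0 - slo_target  # allowed error ratio, e.g. 0.001 for 99.9%
    return error_ratio / budget


def should_page(long_window: tuple[int, int],
                short_window: tuple[int, int],
                slo_target: float = 0.999,
                threshold: float = 14.4) -> bool:
    """Page only if both windows burn faster than the threshold.

    Each window is an (errors, total) pair of request counts. Requiring the
    short window as well keeps the alert from firing after the problem has
    already stopped.
    """
    long_burn = burn_rate(*long_window, slo_target)
    short_burn = burn_rate(*short_window, slo_target)
    return long_burn > threshold and short_burn > threshold


# Hypothetical checkout traffic: (errors, total) over 1 hour and over 5 minutes.
ongoing = should_page(long_window=(2000, 100_000), short_window=(200, 10_000))
recovered = should_page(long_window=(2000, 100_000), short_window=(5, 10_000))
print(ongoing, recovered)
```

In the first call both windows show a 2% error ratio against a 0.1% budget, so the alert pages. In the second, the short window has recovered, so it does not.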