Interview Observability

What is the difference between metrics-based and log-based alerting, and what are the cost implications? [Advanced]

Answer

Metrics-based alerting evaluates pre-aggregated numeric time series and is usually cheaper, faster, and more reliable for paging. Log-based alerting searches event data and is useful for rare conditions or specific error patterns, but it can be more expensive and noisy at scale.

Technical explanation

Metrics are compact and purpose-built for alert evaluation, making them ideal for SLO burn, latency, traffic, and saturation.
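To illustrate how little data a metrics-based alert needs, here is a minimal sketch of an SLO burn-rate check in Python. It assumes only two counters per window (errors and total requests). The 99.9% target and the 14.4 threshold are illustrative assumptions borrowed from a commonly cited multi-window pattern, not values from this answer, and the function names are hypothetical.

```python
def burn_rate(errors: float, total: float, slo_target: float) -> float:
    """Return how fast the error budget is consumed (1.0 = exactly on budget)."""
    if total <= 0:
        return 0.0  # no traffic in the window, so nothing was burned
    error_budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% SLO
    return (errors / total) / error_budget


def should_page(long_window, short_window, slo_target=0.999, threshold=14.4):
    """Page only if both windows burn faster than the threshold.

    Each window is an (errors, total) pair. The defaults are assumptions
    chosen for illustration; tune them to your own SLO and paging policy.
    """
    long_burn = burn_rate(long_window[0], long_window[1], slo_target)
    short_burn = burn_rate(short_window[0], short_window[1], slo_target)
    # Requiring the short window too stops paging once the problem has ended.
    return long_burn >= threshold and short_burn >= threshold


if __name__ == "__main__":
    # Hypothetical counts: (errors, total) over a long and a short window.
    print(should_page((200, 10_000), (20, 1_000)))
```

The whole evaluation is a handful of arithmetic operations on pre-aggregated counters, which is why this style of alert tends to be cheap and fast to evaluate.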

Logs carry richer context, but alerting on them means ingesting and searching high event volumes, so cost grows with log volume and search frequency.

Use log alerts sparingly for conditions that cannot be represented safely as metrics, such as specific audit violations or unique fatal error signatures.
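The cost difference can be made concrete with back-of-envelope arithmetic. The sketch below is a deliberately crude model, not how any particular vendor bills: it compares the raw bytes one alert rule would touch per day, ignoring indexing, compression, and caching. Every input (event rate, event size, series count, sample size) is a hypothetical parameter you would replace with your own numbers.

```python
def daily_log_scan_bytes(events_per_sec, avg_event_bytes, lookback_sec, evals_per_day):
    """Rough bytes scanned per day by one log search over its lookback window."""
    return events_per_sec * avg_event_bytes * lookback_sec * evals_per_day


def daily_metric_read_bytes(series, scrape_interval_sec, bytes_per_sample,
                            lookback_sec, evals_per_day):
    """Rough bytes read per day by one metric rule over the same lookback."""
    samples_per_series = lookback_sec // scrape_interval_sec
    return series * samples_per_series * bytes_per_sample * evals_per_day


if __name__ == "__main__":
    # Assumed workload: evaluate once a minute with a 5-minute lookback.
    evals = 24 * 60
    logs = daily_log_scan_bytes(events_per_sec=5_000, avg_event_bytes=500,
                                lookback_sec=300, evals_per_day=evals)
    metrics = daily_metric_read_bytes(series=200, scrape_interval_sec=15,
                                      bytes_per_sample=16, lookback_sec=300,
                                      evals_per_day=evals)
    print(f"log search:  {logs:,} bytes/day")
    print(f"metric rule: {metrics:,} bytes/day")
```

The log figure scales with traffic, while the metric figure scales with the number of series, which usually stays roughly constant as traffic grows. That asymmetry is the main reason to keep log alerts narrow and infrequent.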

Hands-on example

Example: page on Prometheus error-budget burn for checkout. Create a lower-volume Splunk alert for a specific security pattern, such as repeated admin login failures from one IP. Do not search all application logs every minute for a generic 'error' string and page on the results.
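The login-failure pattern can be sketched as a sliding-window count per source IP. The Python below is not Splunk search syntax; it only illustrates the logic such a narrow log alert would implement. The function name, the threshold of 5 failures, and the 300-second window are hypothetical, and the input is assumed to be already filtered to failed admin logins and sorted by time.

```python
from collections import defaultdict, deque


def find_repeat_offenders(events, threshold=5, window_sec=300):
    """Return IPs with at least `threshold` failures inside any `window_sec` span.

    `events` is an iterable of (timestamp_sec, source_ip) tuples for failed
    admin logins, sorted by timestamp (an assumption of this sketch).
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    offenders = set()
    for ts, ip in events:
        window = recent[ip]
        window.append(ts)
        # Drop failures that have aged out of the window for this IP.
        while ts - window[0] > window_sec:
            window.popleft()
        if len(window) >= threshold:
            offenders.add(ip)
    return offenders


if __name__ == "__main__":
    # Hypothetical events: one IP fails 5 times in 40 seconds, another twice.
    sample = [(0, "10.0.0.1"), (5, "10.0.0.2"), (10, "10.0.0.1"),
              (20, "10.0.0.1"), (30, "10.0.0.1"), (40, "10.0.0.1"),
              (400, "10.0.0.2")]
    print(find_repeat_offenders(sample))
```

Because the search is restricted to one event type and one field, it touches far less data than a generic scan of all application logs.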
