How do you scale Prometheus for long-term storage and high availability (Thanos, Cortex, Mimir)? [Intermediate]

Question

Accepted Answer

To scale Prometheus for long-term storage and HA, I run at least two identical Prometheus replicas per shard, each scraping the same targets, ship their data off the node through remote write or a Thanos sidecar, and query long-term data through a system such as Thanos, Cortex, or Mimir. The exact choice depends on tenancy, scale, and operational model. A single Prometheus server does not cluster, so horizontal scale usually means functional sharding (splitting targets across servers, for example by job or team) combined with federation or a remote-write architecture. Thanos wraps existing Prometheus servers: a sidecar uploads TSDB blocks to object storage, and separate components provide global querying, compaction, and deduplication of replica data. Cortex and Mimir are horizontally scalable, multi-tenant metrics backends designed for remote-write ingestion and large-scale querying.
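The two ideas this answer leans on, sharding targets across Prometheus servers and deduplicating samples from HA replicas at query time, can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: the function names `shard_for` and `dedupe`, the `replica` label name, and the sample layout are all hypothetical, the hash is not guaranteed to match what Prometheus computes, and real deduplication in Thanos or Mimir is more involved because replicas scrape at slightly different timestamps.

```python
import hashlib
from collections import defaultdict


def shard_for(target: str, num_shards: int) -> int:
    """Map a scrape target to a shard by hashing its address.

    Illustrative only: similar in spirit to hash-based sharding, but not
    claimed to reproduce the shard numbers any real system would compute.
    """
    digest = hashlib.md5(target.encode("utf-8")).digest()
    return int.from_bytes(digest[-8:], "big") % num_shards


def dedupe(samples, replica_label="replica"):
    """Merge samples from HA replicas into one series per label set.

    samples: iterable of (labels dict, timestamp, value).
    Assumption: replicas report aligned timestamps, so the first sample
    seen for a (labels without replica, timestamp) pair wins and a gap in
    one replica is filled by the other.
    """
    merged = {}
    for labels, ts, value in samples:
        series_key = tuple(
            sorted((k, v) for k, v in labels.items() if k != replica_label)
        )
        merged.setdefault((series_key, ts), value)

    series = defaultdict(list)
    for (series_key, ts), value in merged.items():
        series[series_key].append((ts, value))
    return {key: sorted(points) for key, points in series.items()}


if __name__ == "__main__":
    # Hypothetical targets split across three shards.
    for target in ["node-1:9100", "node-2:9100", "node-3:9100"]:
        print(target, "-> shard", shard_for(target, 3))

    # Replica "a" missed the scrape at t=1015; replica "b" fills the gap.
    samples = [
        ({"__name__": "up", "job": "api", "replica": "a"}, 1000, 1.0),
        ({"__name__": "up", "job": "api", "replica": "b"}, 1000, 1.0),
        ({"__name__": "up", "job": "api", "replica": "b"}, 1015, 1.0),
    ]
    print(dedupe(samples))
```

The point of the sketch is the shape of the design: each shard owns a stable subset of targets, each shard runs two replicas that differ only in a replica label, and the query layer drops that label so either replica can cover for the other.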


