
Every 10 seconds, a new data point. Every server, every container, every endpoint. The data never stops arriving.

Traditional databases weren't built for this relentless tide—they were built for data that sits still long enough to be indexed. Time-series databases exist because monitoring a modern system means ingesting millions of measurements per second, querying them by time ranges, and throwing away the old ones before storage costs explode.

What Makes Time-Series Data Different

Time-series data follows patterns that traditional databases handle poorly:

It arrives in time order and gets queried by time ranges. You don't ask "show me the row where id=47293"—you ask "show me CPU usage for the last hour." The entire access pattern is temporal.

It's append-only. New data streams in constantly. Old data is rarely updated, never deleted manually—it ages out according to policy. This is fundamentally different from transactional data where rows get modified.

It arrives at enormous volume. A single Kubernetes cluster might generate thousands of metrics per second. A fleet of IoT sensors, millions. The write path must be relentlessly optimized.

It demands aggregation. Nobody wants to see 360 individual CPU measurements from the last hour. They want the average, the maximum, the 95th percentile. Queries aggregate rather than enumerate.

Time-series databases are purpose-built for these patterns. General-purpose databases can store timestamped data, but they'll struggle under the load and query patterns that time-series workloads demand.

How They Achieve Performance

Storage That Understands Time

Time-series databases exploit the predictability of their data:

Columnar storage keeps timestamps and values in separate columns. A query that filters by time range scans the timestamp column to locate the relevant rows, then reads only the value columns it actually needs.

Compression exploits patterns. Timestamps arrive at regular intervals—store the delta from the expected time, not the full timestamp. Values often change gradually—store the difference from the previous value. These techniques routinely achieve 10:1 to 100:1 compression.

Time-based partitioning divides data into chunks (hourly, daily). A query for the last hour touches one partition, not the entire dataset.
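
A minimal sketch of the delta encoding described above, in Python; the fixed 10-second interval and function names are assumptions for illustration, not any particular database's on-disk format:

# Delta encoding: store how much each timestamp deviates from the expected
# arrival interval. Regular 10-second scrapes encode as long runs of zeros,
# which compress extremely well.
def encode_timestamps(timestamps, expected_interval=10):
    deltas = []
    previous = timestamps[0]
    for ts in timestamps[1:]:
        # Deviation from the expected interval; 0 when data arrives on schedule.
        deltas.append((ts - previous) - expected_interval)
        previous = ts
    return timestamps[0], deltas  # first timestamp plus tiny deltas

# Values that change gradually are stored as differences from the previous value.
def encode_values(values):
    return values[0], [b - a for a, b in zip(values, values[1:])]

first_ts, ts_deltas = encode_timestamps([1000, 1010, 1020, 1031, 1040])
print(ts_deltas)  # [0, 0, 1, -1] -- mostly zeros, cheap to store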

Writes Optimized for Volume

To handle millions of writes per second:

Batching buffers incoming points and writes them together. Individual point writes would kill throughput.

Write-ahead logging ensures durability without blocking. Data hits the log immediately, gets organized into storage structures later.

In-memory buffering keeps recent data in RAM. The hot path—recent metrics being actively queried—never hits disk.
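
A rough sketch of that write path; the flush threshold, log format, and persist_batch helper are hypothetical, chosen only to show the shape of the pipeline:

import json

class WriteBuffer:
    """Toy ingest path: append to a write-ahead log for durability,
    keep recent points in memory, and hand them to storage in batches."""

    def __init__(self, wal_path, flush_threshold=5000):
        self.wal = open(wal_path, "a")            # durability first
        self.buffer = []                          # hot, recent data stays in RAM
        self.flush_threshold = flush_threshold

    def write(self, point):
        self.wal.write(json.dumps(point) + "\n")  # log immediately
        self.wal.flush()
        self.buffer.append(point)                 # organize into storage later
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # A real database would build compressed, columnar blocks here;
        # this sketch just hands the batch off and clears the buffer.
        batch, self.buffer = self.buffer, []
        persist_batch(batch)                      # hypothetical storage call

def persist_batch(batch):
    pass  # placeholder for the storage engine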

Queries That Speak Time

SELECT avg(cpu_usage)
FROM metrics
WHERE host = 'server1'
  AND time > now() - 1h
GROUP BY time(5m)

This returns average CPU usage for the last hour in 5-minute buckets. The query language understands time natively—now() - 1h and GROUP BY time(5m) are first-class operations, not awkward workarounds.

The Data Model

Time-series data organizes around four concepts:

Metric name: What you're measuring (http_request_duration_seconds)

Tags/labels: Dimensions for filtering (method=GET, endpoint=/api/users, status=200)

Value: The measurement itself (0.045)

Timestamp: When it was recorded (2024-01-15T10:30:45Z)

Tags enable the queries that matter: "Show me request duration for GET requests." "Show me request duration grouped by region." "Show me request duration where status is 5xx." Without tags, you'd have a giant undifferentiated pile of numbers.
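
One way to picture a single data point and its series identity in code; the field names here are assumptions for illustration, not any specific database's wire format:

from dataclasses import dataclass

@dataclass(frozen=True)
class DataPoint:
    metric: str        # what you're measuring
    tags: tuple        # sorted (key, value) pairs for filtering
    value: float       # the measurement itself
    timestamp: str     # when it was recorded

point = DataPoint(
    metric="http_request_duration_seconds",
    tags=(("endpoint", "/api/users"), ("method", "GET"), ("status", "200")),
    value=0.045,
    timestamp="2024-01-15T10:30:45Z",
)

# Every unique (metric, tags) combination identifies one time series.
series_key = (point.metric, point.tags)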

The Cardinality Trap

Here's where time-series databases have a dangerous failure mode.

Every unique combination of tags creates a separate time series. If you have 3 datacenters and 50 services, that's 150 time series for a given metric. Manageable.

But tag a metric with user_id and, with 10 million users, you've just created 10 million separate time series. Your database doesn't store one metric anymore—it stores 10 million. Tag with request_id and you're creating billions.

High cardinality—too many unique tag combinations—overwhelms time-series databases. Their indexes bloat. Their memory consumption explodes. Queries slow to a crawl.

The rule: tags should have bounded, low cardinality. datacenter (3 values), service (50 values), http_method (5 values)—good. user_id (millions of values), trace_id (unbounded)—catastrophic.
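
The arithmetic behind the trap, as a sketch; the tag counts mirror the examples above:

from math import prod

# One metric's series count is roughly the product of its tags' distinct values.
def series_count(tag_cardinalities):
    return prod(tag_cardinalities.values())

print(series_count({"datacenter": 3, "service": 50}))        # 150 -- manageable
print(series_count({"datacenter": 3, "service": 50,
                    "user_id": 10_000_000}))                 # 1,500,000,000 -- catastrophic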

Retention and Downsampling

Time-series data has a natural lifecycle: recent data is precious, old data is disposable.

Retention policies automatically delete aged data:

  • Keep 1-second granularity for 7 days
  • Keep 1-minute granularity for 30 days
  • Keep 1-hour granularity for 1 year
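
A sketch of how such a policy might be enforced, assuming data is stored in time-based partitions tagged with their granularity; the tuple layout is invented for illustration:

from datetime import datetime, timedelta, timezone

# Mirrors the policy above: granularity -> how long to keep it.
RETENTION = {
    "1s": timedelta(days=7),
    "1m": timedelta(days=30),
    "1h": timedelta(days=365),
}

def expired_partitions(partitions, now=None):
    """partitions: list of (granularity, partition_end_time) tuples.
    Returns the partitions that have aged out and can be dropped wholesale."""
    now = now or datetime.now(timezone.utc)
    return [(g, end) for g, end in partitions if now - end > RETENTION[g]]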

Downsampling compresses old data before deletion:

  • Raw data arrives at 10-second intervals
  • After 1 day, aggregate to 1-minute averages
  • After 30 days, aggregate to 1-hour averages
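
A minimal sketch of that first downsampling step, rolling raw 10-second points up into 1-minute averages:

from collections import defaultdict

def downsample_to_minutes(points):
    """points: iterable of (unix_timestamp, value) pairs at 10-second resolution.
    Returns (minute_start, average) pairs -- the detail is gone, the trend stays."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % 60].append(value)      # floor each timestamp to its minute
    return [(minute, sum(vals) / len(vals)) for minute, vals in sorted(buckets.items())]

raw = [(t, 0.5) for t in range(0, 120, 10)]      # two minutes of 10-second samples
print(downsample_to_minutes(raw))                # [(0, 0.5), (60, 0.5)]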

You lose the ability to see that CPU spike at 10:47:30 three months ago. You keep the ability to see that CPU trended upward throughout Q3. The tradeoff is usually worth it.

Common Time-Series Databases

Prometheus

The standard for Kubernetes and cloud-native monitoring. Pull-based—it scrapes metrics from your services rather than receiving pushes. Its query language, PromQL, is powerful but has a learning curve.
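
In practice, a service exposes an HTTP endpoint that Prometheus scrapes on its own schedule. With the official Python client, prometheus_client, a minimal exporter looks roughly like this (the port and metric names are arbitrary):

import random, time
from prometheus_client import Counter, Gauge, start_http_server

# Example metrics; Prometheus scrapes them from the /metrics endpoint.
REQUESTS = Counter("app_requests_total", "Total requests handled", ["method"])
CPU = Gauge("app_cpu_usage", "Current CPU usage fraction")

if __name__ == "__main__":
    start_http_server(8000)                  # exposes http://localhost:8000/metrics
    while True:
        REQUESTS.labels(method="GET").inc()  # counters only go up
        CPU.set(random.random())             # gauges can go up or down
        time.sleep(10)                       # Prometheus pulls on its own schedule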

Limitations: single-node architecture means no built-in clustering. Long-term storage requires external solutions like Thanos or Cortex.

InfluxDB

Purpose-built for time-series with SQL-like queries. High write throughput, built-in retention policies. The open-source version is single-node; clustering requires the enterprise license.

TimescaleDB

A PostgreSQL extension that adds time-series capabilities. You get full SQL, the PostgreSQL ecosystem, and automatic time-based partitioning. The tradeoff: not quite as optimized as purpose-built solutions.

Cloud Managed

Amazon Timestream, Google Cloud Monitoring, Azure Time Series Insights. Fully managed, integrated with their respective platforms. The usual tradeoffs: convenience versus vendor lock-in and cost.

Query Patterns

The queries that matter in monitoring:

Time-range aggregation—the bread and butter:

SELECT avg(cpu_usage)
FROM metrics
WHERE time > now() - 6h
GROUP BY time(15m)

Filtering by tags:

SELECT avg(response_time)
FROM metrics
WHERE service = 'api' AND region = 'us-east'
  AND time > now() - 1h

Percentiles for latency analysis:

SELECT percentile(response_time, 95)
FROM metrics
WHERE time > now() - 1h
GROUP BY time(5m)

Rates calculated from counters:

SELECT rate(http_requests[5m])
FROM metrics
WHERE time > now() - 1h
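
Conceptually, a rate query turns a monotonically increasing counter into per-second throughput. A simplified sketch, with only a crude nod to the counter resets a real engine handles more carefully:

def counter_rate(samples):
    """samples: [(unix_timestamp, counter_value), ...] in time order.
    Returns the average increase per second across the window."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if v1 < v0:        # counter reset (e.g. process restart); real engines
        v0 = 0         # reconstruct the true increase more carefully than this
    return (v1 - v0) / (t1 - t0)

print(counter_rate([(0, 100), (300, 1600)]))   # 5.0 requests/second over 5 minutes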

The Foundation of Observability

Time-series databases underpin modern monitoring:

Metrics collection: Applications, servers, and infrastructure continuously export measurements.

Dashboards: Grafana and similar tools query the database to render real-time visualizations.

Alerting: Rules evaluate continuously—"alert if average CPU exceeds 80% for 5 minutes."

SLO tracking: Calculate success rates, latency percentiles, availability—the numbers that define whether your system is healthy.
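
That alerting rule, reduced to a sketch; the threshold, window, and sample interval mirror the example above:

def should_alert(cpu_samples, threshold=0.80, window_seconds=300, interval=10):
    """cpu_samples: recent CPU readings, newest last, one every `interval` seconds.
    Fires when the average over the last window exceeds the threshold."""
    window = cpu_samples[-(window_seconds // interval):]   # last 5 minutes of points
    return sum(window) / len(window) > threshold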

Without efficient time-series storage, none of this works at scale. The billions of data points generated by distributed systems need a home designed specifically for their patterns.

Key Takeaways

  • Time-series databases optimize for append-only writes, time-range queries, and automatic retention—patterns that overwhelm traditional databases
  • Compression, columnar storage, and time-based partitioning achieve 10-100x space savings
  • The data model combines metric names, tags for filtering, values, and timestamps—but high-cardinality tags (like user IDs) can bring the system down
  • Retention policies and downsampling balance recent detail against historical storage costs
  • Prometheus dominates Kubernetes monitoring; InfluxDB and TimescaleDB serve different niches; cloud options trade flexibility for convenience
