
Updated 8 hours ago

When a service promises 99.9% uptime, what are they actually promising?

There are 525,600 minutes in a year. At 99.9% uptime, 526 of those minutes—8 hours and 46 minutes—can be downtime. Your service could vanish for an entire business day, once a year, and still hit the target.

That's the math. But the math hides almost everything that matters.

The Nines

Uptime is measured in "nines": the number of 9s in the percentage. Each additional nine cuts the allowed downtime by a factor of ten:

| Uptime | Name | Annual downtime | Monthly downtime |
|---|---|---|---|
| 99% | Two nines | 87.6 hours (3.65 days) | 7.3 hours |
| 99.5% | Two and a half nines | 43.8 hours (1.8 days) | 3.65 hours |
| 99.9% | Three nines | 8.76 hours | 43.8 minutes |
| 99.95% | Three and a half nines | 4.38 hours | 21.9 minutes |
| 99.99% | Four nines | 52.6 minutes | 4.4 minutes |
| 99.999% | Five nines | 5.26 minutes | 26.3 seconds |
| 99.9999% | Six nines | 31.5 seconds | 2.6 seconds |
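The figures in the table are plain arithmetic, and it can help to see it spelled out. Here is a minimal sketch, assuming a 365-day year and an average month of one twelfth of that; the function name `allowed_downtime_minutes` is made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def allowed_downtime_minutes(uptime_percent, period_minutes=MINUTES_PER_YEAR):
    """Minutes of downtime permitted in a period at the given uptime target."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_minutes * (100 - uptime_percent) / 100


for target in (99.0, 99.9, 99.99, 99.999):
    per_year = allowed_downtime_minutes(target)
    # Assumption: an "average month" is one twelfth of a 365-day year.
    per_month = allowed_downtime_minutes(target, MINUTES_PER_YEAR / 12)
    print(f"{target}%: {per_year:.1f} min/year, {per_month:.2f} min/month")
```

Pass a different `period_minutes` (a 30-day month, a quarter) to match however a given provider defines its measurement window.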

Two nines is barely trying. Five nines is extraordinary. Six nines borders on theoretical.

What the Percentage Hides

Here's what 99.9% uptime doesn't tell you:

When the downtime happens. 43 minutes at 3 AM on a Tuesday affects almost no one. 43 minutes at 2 PM on Black Friday destroys your quarter. Same percentage. Vastly different consequences.

How the downtime is distributed. 99.9% could be one terrible Monday in March, or 526 unremarkable minutes scattered across the year. The number is identical. The experience is not.

Whether it was actually down. Different providers define "downtime" differently. Some count only complete unavailability. Others count severe degradation—a site that takes 30 seconds to load. Some exclude planned maintenance entirely, giving themselves hours of "free" downtime that doesn't affect the percentage.

Where it was down. A service might achieve 99.99% in North America while delivering 99% in Southeast Asia. Global percentages hide regional failures.
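To see how a global figure can bury a regional one, here is a rough sketch that blends regional uptime by traffic share. The weighting scheme and the 90/10 traffic split are assumptions for illustration; real providers may aggregate differently.

```python
def blended_uptime_percent(regions):
    """Traffic-weighted average of regional uptime.

    regions: list of (traffic_share, uptime_percent); shares must sum to 1.
    """
    total_share = sum(share for share, _ in regions)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * uptime for share, uptime in regions)


# Hypothetical split: 90% of traffic in North America, 10% in Southeast Asia.
regions = [(0.9, 99.99), (0.1, 99.0)]
overall = blended_uptime_percent(regions)
print(f"Global uptime: {overall:.3f}%")
```

Under these assumptions the headline number lands just under three nines, while one in ten users lives with two.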

How bad the degradation was. A service responding in 10 seconds instead of 100 milliseconds isn't technically "down." It's also nearly useless. Uptime percentages are blind to this.

The Cost of Each Nine

As a rule of thumb, each additional nine costs roughly ten times as much as the previous one.

Two nines (99%): A single well-maintained server with occasional restarts.

Three nines (99.9%): Redundant systems with automated failover. When something fails, something else takes over automatically. You need monitoring that detects problems in minutes, not hours.

Four nines (99.99%): Multiple data centers in different locations. Redundancy at every layer—power, network, storage, compute. On-call engineers ready to respond within minutes. The architecture must survive any single component failure without users noticing.

Five nines (99.999%): The system must survive entire data center failures. Automatic failover in seconds. Teams of engineers operating around the clock. At this level, you're engineering against scenarios most organizations never consider.

Six nines (99.9999%): Near-mythical reliability. 31 seconds of annual downtime. The engineering required is so extreme that few services genuinely achieve it. Those that do spend staggering sums.

If three nines costs $100,000 annually to maintain, four nines might cost $1,000,000. Five nines might cost $10,000,000. The curve is unforgiving.
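Why does redundancy buy nines at all? A simple probability model gives the intuition. This is a sketch under two loud assumptions that rarely hold perfectly in practice: replicas fail independently, and failover is instant. The function names are illustrative.

```python
def parallel_availability(replica_availability, replicas):
    """Availability of N redundant replicas where any one can serve traffic.

    Assumes independent failures and instant failover.
    """
    return 1 - (1 - replica_availability) ** replicas


def serial_availability(component_availabilities):
    """Availability of a chain where every component must be up."""
    result = 1.0
    for availability in component_availabilities:
        result *= availability
    return result


# Two independent two-nines replicas behave like a four-nines system...
print(f"{parallel_availability(0.99, 2):.4%}")
# ...but chaining two three-nines dependencies drops below three nines.
print(f"{serial_availability([0.999, 0.999]):.4%}")
```

Correlated failures (shared power, a bad deploy pushed everywhere at once) break the independence assumption, which is part of why the higher tiers call for separate data centers.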

What Different Services Actually Need

Consumer websites typically target three nines. Nearly nine hours of annual downtime is noticeable but tolerable. Users complain, then return.

E-commerce platforms need three and a half to four nines. When your site generates $100,000 per hour, five minutes of downtime costs $8,300. The math justifies the investment.
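The revenue math is easy to run for your own numbers. A minimal sketch, assuming revenue is lost in direct proportion to downtime and arrives evenly around the clock (a simplification, since timing matters a great deal); the function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24


def downtime_cost(revenue_per_hour, downtime_minutes):
    """Revenue lost during an outage, assuming a flat hourly rate."""
    return revenue_per_hour * downtime_minutes / 60


def annual_exposure(revenue_per_hour, uptime_percent):
    """Revenue at risk per year if the full downtime allowance is used."""
    downtime_hours = HOURS_PER_YEAR * (100 - uptime_percent) / 100
    return revenue_per_hour * downtime_hours


print(f"Five minutes down: ${downtime_cost(100_000, 5):,.0f}")
for target in (99.9, 99.99):
    print(f"{target}% uptime: ${annual_exposure(100_000, target):,.0f} at risk per year")
```

Comparing the exposure at two targets against the cost of moving between them is one way to decide whether the next nine pays for itself.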

Financial services require four nines or higher. Downtime prevents transactions, triggers regulatory scrutiny, and damages trust that takes years to rebuild.

Emergency services—911 systems, hospital infrastructure—need five nines or higher. Downtime isn't an inconvenience. It's potentially fatal.

Reading an SLA

Service Level Agreements formalize uptime promises. The percentage is just the headline. The details determine whether the promise means anything:

What counts as downtime? Only complete unavailability? Degraded performance? Partial outages affecting some users?

How is it measured? Checks every minute might miss brief outages that second-by-second monitoring would catch. Monitoring from one location might miss regional failures.
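The effect of probe frequency is easy to simulate. In this sketch, each failed probe is counted as one full interval of downtime, which is an assumption about how the monitor scores outages; the outage timeline is invented for illustration.

```python
def observed_downtime_seconds(outages, probe_interval, period_seconds):
    """Downtime as seen by a monitor that probes every probe_interval seconds.

    outages: list of (start_second, end_second) half-open intervals.
    Each failed probe is counted as probe_interval seconds of downtime.
    """
    def is_down(second):
        return any(start <= second < end for start, end in outages)

    failed_probes = sum(
        1 for second in range(0, period_seconds, probe_interval) if is_down(second)
    )
    return failed_probes * probe_interval


# Three short outages in one hour, totalling 65 seconds of real downtime.
outages = [(70, 100), (130, 150), (200, 215)]
per_second = observed_downtime_seconds(outages, 1, 3600)
per_minute = observed_downtime_seconds(outages, 60, 3600)
print(f"1s probes saw {per_second}s down; 60s probes saw {per_minute}s down")
```

Here the once-a-minute monitor probes at seconds 0, 60, 120, 180 and so on, and every one of those lands between outages, so it reports a perfect hour.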

What's excluded? Planned maintenance is often carved out. An SLA promising 99.9% "excluding scheduled maintenance windows" is promising less than it appears.
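To put a number on that gap, here is a sketch of the worst-case uptime a user could experience when maintenance is carved out. It assumes the excluded window is also removed from the measured period, which varies by contract; the four-hour window is a made-up example.

```python
def effective_uptime_percent(sla_percent, maintenance_minutes, period_minutes):
    """Worst-case real uptime when scheduled maintenance is excluded from the SLA.

    Assumption: the SLA percentage applies only to the non-maintenance time.
    """
    measured_minutes = period_minutes - maintenance_minutes
    allowed_unplanned = measured_minutes * (100 - sla_percent) / 100
    total_down = maintenance_minutes + allowed_unplanned
    return 100 * (1 - total_down / period_minutes)


THIRTY_DAYS = 30 * 24 * 60  # minutes
# Hypothetical contract: 99.9% excluding a 4-hour monthly maintenance window.
worst_case = effective_uptime_percent(99.9, 4 * 60, THIRTY_DAYS)
print(f"Worst-case experienced uptime: {worst_case:.3f}%")
```

With these inputs, a three-nines promise can coexist with an experience closer to two nines.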

What happens when they miss? Typically service credits—10% off your bill if uptime drops below 99.9%, maybe 50% if it drops below 99%. These credits rarely cover actual damages. If your e-commerce site was down during your biggest sales day, a 10% credit is insulting.
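A tiered credit schedule like the one described can be sketched in a few lines. The tiers below mirror the example figures above and are illustrative, not any particular provider's terms.

```python
# (uptime threshold, credit percent), most severe tier first. Illustrative values.
CREDIT_TIERS = [(99.0, 50), (99.9, 10)]


def service_credit_percent(measured_uptime_percent):
    """Credit owed, as a percentage of the bill, for a measured uptime."""
    for threshold, credit in CREDIT_TIERS:
        if measured_uptime_percent < threshold:
            return credit
    return 0


def credit_amount(monthly_bill, measured_uptime_percent):
    """Credit in currency units for one billing period."""
    return monthly_bill * service_credit_percent(measured_uptime_percent) / 100


# A $2,000 monthly bill and a month that came in at 99.5%.
print(credit_amount(2_000, 99.5))
```

Set that credit beside the revenue lost during the same outage and the mismatch becomes obvious.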

Beyond the Percentage

Because uptime percentages hide so much, modern reliability engineering tracks additional metrics:

Mean Time to Recovery (MTTR) measures how quickly you recover from failures. Two services with identical uptime percentages might have very different failure patterns—one fails rarely but takes hours to fix, while the other fails frequently but recovers in minutes.
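The standard steady-state relationship makes this concrete: availability is the average time a service stays up between failures, divided by that time plus MTTR. The sketch below uses two invented services; the parameter names are illustrative.

```python
def availability_percent(mean_uptime_between_failures_hours, mttr_hours):
    """Steady-state availability from average up time and average repair time."""
    cycle_hours = mean_uptime_between_failures_hours + mttr_hours
    return 100 * mean_uptime_between_failures_hours / cycle_hours


# Service A: fails about once a year and takes 8.76 hours to fix.
rare_but_slow = availability_percent(8751.24, 8.76)
# Service B: fails roughly every four days and recovers in six minutes.
frequent_but_fast = availability_percent(99.9, 0.1)
print(f"A: {rare_but_slow:.3f}%  B: {frequent_but_fast:.3f}%")
```

Both come out at three nines, yet they call for very different investments: A needs faster recovery, B needs fewer failures.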

Error budget reframes the percentage as a budget to spend. At 99.9% uptime, you have 0.1% to "spend" on deployments, experiments, and incidents. When the budget runs low, you slow down and focus on stability.
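An error budget can be tracked with very little code. This is a minimal sketch for a 30-day window; the 25% freeze threshold is a hypothetical policy, not a standard.

```python
def error_budget_minutes(slo_percent, period_minutes):
    """Total downtime the SLO allows in the period."""
    return period_minutes * (100 - slo_percent) / 100


def budget_remaining_fraction(slo_percent, period_minutes, downtime_minutes):
    """Share of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo_percent, period_minutes)
    if budget == 0:
        raise ValueError("a 100% SLO leaves no error budget")
    return (budget - downtime_minutes) / budget


def should_freeze_releases(remaining_fraction, threshold=0.25):
    """Hypothetical policy: pause risky changes when the budget runs low."""
    return remaining_fraction < threshold


THIRTY_DAYS = 30 * 24 * 60  # minutes
remaining = budget_remaining_fraction(99.9, THIRTY_DAYS, downtime_minutes=30)
print(f"{remaining:.0%} of budget left; freeze: {should_freeze_releases(remaining)}")
```

The useful part is the policy, not the arithmetic: the team agrees in advance what happens when the number gets small.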

Service Level Objectives (SLOs) expand beyond uptime to include latency, error rates, and throughput. A service might be "up" but responding so slowly that users abandon it. SLOs capture what uptime percentages miss.
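Here is a small sketch of checking more than one objective against the same traffic. The `Request` record, the 300 ms threshold, and both targets are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float
    ok: bool


def evaluate_slos(requests, latency_threshold_ms=300,
                  latency_target=0.99, success_target=0.999):
    """Compare success and latency ratios against hypothetical SLO targets."""
    total = len(requests)
    if total == 0:
        raise ValueError("no requests to evaluate")
    success_ratio = sum(r.ok for r in requests) / total
    # A request counts as "fast" only if it succeeded within the threshold.
    fast_ratio = sum(
        r.ok and r.latency_ms <= latency_threshold_ms for r in requests
    ) / total
    return {
        "success_ratio": success_ratio,
        "fast_ratio": fast_ratio,
        "meets_success_slo": success_ratio >= success_target,
        "meets_latency_slo": fast_ratio >= latency_target,
    }


# Every request succeeds, but one in ten takes ten seconds.
requests = [Request(100, True)] * 900 + [Request(10_000, True)] * 100
report = evaluate_slos(requests)
print(report)
```

By an uptime-only measure this service is flawless. The latency objective is what flags the problem.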

The percentage is where the conversation starts. It's rarely where the conversation should end.

