Status Pages

Updated 10 hours ago

When your service breaks, customers face an immediate question: Is this me, or is this you?

Without a status page, they're left guessing. They check social media, search for news, contact support, or simply assume your service is unreliable. Each path wastes their time and erodes their trust.

A status page answers their question immediately. More importantly, it transforms their experience from "something is broken and I don't know what's happening" to "something is broken, they know, and they're working on it."

That shift—from powerlessness to informed waiting—is why status pages matter.

What a Status Page Shows

A status page is a public website, hosted separately from your main service, that displays whether your systems are operating normally or experiencing problems.

The page breaks your service into components customers care about: your website, API, mobile apps, authentication, payment processing, email delivery. Each component shows a status—operational, degraded, partial outage, or major outage.

During incidents, the page displays what's affected, what users might experience, and what's being done. Updates appear as the situation evolves.

The Trust Equation

Status pages reduce support burden: instead of each reporting the same problem, hundreds of customers can check the page and see you're already responding.

But the deeper value is trust. Customers can see when things aren't working. They know before you tell them. The question isn't whether they'll discover the problem—it's whether they'll discover it from you or despite you.

Honest status pages build trust during failures. Dishonest ones—or absent ones—destroy trust precisely when it matters most.

Essential Elements

Component Status

Break your service into components that match how customers think about it:

Website
API
Mobile App (iOS / Android)
Authentication
Payment Processing
Email Delivery
Webhooks

If customers distinguish between your free and paid tiers, show them separately. If you operate in multiple regions, show regional status.
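One way to model this is a small status enum plus a rollup that summarizes the whole page. The sketch below is illustrative only: the component names, the regional split, and the "worst component wins" rollup rule are assumptions, not something a particular status page product prescribes.

```python
from enum import IntEnum


class Status(IntEnum):
    """Component states, ordered from least to most severe."""
    OPERATIONAL = 0
    DEGRADED = 1
    PARTIAL_OUTAGE = 2
    MAJOR_OUTAGE = 3


def overall_status(components: dict[str, Status]) -> Status:
    """Return the worst status among all components (assumed rollup rule)."""
    return max(components.values(), default=Status.OPERATIONAL)


# Hypothetical snapshot of a page that shows the API per region.
components = {
    "Website": Status.OPERATIONAL,
    "API (US)": Status.OPERATIONAL,
    "API (EU)": Status.DEGRADED,
    "Payment Processing": Status.OPERATIONAL,
}

print(overall_status(components).name)  # DEGRADED
```

Ordering the enum by severity keeps the rollup to a single `max` call, and adding a region or tier is just another dictionary entry.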

Current Incidents

When incidents occur, they appear prominently with:

A brief, descriptive title
Affected components
What users might experience
Current status (investigating, identified, monitoring, resolved)
Timeline of updates with timestamps
Expected resolution time, if known

Updates appear in reverse chronological order. The most recent information comes first.
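The fields above map naturally onto a small record type. The following is a minimal sketch, assuming hypothetical class and field names and made-up timestamps; treating an incident with no updates as "investigating" is also an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Update:
    posted_at: datetime
    status: str  # investigating, identified, monitoring, or resolved
    message: str


@dataclass
class Incident:
    title: str
    affected_components: list[str]
    user_impact: str
    expected_resolution: Optional[datetime] = None  # left unset when unknown
    updates: list[Update] = field(default_factory=list)

    def timeline(self) -> list[Update]:
        """Updates in reverse chronological order, newest first."""
        return sorted(self.updates, key=lambda u: u.posted_at, reverse=True)

    def current_status(self) -> str:
        """Status carried by the most recent update."""
        # Assumption: an incident with no updates yet counts as "investigating".
        newest = self.timeline()
        return newest[0].status if newest else "investigating"


# Hypothetical incident with two updates.
incident = Incident(
    title="Slow page loads on account history",
    affected_components=["Website"],
    user_impact="Some users may experience slow page loads or timeouts.",
)
incident.updates.append(Update(
    datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    "investigating",
    "We are looking into reports of slow page loads.",
))
incident.updates.append(Update(
    datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc),
    "identified",
    "We have found the cause and are working on a fix.",
))

print(incident.current_status())  # identified
```

Deriving the current status from the newest update means the headline status and the timeline cannot drift apart.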

Incident History

Maintain a historical record of past incidents, typically 90 days or longer. This record shows that when problems happen, you resolve them and communicate about them.

Uptime Metrics

Display uptime percentages over different periods: 24 hours, 7 days, 30 days, 90 days. This quantifies reliability. Customers can see that while an incident occurred today, your service has maintained 99.95% uptime over the past quarter.
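To make the 99.95% figure concrete, the arithmetic can be sketched as follows. The function names are hypothetical, and the calculation assumes uptime is measured as the fraction of minutes in the window with no downtime.

```python
def uptime_percent(window_minutes: float, downtime_minutes: float) -> float:
    """Share of the window during which the service was up, as a percentage."""
    return 100.0 * (window_minutes - downtime_minutes) / window_minutes


def allowed_downtime_minutes(window_minutes: float, target_percent: float) -> float:
    """Downtime budget that still meets the target uptime over the window."""
    return window_minutes * (1 - target_percent / 100)


NINETY_DAYS = 90 * 24 * 60  # 129,600 minutes

# 99.95% over 90 days leaves roughly 64.8 minutes of total downtime.
print(round(allowed_downtime_minutes(NINETY_DAYS, 99.95), 1))  # 64.8
```

By the same arithmetic, a single 65-minute outage would drop a 90-day figure just below 99.95%.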

Subscriptions

Let customers subscribe to updates via email, SMS, RSS, or webhooks. Let them choose which components they care about. A customer using only your API doesn't need notifications about website issues.
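Per-component subscriptions come down to a filter at notification time. This is a minimal sketch with hypothetical names; treating an empty component set as "subscribed to everything" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscriber:
    contact: str
    channel: str  # for example "email", "sms", or "webhook"
    components: frozenset[str] = frozenset()  # empty means all components (assumption)


def recipients(subscribers: list[Subscriber], affected: list[str]) -> list[Subscriber]:
    """Subscribers who follow at least one affected component."""
    affected_set = set(affected)
    return [
        s for s in subscribers
        if not s.components or s.components & affected_set
    ]


# Hypothetical subscribers.
subscribers = [
    Subscriber("api-team@example.com", "email", frozenset({"API"})),
    Subscriber("web-team@example.com", "email", frozenset({"Website"})),
    Subscriber("ops@example.com", "email"),
]

# A website-only incident skips the API-only subscriber.
print([s.contact for s in recipients(subscribers, ["Website"])])
# ['web-team@example.com', 'ops@example.com']
```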

Scheduled Maintenance

Show upcoming maintenance windows separately from incidents. The distinction between unexpected problems and planned changes sets appropriate expectations.

What Makes a Status Page Effective

Host It Separately

If your main infrastructure goes down, customers need to access your status page to understand what's happening. Host it with a different provider or use a dedicated status page service.

Update Within 15 Minutes

Post initial incident notifications within 15 minutes of becoming aware of significant issues. Continue updating regularly throughout, every 30 to 60 minutes depending on severity, even if the update is "no significant changes yet."

Quick updates demonstrate you're on top of problems. Slow updates teach customers your status page can't be trusted.
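A responder tool could enforce these timings with a simple overdue check. The sketch below assumes a 15-minute deadline for the first post and a 30-minute interval afterward; the function name and the fixed interval are illustrative choices.

```python
from datetime import datetime, timedelta, timezone

FIRST_UPDATE_DEADLINE = timedelta(minutes=15)
UPDATE_INTERVAL = timedelta(minutes=30)  # assumed; could be 60 for lower severity


def update_overdue(detected_at: datetime,
                   update_times: list[datetime],
                   now: datetime) -> bool:
    """True when the next customer-facing update is late."""
    if not update_times:
        # Nothing posted yet: measure against the first-update deadline.
        return now - detected_at > FIRST_UPDATE_DEADLINE
    # Otherwise measure from the most recent update.
    return now - max(update_times) > UPDATE_INTERVAL


# Hypothetical incident detected at 12:00 UTC with one update at 12:10.
detected = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
posted = [detected + timedelta(minutes=10)]

print(update_overdue(detected, posted, detected + timedelta(minutes=35)))  # False
print(update_overdue(detected, posted, detected + timedelta(minutes=45)))  # True
```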

Be Honest

If your service is completely down, say so. If 50% of users are affected, don't say "some users."

Customers already know something is wrong. Downplaying the issue damages trust far more than honest disclosure.

Explain Impact, Not Infrastructure

"Database Performance Issue" tells customers nothing. "Some users may experience slow page loads or timeouts when accessing account history" tells them what they need to know.

Write in plain language. Avoid phrases like "isolated impact" or "limited degradation." Be specific about what users might experience.

Set Realistic Expectations

If you don't know when an issue will be resolved, don't guess. Say "We're actively working on resolution and will provide updates every 30 minutes" rather than "We expect to resolve this in 15 minutes" when you're uncertain.

Mark Resolution Clearly

When an incident is resolved, mark it clearly and explain what was done. Don't leave incidents in "monitoring" status indefinitely.

Incident Status Progression

Status pages use a progression of states that communicate where you are in your response:

Investigating: You're aware of reports or indicators of a problem and are actively working to understand what's happening.

Identified: You've determined the cause or understand enough to describe the problem clearly.

Monitoring: You've implemented a fix and are watching to ensure stability before declaring resolution.

Resolved: The incident is over, service is restored, and you're confident it won't immediately recur.

This progression helps customers understand the arc of your response without needing technical details.
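The progression can be sketched as a small state machine. Which transitions are legal is an assumption here: this version lets an incident skip ahead (for example, straight from investigating to resolved) and lets a fix that does not hold move from monitoring back to identified.

```python
# Allowed next states for each incident state (assumed rules, see above).
ALLOWED_TRANSITIONS = {
    "investigating": {"identified", "monitoring", "resolved"},
    "identified": {"monitoring", "resolved"},
    "monitoring": {"identified", "resolved"},
    "resolved": set(),  # terminal: a recurrence should open a new incident
}


def advance(current: str, new: str) -> str:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current!r} to {new!r}")
    return new


state = "investigating"
for step in ["identified", "monitoring", "resolved"]:
    state = advance(state, step)

print(state)  # resolved
```

Making resolved terminal guards against quietly reopening an incident that customers were already told was over.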

Common Mistakes

Late updates: Posting the first update an hour after users start experiencing problems makes your status page feel unreliable. Customers learn they can't trust it and stop checking.

Technical overload: Filling updates with infrastructure details obscures what customers actually need to know.

Inconsistent updates: Promising updates every 30 minutes then disappearing for two hours destroys credibility.

Premature resolution: Marking things operational while issues continue teaches customers your status page lies. That lesson is hard to undo.

Vague language: "Some degradation" doesn't help customers understand whether they're affected.

No status page: The biggest mistake. Customers are left guessing whether problems are on their end or yours.

Automation and Human Judgment

The best status pages integrate with monitoring systems. When monitoring detects an outage, it can automatically post an initial update. When systems recover, the page can reflect that automatically as well.

But full automation requires careful configuration. You don't want every momentary blip creating a status page incident. Some human judgment remains valuable for deciding what rises to customer communication.

Many teams use a hybrid approach: automated alerts notify incident responders, who manually update the status page with appropriate customer-facing language.
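One common way to keep momentary blips from reaching responders is to require several consecutive failed checks before alerting. The sketch below assumes a hypothetical `FlapFilter` class and a threshold of three; real monitoring tools expose similar settings under their own names.

```python
from collections import deque


class FlapFilter:
    """Signal an alert only after `threshold` consecutive failed checks."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.recent = deque(maxlen=threshold)  # keeps only the latest results

    def observe(self, check_passed: bool) -> bool:
        """Record one check result; return True if responders should be alerted."""
        self.recent.append(check_passed)
        window_full = len(self.recent) == self.threshold
        return window_full and not any(self.recent)


flap_filter = FlapFilter(threshold=3)
results = [False, True, False, False, False]  # one blip, then a sustained failure
alerts = [flap_filter.observe(r) for r in results]

print(alerts)  # [False, False, False, False, True]
```

The filter only decides when to notify a person. Writing the customer-facing update stays with the responder, in line with the hybrid approach.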

Internal vs. External

Some organizations maintain separate internal status pages with more granular components, technical metrics, and links to dashboards and runbooks. These serve operational teams.

The public status page remains simplified and customer-focused. The separation allows detailed internal visibility without overwhelming external audiences.
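The mapping from granular internal components to a simplified public view can be sketched as a lookup plus a rollup. Everything named here is hypothetical: the internal service names, the grouping, the "worst internal state wins" rule, and the default of operational for unreported services.

```python
# Status labels ordered from least to most severe.
SEVERITY = ["operational", "degraded", "partial outage", "major outage"]

# Hypothetical grouping of internal services under public components.
PUBLIC_VIEW = {
    "API": ["api-gateway", "rate-limiter", "orders-db"],
    "Authentication": ["auth-service", "session-cache"],
}


def public_status(internal: dict[str, str]) -> dict[str, str]:
    """Roll internal statuses up to the public components."""
    result = {}
    for public_name, internal_names in PUBLIC_VIEW.items():
        # Assumption: a service with no reported status is operational.
        states = [internal.get(name, "operational") for name in internal_names]
        result[public_name] = max(states, key=SEVERITY.index)
    return result


internal = {"orders-db": "degraded", "auth-service": "operational"}
print(public_status(internal))
# {'API': 'degraded', 'Authentication': 'operational'}
```

A strict worst-wins rollup can overstate customer impact, so a team may prefer to treat its output as a suggestion that a responder confirms before publishing.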

