Incident Communication

Updated 10 hours ago

When systems fail, people can't see inside your war room. They can only see what you tell them. Your communication doesn't describe the incident—it IS the incident, as far as they're concerned.

This is why communication failures turn manageable technical problems into reputation disasters, and why good communication can build trust even when everything is broken.

Why Communication Matters

Without information, people fill the vacuum with assumptions—usually worse than reality.

Customer trust depends on feeling heard. Customers experiencing issues want to know you're aware and working on it. Silence feels like indifference. Transparent communication builds trust even during failures.

Team coordination breaks down without shared information. Responders need to share findings, coordinate actions, and avoid duplicating effort. Good communication keeps everyone aligned on what's happening and what's being tried.

Stakeholder decisions require context. Executives and business leaders can't make good decisions—about resources, customer communication, business impact—without understanding what's happening and what's being done.

Internal Communication

Internal communication during incidents follows different patterns than external communication. It's faster, more technical, and serves coordination as much as information.

The Dedicated Channel

When an incident begins, create a dedicated communication channel—a Slack channel, Teams space, or room in your incident management platform.

This channel becomes the source of truth. All incident-related communication happens there. When someone joins the response, they can catch up by reading the channel. When you reconstruct events later, the timeline is preserved.

Naming should be clear and consistent: #incident-2024-12-11-checkout-failure tells people immediately what the channel is for.
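One way to keep names consistent is to generate them rather than type them by hand. The sketch below is illustrative only: the function name `incident_channel_name` and the 80-character limit are assumptions, so check the limit against whatever chat platform you use.

```python
import re
from datetime import date


def incident_channel_name(day: date, summary: str, max_len: int = 80) -> str:
    """Build a channel name like #incident-2024-12-11-checkout-failure."""
    # Lowercase the summary and collapse every run of characters that is
    # not a letter or digit into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", summary.lower()).strip("-")
    name = f"incident-{day.isoformat()}-{slug}"
    # max_len is an assumed platform limit (not counting the leading '#').
    # Trim to it and avoid ending on a hyphen.
    return "#" + name[:max_len].rstrip("-")


print(incident_channel_name(date(2024, 12, 11), "Checkout failure"))
# prints: #incident-2024-12-11-checkout-failure
```

Putting the date first means channels sort chronologically in most channel lists.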

Frequent Updates

During active incidents, update the channel frequently. Even if you have no new information, say so: "Still investigating database performance. No significant changes in the last 10 minutes."

These updates keep everyone informed, show that progress is happening, and provide timestamps for later reconstruction.

Share what you're doing: "Scaling up database instances now." Share what you're seeing: "Error rate dropped from 45% to 12% after the restart." Share what you're trying next: "Going to check load balancer configuration."

Coordination, Not Just Status

Internal communication coordinates action, not just reports status.

"I'm looking at the database. Can someone check application logs?" divides investigation across components.

"Don't restart the web servers yet—let me finish this query first" prevents interference.

"Everyone please stop making changes for 5 minutes while we see if this stabilizes" creates breathing room to assess whether a fix worked.

Different Audiences Need Different Information

Responders need technical details, command coordination, and real-time updates on what everyone is doing.

Engineering managers need high-level status, resource needs, and escalation triggers.

Executives need impact assessment, estimated resolution time, and customer-facing implications.

Support teams need customer-facing information they can use to respond to inquiries.

Consider separate channels for different audiences, or designate people to bridge between detailed incident response and executive updates.
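If incident details are tracked as structured data, the audience split above could be expressed as a simple mapping from audience to the fields that audience needs. This is a minimal sketch; the field names and the `view_for` helper are hypothetical and would need to match your own tooling.

```python
# Hypothetical field names, chosen to mirror the audiences described above.
AUDIENCE_FIELDS = {
    "responders": ["technical_details", "coordination", "live_updates"],
    "engineering_managers": ["status", "resource_needs", "escalation_triggers"],
    "executives": ["impact", "estimated_resolution", "customer_implications"],
    "support": ["customer_facing_info"],
}


def view_for(audience: str, incident: dict) -> dict:
    """Return only the incident fields relevant to the given audience."""
    wanted = AUDIENCE_FIELDS[audience]
    return {key: incident[key] for key in wanted if key in incident}


incident = {
    "technical_details": "Database connection pool exhausted",
    "status": "Investigating",
    "impact": "Checkout failing for some users",
    "estimated_resolution": "Unknown",
    "customer_facing_info": "Some users may be unable to complete purchases",
}

print(view_for("executives", incident))
```

A person bridging between the incident channel and an executive channel could use a view like this as a checklist for what to carry across.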

External Communication

External communication—to customers, users, and the public—requires different considerations. It's slower, less technical, and focused on impact rather than cause.

The First Message

The first external message should come within 15-30 minutes of detecting a significant incident. It doesn't need to explain root causes or provide solutions. It needs to show awareness and commitment:

"We're currently investigating reports of slow performance affecting some users. Our team is actively working on this, and we'll provide updates every 30 minutes."

This message acknowledges the problem, demonstrates you're working on it, and sets expectations for further updates.

Scheduled Updates

Continue updating on a schedule: every 30 minutes or every hour, depending on severity. Stick to this schedule even if you have no significant news.

"Update: We've identified the issue as a database performance problem and are working on mitigation. Next update in 30 minutes or sooner if there's significant progress."

Scheduled updates prevent customers from feeling forgotten. They know when to expect information rather than constantly checking.
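Keeping to a schedule is easier when something tracks the deadline for you. The following sketch shows the arithmetic only; the function names are hypothetical, and in practice this check would feed a reminder in your incident tooling.

```python
from datetime import datetime, timedelta


def next_update_due(last_update: datetime, interval_minutes: int = 30) -> datetime:
    """Return the time by which the next scheduled update should be posted."""
    return last_update + timedelta(minutes=interval_minutes)


def is_update_overdue(last_update: datetime, now: datetime,
                      interval_minutes: int = 30) -> bool:
    """True once the promised interval has passed without a new update."""
    return now > next_update_due(last_update, interval_minutes)


last = datetime(2024, 12, 11, 14, 0)
print(next_update_due(last))                                    # 2024-12-11 14:30:00
print(is_update_overdue(last, datetime(2024, 12, 11, 14, 45)))  # True
```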

What to Include

Current status: What's affected right now? What's working? What's not?

User impact: Who is affected? What symptoms might they see? What can't they do?

Workarounds: If alternative ways to accomplish tasks exist, share them.

Timeline: When did this start? When do you expect resolution?

Next steps: What are you doing? When will you update again?
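The five elements above can be treated as a checklist that a draft must satisfy before it goes out. Here is a minimal sketch using a dataclass; the class name `StatusUpdate` and its field names are assumptions made for illustration, with workarounds treated as optional because they do not always exist.

```python
from dataclasses import dataclass


@dataclass
class StatusUpdate:
    current_status: str
    user_impact: str
    timeline: str
    next_steps: str
    workarounds: str = ""  # optional: only include when one exists

    def render(self) -> str:
        """Render the update as labeled lines, refusing incomplete drafts."""
        required = {
            "current_status": self.current_status,
            "user_impact": self.user_impact,
            "timeline": self.timeline,
            "next_steps": self.next_steps,
        }
        missing = [name for name, text in required.items() if not text.strip()]
        if missing:
            raise ValueError(f"missing required fields: {', '.join(missing)}")
        parts = [
            ("Current status", self.current_status),
            ("User impact", self.user_impact),
            ("Workarounds", self.workarounds),
            ("Timeline", self.timeline),
            ("Next steps", self.next_steps),
        ]
        # Skip empty optional sections instead of printing a blank label.
        return "\n".join(f"{label}: {text}" for label, text in parts if text.strip())


update = StatusUpdate(
    current_status="Checkout is failing for some users. Browsing works normally.",
    user_impact="Affected users see an error when placing an order.",
    timeline="Started at 14:00 UTC. We expect restoration within 30 minutes.",
    next_steps="We are working on mitigation. Next update in 30 minutes.",
)
print(update.render())
```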

What to Avoid

Technical jargon: Customers don't need to know about database query optimization. They need to know whether they can use your service.

Blame: Never blame vendors, partners, or third parties publicly—even if they caused the problem. It's unprofessional and damages relationships.

Absolute promises: "This will definitely be fixed in 30 minutes" sets you up for failure. "We expect restoration within 30 minutes" leaves room for reality.

Minimizing: Don't downplay serious issues. If many customers are affected, acknowledge it. Trying to make it sound minor damages trust when people can see the actual impact.

Speculation: Early in incidents, you often don't fully understand what happened. Don't share preliminary theories that might be wrong.
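Some of these pitfalls can be caught mechanically before a message is sent. The sketch below flags a draft against a few phrase lists. The patterns are hypothetical examples, not a complete or authoritative list, and a check like this supplements human review rather than replacing it.

```python
import re

# Hypothetical phrase lists; tune them to your own product and vocabulary.
CHECKS = {
    "absolute promise": re.compile(r"\b(definitely|guarantee[ds]?|certainly)\b", re.IGNORECASE),
    "technical jargon": re.compile(r"\b(query optimization|load balancer|failover)\b", re.IGNORECASE),
    "blame": re.compile(r"\b(vendor|third[- ]party|provider)\b", re.IGNORECASE),
    "speculation": re.compile(r"\b(probably|we think|might be)\b", re.IGNORECASE),
}


def lint_draft(text: str) -> list[str]:
    """Return the names of all checks that the draft message trips."""
    return [name for name, pattern in CHECKS.items() if pattern.search(text)]


print(lint_draft("This will definitely be fixed in 30 minutes."))  # ['absolute promise']
print(lint_draft("We expect restoration within 30 minutes."))      # []
```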

Tone and Empathy

How you say things matters as much as what you say.

Acknowledge impact: "We know you depend on our service for critical work, and we're sorry for this disruption." This shows you understand incidents cause real problems for real people.

Own the problem: "We're experiencing an issue"—not "There's an issue" or "An issue is occurring." Active voice and clear ownership demonstrate accountability.

Write like a human: "We're sorry this happened"—not "We apologize for any inconvenience this may have caused." Generic, detached language feels disconnected from real impact.

Show progress: "Our team has identified the cause and is implementing a fix" feels better than "We're still working on it." People tolerate problems better when they see movement toward resolution.

Communication Channels

Different channels serve different purposes.

Status pages provide a public, authoritative source. Customers bookmark them specifically to check during problems. Update your status page for any incident affecting customer-facing systems.

In-app messages reach users exactly when they're encountering problems. If your application is partially functional, banners can inform users who are actively trying to use the service.

Email works better for incident summaries and resolution notices than for real-time updates during active incidents. People don't check email often enough for it to be the primary channel.

Social media reaches people who might not check status pages but are discussing your service online. However, angry customers will reply to your updates, and you need to respond constructively.

Post-Resolution Communication

Communication doesn't end when the incident resolves.

Resolution notice: Send a clear message when service is restored. "The incident is resolved. All systems are operating normally. Thank you for your patience." This gives customers permission to stop worrying.

Post-incident summary: Within 24-48 hours after major incidents, share a summary explaining what happened, what was affected, what you did, and what you're doing to prevent recurrence. This demonstrates transparency and commitment to improvement. It's appropriate for major incidents, not every minor issue.

Communication Failures

Disappearing: Starting communication then going silent is worse than never communicating at all. You've taught people to expect updates, then abandoned them. If you say you'll update every 30 minutes, update every 30 minutes.

Contradictory messages: Different channels sharing different information confuses everyone. Coordinate all external communication through a single source.

Premature all-clear: Declaring resolution before confirming stability erodes trust when problems resume. Make sure service is truly restored before announcing resolution.

Defensive language: Getting defensive when customers express frustration makes things worse. Acknowledge feelings, take responsibility, focus on solutions.
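The premature all-clear in particular can be guarded against with a simple rule: only announce resolution after a run of consecutive healthy measurements. This is a sketch under stated assumptions; the function name, the 1% error-rate threshold, and the required count of five samples are illustrative values, not recommendations from any standard.

```python
def safe_to_announce_resolution(error_rates: list[float],
                                threshold: float = 0.01,
                                required: int = 5) -> bool:
    """True only if the most recent `required` samples are all at or below threshold."""
    if required < 1:
        raise ValueError("required must be at least 1")
    recent = error_rates[-required:]
    # Too few samples means stability has not been confirmed yet.
    return len(recent) == required and all(rate <= threshold for rate in recent)


# Error rate falls after a fix, then spikes again: not yet stable.
print(safe_to_announce_resolution([0.45, 0.12, 0.005, 0.2, 0.003], required=3))    # False
# Three consecutive healthy samples: stable enough under this rule.
print(safe_to_announce_resolution([0.45, 0.12, 0.005, 0.004, 0.003], required=3))  # True
```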

Preparation

Good incident communication requires preparation before incidents happen.

Templates: Create templates for common messages—initial notification, ongoing updates, resolution notices, post-incident summaries. Templates provide structure and ensure you include necessary information when you're under pressure.

Roles: Designate who handles different communication tasks. A communications lead handles customer-facing messages, freeing technical responders to focus on resolution.

Approval processes: For major incidents, establish who approves external communications and how quickly. You want oversight without bottlenecks. In many organizations, the incident commander has authority to approve communications directly so that messages go out quickly, with executives informed but not blocking.

Practice: During simulations, practice the communication too. What would you say? Who would write what? Practicing before real incidents removes uncertainty.
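Templates can live in a shared document, but they can also be kept in code so that required details cannot be left out. The sketch below uses Python's standard `string.Template`. The template keys and placeholder names are assumptions, and the wording is taken from the example messages earlier in this article.

```python
from string import Template

# Placeholder names ($symptom, $scope, $interval) are illustrative.
TEMPLATES = {
    "initial": Template(
        "We're currently investigating reports of $symptom affecting $scope. "
        "Our team is actively working on this, and we'll provide updates "
        "every $interval minutes."
    ),
    "resolution": Template(
        "The incident is resolved. All systems are operating normally. "
        "Thank you for your patience."
    ),
}


def render(kind: str, **fields) -> str:
    """Fill in a template; substitute() raises KeyError if a placeholder is missing."""
    return TEMPLATES[kind].substitute(**fields)


print(render("initial", symptom="slow performance", scope="some users", interval=30))
```

Using `substitute` rather than `safe_substitute` is deliberate here: a missing field fails loudly instead of publishing a message with a raw placeholder in it.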
