Every TCP sender is constantly asking the same question: how fast can I go without breaking things for everyone else?
This isn't a rhetorical question. It's the actual problem TCP congestion control solves, billions of times per second, across every connection on the Internet. Without it, the network would collapse within minutes. Understanding how this works explains why some transfers crawl while others fly—and reveals one of the most successful cooperative systems ever built.
The Problem: Too Many Packets, Not Enough Pipes
When too many devices try to send data through the same network path, routers fill up. They have finite buffer space to hold packets waiting for transmission. When the buffers overflow, packets get dropped.
In the early days of TCP, senders would blast data as fast as the receiver could accept it, completely ignoring what the network between them could handle. The result was congestion collapse: the more traffic people sent, the less actually got through. Networks would grind to a halt as packets were sent, dropped, retransmitted, and dropped again.
TCP congestion control is the solution. Each sender monitors the network for signs of congestion and backs off when it detects trouble. This creates a feedback loop that keeps the Internet functional despite billions of simultaneous connections, none of which know about each other.
The Congestion Window: Your Speed Limit
The core mechanism is the congestion window (cwnd)—a limit on how much unacknowledged data a sender can have in flight at once. Think of it as a self-imposed speed limit that each sender adjusts based on network conditions.
This works alongside the receiver's advertised window (how much the receiver is ready to accept). Your actual sending rate is limited by whichever is smaller.
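The "whichever is smaller" rule fits in one line. A minimal sketch (the function name and byte values here are illustrative, not from any real stack):

```python
def effective_window(cwnd: int, rwnd: int) -> int:
    """Usable send window: the sender may keep at most this many
    unacknowledged bytes in flight at once."""
    return min(cwnd, rwnd)

# A fast sender throttled by a slow receiver: rwnd wins.
print(effective_window(cwnd=64_000, rwnd=16_000))  # 16000
```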
The congestion window starts small and grows when things go well. When congestion is detected—usually through packet loss—it shrinks. This dynamic adjustment is how TCP probes for available bandwidth without overwhelming the network.
Slow Start: Exponential Growth (Despite the Name)
The name is misleading. Slow start is actually aggressive—it just looks slow compared to sending everything at once.
When a connection begins, the congestion window starts at a small value, typically 1 to 10 segments. For every acknowledgment received, the window grows by one segment. This creates exponential growth:
- Send 1 packet, get 1 ACK, window becomes 2
- Send 2 packets, get 2 ACKs, window becomes 4
- Send 4 packets, get 4 ACKs, window becomes 8
The window doubles every round-trip time. This continues until either the window hits a threshold called ssthresh (slow start threshold) or packet loss occurs.
This aggressive initial growth lets new connections quickly discover how much bandwidth is available. Starting conservatively and ramping up fast is smarter than starting fast and causing immediate congestion.
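The doubling described above can be sketched as a toy model. This counts whole segments per round trip; real stacks track bytes and individual ACK arrivals, and the `init_cwnd`/`ssthresh` values are illustrative:

```python
def slow_start_rtts(init_cwnd: int, ssthresh: int) -> list[int]:
    """Window size (in segments) at the start of each RTT during
    slow start: the window doubles every round trip until it
    reaches ssthresh, where congestion avoidance takes over."""
    windows = [init_cwnd]
    while windows[-1] < ssthresh:
        windows.append(min(windows[-1] * 2, ssthresh))
    return windows

print(slow_start_rtts(init_cwnd=1, ssthresh=64))
# [1, 2, 4, 8, 16, 32, 64]
```

Six round trips from one segment to sixty-four: that is why even "slow" start finds available bandwidth quickly.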
Congestion Avoidance: The Careful Probe
Once the congestion window reaches ssthresh, TCP shifts from exponential to linear growth. Now it's being careful.
In congestion avoidance, each ACK grows the window by only a small fraction—MSS²/cwnd in the standard formulation—so the net increase is roughly one segment per round-trip time. The connection is still probing for more bandwidth, but cautiously: it's already approaching the point where congestion previously occurred.
This phase continues indefinitely, with the window slowly creeping upward until congestion is detected again.
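The per-ACK increment works out as follows (a sketch of the standard RFC 5681 additive increase; the segment count and MSS value are illustrative):

```python
MSS = 1460  # bytes; a typical Ethernet-sized segment

def on_ack_congestion_avoidance(cwnd: float) -> float:
    """Additive increase: each ACK adds MSS*MSS/cwnd bytes, so a
    full window's worth of ACKs adds about one MSS per RTT."""
    return cwnd + MSS * MSS / cwnd

cwnd = 10 * MSS
for _ in range(10):  # roughly one window's worth of ACKs
    cwnd = on_ack_congestion_avoidance(cwnd)
print(cwnd / MSS)  # just under 11 segments after ~1 RTT
```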
Detecting Congestion: Reading the Signs
TCP primarily detects congestion through packet loss. Two signals matter:
Timeout: The sender waits for an acknowledgment that never comes. This indicates severe congestion or a broken path. It's the nuclear option—TCP assumes the worst, collapses the window back to one segment, and returns to slow start.
Duplicate ACKs: The receiver keeps acknowledging the same sequence number, meaning later packets arrived but an earlier one is missing. Three duplicate ACKs in a row is the standard threshold for assuming packet loss.
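The three-duplicate-ACK rule is simple to model. A simplified sketch (real TCP also checks window updates and SACK information before counting an ACK as a duplicate):

```python
DUP_ACK_THRESHOLD = 3  # the standard fast-retransmit trigger

def first_fast_retransmit(ack_stream):
    """Return the ACK number at which fast retransmit would fire,
    or None. Simplified: a 'duplicate' is any ACK carrying the
    same acknowledgment number as the previous one."""
    last_ack, dups = None, 0
    for ack in ack_stream:
        if ack == last_ack:
            dups += 1
            if dups == DUP_ACK_THRESHOLD:
                return ack
        else:
            last_ack, dups = ack, 0
    return None

# Segment 100 is lost; each later arrival re-triggers ACK 100.
print(first_fast_retransmit([100, 100, 100, 100]))  # 100
print(first_fast_retransmit([100, 200, 300]))       # None
```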
Modern networks also support Explicit Congestion Notification (ECN), where routers mark packets to signal "I'm getting congested" before they start dropping. This lets TCP back off before loss occurs—a gentler signal that produces smoother behavior.
Fast Retransmit and Fast Recovery: The Smart Response
When TCP sees three duplicate ACKs, something interesting happens. Those ACKs mean the network is still working—packets are getting through, just not all of them. This is different from a timeout, where nothing is getting through.
Fast retransmit immediately resends the missing packet without waiting for a timeout. Why wait when you already know something's wrong?
Fast recovery then adjusts the congestion window without returning to slow start. The logic: if packets are still arriving, the congestion isn't catastrophic. The window drops to half its previous size, not back to 1.
As more duplicate ACKs arrive, the window temporarily inflates to keep data flowing. Once the lost packet is finally acknowledged, normal congestion avoidance resumes from the reduced window.
This distinction—catastrophic loss versus isolated loss—lets TCP respond proportionally to the severity of congestion.
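The two responses can be put side by side. A Reno-style sketch, simplified from RFC 5681 (byte counting and the exact inflation bookkeeping are abbreviated):

```python
MSS = 1460  # bytes

def on_triple_dup_ack(cwnd):
    """Fast retransmit / fast recovery: halve the window, then
    inflate by the three segments known to have left the network.
    Returns (new cwnd, new ssthresh)."""
    ssthresh = max(cwnd // 2, 2 * MSS)
    return ssthresh + 3 * MSS, ssthresh

def on_timeout(cwnd):
    """Timeout: assume the worst and restart from one segment."""
    ssthresh = max(cwnd // 2, 2 * MSS)
    return MSS, ssthresh
```

With a 20-segment window, a triple duplicate ACK leaves 13 segments in flight; a timeout drops all the way back to one. Same loss event, very different penalty.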
The Evolution: From Reno to BBR
TCP congestion control has evolved significantly:
Reno introduced fast retransmit and fast recovery, becoming the standard for years.
NewReno improved handling when multiple packets are lost from the same window—a common scenario that Reno handled poorly.
Cubic uses a cubic function to grow the congestion window, pushing harder when far from the last congestion point and easing off as it approaches. This works better on fast, long-distance links where the old algorithms were too conservative.
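CUBIC's shape comes from a single curve, W(t) = C·(t − K)³ + W_max, described in RFC 8312. A sketch with the RFC's default constants (the 100-segment `w_max` in the example is illustrative):

```python
def cubic_window(t, w_max, c=0.4, beta=0.7):
    """CUBIC window curve: t is seconds since the last loss,
    w_max the window (in segments) at that loss. K is the time
    at which the curve climbs back to w_max. Growth is steep far
    from w_max and nearly flat close to it."""
    k = (w_max * (1 - beta) / c) ** (1 / 3)
    return c * (t - k) ** 3 + w_max

# Immediately after a loss the window sits at beta * w_max:
print(round(cubic_window(0.0, 100.0), 3))  # 70.0
```

Because the curve depends on wall-clock time rather than ACK arrival, CUBIC's growth is also largely independent of RTT.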
BBR takes a fundamentally different approach. Instead of using packet loss as the primary signal, it actively measures the actual bottleneck bandwidth and round-trip time. This works dramatically better on networks with large buffers or random (non-congestion) packet loss—situations where loss-based algorithms get confused.
The RTT Fairness Problem
Round-trip time affects how fast you can grow your window. Shorter RTT means ACKs arrive faster, which means faster window growth.
This creates unfairness: two connections sharing the same bottleneck won't get equal bandwidth if one has a 20ms RTT and the other has 200ms. The faster connection captures more bandwidth simply because it can react faster.
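The arithmetic makes the gap concrete. A toy model of a loss-free stretch of congestion avoidance (one segment gained per RTT; the starting window is illustrative):

```python
def ca_window_after(seconds, rtt, init_segments=10):
    """Window (in segments) after a loss-free interval of
    congestion avoidance: growth is +1 segment per RTT,
    i.e. seconds / rtt segments in total."""
    return init_segments + seconds / rtt

print(ca_window_after(10, rtt=0.020))  # 20 ms flow:  ~510 segments
print(ca_window_after(10, rtt=0.200))  # 200 ms flow:  ~60 segments
```

Ten seconds in, the short-RTT flow has grown its window roughly ten times as much—and at the shared bottleneck, that growth translates directly into bandwidth share.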
Various algorithms try to address this with varying success. BBR explicitly accounts for RTT in its calculations, aiming for fairness regardless of distance.
Bufferbloat: When Buffers Become the Problem
Here's a genuinely strange situation: your connection has great throughput, packets aren't being dropped, yet everything feels unbearably slow.
This is bufferbloat. Routers with enormous buffers don't drop packets when congested—they queue them. Packets wait seconds in buffer purgatory instead of being dropped immediately.
TCP sees no packet loss, so it doesn't back off. The queue grows. Latency skyrockets. Your video call stutters even though your bandwidth test shows great speeds.
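The latency cost of an oversized buffer is simple division. The buffer and link sizes below are illustrative but typical of consumer uplinks:

```python
def queueing_delay_ms(buffer_bytes, link_rate_bps):
    """Worst-case wait behind a full buffer: buffer size divided
    by the link's drain rate, in milliseconds."""
    return buffer_bytes / (link_rate_bps / 8) * 1000

# A 1 MB buffer draining over a 10 Mbit/s uplink:
print(queueing_delay_ms(1_000_000, 10_000_000))  # 800.0 ms
```

Nearly a second of added delay on every packet—without a single drop to tell TCP to slow down.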
The solutions: Active Queue Management (AQM) algorithms like CoDel and PIE that intelligently drop packets before buffers explode, and congestion control algorithms like BBR that don't rely solely on loss signals.
Explicit Congestion Notification
ECN lets routers say "I'm getting full" by marking packets instead of dropping them. TCP can then reduce its sending rate before loss occurs.
This requires support from both endpoints and every router along the path. When enabled, it produces smoother congestion response—the network signals trouble before crisis hits.
Adoption has been slow but is accelerating. Modern operating systems and network equipment increasingly support ECN.
When Defaults Aren't Enough
TCP congestion control usually works well without tuning, but some environments are special:
Satellite links have enormous latency. Algorithms designed for terrestrial networks may be too conservative to fully utilize available bandwidth.
Data centers have microsecond latencies and massive bandwidth. Specialized algorithms like DCTCP are designed for these conditions.
Wireless networks have packet loss that isn't caused by congestion. Algorithms that treat all loss as congestion will unnecessarily throttle.
The Social Contract
TCP congestion control is a cooperative system. Every sender agrees to back off when the network is stressed, even though ignoring congestion signals would give short-term advantage.
This works because the alternative is worse for everyone. A network where senders don't back off collapses into uselessness. The "selfish" choice—slowing down when congested—is also the choice that keeps the Internet functional.
Billions of devices, running code written by different people, in different decades, following the same basic rules. No central coordinator. No enforcement mechanism beyond the protocol itself. And it works.
That's the real story of TCP congestion control: not just an algorithm, but one of the largest cooperative systems in human history.