Cloud Load Balancers

A load balancer is a beautiful lie: your users think they're talking to one server, but they're actually talking to whichever of your fifty servers happens to be healthy and available right now. The load balancer maintains this illusion, silently routing around failures, spreading work evenly, and making your infrastructure look like a single, impossibly reliable machine.

Unlike traditional hardware load balancers—expensive boxes bolted into data centers—cloud load balancers are software that scales on demand. They're integrated with everything: auto-scaling, health monitoring, security groups. They do things that would cost a fortune with physical appliances, or simply wouldn't be possible.

Layer 4 vs. Layer 7: Reading the Envelope or Opening the Letter

Cloud providers offer load balancers that operate at different levels of the network stack, and the distinction matters.

Layer 4 load balancers work at the transport layer. They see IP addresses and port numbers—the envelope of the message—but never look inside. They're fast because they don't need to understand what they're carrying. TCP or UDP carrying any application protocol you've invented: Layer 4 doesn't care what's inside. It just routes traffic based on where it came from and where it's going.

Layer 7 load balancers work at the application layer. They open the letter. They understand HTTP, can read headers and cookies, inspect URLs, and make routing decisions based on content. Send all /api/* requests to backend A, all /images/* requests to backend B. Terminate SSL so your application servers don't have to. The tradeoff: slightly more latency because understanding takes time.
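
To make the Layer 7 idea concrete, here's a minimal routing sketch in Python. The pool names and addresses are invented for illustration; a real load balancer makes this decision in optimized native code, not per-request Python.

```python
# Layer 7 routing sketch: pick a backend pool by inspecting the path.
# Pool names and addresses are hypothetical.
BACKEND_POOLS = {
    "/api/": ["10.0.1.10", "10.0.1.11"],     # backend A: API servers
    "/images/": ["10.0.2.10", "10.0.2.11"],  # backend B: static assets
}
DEFAULT_POOL = ["10.0.3.10"]

def route(path: str) -> list[str]:
    """Content-based routing: the load balancer 'opened the letter'."""
    for prefix, pool in BACKEND_POOLS.items():
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL

assert route("/api/users") == ["10.0.1.10", "10.0.1.11"]
```

A Layer 4 balancer couldn't do this: the path lives inside the payload it never inspects.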

The choice is usually obvious. Non-HTTP protocols or raw performance? Layer 4. Web applications that need content-based routing? Layer 7.

Regional vs. Global: How Far Does Your Lie Extend?

Regional load balancers operate within a single cloud region, spreading traffic across availability zones. If you have servers in us-east-1a and us-east-1b, a regional load balancer makes them look like one endpoint. But it won't help users in Tokyo reach servers in Frankfurt.

Global load balancers extend the illusion worldwide. They use anycast—the same IP address advertised from multiple locations—so users automatically reach the nearest healthy region. A user in Singapore hits Singapore servers. A user in London hits European servers. If Singapore goes down, traffic automatically reroutes to the next closest region.

Global load balancers enable true planetary-scale applications. They're also more expensive. Use them when you need them.

Health Checks: Finding the Living Among the Dead

A load balancer that sends traffic to dead servers isn't load balancing—it's load destroying. Health checks are how load balancers know which backends are actually alive.

Active health checks are probes. The load balancer periodically pings each backend: Can I open a TCP connection? Does /health return 200? Is the response time acceptable? Backends that fail enough checks get pulled from rotation. When they start passing again, they're added back.

Passive health checks watch real traffic. If a backend starts returning errors or dropping connections during actual requests, the load balancer notices and acts—often faster than waiting for the next scheduled probe.

The tuning matters. Too aggressive, and a backend gets pulled for a momentary hiccup. Too lenient, and dead backends serve errors for too long. Most teams err on the side of sensitivity—better to temporarily reduce capacity than to serve errors.
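
Here's a minimal sketch of an active health checker with tunable thresholds. The /health path, two-second timeout, and threshold values are illustrative assumptions, not any provider's defaults.

```python
# Active health checking sketch: probe each backend on an interval,
# pull it after consecutive failures, re-add after consecutive passes.
import time
import urllib.request

INTERVAL_S = 10      # seconds between probe rounds
UNHEALTHY_AFTER = 3  # consecutive failures before pulling a backend
HEALTHY_AFTER = 2    # consecutive passes before adding it back

def probe(backend: str) -> bool:
    """One check: does GET /health answer 200 within two seconds?"""
    try:
        with urllib.request.urlopen(f"http://{backend}/health", timeout=2) as r:
            return r.status == 200
    except OSError:
        return False

def check_loop(backends: list[str]) -> None:
    fails = dict.fromkeys(backends, 0)
    passes = dict.fromkeys(backends, 0)
    in_rotation = set(backends)
    while True:
        for b in backends:
            if probe(b):
                passes[b] += 1
                fails[b] = 0
                if b not in in_rotation and passes[b] >= HEALTHY_AFTER:
                    in_rotation.add(b)        # recovered: back in rotation
            else:
                fails[b] += 1
                passes[b] = 0
                if b in in_rotation and fails[b] >= UNHEALTHY_AFTER:
                    in_rotation.discard(b)    # failing: pulled from rotation
        time.sleep(INTERVAL_S)
```

The two thresholds are the sensitivity dials: raising UNHEALTHY_AFTER tolerates momentary hiccups, lowering it pulls dead backends faster.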

Session Affinity: When the Lie Gets Complicated

Sometimes you need the same user to reach the same backend. Maybe your application stores session state in memory. Maybe you're using WebSockets and need connection persistence. This is session affinity, also called sticky sessions.

Cookie-based affinity: The load balancer inserts a cookie identifying which backend served the first request. Subsequent requests with that cookie route to the same backend.

IP-based affinity: All requests from the same source IP go to the same backend. Simpler but problematic when many users share an IP (corporate NAT, mobile carriers).
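
IP-based affinity is simple enough to sketch: hash the source address to pick a backend. This is a minimal illustration; real balancers often use consistent hashing so that adding or removing a backend remaps fewer clients.

```python
# IP-hash affinity sketch: the same source IP always maps to the
# same backend (until the backend list changes).
import hashlib

BACKENDS = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]

def pick_backend(client_ip: str) -> str:
    digest = hashlib.sha256(client_ip.encode()).digest()
    return BACKENDS[int.from_bytes(digest[:8], "big") % len(BACKENDS)]

# Stable for one client...
assert pick_backend("203.0.113.7") == pick_backend("203.0.113.7")
# ...which is exactly why thousands of users behind one corporate NAT
# all pile onto a single backend.
```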

Session affinity is a crutch. It undermines the whole point of load balancing—even distribution—and makes scaling harder. The better solution is stateless backends or external session storage. But when you need it, it's there.

SSL Termination: Offloading the Expensive Work

SSL/TLS is computationally expensive, and the cost is concentrated in the handshake: every new HTTPS connection requires asymmetric cryptographic operations before any data flows. SSL termination moves this work from your application servers to the load balancer.

The load balancer handles the encryption with clients, then forwards plain HTTP to backends. Benefits: reduced backend CPU load, centralized certificate management (one place to install and rotate certificates), and the load balancer can inspect traffic for content-based routing.
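
The pattern is simple enough to sketch with Python's ssl module: accept TLS from clients, relay plaintext to a backend. The certificate files, addresses, and one-thread-per-connection model are placeholders; a production balancer adds connection pooling, HTTP parsing, and real error handling.

```python
# SSL termination sketch: decrypt at the edge, forward plain TCP to a
# backend. cert.pem/key.pem and addresses are placeholders; binding
# port 443 requires elevated privileges.
import socket
import ssl
import threading

BACKEND = ("10.0.1.10", 80)  # plain HTTP inside the private network

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain("cert.pem", "key.pem")  # one place to rotate certs

def pump(src: socket.socket, dst: socket.socket) -> None:
    try:
        while data := src.recv(4096):
            dst.sendall(data)
    except OSError:
        pass  # peer closed; stop relaying

def handle(tls_conn: socket.socket) -> None:
    with socket.create_connection(BACKEND) as backend:
        threading.Thread(target=pump, args=(backend, tls_conn), daemon=True).start()
        pump(tls_conn, backend)  # decrypted client bytes -> backend
    tls_conn.close()

listener = socket.create_server(("0.0.0.0", 443))
with ctx.wrap_socket(listener, server_side=True) as tls_listener:
    while True:
        conn, _ = tls_listener.accept()  # TLS handshake happens here
        threading.Thread(target=handle, args=(conn,), daemon=True).start()
```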

The drawback: traffic between the load balancer and backends is unencrypted. For most internal networks, this is fine. For high-security environments, you can use end-to-end encryption—SSL to the load balancer, then re-encrypted to backends—at the cost of more CPU and complexity.

Auto-Scaling Integration: The Elastic Illusion

Here's where cloud load balancers become magical.

Your auto-scaling group detects increased load and spins up ten new servers. The load balancer automatically discovers them, waits for health checks to pass, and adds them to rotation. Traffic starts flowing to the new capacity within minutes.

Load decreases. Auto-scaling decides to remove five servers. The load balancer receives the signal and starts connection draining—no new requests to those servers, but existing requests finish gracefully. Once connections complete (or a timeout passes), the servers terminate.
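
On AWS, for instance, this handoff runs through the ELBv2 API. A hedged boto3 sketch of the scale-in side, with a placeholder target group ARN and instance ID:

```python
# Deregistering a target starts connection draining (AWS calls it the
# "deregistration delay"). ARN and instance ID are placeholders.
import boto3

elbv2 = boto3.client("elbv2")
TARGET_GROUP_ARN = "arn:aws:elasticloadbalancing:..."  # placeholder

# Scale-in: stop new requests, let in-flight ones finish.
elbv2.deregister_targets(
    TargetGroupArn=TARGET_GROUP_ARN,
    Targets=[{"Id": "i-0123456789abcdef0"}],
)

# While draining, the target reports state "draining" until existing
# connections complete or the delay expires.
health = elbv2.describe_target_health(TargetGroupArn=TARGET_GROUP_ARN)
for desc in health["TargetHealthDescriptions"]:
    print(desc["Target"]["Id"], desc["TargetHealth"]["State"])
```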

No manual intervention. Capacity expands and contracts with demand. The user never knows—they just see a responsive application.

Connection Draining: Closing Time at the Bar

When you remove a backend—for scaling, deployment, or maintenance—what happens to requests already in flight?

Connection draining handles this gracefully. The load balancer tells the backend "no new guests" while letting current requests finish. It's closing time at the bar: no one new comes in, but everyone inside can finish their drinks.

The draining timeout is a tradeoff. Too short, and long-running requests get terminated mid-stream. Too long, and deploys take forever. Most applications use 30-60 seconds. Long-polling or file upload endpoints might need more.
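
The drain logic itself is small. An illustrative loop, not any provider's implementation; the timeout and polling interval are assumptions:

```python
# Connection draining sketch: remove the backend from rotation, then
# wait for in-flight requests to finish or the timeout to expire.
import time

DRAIN_TIMEOUT_S = 60  # within the typical 30-60 second range

def drain(backend: str, active_conns: dict[str, int], in_rotation: set[str]) -> None:
    in_rotation.discard(backend)            # closing time: no new guests
    deadline = time.monotonic() + DRAIN_TIMEOUT_S
    while active_conns.get(backend, 0) > 0 and time.monotonic() < deadline:
        time.sleep(0.5)                     # everyone inside finishes their drink
    # Now safe to terminate, deploy, or do maintenance on this backend.
```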

WebSocket and HTTP/2: Modern Protocols, Modern Challenges

HTTP was designed for request-response. WebSockets enable persistent, bidirectional connections—essential for real-time applications like chat, gaming, and live updates.

Layer 7 load balancers understand the WebSocket upgrade handshake and maintain the persistent connection to the same backend for the life of the socket. This is session affinity by necessity: you can't bounce a WebSocket between backends.
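
Spotting the upgrade is straightforward at Layer 7. A simplified sketch that checks the two headers involved, skipping the full RFC 6455 validation (Sec-WebSocket-Key and friends) a real balancer performs:

```python
# WebSocket upgrade detection sketch. Once a connection upgrades, it
# must stay pinned to one backend for its entire lifetime.
def is_websocket_upgrade(headers: dict[str, str]) -> bool:
    connection = headers.get("connection", "").lower()
    upgrade = headers.get("upgrade", "").lower()
    return "upgrade" in connection and upgrade == "websocket"

assert is_websocket_upgrade({"connection": "Upgrade", "upgrade": "websocket"})
```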

HTTP/2 multiplexes multiple streams over a single connection. The load balancer needs to understand this to route effectively. Most modern cloud load balancers handle both protocols, but verify support before choosing.

Cross-Zone Load Balancing: Even Distribution

You have three availability zones with different numbers of backends: Zone A has 10 servers, Zone B has 5, Zone C has 2. Traffic arrives roughly equally at each zone. What happens?

Without cross-zone load balancing, each zone distributes its traffic only among its own backends. Zone C's 2 servers get crushed while Zone A's 10 servers coast.

With cross-zone load balancing, all 17 backends share the load evenly regardless of zone. Traffic is redistributed across zone boundaries.
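
The arithmetic makes the imbalance concrete:

```python
# Per-server load in the 10/5/2 example when traffic splits evenly
# across zones but not across servers.
zones = {"A": 10, "B": 5, "C": 2}
per_zone_share = 1 / len(zones)  # each zone receives ~1/3 of traffic

for zone, servers in zones.items():
    print(f"Zone {zone}: {per_zone_share / servers:.1%} of total per server")
# Zone A: 3.3%   Zone B: 6.7%   Zone C: 16.7%  -> a 5x imbalance.
# With cross-zone balancing, every server gets 1/17, about 5.9%.
```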

The catch: cross-zone data transfer often costs money. Know your provider's pricing before enabling it everywhere.

Internal vs. External: Public Face, Private Guts

External load balancers have public IP addresses. They accept traffic from the Internet. They're the front door to your application.

Internal load balancers have private IPs. They only accept traffic from within your VPC. They're for the parts of your architecture that should never be directly reachable from the Internet: your application tier talking to your database tier, microservices communicating with each other.

A typical architecture uses both: external load balancer for the web tier, internal load balancers between internal services. Defense in depth.

Logging and Monitoring: Seeing Through the Illusion

The load balancer sees everything. Every request. Every response. Every failure.

Access logs capture the details: client IP, requested URL, response code, processing time, which backend handled it. Essential for debugging and analytics.

Metrics show the patterns: requests per second, error rates, latency percentiles, active connections, healthy vs. unhealthy backends.

Health check status tells you why backends are failing. Connection refused? Timeout? Wrong response code?

This visibility is invaluable. When something goes wrong, the load balancer logs often tell you exactly what.

Cost: The Price of the Illusion

Cloud load balancers charge for existence, usage, and features:

Hourly costs for running the load balancer. Global load balancers cost more than regional. Application (Layer 7) load balancers often cost more than network (Layer 4).

Data processing charges per GB of traffic. This adds up at scale.

Cross-zone transfer fees for traffic between availability zones.

Optimize by choosing the right type for your needs. Don't use a global load balancer for a single-region application. Don't use Layer 7 when Layer 4 suffices. Monitor your bills.
