A single server has a limit. It can handle only so many connections, process only so many requests, serve only so many users. Push past that limit and the server slows, stutters, stops. Your application dies because one machine couldn't keep up.
Load balancers exist because of this mortality. They sit in front of multiple servers and distribute incoming requests across all of them. No single server bears the full weight. And when one server fails—because servers always fail eventually—the others keep running. Users never notice.
A single server can fail. A thousand servers behind a load balancer? Something is always alive.
The Problem Load Balancers Solve
When your application outgrows one server, you have two choices.
Vertical scaling means buying a bigger server. More CPU, more RAM, faster disks. This works until it doesn't. The most powerful server money can buy still has limits. And you still have one server—if it fails, everything fails.
Horizontal scaling means adding more servers. There's no ceiling. Need more capacity? Add another server. One fails? The others continue. But horizontal scaling creates a new problem: how do users know which server to connect to?
You can't expect users to pick a server. You can't manually assign users to servers. You need something that presents a single address to the world while secretly distributing work across many machines.
That something is a load balancer.
How Load Balancers Work
From the outside, users see one address. They connect to loadbalancer.example.com and have no idea that ten servers sit behind it.
The load balancer accepts each incoming connection and makes a decision: which backend server should handle this request? It forwards the request to the chosen server, receives the response, and sends it back to the user.
If a backend server dies, the load balancer notices. It stops sending traffic there. Requests flow to the surviving servers. Users experience nothing—maybe a few milliseconds of delay while a failed request retries on a healthy server.
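That accept-forward-retry loop is small enough to sketch. Here is a minimal version in Python, assuming two hypothetical backends on localhost and handling only simple GET requests; production load balancers such as Nginx, HAProxy, or Envoy do the same thing with far more care around streaming, headers, and timeouts.

```python
import itertools
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

BACKENDS = ["http://127.0.0.1:9001", "http://127.0.0.1:9002"]  # assumed addresses
pool = itertools.cycle(BACKENDS)  # simple rotation; smarter choices come later

class Proxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Try each backend at most once; if one fails, retry on the next.
        for _ in range(len(BACKENDS)):
            backend = next(pool)
            try:
                with urllib.request.urlopen(backend + self.path, timeout=2) as upstream:
                    body = upstream.read()
                    self.send_response(upstream.status)  # relay the status code
                    self.send_header("Content-Length", str(len(body)))
                    self.end_headers()
                    self.wfile.write(body)               # relay the body
                    return
            except (urllib.error.URLError, OSError):
                continue  # this backend failed; fall through to the next one
        self.send_error(502, "No healthy backend available")

if __name__ == "__main__":
    ThreadingHTTPServer(("", 8080), Proxy).serve_forever()
```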
This is the core value: abstraction. Users see one endpoint. You can add servers, remove servers, replace servers, and users never know. The load balancer maintains the illusion of a single, immortal application.
Choosing Which Server Gets the Request
The algorithm that picks which server handles each request matters more than you might think.
Round-robin is the simplest. First request goes to server A, second to server B, third to server C, then back to A. Every server gets equal traffic. This works when all servers are identical and all requests take roughly equal effort.
But requests aren't equal. Some hit the database hard. Some return cached data instantly. Round-robin ignores this. A server processing a heavy query gets the same traffic as a server returning cached responses. Load becomes uneven.
Weighted round-robin accounts for server differences. A server with weight 3 gets three times as many requests as a server with weight 1. If you add a more powerful machine to the pool, give it a higher weight.
Least connections routes each request to whichever server currently has the fewest active connections. A server processing slow requests naturally accumulates connections. Least connections sees this and sends new requests elsewhere. Load balances itself.
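To make the differences concrete, here is a sketch of round-robin, weighted round-robin, and least connections, using made-up server names, weights, and connection counts:

```python
import itertools

servers = ["a", "b", "c"]

# Round-robin: rotate through the pool in order.
rr = itertools.cycle(servers)
def pick_round_robin():
    return next(rr)

# Weighted round-robin: a server with weight 3 appears three times per rotation.
weights = {"a": 3, "b": 1, "c": 1}
wrr = itertools.cycle([s for s, w in weights.items() for _ in range(w)])
def pick_weighted():
    return next(wrr)

# Least connections: consult live counters of active connections per server.
active_connections = {"a": 12, "b": 4, "c": 9}  # assumed live counts
def pick_least_connections():
    return min(active_connections, key=active_connections.get)

print(pick_round_robin(), pick_weighted(), pick_least_connections())
```

(A production weighted round-robin interleaves the rotation so the heavier server doesn't receive its share in one burst, but the selection logic stays this small.)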
Least response time goes further: route to the server with fewest connections AND fastest recent responses. A server that's both idle and fast gets priority.
IP hash uses a hash of the client's IP address to pick a server. The same client always reaches the same server (unless that server fails). This matters for session affinity—which we'll return to, because it's a trap.
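A sketch of that mapping, with hypothetical server names:

```python
import hashlib

servers = ["app-1", "app-2", "app-3"]  # hypothetical pool

def pick_by_ip(client_ip: str) -> str:
    # Hash the client address and map it onto the pool.
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]

print(pick_by_ip("203.0.113.7"))  # the same IP lands on the same server every time
```

Note that removing a server changes the modulus and remaps most clients, which is why larger deployments reach for consistent hashing instead of this naive version.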
Least bandwidth routes to whichever server is currently pushing the least data. Useful for media servers where bandwidth, not request count, is the constraint.
Most applications use least connections or round-robin. The sophisticated algorithms matter at scale, but they can't save you from fundamental architectural problems.
Layer 4 vs. Layer 7
Load balancers can operate at different levels of the network stack.
Layer 4 load balancing sees only IP addresses and ports. It doesn't understand HTTP. It just sees "a connection came from this IP to this port" and decides where to send it. Fast. Efficient. Dumb.
Layer 7 load balancing understands HTTP. It can read the URL, inspect headers, examine cookies, even look at the request body. This enables sophisticated routing: send /api/ requests to API servers, send /images/ to media servers, route based on the Authorization header.
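The routing decision itself is just an inspection of the parsed request. A sketch, with hypothetical pool names and path prefixes:

```python
API_POOL = ["api-1", "api-2"]          # hypothetical backend pools
MEDIA_POOL = ["media-1", "media-2"]
DEFAULT_POOL = ["web-1", "web-2"]

def route(path: str, headers: dict) -> list:
    if path.startswith("/api/"):
        return API_POOL        # API traffic to API servers
    if path.startswith("/images/"):
        return MEDIA_POOL      # static media to media servers
    if headers.get("Authorization", "").startswith("Bearer "):
        return API_POOL        # authenticated calls can also be steered separately
    return DEFAULT_POOL

print(route("/api/users", {}))  # ['api-1', 'api-2']
```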
The tradeoff is speed. Layer 7 must parse HTTP, which takes more processing than Layer 4's simple packet forwarding. But modern Layer 7 load balancers handle hundreds of thousands of requests per second. The flexibility is almost always worth the cost.
Most web applications use Layer 7. The ability to route based on content, terminate SSL centrally, and add/modify headers is too valuable to give up for marginal speed gains.
Health Checks: Knowing What's Alive
A load balancer that sends requests to dead servers is worse than useless. Health checks prevent this.
Active health checks: The load balancer periodically pings each backend server. This might be a TCP connection attempt, an HTTP request to /health, or a more sophisticated check that verifies database connectivity. Healthy servers respond. Unhealthy servers don't. Failed servers get removed from the pool.
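A sketch of an active check loop, assuming each backend exposes a /health endpoint at the hypothetical addresses below; the check interval and timeout are made up:

```python
import time
import urllib.error
import urllib.request

BACKENDS = ["http://127.0.0.1:9001", "http://127.0.0.1:9002"]  # assumed addresses
healthy = set(BACKENDS)  # the pool the load balancer actually routes to

def check(backend: str) -> bool:
    try:
        with urllib.request.urlopen(backend + "/health", timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def health_check_loop(interval: float = 5.0):
    # Run in a background thread alongside the request-forwarding code.
    while True:
        for backend in BACKENDS:
            if check(backend):
                healthy.add(backend)      # recovered servers rejoin the pool
            else:
                healthy.discard(backend)  # failing servers stop receiving traffic
        time.sleep(interval)
```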
Passive health checks: The load balancer monitors real traffic. If a server starts returning 500 errors or timing out on actual requests, the load balancer marks it unhealthy even if active checks pass. Sometimes servers are technically alive but functionally broken.
When a server fails, traffic automatically flows to healthy servers. When the failed server recovers and starts passing health checks again, it automatically rejoins the pool. No human intervention required.
This is high availability. Individual servers can fail. The application stays up.
Session Persistence: A Necessary Evil
Some applications store user sessions in server memory. User logs in on server A, their session lives on server A. If their next request goes to server B, they appear logged out.
Session persistence (or "sticky sessions") ensures a user's requests consistently reach the same server. The load balancer tracks which server handled the user's first request and routes subsequent requests there.
This can be done with cookies (the load balancer sets a cookie identifying the backend server) or IP hashing (the same IP always goes to the same server).
Here's the problem: session persistence undermines everything load balancers give you. You wanted distribution and redundancy. Now you're routing users to specific servers. If that server dies, users lose their sessions. You've recreated the single point of failure you were trying to escape.
The real solution: don't store sessions on servers. Store them in Redis, a database, or another shared location. Any server can handle any request. A server dies? Users seamlessly continue on another server, sessions intact. Sticky sessions become unnecessary.
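A sketch of what that looks like, assuming a Redis instance reachable by every backend and the redis-py client; the hostname and expiry here are made up:

```python
import secrets
import redis

store = redis.Redis(host="sessions.internal", port=6379)  # assumed shared Redis

def create_session(user_id: str) -> str:
    session_id = secrets.token_urlsafe(32)
    # Any backend can read this key, so any backend can serve this user.
    store.setex(f"session:{session_id}", 3600, user_id)  # expire after an hour
    return session_id

def lookup_session(session_id: str) -> str | None:
    value = store.get(f"session:{session_id}")
    return value.decode() if value else None
```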
Session persistence is a crutch for architectures that weren't designed for distribution. If you're starting fresh, design for shared state from the beginning.
Global Load Balancing: Multiple Data Centers
For truly global applications, load balancing happens at the DNS level before TCP connections even begin.
A user in Tokyo queries DNS for your application. DNS returns the IP address of your Tokyo data center. A user in London gets your London data center's IP. Users automatically connect to nearby servers, reducing latency.
If your Tokyo data center fails, DNS stops returning its IP. Tokyo users get routed to your next-nearest data center. Failover at the geographic level.
The catch: DNS caching. When you update DNS, the old answers persist in caches worldwide for minutes to hours. DNS-based failover isn't instant. Most global applications combine DNS-level geographic routing with application-level load balancing within each region.
Load Balancer Availability
Load balancers solve single points of failure for your application servers. But the load balancer itself can fail. If it does, all those healthy backend servers become unreachable.
Active-passive: Two load balancers. One handles traffic. The other watches and takes over if the primary fails. Simple redundancy, but half your load balancing capacity sits idle.
Active-active: Multiple load balancers simultaneously handle traffic. DNS distributes clients across them, or network-level routing (ECMP) spreads connections. More complex, but no wasted capacity.
Cloud load balancers: AWS, Google Cloud, and Azure provide load balancing as a distributed service. They're already redundant across availability zones. You don't configure failover—it's built in.
Most modern deployments use cloud load balancers and don't think about load balancer availability at all. The cloud provider handles it.
Hardware, Software, or Cloud?
Hardware load balancers are dedicated appliances from vendors like F5 and Citrix. High performance, specialized features, expensive. Scaling means buying more hardware. Mostly found in enterprises with legacy infrastructure.
Software load balancers run on standard servers. Nginx, HAProxy, Envoy, Traefik. Flexible, cost-effective, scalable by adding instances. You manage them yourself.
Cloud load balancers are managed services. AWS ALB/NLB, Google Cloud Load Balancing, Azure Load Balancer. No infrastructure to manage, automatic scaling, pay-per-use. More expensive per request than self-managed, but zero operational burden.
The industry has moved toward software and cloud. Hardware load balancers are increasingly rare outside legacy environments.
Security at the Edge
Load balancers sit at the edge of your network, making them natural places to implement security.
SSL/TLS termination: Load balancers handle HTTPS, decrypting traffic before forwarding to backends. Backends receive plain HTTP on the private network. Certificate management happens in one place instead of on every server.
DDoS absorption: Load balancers can absorb traffic spikes, rate-limit individual clients, and filter obviously malicious requests. They're the first line of defense.
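Per-client rate limiting is usually some variant of a token bucket. A sketch, with made-up limits:

```python
import time
from collections import defaultdict

RATE = 10.0   # tokens refilled per second (assumed limit)
BURST = 20.0  # maximum bucket size (assumed burst allowance)

buckets = defaultdict(lambda: {"tokens": BURST, "last": time.monotonic()})

def allow(client_ip: str) -> bool:
    bucket = buckets[client_ip]
    now = time.monotonic()
    # Refill based on elapsed time, capped at the burst size.
    bucket["tokens"] = min(BURST, bucket["tokens"] + (now - bucket["last"]) * RATE)
    bucket["last"] = now
    if bucket["tokens"] >= 1.0:
        bucket["tokens"] -= 1.0
        return True
    return False  # over the limit: drop the request or return 429
```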
Web application firewalls: Some load balancers include WAF functionality—blocking SQL injection, XSS, and other application-layer attacks before they reach your servers.
Access control: IP allowlisting and denylisting at the load balancer blocks traffic before it touches your application.
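The check itself is trivial, which is part of why it belongs at the edge. A sketch with hypothetical CIDR ranges:

```python
import ipaddress

ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),      # hypothetical internal range
    ipaddress.ip_network("203.0.113.0/24"),  # hypothetical partner range
]

def is_allowed(client_ip: str) -> bool:
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in ALLOWED_NETWORKS)

print(is_allowed("203.0.113.9"))   # True
print(is_allowed("198.51.100.1"))  # False
```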
Putting security at the load balancer means your backends can focus on application logic. They trust that traffic reaching them has already been validated.