

Rate limiting restricts how many requests a client can make within a time period. It's one of the oldest tricks in security, and one of the most effective.

The insight behind it is simple: attackers have infinite patience, but they don't have infinite time. A human trying passwords will give up after ten attempts. A bot will try ten million. Rate limiting makes bots experience time like humans do—suddenly that ten-million-attempt attack takes years instead of minutes.

The Mechanics

Rate limiting tracks requests from each client and enforces a maximum rate. Exceed the limit, and your requests get rejected, delayed, or deprioritized until you slow down.

The clever part is how limits get enforced. The token bucket algorithm gives each client a bucket that fills with tokens at a steady rate. Each request costs a token. Run out of tokens, and you wait. This elegantly handles bursts—you can spend saved-up tokens quickly—while still enforcing average rates.
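A minimal token bucket fits in a few lines of Python. This is an illustrative in-memory sketch, not any particular library's API; the class name and parameters are assumptions:

```python
import time

class TokenBucket:
    """Token bucket sketch: refills at `rate` tokens/sec, stores up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full, so bursts are allowed immediately
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Note how the burst behavior falls out naturally: a full bucket lets `capacity` requests through back-to-back, but the long-run rate can never exceed `rate`.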

The leaky bucket works differently: requests enter a queue that drains at a fixed rate. Like water through a hole, traffic flows out steadily regardless of how fast it pours in.
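Used as a meter (reject on overflow rather than queue the request), the leaky bucket is nearly the mirror image: the level drains at a fixed rate and each request adds to it. Names and parameters below are again illustrative:

```python
import time

class LeakyBucket:
    """Leaky bucket as a meter: level drains at `rate`/sec; each request adds
    one unit; a request is rejected if it would overflow `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.level = 0.0
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain for the elapsed time, never below empty.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1.0 <= self.capacity:
            self.level += 1.0
            return True
        return False
```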

Sliding windows track requests over rolling time periods, avoiding the boundary problems of fixed windows where someone could make 100 requests at 11:59 and another 100 at 12:01.
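One common way to implement a sliding window is to keep a log of recent request timestamps and count only those inside the window. This sketch assumes a single-process, in-memory limiter:

```python
import time
from collections import deque

class SlidingWindowLog:
    """Allow a request only if fewer than `limit` requests
    occurred in the last `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.hits = deque()   # timestamps of accepted requests, oldest first

    def allow(self) -> bool:
        now = time.monotonic()
        # Evict timestamps that have aged out of the rolling window.
        while self.hits and now - self.hits[0] > self.window:
            self.hits.popleft()
        if len(self.hits) < self.limit:
            self.hits.append(now)
            return True
        return False
```

The memory cost (one timestamp per recent request) is the price of precision; counter-based sliding-window approximations trade some accuracy for constant memory.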

Each approach trades off accuracy, memory, and complexity. But they all do the same thing: make time matter.

What It Protects Against

Brute force attacks become impractical when you can only try three passwords per minute instead of three thousand per second. The math changes from "crackable in hours" to "crackable in centuries."

API abuse gets contained. That poorly coded client stuck in an infinite loop? It hits the limit and stops hammering your servers. That scraper trying to download your entire database? Same thing.

Resource fairness emerges naturally. No single user can monopolize shared infrastructure when everyone has the same limits.

Cost control matters when you're paying per API call or SMS message. Rate limiting caps your exposure to runaway processes or abusive users.

Where It Lives

Rate limiting can happen at multiple layers, each with different trade-offs.

Network firewalls can limit packets per IP—broad protection but coarse-grained. They can't distinguish between a login attempt and a static asset request.

Web application firewalls understand HTTP, so they can apply different limits to different endpoints. The login page gets strict limits; the homepage doesn't.
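A per-endpoint policy like that often boils down to a rule table keyed by path. The paths and numbers below are hypothetical, just to show the shape:

```python
# Hypothetical per-endpoint limits a WAF or gateway might enforce.
LIMITS = {
    "/login":  {"limit": 5,   "window_s": 60},   # strict: credential-guessing target
    "/search": {"limit": 60,  "window_s": 60},   # moderately expensive queries
    "/":       {"limit": 600, "window_s": 60},   # cheap static page, generous limit
}

def limit_for(path: str) -> dict:
    # Unknown paths fall back to the most permissive rule.
    return LIMITS.get(path, LIMITS["/"])
```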

API gateways centralize rate limiting for services, often with sophisticated rules based on API keys, subscription tiers, or endpoint costs.

Application code offers the most context. You know who the user is, what they're trying to do, and whether it makes sense. The trade-off is that by the time your code can say no, the request has already consumed connection, parsing, and routing resources.

The best setups layer these—coarse limits at the edge to stop obvious abuse, refined limits deeper in to handle subtle cases.

The Response Spectrum

When someone exceeds limits, you have options:

Hard blocking returns an error (HTTP 429 Too Many Requests), ideally with a Retry-After header saying when to try again. Clear and simple, but potentially frustrating for legitimate users who didn't realize they were close to the limit.

Soft limiting slows responses instead of blocking them. Requests still succeed but take longer. Less disruptive, but requires resources to maintain those delayed connections.

Degraded service reduces quality or features. Fewer search results, lower resolution images, limited functionality. The user gets something, just not everything.

Escalating responses start soft and get harder. First warning, then delays, then blocks, then longer blocks. This distinguishes accidental overuse from determined abuse.
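An escalation policy can be as simple as mapping a client's strike count to an action. The thresholds and backoff values here are hypothetical:

```python
def respond_to_overuse(strikes: int) -> dict:
    """Map how many times a client has exceeded the limit to a response.
    Thresholds are illustrative, not a recommendation."""
    if strikes <= 1:
        return {"action": "warn"}                 # first offense: just a warning
    if strikes <= 3:
        return {"action": "delay", "seconds": 2 ** strikes}   # exponential slowdown
    return {"action": "block", "status": 429, "retry_after_s": 60 * strikes}
```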

Talking to Clients

Good APIs tell clients where they stand:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 73
X-RateLimit-Reset: 1640000000

These headers say: you get 100 requests per window, you have 73 left, and the window resets at this timestamp. Well-behaved clients use this to pace themselves and avoid hitting limits entirely.
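A well-behaved client can read those headers and pace itself. This sketch assumes the header names above, that `X-RateLimit-Reset` is a Unix timestamp, and a hypothetical `session_get` callable standing in for your HTTP client:

```python
import time

def paced_request(session_get, url: str):
    """Make a request, then sleep until the window resets if no quota remains.
    `session_get` is any callable returning a response with a `.headers` dict."""
    resp = session_get(url)
    remaining = int(resp.headers.get("X-RateLimit-Remaining", "1"))
    reset = int(resp.headers.get("X-RateLimit-Reset", "0"))
    if remaining == 0:
        # Wait out the rest of the window instead of burning requests on 429s.
        time.sleep(max(0.0, reset - time.time()))
    return resp
```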

The Limitations

Rate limiting isn't magic. It has real constraints.

Distributed attacks spread across thousands of IPs can each stay under per-IP limits while collectively overwhelming you. If your limit is 100 requests per second per IP, an attacker with 10,000 IPs can still hit you with a million requests per second.

Shared IPs create the opposite problem. Behind corporate NAT, hundreds of legitimate users share one IP address. Strict per-IP limits punish everyone for one person's behavior.

Legitimate bursts happen. A user opens your app after a week away and it syncs everything at once. A webhook endpoint receives a flood of events during an incident. Token bucket helps here, but strict limits can still bite.

State overhead grows with users. Tracking request counts for millions of clients requires memory and coordination, especially across distributed systems.

Beyond Simple Counts

Sophisticated rate limiting looks at more than request counts.

Behavioral patterns matter. Requests spread evenly over time look different from sudden bursts. Human browsing patterns differ from automated scraping. The same request rate can be legitimate or suspicious depending on how it arrives.

Reputation systems give different limits to different clients. New accounts get strict limits. Long-standing accounts with good history get more freedom. This mirrors how trust works in the real world.

Multi-factor analysis combines signals: IP address, user account, geographic location, request patterns, time of day. A request from a known user, from their usual location, at a normal time, gets more latitude than an anonymous request from an unusual IP at 3 AM.

Adaptive limits respond to conditions. During high load, limits tighten to protect stability. During detected attacks, limits become more aggressive. During quiet periods, they relax.
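An adaptive policy can be sketched as a function of current conditions. The thresholds and scaling factors here are made up for illustration:

```python
def adaptive_limit(base_limit: int, load: float, under_attack: bool) -> int:
    """Adjust a per-client limit to conditions. `load` is utilization in [0, 1];
    thresholds and factors are hypothetical."""
    if under_attack:
        return max(1, base_limit // 10)   # aggressive tightening during an attack
    if load > 0.8:
        return max(1, base_limit // 2)    # tighten under high load
    if load < 0.2:
        return base_limit * 2             # relax during quiet periods
    return base_limit
```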

Part of a Larger Defense

Rate limiting works best as one layer in defense-in-depth. It handles application-layer abuse well but can't stop volumetric DDoS attacks that overwhelm your bandwidth before packets even reach your rate limiter.

For comprehensive protection, rate limiting combines with upstream filtering, CDN distribution, and specialized DDoS mitigation services. Each layer handles what it's positioned to handle.

For monitoring infrastructure, rate limiting protects against abuse while ensuring legitimate monitoring traffic flows freely. It also helps detect anomalies—sudden spikes in request rates often indicate attacks, misconfigurations, or incidents worth investigating.

