429 Too Many Requests

The 429 Too Many Requests error is the Internet's way of saying "there's only so much of me to go around."

When you hit a 429, the server understood your request perfectly. It just refused to process it because you've been asking for too much too fast. This isn't a bug or a misconfiguration—it's a boundary. Rate limiting exists because servers are shared resources, and without limits, one greedy client could ruin the experience for everyone.

Think of it like a deli counter. You take a number, you wait your turn. If someone tried to place 100 orders per minute, the staff would eventually say "slow down" for the sake of everyone else in line.

What a 429 Response Looks Like

HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1729512000

{
    "error": "Too Many Requests",
    "message": "Rate limit exceeded. Please retry after 60 seconds."
}

The critical header is Retry-After. It tells you exactly when you can try again. A well-behaved client respects this; an aggressive client that ignores it risks getting blocked entirely.
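
One wrinkle: Retry-After can be either a number of seconds or an HTTP date, so it's worth handling both forms. Here's a minimal sketch (the parseRetryAfter name is ours, not part of any library):

// Convert a Retry-After header value into milliseconds to wait.
// The value may be delta-seconds ("60") or an HTTP date
// ("Wed, 21 Oct 2024 07:28:00 GMT").
function parseRetryAfter(headerValue) {
    if (!headerValue) return null;

    const seconds = Number(headerValue);
    if (!Number.isNaN(seconds)) {
        return seconds * 1000;
    }

    const retryDate = Date.parse(headerValue);
    if (!Number.isNaN(retryDate)) {
        return Math.max(0, retryDate - Date.now());
    }

    return null; // Unrecognized format; fall back to your own backoff
}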

The Headers That Tell You Where You Stand

Good APIs don't just reject you—they tell you why and when you'll be welcome again:

Header                    Meaning
X-RateLimit-Limit         Your quota (e.g., 100 requests per minute)
X-RateLimit-Remaining     How many requests you have left
X-RateLimit-Reset         Unix timestamp when your quota refills
Retry-After               Seconds (or a date) until you can retry

Smart clients track these headers proactively. Why wait for a 429 when you can see it coming?

Why You're Getting Rate Limited

You're making requests too fast. Most rate limits are "X requests per Y time." Exceed them, and every subsequent request gets rejected until the window resets.

You're bursting. Some APIs have separate burst limits—maybe 100 requests per minute overall, but no more than 10 in any single second. This prevents stampedes.

You have too many concurrent requests. Some services limit how many requests you can have in-flight simultaneously. Six parallel requests when the limit is five? The sixth gets a 429.
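
If concurrency is the problem, the usual fix is to queue requests on the client so only a few are in flight at once. A minimal in-process sketch (createLimiter and the limit of 5 are illustrative, not tied to any particular API):

// At most `limit` fetches in flight at any moment; the rest wait in a queue.
function createLimiter(limit = 5) {
    let active = 0;
    const queue = [];

    const next = () => {
        if (active >= limit || queue.length === 0) return;
        active++;
        const { url, resolve, reject } = queue.shift();
        fetch(url)
            .then(resolve, reject)
            .finally(() => { active--; next(); });
    };

    return (url) => new Promise((resolve, reject) => {
        queue.push({ url, resolve, reject });
        next();
    });
}

const limitedFetch = createLimiter(5);
// limitedFetch('/api/data') waits in the queue instead of becoming the sixth in-flight request.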

Handling 429s in Your Code

The wrong way:

// This ignores rate limits entirely
const response = await fetch('/api/data');

The right way:

async function fetchWithRetry(url, maxRetries = 3) {
    for (let attempt = 0; attempt < maxRetries; attempt++) {
        const response = await fetch(url);
        
        if (response.status !== 429) {
            return response;
        }
        
        // Respect the server's guidance
        const retryAfter = response.headers.get('Retry-After');
        const waitMs = retryAfter 
            ? parseInt(retryAfter) * 1000 
            : Math.pow(2, attempt) * 1000; // Exponential backoff fallback
        
        await new Promise(resolve => setTimeout(resolve, waitMs));
    }
    
    throw new Error('Rate limit exceeded after retries');
}
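
Calling it looks the same as plain fetch:

const response = await fetchWithRetry('/api/data');
const data = await response.json();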

Even better—track your limits and avoid hitting them:

class ApiClient {
    constructor() {
        this.remaining = Infinity;
        this.resetTime = 0;
    }
    
    async fetch(url) {
        // Wait if we know we're out of quota
        if (this.remaining === 0) {
            const waitMs = (this.resetTime - Date.now() / 1000) * 1000;
            if (waitMs > 0) await new Promise(r => setTimeout(r, waitMs));
        }
        
        const response = await fetch(url);
        
        // Update our understanding of the limits
        const remaining = response.headers.get('X-RateLimit-Remaining');
        const reset = response.headers.get('X-RateLimit-Reset');
        this.remaining = remaining !== null ? parseInt(remaining, 10) : Infinity;
        this.resetTime = reset !== null ? parseInt(reset, 10) : 0;
        
        return response;
    }
}

Implementing Rate Limiting on Your Server

The simplest approach with Express:

const express = require('express');
const rateLimit = require('express-rate-limit');

const app = express();

app.use('/api/', rateLimit({
    windowMs: 60 * 1000,  // 1 minute
    max: 100,              // 100 requests per minute
    standardHeaders: true, // Return rate limit info in headers
    handler: (request, response) => {
        response.status(429).json({
            error: 'Too Many Requests',
            retryAfter: Math.ceil((request.rateLimit.resetTime - Date.now()) / 1000)
        });
    }
}));

For production systems, use Redis so limits persist across server restarts and work across multiple instances:

const Redis = require('ioredis');
const redis = new Redis();

async function rateLimit(request, response, next) {
    const key = `ratelimit:${request.ip}`;
    const limit = 100;
    const windowSeconds = 60;
    
    const current = await redis.incr(key);
    if (current === 1) await redis.expire(key, windowSeconds);
    
    const ttl = await redis.ttl(key);
    
    response.setHeader('X-RateLimit-Limit', limit);
    response.setHeader('X-RateLimit-Remaining', Math.max(0, limit - current));
    response.setHeader('X-RateLimit-Reset', Math.floor(Date.now() / 1000) + ttl);
    
    if (current > limit) {
        response.setHeader('Retry-After', ttl);
        return response.status(429).json({ error: 'Too Many Requests' });
    }
    
    next();
}
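
Assuming the same app object as the Express example above, mount this rateLimit middleware the same way:

app.use('/api/', rateLimit);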

Different Limits for Different Users

Not all users are equal. A paying customer deserves more capacity than an anonymous scraper:

const limits = {
    anonymous: { requests: 10, window: 3600 },    // 10/hour
    free: { requests: 100, window: 3600 },        // 100/hour  
    premium: { requests: 1000, window: 3600 },    // 1,000/hour
    enterprise: { requests: 10000, window: 3600 } // 10,000/hour
};
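
These tiers slot straight into the Redis limiter above. A sketch of a per-tier middleware (request.user and its tier and id fields are assumptions about what your authentication layer provides):

async function tieredRateLimit(request, response, next) {
    // Assumes earlier auth middleware attaches request.user with `tier` and `id`.
    const tier = request.user?.tier ?? 'anonymous';
    const { requests: limit, window: windowSeconds } = limits[tier] ?? limits.anonymous;

    const key = `ratelimit:${tier}:${request.user?.id ?? request.ip}`;
    const current = await redis.incr(key);
    if (current === 1) await redis.expire(key, windowSeconds);

    if (current > limit) {
        response.setHeader('Retry-After', await redis.ttl(key));
        return response.status(429).json({ error: 'Too Many Requests' });
    }

    next();
}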

Document these limits clearly. Surprises are frustrating. Users should know their quota before they hit it.

Rate Limiting Strategies

Fixed window: Count requests per calendar minute (00:00-00:59, 01:00-01:59). Simple but allows bursts at boundaries—a client could make 100 requests at 00:59 and another 100 at 01:00.

Sliding window: Count requests in the last 60 seconds, always. Smoother but harder to implement.

Token bucket: Tokens accumulate at a steady rate; each request costs one token. Allows controlled bursts while maintaining overall limits.

Leaky bucket: Requests queue up and drain at a fixed rate. Smoothest traffic but adds latency.

For most APIs, sliding window or token bucket strikes the best balance between fairness and implementation complexity.
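
To make the token bucket concrete, here's a sketch: a counter that refills continuously and caps out at a burst capacity (the class and parameter names are ours):

// Token bucket: refills at a steady rate, allows bursts up to `capacity`.
class TokenBucket {
    constructor(capacity, refillPerSecond) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;
        this.lastRefill = Date.now();
    }

    tryRemoveToken() {
        // Top up based on how much time has passed since the last check.
        const now = Date.now();
        const elapsedSeconds = (now - this.lastRefill) / 1000;
        this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
        this.lastRefill = now;

        if (this.tokens >= 1) {
            this.tokens -= 1;
            return true;  // allow the request
        }
        return false;     // reject with 429
    }
}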
