504 Gateway Timeout

A 504 Gateway Timeout is one of the stranger errors in HTTP. Nothing is broken. The upstream server might be working perfectly—processing your request, querying databases, doing exactly what it should. The gateway just got tired of waiting and gave up.

This makes 504 different from most errors. It's not about failure. It's about mismatched expectations between systems.

What's Actually Happening

When you see a 504, here's the story:

  1. Your request reached a gateway (like Nginx, a load balancer, or a CDN)
  2. The gateway forwarded it to an upstream server
  3. The upstream server started working on it
  4. The gateway's patience ran out before the response arrived
  5. The gateway told you it timed out—even though the upstream might still be working
Client → Gateway (60s timeout) → Backend (processing for 90s)
                ↓
        "I'm done waiting"
                ↓
        504 Gateway Timeout

The backend finishes 30 seconds later, sends its response, and... nobody's listening anymore.

Why Patience Runs Out

Genuinely slow work. A database query that scans millions of rows. An external API that takes forever. A report that really needs two minutes to generate. The work is legitimate—it just exceeds the gateway's configured patience.

Resource starvation. The upstream is waiting for something: a database connection from an exhausted pool, a lock held by another process, memory that's being swapped. It's not doing work slowly—it's waiting to start.

Network latency. The response is ready, but the network between gateway and upstream is slow. Packets crawl. The gateway's timer expires before the bytes arrive.

Cascading delays. Your backend calls another service, which calls another service, which calls a database. Each hop adds latency. By the time the response bubbles back up, the original gateway has moved on.
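Telling "slow work" apart from "waiting to start" is easier if you time the phases separately. A sketch, where `pool.acquire` and `conn.query` are stand-ins for your driver's real calls:

```javascript
// Hypothetical instrumentation: split total latency into wait vs. work.
async function timedQuery(pool, sql) {
    const t0 = Date.now();
    const conn = await pool.acquire();   // resource starvation shows up here
    const t1 = Date.now();
    const rows = await conn.query(sql);  // genuinely slow work shows up here
    const t2 = Date.now();
    conn.release();
    return { rows, waitMs: t1 - t0, workMs: t2 - t1 };
}
```

A high `waitMs` points at an exhausted pool or lock contention; a high `workMs` points at the query itself.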

504 vs. Its Cousins

504 Gateway Timeout: The upstream is probably fine—just slow. The gateway gave up.

502 Bad Gateway: The upstream returned something invalid or crashed mid-response. Something actually broke.

503 Service Unavailable: The upstream explicitly said "not right now." This was intentional—maybe it's overloaded, maybe it's in maintenance.

408 Request Timeout: Different direction entirely. The server got tired of waiting for the client to finish sending its request.

The key distinction: 504 is about the gateway's patience with the upstream. The upstream might be perfectly healthy.
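One way to make the distinction concrete in client code is a small classifier. A sketch; `classifyGatewayError` is a hypothetical helper, and real clients should also honor any Retry-After header:

```javascript
// Map the timeout-family status codes to a retry decision.
function classifyGatewayError(status) {
    switch (status) {
        case 504: return { retry: true,  reason: 'gateway gave up waiting; upstream may be healthy' };
        case 503: return { retry: true,  reason: 'upstream said "not right now"; respect Retry-After' };
        case 502: return { retry: false, reason: 'upstream returned something broken; retrying rarely helps' };
        case 408: return { retry: true,  reason: 'server timed out waiting for the client request' };
        default:  return { retry: false, reason: 'not a timeout-family error' };
    }
}
```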

Configuring Timeouts

Timeouts should reflect reality, not hope.

server {
    # Fast endpoints get short timeouts
    location /api/users {
        proxy_pass http://backend;
        proxy_read_timeout 30s;
    }

    # Report generation legitimately takes time
    location /api/reports {
        proxy_pass http://backend;
        proxy_read_timeout 180s;
    }

    # Health checks should be instant
    location /health {
        proxy_pass http://backend;
        proxy_read_timeout 5s;
    }
}

The trap: setting the same timeout everywhere. A health check and a report generator have nothing in common. Treat them differently.

The Timeout Chain Problem

Here's where distributed systems get strange. You might have:

  • Client timeout: 120 seconds
  • Gateway timeout: 60 seconds
  • Backend operation: 90 seconds

The backend will complete its work. But the gateway times out at 60 seconds. The client gets a 504. The backend finishes 30 seconds later, response ready, but the connection is gone.

Every layer needs to know about the layers below it. Gateway timeouts should exceed the longest legitimate operation time of the backends they front.
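One way to keep the chain consistent is to derive each layer's timeout from the one above it, instead of configuring each layer independently. A sketch with illustrative numbers:

```javascript
// Deadline propagation: each hop gets a shrinking budget, so inner
// calls always give up before the caller above them does.
function hopBudget(totalMs, overheadMs, hops) {
    const budgets = [];
    let remaining = totalMs;
    for (let i = 0; i < hops; i++) {
        remaining -= overheadMs; // reserve slack for this hop's own overhead
        budgets.push(remaining);
    }
    return budgets;
}

// With a 120s client timeout and 5s of slack per hop:
// gateway 115s, backend 110s, database 105s.
hopBudget(120000, 5000, 3);
```

Ordered this way, the innermost operation times out first and its error can travel back up while every connection is still alive.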

The Real Fix: Don't Make Them Wait

Increasing timeouts treats the symptom. The disease is synchronous waits for slow operations.

For operations that legitimately take time:

// Don't make the client wait
app.post('/api/reports', async function(request, response) {
    const jobId = await queue.add('generate-report', request.body);
    
    // Return immediately with a job ID
    response.status(202).json({
        jobId: jobId,
        statusUrl: `/api/jobs/${jobId}`
    });
});

// Let them poll for completion
app.get('/api/jobs/:id', async function(request, response) {
    const job = await queue.getJob(request.params.id);
    
    response.json({
        status: job.finished ? 'completed' : 'processing',
        progress: job.progress,
        result: job.finished ? job.result : null
    });
});

The 202 Accepted pattern. "I heard you. I'm working on it. Check back later." No timeout can hurt you if you respond immediately.

For operations that should be fast but aren't:

Find out why. Add database indexes. Cache expensive computations. Profile the code path. A 504 on what should be a fast endpoint is a performance bug, not a timeout configuration problem.

// Track what's actually slow
app.use(function(request, response, next) {
    const start = Date.now();
    
    response.on('finish', function() {
        const duration = Date.now() - start;
        
        if(duration > 10000) {
            logger.warn('Slow request', {
                url: request.url,
                duration: duration
            });
        }
    });
    
    next();
});

Graceful Degradation

When the upstream is slow, you have choices beyond "error" and "wait forever."

app.get('/api/data', async function(request, response) {
    try {
        const data = await Promise.race([
            fetchFreshData(),
            new Promise(function(_, reject) {
                setTimeout(function() {
                    reject(new Error('timeout'));
                }, 5000);
            })
        ]);
        
        response.json(data);
    }
    catch(error) {
        // Fresh data too slow? Return stale data.
        const cached = cache.get('data');
        
        if(cached) {
            response.json({
                ...cached,
                stale: true
            });
        }
        else {
            response.status(504).json({
                error: 'Unable to fetch data in time'
            });
        }
    }
});

Stale data is usually better than no data. The user gets something while you figure out why fresh fetches are slow.

Client Retry Strategy

504 errors are often transient. The upstream might have been momentarily overloaded, or there might have been a network hiccup. Retrying makes sense—with backoff.

async function fetchWithRetry(url, maxRetries = 3) {
    for(let attempt = 0; attempt < maxRetries; attempt++) {
        const response = await fetch(url);
        
        if(response.status === 504 && attempt < maxRetries - 1) {
            // Wait longer between each retry
            await new Promise(function(resolve) {
                setTimeout(resolve, Math.pow(2, attempt) * 1000);
            });
            continue;
        }
        
        return response;
    }
}

Exponential backoff: 1 second, then 2, then 4. Give the system time to recover.
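The fixed 1-2-4 schedule works, but when many clients fail at the same moment they also retry at the same moment. Adding jitter spreads them out. A sketch; `jitteredDelay` is a hypothetical helper with an injectable `random` so it can be tested deterministically:

```javascript
// Full-jitter backoff: pick a random delay in [0, base * 2^attempt).
function jitteredDelay(attempt, baseMs = 1000, random = Math.random) {
    return Math.floor(random() * baseMs * Math.pow(2, attempt));
}
```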

Circuit Breakers

If an upstream keeps timing out, stop asking. A circuit breaker prevents you from piling requests onto an already-struggling system.

const CircuitBreaker = require('opossum');

const breaker = new CircuitBreaker(fetchFromBackend, {
    timeout: 30000,
    errorThresholdPercentage: 50,
    resetTimeout: 60000
});

breaker.fallback(function() {
    return { error: 'Service temporarily unavailable', cached: getCachedData() };
});

When failures exceed 50%, the circuit opens. Requests get the fallback immediately instead of waiting to time out. After 60 seconds, it lets one request through to see if the upstream has recovered.
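The state machine behind any circuit breaker fits in a few lines: closed, open, half-open. This is a sketch of the idea with a simple failure-count policy, not a substitute for a real library like opossum:

```javascript
// Minimal circuit breaker: closed → open → half-open → closed.
// `now` is injectable so the state machine can be tested without waiting.
class SimpleBreaker {
    constructor({ failureThreshold = 3, resetTimeoutMs = 60000, now = Date.now } = {}) {
        this.failureThreshold = failureThreshold;
        this.resetTimeoutMs = resetTimeoutMs;
        this.now = now;
        this.failures = 0;
        this.state = 'closed';
        this.openedAt = 0;
    }

    async call(fn, fallback) {
        if (this.state === 'open') {
            if (this.now() - this.openedAt >= this.resetTimeoutMs) {
                this.state = 'half-open'; // allow one probe request through
            } else {
                return fallback();        // fail fast instead of waiting to time out
            }
        }
        try {
            const result = await fn();
            this.failures = 0;            // any success closes the circuit
            this.state = 'closed';
            return result;
        } catch (err) {
            this.failures += 1;
            if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
                this.state = 'open';      // trip: stop sending real requests
                this.openedAt = this.now();
            }
            return fallback();
        }
    }
}
```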

The Deeper Pattern

504 errors reveal something fundamental about distributed systems: they're held together by patience and assumptions. Every component assumes the others will respond "fast enough." When those assumptions break—when one system's definition of "fast enough" doesn't match another's—you get errors even when nothing is actually broken.

The fix isn't just configuration. It's designing systems where long waits don't propagate, where stale data is acceptable, where "still working on it" is a valid response. It's accepting that in a distributed system, time is a resource that can run out.
