A 504 Gateway Timeout is one of the stranger errors in HTTP. Nothing is broken. The upstream server might be working perfectly—processing your request, querying databases, doing exactly what it should. The gateway just got tired of waiting and gave up.
This makes 504 different from most errors. It's not about failure. It's about mismatched expectations between systems.
What's Actually Happening
When you see a 504, here's the story:
- Your request reached a gateway (like Nginx, a load balancer, or a CDN)
- The gateway forwarded it to an upstream server
- The upstream server started working on it
- The gateway's patience ran out before the response arrived
- The gateway told you it timed out—even though the upstream might still be working
The backend finishes 30 seconds later, sends its response, and... nobody's listening anymore.
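You can reproduce the shape of this in a few lines of Python, with asyncio standing in for both sides (a sketch, not gateway-specific; the timings are scaled down to fractions of a second so the script finishes quickly):

```python
import asyncio

async def backend():
    # Stands in for a slow but healthy upstream. Think "90 seconds",
    # scaled down so the example runs in under a second.
    await asyncio.sleep(0.9)
    return "response ready"

async def gateway():
    task = asyncio.create_task(backend())
    # The "gateway" is only willing to wait the equivalent of 60 seconds.
    done, pending = await asyncio.wait({task}, timeout=0.6)
    if not done:
        # The 504 moment: the backend task is still running right now,
        # but nobody is waiting for its answer anymore.
        print("504 Gateway Timeout")
    else:
        print(task.result())

asyncio.run(gateway())
```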
Why Patience Runs Out
Slow work. The upstream is doing something that genuinely takes time: a database query that scans millions of rows, an external API that takes forever, a report that really does need two minutes to generate. The work is legitimate—it just exceeds the gateway's configured patience.
Resource starvation. The upstream is waiting for something: a database connection from an exhausted pool, a lock held by another process, memory that's being swapped. It's not doing work slowly—it's waiting to start.
Network latency. The response is ready, but the network between gateway and upstream is slow. Packets crawl. The gateway's timer expires before the bytes arrive.
Cascading delays. Your backend calls another service, which calls another service, which calls a database. Each hop adds latency. By the time the response bubbles back up, the original gateway has moved on.
504 vs. Its Cousins
504 Gateway Timeout: The upstream is probably fine—just slow. The gateway gave up.
502 Bad Gateway: The upstream returned something invalid or crashed mid-response. Something actually broke.
503 Service Unavailable: The upstream explicitly said "not right now." This was intentional—maybe it's overloaded, maybe it's in maintenance.
408 Request Timeout: Different direction entirely. The server got tired of waiting for the client to finish sending its request.
The key distinction: 504 is about the gateway's patience with the upstream. The upstream might be perfectly healthy.
Configuring Timeouts
Timeouts should reflect reality, not hope.
The trap: setting the same timeout everywhere. A health check and a report generator have nothing in common. Treat them differently.
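For example, with Python's requests library and some hypothetical endpoints (the URLs and numbers are placeholders), each call gets a read timeout matched to what that operation legitimately needs:

```python
import requests

# Hypothetical endpoints; the point is that each gets a timeout
# matched to how long it legitimately takes, not one global value.
FAST = ("https://api.example.com/health", 2)             # should answer in milliseconds
NORMAL = ("https://api.example.com/orders", 10)          # typical CRUD work
SLOW = ("https://api.example.com/reports/export", 120)   # genuinely long-running

for url, read_timeout in (FAST, NORMAL, SLOW):
    try:
        # (connect timeout, read timeout): connecting should always be quick;
        # only the read budget varies with the operation.
        resp = requests.get(url, timeout=(3, read_timeout))
        print(url, resp.status_code)
    except requests.Timeout:
        print(url, f"timed out after {read_timeout}s of waiting for a response")
```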
The Timeout Chain Problem
Here's where distributed systems get strange. You might have:
- Client timeout: 120 seconds
- Gateway timeout: 60 seconds
- Backend operation: 90 seconds
The backend will complete its work. But the gateway times out at 60 seconds. The client gets a 504. The backend finishes 30 seconds later, response ready, but the connection is gone.
Every layer needs to know about the layers below it. Gateway timeouts should exceed the longest legitimate operation time of the backends they front.
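One way to keep the chain honest is to budget from the inside out, so each layer's timeout exceeds the worst case of the layer it waits on (a sketch with illustrative numbers):

```python
# Budget timeouts from the inside out. The numbers are illustrative;
# the invariant is what matters: each layer outlasts the layer below it.
BACKEND_WORST_CASE = 90                      # longest legitimate backend operation, seconds
GATEWAY_TIMEOUT = BACKEND_WORST_CASE + 15    # gateway must outlast the backend
CLIENT_TIMEOUT = GATEWAY_TIMEOUT + 15        # client must outlast the gateway

assert BACKEND_WORST_CASE < GATEWAY_TIMEOUT < CLIENT_TIMEOUT
```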
The Real Fix: Don't Make Them Wait
Increasing timeouts treats the symptom. The disease is synchronous waits for slow operations.
For operations that legitimately take time:
The 202 Accepted pattern. "I heard you. I'm working on it. Check back later." No timeout can hurt you if you respond immediately.
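A minimal sketch of the pattern with Flask (the endpoints are hypothetical, and a real version would hand the work to a task queue and keep job state in a database or Redis rather than an in-memory dict):

```python
import threading
import time
import uuid

from flask import Flask, jsonify

app = Flask(__name__)
jobs = {}  # in-memory stand-in for a real job store


def generate_report(job_id):
    # The slow work runs outside the request/response cycle,
    # so no gateway is left holding a connection open for it.
    time.sleep(120)  # pretend this is the two-minute report
    jobs[job_id] = {"status": "done", "result": "report.csv"}


@app.post("/reports")
def start_report():
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "working"}
    threading.Thread(target=generate_report, args=(job_id,), daemon=True).start()
    # Respond immediately: "I heard you. I'm working on it. Check back later."
    return jsonify({"job": job_id, "status_url": f"/reports/{job_id}"}), 202


@app.get("/reports/<job_id>")
def report_status(job_id):
    return jsonify(jobs.get(job_id, {"status": "unknown"}))
```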
For operations that should be fast but aren't:
Find out why. Add database indexes. Cache expensive computations. Profile the code path. A 504 on what should be a fast endpoint is a performance bug, not a timeout configuration problem.
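Caching is often the cheapest of those fixes. Even in-process memoization can take a timeout-prone endpoint out of the danger zone (a sketch; expensive_lookup is a placeholder for whatever is actually slow):

```python
import time
from functools import lru_cache

@lru_cache(maxsize=1024)
def expensive_lookup(customer_id: int) -> tuple:
    # Placeholder for the real slow path: a heavy query, a remote call, etc.
    time.sleep(5)
    return (customer_id, "gold")

expensive_lookup(42)   # slow: pays the five-second cost once
expensive_lookup(42)   # fast: answered from the cache
```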
Graceful Degradation
When the upstream is slow, you have choices beyond "error" and "wait forever."
Serve from cache. Stale data is usually better than no data. The user gets something while you figure out why fresh fetches are slow.
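A sketch of that fallback, assuming a simple in-process cache (the function and cache here are illustrative, not from any particular library):

```python
import time
import requests

_cache = {}  # url -> (fetched_at, data)

def get_with_stale_fallback(url, timeout=2, max_stale=300):
    """Try a fresh fetch with a short timeout; if the upstream is slow,
    fall back to cached data up to max_stale seconds old."""
    try:
        resp = requests.get(url, timeout=timeout)
        resp.raise_for_status()
        data = resp.json()
        _cache[url] = (time.time(), data)
        return data, "fresh"
    except requests.RequestException:
        cached = _cache.get(url)
        if cached and time.time() - cached[0] < max_stale:
            return cached[1], "stale"
        raise  # nothing usable cached: surface the error
```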
Client Retry Strategy
504 errors are often transient. The upstream might have been momentarily overloaded, or there might have been a network hiccup. Retrying makes sense—with backoff.
Exponential backoff: 1 second, then 2, then 4. Give the system time to recover.
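A sketch of a retrying client in plain Python; the 1-2-4 schedule matches the numbers above, and in practice you would also add jitter so synchronized clients don't retry in lockstep:

```python
import time
import requests

def get_with_retries(url, attempts=4, base_delay=1.0):
    delay = base_delay  # 1s, then 2s, then 4s between attempts
    for attempt in range(attempts):
        try:
            resp = requests.get(url, timeout=10)
            if resp.status_code != 504:
                return resp  # success, or an error that retrying won't fix
        except requests.Timeout:
            pass  # treat our own client-side timeout like the gateway's
        if attempt < attempts - 1:
            time.sleep(delay)
            delay *= 2  # exponential backoff
    raise RuntimeError(f"{url}: still timing out after {attempts} attempts")
```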
Circuit Breakers
If an upstream keeps timing out, stop asking. A circuit breaker prevents you from piling requests onto an already-struggling system.
When failures exceed a threshold, say 50% of recent requests, the circuit opens. Requests get the fallback immediately instead of waiting to time out. After a cooldown, say 60 seconds, it lets one request through to see if the upstream has recovered.
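A sketch of that policy as a small class (simplified: a real breaker would use a sliding window of results and allow exactly one probe request in the half-open state; the 50% and 60-second values mirror the description above):

```python
import time

class CircuitBreaker:
    def __init__(self, failure_ratio=0.5, min_calls=10, cooldown=60):
        self.failure_ratio = failure_ratio  # open when failures exceed this share
        self.min_calls = min_calls          # don't judge on too small a sample
        self.cooldown = cooldown            # seconds to stay open before retrying
        self.calls = 0
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.cooldown:
                return fallback()  # open: answer immediately, don't pile on
            # Cooldown over: close the circuit and let traffic probe the upstream.
            self.opened_at = None
            self.calls = self.failures = 0
        try:
            result = func()
            self.calls += 1
            return result
        except Exception:
            self.calls += 1
            self.failures += 1
            if (self.calls >= self.min_calls
                    and self.failures / self.calls >= self.failure_ratio):
                self.opened_at = time.time()  # too many failures: open the circuit
            return fallback()
```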
The Deeper Pattern
504 errors reveal something fundamental about distributed systems: they're held together by patience and assumptions. Every component assumes the others will respond "fast enough." When those assumptions break—when one system's definition of "fast enough" doesn't match another's—you get errors even when nothing is actually broken.
The fix isn't just configuration. It's designing systems where long waits don't propagate, where stale data is acceptable, where "still working on it" is a valid response. It's accepting that in a distributed system, time is a resource that can run out.