The fastest request is the one you never make.

Every time a user loads your page, dozens of resources must be assembled: HTML, CSS, JavaScript, images, fonts, API data. Without caching, each resource requires a round-trip to your servers. Fifty resources means fifty requests, each burning time and compute.

Caching inverts this. Store copies of data closer to where they're needed—in the browser, on edge servers, in memory—and most requests never travel the wire at all. A page that took 3 seconds loads in 300 milliseconds. The server that handled a million requests now handles a hundred thousand.

But caching is a deal with the devil. You're serving copies, not originals. When the original changes, the copies lie. Knowing when to refresh stale data—cache invalidation—is one of the genuinely hard problems in computing.

Where Caches Live

Caching happens at multiple layers, each with different tradeoffs.

Browser cache stores resources on the user's device. Images, CSS, and JavaScript files are saved locally after the first download. Subsequent visits load them instantly—zero network latency. The catch: this only helps returning visitors, and you have limited control over what browsers keep and for how long.

CDN cache stores content on edge servers distributed globally. A user in Tokyo gets resources from a Tokyo server, not your origin in Virginia. This helps everyone—new visitors, returning visitors—and absorbs traffic spikes before they reach your infrastructure.

Reverse proxy cache sits in front of your application servers, caching entire responses. When the same API endpoint is requested repeatedly, the proxy serves cached results without touching your backend. Varnish, Nginx, and cloud load balancers all do this.

Application cache stores data in memory systems like Redis or Memcached. Database query results, computed values, expensive API responses—anything slow to generate but frequently needed. This layer is under your complete control.

Database cache is built into the database itself, keeping hot data and query results in memory. You don't control this directly, but understanding it helps explain why repeated queries are faster than first runs.

HTTP Cache Headers

HTTP has built-in machinery for cache control. Headers on responses tell browsers and intermediaries exactly how to handle content.

Cache-Control is the primary directive. max-age=3600 means "cache this for one hour." no-cache means "you can store it, but check with me before using it." no-store means "don't cache this at all"—essential for sensitive data. public allows shared caches (CDNs) to store the response; private restricts caching to the browser only.

Directives combine: Cache-Control: public, max-age=86400, immutable tells any cache it may store the response for 24 hours and, because the content will never change, skip revalidation entirely during that time (even when the user reloads).
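
To make the directives concrete, here is a minimal Node sketch (the routes and bodies are placeholders, and the split between asset and page caching is just one common arrangement):

```typescript
import { createServer } from "node:http";

// Minimal sketch: a hashed static asset gets aggressive caching, while the
// HTML page is stored but revalidated on every load.
createServer((req, res) => {
  if (req.url?.startsWith("/assets/")) {
    // Safe to keep for a day without revalidation; the content never changes.
    res.setHeader("Cache-Control", "public, max-age=86400, immutable");
    res.end("/* asset body */");
  } else {
    // Browsers may store the page but must check back before using it.
    res.setHeader("Cache-Control", "no-cache");
    res.setHeader("Content-Type", "text/html");
    res.end("<p>hello</p>");
  }
}).listen(8080);
```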

ETag is a fingerprint for a specific version of a resource. When content changes, the ETag changes. On subsequent requests, browsers send the old ETag in an If-None-Match header. If it still matches, the server responds with 304 Not Modified—a tiny response instead of the full payload.

Last-Modified works similarly but uses timestamps. Browsers send If-Modified-Since, and servers respond with 304 if nothing changed.

These headers enable conditional requests: "Has this changed since I last asked?" Most of the time, the answer is no, and a 200-byte 304 replaces a 200-kilobyte download.
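
A sketch of the server side of that exchange, assuming a small JSON payload whose ETag is derived from its content:

```typescript
import { createServer } from "node:http";
import { createHash } from "node:crypto";

const body = JSON.stringify({ countries: ["DE", "JP", "US"] });
// Fingerprint the current representation; any change to the body changes the ETag.
const etag = `"${createHash("sha256").update(body).digest("hex").slice(0, 16)}"`;

createServer((req, res) => {
  res.setHeader("ETag", etag);
  res.setHeader("Cache-Control", "no-cache"); // store it, but revalidate each time
  if (req.headers["if-none-match"] === etag) {
    res.statusCode = 304; // the client's copy is still good: tiny response, no payload
    res.end();
    return;
  }
  res.setHeader("Content-Type", "application/json");
  res.end(body);
}).listen(8080);
```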

The Hard Problem: Invalidation

Phil Karlton: "There are only two hard things in Computer Science: cache invalidation and naming things."

He wasn't joking. Cache invalidation is genuinely difficult because of a fundamental epistemological problem: how do you know when something you're not looking at has changed?

Time-based expiration is the simplest answer: assume things stay valid for a fixed duration. max-age=300 means "trust this for 5 minutes, then check." Simple, predictable, but crude. Content might change after 30 seconds or stay identical for weeks—time-based expiration doesn't know or care.

Short durations keep content fresh but reduce cache effectiveness. Long durations improve cache hits but serve stale data longer. There's no universal right answer.

Versioned URLs sidestep the problem entirely. Instead of style.css, serve style.a3f8c2.css—where the hash is computed from the file contents. When you change the CSS, the hash changes, the URL changes, and browsers fetch the new version because they've never seen that URL before. The old version wasn't invalidated; it simply stopped being referenced.

This works beautifully for static assets. Build tools generate content hashes automatically. You can set max-age=31536000 (one year) because the URL itself guarantees freshness.
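
A sketch of the hashing step, assuming a hypothetical style.css in the working directory (real build tools do this automatically):

```typescript
import { createHash } from "node:crypto";
import { copyFileSync, readFileSync } from "node:fs";

// Derive a short content hash and write a fingerprinted copy of the file.
// Editing style.css changes the hash, which changes the URL browsers request.
const contents = readFileSync("style.css");
const hash = createHash("sha256").update(contents).digest("hex").slice(0, 6);
const versioned = `style.${hash}.css`; // e.g. style.a3f8c2.css
copyFileSync("style.css", versioned);
console.log(`Reference ${versioned} in your HTML and cache it for a year.`);
```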

Manual purging explicitly removes content from caches. When you update a blog post, you call your CDN's purge API for that URL. Precise control, but requires integration between your CMS and caching layer.

Cache tags let you purge related content together. Tag all product-related responses with product:123. When that product changes, purge everything with that tag. More maintainable than tracking individual URLs, supported by most CDNs.

Stale-while-revalidate serves cached content immediately while checking for updates in the background. Users get fast responses; updates propagate on the next request. A clever compromise between freshness and speed.
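
The directive is just another Cache-Control extension; a minimal sketch of a response that uses it (the numbers are illustrative):

```typescript
import { createServer } from "node:http";

createServer((_req, res) => {
  // Fresh for 60 seconds; for 10 minutes after that, caches may keep serving
  // the stale copy while they revalidate with the origin in the background.
  res.setHeader("Cache-Control", "public, max-age=60, stale-while-revalidate=600");
  res.end(JSON.stringify({ items: [] }));
}).listen(8080);
```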

Strategies by Content Type

Different content demands different approaches.

Static assets—CSS, JavaScript, images, fonts—change only when developers deploy. Use versioned URLs and cache for a year. Every request after the first is a cache hit.

HTML pages are trickier. Purely static sites can cache aggressively. Dynamic sites need shorter durations or no-cache with ETags, checking freshness on each load while still benefiting from 304 responses.

API responses vary wildly. Reference data (country codes, product categories) can cache for hours. User-specific data shouldn't live in shared caches but might be cached in the browser. Real-time data—stock prices, live scores—can't be cached at all.

Personalized content is the hardest case. Each user sees different data, so shared caches can't help. Solutions: cache at the browser only (private), separate static shells from dynamic content, or assemble personalized pages from cached fragments.

Caching Patterns

Cache-aside (lazy loading) is the most common pattern. Check the cache first. On hit, return cached data. On miss, fetch from the database, store in cache, return. Simple, but first requests are always slow.
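
A sketch of the pattern, where a plain Map stands in for a shared cache such as Redis and loadUserFromDb is a hypothetical database call:

```typescript
type User = { id: string; name: string };

// A Map stands in for a shared cache; entries carry their own expiry.
const cache = new Map<string, { value: User; expiresAt: number }>();
const TTL_MS = 60_000;

async function loadUserFromDb(id: string): Promise<User> {
  return { id, name: `user-${id}` }; // placeholder for the real (slow) query
}

async function getUser(id: string): Promise<User> {
  const key = `user:${id}`;
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.value;                       // hit: no database work at all
  }
  const value = await loadUserFromDb(id);   // miss: fetch from the source...
  cache.set(key, { value, expiresAt: Date.now() + TTL_MS }); // ...and fill the cache
  return value;
}
```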

Write-through writes to cache and database simultaneously. Update a user profile? Both stores get the new data in the same operation. Cache stays in sync, but you're caching data that might never be read.
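
Continuing the cache-aside sketch above (same cache, User, and TTL_MS), a write-through update might look like this, with saveUserToDb as a hypothetical persistence call:

```typescript
async function saveUserToDb(user: User): Promise<void> {
  // placeholder for the real INSERT/UPDATE
}

async function updateUser(user: User): Promise<void> {
  await saveUserToDb(user); // durable write first
  // Refresh the cache in the same operation so readers never see it lag behind.
  cache.set(`user:${user.id}`, { value: user, expiresAt: Date.now() + TTL_MS });
}
```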

Write-behind writes to cache immediately, database asynchronously. Fast writes, batched database operations, but dangerous—cache failure before database write means data loss.

Refresh-ahead proactively refreshes popular items before they expire, ensuring hot data never experiences a cache miss. Works well for predictable access patterns; wastes resources on unpopular data.

Distributed Caching

When your application runs on multiple servers, local caching breaks down. Each server maintains its own cache. User requests hit different servers. Caches diverge. Inconsistency ensues.

Redis and Memcached solve this with shared caches accessible from all application servers. Every server sees the same cached data. Hit rates improve because a cache fill on one server benefits all others.

Redis offers persistence, rich data structures (lists, sets, sorted sets), pub/sub, and atomic operations. Memcached is simpler—pure key-value storage—with slightly lower latency.

The tradeoff: network round-trips. A local memory lookup takes nanoseconds; a Redis lookup takes a millisecond or so over the network. Still far faster than most database queries, but not free.

Cache server failure must be handled gracefully. Applications should degrade to database queries, not crash. Treat cache as an optimization, not a requirement.

Eviction Policies

Caches have finite memory. When they fill up, something must go.

LRU (Least Recently Used) evicts whatever hasn't been accessed longest. Works well when recent access predicts future access—true for most workloads.

LFU (Least Frequently Used) evicts items with the lowest access count. Favors consistently popular data over recent one-time accesses.

TTL (Time To Live) expires data after a fixed duration regardless of access patterns. Often combined with LRU—expired items are evicted first, then least-recently-used.

LRU is the default choice for good reason. It's simple, effective, and matches how most applications actually access data.
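
A minimal LRU sketch, leaning on the fact that a JavaScript Map remembers insertion order (production caches typically use tuned or approximated versions of the same idea):

```typescript
class LruCache<K, V> {
  private entries = new Map<K, V>();
  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in the Map is the least recently used one: evict it.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }
}

const lru = new LruCache<string, number>(2);
lru.set("a", 1);
lru.set("b", 2);
lru.get("a");      // "a" is now most recently used
lru.set("c", 3);   // cache is full, so "b" (least recently used) is evicted
```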

The Thundering Herd

A cache miss triggers a database query. What happens when a thousand requests simultaneously encounter the same cache miss?

A thousand database queries. At once. For identical data.

This is the thundering herd: a moment of collective ignorance causing a stampede to the database. It happens when popular cached items expire, or when caches restart empty.

Solutions: Request coalescing makes concurrent requests for the same key wait for a single database query. Cache locks let the first request acquire a lock while others wait. Stale-while-revalidate keeps serving the old value while one request refreshes it.
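
A sketch of request coalescing, where concurrent misses for the same key share one in-flight lookup (queryDatabase is a hypothetical slow call):

```typescript
const inFlight = new Map<string, Promise<string>>();

async function queryDatabase(key: string): Promise<string> {
  return `value-for-${key}`; // placeholder for the real (slow) query
}

async function getCoalesced(key: string): Promise<string> {
  const pending = inFlight.get(key);
  if (pending) return pending; // someone is already fetching this key: wait for them

  const promise = queryDatabase(key).finally(() => inFlight.delete(key));
  inFlight.set(key, promise);
  return promise;
}

// A thousand concurrent callers still trigger exactly one database query.
void Promise.all(Array.from({ length: 1000 }, () => getCoalesced("product:123")));
```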

Measuring What Matters

Cache hit rate: What percentage of requests are served from cache? 80% means your backend handles only 20% of traffic. Track this per cache layer.
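
The arithmetic is simple; a tiny per-layer tracker might look like this:

```typescript
let hits = 0;
let misses = 0;

function recordLookup(wasHit: boolean): void {
  if (wasHit) hits += 1;
  else misses += 1;
}

function hitRate(): number {
  const total = hits + misses;
  // 0.8 means the backend sees only 20% of traffic.
  return total === 0 ? 0 : hits / total;
}
```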

Miss latency: How long do cache misses take? High miss latency identifies slow database queries or external APIs—prime candidates for caching.

Memory utilization: Consistently hitting capacity limits means you need more memory or better eviction policies.

Staleness: How often are users seeing outdated data? This is in tension with hit rate: longer cache durations improve hits but increase staleness.

Common Mistakes

Over-caching: Caching everything wastes memory on rarely-accessed data and serves stale content too long. Cache what's frequently accessed and expensive to generate.

Under-caching: Missing obvious opportunities. If the same database query runs a thousand times per minute, it should be cached.

Cache key collisions: Different data sharing the same cache key. Include all relevant parameters—user ID, locale, feature flags—in key generation.
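
A sketch of key generation that folds in every parameter that changes the response (the field names are illustrative):

```typescript
// Everything that affects the payload goes into the key; omit the locale or the
// flags and two different responses will collide on the same entry.
function cacheKey(userId: string, locale: string, flags: string[]): string {
  return ["profile", userId, locale, [...flags].sort().join("+")].join(":");
}

cacheKey("123", "en-US", ["new-nav", "dark-mode"]);
// => "profile:123:en-US:dark-mode+new-nav"
```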

Ignoring cache in development: Deploying changes that don't take effect because users still have old versions cached. Use versioned URLs for assets; set short cache times for development environments.

Treating cache as source of truth: Cache can disappear—evicted, expired, server restarted. Your application must function (slowly) without it.

Caching is deceptively simple in concept—store copies closer to where they're needed—and genuinely complex in practice. The performance gains are real and substantial. So are the bugs when invalidation goes wrong.

The skill isn't knowing how to cache. It's knowing when the copy is no longer good enough.
