# Caching Strategies
Caching stores frequently accessed data closer to where it’s needed—in memory, at the edge, or in a dedicated cache layer—so you don’t hit the origin (database, API, disk) on every request.
Done well, caching reduces latency, increases throughput, lowers origin load, and cuts cost. Done poorly, it introduces stale data, cache stampedes, and hard-to-debug inconsistencies.
## Cache Layers

Most systems have multiple cache layers, each with different trade-offs:
| Layer | Where | Latency | Capacity | Best For |
|---|---|---|---|---|
| Client / browser cache | User’s device | Instant | Small | Static assets, API responses with TTL |
| CDN / edge cache | CDN nodes near users | ~10-50ms | Large | Static content, public API responses, media |
| Application cache | In-process memory (e.g. local map, LRU cache) | Microseconds | Limited by instance memory | Hot keys, config, session data |
| Distributed cache | Dedicated cache cluster (e.g. Redis, Memcached) | ~1-5ms | Large, shared across instances | Shared state, session data, computed results, rate limiting |
| Database query cache | Database layer | Varies | Database-managed | Repeated identical queries |
Requests flow through layers top to bottom: client → CDN → application → distributed cache → database.
Each layer absorbs traffic so less reaches the next.
## Cache Patterns

Cache-Aside (Lazy Loading):
- Application checks cache first.
- Cache miss → read from origin, store in cache, return to caller.
- Subsequent requests hit cache until TTL expires or entry is invalidated.
Best for: Read-heavy workloads where stale data is tolerable for a short window.
Most common pattern.
Write-Through:
- Application writes to cache and origin on every write.
- Cache is always up to date.
Best for: Data that must be consistent and is read frequently after writes.
Higher write latency (two writes on every mutation).
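A write-through sketch makes the double write explicit. Both stores are plain dicts here for illustration; in practice the origin write would be a database transaction and should happen first, so a cache failure never loses data:

```python
cache: dict = {}
origin: dict = {}  # stand-in for the database (source of truth)

def put(key, value):
    origin[key] = value  # write origin first: it is the source of truth
    cache[key] = value   # then update the cache so reads are immediately fresh

def get(key):
    if key in cache:
        return cache[key]
    return origin[key]
```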
Write-Behind (Write-Back):
- Application writes to cache.
- Cache asynchronously writes to origin (batched or delayed).
Best for: Write-heavy workloads where you can tolerate brief inconsistency.
Risk: data loss if cache fails before flushing to origin.
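A write-behind sketch, assuming a background worker calls `flush()` periodically (the worker itself is not shown). The dirty-key queue is the window of potential data loss the text warns about:

```python
from collections import deque

cache: dict = {}
origin: dict = {}            # stand-in for the database
dirty: deque = deque()       # keys written to cache but not yet flushed

def put(key, value):
    cache[key] = value
    dirty.append(key)        # origin write is deferred, so writes are fast

def flush(batch_size=100):
    """Flush pending writes to the origin in batches.
    In a real system a background worker would call this on a timer."""
    for _ in range(min(batch_size, len(dirty))):
        key = dirty.popleft()
        origin[key] = cache[key]
```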
Read-Through:
- Cache itself fetches from origin on miss (cache manages the data source connection).
- Application only talks to cache.
Best for: Simplifying application code.
The cache acts as a transparent layer.
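The read-through inversion can be shown by giving the cache the loader, so callers never touch the origin. The class and loader below are hypothetical, not a particular library's interface:

```python
class ReadThroughCache:
    """Cache that owns the origin connection: on a miss, the cache
    itself calls the loader. Application code only calls get()."""

    def __init__(self, loader):
        self._loader = loader       # e.g. a database query function
        self._store: dict = {}

    def get(self, key):
        if key not in self._store:
            self._store[key] = self._loader(key)  # cache fetches on miss
        return self._store[key]

# Usage: the application never sees the loader after construction.
cache = ReadThroughCache(loader=lambda k: f"loaded:{k}")
```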
## Cache Invalidation

Cache invalidation is famously one of the hardest problems in computer science.
Stale data is the primary risk of caching.
Strategies:
- TTL (Time To Live) — Every cache entry expires after a fixed duration. Simple and predictable. Choose TTL based on how stale the data can be (seconds for real-time, minutes for product catalogs, hours for static content).
- Event-Based Invalidation — When the underlying data changes, publish an event that invalidates or updates the cache entry. More complex but keeps data fresher.
- Version Keys — Include a version number in the cache key. When data changes, increment the version so old entries are naturally bypassed.
- Purge / Manual Invalidation — Explicitly delete cache entries when you know data has changed. Simple for small-scale; hard to maintain at scale.
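Of the strategies above, version keys are the least obvious, so here is a small sketch. The key format and names are illustrative assumptions; old keys are simply never read again and age out via TTL:

```python
versions: dict = {}  # logical version per entity type

def cache_key(entity, entity_id):
    """Build a versioned cache key, e.g. 'product:v1:42'."""
    v = versions.get(entity, 1)
    return f"{entity}:v{v}:{entity_id}"

def bump(entity):
    """Call on data change: new reads use new keys, so stale
    entries are bypassed without being explicitly deleted."""
    versions[entity] = versions.get(entity, 1) + 1
```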
TTL guidelines:
| Data Type | Suggested TTL | Rationale |
|---|---|---|
| User session | 15-30 minutes | Security + freshness |
| Product catalog | 5-15 minutes | Changes infrequently; stale data is low risk |
| Search results | 1-5 minutes | Changes frequently; slight staleness acceptable |
| Static assets | Hours to days | Rarely changes; use versioned URLs for cache busting |
| Config / feature flags | 30-60 seconds | Needs to propagate quickly for kill switches |
## Common Caching Pitfalls

- Cache Stampede (Thundering Herd) — When a popular cache entry expires, many requests simultaneously hit the origin. Mitigation: lock or “single-flight” (only one request fetches, others wait), staggered TTLs, or background refresh before expiry.
- Cold Start — After a deploy or cache flush, the cache is empty and all requests hit the origin. Mitigation: warm the cache on startup for known hot keys.
- Inconsistency — Cache and origin disagree. Especially dangerous with write-behind or when multiple services write to the same data. Keep invalidation simple and explicit.
- Over-Caching — Caching everything increases memory cost and makes debugging harder. Cache what’s hot and expensive to compute; don’t cache what’s cheap to fetch.
- Cache Key Collisions — Two different queries mapping to the same key return wrong data. Use precise, well-structured keys.
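The single-flight mitigation for cache stampedes can be sketched with a per-key lock: the first request through loads from the origin, and concurrent requests block on the same lock, then find the cache already filled. This is an in-process illustration; a distributed setup would need a distributed lock or request coalescing at the cache layer:

```python
import threading

_cache: dict = {}
_locks: dict = {}
_locks_guard = threading.Lock()  # protects the per-key lock table

def get(key, load):
    value = _cache.get(key)
    if value is not None:
        return value                 # fast path: cache hit, no locking
    with _locks_guard:
        lock = _locks.setdefault(key, threading.Lock())
    with lock:                       # only one caller loads; others wait here
        value = _cache.get(key)      # re-check: a waiter may find it filled
        if value is None:
            value = load(key)        # exactly one origin fetch per key
            _cache[key] = value
    return value
```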
## Caching and Cost

Caching directly reduces cost by offloading work from expensive resources (databases, compute, external APIs) to cheaper ones (in-memory stores, CDNs).
Quantify the impact:
- Origin Offload Ratio — What percentage of requests is served from cache vs. origin? A 90% cache hit rate means the origin sees only 10% of total traffic.
- Cost Per Request — A cache hit from Redis costs a fraction of a database query. At scale, this adds up.
- Right-size The Cache — Monitor cache utilization and eviction rates. If the eviction rate is high, you may need a larger cache or better TTL strategy.
## See Also

- Load and Stress Testing — Test with caching enabled and disabled to understand the impact.
- Capacity Planning — Caching changes your capacity requirements. Factor cache hit rates into your models.
- System Design Checklist — Section 2 (Data Layer) and Section 5 (Caching and Acceleration) cover caching at the system design level.
- Redis vs Kafka: When to Use Which — Redis is commonly used as a distributed cache.