# Caching Strategies
Caching stores frequently accessed data closer to where it’s needed—in memory, at the edge, or in a dedicated cache layer—so you don’t hit the origin (database, API, disk) on every request.
Done well, caching reduces latency, increases throughput, lowers origin load, and cuts cost. Done poorly, it introduces stale data, cache stampedes, and hard-to-debug inconsistencies.
## Cache Layers

Most systems have multiple cache layers, each with different trade-offs:
| Layer | Where | Latency | Capacity | Best For |
|---|---|---|---|---|
| Client / browser cache | User’s device | Instant | Small | Static assets, API responses with TTL |
| CDN / edge cache | CDN nodes near users | ~10-50ms | Large | Static content, public API responses, media |
| Application cache | In-process memory (e.g. local map, LRU cache) | Microseconds | Limited by instance memory | Hot keys, config, session data |
| Distributed cache | Dedicated cache cluster (e.g. Redis, Memcached) | ~1-5ms | Large, shared across instances | Shared state, session data, computed results, rate limiting |
| Database query cache | Database layer | Varies | Database-managed | Repeated identical queries |
Requests flow through layers top to bottom: client → CDN → application → distributed cache → database.
Each layer absorbs traffic so less reaches the next.
## Cache Patterns

Cache-Aside (Lazy Loading):
- Application checks cache first.
- Cache miss → read from origin, store in cache, return to caller.
- Subsequent requests hit cache until TTL expires or entry is invalidated.
Best for: Read-heavy workloads where stale data is tolerable for a short window.
Most common pattern.
Write-Through:
- Application writes to cache and origin on every write.
- Cache is always up to date.
Best for: Data that must be consistent and is read frequently after writes.
Higher write latency (two writes on every mutation).
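A write-through sketch makes the double write explicit. Both stores are plain dicts here for illustration; in practice the origin write would be a database transaction and should happen first, so a cache failure never loses data:

```python
cache: dict = {}
origin: dict = {}  # stand-in for the database (source of truth)

def put(key, value):
    origin[key] = value  # write origin first: it is the source of truth
    cache[key] = value   # then update the cache so reads are immediately fresh

def get(key):
    if key in cache:
        return cache[key]
    return origin[key]
```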
Write-Behind (Write-Back):
- Application writes to cache.
- Cache asynchronously writes to origin (batched or delayed).
Best for: Write-heavy workloads where you can tolerate brief inconsistency.
Risk: data loss if cache fails before flushing to origin.
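A write-behind sketch, assuming a background worker calls `flush()` periodically (the worker itself is not shown). The dirty-key queue is the window of potential data loss the text warns about:

```python
from collections import deque

cache: dict = {}
origin: dict = {}            # stand-in for the database
dirty: deque = deque()       # keys written to cache but not yet flushed

def put(key, value):
    cache[key] = value
    dirty.append(key)        # origin write is deferred, so writes are fast

def flush(batch_size=100):
    """Flush pending writes to the origin in batches.
    In a real system a background worker would call this on a timer."""
    for _ in range(min(batch_size, len(dirty))):
        key = dirty.popleft()
        origin[key] = cache[key]
```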
Read-Through:
- Cache itself fetches from origin on miss (cache manages the data source connection).
- Application only talks to cache.
Best for: Simplifying application code.
The cache acts as a transparent layer.
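The read-through inversion can be shown by giving the cache the loader, so callers never touch the origin. The class and loader below are hypothetical, not a particular library's interface:

```python
class ReadThroughCache:
    """Cache that owns the origin connection: on a miss, the cache
    itself calls the loader. Application code only calls get()."""

    def __init__(self, loader):
        self._loader = loader       # e.g. a database query function
        self._store: dict = {}

    def get(self, key):
        if key not in self._store:
            self._store[key] = self._loader(key)  # cache fetches on miss
        return self._store[key]

# Usage: the application never sees the loader after construction.
cache = ReadThroughCache(loader=lambda k: f"loaded:{k}")
```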
## Cache Invalidation

Cache invalidation is famously one of the hardest problems in computer science.
Stale data is the primary risk of caching.
Strategies:
- TTL (Time To Live) — Every cache entry expires after a fixed duration. Simple and predictable. Choose TTL based on how stale the data can be (seconds for real-time, minutes for product catalogs, hours for static content).
- Event-Based Invalidation — When the underlying data changes, publish an event that invalidates or updates the cache entry. More complex but keeps data fresher.
- Version Keys — Include a version number in the cache key. When data changes, increment the version so old entries are naturally bypassed.
- Purge / Manual Invalidation — Explicitly delete cache entries when you know data has changed. Simple for small-scale; hard to maintain at scale.
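Of the strategies above, version keys are the least obvious, so here is a small sketch. The key format and names are illustrative assumptions; old keys are simply never read again and age out via TTL:

```python
versions: dict = {}  # logical version per entity type

def cache_key(entity, entity_id):
    """Build a versioned cache key, e.g. 'product:v1:42'."""
    v = versions.get(entity, 1)
    return f"{entity}:v{v}:{entity_id}"

def bump(entity):
    """Call on data change: new reads use new keys, so stale
    entries are bypassed without being explicitly deleted."""
    versions[entity] = versions.get(entity, 1) + 1
```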
TTL guidelines:
| Data Type | Suggested TTL | Rationale |
|---|---|---|
| User session | 15-30 minutes | Security + freshness |
| Product catalog | 5-15 minutes | Changes infrequently; stale data is low risk |
| Search results | 1-5 minutes | Changes frequently; slight staleness acceptable |
| Static assets | Hours to days | Rarely changes; use versioned URLs for cache busting |
| Config / feature flags | 30-60 seconds | Needs to propagate quickly for kill switches |
## Common Caching Pitfalls

- Cache Stampede (Thundering Herd) — When a popular cache entry expires, many requests simultaneously hit the origin. Mitigation: lock or “single-flight” (only one request fetches, others wait), staggered TTLs, or background refresh before expiry.
- Cold Start — After a deploy or cache flush, the cache is empty and all requests hit the origin. Mitigation: warm the cache on startup for known hot keys.
- Inconsistency — Cache and origin disagree. Especially dangerous with write-behind or when multiple services write to the same data. Keep invalidation simple and explicit.
- Over-Caching — Caching everything increases memory cost and makes debugging harder. Cache what’s hot and expensive to compute; don’t cache what’s cheap to fetch.
- Cache Key Collisions — Two different queries mapping to the same key return wrong data. Use precise, well-structured keys.
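The single-flight mitigation for cache stampedes can be sketched with a per-key lock: the first request through loads from the origin, and concurrent requests block on the same lock, then find the cache already filled. This is an in-process illustration; a distributed setup would need a distributed lock or request coalescing at the cache layer:

```python
import threading

_cache: dict = {}
_locks: dict = {}
_locks_guard = threading.Lock()  # protects the per-key lock table

def get(key, load):
    value = _cache.get(key)
    if value is not None:
        return value                 # fast path: cache hit, no locking
    with _locks_guard:
        lock = _locks.setdefault(key, threading.Lock())
    with lock:                       # only one caller loads; others wait here
        value = _cache.get(key)      # re-check: a waiter may find it filled
        if value is None:
            value = load(key)        # exactly one origin fetch per key
            _cache[key] = value
    return value
```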
## Caching and Cost

Caching directly reduces cost by offloading work from expensive resources (databases, compute, external APIs) to cheaper ones (in-memory stores, CDNs).
Quantify the impact:
- Origin Offload Ratio — What percentage of requests is served from cache vs. origin? A 90% cache hit rate means the origin sees only 10% of total traffic.
- Cost Per Request — A cache hit from Redis costs a fraction of a database query. At scale, this adds up.
- Right-size The Cache — Monitor cache utilization and eviction rates. If the eviction rate is high, you may need a larger cache or better TTL strategy.
## See Also

- Load and Stress Testing — Test with caching enabled and disabled to understand the impact.
- Capacity Planning — Caching changes your capacity requirements. Factor cache hit rates into your models.
- System Design Checklist — Section 2 (Data Layer) and Section 5 (Caching and Acceleration) cover caching at the system design level.
- Redis vs Kafka: When to Use Which — Redis is commonly used as a distributed cache.