Optimization Quick Reference
This page helps you choose which optimization to add when you hit a specific pain. Find your pain, get the answer.
How to use this page: Match your pain in the quick chooser below, then go to that category’s details (Signals, Goal / Benefit, Tools, Risks).
Quick Chooser (One-Line Rule)
Match your pain to the fix in the table below.
| Pain | Fix |
|---|---|
| Repeated reads are slow | Caching |
| Can’t explain failures quickly | Observability |
| DB joins too slow | Replication |
| Service goes down = everything fails | Asynchronous Processing |
| Spiky writes / slow jobs on request path | Asynchronous Processing |
| Many consumers + replay needed | Streaming / Event Log |
| Can’t trust downstream service | Rate Control |
| Queries too slow or need full-text / fuzzy search | Search & Indexing |
| Single DB/shard too hot or write ceiling | Data Partitioning |
| Ops toil / manual recovery | Coordination & Orchestration |
| Noisy neighbors or mixed priority traffic | Traffic Shaping |
| Analytics queries hurting prod DB | Storage Specialization |
Details By Category
Caching
Signals: DB/API latency high; read QPS heavy; repeated same queries; spike traffic
Goal / Benefit: Reduce latency + backend load
Tools: Redis/Memcached, CDN, local caches, TTL
Risks: Stale data, invalidation complexity, hot keys, cache stampede
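As an illustration of the caching trade-off above, here is a minimal cache-aside sketch with a TTL. The class name `TTLCache`, the `get_or_load` method, and the injectable `clock` are hypothetical names chosen for this example; a production setup would more likely use Redis or Memcached and add stampede protection.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Hypothetical in-process cache-aside helper; entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: the entry is still fresh
        value = loader()  # miss or expired: fall through to the backend
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call this on writes to limit how long stale data can be served.
        self._store.pop(key, None)
```

The TTL bounds staleness, and `invalidate` is the hook for explicit invalidation on writes. Nothing here prevents many callers from reloading the same expired key at once, which is the cache stampede risk listed above.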
Observability
Signals: Hard to debug; unknown bottlenecks; frequent incidents; blind deploys
Goal / Benefit: Reduce MTTR + make changes safer
Tools: Metrics/logs/traces, dashboards, alerts, SLOs
Risks: Noise/alert fatigue, cost, missing instrumentation
Replication
Signals: Need higher availability; read scaling; DR requirements
Goal / Benefit: Resilience + read scale
Tools: Read replicas, denormalization, multi-AZ DB, quorum systems
Risks: Replication lag, failover complexity, split-brain risk
Asynchronous Processing
Signals: Slow tasks in request path; timeouts; retries explode; dependencies flaky
Goal / Benefit: Decouple + move work off request path
Tools: Queues (SQS/RabbitMQ), decoupling, workers, retries, DLQ
Risks: Eventual consistency, duplication, ordering, failure handling
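The queue, retry, and DLQ pattern listed above can be sketched with an in-memory queue. This is a simplified illustration, not how SQS or RabbitMQ work internally; the function name `drain` and the `max_attempts` parameter are assumptions made for the example.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def drain(queue: Deque[Tuple[str, int]],
          handler: Callable[[str], None],
          max_attempts: int = 3) -> List[str]:
    """Hypothetical worker loop: process (message, attempts) pairs until the
    queue is empty and return the messages that ended up dead-lettered."""
    dead_letters: List[str] = []
    while queue:
        message, attempts = queue.popleft()
        try:
            handler(message)
        except Exception:
            attempts += 1
            if attempts >= max_attempts:
                dead_letters.append(message)  # give up: park it for inspection
            else:
                queue.append((message, attempts))  # retry later, at the back
    return dead_letters
```

Because a failed message is re-queued behind newer ones, retries can reorder work, and a handler that fails after partially succeeding will repeat that work on the next attempt. Those are the ordering and duplication risks above, which is why handlers are usually written to be idempotent.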
Streaming / Event Log
Signals: Many consumers; replay or backfill needed; audit trail; fan-out
Goal / Benefit: Fan-out, replay, durability
Tools: Stream / event log (Kafka, Pulsar, Kinesis, Redpanda)
Risks: Ordering guarantees, retention vs cost, operational complexity
When to use stream vs queue: see Redis vs Kafka: when to use which.
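To make fan-out and replay concrete, here is a toy append-only log with per-consumer offsets. It only illustrates the idea behind systems like Kafka and does not mirror any real client API; `EventLog`, `poll`, and `seek` are hypothetical names for this sketch.

```python
from typing import Dict, List


class EventLog:
    """Hypothetical in-memory event log: events are never removed on read,
    and each consumer tracks its own position."""

    def __init__(self) -> None:
        self._events: List[str] = []
        self._offsets: Dict[str, int] = {}

    def append(self, event: str) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def poll(self, consumer: str, max_events: int = 100) -> List[str]:
        start = self._offsets.get(consumer, 0)
        batch = self._events[start:start + max_events]
        self._offsets[consumer] = start + len(batch)  # advance this consumer only
        return batch

    def seek(self, consumer: str, offset: int) -> None:
        # Rewinding the offset is what makes replay and backfill possible.
        self._offsets[consumer] = offset
```

Unlike a queue, reading does not consume the event, so any number of consumers can read the same data independently. The cost is that the log keeps growing until a retention policy trims it, which is the retention vs cost risk above.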
Rate Control
Signals: Traffic spikes; abuse; downstream overload; SLOs degrade under load
Goal / Benefit: Protect system + fairness
Tools: Rate limits, quotas, circuit breakers, retries, bulkheads
Risks: False positives, client friction, tuning thresholds
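One common way to implement the rate limits listed above is a token bucket. The sketch below is a single-process illustration under assumed names (`TokenBucket`, `allow`, `rate_per_sec`, `capacity`); a distributed limiter would need shared state, for example in Redis.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical token-bucket limiter: allows bursts up to `capacity`
    and a sustained rate of `rate_per_sec` requests."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate_per_sec
        self._capacity = capacity
        self._clock = clock  # injectable for deterministic tests
        self._tokens = capacity
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = now - self._last
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # caller should reject, e.g. with HTTP 429
```

The two parameters are the thresholds that need tuning: `capacity` controls how large a burst is tolerated, and `rate_per_sec` sets the long-run ceiling. Setting either too low produces the false positives and client friction listed as risks.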
Search & Indexing
Signals: DB queries too slow/complex; full-text needed; filtering+ranking
Goal / Benefit: Fast query experience
Tools: Search index (Elasticsearch/OpenSearch), secondary indexes
Risks: Dual writes, index lag, reindexing cost, relevance tuning
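The core structure behind a search index is an inverted index, which maps each term to the documents that contain it. The sketch below is a deliberately small illustration with hypothetical names (`InvertedIndex`, `add`, `search`); it does AND-matching only and has no ranking, stemming, or fuzzy matching.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def _tokenize(text: str) -> List[str]:
    # Assumption for this sketch: lowercase ASCII alphanumeric tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical minimal inverted index: term -> set of document ids."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        for token in _tokenize(text):
            self._postings[token].add(doc_id)

    def search(self, query: str) -> List[int]:
        tokens = _tokenize(query)
        if not tokens:
            return []
        # A document matches only if it contains every query term.
        matches = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return sorted(matches)
```

The index is a second copy of the data, so every write to the primary store must also reach `add`. That is where the dual-write and index-lag risks above come from.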
Data Partitioning
Signals: Single DB/table shard too hot; write throughput ceiling; large datasets
Goal / Benefit: Horizontal scaling (throughput)
Tools: Sharding by key, partitions, consistent hashing
Risks: Cross-shard queries, rebalancing, skew/hot partitions
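Consistent hashing, listed under Tools above, can be sketched as a hash ring with virtual nodes. The names `HashRing`, `node_for`, and `vnodes` are assumptions for this example, and SHA-256 is used only as a convenient stable hash, not for security.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(value: str) -> int:
    # Stable across processes, unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with `vnodes` points per node."""

    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        self._ring: List[Tuple[int, str]] = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # The first ring point clockwise from the key's hash owns the key.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

With this layout, adding a node only moves the keys that the new node takes over, which keeps rebalancing cheaper than with `hash(key) % shard_count`. It does not fix skew from a single hot key, since one key always maps to one node.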
Coordination & Orchestration
Signals: Many services; manual deployments; failovers/scale require humans
Goal / Benefit: Automate lifecycle + coordination
Tools: Orchestration (Kubernetes/Nomad), service discovery, leader election
Risks: Operational complexity, misconfig outages, learning curve
Traffic Shaping
Signals: Multiple request classes; noisy neighbors; need graceful degradation
Goal / Benefit: Prioritize critical traffic
Tools: Traffic shaping (priority queues, load shedding, admission control). Common tech: Envoy, NGINX, Kong, Redis (rate limits / priority queues), Kubernetes ResourceQuota, AWS API Gateway / Cloudflare (edge throttling).
Risks: Starving lower tiers, policy complexity
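Priority queues and load shedding can be combined into a simple admission controller, sketched below. `PriorityAdmission`, `offer`, and `take` are hypothetical names, and the policy (shed the least important request when full) is one assumed choice among several.

```python
import heapq
from typing import List, Optional, Tuple


class PriorityAdmission:
    """Hypothetical bounded priority queue. Lower number = more important.
    When full, the least important request is shed."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._seq = 0  # tie-breaker: older requests win within a priority
        self._items: List[Tuple[int, int, str]] = []

    def offer(self, priority: int, request: str) -> Optional[str]:
        """Try to admit a request. Returns the request that was shed
        (possibly the new one), or None if nothing was shed."""
        self._seq += 1
        item = (priority, self._seq, request)
        if len(self._items) < self._capacity:
            heapq.heappush(self._items, item)
            return None
        worst = max(self._items)  # O(n) scan; acceptable for a sketch
        if item >= worst:
            return request  # the newcomer is the least important: reject it
        self._items.remove(worst)
        heapq.heapify(self._items)
        heapq.heappush(self._items, item)
        return worst[2]

    def take(self) -> Optional[str]:
        """Pop the most important queued request, or None if empty."""
        return heapq.heappop(self._items)[2] if self._items else None
```

Under sustained load from high-priority traffic, low-priority requests are always the ones shed. That is the starvation risk above, and it is usually handled by reserving some capacity per tier.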
Storage Specialization
Signals: One DB can’t meet mixed needs (latency vs analytics vs blobs)
Goal / Benefit: Use the right store per workload
Tools: OLTP DB, separate OLAP store/warehouse, object store, time-series DB
Risks: Data duplication, consistency, ETL complexity
For a stage-by-stage view (MVP → Growth → Advanced) and links to staged examples, see Staged Design Examples.
Next: Redis vs Kafka: when to use which — a worked example of choosing between two common building blocks.