System Design Requirements

By Atif Alam

Good system design starts simple and grows intentionally.

Build an MVP (minimum viable product), watch for real bottlenecks, and scale when the signals tell you to—not before.

Use the checklist below to identify which requirements matter for your system—and use the same language when you write your own requirements.

These are the dimensions that matter when you design or review a system.

Requirements checklist, in three groups: Scale & Load, Time & Correctness, Resilience

| Dimension | What it means | You care when… |
|---|---|---|
| Scale & Load | | |
| Expected scale | How big does this get? Daily active users (DAU), concurrent connections (CCU), storage growth over time | Almost always; it sets your capacity baseline |
| Throughput | How many operations per second? Requests/sec, events/sec, orders/day | High-volume ingest or read paths |
| Cost | What are the cost drivers? Storage, compute, retention windows, and tiering decisions | High-volume ingest, long retention windows |
| Time & Correctness | | |
| Latency | How fast does a request need to come back? Measured as p95/p99 response time | Users wait for a response (API, search, feed) |
| Freshness | How soon does new data need to appear for readers? Seconds, minutes, or hours | Feeds, dashboards, search indexes |
| Consistency | Can readers see stale data, or must every read reflect the latest write? Strong vs eventual, read-your-writes | Multiple writers, ordering matters, financial data |
| Ordering | Does the order of events matter? Per-channel, per-partition, or per-key ordering | Chat, collaborative editing, event streams |
| Resilience | | |
| Durability | Can you afford to lose data? Covers replication, persistence, and delivery guarantees (at-least-once, exactly-once) | Payments, audit trails, event retention |
| Availability | How much uptime do you need? Expressed as a target like 99.9%, with failover expectations | User-facing, revenue-critical |
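The Expected scale, Throughput, and Availability rows usually become concrete numbers through back-of-envelope arithmetic. The sketch below shows that arithmetic in Python; every input (1M DAU, 50 requests per user per day, a 3x peak factor, 5 writes of 1 KB per user per day) is a hypothetical placeholder, not a figure from this article.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years


@dataclass
class LoadAssumptions:
    """Hypothetical planning inputs; replace them with your own estimates."""
    dau: int                     # daily active users
    requests_per_user_day: int   # average requests per user per day
    peak_to_avg: float           # peak-hour load divided by the daily average
    writes_per_user_day: int     # stored records each user creates per day
    bytes_per_write: int         # average size of one stored record


def average_rps(a: LoadAssumptions) -> float:
    """Requests per second, averaged over a full day."""
    return a.dau * a.requests_per_user_day / SECONDS_PER_DAY


def peak_rps(a: LoadAssumptions) -> float:
    """Average load scaled by the peak factor; capacity should be sized for this."""
    return average_rps(a) * a.peak_to_avg


def storage_growth_gb_per_year(a: LoadAssumptions) -> float:
    """Raw new data per year, before replication, indexes, or compression."""
    bytes_per_day = a.dau * a.writes_per_user_day * a.bytes_per_write
    return bytes_per_day * 365 / 1e9


def downtime_hours_per_year(availability_pct: float) -> float:
    """Yearly downtime budget implied by an availability target such as 99.9."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    a = LoadAssumptions(dau=1_000_000, requests_per_user_day=50, peak_to_avg=3.0,
                        writes_per_user_day=5, bytes_per_write=1_000)
    print(f"average RPS: {average_rps(a):,.0f}")
    print(f"peak RPS:    {peak_rps(a):,.0f}")
    print(f"new storage: {storage_growth_gb_per_year(a):,.0f} GB/year")
    for target in (99.9, 99.99):
        minutes = downtime_hours_per_year(target) * 60
        print(f"{target}% availability -> {minutes:,.1f} min of downtime/year")
```

With those placeholder inputs the arithmetic gives roughly 579 average RPS, about 1,736 RPS at peak, and about 1.8 TB of new raw data per year. A 99.9% target leaves about 8.76 hours (525.6 minutes) of downtime a year, while 99.99% leaves about 52.6 minutes. Size for the peak, not the average.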
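The Latency row is measured at p95/p99 rather than as an average because a slow tail hides inside the mean. A minimal nearest-rank percentile makes that concrete. Nearest-rank is only one of several percentile definitions (monitoring systems often interpolate or work from histograms), and the sample latencies below are made up.

```python
import math
import statistics
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least p% of
    all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank; multiplying before dividing limits floating-point rounding error
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    latencies_ms = [20.0] * 98 + [500.0, 900.0]  # made-up samples with a slow tail
    print("mean:", statistics.mean(latencies_ms))
    print("p95: ", percentile(latencies_ms, 95))
    print("p99: ", percentile(latencies_ms, 99))
```

For 98 requests at 20 ms plus two at 500 ms and 900 ms, the mean is 33.6 ms and p95 is 20 ms, but p99 is 500 ms. Only the high percentile exposes the slow tail.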

The trick is to treat each dimension as a signal that points you toward a component or pattern.

The mappings below are limited examples to spark your imagination—system design is a vast, detailed subject and depends on your exact app requirements. For more details, read the System Design Checklist, Infrastructure Building Blocks, and Optimization Quick Reference.

Expected scale
  • Millions of users → assume you need horizontal scaling from day one; stateless services behind a load balancer
  • Storage growing fast → plan tiered storage (hot/warm/cold) and think about whether a single DB can handle it
  • High CCU → connection pooling, or move to event-driven to avoid thread-per-connection limits

Latency
  • Sub-100ms reads → a cache is usually needed; avoid live DB queries on the hot path
  • Sub-50ms globally → add a CDN for static assets, consider edge computing
  • Slow writes acceptable → offload to async queue, return immediately to user

Freshness
  • Seconds → push model (WebSockets, SSE) or short TTL cache; CDC from DB to downstream
  • Minutes → poll-based refresh or stream consumer updating a read store
  • Hours → batch jobs are fine; ETL pipeline is acceptable
  • This is also your hint on whether you need event streaming vs simple queuing — streams are for near-real-time freshness at scale
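
Consistency
  • Strong (financial, inventory) → single-leader DB, avoid caches or use a write-through cache; cross-service writes may need distributed transactions (2PC), or a saga if compensating actions are acceptable in place of an atomic commit
  • Eventual is OK (social feeds, likes) → read replicas, CQRS, cache-first reads are all fine
  • Read-your-writes needed → route a user’s reads to the node that accepted their write (the leader, in a single-leader setup) for a short window after each write, or use sticky sessions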

Ordering
  • Global order needed → use a single ordered log or single partition (e.g. one Kafka partition, single-leader DB); this caps throughput. To scale, relax to per-key or per-entity order and partition by that key; you get strict order where it matters (per user, per conversation) without a global bottleneck
  • Per-key/channel order → Kafka partitioned by key solves this cleanly
  • No order needed → simple queue (SQS, RabbitMQ), parallel consumers, maximum throughput

Availability
  • 99.9% (about 8.76 hours of downtime per year) → active-passive failover is enough
  • 99.99%+ (about 53 minutes of downtime per year, or less) → multi-region active-active; no single points of failure anywhere; circuit breakers on all downstream calls
  • High availability + strong consistency together → you’re in CAP theorem territory: during a network partition a system must give up one or the other, so you pick one or design around it (e.g. DynamoDB global tables with eventual consistency, or CockroachDB for strong consistency)
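For the read-your-writes bullet, one common way to send a user’s reads to the node that took their write is to pin that user’s reads to the leader for a short window after each write. The router below is a hypothetical sketch of that idea, assuming single-leader replication whose followers catch up within `window_s` seconds (the 10-second default is an arbitrary placeholder); node names are plain strings, and the class and method names are illustrative only.

```python
import random
import time
from typing import Callable, Dict, List


class ReadYourWritesRouter:
    """Pins a user's reads to the leader for `window_s` seconds after that user writes.

    Hypothetical sketch: assumes single-leader replication where followers
    catch up within `window_s`; the 10-second default is an arbitrary placeholder.
    """

    def __init__(self, leader: str, replicas: List[str], window_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.leader = leader
        self.replicas = replicas
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        self._last_write: Dict[str, float] = {}

    def node_for_write(self, user_id: str) -> str:
        """All writes go to the leader; remember when this user last wrote."""
        self._last_write[user_id] = self.clock()
        return self.leader

    def node_for_read(self, user_id: str) -> str:
        """Recent writers read from the leader; everyone else spreads across replicas."""
        last = self._last_write.get(user_id)
        if last is not None and self.clock() - last < self.window_s:
            return self.leader
        return random.choice(self.replicas) if self.replicas else self.leader
```

The cost is extra read load on the leader from recent writers. If follower lag can exceed the window, tracking how far each replica has applied the replication log is the safer variant.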
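The per-key ordering bullets rest on one property: if every event for a key lands in the same partition, and each partition is consumed in order by a single worker, per-key order survives even though partitions are processed in parallel. The sketch below simulates that in plain Python. It is not Kafka client code: `partition_for` uses CRC32 as a stand-in for Kafka's default murmur2-based partitioner, and the `conversation-N` keys are hypothetical.

```python
import threading
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # (key, sequence number within that key)


def partition_for(key: str, num_partitions: int) -> int:
    """Stable key -> partition mapping (CRC32 here; Kafka's default partitioner uses murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events: List[Event], num_partitions: int) -> Dict[int, List[Event]]:
    """Append each event to its key's partition; each partition is an append-only ordered log."""
    partitions: Dict[int, List[Event]] = defaultdict(list)
    for key, seq in events:
        partitions[partition_for(key, num_partitions)].append((key, seq))
    return dict(partitions)


def consume_in_parallel(partitions: Dict[int, List[Event]]) -> Dict[str, List[int]]:
    """One worker per partition reads its log in order; partitions run concurrently."""
    processed: Dict[str, List[int]] = defaultdict(list)
    lock = threading.Lock()

    def consume(log: List[Event]) -> None:
        for key, seq in log:
            with lock:  # guards the shared result dict only
                processed[key].append(seq)

    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        list(pool.map(consume, partitions.values()))  # list() surfaces worker exceptions
    return dict(processed)


if __name__ == "__main__":
    # Hypothetical workload: 10 conversations, 5 messages each, interleaved on arrival.
    events = [(f"conversation-{c}", seq) for seq in range(5) for c in range(10)]
    processed = consume_in_parallel(route(events, num_partitions=4))
    in_order = all(seqs == sorted(seqs) for seqs in processed.values())
    print("per-key order preserved:", in_order)
```

Global order would need a single partition read by a single worker, which is exactly the throughput cap described above; keying by conversation trades that for strict order only within each conversation.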

When you sit down with a new system, scan the checklist and ask:

Watch for requirements that conflict: low latency with strong consistency is the hardest combination, and high availability with exactly-once delivery is another. When two requirements pull in opposite directions, that’s where the real design conversation happens.