Infrastructure Building Blocks

By Atif Alam

Infrastructure building blocks are the categories of technology you use to run, store, deliver, and operate a system.

Each technology you reach for—a VM, a database, Redis, Kafka, a load balancer—belongs to a type that serves a specific purpose.

This page maps types to technologies, then explains why each type exists: its purpose, main benefits, and when you’d add it.

For “when do I introduce what?” signals and decision matrices, see the Optimization Quick Reference.

| # | Type | Technology |
|---|------|------------|
| 1 | Compute / application hosting | EC2, GCE, Azure VMs, Lambda, Cloud Functions, Cloud Run, Fargate, App Engine, Elastic Beanstalk, Azure App Service, Render, Railway |
| 2 | Relational database (RDBMS) | PostgreSQL, MySQL, SQL Server, Oracle, MariaDB |
| 3 | Traffic distribution layer | Load Balancer, Nginx, Envoy, HAProxy, ALB/NLB, Traefik |
| 4 | Observability | Prometheus, Grafana, ELK, Datadog, OpenTelemetry, Jaeger, Loki |
| 5 | Container orchestration | Kubernetes, EKS, AKS, GKE, Nomad, Docker Swarm, ECS |
| 6 | Object storage | S3, GCS, Azure Blob, MinIO, Cloudflare R2 |
| 7 | In-memory cache / data store | Redis, Memcached, KeyDB, Dragonfly, Couchbase, Hazelcast |
| 8 | Edge cache / content delivery | Cloudflare, Akamai, Fastly, CloudFront, Bunny |
| 9 | Message queue | RabbitMQ, SQS, Azure Service Bus, ActiveMQ, Redis Streams |
| 10 | Distributed log / event streaming | Kafka, Pulsar, Kinesis, Redpanda |
| 11 | NoSQL Database (you operate) | MongoDB, Cassandra, ScyllaDB, HBase, CockroachDB |
| 12 | NoSQL Database (provider-managed) | MongoDB Atlas, DynamoDB, Cosmos DB, Firestore |
| 13 | Search & indexing engine | Elasticsearch, OpenSearch, Solr, Meilisearch, Typesense |
| 14 | Rate limiting & API control | AWS API Gateway / WAF, Apigee, Azure API Management, Kong, Tyk, Cloudflare Rate Limiting, Nginx limit_req |
| 15 | Coordination (leader election, service discovery) | ZooKeeper, etcd, Consul |

The sections below start with compute and the typical baseline (a relational database), then describe the other building blocks you add when you hit scale or reliability limits.

1. Compute / application hosting

Purpose: Run your application code. This is the most fundamental building block.
Examples: EC2, GCE, Azure VMs (VMs/instances); Lambda, Cloud Functions, Azure Functions, Cloudflare Workers (serverless); Cloud Run, Fargate (managed containers); App Engine, Elastic Beanstalk, Azure App Service, Render, Railway (managed platforms)

Benefit:

  • Runs your application logic
  • Multiple models: full control (VMs) vs managed (PaaS) vs event-driven (serverless)
  • Scale from single instance to global fleet

Used when: Always — you need compute before anything else. Choose VMs for full control, managed platforms for simplicity, serverless for event-driven or bursty workloads, and managed containers when you want container packaging without managing orchestration.

2. Relational database (RDBMS)

Purpose: Provide ACID transactions, joins, and a well-understood relational model. A strong fit for structured data with complex relationships and strong consistency needs.
Examples: PostgreSQL, MySQL, SQL Server

Benefit:

  • ACID transactions
  • SQL and well-understood data model
  • Mature operational and tooling ecosystem

Used when: Default choice for structured, transactional data; strong consistency and relational modeling are the main needs.
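To make the ACID benefit concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for a production RDBMS. The `accounts` table and the `transfer` helper are hypothetical names chosen for illustration. The credit is applied before the debit on purpose, so a failed debit shows the earlier write being rolled back.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return False if it fails."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # Violates the CHECK constraint if src would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()
```

Calling `transfer(conn, "alice", "bob", 500)` fails on the debit, and the credit to `bob` is undone along with it, so neither balance changes.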

3. Traffic distribution layer

Purpose: Route traffic across instances with health checks and failover, which enables fault tolerance, horizontal scaling, and zero-downtime deploys.
Examples: ALB, NLB (AWS), Cloud Load Balancing (GCP), Azure Load Balancer / Application Gateway, Nginx, Envoy, HAProxy, Traefik

Benefit:

  • Fault tolerance
  • Horizontal scaling
  • Zero-downtime deploys

Used when: You have more than one instance.
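The routing idea can be sketched as round-robin selection that skips unhealthy backends. This is an illustrative in-process model, not how any particular load balancer is implemented; `RoundRobinBalancer`, `mark`, and `pick` are hypothetical names.

```python
class RoundRobinBalancer:
    """Cycle through backends in order, skipping any marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0

    def mark(self, backend, healthy):
        # In a real system a periodic health check would call this.
        if healthy:
            self.healthy.add(backend)
        else:
            self.healthy.discard(backend)

    def pick(self):
        # Try each backend at most once per call.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next % len(self.backends)]
            self._next += 1
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")
```

Marking a backend unhealthy before a deploy and healthy again afterwards is one simple way to picture how zero-downtime rollouts drain traffic from an instance.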

4. Observability

Purpose: See what the system is doing through metrics, logs, and traces.
Examples: Prometheus, Grafana, ELK, Datadog, OpenTelemetry, Jaeger, Loki

Benefit:

  • Faster debugging
  • Safer scaling
  • Informed decisions

Used when: As soon as you have a production path (compute + database + traffic); before adding cache, queues, or other optimizations.
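As a rough illustration of what instrumentation records, the sketch below counts successes and errors and captures call durations in memory. It is a toy model under stated assumptions: `Metrics` and `observe` are hypothetical names, and a real setup would export these values to one of the tools listed above instead of keeping them in a dict.

```python
import time
from collections import Counter, defaultdict

class Metrics:
    """Record outcome counts and durations for wrapped calls."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.counts = Counter()
        self.durations = defaultdict(list)

    def observe(self, name, fn, *args, **kwargs):
        start = self.clock()
        try:
            result = fn(*args, **kwargs)
            self.counts[f"{name}.ok"] += 1
            return result
        except Exception:
            self.counts[f"{name}.error"] += 1
            raise  # the caller still sees the failure
        finally:
            # Recorded for both successful and failed calls.
            self.durations[name].append(self.clock() - start)
```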

5. Container orchestration

Purpose: Automate the compute lifecycle (scheduling, restarts, scaling, and rollouts).
Examples: Kubernetes, EKS (Amazon), AKS (Azure), GKE (Google), Nomad, Docker Swarm, ECS

Benefit:

  • Auto-recovery
  • Scaling
  • Safer deployments

Used when: You want a standard way to run and deploy workloads. Many teams adopt container orchestration (Kubernetes, EKS, AKS, GKE) early for this reason.

6. Object storage

Purpose: Cheap, durable storage for files and unstructured data.
Examples: S3, GCS, Azure Blob, MinIO, Cloudflare R2

Benefit:

  • Virtually infinite scale
  • High durability
  • Offloads large data from DBs

Used when: Almost every app needs it early—uploads, static assets, backups, logs, media.

7. In-memory cache / data store

Purpose: Reduce latency and load for application data (sessions, hot keys, computed results).
Examples: Redis, Memcached, KeyDB, Dragonfly, Couchbase, Hazelcast

Benefit:

  • Faster reads
  • Fewer DB hits
  • Absorbs traffic spikes

Used when: Reads dominate or latency matters; you need sub-ms or low-ms access to hot data.
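A common way to get these benefits is the cache-aside pattern: check the cache first and fall back to the database on a miss. The sketch below models it in-process with a time-to-live. `TTLCache`, `get_or_load`, and the injectable `clock` parameter are hypothetical names for illustration; a shared cache such as Redis would replace the local dict.

```python
import time

class TTLCache:
    """Cache-aside helper: return a fresh cached value or load and store it."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}  # key -> (value, expiry time)

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit, no database call
        value = loader(key)  # cache miss or expired: go to the source
        self._data[key] = (value, now + self.ttl)
        return value
```

The TTL bounds how stale a value can get, which is the usual trade-off against fewer database hits.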

8. Edge cache / content delivery

Purpose: Serve static or cacheable content close to users and reduce origin load.
Examples: Cloudflare, Akamai, Fastly, CloudFront, Bunny

Benefit:

  • Lower latency for global users
  • Offload traffic from origin
  • DDoS and edge security

Used when: Static assets, media, or cacheable API responses need to be fast worldwide.

9. Message queue

Purpose: Decouple producers and consumers through async work queues that move slow or flaky tasks off the request path, with at-most-once or at-least-once delivery depending on the queue and how it is configured.

When simplicity and low operational overhead matter more than replay and fan-out, queues like RabbitMQ and SQS are easier to reason about and run. Typical uses are offloading slow work from the request path and decoupling service A from service B with at-least-once delivery.
Examples: RabbitMQ, SQS, Azure Service Bus, ActiveMQ, Redis Streams

Benefit:

  • Async processing
  • Backpressure handling
  • Decoupling and reliability

Used when: Tasks are slow or flaky and should not block the request; you need work queues, not replay.
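At-least-once delivery can be sketched as "a message stays in flight until the consumer acknowledges it, and unacknowledged messages are redelivered". The in-memory model below is an illustration under that assumption; `WorkQueue`, `ack`, and `redeliver_unacked` are hypothetical names, not the API of any broker listed above.

```python
import itertools
from collections import deque

class WorkQueue:
    """In-memory work queue with acknowledgements and redelivery."""

    def __init__(self):
        self._ready = deque()
        self._in_flight = {}  # message id -> body, delivered but not acked
        self._ids = itertools.count(1)

    def send(self, body):
        self._ready.append((next(self._ids), body))

    def receive(self):
        if not self._ready:
            return None
        msg_id, body = self._ready.popleft()
        self._in_flight[msg_id] = body
        return msg_id, body

    def ack(self, msg_id):
        # The consumer finished the work; the message is gone for good.
        self._in_flight.pop(msg_id, None)

    def redeliver_unacked(self):
        # Stand-in for a visibility timeout or a dropped consumer connection.
        for msg_id, body in self._in_flight.items():
            self._ready.append((msg_id, body))
        self._in_flight.clear()
```

Because a message can be delivered more than once, consumers in this model should be idempotent.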

10. Distributed log / event streaming

Purpose: Provide an ordered, durable event log with replay and fan-out. Multiple consumers can read the same stream independently, which suits event sourcing and audit trails.
Examples: Kafka, Pulsar, Kinesis, Redpanda

Benefit:

  • Replay & backfill
  • Multiple consumer groups
  • Event sourcing and audit trail

Used when: You need ordering per key, replay, or many consumers reading the same stream.
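The difference from a work queue is that reading does not remove records: each consumer group only tracks its own offset into a shared, append-only log. A minimal single-partition sketch, with hypothetical names (`EventLog`, `poll`, `seek`) and no durability or partitioning:

```python
class EventLog:
    """Append-only log; each consumer group keeps its own read offset."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer group -> next offset to read

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def poll(self, group, max_records=10):
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        # Rewinding the offset is what makes replay and backfill possible.
        self._offsets[group] = offset
```

Two groups polling the same log each see every record, and `seek(group, 0)` replays the stream for one group without affecting the other.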

11. NoSQL Database (you operate)

Purpose: Flexible-schema, partition-tolerant storage optimized for high write throughput, document/key-value/wide-column access patterns, and predictable performance at scale. You operate or host the cluster yourself.
Examples: MongoDB, Cassandra, ScyllaDB, HBase, CockroachDB (strictly a distributed SQL database rather than NoSQL)

Benefit:

  • High write throughput
  • Partition tolerance
  • Predictable performance at scale

Used when: Your RDBMS hits scale limits; you need horizontal scaling, different access patterns, and are willing to operate the cluster.
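Horizontal scaling in these systems generally rests on partitioning data by key. The sketch below shows the simplest form, hash-modulo partitioning, as an illustration only; `partition_for` and `ShardedStore` are hypothetical names, and each dict stands in for a separate node.

```python
import hashlib

def partition_for(key, num_partitions):
    """Map a key to a partition using a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

class ShardedStore:
    """Key-value store spread over several partitions."""

    def __init__(self, num_partitions):
        self.partitions = [{} for _ in range(num_partitions)]

    def _shard(self, key):
        return self.partitions[partition_for(key, len(self.partitions))]

    def put(self, key, value):
        self._shard(key)[key] = value

    def get(self, key, default=None):
        return self._shard(key).get(key, default)
```

With plain modulo, changing the number of partitions remaps most keys, which is one reason production systems tend to use schemes such as consistent hashing instead.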

12. NoSQL Database (provider-managed)

Purpose: Managed document or key-value store with flexible schemas, serverless or pay-per-use scaling, and global replication. The provider runs and operates it, so you avoid cluster management.
Examples: MongoDB Atlas, DynamoDB, Cosmos DB, Firestore

Benefit:

  • Serverless or managed scaling
  • No cluster operations
  • Global replication and SLAs from the provider

Used when: You need scale and flexible schemas but want to avoid running your own distributed database.

13. Search & indexing engine

Purpose: Fast full-text and aggregation queries over large datasets.
Examples: Elasticsearch, OpenSearch, Solr

Benefit:

  • Full-text search
  • Aggregations
  • Low-latency reads

Used when: Queries become complex or user-facing.
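Full-text search is typically built on an inverted index, which maps each term to the documents containing it. The sketch below is a toy version with hypothetical names (`InvertedIndex`, `add`, `search`); it treats a query as "all terms must match" and leaves out ranking, stemming, and aggregations.

```python
import re
from collections import defaultdict

class InvertedIndex:
    """Map each lowercase term to the set of document ids containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _terms(text):
        return re.findall(r"\w+", text.lower())

    def add(self, doc_id, text):
        for term in self._terms(text):
            self._postings[term].add(doc_id)

    def search(self, query):
        terms = self._terms(query)
        if not terms:
            return set()
        # Intersect posting sets so every query term must be present.
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result
```

Lookups touch only the posting sets for the query terms rather than scanning every document, which is where the low-latency reads come from.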

14. Rate limiting & API control

Purpose: Enforce rate limits, quotas, and access policies at the API boundary to prevent overload, ensure fair usage, and control abuse.
Examples: AWS API Gateway / WAF, GCP Apigee / Cloud Endpoints, Azure API Management, Cloudflare Rate Limiting, Kong, Tyk, Redis-based limiters, Nginx limit_req

Benefit:

  • Prevent overload
  • Fair usage
  • Abuse control

Used when: Traffic becomes unpredictable or external.
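One widely used rate-limiting algorithm is the token bucket: tokens refill at a steady rate up to a cap, and each request spends one. The sketch below is a single-process illustration, not a description of how the products above implement limits; `TokenBucket`, `allow`, and the injectable `clock` are hypothetical names.

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec` tokens."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full
        self.updated = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill for the elapsed time, never exceeding capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller would typically respond with HTTP 429
```

Keeping one bucket per API key or client IP is a simple way to picture fair usage. A limiter shared across instances would need the bucket state in a common store.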

15. Coordination (leader election, service discovery)

Purpose: Leader election, consensus, service discovery, and distributed configuration—lets many components agree on who leads and where things live.
Examples: ZooKeeper, etcd, Consul

Benefit:

  • Leader election
  • Configuration consistency
  • Service discovery

Used when: Many distributed components must agree.
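Leader election is often modeled as competing for a lease: whoever holds an unexpired lease is the leader and must renew it before it expires. The sketch below is a single-process stand-in for the atomic, replicated state a coordination service would provide; `LeaseStore` and `try_acquire` are hypothetical names, and it does not model consensus or network failures.

```python
class LeaseStore:
    """Single-key lease: at most one holder until the lease expires."""

    def __init__(self, clock):
        self.clock = clock
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, candidate, ttl):
        now = self.clock()
        free = self.holder is None or now >= self.expires_at
        if free or self.holder == candidate:
            # Either take over a free lease or renew our own.
            self.holder = candidate
            self.expires_at = now + ttl
            return True
        return False  # someone else is still the leader
```

If the leader crashes and stops renewing, the lease expires and another candidate's `try_acquire` succeeds, which is the failover behavior the benefit list refers to.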


Next: see the Optimization Quick Reference for when to add what, or Redis vs Kafka: when to use which for a concrete example.