URL Shortener — Designed in Stages
You don’t need to design for scale on day one.
Define what you need—create short link (long URL → short code), resolve (short code → redirect to long URL), and optionally track clicks—then build the simplest thing that works and evolve as redirect traffic and features grow.
Here we use a URL shortener (TinyURL- or bit.ly-style) as the running example: short code, long URL, and optional user or click analytics. The same staged thinking applies to any system that maps a short identifier to a URL: short code uniqueness, redirect latency, high read throughput, and durability of the mapping are central.
Requirements and Constraints (no architecture yet)
Functional Requirements
- Create short link — user or system submits long URL; system generates a short code (e.g. 6–8 characters); store mapping (short_code → long_url); return short URL (e.g. https://short.domain/abc123). The same long URL may return the same code each time (idempotent by input) or always a new one — pick one behavior.
- Resolve — when user visits short URL (GET /short_code), lookup long_url by short_code and respond with HTTP redirect (301 permanent or 302 temporary) to long_url; redirect latency should be low.
- Optional click analytics — record each redirect (short_code, timestamp, optional user-agent, IP, referrer); aggregate for dashboard (clicks per link, over time); can be async to avoid slowing redirect.
Quality Requirements
- Short code uniqueness — each short code must map to exactly one long URL (or support multiple if custom codes per user); no collisions; generate with enough entropy or use counter + encoding.
- Redirect latency — time from GET short URL to redirect response should be low (e.g. p95 < 50–100 ms); resolve is the dominant read path; cache and index matter.
- High read throughput — redirects can be very high (viral link); system must serve many GETs per second; read path must scale (cache, read replicas).
- Durability of mapping — once created, short_code → long_url should persist; no accidental overwrite; backup or replication for durability.
- Expected scale — number of short links, redirects per second, optional users and analytics QPS.
Key Entities
- Short code — the unique identifier in the short URL (e.g. “abc123”); used in path; indexed for lookup; generated to be short and URL-safe (e.g. base62: 0-9, a-z, A-Z).
- Long URL — the original URL; stored with short code; must be valid and optionally validated (scheme, length); may be updated (optional) or immutable.
- User (optional) — owner of links; user_id for multi-tenant; optional quota or custom short codes.
- Clicks (optional) — one record per redirect; short_code, timestamp, optional metadata (user_agent, ip, referrer); used for analytics; write-heavy if logging every click.
Primary Use Cases and Access Patterns
- Create — write path; input = long_url (optional custom code, user_id); generate short code (hash of URL with collision handling, or base62 of counter); store (short_code, long_url); return short URL; idempotent if “same long_url returns same code” desired.
- Resolve — read path; input = short_code; lookup long_url; return 301 or 302 with Location: long_url; dominant traffic; must be fast and cacheable.
- Click logging — write path; on redirect (or after redirect), async write (short_code, timestamp, …) to log or analytics store; do not block redirect.
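The create and resolve paths above can be sketched end to end. This is an illustrative in-memory model using the counter + base62 approach — the `Shortener` class and its dict store are hypothetical stand-ins for a real API and DB, not a production design:

```python
import string
from typing import Optional

# base62 alphabet: 0-9, a-z, A-Z (URL-safe, as described above)
BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase


class Shortener:
    """Minimal in-memory sketch of the create and resolve paths."""

    def __init__(self):
        self._counter = 0
        self._store = {}  # short_code -> long_url (stands in for the DB table)

    def create(self, long_url: str) -> str:
        # Counter-based code: unique by construction, no collision handling.
        self._counter += 1
        code = self._encode(self._counter)
        self._store[code] = long_url
        return code

    def resolve(self, code: str) -> Optional[str]:
        # Read path: single-key lookup; the caller issues the 301/302 redirect.
        return self._store.get(code)

    @staticmethod
    def _encode(n: int) -> str:
        # Repeated divmod by 62 yields base62 digits, least significant first.
        digits = []
        while n:
            n, rem = divmod(n, 62)
            digits.append(BASE62[rem])
        return "".join(reversed(digits)) or BASE62[0]
```

In a real deployment the counter would live in the database (an atomic increment), not in application memory, so that codes stay unique across restarts and across servers.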
Given this, start with the simplest MVP: one API, one DB, generate short code (hash or base62 counter), store (short_code, long_url), resolve = lookup + HTTP redirect—then add cache for hot resolutions, load balancer, DB read replicas, and optional click logging as redirect volume grows.
Stage 1 — MVP (simple, correct, not over-engineered)
Goal
Ship a working URL shortener: create short link (long URL → short code); resolve returns a redirect to the long URL. One API, one DB; short code from hash or counter; resolve = lookup + 301/302.
Components
- API — REST or similar; POST create (long_url, optional custom_code) → return short_url; GET /:short_code → lookup long_url, return HTTP redirect (301 or 302). Optional auth for create (API key). Single server or small cluster.
- DB — one table: short_code (primary key), long_url, created_at; optional user_id if multi-tenant. Index by short_code (primary); lookup is single row by primary key.
- Generate short code — Option A (hash): hash long_url (e.g. MD5/SHA256), take first 6–8 chars (base62 encode); if collision, append salt and retry or use longer length. Option B (counter): atomic counter in DB or in app; encode counter in base62 to get short code; no collision; predictable length growth. Choose one; counter gives unique codes without collision handling.
- Resolve — GET /:short_code; query DB: SELECT long_url WHERE short_code = ?; if found, return 301 (permanent) or 302 (temporary) with Location: long_url; if not found, return 404. 301 lets clients and browsers cache the redirect; 302 sends every hit back to your server, which matters if you add analytics.
- Single DB — one database; no cache yet; vertical scaling for read capacity.
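Option A's collision handling can be sketched as follows. This is illustrative only — the function names are ours, and a dict stands in for the DB uniqueness check a real system would perform:

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # base62


def base62_from_bytes(data: bytes, length: int) -> str:
    """Convert a byte string to a fixed-length base62 code."""
    n = int.from_bytes(data, "big")
    out = []
    while n and len(out) < length:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(out).rjust(length, ALPHABET[0])


def hash_code(long_url: str, store: dict, length: int = 7) -> str:
    """Option A: hash the URL, take a short base62 slice, retry with a salt on collision."""
    salt = 0
    while True:
        digest = hashlib.sha256(f"{long_url}#{salt}".encode()).digest()
        code = base62_from_bytes(digest[:8], length)
        # A collision only exists if another long URL already owns this code;
        # the same URL re-deriving its own code is fine (idempotent create).
        if store.get(code) in (None, long_url):
            return code
        salt += 1
```

Note the idempotency property falls out naturally: hashing the same URL with salt 0 always yields the same code unless a collision forces a retry.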
Minimal Diagram
Client (create)         Client (redirect)
       |                        |
       v                        v
       +------------------------+
       |           API          |
       +------------------------+
       |                        |
       v                        v
Create: long_url →        Resolve: short_code →
  generate code,            lookup long_url →
  store in DB               HTTP 301/302 redirect
       |
       v
DB (short_code, long_url)

Patterns and Concerns (don’t overbuild)
- Code length: 6 base62 chars = 62^6 ≈ 56B combinations; 7 chars ≈ 3.5T; enough for most use cases; avoid too short (guessable) or too long (ugly).
- Validation: validate long_url (scheme, length, no javascript: etc.) before storing; reject malicious or invalid URLs.
- Basic monitoring: create rate, resolve latency, 404 rate, error rate.
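The validation step above can be sketched with the standard library. The scheme allowlist and length limit are illustrative values, not prescriptions:

```python
from urllib.parse import urlparse

MAX_URL_LENGTH = 2048                 # illustrative limit
ALLOWED_SCHEMES = {"http", "https"}   # blocks javascript:, data:, file:, etc.


def validate_long_url(url: str) -> bool:
    """Reject malformed, overlong, or dangerous URLs before storing a mapping."""
    if len(url) > MAX_URL_LENGTH:
        return False
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        return False
    if not parsed.netloc:
        # No host component (e.g. "https://" alone, or a bare string).
        return False
    return True
```

A production system would likely add a domain blocklist check here as well, which becomes its own component at Stage 3.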
Why This Is a Correct MVP
- One API, one DB, generate code (hash or counter), store mapping, resolve = lookup + redirect → enough to ship a working shortener; easy to reason about.
- Vertical scaling and single DB buy you time before you need cache and read replicas.
Stage 2 — Growth Phase (cache, load balancer, read replicas, click logging)
What Triggers the Growth Phase?
- Redirect traffic grows; single DB can’t serve resolve QPS; need cache for hot resolutions (short_code → long_url).
- Need load balancer to distribute traffic and for high availability.
- DB read load is high; add read replicas; resolve reads from replica (or cache first).
- Product wants click analytics; log each redirect (async) without slowing redirect path.
Components to Add (incrementally)
- Cache for hot resolutions — in front of DB: on resolve, check cache (e.g. Redis, in-memory) for short_code → long_url; on hit, return redirect from cache (no DB); on miss, lookup DB, store in cache with TTL (e.g. long TTL or no expiry for immutable links), return redirect. Most traffic may hit cache; DB load drops.
- Load balancer — put load balancer in front of API servers; distribute GET (resolve) and POST (create); health checks; scale by adding more API servers.
- DB read replicas — replicate DB; resolve (read) goes to replica or to cache; create (write) goes to primary; eventual consistency acceptable for resolve (new link may take a few seconds to appear on replica).
- Optional click logging — on resolve: after sending redirect (or in parallel), enqueue or async write (short_code, timestamp, user_agent, referrer) to log table or analytics pipeline; do not block redirect; use queue or fire-and-forget; aggregate later for dashboard.
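The cache-aside resolve path described above can be sketched as follows. This is an illustrative model — the dicts stand in for Redis and the read replica, and a real cache entry would carry a TTL:

```python
from typing import Optional


class CachedResolver:
    """Cache-aside resolve: check cache first, fall back to the DB, populate cache."""

    def __init__(self, db: dict):
        self._db = db        # stands in for the replica read path
        self._cache = {}     # stands in for Redis; real code would set a TTL
        self.hits = 0
        self.misses = 0

    def resolve(self, code: str) -> Optional[str]:
        if code in self._cache:
            self.hits += 1
            return self._cache[code]
        self.misses += 1
        long_url = self._db.get(code)      # on miss, read from the replica
        if long_url is not None:
            # Short links are usually immutable, so a long TTL (or none) is safe.
            self._cache[code] = long_url
        return long_url
```

For a viral link, the first request pays the DB lookup and every subsequent request is a cache hit, which is exactly why DB load drops as the hit rate climbs.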
Growth Diagram
Client
  |
  v
Load balancer
  |
  v
API servers
  |                                      |
  v                                      v
Resolve:                              Create: DB primary
  Cache (short_code → long_url)
  |  on miss → DB read replica
  v
Redirect (301/302)
  |
  v
Optional: async click log (queue or write)

Patterns and Concerns to Introduce (practical scaling)
- Cache TTL: short links are usually immutable; cache can have long TTL or no expiry; invalidate on update if you support link update; otherwise cache forever.
- Cache key: key = short_code; value = long_url; small payload; high hit rate for popular links.
- Monitoring: cache hit rate, resolve latency (cache hit vs miss), create latency, click log lag.
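The async click logging from the components list above can be sketched with an in-process queue and worker. This is illustrative — in practice the queue would be an external system (e.g. a message broker) and the worker would batch writes to the analytics store:

```python
import queue
import threading
import time

click_queue = queue.Queue()
click_log = []  # stands in for the analytics store


def click_worker():
    # Drains the queue off the request path; a real worker would batch writes.
    while True:
        event = click_queue.get()
        if event is None:  # shutdown sentinel
            break
        click_log.append(event)
        click_queue.task_done()


def handle_redirect(short_code: str, user_agent: str) -> None:
    # Enqueue is O(1): the redirect response never waits on the analytics write.
    click_queue.put({"short_code": short_code,
                     "ts": time.time(),
                     "user_agent": user_agent})


worker = threading.Thread(target=click_worker, daemon=True)
worker.start()
```

The key property is fire-and-forget: if the analytics pipeline is slow or down, redirects keep serving at full speed and events simply queue up (or, at worst, get dropped).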
Still Avoid (common over-engineering here)
- Custom domains and analytics dashboard until product requires them.
- Key generation at scale (range allocation) until you have many writers and counter contention.
- Rate limiting and abuse prevention until you see abuse.
Stage 3 — Advanced Scale (custom domains, analytics, scale, abuse prevention)
What Triggers Advanced Scale?
- Custom domains: customers want short links on their domain (e.g. go.customer.com/abc); need DNS and routing so their domain points to your resolve path.
- Analytics dashboard: per-link and global stats (clicks over time, geography, device); need queryable store and dashboard backend.
- Scale: very high create rate (many writers); counter or key generation may become bottleneck; use range allocation (each server gets a range of IDs) or distributed ID generation.
- Abuse: spam, phishing, or quota abuse; rate limiting (per IP or per user), blocklist (malicious long URLs), optional CAPTCHA or auth for create.
Components (common advanced additions)
- Custom domains — customer adds CNAME (e.g. go.customer.com → your resolve service); your service accepts Host: go.customer.com and looks up short_code in context of that domain; store (short_code, long_url) per domain or global namespace with domain override; route by Host header.
- Analytics dashboard — store clicks in columnar store or data pipeline; aggregate by short_code, date, etc.; dashboard API or app queries aggregates; show clicks over time, top links; optional export.
- Scale (key generation) — avoid single counter bottleneck: range allocation (each API server gets a block of IDs from central store, e.g. 1–10000, 10001–20000; encode to base62 for codes); or snowflake-style IDs (timestamp + worker + sequence); ensure no collision across servers.
- Rate limiting and abuse prevention — rate limit create (per API key or per IP); blocklist known-bad long URL patterns or domains; optional human verification (CAPTCHA) for anonymous create; monitor and block abusive patterns.
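The range-allocation scheme above can be sketched as follows. This is an illustrative model — the allocator stands in for a central store (in reality a DB row updated atomically), and each `KeyGenerator` stands in for one API server:

```python
import threading


class RangeAllocator:
    """Central store handing out disjoint ID blocks (e.g. 1–10000, 10001–20000)."""

    def __init__(self, block_size: int = 10_000):
        self._next_start = 1
        self._block_size = block_size
        self._lock = threading.Lock()  # stands in for an atomic DB update

    def allocate_block(self) -> range:
        with self._lock:
            start = self._next_start
            self._next_start += self._block_size
        return range(start, start + self._block_size)


class KeyGenerator:
    """Per-server generator: draws IDs from its block, refills when exhausted."""

    def __init__(self, allocator: RangeAllocator):
        self._allocator = allocator
        self._ids = iter(allocator.allocate_block())

    def next_id(self) -> int:
        # One round-trip to the central store per block; all other IDs are local,
        # so many writers never contend on a single counter.
        try:
            return next(self._ids)
        except StopIteration:
            self._ids = iter(self._allocator.allocate_block())
            return next(self._ids)
```

Each ID would then be base62-encoded into a short code, as in Stage 1; blocks are disjoint, so codes are collision-free across servers by construction.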
Advanced Diagram (conceptual)
Client (create / resolve)
  |
  v
Load balancer (custom domain routing by Host)
  |
  v
API (create: rate limit, blocklist; resolve: cache → replica)
  |                                        |
  v                                        v
Key generation (range or distributed)   Cache + DB
  |                                        |
  v                                        v
Create → primary                        Resolve → redirect
                                           |
                                           v
                     Click log (async) → Analytics pipeline → Dashboard

Abuse: blocklist, rate limit, CAPTCHA

Patterns and Concerns at This Stage
- Custom domain mapping: store which short_codes belong to which domain (or default domain); resolve uses Host to choose namespace; same short_code can exist per domain if needed.
- Analytics consistency: clicks are eventually consistent; dashboard may lag; define freshness SLO.
- SLO-driven ops: resolve latency (p50, p95), create success rate, cache hit rate; error budgets and on-call; runbooks for abuse and blocklist.
Summarizing the Evolution
MVP delivers a URL shortener with one API, one DB, short code from hash or counter, store (short_code, long_url), and resolve = lookup + HTTP redirect. That’s enough to ship a working shortener.
As you grow, you add cache for hot resolutions, load balancer, DB read replicas, and optional async click logging. You keep redirect latency low and create path correct.
At advanced scale, you add custom domains, analytics dashboard, scale-friendly key generation (range allocation), and rate limiting and abuse prevention. You scale creates and redirects without over-building on day one.
This approach gives you:
- Start Simple — API + DB, generate code, store mapping, resolve = redirect; ship and learn.
- Scale Intentionally — add cache and read replicas when redirect volume demands it; add click logging when product expects it.
- Add Complexity Only When Required — avoid custom domains and key range allocation until product and scale justify them; keep redirect fast and mapping durable first.