Payments / Financial — Designed in Stages
You don’t need to design for scale on day one.
Define what you need—accounts, transactions, strong consistency, audit trail, and exactly-once semantics—then build the simplest thing that works and evolve as volume and compliance requirements grow.
Here we use a payments or financial system as the running example: accounts, ledger entries, transactions, and idempotency. The same staged thinking applies to billing, wallets, or any system where money movement and auditability are non-negotiable.
Requirements and Constraints (no architecture yet)
Functional Requirements
- Account — balance or position; debit/credit operations.
- Transaction — a logical unit of work (e.g. transfer from A to B, payment, refund); must complete exactly once and be auditable.
- Ledger entries — immutable record of each debit and credit; double-entry or single-entry as required.
- Idempotency — client can retry with the same idempotency key; server deduplicates and returns the same outcome (no double debit/credit).
Quality Requirements
- Strong consistency — balance and ledger must be consistent; no phantom reads or lost updates on the critical path.
- Audit — every financial event must be traceable; append-only or immutable ledger; who, what, when.
- Exactly-once — each logical transaction is applied once despite retries, failures, or duplicate requests (idempotency keys and transactional boundaries).
- Compliance — PCI (if handling card data), regulatory reporting, retention; scope depends on product.
Key Entities
- Account — identifier, balance or position, tenant/customer.
- Transaction — idempotency key, type (transfer, payment, etc.), source/target accounts, amount, status, timestamp.
- Ledger entries — account, amount (debit/credit), transaction id, timestamp; immutable.
- Idempotency key — client-provided key; stored with outcome so repeat requests return same result.
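The entities above could be modeled roughly as follows. This is a minimal sketch, not a prescribed schema: the field names, the `TxStatus` values, the use of integer minor units (e.g. cents), and the sign convention (negative = debit) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class TxStatus(Enum):  # hypothetical status values
    PENDING = "pending"
    COMMITTED = "committed"
    FAILED = "failed"


@dataclass
class Account:
    account_id: str
    tenant_id: str
    balance_minor: int  # integer minor units avoid float rounding (assumption)


@dataclass
class Transaction:
    tx_id: str
    idempotency_key: str  # client-provided; one logical transaction per key
    tx_type: str          # "transfer", "payment", "refund", ...
    source_account: str
    target_account: str
    amount_minor: int
    status: TxStatus
    created_at: datetime


@dataclass(frozen=True)  # frozen: assigning to a field after construction raises
class LedgerEntry:
    account_id: str
    amount_minor: int     # negative = debit, positive = credit (assumed convention)
    tx_id: str
    created_at: datetime


def double_entry(tx: Transaction) -> list[LedgerEntry]:
    """Build the debit and credit entries for a transfer; together they sum to zero."""
    now = datetime.now(timezone.utc)
    return [
        LedgerEntry(tx.source_account, -tx.amount_minor, tx.tx_id, now),
        LedgerEntry(tx.target_account, tx.amount_minor, tx.tx_id, now),
    ]
```

Freezing `LedgerEntry` only guards against accidental mutation in application code; immutability of stored records still has to be enforced at the database level.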
Primary Use Cases and Access Patterns
- Debit/credit or transfer — write path; must be atomic, idempotent, and written to ledger.
- Get balance or statement — read path; must reflect committed state; may be cached with care (eventual consistency often unacceptable for balance).
- Reconciliation — batch or offline; compare internal ledger to external source (bank, processor); detect and resolve drift.
Given this, start with the simplest MVP: one API, one DB, ACID transactions, idempotency keys, and an append-only audit log. Then evolve toward async processing and scaling without sacrificing consistency or auditability.
Stage 1 — MVP (simple, correct, not over-engineered)
Goal
Ship a correct payments core: debit/credit or transfer with strong consistency, idempotency, and a full audit trail. One API, one DB, no double-spend, no lost updates.
Components
- API — REST or similar; auth, debit/credit or transfer endpoint; accepts idempotency key (e.g. header or body); returns transaction id and status.
- Primary DB — stores accounts, transactions, and ledger entries (or a single ledger table with account, amount, transaction_id, timestamp). Use DB transactions so that balance update and ledger insert are atomic.
- Idempotency keys — store (idempotency_key, transaction_id, outcome, timestamp) in a table written in the same DB transaction as the payment, optionally fronted by a cache for fast lookups; on a duplicate key, return the stored outcome instead of re-executing.
- Audit log — every financial event in an append-only table or dedicated log (who, what, when); can be the ledger itself if immutable, or a separate audit table written in the same DB transaction.
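One way to make the ledger append-only at the database level is to reject updates and deletes with triggers. The sketch below uses SQLite through Python's standard `sqlite3` module purely for illustration; the table layout, trigger names, and helper functions are assumptions, and a production database would more likely rely on its own permission or trigger mechanisms.

```python
import sqlite3

SCHEMA = """
CREATE TABLE ledger_entries (
    id         INTEGER PRIMARY KEY,
    account_id TEXT    NOT NULL,
    amount     INTEGER NOT NULL,   -- minor units; sign encodes debit/credit
    tx_id      TEXT    NOT NULL,
    created_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Reject any attempt to change or remove an existing entry.
CREATE TRIGGER ledger_no_update BEFORE UPDATE ON ledger_entries
BEGIN
    SELECT RAISE(ABORT, 'ledger_entries is append-only');
END;
CREATE TRIGGER ledger_no_delete BEFORE DELETE ON ledger_entries
BEGIN
    SELECT RAISE(ABORT, 'ledger_entries is append-only');
END;
"""


def open_ledger() -> sqlite3.Connection:
    """Create an in-memory database with the append-only ledger table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def append_entry(conn: sqlite3.Connection, account_id: str, amount: int, tx_id: str) -> None:
    """Insert one ledger row; inserts are the only write the triggers allow."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO ledger_entries (account_id, amount, tx_id) VALUES (?, ?, ?)",
            (account_id, amount, tx_id),
        )
```

With this in place, corrections are expressed as new compensating entries rather than edits to existing rows.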
Minimal Diagram
Client (idempotency key)
        |
        v
+-----------------+
|       API       |
+-----------------+
        |
        v
Primary DB (single node)
  - accounts (balance)
  - transactions (id, idempotency_key, status, ...)
  - ledger_entries (account, amount, tx_id, time) [immutable]
  - idempotency_store (key -> outcome)
  All in one ACID transaction
Patterns and Concerns (don’t overbuild)
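- Single writer or serialized writes for each account (or DB transactions with row-level locking such as SELECT ... FOR UPDATE, or serializable isolation) so concurrent balance updates cannot overwrite each other.
- Idempotency: look up the idempotency key before doing any write; if seen, return the stored response; otherwise execute in a transaction and store the outcome in that same transaction. A unique constraint on the key keeps two concurrent requests with the same key from both executing.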
- Audit: write ledger (and optionally audit row) in the same transaction as balance update; never delete or overwrite financial records.
- Validation: amounts, account existence, sufficient balance (for debits); fail fast with clear errors.
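The patterns above can be combined into a single transfer function. The following is a sketch under stated assumptions, using SQLite via Python's `sqlite3` module: the schema, the `transfer` signature, and the `TransferRejected` exception are hypothetical, and the idempotency store is folded into the `transactions` table through a unique key column.

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE accounts (
    id      TEXT PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0)
);
CREATE TABLE transactions (
    id              TEXT PRIMARY KEY,
    idempotency_key TEXT NOT NULL UNIQUE,  -- backstop against concurrent duplicates
    status          TEXT NOT NULL
);
CREATE TABLE ledger_entries (
    id         INTEGER PRIMARY KEY,
    account_id TEXT    NOT NULL,
    amount     INTEGER NOT NULL,
    tx_id      TEXT    NOT NULL
);
"""


class TransferRejected(Exception):
    """Unknown account or insufficient funds (hypothetical error type)."""


def transfer(conn, key: str, src: str, dst: str, amount: int) -> str:
    """Move `amount` from src to dst exactly once per idempotency key."""
    if amount <= 0 or src == dst:
        raise ValueError("amount must be positive and accounts must differ")
    with conn:  # one DB transaction: commit on success, roll back on exception
        seen = conn.execute(
            "SELECT id FROM transactions WHERE idempotency_key = ?", (key,)
        ).fetchone()
        if seen:
            return seen[0]  # retry: return the stored outcome, apply nothing
        tx_id = str(uuid.uuid4())
        conn.execute(
            "INSERT INTO transactions (id, idempotency_key, status) "
            "VALUES (?, ?, 'committed')",
            (tx_id, key),
        )
        # Conditional debit: only succeeds if the balance covers the amount.
        debit = conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?",
            (amount, src, amount),
        )
        if debit.rowcount != 1:
            raise TransferRejected(f"cannot debit {src}")
        credit = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        if credit.rowcount != 1:
            raise TransferRejected(f"unknown account {dst}")
        conn.executemany(
            "INSERT INTO ledger_entries (account_id, amount, tx_id) VALUES (?, ?, ?)",
            [(src, -amount, tx_id), (dst, amount, tx_id)],
        )
        return tx_id
```

In this sketch a rejected transfer rolls back everything, including the idempotency record, so a retry re-evaluates the request; storing failed outcomes as well is a design choice left open here.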
Why This Is a Correct MVP
- One API, one DB, ACID transactions → no double-spend, clear audit, easy to reason about.
- Idempotency keys and audit log are non-negotiable for payments; everything else (async, replicas, saga) can wait until you have scale or complexity that requires them.
Stage 2 — Growth Phase (more volume, async needs, read scaling)
What Triggers the Growth Phase?
- API or DB becomes the bottleneck (high TPS, long-running operations blocking the critical path).
- You need to offload non-critical work (e.g. notifications, reporting) from the synchronous payment path.
- Read load (balance checks, statement queries) grows; you want to scale reads without risking consistency on writes.
Components to Add (incrementally)
- Queue for async processing — after committing the payment transaction, publish an event or enqueue a job for side effects (notifications, analytics, reporting). Use idempotency in workers too (e.g. event id or job id) so duplicate consumption doesn’t double-apply.
- Read replicas — route balance/statement reads to replicas, but mind consistency: a balance read from a replica may be slightly stale (replica lag). For strict balance guarantees, read from the primary on that path; otherwise accept and document eventual consistency for read-only use cases.
- Keep write path on primary — all debits/credits and ledger writes still go to primary in a single transaction; replicas are for scaling read-only queries.
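An idempotent worker for the async side effects can be sketched as follows. The assumption here is that the side effect is itself a database write (a `notifications` row), so the dedup record and the side effect commit together; the table names and the event shape are hypothetical. For external side effects such as sending an email, this kind of dedup narrows the duplicate window but does not close it entirely.

```python
import sqlite3


def make_worker_db() -> sqlite3.Connection:
    """In-memory store for the worker's dedup table and its side effects."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
        CREATE TABLE notifications (event_id TEXT NOT NULL, message TEXT NOT NULL);
        """
    )
    return conn


def handle_event(conn: sqlite3.Connection, event: dict) -> bool:
    """Apply the side effect once per event_id.

    Returns True if the event was applied, False if it was a duplicate.
    """
    try:
        with conn:  # dedup record and side effect commit or roll back together
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)",
                (event["event_id"],),
            )
            conn.execute(
                "INSERT INTO notifications (event_id, message) VALUES (?, ?)",
                (event["event_id"], f"Payment {event['tx_id']} completed"),
            )
        return True
    except sqlite3.IntegrityError:
        # Primary-key conflict: this event_id was already processed.
        return False
```

Redelivering the same event is then harmless: the second call returns `False` and writes nothing.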
Growth Diagram
                    +------------------+
Clients ----------> |  Load Balancer   |
                    +------------------+
                             |
                +------------+------------+
                | write (payment)         | read (balance, statement)
                v                         v
         +-------------+           +-------------+
         |     API     |           |     API     |
         |  (primary)  |           | (replicas)  |
         +-------------+           +-------------+
                |                         ^
                v                         |
           Primary DB ------------------> Read Replicas
                |
                v
     Queue (async: notify, report)
                |
                v
        Workers (idempotent)
Patterns and Concerns to Introduce (practical scaling)
- Async with idempotency: workers that consume payment events must be idempotent (same event id → same side effect once); use outbox or dedup in queue if needed.
- Replica consistency: define which reads can be eventually consistent (e.g. dashboard) vs must be strongly consistent (e.g. “can I afford this?”); route accordingly.
- Monitoring: transaction latency, replica lag, queue depth, failed idempotency lookups.
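Routing reads by their consistency requirement might look like the sketch below. The `Consistency` levels, the `Replica` record, the one-second lag threshold, and the round-robin choice are all illustrative assumptions; real replica lag would come from the database's own monitoring.

```python
import itertools
from dataclasses import dataclass
from enum import Enum


class Consistency(Enum):
    STRONG = "strong"      # e.g. "can I afford this?" before a debit
    EVENTUAL = "eventual"  # e.g. dashboards, statements


@dataclass
class Replica:
    name: str
    lag_seconds: float  # most recently observed replication lag


class ReadRouter:
    """Pick a read target based on the caller's consistency requirement."""

    def __init__(self, replicas: list, max_lag_seconds: float = 1.0):
        self.replicas = replicas
        self.max_lag_seconds = max_lag_seconds
        self._counter = itertools.count()

    def pick(self, consistency: Consistency) -> str:
        if consistency is Consistency.STRONG:
            return "primary"
        # Only consider replicas whose lag is within the tolerated bound.
        fresh = [r for r in self.replicas if r.lag_seconds <= self.max_lag_seconds]
        if not fresh:
            return "primary"  # fall back rather than serve very stale data
        return fresh[next(self._counter) % len(fresh)].name
```

Making the consistency level an explicit argument forces each call site to state which guarantee it needs, which is the documentation the replica-consistency bullet asks for.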
Still Avoid (common over-engineering here)
- Saga or distributed transactions across services before you have multiple services.
- Splitting ledger and balance into separate services before the single-DB write path is the proven bottleneck.
- Eventual consistency for the core debit/credit path.
Stage 3 — Advanced Scale (high TPS, multi-tenant, reconciliation)
What Triggers Advanced Scale?
- Single primary DB write throughput or storage becomes the limit.
- You have multiple services (e.g. payments, billing, ledger) and need cross-service consistency (saga) or at least audit and reconciliation.
- Regulatory or operational need for formal reconciliation pipeline and strong audit guarantees.
Components (common advanced additions)
- Saga / outbox pattern — for multi-step flows spanning services: each step runs in a local transaction that also writes to an outbox, and a relay publishes the outbox rows to the next service. On failure, compensate or retry with idempotency. This gives at-least-once delivery (so consumers must deduplicate) and an audit trail across services.
- Reconciliation pipeline — batch or stream job that compares internal ledger to external source (bank, payment processor); flags discrepancies; supports dispute resolution and compliance.
- Sharding by account or tenant — partition accounts (and their ledger entries) by account_id or tenant_id so no single partition holds all traffic; route writes by shard key; keep transactions within a shard where possible.
- Strong focus on consistency and audit — all money-moving paths are transactional and auditable; reconciliation and alerting on drift; retention and compliance reporting.
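The core comparison inside a reconciliation pipeline can be sketched as a pure function. This is a simplified illustration: it assumes both sides can be reduced to a mapping from transaction id to amount in minor units, and the `Discrepancy` type and its `kind` labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Discrepancy:
    tx_id: str
    kind: str  # "missing_external", "missing_internal", or "amount_mismatch"
    internal_amount: Optional[int]
    external_amount: Optional[int]


def reconcile(internal: dict, external: dict) -> list:
    """Compare internal ledger totals to an external statement, keyed by tx id.

    Returns one Discrepancy per transaction that does not match, ordered by id.
    """
    discrepancies = []
    for tx_id in sorted(internal.keys() | external.keys()):
        ours = internal.get(tx_id)
        theirs = external.get(tx_id)
        if theirs is None:
            kind = "missing_external"   # we recorded it, the processor did not
        elif ours is None:
            kind = "missing_internal"   # the processor recorded it, we did not
        elif ours != theirs:
            kind = "amount_mismatch"
        else:
            continue                    # amounts agree; nothing to flag
        discrepancies.append(Discrepancy(tx_id, kind, ours, theirs))
    return discrepancies
```

Because the function has no side effects, a run over the same inputs always yields the same result, which keeps reconciliation runs repeatable; storing the output and alerting on it would sit around this core.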
Advanced Diagram (conceptual)
                    +------------------+
Clients ----------> | API Gateway / LB |
                    +------------------+
                             |
           +-----------------+-----------------+
           v                 v                 v
    Payment Service   Billing Service    Ledger Service
           |                 |                 |
           v                 v                 v
         Outbox            Outbox            Outbox
           |                 |                 |
           v                 v                 v
        Message bus (reliable, ordered per key)
                             |
                             v
              Sharded DB (by account/tenant)
               Primary + Replicas per shard
                             |
                             v
        Reconciliation pipeline (internal vs external)
Patterns and Concerns at This Stage
- Saga: define compensating actions for each step; idempotency for each step and for compensation; timeout and retry policy.
- Outbox: write business row + outbox row in same transaction; relay reads outbox and publishes to message bus; delete or mark outbox row after successful publish (at-least-once delivery to consumers).
- Reconciliation: schedule, idempotent runs, store results and alerts; human-in-the-loop for exceptions.
- Sharding: keep transaction boundary within one shard when possible; for cross-shard transfers, use saga or two-phase flow with clear audit.
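The outbox bullet above can be illustrated with a small sketch using SQLite via Python's `sqlite3` module. The table names, the `payment.committed` topic, and the `publish` callback (standing in for a message bus client) are assumptions for illustration only.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE payments (id TEXT PRIMARY KEY, amount INTEGER NOT NULL);
CREATE TABLE outbox (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    topic        TEXT NOT NULL,
    payload      TEXT NOT NULL,
    published_at TEXT              -- NULL until the relay has published the row
);
"""


def record_payment(conn: sqlite3.Connection, payment_id: str, amount: int) -> None:
    """Write the business row and its outbox row in the same transaction."""
    with conn:
        conn.execute(
            "INSERT INTO payments (id, amount) VALUES (?, ?)", (payment_id, amount)
        )
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("payment.committed",
             json.dumps({"payment_id": payment_id, "amount": amount})),
        )


def relay_once(conn: sqlite3.Connection, publish) -> int:
    """Publish pending outbox rows in order; return how many were marked published.

    If publish() raises, the row stays pending and is retried on the next run,
    which is why delivery is at-least-once rather than exactly-once.
    """
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox "
        "WHERE published_at IS NULL ORDER BY id"
    ).fetchall()
    sent = 0
    for row_id, topic, payload in rows:
        publish(topic, json.loads(payload))
        with conn:  # mark only after the publish call returned successfully
            conn.execute(
                "UPDATE outbox SET published_at = CURRENT_TIMESTAMP WHERE id = ?",
                (row_id,),
            )
        sent += 1
    return sent
```

If the broker accepts a message but the relay fails before marking the row, the next run publishes it again, so consumers need to deduplicate on a stable identifier such as `payment_id`.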
Summarizing the Evolution
The MVP delivers correct payments with one API, one DB, ACID transactions, idempotency keys, and an audit log. That’s the foundation: no double-spend, full traceability.
As you grow, you add a queue for async side effects (with idempotent workers) and read replicas for read scaling, while keeping the write path on the primary and strongly consistent. You avoid eventual consistency on the core payment path.
At very high scale, you introduce saga/outbox for cross-service flows, a reconciliation pipeline for compliance and drift detection, and sharding by account or tenant. Consistency and audit remain non-negotiable; complexity is added only where throughput or multi-service boundaries require it.
This approach gives you:
- Start Simple — one API, one DB, transactions, idempotency, audit log; ship and learn.
- Scale Intentionally — add async and replicas when load justifies; add saga and reconciliation when services or compliance demand it.
- Add Complexity Only When Required — keep the payment path consistent and auditable; avoid distributed transactions until boundaries are clear.