Synthetic Testing and Load Replay
Chaos experiments and game-days are periodic.
Synthetic testing runs continuously—generating artificial traffic or replaying real traffic to validate that the system works correctly right now, not just the last time you tested it.
Synthetic testing catches regressions between experiments. It’s the always-on complement to periodic chaos and game-day exercises.
Types Of Synthetic Testing
Synthetic Monitoring (Canary Tests)
Automated scripts that simulate key user journeys at regular intervals (every 30 seconds, every minute) and alert when they fail.
What to test:
- Critical Paths — Login, checkout, search, API health endpoints. The flows that matter most to users.
- Dependencies — Can the service reach its database, cache, message queue, external APIs?
- Multi-Step Flows — Not just “is the endpoint up?” but “can a user complete the full workflow?”
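The multi-step idea above can be sketched as a small runner. This is a minimal illustration, not a real monitoring tool: the journey steps are stubbed callables, and in practice each `check` would issue an HTTP request against your service.

```python
"""Minimal canary-test sketch. Step names and checks are illustrative;
real checks would call your service's actual critical paths."""
import time
from typing import Callable

def run_canary(steps: list[tuple[str, Callable[[], bool]]],
               timeout_s: float = 5.0) -> dict:
    """Run each step of a user journey in order; stop at the first failure.

    Each step is (name, check) where check() returns True on success.
    A multi-step flow fails fast: later steps are skipped once one fails
    or the overall time budget is exceeded.
    """
    results = {"passed": [], "failed": None, "duration_s": 0.0}
    start = time.monotonic()
    for name, check in steps:
        try:
            ok = check()
        except Exception:
            ok = False
        if not ok or time.monotonic() - start > timeout_s:
            results["failed"] = name
            break
        results["passed"].append(name)
    results["duration_s"] = time.monotonic() - start
    return results

# Example journey with stubbed checks (a scheduler would run this
# every 30-60 seconds and alert when "failed" is set):
journey = [
    ("login", lambda: True),
    ("search", lambda: True),
    ("checkout", lambda: False),  # simulate a broken checkout step
]
report = run_canary(journey)
print(report["passed"], report["failed"])  # ['login', 'search'] checkout
```

Failing fast on the first broken step keeps the alert specific: it names the step in the journey that broke, not just "the canary failed."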
How it differs from real user monitoring (RUM):
| | Synthetic | Real User Monitoring |
|---|---|---|
| Traffic source | Artificial, scripted | Actual users |
| Coverage | Consistent: runs even at 3 AM with no users | Depends on traffic volume and patterns |
| Latency | Measures from known locations with known conditions | Measures real-world variance (devices, networks) |
| Best for | Availability checks, regression detection, SLO measurement | Understanding actual user experience |
Use both. Synthetic monitoring catches outages fast (even during low-traffic periods). RUM shows you how real users experience the system.
Load Replay
Capture real production traffic and replay it against a test environment to validate behavior under realistic conditions.
Why replay real traffic:
- Realistic Patterns — Real traffic has distributions, edge cases, and correlations that synthetic generators miss. A load test with uniform traffic doesn’t catch the hot partition that 5% of your users trigger.
- Regression Detection — Replay last week’s traffic against this week’s build. If error rates or latency increase, you’ve found a regression.
- Pre-Deploy Validation — Before promoting a build, replay a sample of production traffic and compare results against the current production build.
How to do it:
- Capture — Record requests (headers, body, timing) from production traffic. Sanitize sensitive data (PII, auth tokens).
- Store — Keep captured traffic in a replay-ready format. Store enough to represent your traffic patterns (a few hours to a day is typical).
- Replay — Send captured requests against a test environment at the original rate (1x) or amplified (2x, 5x) for stress testing.
- Compare — Diff responses, error rates, and latencies between the replay run and the production baseline.
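The replay and compare steps above can be sketched as follows. This is a sketch, not a specific replay tool's API: the captured-request format (a dict with an `offset_s` timestamp) and the `send` callable are assumptions, and amplification works by shrinking the gaps between requests.

```python
"""Load-replay sketch: send captured requests with original pacing
(or faster), then diff error rates against a production baseline.
Request format and the `send` callable are illustrative assumptions."""
import time
from typing import Callable

def replay(requests: list[dict], send: Callable[[dict], int],
           amplification: float = 1.0) -> dict:
    """Replay captured requests, scaling inter-request gaps by 1/amplification.

    Each captured request carries `offset_s` (seconds since capture start);
    `send` issues the request and returns an HTTP status code.
    """
    statuses = []
    start = time.monotonic()
    for req in requests:
        target = req["offset_s"] / amplification  # 2x amplification halves gaps
        delay = target - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)
        statuses.append(send(req))
    errors = sum(1 for s in statuses if s >= 500)
    return {"total": len(statuses),
            "error_rate": errors / max(len(statuses), 1)}

def compare(replay_run: dict, baseline: dict, tolerance: float = 0.01) -> bool:
    """Pass only if the replay's error rate stays within tolerance of baseline."""
    return replay_run["error_rate"] <= baseline["error_rate"] + tolerance
```

A real comparison would also diff latency percentiles and response bodies; error rate is shown here because it is the simplest regression signal to gate a deploy on.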
Sanitize captured traffic carefully. Production requests contain auth tokens, PII, and session data that must not leak to test environments or logs.
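A minimal sanitization pass might look like the following. The header and field names are illustrative defaults; extend both sets for whatever secrets and PII your own payloads carry.

```python
"""Sanitization sketch for captured traffic. The sensitive-header and
PII-field sets are illustrative; extend them for your own payloads."""
import copy

SENSITIVE_HEADERS = {"authorization", "cookie", "x-api-key"}
PII_FIELDS = {"email", "ssn", "phone"}

def sanitize(request: dict) -> dict:
    """Return a copy of a captured request with secrets and PII redacted."""
    clean = copy.deepcopy(request)  # never mutate the captured original
    clean["headers"] = {
        k: ("REDACTED" if k.lower() in SENSITIVE_HEADERS else v)
        for k, v in clean.get("headers", {}).items()
    }
    body = clean.get("body", {})
    if isinstance(body, dict):
        for field in PII_FIELDS & body.keys():
            body[field] = "REDACTED"
    return clean
```

Sanitize at capture time, before traffic is written to storage, so raw tokens and PII never land in the replay archive in the first place.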
Regression Reliability Testing
Automated tests that verify resilience properties haven’t regressed across releases.
Examples:
- Timeout Behavior — “When dependency X is slow, the service returns a degraded response within 2 seconds.” Run this as a test on every deploy.
- Circuit Breaker Trips — “When dependency X returns 50% errors, the circuit breaker opens within 10 seconds.” Verify this still works after changes.
- Graceful Degradation — “When the cache is unavailable, the API still returns correct responses (from the database) with latency under 100ms.”
- Rate Limiting — “When request rate exceeds the limit, excess requests receive 429 responses, not 500s.”
These tests belong in your CI/CD pipeline (see CI/CD for Applications) and run against a production-like environment with failure injection.
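The timeout-behavior example above can be written as an ordinary assertion-style test. This is a sketch under stated assumptions: `fetch_profile`, the degraded payload, and the 2-second budget are hypothetical, and the dependency is a stub that reports its own latency so the test runs fast in CI instead of actually sleeping.

```python
"""Sketch of a regression reliability test for timeout behavior: when the
dependency is slow, the service must return a degraded response within
budget. Names and the 2s budget are illustrative, not a real API."""

DEGRADED = {"profile": None, "degraded": True}

def fetch_profile(dependency, budget_s: float = 2.0) -> dict:
    """Call the dependency, falling back to a degraded response on timeout.

    Real services enforce the budget with a client-side timeout; here the
    stubbed dependency returns (latency_s, payload) so CI stays fast.
    """
    latency_s, payload = dependency()
    if latency_s > budget_s:
        return DEGRADED
    return {"profile": payload, "degraded": False}

def test_slow_dependency_degrades():
    slow = lambda: (5.0, None)         # simulated 5s dependency call
    assert fetch_profile(slow) == DEGRADED

def test_fast_dependency_serves():
    fast = lambda: (0.05, {"id": 1})   # healthy dependency
    assert fetch_profile(fast)["degraded"] is False

test_slow_dependency_degrades()
test_fast_dependency_serves()
```

The same shape works for the circuit-breaker and rate-limiting examples: inject a misbehaving dependency, then assert on the response the user would see rather than on internal state.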
When To Use What
| Technique | Frequency | Cost | What It Catches |
|---|---|---|---|
| Synthetic Monitoring | Continuous (every 30s-5min) | Low | Outages, availability regressions, dependency failures |
| Load Replay | Before deploys or weekly | Medium | Performance regressions, traffic-pattern-specific bugs, scaling issues |
| Regression Reliability Tests | Every build (in CI) | Low-medium | Resilience regressions: broken timeouts, missing fallbacks, circuit breaker changes |
| Chaos Experiments | Periodic (monthly/quarterly) | Medium-high | Unknown failure modes, untested scenarios |
| Game-Days | Quarterly/annually | High | End-to-end response: people + process + systems |
These are complementary.
Synthetic monitoring and regression tests run continuously and catch regressions fast.
Load replay and chaos experiments run periodically and catch deeper issues. Game-days validate the full response chain.
Building A Reliability Testing Program
A practical progression:
- Start With Synthetic Monitoring. Set up canary tests for your top 3-5 critical paths. Alert when they fail. This gives you continuous visibility.
- Add Regression Reliability Tests to CI. Pick the 3 most important resilience behaviors (timeout, fallback, circuit breaker) and test them on every build.
- Introduce Load Replay. Capture a day of production traffic and replay before major releases. Compare against baseline.
- Run Chaos Experiments. Start in pre-production, graduate to production. See Chaos Experiments.
- Schedule Game-Days. Once experiments are routine, test the full response chain. See Game-Days and Drills.
See Also
- Chaos Experiments — Periodic, hypothesis-driven failure injection.
- Game-Days and Drills — End-to-end exercises that test people and process.
- Load and Stress Testing — Performance-focused testing that complements reliability testing.
- CI/CD for Applications — Where regression reliability tests run in the pipeline.
- Error Rate and Throughput — The SLIs synthetic tests measure.
- Alerting — Synthetic test failures should trigger alerts.