# Dependency-Aware Chaos Testing

## Overview

Dependency-aware chaos testing has two layers:

1. **Outbound Layer**: Intercepts outbound requests to dependencies (Stripe, APIs, DBs)
2. **Body Corruption Layer**: Corrupts HTTP response bodies (truncation, malformed data)

This addresses the main limitation of HTTP-layer chaos (v1), which tested only response schemas and not the handlers' error-handling logic.

## Two-Layer Architecture

```
┌──────────────────────────────────────────────────────────────┐
│ OUTBOUND LAYER                                               │
│ Tests: Handler error handling, retry logic, circuit breakers │
│                                                              │
│ • Outbound HTTP interception (Stripe, APIs)                  │
│ • Dependency failure simulation                              │
└──────────────────────────────────────────────────────────────┘
                               │
┌──────────────────────────────────────────────────────────────┐
│ BODY CORRUPTION LAYER                                        │
│ Tests: Response parsing, validation, streaming resilience    │
│                                                              │
│ • Truncation (partial responses)                             │
│ • Malformed data (invalid JSON, corrupted structure)         │
│ • Partial chunks (missing NDJSON lines)                      │
└──────────────────────────────────────────────────────────────┘
```

## Outbound Layer Chaos

### Outbound HTTP Interception

Intercept requests from handlers to external APIs:

```javascript
await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.1,
    outbound: [
      {
        target: 'api.stripe.com',
        delay: { probability: 0.1, minMs: 1000, maxMs: 5000 },
        error: {
          probability: 0.05,
          responses: [
            { statusCode: 429, headers: { 'retry-after': '60' } },
            { statusCode: 503, body: { error: 'stripe_unavailable' } }
          ]
        }
      }
    ]
  }
})
```

**What it tests:**

- Does the handler catch a Stripe 429 and return a `retry-after` header?
- Does the handler handle a Stripe 503 and return a meaningful error?
- Does the handler implement exponential backoff?
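The backoff question above can be made concrete with a minimal handler-side sketch. Both functions here are hypothetical illustrations and not part of apophis-fastify. The sketch assumes `fetchImpl` returns a standard Fetch `Response`, that `retry-after` is given in seconds (the HTTP-date form is not handled), and that the default base delay and cap are arbitrary placeholder values.

```javascript
// Hypothetical helper, not an apophis-fastify API.
// Computes how long to wait before retry number `attempt` (0-based).
function backoffDelayMs(attempt, retryAfterSeconds, baseMs = 100, capMs = 10000) {
  // Prefer the server's retry-after hint (seconds) when it parses as a number.
  const hinted = Number(retryAfterSeconds)
  const hasHint = retryAfterSeconds != null && retryAfterSeconds !== ''
  if (hasHint && Number.isFinite(hinted) && hinted >= 0) {
    return Math.min(hinted * 1000, capMs)
  }
  // Otherwise double the delay on each attempt, up to the cap.
  return Math.min(baseMs * 2 ** attempt, capMs)
}

// Hypothetical wrapper: retries on 429/503 and returns the last response
// once the retry budget is spent, so the handler can map it to its own error.
async function fetchWithBackoff(fetchImpl, url, { maxRetries = 3, sleep } = {}) {
  const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)))
  for (let attempt = 0; ; attempt++) {
    const res = await fetchImpl(url)
    const retryable = res.status === 429 || res.status === 503
    if (!retryable || attempt >= maxRetries) return res
    await wait(backoffDelayMs(attempt, res.headers.get('retry-after')))
  }
}
```

Passing the intercepted fetch as `fetchImpl` would let injected 429 and 503 responses exercise this retry path; the `sleep` option exists so tests can skip real waiting.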
**What it does NOT test:**

- Response schema compliance (that is covered by the body corruption layer)

### wrapFetch

Wrap a `fetch` implementation so outbound requests are intercepted:

```javascript
import { wrapFetch, createOutboundInterceptor } from 'apophis-fastify'

const interceptor = createOutboundInterceptor([
  {
    target: 'api.stripe.com',
    delay: { probability: 0.1, minMs: 1000, maxMs: 5000 },
    error: {
      probability: 0.05,
      responses: [
        { statusCode: 429, headers: { 'retry-after': '60' } }
      ]
    }
  }
], 42)

const interceptedFetch = wrapFetch(globalThis.fetch, interceptor)
const res = await interceptedFetch('https://api.stripe.com/v1/charges')
```

## Body Corruption Layer

### Response Truncation

Simulate partial responses:

```javascript
await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.1,
    corruption: { probability: 0.1 }
  }
})
```

**What it tests:**

- Does the client handle partial JSON gracefully?
- Does the streaming parser recover from truncated chunks?
- Does validation fail gracefully with incomplete data?

### Malformed Data

Corruption is content-type aware.
Built-in strategies:

| Content Type | Strategy | Kind |
|-------------|----------|------|
| `application/json` | Truncates objects/arrays or nulls random fields | `body-truncate` / `body-malformed` |
| `application/x-ndjson` | Corrupts a random chunk | `body-malformed` |
| `text/event-stream` | Corrupts SSE event format | `body-malformed` |
| `multipart/form-data` | Corrupts a multipart field | `body-malformed` |
| `text/plain` | Truncates text response | `body-truncate` |
| `text/html` | Truncates HTML response | `body-truncate` |

## Chaos Event Reporting

Every chaos injection is visible in test diagnostics:

```javascript
// Outbound layer chaos
{
  ok: false,
  name: 'POST /billing/plans (#1)',
  diagnostics: {
    error: 'Contract violation: status:200',
    chaos: {
      injected: true,
      type: 'outbound-error',
      details: {
        statusCode: 429,
        dependencyUrl: 'https://api.stripe.com/v1/payment_intents',
        reason: 'Outbound error: 429 from https://api.stripe.com/v1/payment_intents',
        errorResponse: { error: 'rate_limit' }
      }
    }
  }
}

// Body corruption layer
{
  ok: false,
  name: 'GET /users (#2)',
  diagnostics: {
    error: 'Contract violation: response_body(this).users != null',
    chaos: {
      injected: true,
      type: 'corruption',
      details: {
        reason: 'Body corruption: Truncates JSON response or nulls a random field',
        strategy: 'json-truncate'
      }
    }
  }
}
```

## Dropout Semantics

Dropout simulations are reported as HTTP-style failure statuses:

- **504 Gateway Timeout** for timeouts (default)
- **503 Service Unavailable** for network failures
- Configurable: `dropout: { probability: 0.1, statusCode: 503 }`

## Blast Radius Cap

Limit the total number of chaos injections per test suite:

```javascript
await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.5,
    delay: { probability: 1.0, minMs: 10, maxMs: 50 },
    maxInjectionsPerSuite: 10
  }
})
```

## Stateful Retry Safety

Resilience verification automatically skips non-idempotent routes:

```javascript
await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.1,
    resilience: { enabled: true, maxRetries: 3 },
    // Skip retries for routes that create side effects
    skipResilienceFor: ['constructor', 'mutator']
  }
})
```

## Best Practices

### 1. Use Outbound Layer for Business Logic

Test handler behavior when dependencies fail:

```javascript
// Good: Tests that handler catches Stripe 429
chaos: {
  outbound: [{
    target: 'api.stripe.com',
    error: { probability: 0.1, responses: [{ statusCode: 429 }] }
  }]
}

// Bad: Only tests response schema
chaos: {
  error: { probability: 0.1, statusCode: 429 }
}
```

### 2. Use Body Corruption for Parsing Resilience

Test response parsing and validation:

```javascript
// Good: Tests JSON parser resilience
chaos: {
  corruption: { probability: 0.1 }
}
```

### 3. Combine Both Layers

```javascript
await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.1,
    // Outbound layer: dependency failures
    outbound: [{
      target: 'api.stripe.com',
      error: { probability: 0.05, responses: [{ statusCode: 429 }] }
    }],
    // Body corruption: response corruption
    corruption: { probability: 0.05 },
    // Safety: skip retries for stateful routes
    skipResilienceFor: ['constructor', 'mutator']
  }
})
```

### 4. Write Contracts for Error Handling

```javascript
fastify.get('/billing/plans', {
  schema: {
    'x-category': 'observer',
    'x-ensures': [
      'if status:429 then response_headers(this)["retry-after"] != null else true',
      'if status:503 then response_body(this).error == "stripe_unavailable" else true',
      'if status:200 then response_body(this).plans != null else true'
    ]
  }
}, async () => { ... })
```

## Migration from v1

The old HTTP-layer chaos is still supported but should be used for transport testing only:

```javascript
// v1 (legacy: use for transport testing only)
chaos: {
  probability: 0.1,
  error: { probability: 0.1, statusCode: 503 }
}

// v2.3 (recommended)
chaos: {
  probability: 0.1,
  // Outbound layer
  outbound: [{
    target: 'api.stripe.com',
    error: { probability: 0.1, responses: [{ statusCode: 429 }] }
  }],
  // Body corruption layer
  corruption: { probability: 0.05 }
}
```

## API Reference

### OutboundChaosConfig

| Field | Type | Description |
|-------|------|-------------|
| `target` | `string` | Hostname or URL pattern to intercept |
| `delay` | `{ probability, minMs, maxMs }` | Delay outbound requests |
| `error` | `{ probability, responses }` | Return error responses |
| `dropout` | `{ probability, statusCode? }` | Simulate network failures |

### Body Corruption Types

| Type | Description |
|------|-------------|
| `body-truncate` | Partial response |
| `body-malformed` | Invalid data |

### ChaosConfig

| Field | Type | Description |
|-------|------|-------------|
| `probability` | `number` | Probability of injecting any chaos event (0.0 - 1.0) |
| `delay` | `{ probability, minMs, maxMs }` | Delay injection |
| `error` | `{ probability, statusCode, body? }` | Error injection |
| `dropout` | `{ probability, statusCode? }` | Dropout injection |
| `corruption` | `{ probability }` | Body corruption injection |
| `outbound` | `OutboundChaosConfig[]` | Outbound HTTP interception |
| `routes` | `Record>` | Per-route overrides |
| `include` | `string[]` | Include only these routes |
| `exclude` | `string[]` | Exclude these routes |
| `resilience` | `{ enabled, maxRetries?, backoffMs? }` | Resilience verification |
| `skipResilienceFor` | `string[]` | Skip resilience for categories |
| `dropoutStatusCode` | `number` | Status code for dropout (default: 504) |
| `maxInjectionsPerSuite` | `number` | Maximum injections per suite |
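
The field tables above can be turned into a small pre-flight check that catches typos before a chaos run. The sketch below is a hypothetical helper: `checkChaosConfig` is not part of apophis-fastify, and it relies only on the field shapes listed in the tables. It assumes every `probability` must lie between 0.0 and 1.0, that `minMs` should not exceed `maxMs`, and that `maxInjectionsPerSuite` is a non-negative integer. Missing fields are skipped, because the tables do not say which fields are required.

```javascript
// Hypothetical helper, not an apophis-fastify API: returns a list of
// problems found in a chaos config, based on the documented field shapes.
function checkChaosConfig(chaos) {
  const problems = []

  // A probability is valid when it is a number in [0, 1]; NaN is rejected.
  const checkProbability = (path, value) => {
    if (value === undefined) return
    if (typeof value !== 'number' || !(value >= 0 && value <= 1)) {
      problems.push(`${path} must be a number between 0.0 and 1.0`)
    }
  }

  // Checks one { delay, error, dropout, corruption } group under `path`.
  const checkGroup = (path, group) => {
    for (const key of ['delay', 'error', 'dropout', 'corruption']) {
      if (group[key]) checkProbability(`${path}.${key}.probability`, group[key].probability)
    }
    if (group.delay && group.delay.minMs > group.delay.maxMs) {
      problems.push(`${path}.delay.minMs must not exceed ${path}.delay.maxMs`)
    }
  }

  checkProbability('chaos.probability', chaos.probability)
  checkGroup('chaos', chaos)

  // Outbound entries carry their own delay/error/dropout settings.
  const outbound = chaos.outbound ?? []
  outbound.forEach((entry, i) => checkGroup(`chaos.outbound[${i}]`, entry))

  const cap = chaos.maxInjectionsPerSuite
  if (cap !== undefined && !(Number.isInteger(cap) && cap >= 0)) {
    problems.push('chaos.maxInjectionsPerSuite must be a non-negative integer')
  }

  return problems
}
```

Calling `checkChaosConfig(config.chaos)` before `fastify.apophis.contract(config)` and failing fast on a non-empty result is one way to use it; the library may well perform its own validation, so treat this as a convenience rather than a replacement.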