# Dependency-Aware Chaos Testing
## Overview
Dependency-aware chaos testing has two layers:
1. **Outbound Layer** — Intercepts outbound requests to dependencies (Stripe, APIs, DBs)
2. **Body Corruption Layer** — Corrupts HTTP response bodies (truncation, malformed data)
This addresses the critical limitation of v1's HTTP-layer chaos, which only tested response schemas and never exercised the handler's own error-handling logic.
## Two-Layer Architecture
```
┌──────────────────────────────────────────────────────────────┐
│ OUTBOUND LAYER                                               │
│ Tests: Handler error handling, retry logic, circuit breakers │
│                                                              │
│ • Outbound HTTP interception (Stripe, APIs)                  │
│ • Dependency failure simulation                              │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ BODY CORRUPTION LAYER                                        │
│ Tests: Response parsing, validation, streaming resilience    │
│                                                              │
│ • Truncation (partial responses)                             │
│ • Malformed data (invalid JSON, corrupted structure)         │
│ • Partial chunks (missing NDJSON lines)                      │
└──────────────────────────────────────────────────────────────┘
```
## Outbound Layer Chaos
### Outbound HTTP Interception
Intercept requests from handlers to external APIs:
```javascript
await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.1,
    outbound: [
      {
        target: 'api.stripe.com',
        delay: { probability: 0.1, minMs: 1000, maxMs: 5000 },
        error: {
          probability: 0.05,
          responses: [
            { statusCode: 429, headers: { 'retry-after': '60' } },
            { statusCode: 503, body: { error: 'stripe_unavailable' } }
          ]
        }
      }
    ]
  }
})
```
**What it tests:**
- Does the handler catch a Stripe 429 and return a `retry-after` header?
- Does the handler turn a Stripe 503 into a meaningful error?
- Does the handler implement exponential backoff?
**What it does NOT test:**
- Response schema compliance (that is the body corruption layer's job)
### wrapFetch
Wrap a `fetch` implementation so outbound requests are intercepted:
```javascript
import { wrapFetch, createOutboundInterceptor } from 'apophis-fastify'

const interceptor = createOutboundInterceptor([
  {
    target: 'api.stripe.com',
    delay: { probability: 0.1, minMs: 1000, maxMs: 5000 },
    error: {
      probability: 0.05,
      responses: [
        { statusCode: 429, headers: { 'retry-after': '60' } }
      ]
    }
  }
], 42) // second argument: presumably a seed, so injections are reproducible

const interceptedFetch = wrapFetch(globalThis.fetch, interceptor)
// Requests to api.stripe.com may now be delayed or answered with a synthetic 429
const res = await interceptedFetch('https://api.stripe.com/v1/charges')
```
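To make the interception concrete, here is a minimal sketch of what a fetch wrapper like this could do internally. It is not apophis-fastify's implementation: `sketchWrapFetch` and `mulberry32` are hypothetical names, rules match by exact hostname only (the real `target` also accepts URL patterns), dropout is not modeled, and the third argument is assumed to seed a PRNG so that the same seed replays the same injections.

```javascript
// Hypothetical sketch of outbound interception; not apophis-fastify's implementation.
// Assumes the seed argument drives a deterministic PRNG so runs are reproducible.

// mulberry32: tiny seeded PRNG returning floats in [0, 1)
function mulberry32(seed) {
  let a = seed >>> 0
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0
    let t = a
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

function sketchWrapFetch(baseFetch, rules, seed) {
  const random = mulberry32(seed)
  return async function interceptedFetch(input, init) {
    const url = new URL(input instanceof Request ? input.url : String(input))
    // Simplification: a rule matches only when `target` equals the hostname exactly
    const rule = rules.find((r) => r.target === url.hostname)
    if (rule?.delay && random() < rule.delay.probability) {
      const { minMs, maxMs } = rule.delay
      await sleep(minMs + random() * (maxMs - minMs))
    }
    if (rule?.error && random() < rule.error.probability) {
      const { responses } = rule.error
      const pick = responses[Math.floor(random() * responses.length)]
      const hasBody = pick.body !== undefined
      // Answer with a synthetic failure instead of calling the real dependency
      return new Response(hasBody ? JSON.stringify(pick.body) : null, {
        status: pick.statusCode,
        headers: { ...(hasBody ? { 'content-type': 'application/json' } : {}), ...pick.headers },
      })
    }
    return baseFetch(input, init)
  }
}
```

Passing a fake `baseFetch` such as `async () => new Response('ok')` lets you inspect the synthetic responses without touching the network; on Node 18+ `Request` and `Response` are globals.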
## Body Corruption Layer
### Response Truncation
Simulate partial responses:
```javascript
await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.1,
    corruption: { probability: 0.1 }
  }
})
```
**What it tests:**
- Does the client handle partial JSON gracefully?
- Does the streaming parser recover from truncated chunks?
- Does validation fail gracefully with incomplete data?
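A minimal sketch of what truncation looks like from the client's side, assuming truncation simply cuts the serialized body at some offset (the plugin's actual cut points are not documented here). `truncateText` and `parseJsonSafely` are hypothetical helpers:

```javascript
// Hypothetical helper: keep only the first `keepRatio` share of the body text
function truncateText(body, keepRatio) {
  const cut = Math.floor(body.length * keepRatio)
  return body.slice(0, cut)
}

// Client-side parse that reports a broken body instead of throwing
function parseJsonSafely(text) {
  try {
    return { ok: true, value: JSON.parse(text) }
  } catch (err) {
    return { ok: false, error: err.message }
  }
}

const full = JSON.stringify({ users: [{ id: 1 }, { id: 2 }] })
const partial = truncateText(full, 0.5)
console.log(parseJsonSafely(full).ok)    // true
console.log(parseJsonSafely(partial).ok) // false
```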
### Malformed Data
Corruption is content-type aware: the strategy applied depends on the response's `Content-Type`. Built-in strategies:
| Content Type | Strategy | Kind |
|-------------|----------|------|
| `application/json` | Truncates objects/arrays or nulls random fields | `body-truncate` / `body-malformed` |
| `application/x-ndjson` | Corrupts a random chunk | `body-malformed` |
| `text/event-stream` | Corrupts SSE event format | `body-malformed` |
| `multipart/form-data` | Corrupts a multipart field | `body-malformed` |
| `text/plain` | Truncates text response | `body-truncate` |
| `text/html` | Truncates HTML response | `body-truncate` |
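The table amounts to a lookup keyed on the media type. A hypothetical sketch of that dispatch (not the plugin's internal code) that strips parameters such as `charset` or `boundary` and assumes content types missing from the table are left uncorrupted:

```javascript
// Hypothetical lookup mirroring the table above; not the plugin's internal code.
const CORRUPTION_KINDS = new Map([
  ['application/json', ['body-truncate', 'body-malformed']],
  ['application/x-ndjson', ['body-malformed']],
  ['text/event-stream', ['body-malformed']],
  ['multipart/form-data', ['body-malformed']],
  ['text/plain', ['body-truncate']],
  ['text/html', ['body-truncate']],
])

// Normalizes "Application/JSON; charset=utf-8" to "application/json" before lookup.
// Assumption: content types not in the table get no corruption.
function corruptionKindsFor(contentTypeHeader) {
  if (!contentTypeHeader) return []
  const mediaType = contentTypeHeader.split(';')[0].trim().toLowerCase()
  return [...(CORRUPTION_KINDS.get(mediaType) ?? [])]
}
```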
## Chaos Event Reporting
Every chaos injection is reported in the test result under `diagnostics.chaos`. Two example failing results:
```javascript
// Outbound layer chaos
const outboundResult = {
  ok: false,
  name: 'POST /billing/plans (#1)',
  diagnostics: {
    error: 'Contract violation: status:200',
    chaos: {
      injected: true,
      type: 'outbound-error',
      details: {
        statusCode: 429,
        dependencyUrl: 'https://api.stripe.com/v1/payment_intents',
        reason: 'Outbound error: 429 from https://api.stripe.com/v1/payment_intents',
        errorResponse: { error: 'rate_limit' }
      }
    }
  }
}

// Body corruption layer
const corruptionResult = {
  ok: false,
  name: 'GET /users (#2)',
  diagnostics: {
    error: 'Contract violation: response_body(this).users != null',
    chaos: {
      injected: true,
      type: 'corruption',
      details: {
        reason: 'Body corruption: Truncates JSON response or nulls a random field',
        strategy: 'json-truncate'
      }
    }
  }
}
```
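When triaging a run, it helps to separate chaos-induced failures from ordinary ones. A hypothetical helper that tallies failing results by `diagnostics.chaos.type`, assuming results shaped like the examples above (how you collect the result array depends on your runner and is not shown):

```javascript
// Hypothetical helper: counts failing results that carry an injected chaos event, by type.
function summarizeChaos(results) {
  const byType = {}
  for (const result of results) {
    const chaos = result.diagnostics?.chaos
    // Ignore passing results and failures with no chaos injected
    if (result.ok || !chaos?.injected) continue
    byType[chaos.type] = (byType[chaos.type] ?? 0) + 1
  }
  return byType
}
```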
## Dropout Semantics
Dropout simulations are reported as HTTP-style failure statuses:
- **504 Gateway Timeout** for timeouts (default)
- **503 Service Unavailable** for network failures
- **Override:** set `statusCode` on the dropout config, e.g. `dropout: { probability: 0.1, statusCode: 503 }`, or set `dropoutStatusCode` for the whole suite
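A sketch of how the dropout status could be resolved. The precedence (per-config `statusCode`, then suite-wide `dropoutStatusCode`, then the timeout/network defaults) is an assumption for illustration, and `resolveDropoutStatus` and its `kind` parameter are hypothetical:

```javascript
// Hypothetical: this precedence is an assumption, not documented plugin behavior.
function resolveDropoutStatus({ kind = 'timeout', dropout = {}, dropoutStatusCode } = {}) {
  if (dropout.statusCode !== undefined) return dropout.statusCode // per-config override
  if (dropoutStatusCode !== undefined) return dropoutStatusCode // suite-wide setting
  return kind === 'network' ? 503 : 504 // documented defaults
}
```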
## Blast Radius Cap
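Cap the total number of chaos injections in a test suite with `maxInjectionsPerSuite`. The example below uses aggressive probabilities; the cap bounds the suite to at most 10 injections regardless: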
```javascript
await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.5,
    delay: { probability: 1.0, minMs: 10, maxMs: 50 },
    maxInjectionsPerSuite: 10
  }
})
```
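Conceptually, the cap is a counter shared across the suite that refuses further injections once exhausted. A hypothetical sketch (`createInjectionBudget` is not part of the plugin's API):

```javascript
// Hypothetical sketch of a per-suite injection cap; not the plugin's API.
function createInjectionBudget(maxInjectionsPerSuite) {
  let used = 0
  return {
    // `rolled` is the outcome of the probability roll for this request
    tryInject(rolled) {
      if (!rolled || used >= maxInjectionsPerSuite) return false
      used += 1
      return true
    },
    get used() {
      return used
    },
  }
}

const budget = createInjectionBudget(10)
for (let i = 0; i < 25; i++) budget.tryInject(true) // every roll succeeds
console.log(budget.used) // 10
```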
## Stateful Retry Safety
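Resilience verification retries requests after injected failures. Retrying a non-idempotent route can repeat its side effects, so those routes are skipped; `skipResilienceFor` lists the route categories (the `x-category` schema values) to exclude: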
```javascript
await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.1,
    resilience: {
      enabled: true,
      maxRetries: 3
    },
    // Skip retries for routes that create side effects
    skipResilienceFor: ['constructor', 'mutator']
  }
})
```
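A sketch of the two pieces involved: a category check and a retry loop with exponential backoff. `shouldVerifyResilience` and `retryWithBackoff` are hypothetical, and retrying on 429 and 5xx with a doubling delay is an assumed policy; the `maxRetries` and `backoffMs` names mirror the `resilience` options in the API reference.

```javascript
// Hypothetical sketch: names and retry policy are assumptions, not the plugin's internals.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Routes whose x-category is listed (e.g. 'constructor', 'mutator') are not retried
function shouldVerifyResilience(route, skipResilienceFor) {
  return !skipResilienceFor.includes(route.category)
}

// Assumed policy: retry on 429 and 5xx responses
const isRetryable = (statusCode) => statusCode === 429 || statusCode >= 500

// Re-sends up to maxRetries times, doubling the wait before each attempt
async function retryWithBackoff(send, { maxRetries = 3, backoffMs = 100 } = {}) {
  let response = await send()
  for (let attempt = 1; attempt <= maxRetries && isRetryable(response.statusCode); attempt++) {
    await sleep(backoffMs * 2 ** (attempt - 1)) // backoffMs, 2x, 4x, ...
    response = await send()
  }
  return response
}
```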
## Best Practices
### 1. Use Outbound Layer for Business Logic
Test handler behavior when dependencies fail:
```javascript
// Good: tests that the handler catches a Stripe 429
chaos: {
  outbound: [{
    target: 'api.stripe.com',
    error: { probability: 0.1, responses: [{ statusCode: 429 }] }
  }]
}

// Bad: only tests the response schema
chaos: {
  error: { probability: 0.1, statusCode: 429 }
}
```
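The "good" configuration pays off when the handler actually handles the dependency failure. A hypothetical handler core for `GET /billing/plans` that forwards Stripe's rate limit and maps other upstream failures to 503; `loadPlans`, the Stripe endpoint, and the `'60'` fallback are assumptions, and a real call would also need Stripe authentication:

```javascript
// Hypothetical handler core; `fetchImpl` is injected so a chaos-wrapped fetch can be passed in.
async function loadPlans(fetchImpl) {
  const upstream = await fetchImpl('https://api.stripe.com/v1/prices')
  if (upstream.status === 429) {
    // Propagate the dependency's back-off hint to our own caller (assumed fallback: '60')
    return {
      statusCode: 429,
      headers: { 'retry-after': upstream.headers.get('retry-after') ?? '60' },
      body: { error: 'rate_limited' },
    }
  }
  if (!upstream.ok) {
    // Any other upstream failure becomes a stable, documented error
    return { statusCode: 503, headers: {}, body: { error: 'stripe_unavailable' } }
  }
  const data = await upstream.json()
  return { statusCode: 200, headers: {}, body: { plans: data.data } }
}
```

In a Fastify route you would pass in a chaos-wrapped fetch (for example one produced by `wrapFetch`) and finish with `reply.code(statusCode).headers(headers).send(body)`. Injecting `fetchImpl` rather than calling `globalThis.fetch` directly keeps the handler testable with or without chaos.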
### 2. Use Body Corruption for Parsing Resilience
Test response parsing and validation:
```javascript
// Good: Tests JSON parser resilience
chaos: {
  corruption: { probability: 0.1 }
}
```
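On the consuming side, parsing resilience often means tolerating a bad record rather than failing the whole stream. A hypothetical NDJSON parser (`parseNdjsonTolerant` is not part of the plugin) that skips malformed lines and reports them, which is one possible policy among several:

```javascript
// Hypothetical client-side parser: skips lines that fail to parse instead of aborting.
function parseNdjsonTolerant(text) {
  const records = []
  const errors = []
  text.split('\n').forEach((line, index) => {
    if (line.trim() === '') return // blank lines (including the trailing one) are ignored
    try {
      records.push(JSON.parse(line))
    } catch (err) {
      errors.push({ line: index + 1, message: err.message })
    }
  })
  return { records, errors }
}
```

Whether to skip, retry, or abort on a bad line is an application decision; body corruption chaos makes that decision visible in tests.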
### 3. Combine Both Layers
```javascript
await fastify.apophis.contract({
  depth: 'quick',
  chaos: {
    probability: 0.1,
    // Outbound layer: dependency failures
    outbound: [{
      target: 'api.stripe.com',
      error: { probability: 0.05, responses: [{ statusCode: 429 }] }
    }],
    // Body corruption: response corruption
    corruption: { probability: 0.05 },
    // Safety: skip retries for stateful routes
    skipResilienceFor: ['constructor', 'mutator']
  }
})
```
### 4. Write Contracts for Error Handling
```javascript
fastify.get('/billing/plans', {
  schema: {
    'x-category': 'observer',
    'x-ensures': [
      'if status:429 then response_headers(this)["retry-after"] != null else true',
      'if status:503 then response_body(this).error == "stripe_unavailable" else true',
      'if status:200 then response_body(this).plans != null else true'
    ]
  }
}, async () => { ... })
```
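Each clause uses `if ... then ... else true`, so it acts as an implication: it constrains only responses with the matching status and holds vacuously for all others. A plain-JavaScript rendering of the three clauses, for illustration only (this is not how apophis-fastify evaluates `x-ensures`, and `checkBillingPlansEnsures` is hypothetical):

```javascript
// Plain-JS rendering of the three x-ensures clauses; not apophis-fastify's evaluator.
function checkBillingPlansEnsures({ statusCode, headers, body }) {
  // "if A then B else true" is logically equivalent to "!A || B"
  const implies = (condition, requirement) => !condition || requirement
  return [
    implies(statusCode === 429, headers['retry-after'] != null),
    implies(statusCode === 503, body?.error === 'stripe_unavailable'),
    implies(statusCode === 200, body?.plans != null),
  ].every(Boolean)
}
```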
## Migration from v1
v1's HTTP-layer chaos is still supported, but use it for transport testing only:
```javascript
// v1 (legacy; use for transport testing only)
chaos: {
  probability: 0.1,
  error: { probability: 0.1, statusCode: 503 }
}

// v2.3 (recommended)
chaos: {
  probability: 0.1,
  // Outbound layer
  outbound: [{
    target: 'api.stripe.com',
    error: { probability: 0.1, responses: [{ statusCode: 429 }] }
  }],
  // Body corruption layer
  corruption: { probability: 0.05 }
}
```
## API Reference
### OutboundChaosConfig
| Field | Type | Description |
|-------|------|-------------|
| `target` | `string` | Hostname or URL pattern to intercept |
| `delay` | `{ probability, minMs, maxMs }` | Delay outbound requests |
| `error` | `{ probability, responses }` | Return error responses |
| `dropout` | `{ probability, statusCode? }` | Simulate network failures |
### Body Corruption Types
| Type | Description |
|------|-------------|
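| `body-truncate` | Body cut off partway (JSON, plain-text, and HTML responses) |
| `body-malformed` | Body left structurally invalid (nulled JSON fields, or a corrupted NDJSON chunk, SSE event, or multipart field) |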
### ChaosConfig
| Field | Type | Description |
|-------|------|-------------|
| `probability` | `number` | Probability of injecting any chaos event (0.0 - 1.0) |
| `delay` | `{ probability, minMs, maxMs }` | Delay injection |
| `error` | `{ probability, statusCode, body? }` | Error injection |
| `dropout` | `{ probability, statusCode? }` | Dropout injection |
| `corruption` | `{ probability }` | Body corruption injection |
| `outbound` | `OutboundChaosConfig[]` | Outbound HTTP interception |
| `routes` | `Record<string, Partial<ChaosConfig>>` | Per-route overrides |
| `include` | `string[]` | Include only these routes |
| `exclude` | `string[]` | Exclude these routes |
| `resilience` | `{ enabled, maxRetries?, backoffMs? }` | Resilience verification |
| `skipResilienceFor` | `string[]` | Route categories (`x-category` values) to skip during resilience verification |
| `dropoutStatusCode` | `number` | Status code for dropout (default: 504) |
| `maxInjectionsPerSuite` | `number` | Maximum injections per suite |