# APOPHIS Framework Assessment — Charity Majors

## Conference Talk Opening

"I've spent the last decade telling you that observability is how you understand production. So when someone shows me a framework that claims to 'test production behavior' without a single trace span, I get... concerned."

"APOPHIS is ambitious. It wants to embed contracts in your Fastify schemas, generate property-based tests, inject chaos, and validate runtime behavior. That's a lot of 'wants to.' Let me show you what it actually does, what it breaks, and what it teaches us about the boundary between testing and observability."

---

## The Demo: A Production-Like Distributed System

I built an order service with circuit breakers, retries, and an inventory dependency. Here's what APOPHIS did:

- **Test 1 (Normal):** 8 passed, 0 failed. Good.
- **Test 2 (Chaos):** FAILED, because chaos requires `NODE_ENV=test`. In production-like environments, chaos is hard-disabled.
- **Test 3 (Stateful):** 12 passed, 0 failed. Sequences of create→read→update→delete work.
- **Test 4 (Circuit breaker open):** 8 passed, 0 failed. But here's the thing: APOPHIS didn't actually verify that the circuit breaker tripped. It only checked that the contract held.

This is the first red flag: **APOPHIS verifies contracts, not resilience.**

---

## Assessment: Seven Production Concerns

### 1. Observability Integration: D+ (Can you trace contract failures to production issues?)

**The Problem:** APOPHIS has zero observability integration.

- No OpenTelemetry spans for contract evaluation
- No correlation IDs between test failures and production traces
- Pino logger wrapper exists but only logs at `debug` level
- Chaos events are buried in test diagnostics, not structured logs
- Runtime hooks (`preHandler`, `onSend`) evaluate formulas but don't emit metrics

**The Code:** `src/infrastructure/logger.ts:11-15` configures Pino with `level: 'warn'` and disables it by default in production. There is no trace context propagation.

**What this means:** When a contract fails in CI, you cannot trace that failure to a production incident. When a production incident occurs, you cannot check whether APOPHIS would have caught it. The loop is broken.

**What I'd want:** Every contract evaluation should create a span. Every chaos injection should emit an event. Every violation should include a `trace_id` so you can correlate it with production telemetry.

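
As a rough sketch of that wish list, the snippet below records one span-like entry per contract evaluation and attaches a `trace_id` to any violation. Everything in it is hypothetical: `SpanRecord`, `Violation`, `evaluateWithSpan`, and the attribute names are illustrative stand-ins rather than APOPHIS or OpenTelemetry APIs, and the in-memory array takes the place of a real tracer so the example runs without dependencies.

```typescript
// Hypothetical sketch: none of these names exist in APOPHIS or OpenTelemetry.
interface SpanRecord {
  name: string;
  traceId: string;
  attributes: Record<string, string | boolean>;
}

interface Violation {
  route: string;
  formula: string;
  trace_id: string; // lets a CI failure be joined against production telemetry
}

// In-memory stand-in for a tracer/exporter.
const recordedSpans: SpanRecord[] = [];

function evaluateWithSpan(
  route: string,
  formula: string,
  traceId: string,
  check: () => boolean,
): Violation | null {
  const passed = check();
  // One span per contract evaluation, whether it passes or fails.
  recordedSpans.push({
    name: "contract.evaluate",
    traceId,
    attributes: {
      "contract.route": route,
      "contract.formula": formula,
      "contract.passed": passed,
    },
  });
  return passed ? null : { route, formula, trace_id: traceId };
}

// The formula text is a placeholder, not real APOSTL syntax.
const violation = evaluateWithSpan("POST /orders", "<formula>", "trace-abc123", () => false);
```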

---

### 2. Chaos Engineering Features: F (How realistic are the failure modes?)

**Critical bugs that make chaos mode unusable:**

**Bug 1: Two-level probability is mathematically broken.**
```typescript
// chaos.ts:55 (global gate)
if (!this.shouldInject(this.config.probability)) { return normal }
// chaos.ts:82 (per-type probability)
weights.push({ type: 'delay', weight: this.config.delay.probability })
```
If you set `probability: 0.5` and `delay.probability: 0.5`, the actual delay rate is **0.25** (0.5 × 0.5), not 0.5. Users will misconfigure this. Chaos Monkey, Gremlin, and Toxiproxy all use single-level probability for a reason.

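
The arithmetic can be written out as follows, assuming the global gate and the per-type value behave as two independent checks (which is how the 0.25 figure arises). `effectiveRate` and `perTypeForTarget` are illustrative helpers, not part of APOPHIS.

```typescript
// Illustrative helpers only; assumes the two probabilities are independent gates.
function effectiveRate(globalProbability: number, perTypeProbability: number): number {
  return globalProbability * perTypeProbability;
}

// Per-type value a user would have to configure to reach a target rate
// under the two-level model.
function perTypeForTarget(globalProbability: number, targetRate: number): number {
  if (globalProbability <= 0) throw new RangeError("global probability must be > 0");
  return Math.min(1, targetRate / globalProbability);
}

// 0.25, not the 0.5 a user most likely intended.
const actualDelayRate = effectiveRate(0.5, 0.5);

// 1: delay must be configured as "always" just to get 0.5 overall.
const neededPerType = perTypeForTarget(0.5, 0.5);
```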

**Bug 2: RNG seeding breaks determinism.**
The corruption strategies themselves are not the problem. `corruptJsonField` at `corruption.ts:47` draws from the RNG it is given:
```typescript
const idx = Math.floor(rng.next() * entries.length)
```
`makeInvalidJson` at line 61 takes no RNG at all (it only slices JSON), and the `BUILTIN_STRATEGIES` entry at line 107 also uses the passed RNG correctly:
```typescript
strategy: (data, rng) => rng.next() > 0.5 ? truncateJson(data, rng) : corruptJsonField(data, rng)
```
The non-determinism comes from how the RNG is created. At `corruption.ts:165`:
```typescript
ctx: applyCorruption(ctx, (data) => builtin.strategy(data, rng ?? new SeededRng(Date.now())), contentType)
```
When `rng` is undefined, it falls back to `new SeededRng(Date.now())`, which is seeded with the current time and is therefore non-deterministic across runs. `chaos.ts:39` has the same fallback, plus a fragile seed derivation:
```typescript
this.rng = new SeededRng(seed !== undefined ? seed + 0xCA05 : Date.now())
```
The derivation `seed + 0xCA05` can cause collisions if test seeds are close. The petit-runner (`chaos.ts:284`) then creates one engine per suite:
```typescript
const chaosEngine = config.chaos ? new ChaosEngine(config.chaos, config.seed) : null
```
`executeWithChaos` is called per request, so the RNG advances across requests, which is fine within a suite. But the seeded reproducibility test is flaky: with `probability: 0.5`, there's a 25% chance that both runs skip injection entirely.

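
To make the seeding concerns concrete, here is a small self-contained sketch. `mulberry32` is a widely used 32-bit PRNG standing in for `SeededRng`, whose implementation is not shown in this assessment. `deriveSeed` is a hypothetical alternative to `seed + 0xCA05` that mixes a purpose label into the seed, so sub-streams are not plain offsets of one another. Whether the additive collision matters in practice depends on how suite seeds are chosen.

```typescript
// Stand-in PRNG (mulberry32); the real SeededRng may differ.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Additive derivation, as quoted from chaos.ts:39.
const additiveChaosSeed = (seed: number): number => seed + 0xca05;

// Hypothetical alternative: FNV-1a-style mixing of the seed with a label.
function deriveSeed(seed: number, label: string): number {
  let h = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < label.length; i++) {
    h ^= label.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Same seed, same stream: the property a reproducibility test should rely on.
const runA = mulberry32(42);
const runB = mulberry32(42);
const sameFirstDraw = runA() === runB();

// With the additive offset, the chaos stream for suite seed 1 is identical
// to the raw stream of seed 51718 (1 + 0xCA05).
const collides = mulberry32(additiveChaosSeed(1))() === mulberry32(51718)();

// A labeled sub-seed for the chaos engine; always a 32-bit unsigned integer.
const chaosSeed = deriveSeed(1, "chaos");
```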

**Bug 3: No per-route granularity.**
Chaos is all-or-nothing. You cannot disable chaos for `/health` while enabling it for `/orders`. In production, you want to protect health checks and OAuth callbacks.

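
A per-route opt-out could be as small as an exclusion list consulted before injection. The sketch below is an assumption about what such a feature might look like: the `exclude` option, `RouteChaosConfig`, and `chaosAllowed` are hypothetical and do not exist in APOPHIS.

```typescript
// Hypothetical config shape; `exclude` is not an existing APOPHIS option.
interface RouteChaosConfig {
  probability: number;
  exclude: string[]; // exact paths, or prefixes written as "/prefix/*"
}

function chaosAllowed(path: string, config: RouteChaosConfig): boolean {
  return !config.exclude.some((rule) =>
    rule.endsWith("/*") ? path.startsWith(rule.slice(0, -1)) : path === rule,
  );
}

const chaosConfig: RouteChaosConfig = {
  probability: 0.5,
  exclude: ["/health", "/oauth/*"],
};

const healthGetsChaos = chaosAllowed("/health", chaosConfig); // false
const ordersGetsChaos = chaosAllowed("/orders", chaosConfig); // true
const oauthCallbackGetsChaos = chaosAllowed("/oauth/callback", chaosConfig); // false
```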

**Bug 4: No resilience verification.**
The chaos tests check that injection happened (`injected: true`), not that the system handled it gracefully. There's no measurement of:
- Retry counts
- Circuit breaker state transitions
- Recovery time
- Error propagation depth

**What this means:** Chaos mode is a toy, not a tool. It injects failures but doesn't verify that your system survives them.

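
For contrast, a resilience check asserts on how the system reacted to the fault, not on whether the fault was injected. The sketch below uses a toy circuit breaker written for this example (`ToyBreaker` is not from APOPHIS or the demo service) and records two of the measurements listed above: the state transition and how often the failing dependency was actually reached.

```typescript
// Hypothetical sketch: a toy breaker plus the observations a resilience check could assert on.
type BreakerState = "closed" | "open";

class ToyBreaker {
  state: BreakerState = "closed";
  private consecutiveFailures = 0;
  private readonly threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  call(dependency: () => void): "ok" | "failed" | "rejected" {
    // Once open, fail fast without touching the dependency.
    if (this.state === "open") return "rejected";
    try {
      dependency();
      this.consecutiveFailures = 0;
      return "ok";
    } catch {
      this.consecutiveFailures += 1;
      if (this.consecutiveFailures >= this.threshold) this.state = "open";
      return "failed";
    }
  }
}

// Injected fault: the dependency always throws, and we count how often it is reached.
let dependencyCalls = 0;
const failingInventory = (): void => {
  dependencyCalls += 1;
  throw new Error("injected failure");
};

const breaker = new ToyBreaker(3);
const outcomes = Array.from({ length: 5 }, () => breaker.call(failingInventory));
// outcomes: three "failed" results, then two "rejected" once the breaker opens.
```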

---

### 3. Production Fidelity: C (Do contracts reflect actual user behavior?)

**What's good:**
- Schema-to-contract inference (`src/domain/schema-to-contract.ts`) automatically derives tests from JSON Schema constraints
- Property-based testing with fast-check generates edge cases that manual tests miss
- Category system (constructor/mutator/observer/destructor) aligns with DDD aggregates

**What's broken:**
- Category inference (`src/domain/category.ts:10-48`) hardcodes exact path matches like `/health`, `/ping`, `/login`. Any variation (`/api/health`, `/v1/health`) is misclassified as non-utility.
- The APOSTL formula language has no arithmetic operators. You cannot write `total == quantity * 10`.
- No support for realistic traffic patterns, load profiles, or user journeys
- Contracts are static: they don't evolve based on production traffic analysis

**What this means:** Your contracts test what you *think* users do, not what they *actually* do. Without production telemetry feedback, contracts drift from reality.

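
The exact-match limitation in category inference could be relaxed by classifying on the final path segment instead of the whole path. This is a sketch under that assumption, not the logic in `category.ts`: `UTILITY_SEGMENTS` and `isUtilityRoute` are hypothetical names, and the segment list just reuses the examples cited above.

```typescript
// Hypothetical helper; the names mirror the examples cited from category.ts.
const UTILITY_SEGMENTS = new Set(["health", "ping", "login"]);

function isUtilityRoute(path: string): boolean {
  // Drop empty segments so leading and trailing slashes do not matter.
  const segments = path.split("/").filter((segment) => segment.length > 0);
  const last = segments[segments.length - 1];
  return last !== undefined && UTILITY_SEGMENTS.has(last);
}

const plain = isUtilityRoute("/health"); // true
const prefixed = isUtilityRoute("/api/health"); // true
const versioned = isUtilityRoute("/v1/health"); // true
const lookalike = isUtilityRoute("/healthcare"); // false: whole-segment match only
const business = isUtilityRoute("/orders"); // false
```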

---

### 4. Operational Burden: C- (Will this slow down CI/CD?)

**Performance numbers from the codebase:**
- Route discovery: ~0.5µs per route
- Formula parsing: ~5µs per formula (cached)
- Incremental cache: 13-20x speedup for unchanged routes
- 11K routes: ~39ms discovery, 1.4s total overhead

**But:**
- Runtime hooks (`preHandler`, `onSend`) run on EVERY request in production
- Formula parsing happens on first request per route (cold start penalty)
- Extension registry has 475 lines with topological sorting, health checks, redaction
- 915-line hand-rolled `charCodeAt` parser is unmaintainable
- Cache file (`.apophis-cache.json`) adds filesystem dependency

**What this means:** For high-traffic APIs, the runtime hook overhead is non-trivial. The incremental cache helps CI, but the framework complexity increases maintenance burden.

---

### 5. Flake Detection: B- (Is this solving the right problem?)

**What's good:**
- Auto-reruns failures with varied seeds
- Confidence scoring (high/medium/low)
- Catches non-deterministic contracts (time-dependent values, race conditions)

**What's broken:**
- Only runs in `NODE_ENV=test`, so it won't catch flakes in staging
- 4 reruns by default may be slow for large suites
- Reruns WITHOUT chaos, so chaos-induced flakiness is masked
- The real problem: chaos mode itself is non-deterministic due to the RNG seeding bugs described in section 2

**What this means:** Flake detection solves a real problem, but the implementation needs work. More importantly, it wouldn't be needed if chaos mode were deterministic.

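
The rerun-and-score idea can be illustrated in a few lines. This is not APOPHIS's algorithm: `scoreFailure`, `FailureScore`, and the mapping from rerun results to high/medium/low are assumptions made for the example. Here "confidence" means confidence that the original failure is a reproducible violation and not a flake.

```typescript
// Hypothetical sketch of rerun-based scoring; the thresholds are assumptions.
type Confidence = "high" | "medium" | "low";

interface FailureScore {
  confidence: Confidence; // confidence that the failure is reproducible
  likelyFlaky: boolean;
  failures: number;
  reruns: number;
}

function scoreFailure(runWithSeed: (seed: number) => boolean, seeds: number[]): FailureScore {
  if (seeds.length === 0) throw new RangeError("at least one rerun seed is required");
  // runWithSeed returns true when the test passes under that seed.
  const failures = seeds.filter((seed) => !runWithSeed(seed)).length;
  const reruns = seeds.length;
  const confidence: Confidence =
    failures === reruns ? "high" : failures > 0 ? "medium" : "low";
  return { confidence, likelyFlaky: failures < reruns, failures, reruns };
}

const rerunSeeds = [1, 2, 3, 4]; // four reruns, matching the default mentioned above

const alwaysFails = scoreFailure(() => false, rerunSeeds); // high confidence, not flaky
const failsOnEvenSeeds = scoreFailure((seed) => seed % 2 !== 0, rerunSeeds); // medium, flaky
const neverReproduces = scoreFailure(() => true, rerunSeeds); // low, flaky
```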

---

### 6. Contract Testing vs Observability: COMPLEMENT, NOT REPLACE

**This is the philosophical core of my assessment.**

APOPHIS wants to be both a testing framework AND a production guardrail. But these are different jobs:

- **Contract testing** catches API drift and schema violations at test time. It's about "did we build what we agreed to?"
- **Observability** catches runtime behavior, performance, and user experience. It's about "what's actually happening?"

APOPHIS runtime hooks (`src/infrastructure/hook-validator.ts`) attempt to bridge this gap by validating contracts on every request. But:
- They throw 500 errors in production for formula parse errors
- They add overhead to every request
- They don't integrate with production telemetry

**The right model:** Contracts in CI/CD. Observability in production. Feedback loops between them.

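
If runtime validation is kept at all, the 500-on-parse-error behavior is avoidable by making the hook fail-safe: report, never rethrow. The wrapper below is a minimal sketch of that idea. `failSafe`, `HookOutcome`, and the `evaluate`/`report` callbacks are hypothetical and stand in for whatever `hook-validator.ts` does internally.

```typescript
// Hypothetical wrapper; `evaluate` stands in for the runtime hook's contract check.
interface HookOutcome {
  ok: boolean;
  reason?: string;
}

function failSafe(evaluate: () => boolean, report: (message: string) => void): HookOutcome {
  try {
    if (evaluate()) return { ok: true };
    report("contract violation");
    return { ok: false, reason: "violation" };
  } catch (err) {
    // A formula parse error (or any other validator bug) is reported, never
    // rethrown, so the request itself is not turned into a 500.
    const message = err instanceof Error ? err.message : String(err);
    report(`contract evaluation error: ${message}`);
    return { ok: false, reason: "evaluation-error" };
  }
}

const reported: string[] = [];
const outcome = failSafe(
  () => {
    throw new SyntaxError("unexpected token in formula");
  },
  (message) => reported.push(message),
);
```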

---

### 7. Plugin Contract System: B (Does it help or hurt in production?)

**What's good:**
- Enables cross-cutting concerns (auth, CORS, rate limiting) to declare contracts
- Built-in contracts for common Fastify plugins (`src/domain/plugin-contracts.ts:176-212`)
- Pattern matching for route applicability (`/api/**` matches `/api/users`)

**What's concerning:**
- 220 lines for registry + composition, which adds cognitive load
- No phase-aware testing (can't actually test `onRequest` vs `onSend` separately)
- `console.warn` for missing extensions, which is noisy in production
- No way to validate that plugins actually implement the hooks they claim

**What this means:** Plugin contracts are a good idea for large codebases with many plugins. But the implementation is complex for v1.1, and the value isn't fully realized without phase-aware testing.

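
For readers unfamiliar with the `/api/**` style of route pattern, here is one plausible reading of it, where `*` matches within a single path segment and `**` may span several. These semantics are an assumption, and `globToRegExp` and `matchesRoute` are illustrative names; the matcher in APOPHIS may behave differently.

```typescript
// Hypothetical matcher; APOPHIS's real pattern semantics may differ.
function globToRegExp(pattern: string): RegExp {
  // Escape regex metacharacters except "*", which carries glob meaning.
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const source = escaped
    .replace(/\*\*/g, "\u0000") // park "**" so the next step leaves it alone
    .replace(/\*/g, "[^/]*") // "*" stays within one path segment
    .replace(/\u0000/g, ".*"); // "**" may span segments
  return new RegExp(`^${source}$`);
}

function matchesRoute(pattern: string, path: string): boolean {
  return globToRegExp(pattern).test(path);
}

const apiUsers = matchesRoute("/api/**", "/api/users"); // true
const apiNested = matchesRoute("/api/**", "/api/users/42"); // true
const outsideApi = matchesRoute("/api/**", "/admin"); // false
const singleStar = matchesRoute("/api/*", "/api/users/42"); // false: "*" does not cross "/"
```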

---

## Tweet Thread

```
1/ I just spent a day with APOPHIS, a contract-driven testing framework for Fastify.
It's ambitious. It's also broken in ways that matter for production systems.

2/ The good: Schema-embedded contracts with property-based test generation.
Fast-check arbitraries from JSON Schema. Stateful sequences. Incremental caching.
This is solid engineering.

3/ The bad: Chaos mode has critical bugs.
- Two-level probability: 0.5 * 0.5 = 0.25 actual failure rate
- Date.now() RNG fallbacks break determinism
- No per-route granularity (health checks get chaos too)
- No resilience verification (checks injection, not recovery)

4/ The ugly: Runtime hooks can crash production.
A typo in an x-ensures annotation throws 500 errors in 'error' mode.
Formula parse errors happen on the request hot path.
This is a safety hazard.

5/ The missing: Zero observability integration.
No OpenTelemetry. No trace correlation. No metrics on contract coverage.
When a contract fails in CI, you can't trace it to production.
When production breaks, you can't check if APOPHIS would have caught it.

6/ The verdict: APOPHIS is a promising research project that needs hardening.
Fix chaos determinism. Make runtime hooks fail-safe. Add OTel integration.
Until then: use it for contract testing in CI, NOT for runtime validation in prod.

7/ The lesson: Contract testing and observability are complements, not substitutes.
Contracts tell you "did we build it right?"
Observability tells you "what's actually happening?"
You need both, connected by feedback loops.

8/ If you're evaluating APOPHIS:
- Start with contract() in CI, skip runtime validation
- Skip chaos mode until RNG bugs are fixed
- Build your own observability integration
- Wait for v2.0 before production runtime use
```

---

## Code References

| Issue | File | Lines |
|-------|------|-------|
| Chaos probability bug | `src/quality/chaos.ts` | 55, 82 |
| Corruption RNG fallback | `src/quality/corruption.ts` | 165 |
| Runtime hook crash risk | `src/infrastructure/hook-validator.ts` | 89-93, 101 |
| Category inference naive | `src/domain/category.ts` | 10-48 |
| Extension system complexity | `src/extension/registry.ts` | 1-475 |
| Parser unmaintainable | `src/formula/parser.ts` | 1-915 |
| No OTel integration | `src/infrastructure/logger.ts` | 11-15 |
| Env guard throws at runtime | `src/quality/env-guard.ts` | 8-14 |

---

## Final Verdict

**Would I recommend APOPHIS for production?** Not in its current form.

**Blockers:**
1. Fix chaos mode determinism (use seeded RNG everywhere, flatten probability model)
2. Make runtime hooks fail-safe (never crash production for contract violations)
3. Add OpenTelemetry integration for trace correlation
4. Simplify extension system or provide higher-level APIs
5. Fix APOSTL to support arithmetic and common string operations

**When it might work:**
- Small APIs with simple CRUD operations
- Teams already using Fastify and comfortable with schema-driven development
- Projects where property-based testing provides high value
- When used WITHOUT runtime validation in production (only in CI)

**The framework needs a v2.0 that either:**
- Simplifies dramatically (drop chaos, drop extensions, focus on core contract testing)
- OR invests heavily in safety guarantees, observability integration, and deterministic chaos

As it stands, APOPHIS is a promising research project that teaches us a lot about the boundary between testing and observability, but it doesn't safely cross that boundary yet.

---

*Assessment by Charity Majors, co-founder Honeycomb.io*
*Date: 2026-04-25*
*Framework: apophis-fastify v1.1.0*