# APOPHIS Framework Assessment: Charity Majors

## Conference Talk Opening

"I've spent the last decade telling you that observability is how you understand production. So when someone shows me a framework that claims to 'test production behavior' without a single trace span, I get... concerned."

"APOPHIS is ambitious. It wants to embed contracts in your Fastify schemas, generate property-based tests, inject chaos, and validate runtime behavior. That's a lot of 'wants to.' Let me show you what it actually does, what it breaks, and what it teaches us about the boundary between testing and observability."

---

## The Demo: A Production-Like Distributed System

I built an order service with circuit breakers, retries, and an inventory dependency. Here's what APOPHIS did:

**Test 1 (Normal):** 8 passed, 0 failed. Good.

**Test 2 (Chaos):** FAILED, because chaos requires `NODE_ENV=test`. In production-like environments, chaos is hard-disabled.

**Test 3 (Stateful):** 12 passed, 0 failed. Sequences of create→read→update→delete work.

**Test 4 (Circuit breaker open):** 8 passed, 0 failed. But here's the thing: APOPHIS didn't actually verify the circuit breaker tripped. It just checked the contract held.

This is the first red flag: **APOPHIS verifies contracts, not resilience.**

---

## Assessment: Seven Production Concerns

### 1. Observability Integration: D+

(Can you trace contract failures to production issues?)

**The Problem:** APOPHIS has zero observability integration.

- No OpenTelemetry spans for contract evaluation
- No correlation IDs between test failures and production traces
- Pino logger wrapper exists but only logs at `debug` level
- Chaos events are buried in test diagnostics, not structured logs
- Runtime hooks (`preHandler`, `onSend`) evaluate formulas but don't emit metrics

**The Code:** `src/infrastructure/logger.ts:11-15`: Pino configured with `level: 'warn'` and disabled by default in production. No trace context propagation.

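The missing correlation ID is cheap to sketch. Assuming the incoming request carries a W3C `traceparent` header, a violation log record could pick up the trace ID roughly like this. The names here (`parseTraceparent`, `violationRecord`, and the record fields) are hypothetical and not part of APOPHIS; only the header layout comes from the W3C Trace Context format.

```typescript
// Hypothetical shapes: none of these names come from APOPHIS.
interface TraceContext {
  traceId: string; // 32 lowercase hex characters
  spanId: string; // 16 lowercase hex characters
  sampled: boolean;
}

interface ViolationRecord {
  event: "contract_violation";
  route: string;
  formula: string;
  trace_id: string | null;
  span_id: string | null;
}

// W3C Trace Context, version 00: version-traceid-parentid-flags
const TRACEPARENT = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string | undefined): TraceContext | null {
  if (header === undefined) return null;
  const match = TRACEPARENT.exec(header.trim());
  if (match === null) return null;
  const [, traceId = "", spanId = "", flags = "00"] = match;
  // All-zero trace or span IDs are invalid in the W3C format.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

// Build a structured log record that carries the trace ID when one is available.
function violationRecord(
  route: string,
  formula: string,
  traceparent: string | undefined,
): ViolationRecord {
  const ctx = parseTraceparent(traceparent);
  return {
    event: "contract_violation",
    route,
    formula,
    trace_id: ctx?.traceId ?? null,
    span_id: ctx?.spanId ?? null,
  };
}
```

A record shaped like this can be handed to any structured logger. The point is only that the contract violation and the production trace end up sharing an ID.
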
**What this means:** When a contract fails in CI, you cannot trace that failure to a production incident. When a production incident occurs, you cannot check if APOPHIS would have caught it. The loop is broken.

**What I'd want:** Every contract evaluation should create a span. Every chaos injection should emit an event. Every violation should include a `trace_id` so you can correlate with production telemetry.

---

### 2. Chaos Engineering Features: F

(How realistic are the failure modes?)

**Critical bugs that make chaos mode unusable:**

**Bug 1: Two-level probability is mathematically broken.**

```typescript
// chaos.ts:55: global gate
if (!this.shouldInject(this.config.probability)) { return normal }

// chaos.ts:82: per-type probability
weights.push({ type: 'delay', weight: this.config.delay.probability })
```

If you set `probability: 0.5` and `delay.probability: 0.5`, the actual delay rate is **0.25** (0.5 × 0.5), not 0.5. Users will misconfigure. Chaos Monkey, Gremlin, and Toxiproxy all use single-level probability for a reason.

**Bug 2: Clock-seeded RNG fallbacks break determinism.**

I went in expecting stray `Math.random()` calls in the corruption strategies. That's not what the code shows. `corruptJsonField` at `corruption.ts:47` uses the RNG it is handed, so that's fine:

```typescript
const idx = Math.floor(rng.next() * entries.length)
```

`makeInvalidJson` at line 61 doesn't take an RNG at all; it just slices JSON. And `BUILTIN_STRATEGIES` at line 107 uses the RNG correctly:

```typescript
strategy: (data, rng) => rng.next() > 0.5 ? truncateJson(data, rng) : corruptJsonField(data, rng)
```

The real problem is the fallback at `corruption.ts:165`:

```typescript
ctx: applyCorruption(ctx, (data) => builtin.strategy(data, rng ?? new SeededRng(Date.now())), contentType)
```

When `rng` is undefined, it falls back to `new SeededRng(Date.now())`, which is seeded from the clock and therefore non-deterministic across runs. `chaos.ts:39` has the same fallback when no seed is supplied:

```typescript
this.rng = new SeededRng(seed !== undefined ? seed + 0xCA05 : Date.now())
```

The seed derivation `seed + 0xCA05` can also cause collisions if test seeds are close. Then there's `chaos.ts:284` in petit-runner:

```typescript
const chaosEngine = config.chaos ? new ChaosEngine(config.chaos, config.seed) : null
```

One engine per suite, but then `executeWithChaos` is called per request. The RNG advances, so that's actually fine for the suite. But the seeded reproducibility test is flaky because with `probability: 0.5`, there's a 25% chance both runs skip injection entirely.

**Bug 3: No per-route granularity.**

Chaos is all-or-nothing. You cannot disable chaos for `/health` while enabling it for `/orders`. In production, you want to protect health checks and OAuth callbacks.

**Bug 4: No resilience verification.**

The chaos tests check that injection happened (`injected: true`), not that the system handled it gracefully. There's no measurement of:

- Retry counts
- Circuit breaker state transitions
- Recovery time
- Error propagation depth

**What this means:** Chaos mode is a toy, not a tool. It injects failures but doesn't verify your system survives them.

---

### 3. Production Fidelity: C

(Do contracts reflect actual user behavior?)

**What's good:**

- Schema-to-contract inference (`src/domain/schema-to-contract.ts`) automatically derives tests from JSON Schema constraints
- Property-based testing with fast-check generates edge cases manual tests miss
- Category system (constructor/mutator/observer/destructor) aligns with DDD aggregates

**What's broken:**

- Category inference (`src/domain/category.ts:10-48`) hardcodes exact path matches like `/health`, `/ping`, `/login`. Any variation (`/api/health`, `/v1/health`) is misclassified as non-utility.
- APOSTL formula language has no arithmetic operators. You cannot write `total == quantity * 10`.
- No support for realistic traffic patterns, load profiles, or user journeys
- Contracts are static: they don't evolve based on production traffic analysis

**What this means:** Your contracts test what you *think* users do, not what they *actually* do. Without production telemetry feedback, contracts drift from reality.

---

### 4. Operational Burden: C-

(Will this slow down CI/CD?)

**Performance numbers from the codebase:**

- Route discovery: ~0.5µs per route
- Formula parsing: ~5µs per formula (cached)
- Incremental cache: 13-20x speedup for unchanged routes
- 11K routes: ~39ms discovery, 1.4s total overhead

**But:**

- Runtime hooks (`preHandler`, `onSend`) run on EVERY request in production
- Formula parsing happens on first request per route (cold start penalty)
- Extension registry has 475 lines with topological sorting, health checks, redaction
- 915-line hand-rolled `charCodeAt` parser is unmaintainable
- Cache file (`.apophis-cache.json`) adds filesystem dependency

**What this means:** For high-traffic APIs, the runtime hook overhead is non-trivial. The incremental cache helps CI, but the framework complexity increases maintenance burden.

---

### 5. Flake Detection: B-

(Is this solving the right problem?)

**What's good:**

- Auto-reruns failures with varied seeds
- Confidence scoring (high/medium/low)
- Catches non-deterministic contracts (time-dependent values, race conditions)

**What's broken:**

- Only runs in `NODE_ENV=test`, so it won't catch flakes in staging
- 4 reruns by default may be slow for large suites
- Reruns WITHOUT chaos, so chaos-induced flakiness is masked
- The real problem: chaos mode itself is non-deterministic because of its `Date.now()`-seeded RNG fallbacks

**What this means:** Flake detection solves a real problem but the implementation needs work. More importantly, much of it wouldn't be needed if chaos mode were deterministic.

---

### 6. Contract Testing vs Observability: COMPLEMENT, NOT REPLACE

**This is the philosophical core of my assessment.**

APOPHIS wants to be both a testing framework AND a production guardrail. But these are different jobs:

- **Contract testing** catches API drift and schema violations at test time. It's about "did we build what we agreed to?"
- **Observability** catches runtime behavior, performance, and user experience. It's about "what's actually happening?"

APOPHIS runtime hooks (`src/infrastructure/hook-validator.ts`) attempt to bridge this gap by validating contracts on every request. But:

- They throw 500 errors in production for formula parse errors
- They add overhead to every request
- They don't integrate with production telemetry

**The right model:** Contracts in CI/CD. Observability in production. Feedback loops between them.

---

### 7. Plugin Contract System: B

(Does it help or hurt in production?)

**What's good:**

- Enables cross-cutting concerns (auth, CORS, rate limiting) to declare contracts
- Built-in contracts for common Fastify plugins (`src/domain/plugin-contracts.ts:176-212`)
- Pattern matching for route applicability (`/api/**` matches `/api/users`)

**What's concerning:**

- 220 lines for registry + composition, adds cognitive load
- No phase-aware testing (can't actually test `onRequest` vs `onSend` separately)
- `console.warn` for missing extensions, which is noisy in production
- No way to validate that plugins actually implement the hooks they claim

**What this means:** Plugin contracts are a good idea for large codebases with many plugins. But the implementation is complex for v1.1, and the value isn't fully realized without phase-aware testing.

---

## Tweet Thread

```
1/ I just spent a day with APOPHIS, a contract-driven testing framework for Fastify. It's ambitious. It's also broken in ways that matter for production systems.

2/ The good: Schema-embedded contracts with property-based test generation. Fast-check arbitraries from JSON Schema. Stateful sequences. Incremental caching. This is solid engineering.

3/ The bad: Chaos mode has critical bugs.
- Two-level probability: 0.5 * 0.5 = 0.25 actual failure rate
- Date.now()-seeded RNG fallback in corruption breaks determinism
- No per-route granularity (health checks get chaos too)
- No resilience verification (checks injection, not recovery)

4/ The ugly: Runtime hooks can crash production. A typo in an x-ensures annotation throws 500 errors in 'error' mode. Formula parse errors happen on the request hot path. This is a safety hazard.

5/ The missing: Zero observability integration. No OpenTelemetry. No trace correlation. No metrics on contract coverage. When a contract fails in CI, you can't trace it to production. When production breaks, you can't check if APOPHIS would have caught it.

6/ The verdict: APOPHIS is a promising research project that needs hardening. Fix chaos determinism. Make runtime hooks fail-safe. Add OTel integration. Until then: use it for contract testing in CI, NOT for runtime validation in prod.

7/ The lesson: Contract testing and observability are complements, not substitutes. Contracts tell you "did we build it right?" Observability tells you "what's actually happening?" You need both, connected by feedback loops.

8/ If you're evaluating APOPHIS:
- Start with contract() in CI, skip runtime validation
- Skip chaos mode until RNG bugs are fixed
- Build your own observability integration
- Wait for v2.0 before production runtime use
```

---

## Code References

| Issue | File | Lines |
|-------|------|-------|
| Chaos probability bug | `src/quality/chaos.ts` | 55, 82 |
| Corruption RNG fallback | `src/quality/corruption.ts` | 165 |
| Runtime hook crash risk | `src/infrastructure/hook-validator.ts` | 89-93, 101 |
| Category inference naive | `src/domain/category.ts` | 10-48 |
| Extension system complexity | `src/extension/registry.ts` | 1-475 |
| Parser unmaintainable | `src/formula/parser.ts` | 1-915 |
| No OTel integration | `src/infrastructure/logger.ts` | 11-15 |
| Env guard throws at runtime | `src/quality/env-guard.ts` | 8-14 |

---

## Final Verdict

**Would I recommend APOPHIS for production?** Not in its current form.

**Blockers:**

1. Fix chaos mode determinism (use seeded RNG everywhere, flatten probability model)
2. Make runtime hooks fail-safe (never crash production for contract violations)
3. Add OpenTelemetry integration for trace correlation
4. Simplify extension system or provide higher-level APIs
5. Fix APOSTL to support arithmetic and common string operations

**When it might work:**

- Small APIs with simple CRUD operations
- Teams already using Fastify and comfortable with schema-driven development
- Projects where property-based testing provides high value
- When used WITHOUT runtime validation in production (only in CI)

**The framework needs a v2.0 that either:**

- Simplifies dramatically (drop chaos, drop extensions, focus on core contract testing)
- OR invests heavily in safety guarantees, observability integration, and deterministic chaos

As it stands, APOPHIS is a promising research project that teaches us a lot about the boundary between testing and observability, but it doesn't safely cross that boundary yet.

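To make blocker 1 concrete, here is a minimal sketch of a flattened, seeded chaos model next to the two-level arithmetic it replaces. Nothing in it is APOPHIS code: `FlatChaosConfig`, `pickFault`, `seededRandom`, and `replay` are hypothetical names, the fault types are assumed for illustration, and the generator is a plain Park-Miller LCG standing in for whatever `SeededRng` actually does.

```typescript
// Hypothetical sketch: these names are illustrative, not APOPHIS APIs.
type FaultType = "delay" | "error" | "corrupt";
// Each value is the absolute per-request probability of that fault.
type FlatChaosConfig = Record<FaultType, number>;

// Fixed iteration order so the same seed always maps draws to the same faults.
const FAULT_ORDER: readonly FaultType[] = ["delay", "error", "corrupt"];

// Two-level model as described in the assessment: a global gate, then a per-type draw.
function twoLevelRate(globalProbability: number, typeProbability: number): number {
  return globalProbability * typeProbability;
}

// Park-Miller "minimal standard" LCG (multiplier 48271), returning floats in [0, 1).
function seededRandom(seed: number): () => number {
  const M = 2147483647; // 2^31 - 1
  let state = (((Math.floor(seed) % (M - 1)) + (M - 1)) % (M - 1)) + 1; // in [1, M - 1]
  return () => {
    state = (state * 48271) % M;
    return (state - 1) / (M - 1);
  };
}

// Flattened model: one draw per request, and each fault owns a slice of [0, 1).
function pickFault(config: FlatChaosConfig, rand: () => number): FaultType | null {
  const total = FAULT_ORDER.reduce((sum, type) => sum + config[type], 0);
  if (total > 1 + 1e-9) {
    throw new RangeError("fault probabilities must sum to at most 1");
  }
  const draw = rand();
  let upper = 0;
  for (const type of FAULT_ORDER) {
    upper += config[type];
    if (draw < upper) return type;
  }
  return null; // no fault injected for this request
}

// Replay a request sequence: the seed is required, so there is no clock fallback.
function replay(
  seed: number,
  requests: number,
  config: FlatChaosConfig,
): Array<FaultType | null> {
  const rand = seededRandom(seed);
  const outcomes: Array<FaultType | null> = [];
  for (let i = 0; i < requests; i++) outcomes.push(pickFault(config, rand));
  return outcomes;
}
```

In this model, `delay: 0.5` means half of all requests are delayed, with no hidden multiplication by a global gate. Because `replay` cannot run without a seed, two runs with the same seed and config produce the same injection sequence.
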
---

*Assessment by Charity Majors, co-founder Honeycomb.io*

*Date: 2026-04-25*

*Framework: apophis-fastify v1.1.0*