APOPHIS Framework Assessment — Charity Majors

Conference Talk Opening

"I've spent the last decade telling you that observability is how you understand production. So when someone shows me a framework that claims to 'test production behavior' without a single trace span, I get... concerned."

"APOPHIS is ambitious. It wants to embed contracts in your Fastify schemas, generate property-based tests, inject chaos, and validate runtime behavior. That's a lot of 'wants to.' Let me show you what it actually does, what it breaks, and what it teaches us about the boundary between testing and observability."


The Demo: A Production-Like Distributed System

I built an order service with circuit breakers, retries, and an inventory dependency. Here's what APOPHIS did:

  • Test 1 (Normal): 8 passed, 0 failed. Good.
  • Test 2 (Chaos): FAILED, because chaos requires NODE_ENV=test. In production-like environments, chaos is hard-disabled.
  • Test 3 (Stateful): 12 passed, 0 failed. Sequences of create→read→update→delete work.
  • Test 4 (Circuit breaker open): 8 passed, 0 failed. But here's the thing: APOPHIS didn't actually verify the circuit breaker tripped. It just checked the contract held.

This is the first red flag: APOPHIS verifies contracts, not resilience.


Assessment: Seven Production Concerns

1. Observability Integration: D+ (Can you trace contract failures to production issues?)

The Problem: APOPHIS has zero observability integration.

  • No OpenTelemetry spans for contract evaluation
  • No correlation IDs between test failures and production traces
  • Pino logger wrapper exists but only logs at debug level
  • Chaos events are buried in test diagnostics, not structured logs
  • Runtime hooks (preHandler, onSend) evaluate formulas but don't emit metrics

The Code: src/infrastructure/logger.ts:11-15 — Pino configured with level: 'warn' and disabled by default in production. No trace context propagation.

What this means: When a contract fails in CI, you cannot trace that failure to a production incident. When a production incident occurs, you cannot check if APOPHIS would have caught it. The loop is broken.

What I'd want: Every contract evaluation should create a span. Every chaos injection should emit an event. Every violation should include a trace_id so you can correlate with production telemetry.
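To make that wish concrete, here is a minimal sketch of what "one span per contract evaluation" could look like. Everything in it is an assumption for illustration: InMemoryTracer, SpanRecord, evaluateContract, the span name, and the formula text are hypothetical, and the in-memory recorder stands in for a real OpenTelemetry tracer rather than reproducing its API.

```typescript
// Hypothetical span shape; a real integration would hand this to an
// OpenTelemetry tracer instead of an in-memory array.
interface SpanRecord {
  name: string;
  traceId: string;
  attributes: Record<string, string | number | boolean>;
  status: "ok" | "error";
}

// In-memory recorder so the sketch runs without any dependencies.
class InMemoryTracer {
  readonly spans: SpanRecord[] = [];

  record(span: SpanRecord): void {
    this.spans.push(span);
  }
}

interface ContractResult {
  route: string;
  formula: string;
  passed: boolean;
  traceId: string; // lets a CI failure be looked up in production telemetry
}

// Evaluates one contract predicate and records exactly one span for it.
// A predicate that throws is treated as a failed contract, not a crash.
function evaluateContract(
  tracer: InMemoryTracer,
  traceId: string,
  route: string,
  formula: string,
  predicate: () => boolean,
): ContractResult {
  const span: SpanRecord = {
    name: "apophis.contract.evaluate", // assumed span name
    traceId,
    attributes: { route, formula },
    status: "ok",
  };
  let passed = false;
  try {
    passed = predicate();
  } catch {
    span.attributes["contract.threw"] = true;
  }
  span.attributes["contract.passed"] = passed;
  span.status = passed ? "ok" : "error";
  tracer.record(span);
  return { route, formula, passed, traceId };
}
```

The point of returning traceId with the result is that the violation report and the span share one identifier, which is the correlation the framework currently lacks.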


2. Chaos Engineering Features: F (How realistic are the failure modes?)

Critical bugs that make chaos mode unusable:

Bug 1: Two-level probability is mathematically broken.

// chaos.ts:55 — Global gate
if (!this.shouldInject(this.config.probability)) { return normal }
// chaos.ts:82 — Per-type probability  
weights.push({ type: 'delay', weight: this.config.delay.probability })

If you set probability: 0.5 and delay.probability: 0.5, actual delay rate is 0.25, not 0.5. Users will misconfigure. Chaos Monkey, Gremlin, and Toxiproxy all use single-level probability for a reason.
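The arithmetic is easy to check with a small simulation. This sketch assumes the model described above: a global gate followed by an independent per-type draw, with delay as the only enabled type. The function names and the mulberry32 generator are mine, not APOPHIS code.

```typescript
// Small seeded PRNG (mulberry32) so the simulation is repeatable.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Simulates two stacked gates: a request is delayed only if it passes the
// global gate AND the per-type gate. Returns the observed delay rate.
function effectiveDelayRate(
  globalProbability: number,
  delayProbability: number,
  trials: number,
  seed: number,
): number {
  const next = mulberry32(seed);
  let delayed = 0;
  for (let i = 0; i < trials; i++) {
    const passedGlobalGate = next() < globalProbability;
    if (passedGlobalGate && next() < delayProbability) delayed++;
  }
  return delayed / trials;
}

// Two gates at 0.5 each: the observed rate lands near 0.5 * 0.5 = 0.25.
const observedRate = effectiveDelayRate(0.5, 0.5, 200000, 42);
console.log(observedRate.toFixed(3));
```

A flattened model would expose a single number per failure type, so the configured value and the observed rate are the same thing.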

Bug 2: Unseeded RNG fallbacks break determinism.

The corruption strategies themselves use the injected RNG correctly. corruptJsonField at corruption.ts:47 draws from the RNG it is passed:

const idx = Math.floor(rng.next() * entries.length)

So does the BUILTIN_STRATEGIES entry at line 107:

strategy: (data, rng) => rng.next() > 0.5 ? truncateJson(data, rng) : corruptJsonField(data, rng)

makeInvalidJson at line 61 doesn't take an RNG at all; it just slices JSON.

The problem is the fallback at corruption.ts:165:

ctx: applyCorruption(ctx, (data) => builtin.strategy(data, rng ?? new SeededRng(Date.now())), contentType)

When rng is undefined, it falls back to new SeededRng(Date.now()), which is seeded with the current time and is therefore non-deterministic across runs. The same pattern appears at chaos.ts:39:

this.rng = new SeededRng(seed !== undefined ? seed + 0xCA05 : Date.now())

Without a seed, the engine is seeded from Date.now(). With a seed, the derivation seed + 0xCA05 can cause collisions if test seeds are close. And chaos.ts:284 in petit-runner:

const chaosEngine = config.chaos ? new ChaosEngine(config.chaos, config.seed) : null

One engine per suite, but then executeWithChaos is called per request. The RNG advances between calls, so that part is fine for the suite. But the seeded reproducibility test is flaky because with probability: 0.5, there's a 25% chance both runs skip injection entirely.
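One way to close both holes (the Date.now() fallback and the additive seed derivation) is to require an explicit seed and derive per-stream seeds by hashing. The sketch below is an assumption-laden illustration, not the framework's code: Rng stands in for SeededRng, and deriveSeed, createChaosRng, and decisions are hypothetical names. It also shows a steadier reproducibility check: compare the full sequence of inject/skip decisions between two runs instead of asserting that an injection happened.

```typescript
// Minimal seeded generator (mulberry32) standing in for SeededRng.
class Rng {
  private state: number;

  constructor(seed: number) {
    this.state = seed >>> 0;
  }

  // Returns a float in [0, 1).
  next(): number {
    this.state = (this.state + 0x6d2b79f5) >>> 0;
    let t = this.state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  }
}

// Derives a child seed by hashing the parent seed together with a stream
// label (FNV-1a), instead of adding a constant offset to the seed.
function deriveSeed(seed: number, label: string): number {
  const input = `${seed}:${label}`;
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Refuses to run without an explicit integer seed: no time-based fallback.
function createChaosRng(seed: number): Rng {
  if (!Number.isInteger(seed)) {
    throw new Error("chaos requires an explicit integer seed");
  }
  return new Rng(deriveSeed(seed, "chaos"));
}

// Records the inject/skip decision for each request in a run.
function decisions(seed: number, probability: number, requests: number): boolean[] {
  const rng = createChaosRng(seed);
  return Array.from({ length: requests }, () => rng.next() < probability);
}
```

Two runs with the same seed should then produce identical decision arrays, and that equality holds whether or not any individual request happened to be injected.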

Bug 3: No per-route granularity. Chaos is all-or-nothing. You cannot disable chaos for /health while enabling it for /orders. In production, you want to protect health checks and OAuth callbacks.
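Per-route control does not need much machinery. Here is a rough sketch of an include/exclude matcher where exclusions always win. The ChaosRouteConfig shape and both function names are hypothetical; APOPHIS has no such option today as far as this assessment found, and the glob semantics (`**` crosses slashes, `*` stays inside one segment) are my assumption.

```typescript
// Hypothetical config shape for per-route chaos control.
interface ChaosRouteConfig {
  include: string[];
  exclude: string[];
}

// Converts a glob-like pattern to an anchored RegExp.
// `**` matches any characters including "/", `*` matches within one segment.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const source = escaped
    .split("**")
    .map((part) => part.replace(/\*/g, "[^/]*"))
    .join(".*");
  return new RegExp(`^${source}$`);
}

// Chaos applies only when the path matches an include pattern and no
// exclude pattern. Exclusions are checked first so they always win.
function chaosEnabledFor(path: string, config: ChaosRouteConfig): boolean {
  const matchesAny = (patterns: string[]): boolean =>
    patterns.some((pattern) => patternToRegExp(pattern).test(path));
  if (matchesAny(config.exclude)) return false;
  return matchesAny(config.include);
}

// Example: chaos everywhere except health checks and OAuth callbacks.
const exampleChaosRoutes: ChaosRouteConfig = {
  include: ["/**"],
  exclude: ["/health", "/oauth/**"],
};
console.log(chaosEnabledFor("/orders", exampleChaosRoutes));
console.log(chaosEnabledFor("/health", exampleChaosRoutes));
```

Checking exclusions first is deliberate: a protected route should stay protected even if someone later widens the include list.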

Bug 4: No resilience verification. The chaos tests check that injection happened (injected: true), not that the system handled it gracefully. There's no measurement of:

  • Retry counts
  • Circuit breaker state transitions
  • Recovery time
  • Error propagation depth

What this means: Chaos mode is a toy, not a tool. It injects failures but doesn't verify your system survives them.
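For contrast, this is roughly what verifying resilience (rather than injection) could look like. It is a self-contained sketch under stated assumptions: the CircuitBreaker class is a toy written for this example, not the breaker from my demo service or anything in APOPHIS, and runChaosScenario is a hypothetical harness. The idea is to record state transitions while failures are injected and then assert on the recovery path.

```typescript
type BreakerState = "closed" | "open" | "half-open";

// Toy circuit breaker that logs every state transition.
class CircuitBreaker {
  state: BreakerState = "closed";
  readonly transitions: BreakerState[] = ["closed"];
  private failures = 0;
  private readonly failureThreshold: number;

  constructor(failureThreshold: number) {
    this.failureThreshold = failureThreshold;
  }

  private moveTo(next: BreakerState): void {
    if (next !== this.state) {
      this.state = next;
      this.transitions.push(next);
    }
  }

  // Simulates the cool-down timer firing: an open breaker allows one trial.
  coolDownElapsed(): void {
    if (this.state === "open") this.moveTo("half-open");
  }

  // Runs fn unless the breaker is open; returns the fallback on failure.
  call<T>(fn: () => T, fallback: T): T {
    if (this.state === "open") return fallback;
    try {
      const result = fn();
      this.failures = 0;
      this.moveTo("closed");
      return result;
    } catch {
      this.failures++;
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.moveTo("open");
      }
      return fallback;
    }
  }
}

// Injects failures into a dependency, ends the chaos, and reports what the
// breaker did, so a test can assert on handling rather than on injection.
function runChaosScenario(): { transitions: BreakerState[]; fallbacks: number } {
  const breaker = new CircuitBreaker(3);
  let dependencyHealthy = false; // chaos on: the inventory call fails
  const callInventory = (): string => {
    if (!dependencyHealthy) throw new Error("injected failure");
    return "in-stock";
  };
  let fallbacks = 0;
  for (let i = 0; i < 5; i++) {
    if (breaker.call(callInventory, "unknown") === "unknown") fallbacks++;
  }
  dependencyHealthy = true; // chaos off
  breaker.coolDownElapsed();
  breaker.call(callInventory, "unknown");
  return { transitions: breaker.transitions, fallbacks };
}
```

A resilience assertion would then check the transition log for closed, open, half-open, closed, which is exactly the evidence Test 4 in the demo never collected.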


3. Production Fidelity: C (Do contracts reflect actual user behavior?)

What's good:

  • Schema-to-contract inference (src/domain/schema-to-contract.ts) automatically derives tests from JSON Schema constraints
  • Property-based testing with fast-check generates edge cases manual tests miss
  • Category system (constructor/mutator/observer/destructor) aligns with DDD aggregates

What's broken:

  • Category inference (src/domain/category.ts:10-48) hardcodes exact path matches like /health, /ping, /login. Any variation (/api/health, /v1/health) is misclassified as non-utility.
  • APOSTL formula language has no arithmetic operators. You cannot write total == quantity * 10.
  • No support for realistic traffic patterns, load profiles, or user journeys
  • Contracts are static — they don't evolve based on production traffic analysis

What this means: Your contracts test what you think users do, not what they actually do. Without production telemetry feedback, contracts drift from reality.
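The category inference problem above has a cheap partial fix: match on the last path segment instead of the whole path. This is a sketch only. UTILITY_SEGMENTS and isUtilityRoute are hypothetical names, and the segment list simply reuses the examples cited from category.ts.

```typescript
// Segments treated as utility endpoints (taken from the examples above).
const UTILITY_SEGMENTS = new Set(["health", "ping", "login"]);

// Classifies a route as utility when its final path segment is a known
// utility name, so prefixes such as /api or /v1 no longer defeat the match.
function isUtilityRoute(path: string): boolean {
  const [pathname = ""] = path.split("?"); // ignore any query string
  const segments = pathname.split("/").filter((segment) => segment.length > 0);
  const last = segments[segments.length - 1];
  return last !== undefined && UTILITY_SEGMENTS.has(last.toLowerCase());
}

console.log(isUtilityRoute("/api/health"));
console.log(isUtilityRoute("/orders"));
```

The trade-off is false positives: a business route like /orders/health would also be classified as utility, so an explicit per-route override is probably still needed.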


4. Operational Burden: C- (Will this slow down CI/CD?)

Performance numbers from the codebase:

  • Route discovery: ~0.5µs per route
  • Formula parsing: ~5µs per formula (cached)
  • Incremental cache: 13-20x speedup for unchanged routes
  • 11K routes: ~39ms discovery, 1.4s total overhead

But:

  • Runtime hooks (preHandler, onSend) run on EVERY request in production
  • Formula parsing happens on first request per route (cold start penalty)
  • Extension registry has 475 lines with topological sorting, health checks, redaction
  • 915-line hand-rolled charCodeAt parser is unmaintainable
  • Cache file (.apophis-cache.json) adds filesystem dependency

What this means: For high-traffic APIs, the runtime hook overhead is non-trivial. The incremental cache helps CI, but the framework complexity increases maintenance burden.


5. Flake Detection: B- (Is this solving the right problem?)

What's good:

  • Auto-reruns failures with varied seeds
  • Confidence scoring (high/medium/low)
  • Catches non-deterministic contracts (time-dependent values, race conditions)

What's broken:

  • Only runs in NODE_ENV=test — won't catch flakes in staging
  • 4 reruns by default may be slow for large suites
  • Reruns WITHOUT chaos, so chaos-induced flakiness is masked
  • The real problem: chaos mode itself is non-deterministic because of the Date.now()-seeded RNG fallbacks described in section 2

What this means: Flake detection solves a real problem but the implementation needs work. More importantly, it shouldn't be needed if chaos mode were deterministic.
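For readers who haven't seen the rerun approach, here is a rough sketch of its shape. The names (FlakeReport, rerunWithVariedSeeds) and especially the mapping from rerun results to high/medium/low are my assumptions; the framework's actual scoring rules may differ. The default of 4 reruns matches the number cited above.

```typescript
type Confidence = "high" | "medium" | "low";

interface FlakeReport {
  reruns: number;
  failures: number;
  // Confidence that the original failure is a real contract violation
  // rather than a flake (assumed mapping, see below).
  confidence: Confidence;
}

// Reruns a failed test with seeds derived from the original one and counts
// how often it fails again. runTest returns true when the test passes.
function rerunWithVariedSeeds(
  runTest: (seed: number) => boolean,
  originalSeed: number,
  reruns: number = 4,
): FlakeReport {
  let failures = 0;
  for (let i = 1; i <= reruns; i++) {
    if (!runTest(originalSeed + i)) failures++;
  }
  // Assumed mapping: fails every time -> high, sometimes -> medium,
  // never again -> low.
  const confidence: Confidence =
    failures === reruns ? "high" : failures > 0 ? "medium" : "low";
  return { reruns, failures, confidence };
}
```

Note the limitation this makes visible: varying the seed changes the generated inputs too, so a "medium" result cannot distinguish an input-dependent bug from genuine non-determinism. Rerunning with the same seed would separate the two.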


6. Contract Testing vs Observability: COMPLEMENT, NOT REPLACE

This is the philosophical core of my assessment.

APOPHIS wants to be both a testing framework AND a production guardrail. But these are different jobs:

  • Contract testing catches API drift and schema violations at test time. It's about "did we build what we agreed to?"
  • Observability catches runtime behavior, performance, and user experience. It's about "what's actually happening?"

APOPHIS runtime hooks (src/infrastructure/hook-validator.ts) attempt to bridge this gap by validating contracts on every request. But:

  • They throw 500 errors in production for formula parse errors
  • They add overhead to every request
  • They don't integrate with production telemetry

The right model: Contracts in CI/CD. Observability in production. Feedback loops between them.
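If runtime validation stays, the minimum bar is that it can never take a request down. A sketch of that shape, under loud assumptions: compileFormula is a stand-in parser that accepts only `status == <code>` and is not APOSTL, and FailSafeValidator is a hypothetical class, not the code in hook-validator.ts. Formulas are compiled once at registration, and the per-request path reports violations instead of throwing.

```typescript
type ResponseInfo = { statusCode: number };
type Compiled =
  | { ok: true; check: (response: ResponseInfo) => boolean }
  | { ok: false; error: string };

// Stand-in parser: accepts only "status == <3-digit code>". Real APOSTL
// formulas are richer; this exists to make the control flow runnable.
function compileFormula(source: string): Compiled {
  const match = /^status\s*==\s*(\d{3})$/.exec(source.trim());
  if (!match) return { ok: false, error: `cannot parse formula: ${source}` };
  const expected = Number(match[1]);
  return { ok: true, check: (response) => response.statusCode === expected };
}

interface Violation {
  route: string;
  reason: string;
}

class FailSafeValidator {
  private readonly compiled = new Map<string, Compiled>();
  readonly violations: Violation[] = [];

  // Called once at startup, so parse errors surface at boot (or in CI)
  // instead of on the first request to the route.
  register(route: string, formula: string): void {
    const result = compileFormula(formula);
    this.compiled.set(route, result);
    if (!result.ok) this.violations.push({ route, reason: result.error });
  }

  // Called per response. Records violations and never throws, so a bad
  // annotation cannot turn into a 500 for the caller.
  validate(route: string, response: ResponseInfo): void {
    const entry = this.compiled.get(route);
    if (!entry || !entry.ok) return; // unknown or unparseable: skip
    try {
      if (!entry.check(response)) {
        this.violations.push({
          route,
          reason: `contract violated with status ${response.statusCode}`,
        });
      }
    } catch (err) {
      this.violations.push({ route, reason: `evaluation error: ${String(err)}` });
    }
  }
}
```

In a real deployment the violations array would be replaced by structured logs or telemetry events, which is where the feedback loop with observability would come from.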


7. Plugin Contract System: B (Does it help or hurt in production?)

What's good:

  • Enables cross-cutting concerns (auth, CORS, rate limiting) to declare contracts
  • Built-in contracts for common Fastify plugins (src/domain/plugin-contracts.ts:176-212)
  • Pattern matching for route applicability (/api/** matches /api/users)

What's concerning:

  • 220 lines for registry + composition, adds cognitive load
  • No phase-aware testing (can't actually test onRequest vs onSend separately)
  • console.warn for missing extensions — noisy in production
  • No way to validate that plugins actually implement the hooks they claim

What this means: Plugin contracts are a good idea for large codebases with many plugins. But the implementation is complex for v1.1, and the value isn't fully realized without phase-aware testing.


Tweet Thread

1/ I just spent a day with APOPHIS, a contract-driven testing framework for Fastify. 
   It's ambitious. It's also broken in ways that matter for production systems.

2/ The good: Schema-embedded contracts with property-based test generation.
   Fast-check arbitraries from JSON Schema. Stateful sequences. Incremental caching.
   This is solid engineering.

3/ The bad: Chaos mode has critical bugs.
   - Two-level probability: 0.5 * 0.5 = 0.25 actual failure rate
   - Date.now()-seeded RNG fallback in corruption breaks determinism
   - No per-route granularity (health checks get chaos too)
   - No resilience verification (checks injection, not recovery)

4/ The ugly: Runtime hooks can crash production.
   A typo in an x-ensures annotation throws 500 errors in 'error' mode.
   Formula parse errors happen on the request hot path.
   This is a safety hazard.

5/ The missing: Zero observability integration.
   No OpenTelemetry. No trace correlation. No metrics on contract coverage.
   When a contract fails in CI, you can't trace it to production.
   When production breaks, you can't check if APOPHIS would have caught it.

6/ The verdict: APOPHIS is a promising research project that needs hardening.
   Fix chaos determinism. Make runtime hooks fail-safe. Add OTel integration.
   Until then: use it for contract testing in CI, NOT for runtime validation in prod.

7/ The lesson: Contract testing and observability are complements, not substitutes.
   Contracts tell you "did we build it right?" 
   Observability tells you "what's actually happening?"
   You need both, connected by feedback loops.

8/ If you're evaluating APOPHIS:
   - Start with contract() in CI, skip runtime validation
   - Skip chaos mode until RNG bugs are fixed
   - Build your own observability integration
   - Wait for v2.0 before production runtime use

Code References

| Issue | File | Lines |
| --- | --- | --- |
| Chaos probability bug | src/quality/chaos.ts | 55, 82 |
| Corruption RNG fallback | src/quality/corruption.ts | 165 |
| Runtime hook crash risk | src/infrastructure/hook-validator.ts | 89-93, 101 |
| Category inference naive | src/domain/category.ts | 10-48 |
| Extension system complexity | src/extension/registry.ts | 1-475 |
| Parser unmaintainable | src/formula/parser.ts | 1-915 |
| No OTel integration | src/infrastructure/logger.ts | 11-15 |
| Env guard throws at runtime | src/quality/env-guard.ts | 8-14 |

Final Verdict

Would I recommend APOPHIS for production? Not in its current form.

Blockers:

  1. Fix chaos mode determinism (use seeded RNG everywhere, flatten probability model)
  2. Make runtime hooks fail-safe (never crash production for contract violations)
  3. Add OpenTelemetry integration for trace correlation
  4. Simplify extension system or provide higher-level APIs
  5. Fix APOSTL to support arithmetic and common string operations

When it might work:

  • Small APIs with simple CRUD operations
  • Teams already using Fastify and comfortable with schema-driven development
  • Projects where property-based testing provides high value
  • When used WITHOUT runtime validation in production (only in CI)

The framework needs a v2.0 that either:

  • Simplifies dramatically (drop chaos, drop extensions, focus on core contract testing)
  • OR invests heavily in safety guarantees, observability integration, and deterministic chaos

As it stands, APOPHIS is a promising research project that teaches us a lot about the boundary between testing and observability — but it doesn't safely cross that boundary yet.


Assessment by Charity Majors, co-founder Honeycomb.io
Date: 2026-04-25
Framework: apophis-fastify v1.1.0