Arbiter → Apophis Feedback Report

Date: 2026-04-27
Reporter: Arbiter Engineering Team
Context: Integration of Apophis v2.2 into Arbiter Platform for behavioral contract testing


Executive Summary

Apophis provides genuinely valuable capabilities for behavioral contract testing that go beyond traditional unit/integration tests. The schema-to-contract inference, cross-operation verification, and chaos testing infrastructure are compelling. However, we encountered 3 bugs in core infrastructure and several design friction points that should be addressed for wider adoption.

Overall Assessment: Strong value proposition for teams willing to invest in schema-driven testing. Needs polish on edge cases and configurability.


Part 1: How Chaos Injection Would Help Arbiter

Current State

Arbiter is a multi-tenant SaaS platform with:

  • 500+ API endpoints across 15 route families
  • Billing, graph storage, auth, sessions, webhooks, etc.
  • Mock Stripe integration for payment processing
  • In-memory and persistent storage backends
  • Complex middleware chain: auth → tenant boundary → permissions → preflight → handler

Where Chaos Testing Adds Value

1. Middleware Resilience Verification

Our middleware chain has implicit dependencies:

Transport → AuthN → Scope → AuthZ → Challenge → Preflight → Handler

Chaos testing would verify:

  • What happens when preflight() times out? Does the handler still execute?
  • If auth middleware fails with 503, do we get proper retry headers?
  • Does a slow tenant boundary check cascade to response timeouts?

Concrete scenario: If the billing preflight gate (budget check) is slow, does the subscription creation handler wait or fail? Our contracts say response_time < 2000ms — chaos would tell us if that's actually enforced.
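
The question of whether the handler still runs after a slow preflight can be made concrete with a small model of the chain. This is a minimal sketch, not Arbiter's middleware: `withTimeout`, `runChain`, the `STAGE_TIMEOUT` code, and the 503 / retry-after response shape are all assumptions made for illustration.

```javascript
// Hypothetical helper: reject if `promise` does not settle within `ms`.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`${label} timed out after ${ms}ms`);
      err.code = 'STAGE_TIMEOUT';
      reject(err);
    }, ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical two-stage chain: the handler only runs once preflight has passed.
async function runChain({ preflight, handler, budgetMs }) {
  try {
    await withTimeout(preflight(), budgetMs, 'preflight');
  } catch (err) {
    // Only timeouts become 503; other preflight errors propagate unchanged.
    if (err.code !== 'STAGE_TIMEOUT') throw err;
    return { statusCode: 503, headers: { 'retry-after': '1' }, body: { error: err.message } };
  }
  return handler();
}
```

A chaos run that delays preflight beyond `budgetMs` should then observe a 503 and no handler side effects, which is the property the response_time contract is meant to protect.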

2. Mock Service Degradation

We use MockStripeService for payment processing. In production, Stripe can:

  • Return 429 (rate limit)
  • Time out on paymentIntents.create
  • Return network errors

Chaos testing would inject:

if chaos:stripe-timeout then response_code == 503
if chaos:stripe-rate-limit then retry-after header != null

This validates our fallback logic — currently untested because mocks always succeed.

3. Resource Leak Detection

Our BillingApplicationService uses in-memory Maps. Chaos scenarios:

  • Create 1000 plans, delete 500, verify GET on deleted returns 404
  • Cancel subscriptions mid-renewal cycle
  • Concurrent PATCH operations on same plan

Cross-operation contracts catch these leaks when requests run one at a time; chaos testing would additionally exercise state corruption under concurrent requests.
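
To illustrate the concurrent PATCH case, here is a minimal sketch of a lost update against an in-memory Map. The store layout, the `version` field, and the function names are hypothetical and do not reflect BillingApplicationService internals.

```javascript
// In-memory store, standing in for a service backed by a Map.
const plans = new Map([['plan_1', { name: 'Basic', price: 10, version: 1 }]]);

// Yield to the event loop, standing in for async validation or I/O.
const tick = () => new Promise((resolve) => setImmediate(resolve));

// Naive read-modify-write: the await between read and write lets calls interleave.
async function patchPlan(id, changes) {
  const current = plans.get(id);
  await tick();
  plans.set(id, { ...current, ...changes, version: current.version + 1 });
}

// Two concurrent PATCHes on the same plan; both read version 1.
async function demoLostUpdate() {
  await Promise.all([
    patchPlan('plan_1', { name: 'Pro' }),
    patchPlan('plan_1', { price: 20 }),
  ]);
  return plans.get('plan_1');
}
```

After both calls settle, the plan is at version 2 rather than 3 and one of the two changes has been overwritten. A sequential cross-operation check would not see this, because each PATCH followed by a GET looks correct in isolation.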

4. Entitlement Boundary Testing

We have credit-based preflight gates. Chaos could:

  • Exhaust credits mid-test
  • Verify 402 (Payment Required) is returned
  • Ensure no partial mutations occur when budget is depleted

This is business-critical: we cannot bill customers for operations that fail.
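
The "no partial mutations" property can be sketched as a gate that checks the budget before touching any state. The names `createGatedHandler`, `account.credits`, and the `insufficient_credits` error code are assumptions for illustration only.

```javascript
// Hypothetical credit gate: check first, mutate only when the check passes.
function createGatedHandler(account, store) {
  return function createPlan(plan, cost) {
    if (account.credits < cost) {
      // Rejected before any write, so neither credits nor store change.
      return { statusCode: 402, body: { error: 'insufficient_credits' } };
    }
    account.credits -= cost;
    store.set(plan.id, plan);
    return { statusCode: 201, body: plan };
  };
}
```

A chaos scenario that drains credits mid-test would assert that every 402 leaves both the credit balance and the store exactly as they were before the request.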

5. Auth Token Expiry

JWT tokens expire. Chaos could:

  • Expire tokens between POST and follow-up GET
  • Verify 401 with proper WWW-Authenticate header
  • Test refresh token flow under load

Proposed Chaos Scenarios for Arbiter

billing_chaos:
  - name: stripe-timeout
    target: POST /billing/invoices/:id/pay
    inject: { stripe_delay_ms: 5000 }
    expected: { status: 503, retry_after: "> 0" }
  
  - name: storage-corruption
    target: DELETE /billing/plans/:id
    inject: { skip_deletion: true }
    expected: { status: 200, follow_up_get: 404 }
  
  - name: rate-limit
    target: POST /billing/plans
    inject: { rate_limit: 10 }
    expected: { status: 429, retry_after: "> 0" }
  
  - name: auth-expiry
    target: PATCH /billing/plans/:id
    inject: { expire_token_after_ms: 100 }
    expected: { status: 401, www_authenticate: "Bearer" }

Part 2: Bugs Found

Bug 1: Scope Registry Ignores Configured Default Scope

Severity: High (breaks auth in cross-operation tests)
File: dist/infrastructure/scope-registry.js
Lines: 60, 76-77

Problem:

const scope = scopeName !== null ? this.scopes.get(scopeName) : undefined;
const base = scope ?? this.defaultScope;  // Always uses empty DEFAULT_SCOPE

When getHeaders(null) is called, it uses this.defaultScope which is initialized to { headers: {}, metadata: {} } on line 60, ignoring any "default" scope passed in the constructor.

Impact: Cross-operation requests (e.g., response_code(GET /users/{id})) don't inherit auth headers from the configured scope, causing 401 failures on protected routes.

Fix:

const base = scope ?? this.scopes.get('default') ?? this.defaultScope;

Reproduction:

await app.register(apophis, {
  scopes: {
    default: { headers: { 'authorization': 'Bearer token' } }
  }
});
// Cross-operation GET /users/123 gets 401 because auth header is not passed
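
The intended fallback order can be shown with a simplified stand-in for the registry. This is a sketch written for this report, not the Apophis source; only the `scopes` map, `defaultScope`, and `getHeaders(null)` names are taken from the snippets above, and the constructor shape is assumed.

```javascript
// Simplified model of the scope registry (assumed shape, not the real class).
class ScopeRegistry {
  constructor(scopes = {}) {
    this.scopes = new Map(Object.entries(scopes));
    this.defaultScope = { headers: {}, metadata: {} };
  }

  getHeaders(scopeName) {
    const scope = scopeName !== null ? this.scopes.get(scopeName) : undefined;
    // Prefer the configured "default" scope; use the empty one only as a last resort.
    const base = scope ?? this.scopes.get('default') ?? this.defaultScope;
    return { ...base.headers };
  }
}
```

With this order, `getHeaders(null)` returns the configured authorization header, and a registry created without any scopes still returns an empty header object.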

Bug 2: Contract Builder Drops Routes Option

Severity: High (route filtering doesn't work)
File: dist/plugin/contract-builder.js
Lines: 8-15

Problem:

const config = {
    depth: opts.depth ?? 'standard',
    scope: opts.scope,
    seed: opts.seed,
    timeout: opts.timeout,
    chaos: opts.chaos,
    // Missing: routes: opts.routes
};

The routes option is documented but never passed to runPetitTests, causing all routes to be tested regardless of the routes filter.

Impact: Tests run against all 500+ routes instead of the 4 specified, which makes failures hard to isolate and inflates CI times.

Fix:

const config = {
    depth: opts.depth ?? 'standard',
    scope: opts.scope,
    seed: opts.seed,
    timeout: opts.timeout,
    chaos: opts.chaos,
    routes: opts.routes,  // Add this
};

Reproduction:

await app.apophis.contract({
  routes: ['POST /billing/plans']  // Tests ALL routes instead
});
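
For reference, the filtering we expected once `routes` reaches the runner looks roughly like the sketch below. `filterRoutes` is a hypothetical helper, and the route objects are assumed to expose `method` and `path`; the actual matching inside Apophis may differ.

```javascript
// Hypothetical filter: keep only routes named as "METHOD /path" in `routes`.
function filterRoutes(allRoutes, routes) {
  // No filter configured: test everything, as today.
  if (!Array.isArray(routes) || routes.length === 0) return allRoutes;
  // Normalize spacing so "POST  /billing/plans" still matches.
  const wanted = new Set(routes.map((r) => r.trim().replace(/\s+/, ' ')));
  return allRoutes.filter((route) => wanted.has(`${route.method} ${route.path}`));
}
```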

Bug 3: Invariant Checking Not Configurable

Severity: Medium (false failures for non-hierarchical APIs)
File: dist/test/petit-runner.js
Lines: 386-398

Problem: Built-in invariants (no-orphaned-resources, parent-reference-integrity, resource-integrity) run unconditionally for all routes. These assume parent-child resource hierarchies (e.g., /workspaces/:id/projects/:id).

Impact: For flat resource models (like our billing plans), routes with x-category: 'constructor' trigger invariant failures because resources don't have parentType/parentId.

Workaround: We set x-category: 'observer' to avoid resource tracking, but this loses the semantic meaning of the route.

Suggested Fix:

// In config
invariants: ['resource-integrity']  // Opt-in per test
// Or
invariants: false  // Disable all
// Or per-route
schema: {
  'x-invariants': ['custom-only']
}
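
One possible way to resolve these options is sketched below. The precedence (route-level `x-invariants` overrides the test config, and an unset value keeps today's run-everything behavior) is our assumption, and `resolveInvariants` is a hypothetical name.

```javascript
const BUILT_IN_INVARIANTS = [
  'no-orphaned-resources',
  'parent-reference-integrity',
  'resource-integrity',
];

// Decide which built-in invariants run for one route.
function resolveInvariants(configValue, routeSchema = {}) {
  // Assumed precedence: a route-level setting wins over the test config.
  const selected = routeSchema['x-invariants'] ?? configValue;
  if (selected === false) return [];                       // disable all
  if (selected === undefined) return [...BUILT_IN_INVARIANTS]; // current default
  // Opt-in list: unknown names such as 'custom-only' select no built-ins.
  return selected.filter((name) => BUILT_IN_INVARIANTS.includes(name));
}
```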

Part 3: Design Feedback

1. Schema Inference is Too Aggressive

Issue: const values in JSON Schema generate unconditional contracts.

Example:

{
  "response": {
    "200": {
      "properties": {
        "fragment_type": { "const": "Action" }
      }
    }
  }
}

Generates: response_body(this).fragment_type == "Action" (checked for ALL responses)

This fails when the route returns 404 with fragment_type: "Error".

Suggestion: Infer conditional contracts based on status code:

if status:200 then response_body(this).fragment_type == "Action" else true

Or add an option to disable inference: inferContracts: false.

2. Cross-Operation Headers Not Documented

The scope.headers behavior for cross-operation requests is not documented. We had to read source code to discover that:

  • createOperationResolver(fastify, request.headers) passes request headers
  • But request.headers comes from scope.getHeaders(null)
  • Which had bug #1 above

Suggestion: Document that cross-operation requests inherit the scope headers of the original request.

3. Missing 400 Response Handling

When Fastify schema validation fails (e.g., enum mismatch), it returns 400 with a validation error object. Apophis treats this as a contract failure unless:

  • The schema has a 400 response documented
  • The contract explicitly accepts 400

Most developers won't document 400 responses. Apophis should either:

  • Auto-generate 400 contracts from validation rules
  • Or provide a global 400 handler pattern

4. HEAD Routes Cause Noise

Fastify auto-generates HEAD routes for every GET. These have no response body, causing response_body(this).id != null failures.

Suggestion: Auto-skip HEAD routes in contract tests, or provide skipMethods: ['HEAD'] option.

5. Error Suggestions Need Context

When a contract fails, the error is:

Field 'fragment_type' does not match expected value 'Error'.

But it doesn't say:

  • What the actual status code was
  • What the actual response body was
  • Which route generated the request

Suggestion: Include actual vs expected in violation objects.
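
As a rough illustration of what such a violation object could carry, here is a minimal sketch. The field names and the `buildViolation` helper are hypothetical, not an existing Apophis structure.

```javascript
// Hypothetical violation record with enough context to debug without re-running.
function buildViolation({ route, formula, expected, actual, response }) {
  const where = `${route.method} ${route.path}`;
  return {
    route: where,
    formula,
    expected,
    actual,
    statusCode: response.statusCode,
    responseBody: response.body,
    message: `${where} (${response.statusCode}): expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`,
  };
}
```

For the fragment_type case above, the message would name the route, show the 404 status, and state that "Action" was expected while "Error" was returned.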


Part 4: What We Love

1. Cross-Operation Contracts

if status:201 then response_code(GET /billing/plans/{response_body(this).data.plan_id}) == 200 else true

This is genuinely hard to test manually. Apophis makes it declarative and automatic.

2. Property-Based Generation

Fast-check found edge cases we missed:

  • Empty string name (schema allowed it, service rejected it)
  • Invalid billing_interval values
  • Missing required fields

3. Schema as Single Source of Truth

Once schemas are correct, contracts are free. The x-ensures array supplements rather than replaces schema validation.

4. Fast Feedback Loop

Contract tests run in ~1.5s for 4 routes. Much faster than spinning up a full test environment.


Part 5: Feature Requests

1. Hypermedia Contract Support

Arbiter returns LDF (Linked Data Fragment) responses with controls and actions. We'd love to verify:

if status:200 then response_body(this).controls.self == request_url(this) else true
if status:200 then response_body(this).actions.create.method == "POST" else true
if status:200 then response_body(this).actions.update.target == "/billing/plans/{response_body(this).data.id}" else true

Currently we have to write these manually. Could Apophis infer hypermedia controls from route registration?

2. Conditional Schema Contracts

Instead of removing const from schemas, allow:

{
  "response": {
    "200": {
      "properties": {
        "fragment_type": { "const": "Action", "x-apophis-conditional": "status:200" }
      }
    }
  }
}

This preserves schema expressiveness while generating correct contracts.

3. Middleware Contract Verification

Our middleware chain is critical. We'd like to verify:

if request_headers(this).authorization == null then status:401 else true
if request_headers(this).x-tenant-id == null then status:400 else true

Apophis already supports request_headers — making this a first-class feature (e.g., x-requires) would be powerful.

4. State Cleanup Hooks

After destructive tests (DELETE), we need to clean up:

await app.apophis.contract({
  routes: ['DELETE /billing/plans/:id'],
  cleanup: async (state) => {
    // Remove created plans from database
    await db.plans.deleteMany({ id: { $in: state.createdPlans } });
  }
});

This would enable stateful testing without polluting the test environment.

5. Contract Coverage Report

After running tests, we'd like:

Contract Coverage:
  POST /billing/plans:
    - 201 response: ✓ tested (42 cases)
    - 400 response: ✓ tested (8 cases)
    - 503 response: ✗ not tested
    - Cross-op GET: ✓ tested (42 cases)

This helps identify gaps in contract coverage.


Conclusion

Apophis is a powerful tool that fills a gap in API testing — behavioral contracts and chaos testing. The core concepts are solid, but the implementation needs hardening for production use:

Must-fix: Bugs #1 and #2 (scope registry, route filtering)
Should-fix: Bug #3 (configurable invariants), inference aggressiveness
Nice-to-have: Hypermedia support, middleware contracts, coverage reports

We're committed to using Apophis for Arbiter's contract testing and will contribute fixes upstream. The value of cross-operation verification alone justifies the investment.


Contact: Arbiter Engineering Team
Repository: https://github.com/anomalyco/apophis (we'll open issues for each bug)

Critical Feedback: Why Current Chaos Injection is Insufficient for Production APIs

To: Apophis Engineering Team
From: Arbiter Platform Engineering
Date: 2026-04-27
Context: Production SaaS platform with 500+ endpoints, Stripe integration, complex middleware chains


The Core Problem

Current chaos injection operates exclusively at the HTTP transport layer (executeHttp() wrapper). This tests:

  • Response schemas under forced errors
  • Timeout contracts with artificial delays
  • Response validation with corrupted bodies

But production APIs fail at the dependency layer, not the transport layer:

  • Stripe API returns 429 rate limit
  • Database connection pool exhausted
  • Redis cache timeout
  • Third-party webhook delivery fails
  • Message queue backlog

Current chaos cannot simulate these. It can force a 503 response, but it cannot simulate "Stripe returned 429, so we need to propagate retry-after header" because the handler never sees the Stripe error.


Specific Pain Points

1. Error Injection is Backwards

Current behavior:

Handler runs → creates side effects → response overridden to 503

What we need:

Handler runs → Stripe call fails with 429 → handler catches error → returns 503 with retry-after

The current approach tests "what does our 503 response look like" but not "does our handler correctly handle Stripe errors." These are different:

  • Current: Tests schema compliance for hardcoded error responses
  • Needed: Tests business logic for dependency failures

Impact: We have 503 contracts that pass, but our handler might not actually set the retry-after header when Stripe fails. The contract gives false confidence.

2. Chaos Events Are Invisible

When chaos injects, the test result shows:

POST /billing/plans (#1): FAIL
  Error: Contract violation: if status:503 then response_body(this).data.error != null else true

But there's no indication that:

  • Chaos was the cause (not a real bug)
  • What type of chaos was injected (error? corruption? delay?)
  • What the original response was before override

Impact: Debugging chaos failures is impossible. We can't tell if our contract is wrong or if chaos mutated the response unexpectedly.

3. Resilience Verification is Dangerous for Stateful APIs

When resilience: { enabled: true } is set, Apophis retries the same request up to maxRetries times.

For POST /billing/plans:

  • Attempt 1: Creates plan A → gets 503 → retries
  • Attempt 2: Creates plan B → gets 503 → retries
  • Attempt 3: Creates plan C → gets 503 → retries
  • Attempt 4: Creates plan D → succeeds

Result: 4 plans created, 1 expected. This pollutes state and makes follow-up tests (GET, PATCH, DELETE) behave unpredictably.

Impact: Can't use resilience testing on stateful routes without idempotency. Most real APIs are stateful.

4. Dropout Returns Status Code 0

Network failures in production don't return status code 0. They:

  • Time out (status undefined, error "ETIMEDOUT")
  • Reset connection (error "ECONNRESET")
  • Return 503 from load balancer

Status 0 is a browser-specific artifact. Node.js HTTP clients don't produce status 0.

Impact: Contracts can't match status 0. We have to either:

  • Add status:0 to all contracts (meaningless)
  • Or ignore dropout failures (makes dropout useless)

What Would Make Chaos Useful for Arbiter

Option A: Outbound Request Contracts (Preferred)

Apophis intercepts outbound HTTP requests from the handler:

// In Apophis config
chaos: {
  outbound: {
    'api.stripe.com': {
      delay: { probability: 0.1, minMs: 1000, maxMs: 5000 },
      error: { 
        probability: 0.05, 
        responses: [
          { statusCode: 429, headers: { 'retry-after': '60' } },
          { statusCode: 503, body: { error: 'stripe_unavailable' } }
        ]
      }
    }
  }
}

Benefits:

  • Handler sees real dependency failures
  • Tests actual error handling logic
  • Side effects only occur when handler succeeds
  • No state pollution from retries

Option B: Service Method Wrapping

Apophis wraps methods on decorated services:

// Fastify decorator
app.decorate('stripe', new StripeService());

// Apophis wraps it
apophis.chaos.wrap(app.stripe, {
  'paymentIntents.create': {
    delay: { probability: 0.1, ms: 5000 },
    error: { probability: 0.05, throws: new StripeTimeoutError() }
  }
});

Benefits:

  • Works with any service pattern (HTTP, DB, queue)
  • Tests business logic directly
  • Minimal changes to existing code
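
A minimal sketch of how such wrapping might work is shown below. `wrapWithChaos` is a hypothetical stand-in for the proposed apophis.chaos.wrap, and the injectable `random` parameter is our addition so that runs can be made deterministic from a seed.

```javascript
// Hypothetical wrapper: replaces configured methods with fault-injecting versions.
function wrapWithChaos(service, config, random = Math.random) {
  for (const [path, rule] of Object.entries(config)) {
    // Resolve dotted paths such as 'paymentIntents.create'.
    const keys = path.split('.');
    const method = keys.pop();
    const owner = keys.reduce((obj, key) => obj[key], service);
    const original = owner[method].bind(owner);

    owner[method] = async (...args) => {
      if (rule.delay && random() < rule.delay.probability) {
        await new Promise((resolve) => setTimeout(resolve, rule.delay.ms));
      }
      if (rule.error && random() < rule.error.probability) {
        // The handler sees a thrown dependency error, not an overridden response.
        throw rule.error.throws;
      }
      return original(...args);
    };
  }
  return service;
}
```

Because the error is thrown inside the service call, the handler's own catch logic runs, which is the behavior the HTTP-layer override cannot exercise.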

Option C: Event-Driven Chaos

For async architectures:

chaos: {
  events: {
    'webhook.received': {
      drop: { probability: 0.1 },  // Simulate webhook loss
      delay: { probability: 0.2, ms: 30000 }  // Simulate queue delay
    }
  }
}

P0 (Critical): Fix Event Reporting

Every chaos injection should be visible:

// In test results
test.diagnostics.chaos = {
  injected: true,
  type: 'error',
  details: {
    statusCode: 503,
    originalStatusCode: 201,
    strategy: 'override'
  }
}

Without this, chaos failures are indistinguishable from real bugs.

P1 (High): Add Dependency-Aware Chaos

Implement outbound request interception or service wrapping. Current HTTP-layer chaos is too superficial for production APIs.

P2 (Medium): Fix Dropout Semantics

Return proper status codes:

  • 504 Gateway Timeout for timeouts
  • 503 Service Unavailable for network failures
  • Or make it configurable: dropout: { statusCode: 503 }
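
One way to express this mapping is sketched below. The table and the `dropoutToResponse` helper are hypothetical; the error codes are the standard Node.js system error codes, and the choice of 503 as the fallback is an assumption.

```javascript
// Hypothetical mapping from Node.js network error codes to gateway-style statuses.
const DROPOUT_STATUS = {
  ETIMEDOUT: 504,    // upstream did not answer in time
  ECONNRESET: 503,   // connection dropped mid-request
  ECONNREFUSED: 503, // nothing listening on the target port
};

// Turn a simulated network failure into a response that contracts can match.
function dropoutToResponse(err, fallbackStatus = 503) {
  const statusCode = DROPOUT_STATUS[err.code] ?? fallbackStatus;
  return { statusCode, body: { error: 'network_failure', code: err.code ?? 'UNKNOWN' } };
}
```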

P3 (Low): Stateful Retry Safety

Either:

  • Make retries use unique IDs (prevent duplicate creation)
  • Or document that resilience requires idempotent handlers
  • Or skip resilience for non-idempotent routes
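
The first option can be sketched with an idempotency key that stays constant across retries of the same logical request. The store below is a hypothetical in-memory model; how Apophis would generate and send the key (for example as a request header) is left open.

```javascript
// Hypothetical plan store that deduplicates creations by idempotency key.
function createPlanStore() {
  const plans = new Map();
  const seenKeys = new Map(); // idempotency key -> plan id
  let nextId = 1;

  return {
    plans,
    create(idempotencyKey, data) {
      // A retry with a known key returns the plan created the first time.
      if (seenKeys.has(idempotencyKey)) return plans.get(seenKeys.get(idempotencyKey));
      const plan = { id: `plan_${nextId++}`, ...data };
      plans.set(plan.id, plan);
      seenKeys.set(idempotencyKey, plan.id);
      return plan;
    },
  };
}
```

With this in place, four attempts carrying the same key leave exactly one plan in the store, so follow-up GET, PATCH, and DELETE tests see the state they expect.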

What We're Doing Instead

Since current chaos doesn't serve our needs, we're writing application-layer failure tests:

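// Assumes Node's built-in test runner; `app` and `payInvoice` come from
// the Arbiter test harness (setup not shown).
import test from 'node:test';
import assert from 'node:assert';

test('Stripe rate limit handling', async () => {
  // Mock Stripe to return 429, keeping the original so it can be restored
  const originalCreate = app.stripe.paymentIntents.create;
  app.stripe.paymentIntents.create = async () => {
    const err = new Error('Rate limit exceeded');
    err.statusCode = 429;
    err.headers = { 'retry-after': '60' };
    throw err;
  };

  try {
    const res = await payInvoice({ invoiceId: 'test' });

    // The handler should surface the rate limit and propagate retry-after
    assert.strictEqual(res.statusCode, 429);
    assert.strictEqual(res.json().data.error, 'stripe_rate_limit');
    assert.strictEqual(res.headers['retry-after'], '60');
  } finally {
    // Restore the real method so later tests are unaffected
    app.stripe.paymentIntents.create = originalCreate;
  }
});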

This tests what we actually need: handler behavior when dependencies fail.


Conclusion

Apophis chaos is a good start for HTTP-layer resilience testing, but it's insufficient for production APIs with external dependencies. The framework needs to evolve from "HTTP response mutator" to "dependency failure simulator" to be truly valuable.

We want Apophis to succeed. The schema-driven contract approach is innovative and valuable. But chaos testing needs to be dependency-aware to be useful for real-world APIs.

Happy to collaborate on designing the outbound interception API or service wrapping approach.


Appendix: Concrete Proposals for Apophis Improvements

Proposal 1: Conditional Schema Inference

Instead of removing const from schemas, generate conditional contracts:

// Current behavior (WRONG):
// Schema: { properties: { fragment_type: { const: "Action" } } }
// Generates: response_body(this).fragment_type == "Action"  // Applies to ALL responses

// Proposed behavior:
// Generates: if status:200 then response_body(this).fragment_type == "Action" else true

Implementation:

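function inferContractsFromResponseSchema(responseSchema, statusCode) {
  const formulas = [];
  // ... existing inference logic ...
  
  // Scope each formula to the status code its schema documents, so a const
  // under "200" is never checked against a 404 response (and vice versa).
  // Non-numeric keys such as "default" are left unconditional.
  if (/^\d{3}$/.test(String(statusCode))) {
    return formulas.map(f => `if status:${statusCode} then ${f} else true`);
  }
  return formulas;
}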

Proposal 2: Configurable Invariants

// In test config
const result = await app.apophis.contract({
  invariants: ['resource-integrity'],  // Opt-in specific invariants
  // Or
  invariants: false,  // Disable all
});

// Or per-route in schema
schema: {
  'x-invariants': ['resource-integrity'],
  'x-invariants-exclude': ['no-orphaned-resources']
}

Proposal 3: Outbound Request Interception

// Apophis provides fetch/http client wrapper
const stripeClient = apophis.createChaosAwareClient({
  name: 'stripe',
  baseURL: 'https://api.stripe.com',
  defaults: {
    headers: { 'Authorization': `Bearer ${process.env.STRIPE_KEY}` }
  }
});

// In chaos config
chaos: {
  outbound: {
    'stripe': {
      delay: { probability: 0.1, minMs: 1000, maxMs: 5000 },
      error: {
        probability: 0.05,
        responses: [
          { statusCode: 429, headers: { 'retry-after': '60' } },
          { statusCode: 503, body: { error: 'stripe_unavailable' } }
        ]
      }
    }
  }
}

Implementation approach:

  • Monkey-patch fetch or http.request at module level
  • Track outbound requests by hostname
  • Match against chaos config
  • Inject delays/errors before request reaches network
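
The steps above can be sketched as a wrapper around a fetch-compatible function. This is an illustration under stated assumptions, not an existing Apophis API: `createChaosFetch` and the injectable `random` parameter are hypothetical, rules are keyed by hostname as in Option A, and the global `Response` class requires Node.js 18 or later.

```javascript
// Sketch only: wraps a fetch-compatible function with hostname-based chaos rules.
function createChaosFetch(baseFetch, outbound, random = Math.random) {
  return async function chaosFetch(input, init) {
    // Accept a URL string, a URL object, or a Request.
    const { hostname } = new URL(input.url ?? String(input));
    const rule = outbound[hostname];
    if (!rule) return baseFetch(input, init);

    if (rule.delay && random() < rule.delay.probability) {
      const { minMs, maxMs } = rule.delay;
      await new Promise((resolve) => setTimeout(resolve, minMs + random() * (maxMs - minMs)));
    }
    if (rule.error && random() < rule.error.probability) {
      const { responses } = rule.error;
      const pick = responses[Math.floor(random() * responses.length)];
      // Answer locally, before the request reaches the network.
      return new Response(JSON.stringify(pick.body ?? {}), {
        status: pick.statusCode,
        headers: pick.headers,
      });
    }
    return baseFetch(input, init);
  };
}
```

Installing it could be as simple as `globalThis.fetch = createChaosFetch(globalThis.fetch, chaos.outbound)` during test setup, although clients built on http.request would need a separate hook.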

Proposal 4: Service Method Wrapping

// After Fastify ready
app.addHook('onReady', () => {
  apophis.chaos.wrap(app.billingService, {
    'createPricingPlan': {
      delay: { probability: 0.1, ms: 100 },
      error: { 
        probability: 0.05, 
        throws: new ServiceUnavailableError('stripe_timeout')
      }
    }
  });
});

Proposal 5: Chaos Event Reporting

// In petit-runner, after chaos execution
const chaosEvents = result.events || [];
for (const event of chaosEvents) {
  results.push({
    ok: true,  // Chaos events are informational, not failures
    name: `${route.method} ${route.path} (chaos: ${event.type})`,
    diagnostics: {
      chaos: {
        injected: true,
        type: event.type,
        details: event.details
      }
    }
  });
}

Proposal 6: Dropout Semantics

// Configurable dropout behavior
chaos: {
  dropout: {
    probability: 0.1,
    statusCode: 503,  // Default: 503 instead of 0
    body: { error: 'network_failure' }
  }
}

Proposal 7: Hypermedia Contract Support

// New APOSTL operation headers
response_body(this).controls.self == request_url(this)
response_body(this).actions.update.method == "PATCH"
response_body(this).actions.update.target == "/billing/plans/{response_body(this).data.id}"

Or schema annotation:

{
  "x-apophis-hypermedia": {
    "controls": ["self", "next", "prev"],
    "actions": ["create", "update", "delete"]
  }
}