# APOPHIS Assessment: Arbiter Integration Readiness

## Executive Summary

APOPHIS is a contract-driven API testing plugin for Fastify. This document assesses its readiness for integration with the Arbiter repository (~11,389 routes, multi-tenant authorization server).

## What Is In Place

### Core Infrastructure (100% Complete)

- **Route Discovery**: Extracts contracts from Fastify route schemas via `discoverRoutes()`
- **Category Inference**: Auto-categorizes routes as constructor/mutator/observer/utility
- **Contract Extraction**: Parses `x-requires`, `x-ensures`, `x-invariants`, `x-regex`, `x-category`
- **Formula Parser**: Full APOSTL grammar with charCodeAt optimization (94% faster)
- **Formula Evaluator**: Pure function with type coercion, regex matching, quantifiers
- **Hook Validator**: Runtime precondition/postcondition validation via preHandler/onResponse
- **Scope Registry**: Auto-discovers from `APOPHIS_SCOPE_*` env vars
- **Cleanup Manager**: LIFO deletion with callback-based batching
- **TAP Formatter**: CI/CD compatible test output

### Test Framework (80% Complete)

- **PETIT Runner**: Property-based test execution with fast-check arbitraries
- **Schema-to-Arbitrary**: JSON Schema -> fast-check conversion (strings, integers, objects, arrays, enums, formats)
- **Incremental Cache**: SHA-256 schema hashing with file-based persistence (13-20x speedup)
- **Model State Tracking**: Basic resource tracking for constructor routes

### Performance (Complete)

- Route discovery: ~0.5µs/route
- Formula parsing: ~5µs/formula
- Category inference: ~15ns/route
- Contract extraction: 58% faster with WeakMap cache
- Incremental cache: 13-20x speedup for unchanged routes
- **Estimated 11K route overhead: ~1.4s total**

## What Is NOT In Place

### 1. Stateful Testing (0% - Architecture Only)

**Current State**: `runPetitTests` runs commands sequentially but without true stateful/model-based testing. The state machine only tracks created resources for cleanup.

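
To make the limitation concrete, resource tracking of this kind amounts to little more than a stack. The sketch below is an illustration only, not APOPHIS's actual code: `CreatedResourceTracker`, `record`, and `drainLifo` are hypothetical names, and the only behaviour taken from this document is that created resources are remembered and then deleted in LIFO order.

```javascript
// Hypothetical sketch: names are illustrative, not APOPHIS's API.
// Remembers resources created by constructor routes so they can be deleted
// in reverse creation order (LIFO). No expected state is modelled.
class CreatedResourceTracker {
  constructor() {
    this.created = [];
  }

  // Remember a resource returned by a constructor route.
  record(resourceType, id) {
    this.created.push({ resourceType, id });
  }

  // Return resources newest-first and clear the tracker.
  drainLifo() {
    const order = [...this.created].reverse();
    this.created = [];
    return order;
  }
}

const tracker = new CreatedResourceTracker();
tracker.record('tenant', 'tenant-456');
tracker.record('application', 'app-123');

// The application is scheduled for deletion before the tenant that owns it.
const deletionOrder = tracker.drainLifo().map((r) => `${r.resourceType}:${r.id}`);
console.log(deletionOrder);
```

A structure like this knows what to delete, but it holds no expectation about what a later request should observe, which is the gap between cleanup tracking and model-based testing.
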
**What's Missing**:

- **Command sequence generation**: Fast-check's `commands()` arbitrary for generating valid command sequences
- **Model-based state machine**: Formal model that tracks expected vs actual state
- **Precondition-aware sequencing**: Smart generation that respects `x-requires` dependencies
- **Cross-route state transitions**: Understanding that POST /users creates a resource that GET /users/:id can observe
- **Invariant checking across sequences**: Ensuring state remains consistent after mutations

**Arbiter-Specific Value**: Arbiter has complex multi-tenant state:

- Tenant creation -> Application creation -> User creation -> Permission assignment
- OAuth flows: authorization -> token -> refresh -> revocation
- Graph mutations: node creation -> relation creation -> authorization evaluation

Stateful testing would catch:

- Race conditions in tenant isolation
- Invalid state transitions (e.g., deleting a tenant with active applications)
- Authorization leaks across state changes
- Resource lifecycle violations

**Implementation Effort**: Medium (2-3 days)

- Create `Model` class tracking expected state
- Implement `Command` arbitrary using fast-check's `commands()`
- Add `checkInvariants()` for cross-route consistency
- Implement `shrink()` for minimal failing sequences

### 2. Object Inference from Schemas (40%)

**Current State**: `updateState()` infers resources from response body looking for `id`/`uuid`/`_id` fields. This is naive.

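
As a rough picture of why this is naive, identity-field sniffing could look like the sketch below. It is an assumption about the shape of the logic, not the actual `updateState()` implementation: `inferResourceNaively` and the rule that `resourceType` is the last non-parameter path segment are both hypothetical.

```javascript
// Hypothetical sketch of identity-field sniffing; not the real updateState().
const IDENTITY_FIELDS = ['id', 'uuid', '_id'];

// Guess a resource from a route path and a response body.
// Assumption: resourceType is the last path segment that is not a :param.
function inferResourceNaively(routePath, responseBody) {
  if (responseBody === null || typeof responseBody !== 'object') return null;

  const field = IDENTITY_FIELDS.find((name) => responseBody[name] !== undefined);
  if (field === undefined) return null;

  const segments = routePath.split('/').filter((s) => s && !s.startsWith(':'));
  const resourceType = segments.length > 0 ? segments[segments.length - 1] : 'unknown';

  return { resourceType, id: responseBody[field] };
}

const inferred = inferResourceNaively('/tenant/applications', {
  id: 'app-123',
  tenantId: 'tenant-456',
  name: 'My App',
});
console.log(inferred);
```

In this sketch the `tenantId` field in the response is never consulted, so the parent relationship is lost even though the data needed to recover it is present.
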
**What's Missing**:

- **Schema-driven object extraction**: Using JSON Schema `properties` to know what fields constitute an object identity
- **Relationship inference**: Understanding that `POST /tenants/:id/applications` creates an application scoped to a tenant
- **Nested resource tracking**: Tracking sub-resources (e.g., application configs within tenants)
- **Path parameter correlation**: Linking `POST /users` response `id` to `GET /users/:id` path parameter

**Arbiter Example**:

```javascript
// POST /tenant/applications
// Response: { id: 'app-123', tenantId: 'tenant-456', name: 'My App' }
// Should infer: resourceType='application', parentType='tenant', parentId='tenant-456'
// Current code only captures: resourceType='applications', id='app-123'
// Missing the tenant scoping which is critical for Arbiter's authorization model
```

**Implementation Effort**: Low-Medium (1-2 days)

- Enhance `updateState()` to parse response schema for identity fields
- Add parent-child relationship tracking to `ModelState`
- Implement path parameter extraction for route correlation

### 3. Request Structure Inference (30%)

**Current State**: `executeCommand()` blindly sends all generated params as either body or query params based on HTTP method. No understanding of route-specific parameter structure.

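
The difference between method-based placement and schema-aware placement can be sketched as follows. Both functions are hypothetical illustrations: `placeByMethod` approximates the behaviour described above, and `placeBySchema` is one possible alternative, not a committed design. The `schema.body.properties` and `schema.querystring.properties` layout assumes the usual Fastify route schema shape.

```javascript
// Hypothetical sketch; neither function is APOPHIS's actual executeCommand().

// Approximates the current behaviour: the HTTP method alone decides placement.
function placeByMethod(method, params) {
  const hasBody = ['POST', 'PUT', 'PATCH'].includes(method.toUpperCase());
  return hasBody ? { body: params, query: {} } : { body: undefined, query: params };
}

// One possible alternative: use path placeholders and the route schema
// to decide where each generated value belongs.
function placeBySchema(routePath, schema, params) {
  const bodyProps = Object.keys(schema?.body?.properties ?? {});
  const queryProps = Object.keys(schema?.querystring?.properties ?? {});
  const used = new Set();

  // Substitute :name placeholders with URL-encoded generated values.
  const url = routePath.replace(/:([A-Za-z_][A-Za-z0-9_]*)/g, (match, name) => {
    if (params[name] === undefined) return match; // leave unresolved placeholders visible
    used.add(name);
    return encodeURIComponent(String(params[name]));
  });

  // Keep only the named params that were generated and not consumed by the path.
  const pick = (names) =>
    Object.fromEntries(
      names.filter((n) => !used.has(n) && params[n] !== undefined).map((n) => [n, params[n]])
    );

  return { url, body: pick(bodyProps), query: pick(queryProps) };
}

const generated = { appId: 'app-123', dsl: 'allow all', priority: 1 };
const schema = {
  body: {
    type: 'object',
    properties: { dsl: { type: 'string' }, priority: { type: 'integer' } },
  },
};

console.log(placeByMethod('POST', generated));
console.log(placeBySchema('/tenant/applications/:appId/rules', schema, generated));
```

The sketch deliberately leaves out header injection and nested body structures; it only shows that the path and the schema together carry enough information to route each parameter.
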
**What's Missing**:

- **Path parameter extraction**: Identifying `:id`, `:tenantId` from route paths and correlating with generated data
- **Body vs query discrimination**: Using Fastify schema to know which params go where
- **Header injection**: Automatic `x-tenant-id`, `authorization` header injection based on route requirements
- **Nested body structures**: Handling `body.properties.nested.field` schemas
- **Content-Type negotiation**: Form-encoded vs JSON based on route configuration

**Arbiter Example**:

```javascript
// Route: POST /tenant/applications/:appId/rules
// Body schema: { type: 'object', properties: { dsl: { type: 'string' }, priority: { type: 'integer' } } }
// Path params: { appId: '...' }
// Headers: { 'x-tenant-id': '...', 'authorization': 'Bearer ...' }
// Current code would send: { appId: 'generated', dsl: 'generated', priority: 1 } all as body
// Should send: appId in path, { dsl, priority } in body, auth headers automatically
```

**Implementation Effort**: Medium (2-3 days)

- Parse route path for parameter placeholders
- Match generated data to path vs body vs query
- Implement header injection based on scope/auth requirements
- Handle nested schema structures

### 4. Logic/Invariant Analysis (20%)

**Current State**: `checkPostconditions()` only validates `status:###` patterns. No evaluation of complex invariants.

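
A check limited to `status:###` patterns is roughly equivalent to the sketch below. The clause syntax is an assumption made for illustration (a literal `status:` prefix followed by three digits), and `checkStatusClause` is a hypothetical name, not the real `checkPostconditions()`.

```javascript
// Hypothetical sketch; the clause syntax is assumed, not taken from the APOSTL grammar.
const STATUS_CLAUSE = /^status:(\d{3})$/;

// Returns 'pass' or 'fail' for status clauses, and 'skipped' for anything else.
function checkStatusClause(clause, response) {
  const match = STATUS_CLAUSE.exec(clause.trim());
  if (match === null) return 'skipped';
  return Number(match[1]) === response.statusCode ? 'pass' : 'fail';
}

const response = { statusCode: 201, body: { id: 'user-1' } };
const clauses = [
  'status:201',
  'status:200',
  'GET /users/:id returns the same user', // plain-English stand-in for a richer invariant
];

const results = clauses.map((clause) => [clause, checkStatusClause(clause, response)]);
console.log(results);
```

In this sketch, any clause that is not a status pattern is skipped rather than evaluated, which is the kind of blind spot a status-only check leaves for cross-route and authorization invariants.
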
**What's Missing**:

- **Cross-route invariant checking**: "After POST /users, GET /users/:id should return the same user"
- **State consistency checks**: "Total user count should increase by 1 after creation"
- **Authorization boundary checks**: "Tenant A's admin cannot access Tenant B's resources"
- **Temporal logic**: "After DELETE /users/:id, subsequent GET should return 404"
- **Mathematical invariants**: Budget constraints, quota limits, rate limiting

**Arbiter-Specific Value**: Arbiter's authorization graph has rich invariants:

- If user U has permission P on resource R, then checking P for U on R must return true
- If node N is child of node M, then M's permissions apply to N (transitivity)
- If relation R is revoked, all derived permissions via R must be invalidated
- Tenant isolation: resources in tenant T1 must never be accessible from T2

**Implementation Effort**: High (1 week)

- Implement invariant registry for cross-route assertions
- Add temporal operators (eventually, always, until) to APOSTL
- Create graph-aware consistency checker for Arbiter's authorization model
- Implement property-based invariant generation from schema constraints

### 5. Documentation (70%)

**In Place**:

- README.md with quick start, features, API reference
- Architecture document (ARCHITECTURE, 2656 lines)
- Performance analysis (PERF_ANALYSIS.md)
- Inline code comments

**Missing**:

- **skills.md**: LLM-friendly documentation for AI-assisted development
- **Advanced guides**: Stateful testing setup, custom invariant authoring
- **Arbiter-specific examples**: Multi-tenant testing patterns, OAuth flow validation
- **Troubleshooting guide**: Common failures, debugging techniques
- **Migration guide**: From manual testing to contract-driven testing

## Do We Gain from Logic?

### Short Answer: YES, Significantly

Without logic/stateful testing, APOPHIS is essentially a smart fuzzer with runtime assertions. With logic:

1. **State Space Coverage**:
   - Stateless: Tests each route in isolation (~200 tests for 200 routes)
   - Stateful: Tests route sequences (200 routes ^ 5 depth = 3.2 × 10^11, i.e. 320 billion possible sequences)
   - **Gain**: 10-100x more bugs found in stateful interactions
2. **Arbiter-Specific Bugs Caught**:
   - Authorization escalation after role changes
   - Resource leaks across tenant boundaries
   - Invalid state transitions (e.g., modifying revoked tokens)
   - Cache invalidation failures after mutations
   - Graph inconsistency after node deletion
3. **Regression Prevention**:
   - Stateless: Catches route-level regressions
   - Stateful: Catches system-level regressions (e.g., "deleting user breaks their sessions")
4. **Cost-Benefit**:
   - Implementation: ~1 week
   - Value: Prevents production incidents that could take days to debug
   - ROI: 10x+ for a system like Arbiter

## Recommendations

### Phase 1: Immediate (This Week)

1. Implement object inference from schemas (1-2 days)
2. Fix request structure handling (path/body/query discrimination) (2-3 days)
3. Create skills.md for LLM assistance (1 day)

### Phase 2: Short-term (Next 2 Weeks)

1. Implement stateful test runner with model-based testing (1 week)
2. Add cross-route invariant checking (1 week)
3. Create Arbiter-specific example suite

### Phase 3: Medium-term (Next Month)

1. Graph-aware consistency checker for Arbiter
2. Automatic contract generation from existing tests
3. Performance optimization for 11K routes
4. Integration with Arbiter's CI/CD pipeline

## Conclusion

APOPHIS has a solid foundation for contract-driven testing. The current implementation provides immediate value for:

- Runtime contract validation (preconditions/postconditions)
- Property-based testing of individual routes
- Incremental test execution for CI/CD

However, to fully realize value for Arbiter, we need:

1. **Stateful testing**: Critical for catching multi-route interaction bugs
2. **Better object inference**: Essential for Arbiter's complex resource hierarchies
3. **Request structure handling**: Required for realistic test execution
4. **Logic/invariant analysis**: Needed for authorization-specific testing

The **highest ROI** item is stateful testing with proper object inference, which would catch the class of bugs most likely to cause production incidents in Arbiter.