# APOPHIS Assessment: Arbiter Integration Readiness

## Executive Summary

APOPHIS is a contract-driven API testing plugin for Fastify. This document assesses its readiness for integration with the Arbiter repository, a multi-tenant authorization server with ~11,389 routes.

## What Is In Place

### Core Infrastructure (100% Complete)
- **Route Discovery**: Extracts contracts from Fastify route schemas via `discoverRoutes()`
- **Category Inference**: Auto-categorizes routes as constructor/mutator/observer/utility
- **Contract Extraction**: Parses `x-requires`, `x-ensures`, `x-invariants`, `x-regex`, `x-category`
- **Formula Parser**: Full APOSTL grammar with charCodeAt optimization (94% faster)
- **Formula Evaluator**: Pure function with type coercion, regex matching, quantifiers
- **Hook Validator**: Runtime precondition/postcondition validation via preHandler/onResponse
- **Scope Registry**: Auto-discovers from `APOPHIS_SCOPE_*` env vars
- **Cleanup Manager**: LIFO deletion with callback-based batching
- **TAP Formatter**: CI/CD compatible test output
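
To make the category inference step concrete, here is a minimal sketch of how a route could be classified from its HTTP method, with an explicit `x-category` annotation taking precedence. The method-to-category table and the `inferCategory` name are assumptions for illustration; this document does not describe the heuristic APOPHIS actually uses.

```javascript
// Hypothetical mapping: the real inference rules are not described in this document.
const METHOD_CATEGORIES = {
  POST: 'constructor',
  PUT: 'mutator',
  PATCH: 'mutator',
  DELETE: 'mutator',
  GET: 'observer',
  HEAD: 'observer',
};

// Classify a Fastify-style route object ({ method, url, schema }).
function inferCategory(route) {
  // An explicit `x-category` annotation in the schema always wins.
  const explicit = route.schema && route.schema['x-category'];
  if (explicit) return explicit;
  // Otherwise fall back to the HTTP method; unknown methods become 'utility'.
  return METHOD_CATEGORIES[String(route.method).toUpperCase()] || 'utility';
}

console.log(inferCategory({ method: 'POST', url: '/users' }));
console.log(inferCategory({ method: 'POST', url: '/search', schema: { 'x-category': 'observer' } }));
```

The override matters for routes such as a `POST`-based search endpoint, which reads state even though its method suggests a constructor.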

### Test Framework (80% Complete)
- **PETIT Runner**: Property-based test execution with fast-check arbitraries
- **Schema-to-Arbitrary**: JSON Schema -> fast-check conversion (strings, integers, objects, arrays, enums, formats)
- **Incremental Cache**: SHA-256 schema hashing with file-based persistence (13-20x speedup)
- **Model State Tracking**: Basic resource tracking for constructor routes
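
The incremental cache idea can be sketched as follows: hash each route schema with SHA-256 and re-test only the routes whose hash changed. The names `stableStringify`, `hashSchema`, and `selectChangedRoutes` are hypothetical, and an in-memory `Map` stands in for the file-based persistence mentioned above.

```javascript
const { createHash } = require('node:crypto');

// Serialize with sorted object keys so that key order does not change the hash.
function stableStringify(value) {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const entries = Object.keys(value)
      .filter((key) => value[key] !== undefined)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(value[key])}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

// SHA-256 digest of the canonical schema serialization, as a hex string.
function hashSchema(schema) {
  return createHash('sha256').update(stableStringify(schema)).digest('hex');
}

// Returns the routes whose schema hash differs from the cached one and
// records the new hashes in the cache (assumed shape: Map<routeKey, hash>).
function selectChangedRoutes(routes, cache) {
  const changed = [];
  for (const route of routes) {
    const key = `${route.method} ${route.url}`;
    const hash = hashSchema(route.schema || {});
    if (cache.get(key) !== hash) {
      changed.push(route);
      cache.set(key, hash);
    }
  }
  return changed;
}
```

Canonicalizing key order before hashing avoids spurious cache misses when a schema is rebuilt with the same content but a different property order.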

### Performance (Complete)
- Route discovery: ~0.5µs/route
- Formula parsing: ~5µs/formula
- Category inference: ~15ns/route
- Contract extraction: 58% faster with WeakMap cache
- Incremental cache: 13-20x speedup for unchanged routes
- **Estimated 11K route overhead: ~1.4s total**

## What Is NOT In Place

### 1. Stateful Testing (0% - Architecture Only)

**Current State**: `runPetitTests` runs commands sequentially but without true stateful/model-based testing. The state machine only tracks created resources for cleanup.

**What's Missing**:
- **Command sequence generation**: fast-check's `fc.commands()` arbitrary for generating valid command sequences
- **Model-based state machine**: Formal model that tracks expected vs actual state
- **Precondition-aware sequencing**: Smart generation that respects `x-requires` dependencies
- **Cross-route state transitions**: Understanding that POST /users creates a resource that GET /users/:id can observe
- **Invariant checking across sequences**: Ensuring state remains consistent after mutations

**Arbiter-Specific Value**:
Arbiter has complex multi-tenant state:
- Tenant creation -> Application creation -> User creation -> Permission assignment
- OAuth flows: authorization -> token -> refresh -> revocation
- Graph mutations: node creation -> relation creation -> authorization evaluation

Stateful testing would catch:
- Race conditions in tenant isolation
- Invalid state transitions (e.g., deleting a tenant with active applications)
- Authorization leaks across state changes
- Resource lifecycle violations

**Implementation Effort**: Medium (2-3 days)
- Create `Model` class tracking expected state
- Implement `Command` arbitrary using fast-check's `commands()`
- Add `checkInvariants()` for cross-route consistency
- Reuse fast-check's built-in shrinking of command sequences to report minimal failing sequences
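
The model-based approach can be illustrated with a dependency-free sketch. To keep it runnable without installing anything, a small seeded random generator is swapped in for fast-check's `fc.commands()`, so this version does no shrinking. `FakeApi`, `Model`, `commands`, `checkInvariants`, and `runSequence` are all hypothetical names, and the in-memory `FakeApi` stands in for real HTTP calls against Arbiter.

```javascript
// Deterministic LCG so that a failing sequence can be replayed from its seed.
function makeRng(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 2 ** 32;
  };
}
const pick = (rng, items) => items[Math.floor(rng() * items.length)];
const expect = (condition, message) => {
  if (!condition) throw new Error(message);
};

// Hypothetical in-memory stand-in for the system under test.
class FakeApi {
  constructor() {
    this.tenants = new Map(); // tenantId -> number of applications
    this.nextId = 1;
  }
  createTenant() {
    const id = `tenant-${this.nextId++}`;
    this.tenants.set(id, 0);
    return { status: 201, id };
  }
  createApplication(tenantId) {
    if (!this.tenants.has(tenantId)) return { status: 404 };
    this.tenants.set(tenantId, this.tenants.get(tenantId) + 1);
    return { status: 201 };
  }
  deleteTenant(tenantId) {
    if (!this.tenants.has(tenantId)) return { status: 404 };
    if (this.tenants.get(tenantId) > 0) return { status: 409 }; // active applications
    this.tenants.delete(tenantId);
    return { status: 204 };
  }
  listTenantIds() {
    return [...this.tenants.keys()].sort();
  }
}

// Expected state, updated alongside the real system.
class Model {
  constructor() {
    this.tenants = new Map(); // tenantId -> expected application count
  }
}

// Each command has a precondition on the model (`check`) and a `run` step that
// drives the system, asserts on the response, and updates the model.
const commands = [
  {
    name: 'createTenant',
    check: () => true,
    run(model, api) {
      const res = api.createTenant();
      expect(res.status === 201, 'createTenant should return 201');
      model.tenants.set(res.id, 0);
    },
  },
  {
    name: 'createApplication',
    check: (model) => model.tenants.size > 0,
    run(model, api, rng) {
      const tenantId = pick(rng, [...model.tenants.keys()]);
      const res = api.createApplication(tenantId);
      expect(res.status === 201, 'createApplication should return 201');
      model.tenants.set(tenantId, model.tenants.get(tenantId) + 1);
    },
  },
  {
    name: 'deleteTenant',
    check: (model) => model.tenants.size > 0,
    run(model, api, rng) {
      const tenantId = pick(rng, [...model.tenants.keys()]);
      const expected = model.tenants.get(tenantId) > 0 ? 409 : 204;
      const res = api.deleteTenant(tenantId);
      expect(res.status === expected, `deleteTenant expected ${expected}, got ${res.status}`);
      if (expected === 204) model.tenants.delete(tenantId);
    },
  },
];

// Cross-route consistency: the system must list exactly the tenants the model expects.
function checkInvariants(model, api) {
  const expected = JSON.stringify([...model.tenants.keys()].sort());
  expect(JSON.stringify(api.listTenantIds()) === expected, 'tenant sets diverged');
}

// Runs one random command sequence and returns the names of the executed commands.
function runSequence(seed, length, makeApi = () => new FakeApi()) {
  const rng = makeRng(seed);
  const model = new Model();
  const api = makeApi();
  const trace = [];
  for (let step = 0; step < length; step++) {
    const command = pick(rng, commands);
    if (!command.check(model)) continue; // precondition not met: skip this command
    trace.push(command.name);
    command.run(model, api, rng);
    checkInvariants(model, api);
  }
  return trace;
}
```

A call such as `runSequence(42, 100)` replays the same sequence on every run because the generator is seeded. An API that wrongly allows deleting a tenant with active applications would fail the `deleteTenant` status check as soon as such a sequence is generated.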

### 2. Object Inference from Schemas (40%)

**Current State**: `updateState()` infers resources by scanning the response body for `id`/`uuid`/`_id` fields. This heuristic is naive: it ignores the response schema and any relationship between resources.

**What's Missing**:
- **Schema-driven object extraction**: Using JSON Schema `properties` to know what fields constitute an object identity
- **Relationship inference**: Understanding that `POST /tenants/:id/applications` creates an application scoped to a tenant
- **Nested resource tracking**: Tracking sub-resources (e.g., application configs within tenants)
- **Path parameter correlation**: Linking `POST /users` response `id` to `GET /users/:id` path parameter

**Arbiter Example**:
```javascript
// POST /tenant/applications
// Response: { id: 'app-123', tenantId: 'tenant-456', name: 'My App' }
// Should infer: resourceType='application', parentType='tenant', parentId='tenant-456'

// Current code only captures: resourceType='applications', id='app-123'
// Missing the tenant scoping which is critical for Arbiter's authorization model
```

**Implementation Effort**: Low-Medium (1-2 days)
- Enhance `updateState()` to parse response schema for identity fields
- Add parent-child relationship tracking to `ModelState`
- Implement path parameter extraction for route correlation
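
A minimal sketch of the enhanced inference, applied to the Arbiter example above, might look like the following. The `inferResource` function, the naive singularization, and the convention that a `<name>Id` field points at a parent resource are all assumptions for illustration, not APOPHIS's existing behavior.

```javascript
// Naive singularization, good enough for a sketch ("applications" -> "application").
const singular = (segment) => segment.replace(/ies$/, 'y').replace(/s$/, '');

// Infer the created resource and its parent from the route path and response body.
function inferResource(routePath, responseBody) {
  // The last static path segment names the resource collection.
  const segments = routePath.split('/').filter((s) => s && !s.startsWith(':'));
  const resourceType = singular(segments[segments.length - 1] || 'unknown');

  // Identity field: same candidates the current heuristic looks at.
  const id = responseBody.id ?? responseBody.uuid ?? responseBody._id ?? null;

  // Assumption: a field named `<name>Id` references a parent of type `<name>`.
  const parentKey = Object.keys(responseBody).find((key) => /^[a-z]\w*Id$/.test(key));

  return {
    resourceType,
    id,
    parentType: parentKey ? parentKey.slice(0, -2) : null,
    parentId: parentKey ? responseBody[parentKey] : null,
  };
}

console.log(
  inferResource('/tenant/applications', { id: 'app-123', tenantId: 'tenant-456', name: 'My App' }),
);
```

A schema-driven version would read the identity and parent fields from the response schema's `properties` rather than from naming conventions, which is more robust when field names vary.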

### 3. Request Structure Inference (30%)

**Current State**: `executeCommand()` blindly sends all generated params as either body or query params, depending on the HTTP method. It has no understanding of route-specific parameter structure.

**What's Missing**:
- **Path parameter extraction**: Identifying `:id`, `:tenantId` from route paths and correlating with generated data
- **Body vs query discrimination**: Using Fastify schema to know which params go where
- **Header injection**: Automatic `x-tenant-id`, `authorization` header injection based on route requirements
- **Nested body structures**: Handling `body.properties.nested.field` schemas
- **Content-Type negotiation**: Form-encoded vs JSON based on route configuration

**Arbiter Example**:
```javascript
// Route: POST /tenant/applications/:appId/rules
// Body schema: { type: 'object', properties: { dsl: { type: 'string' }, priority: { type: 'integer' } } }
// Path params: { appId: '...' }
// Headers: { 'x-tenant-id': '...', 'authorization': 'Bearer ...' }

// Current code would send: { appId: 'generated', dsl: 'generated', priority: 1 } all as body
// Should send: appId in path, { dsl, priority } in body, auth headers automatically
```

**Implementation Effort**: Medium (2-3 days)
- Parse route path for parameter placeholders
- Match generated data to path vs body vs query
- Implement header injection based on scope/auth requirements
- Handle nested schema structures
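
The steps above can be sketched as a single function that splits generated data into path, query, body, and headers. The `buildRequest` name and the `context` object carrying `tenantId` and `token` are hypothetical; the `body` and `querystring` schema keys follow Fastify's route schema layout. Nested body structures and Content-Type negotiation are left out of this sketch.

```javascript
// Build a request description from a route, generated data, and auth context.
function buildRequest(route, generated, context = {}) {
  const schema = route.schema || {};

  // Fill every `:name` placeholder in the path from the generated data.
  const url = route.url.replace(/:([A-Za-z_]\w*)/g, (match, name) =>
    encodeURIComponent(String(generated[name])),
  );

  // Copy only the top-level properties that the given schema section declares.
  const pickDeclared = (section) => {
    const declared = (schema[section] && schema[section].properties) || {};
    return Object.fromEntries(
      Object.keys(declared)
        .filter((key) => key in generated)
        .map((key) => [key, generated[key]]),
    );
  };

  // Inject tenant and auth headers when the (hypothetical) context provides them.
  const headers = {};
  if (context.tenantId) headers['x-tenant-id'] = context.tenantId;
  if (context.token) headers.authorization = `Bearer ${context.token}`;

  return {
    method: route.method,
    url,
    query: pickDeclared('querystring'),
    body: pickDeclared('body'),
    headers,
  };
}

const rulesRoute = {
  method: 'POST',
  url: '/tenant/applications/:appId/rules',
  schema: {
    body: {
      type: 'object',
      properties: { dsl: { type: 'string' }, priority: { type: 'integer' } },
    },
  },
};
console.log(
  buildRequest(
    rulesRoute,
    { appId: 'app-1', dsl: 'allow', priority: 1 },
    { tenantId: 'tenant-1', token: 'abc' },
  ),
);
```

Because the body is assembled from the declared schema properties, `appId` ends up only in the path and never leaks into the payload.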

### 4. Logic/Invariant Analysis (20%)

**Current State**: `checkPostconditions()` only validates `status:###` patterns. It does not evaluate complex invariants.

**What's Missing**:
- **Cross-route invariant checking**: "After POST /users, GET /users/:id should return the same user"
- **State consistency checks**: "Total user count should increase by 1 after creation"
- **Authorization boundary checks**: "Tenant A's admin cannot access Tenant B's resources"
- **Temporal logic**: "After DELETE /users/:id, subsequent GET should return 404"
- **Mathematical invariants**: Budget constraints, quota limits, rate limiting

**Arbiter-Specific Value**:
Arbiter's authorization graph has rich invariants:
- If user U has permission P on resource R, then checking P for U on R must return true
- If node N is a child of node M, then M's permissions apply to N (transitivity)
- If relation R is revoked, all derived permissions via R must be invalidated
- Tenant isolation: resources in tenant T1 must never be accessible from T2

**Implementation Effort**: High (1 week)
- Implement invariant registry for cross-route assertions
- Add temporal operators (eventually, always, until) to APOSTL
- Create graph-aware consistency checker for Arbiter's authorization model
- Implement property-based invariant generation from schema constraints
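
One possible shape for the invariant registry is sketched below: named predicates evaluated over a trace of observed requests and responses. The `InvariantRegistry` class, the trace event shape (`method`, `path`, `status`, `tenant`, `body`), and the two example invariants are assumptions for illustration. The temporal check is written in plain JavaScript here rather than as an APOSTL operator.

```javascript
// Registry of named predicates over a trace of request/response events.
class InvariantRegistry {
  constructor() {
    this.invariants = [];
  }
  register(name, predicate) {
    this.invariants.push({ name, predicate });
    return this; // allow chaining
  }
  // Returns the names of all invariants that the trace violates.
  check(trace) {
    return this.invariants
      .filter(({ predicate }) => !predicate(trace))
      .map(({ name }) => name);
  }
}

const registry = new InvariantRegistry()
  // Temporal: after a successful DELETE, every later GET on the same path returns 404.
  .register('get-after-delete-returns-404', (trace) =>
    trace.every(
      (event, i) =>
        !(event.method === 'DELETE' && event.status < 300) ||
        trace
          .slice(i + 1)
          .filter((later) => later.method === 'GET' && later.path === event.path)
          .every((later) => later.status === 404),
    ),
  )
  // Isolation: a successful response never carries another tenant's resource.
  .register('tenant-isolation', (trace) =>
    trace.every(
      (event) =>
        event.status >= 400 ||
        !event.body ||
        event.body.tenantId === undefined ||
        event.body.tenantId === event.tenant,
    ),
  );

const leakyTrace = [
  { method: 'GET', path: '/apps/a1', status: 200, tenant: 't2', body: { id: 'a1', tenantId: 't1' } },
];
console.log(registry.check(leakyTrace));
```

Keeping invariants as data makes it straightforward to run the same checks after every step of a generated command sequence and to report violations by name.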

### 5. Documentation (70%)

**In Place**:
- README.md with quick start, features, API reference
- Architecture document (ARCHITECTURE, 2656 lines)
- Performance analysis (PERF_ANALYSIS.md)
- Inline code comments

**Missing**:
- **skills.md**: LLM-friendly documentation for AI-assisted development
- **Advanced guides**: Stateful testing setup, custom invariant authoring
- **Arbiter-specific examples**: Multi-tenant testing patterns, OAuth flow validation
- **Troubleshooting guide**: Common failures, debugging techniques
- **Migration guide**: From manual testing to contract-driven testing

## Do We Gain from Logic?

### Short Answer: YES, Significantly

Without logic/stateful testing, APOPHIS is essentially a smart fuzzer with runtime assertions. With logic:

1. **State Space Coverage**: 
   - Stateless: Tests each route in isolation (~200 tests for 200 routes)
   - Stateful: Samples from the space of route sequences (200 routes ^ 5 depth = 3.2 × 10^11, i.e. 320 billion possible sequences)
   - **Gain**: an estimated 10-100x more bugs found in stateful interactions

2. **Arbiter-Specific Bugs Caught**:
   - Authorization escalation after role changes
   - Resource leaks across tenant boundaries
   - Invalid state transitions (e.g., modifying revoked tokens)
   - Cache invalidation failures after mutations
   - Graph inconsistency after node deletion

3. **Regression Prevention**:
   - Stateless: Catches route-level regressions
   - Stateful: Catches system-level regressions (e.g., "deleting user breaks their sessions")

4. **Cost-Benefit**:
   - Implementation: ~1 week
   - Value: Prevents production incidents that could take days to debug
   - ROI: 10x+ for a system like Arbiter

## Recommendations

### Phase 1: Immediate (This Week)
1. Implement object inference from schemas (1-2 days)
2. Fix request structure handling (path/body/query discrimination) (2-3 days)
3. Create skills.md for LLM assistance (1 day)

### Phase 2: Short-term (Next 2 Weeks)
1. Implement stateful test runner with model-based testing (1 week)
2. Add cross-route invariant checking (1 week)
3. Create Arbiter-specific example suite

### Phase 3: Medium-term (Next Month)
1. Graph-aware consistency checker for Arbiter
2. Automatic contract generation from existing tests
3. Performance optimization for 11K routes
4. Integration with Arbiter's CI/CD pipeline

## Conclusion

APOPHIS has a solid foundation for contract-driven testing. The current implementation provides immediate value for:
- Runtime contract validation (preconditions/postconditions)
- Property-based testing of individual routes
- Incremental test execution for CI/CD

However, to fully realize value for Arbiter, we need:
1. **Stateful testing**: Critical for catching multi-route interaction bugs
2. **Better object inference**: Essential for Arbiter's complex resource hierarchies
3. **Request structure handling**: Required for realistic test execution
4. **Logic/invariant analysis**: Needed for authorization-specific testing

The **highest ROI** item is stateful testing with proper object inference, which would catch the class of bugs most likely to cause production incidents in Arbiter.