# Parallelization and Incremental Testing Analysis

## 1. Parallelization with Worker Threads

### Feasibility: PARTIAL

APOPHIS has three phases, each with different parallelization potential:

**Phase 1: Route Discovery**
- Fastify stores routes in a single array
- Reading routes is already O(n) and fast (~0.5µs/route)
- Parallelizing would require sharing the Fastify instance across threads
- Fastify instances are NOT thread-safe
- **Verdict**: NOT worth parallelizing. The raw read cost is negligible.

**Phase 2: Test Generation (Schema → Arbitrary)**
- CPU-bound: fast-check arbitrary construction
- Independent per route
- Could shard routes across worker threads
- Each worker needs only the schema subset
- **Verdict**: HIGH POTENTIAL. Speedup should grow with core count, though the cost of shipping schemas to workers and results back keeps it below linear.

**Phase 3: Test Execution (fastify.inject)**
- Fastify is single-threaded
- Cannot share instance across workers
- Creating multiple Fastify instances wastes memory and breaks integration tests
- **Verdict**: NOT feasible for integration testing.

### Implementation Strategy (if needed):
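```javascript
// Phase 2 parallelization (main thread)
import { Worker } from 'node:worker_threads'
import os from 'node:os'

// Split `items` into consecutive slices of at most `size` elements.
function chunk(items, size) {
  const slices = []
  for (let i = 0; i < items.length; i += size) slices.push(items.slice(i, i + size))
  return slices
}

async function generateTestsParallel(routes, numWorkers = os.cpus().length) {
  // Math.max avoids a zero chunk size (and an infinite loop) for empty input.
  const size = Math.max(1, Math.ceil(routes.length / numWorkers))

  const workers = chunk(routes, size).map(slice =>
    new Worker(new URL('./test-generator-worker.js', import.meta.url), {
      // Send only plain fields (assumed route shape): handler functions
      // cannot pass through structured clone.
      workerData: { routes: slice.map(({ method, url, schema }) => ({ method, url, schema })) }
    })
  )

  const results = await Promise.all(
    workers.map(w => new Promise((resolve, reject) => {
      w.once('message', resolve)
      w.once('error', reject)
      // No-op if a result already arrived; otherwise the worker died early.
      w.once('exit', code => reject(new Error(`worker exited (code ${code}) without a result`)))
    }))
  )

  return results.flat()
}
```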

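**Expected Speedup**: 2-4x on an 8-core machine, for the generation phase only.
**Complexity**: Medium. Schemas must be serialized to the workers. Fast-check arbitraries cannot travel back, because structured clone (used by `workerData` and `postMessage`) drops prototypes and rejects functions. Each worker must therefore build its own arbitraries and return plain data.
**When to use**: Only if the generation phase exceeds 5 seconds.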
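The serialization constraint can be checked directly with the global `structuredClone` (Node 17+). This uses the same algorithm that worker messages go through. In the sketch, `FakeArbitrary` is a hypothetical stand-in for a fast-check arbitrary, not its real class.

```javascript
// A JSON schema is plain data, so it survives structured clone intact.
const schema = { type: 'object', properties: { id: { type: 'integer' } } }
const schemaCopy = structuredClone(schema)

// Hypothetical arbitrary-like object: behaviour lives on the class prototype.
class FakeArbitrary {
  constructor(min, max) { this.min = min; this.max = max }
  generate() { return this.min }
}
// The copy keeps own fields (min, max) but loses the prototype and its methods.
const arbCopy = structuredClone(new FakeArbitrary(1, 9))

// An own function property (e.g. a stored mapper callback) makes cloning throw.
let rejected = false
try {
  structuredClone({ mapper: x => x * 2 })
} catch (err) {
  rejected = err.name === 'DataCloneError'
}
```

Schemas can be sent to workers as-is. Anything built from fast-check should stay inside the worker, and only plain results such as command templates should be posted back.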

---

## 2. Incremental Testing with Schema Hashing

### Feasibility: HIGH

Instead of regenerating all tests every run, hash each route's schema and only regenerate changed ones.

### Algorithm:
1. Compute deterministic hash of each route's schema
2. Compare with cached hashes from previous run
3. For unchanged routes: reuse previous test commands
4. For changed routes: regenerate from scratch
5. Save new hashes to cache file
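The five steps can be sketched as one pass over the routes. `incrementalRun`, `hashFn` and `generate` are hypothetical names, and the route shape (`url`, `schema`) is assumed. The cache is keyed by schema hash, as in the cache structure in this section, so routes with identical schemas share one entry.

```javascript
// One incremental run. hashFn maps a schema to a cache key;
// generate builds command templates for a route (both hypothetical hooks).
function incrementalRun(routes, previous, hashFn, generate) {
  const next = { version: previous.version, schemas: {} }
  const stats = { reused: 0, regenerated: 0 }

  for (const route of routes) {
    const key = hashFn(route.schema)                          // 1. hash the schema
    const entry = next.schemas[key] ?? previous.schemas[key]  // 2. compare with cache
    if (entry) {
      next.schemas[key] = entry                               // 3. unchanged: reuse
      stats.reused++
    } else {
      next.schemas[key] = {                                   // 4. changed or new: regenerate
        commandTemplates: generate(route),
        lastRun: Date.now()
      }
      stats.regenerated++
    }
  }
  // 5. `next` keeps only hashes seen this run; the caller writes it to disk.
  return { cache: next, stats }
}

// Toy usage with a stand-in hash (a real run would use a SHA-256 digest).
const hashFn = schema => JSON.stringify(schema)
const generate = route => [`GET ${route.url}`]
const routes = [
  { url: '/a', schema: { type: 'string' } },
  { url: '/b', schema: { type: 'integer' } }
]
const first = incrementalRun(routes, { version: 1, schemas: {} }, hashFn, generate)   // 0 reused, 2 regenerated
const second = incrementalRun(routes, first.cache, hashFn, generate)                  // 2 reused, 0 regenerated
const edited = [routes[0], { url: '/b', schema: { type: 'number' } }]
const third = incrementalRun(edited, second.cache, hashFn, generate)                  // 1 reused, 1 regenerated
```

Building `next` from scratch each run also prunes entries for schemas that no longer exist. This keeps the cache file from growing without bound.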

### Simple Implementation:
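```javascript
import { createHash } from 'node:crypto'

// JSON.stringify follows key insertion order, so sort keys recursively
// to make equal schemas produce equal hashes.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize)
  if (value === null || typeof value !== 'object') return value
  return Object.fromEntries(
    Object.keys(value).sort().map(key => [key, canonicalize(value[key])])
  )
}

function hashSchema(schema) {
  return createHash('sha256')
    .update(JSON.stringify(canonicalize(schema)))
    .digest('hex')
    .slice(0, 16) // 16 hex chars = 64 bits, enough for tens of thousands of routes
}

// Cache structure (keys and values below are placeholders)
const cache = {
  version: 1,
  schemas: {
    hash123: { commandTemplates: [], lastRun: Date.now() },
    hash456: { commandTemplates: [], lastRun: Date.now() }
  }
}
```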
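The 64-bit truncation can be sanity-checked with the birthday approximation p ≈ n² / 2^(b+1), where n is the number of keys and b the number of hash bits. The helper below is hypothetical, not part of APOPHIS. It evaluates the bound for the 11,389-route figure used in this document.

```javascript
// Birthday-bound approximation: chance that any two of `n` keys collide
// when each key is a uniformly random `bits`-bit value (valid while p is small).
function collisionProbability(n, bits) {
  return (n * n) / 2 ** (bits + 1)
}

const p = collisionProbability(11389, 64)
console.log(p.toExponential(1)) // 3.5e-12
```

At this scale a collision is not a practical concern. Each halving of the hash width would need a far larger route count before the risk becomes noticeable.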

### Expected Impact:
- First run: 100% generation (baseline)
- Typical commit (50 routes changed of 11,389): **0.4% regeneration**
- Schema edits (types, constraints) confined to a few routes: **near-instant**, since only those routes regenerate

### Cache Invalidation Strategy:
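- Cache key: `sha256(JSON.stringify(schema))` truncated to 16 hex characters. It is computed with object keys sorted first, since `JSON.stringify` follows insertion order.
- Cache file: `.apophis-cache.json` (gitignored)
- TTL: Infinite. A hash identifies exact schema content, so an entry never goes stale for that content. Bump `version` if the generator itself changes.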
- Manual invalidation: `rm .apophis-cache.json`
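Loading and saving the cache file might look like the sketch below. The file name and the `version` field come from this section. The helper names `loadCache`/`saveCache` and the choice to start fresh on a version mismatch or unreadable file are assumptions.

```javascript
import { readFileSync, writeFileSync, mkdtempSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

const CACHE_VERSION = 1

// Return the stored cache, or an empty one if the file is missing,
// corrupt, or was written under a different cache version.
function loadCache(path) {
  try {
    const data = JSON.parse(readFileSync(path, 'utf8'))
    if (data.version === CACHE_VERSION && data.schemas) return data
  } catch {
    // Missing or unparsable file: fall through to an empty cache.
  }
  return { version: CACHE_VERSION, schemas: {} }
}

function saveCache(path, cache) {
  writeFileSync(path, JSON.stringify(cache, null, 2))
}

// Usage in a throwaway directory.
const file = join(mkdtempSync(join(tmpdir(), 'apophis-')), '.apophis-cache.json')
const empty = loadCache(file) // no file yet
saveCache(file, { version: CACHE_VERSION, schemas: { abc: { commandTemplates: [], lastRun: 0 } } })
const loaded = loadCache(file)
saveCache(file, { version: 0, schemas: { abc: {} } })
const stale = loadCache(file) // version mismatch: treated as empty
```

Discarding the whole file on a version mismatch keeps invalidation trivial. The cost is one full regeneration after a generator upgrade, which matches the "first run" baseline.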

### JSONHash Integration:
The JSONHash library from `~/Business/workspace/lsh_libs` provides **structural similarity** detection, which could enable:
- **Fuzzy cache hits**: If schema changed slightly but structure is similar, reuse and mutate test data
- **Schema migration detection**: Identify which routes changed structurally vs cosmetically
- **Test suite deduplication**: Detect routes with similar schemas that can share test patterns

However, for the primary use case (skip unchanged routes), a simple SHA-256 hash is sufficient and faster.
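JSONHash's own API is not described here, so the sketch below does not use it. It illustrates only the narrower "structurally vs cosmetically" distinction, using a plainer swapped-in technique. It strips the JSON Schema annotation keywords `title`, `description`, `examples` and `$comment`, sorts keys, and compares exactly. It detects no fuzzy similarity. `structuralForm` and `classifyChange` are hypothetical names.

```javascript
// Annotation keywords: they document a schema but do not change validation.
const COSMETIC = new Set(['title', 'description', 'examples', '$comment'])
// Keywords whose object values map user-chosen names to subschemas.
const NAME_MAPS = new Set(['properties', 'patternProperties', 'dependentSchemas', 'dependencies', '$defs', 'definitions'])
// Keywords whose values are literal data, not subschemas.
const LITERALS = new Set(['enum', 'const', 'default'])

// Drop annotations and sort keys so only validation-relevant structure remains.
function structuralForm(node, inNameMap = false) {
  if (Array.isArray(node)) return node.map(n => structuralForm(n))
  if (node === null || typeof node !== 'object') return node
  const entries = []
  for (const key of Object.keys(node).sort()) {
    // Inside `properties` etc., keys are field names (a field may be called "title").
    if (!inNameMap && COSMETIC.has(key)) continue
    const value = node[key]
    let next
    if (inNameMap) next = structuralForm(value)            // value is a subschema
    else if (LITERALS.has(key)) next = value                // keep data verbatim
    else next = structuralForm(value, NAME_MAPS.has(key))
    entries.push([key, next])
  }
  return Object.fromEntries(entries)
}

// Classify a schema edit as 'unchanged', 'cosmetic', or 'structural'.
function classifyChange(before, after) {
  if (JSON.stringify(before) === JSON.stringify(after)) return 'unchanged'
  const same = JSON.stringify(structuralForm(before)) === JSON.stringify(structuralForm(after))
  return same ? 'cosmetic' : 'structural'
}
```

A cosmetic-only change could keep its cached tests even though the SHA-256 key changed. Anything finer-grained than this exact match is where a similarity library would come in.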

### Recommendation:
1. **Immediate**: Implement a simple SHA-256 schema cache (1-2 hours of work, huge CI/CD win)
2. **Future**: Integrate JSONHash for fuzzy similarity and smart test data reuse
3. **Parallelization**: Defer until generation phase proves to be the bottleneck in practice

---

## 3. Current Bottleneck Analysis

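From profiling (the percentages sum to more than 100%, so they appear to be inclusive times with some calls nested inside others; each corresponds to a total of roughly 2.2 s):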
- `convertSchema`: 823ms (37% of total) — CPU bound, parallelizable
- `discoverRoutes`: 1,649ms (74% of total) — Memory/allocation bound
- `evaluate`: 156ms (7% of total) — Fast enough
- `parse`: 85ms (4% of total) — Cached, fast enough

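The real bottleneck is `discoverRoutes`, which is allocation-bound (creating objects). Assume the profile covered the 11,389 routes cited above. Then 1,649ms is roughly 145µs per route, far above the ~0.5µs raw read cost from Phase 1. The time therefore goes into building objects, not into reading the routes array. Parallelization won't help here because:
1. Each V8 isolate allocates on its own thread, so objects built in a worker would have to be copied back to the main thread through structured clone. That adds allocation rather than removing it.
2. The Fastify instance, and therefore its routes array, lives on the main thread and is not thread-safe
3. The WeakMap cache already optimizes the repeated case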

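**Incremental testing should remove most of the `discoverRoutes` cost for unchanged routes.** Schemas still have to be read and hashed to detect changes, so the cost will not drop all the way to zero.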

---

## 4. Implementation Priority

1. **Schema hash cache** (HIGH): Avoids most of the `discoverRoutes` work (74% of profiled time) for unchanged routes
2. **Parallel generation** (MEDIUM): Could speed up `convertSchema` (37% of profiled time) by 2-4x when routes do change
3. **JSONHash similarity** (LOW): Nice-to-have for advanced use cases